SCIM: universal single-cell matching with unpaired feature sets
- Source :
- Bioinformatics, 36 (S2)
- Publication Year :
- 2020
Abstract
- Motivation: Recent technological advances have led to an increase in the production and availability of single-cell data. Integrating measurements from multiple technologies would make it possible to identify biologically or clinically meaningful observations by unifying the perspectives afforded by each technology. In most cases, however, profiling technologies consume the cells they measure, so pairwise correspondences between datasets are lost. Given the sheer size that single-cell datasets can reach, scalable algorithms are needed that can universally match single-cell measurements carried out in one cell to its corresponding sibling cell in another technology.
- Results: We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are integrated by pairing cells across technologies using a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches derived by SCIM reflect the same pseudotime on the simulated dataset. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset to their sibling cells in a CyTOF dataset, achieving 90% and 78% cell-matching accuracy, respectively.
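The abstract describes pairing cells across technologies by bipartite matching on their latent representations. The sketch below illustrates only that final step, assuming the technology-invariant latent codes have already been produced by the encoders. It does not reproduce SCIM's own matching scheme: it swaps in a plain one-to-one assignment (SciPy's `linear_sum_assignment`) on Euclidean distances. The helper names `match_cells` and `label_agreement` are hypothetical, and "accuracy" is assumed here to mean the share of matched pairs whose cell-type labels agree.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_cells(z_a: np.ndarray, z_b: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pair every cell of technology A with a distinct
    cell of technology B by minimising the total latent-space distance.

    z_a: (n_a, d) latent codes of technology A (e.g. scRNA)
    z_b: (n_b, d) latent codes of technology B (e.g. CyTOF), n_b >= n_a
    Returns partner, where partner[i] is the row of z_b matched to row i of z_a.
    """
    if z_a.shape[0] > z_b.shape[0]:
        raise ValueError("expected no more cells in z_a than in z_b")
    # Dense (n_a, n_b) Euclidean cost matrix: memory grows as n_a * n_b.
    cost = cdist(z_a, z_b, metric="euclidean")
    # One-to-one assignment that minimises the summed cost (assumed stand-in
    # for SCIM's bipartite matching, not the paper's procedure).
    rows, cols = linear_sum_assignment(cost)
    partner = np.empty(z_a.shape[0], dtype=int)
    partner[rows] = cols
    return partner


def label_agreement(partner: np.ndarray, labels_a, labels_b) -> float:
    """Hypothetical accuracy: share of matched pairs with the same cell-type label."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    return float(np.mean(labels_a == labels_b[partner]))


# Toy check: technology B holds noisy, shuffled copies of A's latent codes.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(50, 8))
perm = rng.permutation(50)
z_b = z_a[perm] + 0.01 * rng.normal(size=(50, 8))
partner = match_cells(z_a, z_b)
labels_a = np.arange(50) % 3
print(label_agreement(partner, labels_a, labels_a[perm]))
```

Because the toy "technology B" is just a shuffled, slightly perturbed copy of A, the assignment recovers the inverse of the shuffle. The dense cost matrix and the assignment solver do not scale to very large datasets, so a scalable scheme like the one the abstract claims would need to avoid comparing every pair of cells, for example by considering only nearby candidates in the latent space.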
- Subjects :
- Statistics and Probability
1303 Biochemistry
AcademicSubjects/SCI01060
Computer science
610 Medicine & health
Biochemistry
03 medical and health sciences
0302 clinical medicine
Text mining
1312 Molecular Biology
1706 Computer Science Applications
Humans
Profiling (information science)
2613 Statistics and Probability
Molecular Biology
030304 developmental biology
Data
0303 health sciences
Sequence Analysis, RNA
Gene Expression Profiling
Autoencoder
Computer Science Applications
Computational Mathematics
Computational Theory and Mathematics
10032 Clinic for Oncology and Hematology
Bipartite graph
Data mining
Single-Cell Analysis
2605 Computational Mathematics
Algorithms
Software
030217 neurology & neurosurgery
Data integration
1703 Computational Theory and Mathematics
Details
- Language :
- English
- ISSN :
- 1367-4803 and 1460-2059
- Database :
- OpenAIRE
- Journal :
- Bioinformatics, 36 (S2)
- Accession number :
- edsair.doi.dedup.....4925abb7a5d25d3c8452a01e981de017