Probabilistic latent variable models for unsupervised many-to-many object matching
- Source :
- Information Processing & Management. 52:682-697
- Publication Year :
- 2016
- Publisher :
- Elsevier BV, 2016.
Abstract
- We propose a probabilistic model for matching clusters in different domains without correspondence information. The proposed method can handle data from more than two domains, and the number of objects in each domain can differ. We extend the proposed method to a semi-supervised setting. We demonstrate that the proposed method achieves better matching performance than existing methods on synthetic and real-world data sets.
- Object matching is an important task for finding the correspondence between objects in different domains, such as documents in different languages and users in different databases. In this paper, we propose probabilistic latent variable models that offer many-to-many matching without correspondence information or similarity measures between different domains. The proposed model assumes that there is an infinite number of latent vectors shared by all domains, and that each object is generated from one of these latent vectors and a domain-specific projection. By inferring the latent vector used to generate each object, objects in different domains are clustered according to the vectors they share. Thus, we can match groups of objects in different domains in an unsupervised manner. We present learning procedures for the proposed model based on a stochastic EM algorithm. We also derive learning procedures for a semi-supervised setting, where correspondence information is given for some objects. The effectiveness of the proposed models is demonstrated by experiments on synthetic and real data sets.
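The generative story in the abstract (shared latent vectors, one per object, mapped into each domain by a domain-specific projection) can be sketched in Python. This is a simplified illustration under our own assumptions, not the authors' implementation: a Chinese restaurant process stands in for the infinite set of shared latent vectors, latent vectors and projections are Gaussian, and learning uses a finite truncation (`n_clusters`) with a stochastic EM loop that samples assignments and then takes MAP / least-squares parameter updates. The function names `generate`, `stochastic_em`, and `match_groups` and all their parameters are hypothetical.

```python
import numpy as np


def generate(n_per_domain, dims, latent_dim=2, alpha=1.0, sigma=0.1, seed=0):
    """Sample objects from a simplified shared-latent-vector model.

    Assumption (not from the paper): one Chinese restaurant process over all
    objects supplies an unbounded set of latent vectors shared by all domains.
    """
    rng = np.random.default_rng(seed)
    counts, latents = [], []      # usage counts and shared latent vectors z_j
    W = [rng.normal(size=(dim, latent_dim)) for dim in dims]  # projection W_d
    X, S = [], []
    for d, n in enumerate(n_per_domain):
        x_d = np.empty((n, dims[d]))
        s_d = np.empty(n, dtype=int)
        for i in range(n):
            # Reuse vector j with prob. proportional to counts[j]; new one ~ alpha.
            probs = np.array(counts + [alpha], dtype=float)
            j = rng.choice(len(probs), p=probs / probs.sum())
            if j == len(latents):
                latents.append(rng.normal(size=latent_dim))
                counts.append(0)
            counts[j] += 1
            s_d[i] = j
            # Object = domain-specific projection of its latent vector + noise.
            x_d[i] = W[d] @ latents[j] + sigma * rng.normal(size=dims[d])
        X.append(x_d)
        S.append(s_d)
    return X, S


def stochastic_em(X, n_clusters=10, latent_dim=2, sigma=0.1, n_iter=50, seed=0):
    """Stochastic EM with a finite truncation (assumption) of the latent set."""
    rng = np.random.default_rng(seed)
    W = [rng.normal(size=(x.shape[1], latent_dim)) for x in X]
    Z = rng.normal(size=(n_clusters, latent_dim))
    pi = np.full(n_clusters, 1.0 / n_clusters)
    S = [rng.integers(n_clusters, size=len(x)) for x in X]
    for _ in range(n_iter):
        # Stochastic E-step: sample each assignment from its posterior.
        for d, x in enumerate(X):
            mean = Z @ W[d].T                                    # (J, dim_d)
            sq = ((x[:, None, :] - mean[None]) ** 2).sum(axis=-1)  # (n_d, J)
            logp = np.log(pi + 1e-12) - sq / (2 * sigma**2)
            p = np.exp(logp - logp.max(axis=1, keepdims=True))
            p /= p.sum(axis=1, keepdims=True)
            S[d] = np.array([rng.choice(n_clusters, p=row) for row in p])
        # M-step (MAP, N(0, I) prior): z_j pools objects from every domain.
        for j in range(n_clusters):
            A = np.eye(latent_dim) * sigma**2
            b = np.zeros(latent_dim)
            for d, x in enumerate(X):
                members = x[S[d] == j]
                A += len(members) * W[d].T @ W[d]
                b += W[d].T @ members.sum(axis=0)
            Z[j] = np.linalg.solve(A, b)
        # M-step: least-squares projection per domain (tiny ridge for stability).
        for d, x in enumerate(X):
            Zd = Z[S[d]]
            W[d] = np.linalg.solve(Zd.T @ Zd + 1e-6 * np.eye(latent_dim), Zd.T @ x).T
        counts = sum(np.bincount(s, minlength=n_clusters) for s in S)
        pi = (counts + 1e-3) / (counts + 1e-3).sum()
    return S


def match_groups(S):
    """Group objects across domains by the latent vector index they share."""
    groups = {}
    for d, s in enumerate(S):
        for i, j in enumerate(s):
            groups.setdefault(int(j), {}).setdefault(d, []).append(i)
    return groups


X, S_true = generate([20, 25, 30], [5, 4, 6], seed=1)
S_hat = stochastic_em(X, n_clusters=8, n_iter=10)
print(match_groups(S_hat))
```

Objects from different domains that receive the same index form one matched group, so a group can hold several objects per domain, which is what makes the matching many-to-many. The semi-supervised variant described in the abstract, which uses known correspondences for some objects, is not covered by this sketch.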
- Subjects :
- Matching (statistics)
02 engineering and technology
Latent variable
Library and Information Sciences
Management Science and Operations Research
computer.software_genre
01 natural sciences
010104 statistics & probability
Expectation–maximization algorithm
0202 electrical engineering, electronic engineering, information engineering
Media Technology
0101 mathematics
Latent variable model
Mathematics
Probabilistic latent semantic analysis
business.industry
Probabilistic logic
Pattern recognition
Object (computer science)
Mixture model
Computer Science Applications
020201 artificial intelligence & image processing
Data mining
Artificial intelligence
business
computer
Information Systems
Details
- ISSN :
- 03064573
- Volume :
- 52
- Database :
- OpenAIRE
- Journal :
- Information Processing & Management
- Accession number :
- edsair.doi...........0b0db35df52f88a939a3ade2c5bdf1fd
- Full Text :
- https://doi.org/10.1016/j.ipm.2015.12.013