Efficient Detection of Soft Concatenation Mapping.
- Author
Liu, Hao; Xiao, Jiang; Tan, Haoyu; Luo, Qiong; Zhao, Jintao; Ni, Lionel M.
- Subjects
Data warehousing, Data mining, Big data, Data compression, Email, Data integration
- Abstract
In modern big data warehouse systems, we observe a common phenomenon that a column of data values can be derived from one or several other columns by transforming and concatenating these columns. We call this relationship between columns a Soft Concatenation Mapping (SCM). SCMs imply significant redundancy in the schema or data, and therefore can be exploited for data integration or data compression. In this paper, we formalize the problem of SCM detection and prove it is NP-hard. We then propose efficient approximate algorithms to detect all SCMs or an optimal set of SCMs in a table. Our experiments on both real-world and synthetic datasets show promising results. [ABSTRACT FROM AUTHOR]
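To make the SCM notion concrete, here is a minimal, hypothetical sketch of what checking a single candidate mapping involves. It is not the authors' algorithm, which the abstract describes as an efficient approximation to an NP-hard problem. Assumptions, all labeled: "soft" is read here as "holds for at least a `min_support` fraction of rows"; the transformation set (`identity`, `lower`, `first_char`), the fixed separator `sep`, and the helper names `support` and `find_scm` are invented for illustration. The naive exhaustive search over ordered column subsets and transforms shows why the search space blows up.

```python
from itertools import permutations, product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Hypothetical per-value transformations; the paper's actual set may differ.
TRANSFORMS: Dict[str, Callable[[str], str]] = {
    "identity": lambda s: s,
    "lower": str.lower,
    "first_char": lambda s: s[:1],
}

Plan = Tuple[Tuple[str, str], ...]  # ((source_column, transform_name), ...)


def support(rows: List[Dict[str, str]], target: str, plan: Plan, sep: str) -> float:
    """Fraction of rows whose target equals the transformed sources joined by sep."""
    if not rows:
        return 0.0
    hits = sum(
        1
        for row in rows
        if row[target] == sep.join(TRANSFORMS[t](row[c]) for c, t in plan)
    )
    return hits / len(rows)


def find_scm(
    rows: List[Dict[str, str]],
    target: str,
    candidates: Sequence[str],
    sep: str = "",
    min_support: float = 0.9,
    max_sources: int = 2,
) -> Optional[Tuple[Plan, float]]:
    """Brute force: try every ordered column subset and transform combination.

    The number of plans grows exponentially with max_sources, so this is only
    a toy illustration, not a practical detector.
    """
    best: Optional[Tuple[Plan, float]] = None
    for k in range(1, max_sources + 1):
        for cols in permutations(candidates, k):
            for names in product(TRANSFORMS, repeat=k):
                plan: Plan = tuple(zip(cols, names))
                s = support(rows, target, plan, sep)
                # Keep the plan only if it clears the "soft" threshold and beats the best so far.
                if s >= min_support and (best is None or s > best[1]):
                    best = (plan, s)
    return best


# Illustrative toy table (made-up data): login is mostly lower(first) + "." + lower(last).
rows = [
    {"first": "Ada", "last": "Lovelace", "login": "ada.lovelace"},
    {"first": "Alan", "last": "Turing", "login": "alan.turing"},
    {"first": "Grace", "last": "Hopper", "login": "ghopper"},  # violates the mapping
]

print(find_scm(rows, "login", ["first", "last"], sep=".", min_support=0.6))
```

On this toy table the search should settle on the plan `(("first", "lower"), ("last", "lower"))` with support 2/3, because the third row breaks the pattern; raising `min_support` to 0.9 makes it return `None`.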
- Published
2018