A block-based support vector machine approach to the protein homology prediction task in KDD Cup 2004
- Authors
Junfa Liu, Rui-Xiang Sun, Chunli Wang, Yan Fu, Si-Min He, Haipeng Wang, Qiang Yang, Wen Gao, and Shiguang Shan
- Subjects
Training set; Computer science; business.industry; Geography, Planning and Development; Supervised learning; Machine learning; computer.software_genre; Task (project management); Support vector machine; Statistical classification; ComputingMethodologies_PATTERNRECOGNITION; Key (cryptography); General Earth and Planetary Sciences; Protein homology; Artificial intelligence; Data mining; business; computer; Water Science and Technology; Block (data storage)
- Abstract
This paper describes our solution for the protein homology prediction task in the KDD Cup 2004 competition. The task is modeled as a supervised learning problem with multiple performance metrics. Several key characteristics make the problem both novel and challenging, including the concept of data blocks and the presence of large-scale, imbalanced training data. These features make a naive application of traditional classification algorithms infeasible. Our approach focuses on two things: making full use of the abundant information within the blocks, and developing a new technique for reducing and balancing the training data so that the support vector machine becomes applicable to large-scale, imbalanced learning tasks of this kind.
- Published
- 2004
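The abstract does not spell out how block information is used or how the training data is reduced and balanced. Purely as an illustration, the Python sketch below shows two generic, hypothetical stand-ins: per-block z-score normalization of features (one possible way to exploit context within a block) and per-block undersampling of the negative (majority) class. The function names `normalize_within_blocks`, `reduce_and_balance`, and `group_by_block`, and the parameters `neg_per_pos` and `min_neg_per_block`, are invented for this sketch; it is not the authors' technique. The reduced set would then be handed to an SVM trainer (for example scikit-learn's `SVC`), which is left out here.

```python
import random
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple

# One training example: (block_id, feature_vector, label); label 1 = homologous, 0 = not.
Example = Tuple[str, List[float], int]


def group_by_block(examples: List[Example]) -> Dict[str, List[Example]]:
    """Group examples by their block id."""
    blocks: Dict[str, List[Example]] = defaultdict(list)
    for ex in examples:
        blocks[ex[0]].append(ex)
    return blocks


def normalize_within_blocks(examples: List[Example]) -> List[Example]:
    """Z-score each feature against the other examples in the same block.

    Hypothetical way of using block context; a constant feature (std 0)
    is divided by 1.0 instead, so it maps to 0.0.
    """
    out: List[Example] = []
    for block_id, rows in group_by_block(examples).items():
        n_features = len(rows[0][1])
        means = [statistics.fmean(r[1][j] for r in rows) for j in range(n_features)]
        stds = [statistics.pstdev([r[1][j] for r in rows]) or 1.0 for j in range(n_features)]
        for _, x, y in rows:
            z = [(x[j] - means[j]) / stds[j] for j in range(n_features)]
            out.append((block_id, z, y))
    return out


def reduce_and_balance(
    examples: List[Example],
    neg_per_pos: int = 1,
    min_neg_per_block: int = 1,
    seed: int = 0,
) -> List[Example]:
    """Per-block undersampling of negatives (hypothetical sketch).

    Keeps every positive in a block and randomly samples
    max(neg_per_pos * n_pos, min_neg_per_block) negatives from the same
    block, capped at the number of negatives available.
    """
    rng = random.Random(seed)
    reduced: List[Example] = []
    blocks = group_by_block(examples)
    for block_id in sorted(blocks):  # sorted for reproducible sampling order
        rows = blocks[block_id]
        pos = [r for r in rows if r[2] == 1]
        neg = [r for r in rows if r[2] == 0]
        k = min(len(neg), max(neg_per_pos * len(pos), min_neg_per_block))
        reduced.extend(pos)
        reduced.extend(rng.sample(neg, k))
    return reduced
```

Because each block keeps all of its positives and at most `neg_per_pos` negatives per positive, the class ratio becomes roughly balanced within every block while the overall training set shrinks; blocks without any positives still contribute a few negatives through `min_neg_per_block`.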