USC-DCT: A Collection of Diverse Classification Tasks

Authors :
Adam M. Jones
Gozde Sahin
Zachary W. Murdock
Yunhao Ge
Ao Xu
Yuecheng Li
Di Wu
Shuo Ni
Po-Hsuan Huang
Kiran Lekkala
Laurent Itti
Source :
Data, Vol 8, Iss 10, p 153 (2023)
Publication Year :
2023
Publisher :
MDPI AG, 2023.

Abstract

Machine learning is a crucial tool for both academic and real-world applications. Classification problems are often the preferred showcase in this space, which has led to a wide variety of datasets being collected and used for a myriad of applications. Unfortunately, there is very little standardization in how these datasets are collected, processed, and disseminated. As new learning paradigms such as lifelong learning and meta-learning become more popular, the demand for merging tasks for at-scale evaluation of algorithms has also increased. This paper provides a methodology for processing and cleaning datasets that can be applied to existing or new classification tasks, and implements these practices in a collection of diverse classification tasks called USC-DCT. Constructed from 107 classification tasks collected from the internet, this collection provides a transparent and standardized pipeline that can be useful for many different applications and frameworks. Although it currently contains 107 tasks, USC-DCT is designed to enable future growth. Additional discussion covers applications in machine learning paradigms such as transfer, lifelong, or meta-learning, how revisions to the collection will be handled, and further tips for curating and using classification tasks at this scale.

Details

Language :
English
ISSN :
2306-5729
Volume :
8
Issue :
10
Database :
Directory of Open Access Journals
Journal :
Data
Publication Type :
Academic Journal
Accession number :
edsdoj.ffcfdf6c11474d1ab95939ca9f44423d
Document Type :
article
Full Text :
https://doi.org/10.3390/data8100153