Hierarchical classification is a special type of classification task in which the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, from a pool of candidate algorithms, for a given dataset, based on the past performance of the candidate algorithms on other datasets. Meta-learning is normally applied to conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification, and evaluates it on a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. This work proposes a meta-learning approach for recommending the best hierarchical classification algorithm for a given hierarchical classification dataset. This work's contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets, thereby increasing the number of meta-instances; 2) proposing meta-features for hierarchical classification; and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.
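To make contribution (1) concrete, the sketch below shows one plausible way a hierarchical dataset could be split into several smaller datasets, one per top-level subtree of the class hierarchy, so each split can serve as an additional meta-instance. This is only an illustrative assumption: the paper's actual splitting algorithm is not specified here, and the names `split_by_top_level` and the `parent` mapping are hypothetical.

```python
# Illustrative sketch (not the paper's algorithm): split a hierarchical-
# classification dataset into one sub-dataset per child of the hierarchy
# root, so each sub-dataset can act as an extra meta-instance.

def split_by_top_level(instances, parent):
    """instances: list of (features, class_label) pairs.
    parent: dict mapping each class label to its parent ('root' at the top).
    Returns a dict mapping each top-level class to the instances whose
    label falls under that subtree."""
    def top_ancestor(label):
        # Walk up the hierarchy until reaching a direct child of the root.
        while parent[label] != 'root':
            label = parent[label]
        return label

    datasets = {}
    for features, label in instances:
        datasets.setdefault(top_ancestor(label), []).append((features, label))
    return datasets

# Toy hierarchy: root -> {A, B}, A -> {A.1}, B -> {B.1}
parent = {'A': 'root', 'B': 'root', 'A.1': 'A', 'B.1': 'B'}
data = [([0.1], 'A.1'), ([0.2], 'A'), ([0.3], 'B.1')]
splits = split_by_top_level(data, parent)
# splits['A'] holds both A-subtree instances; splits['B'] holds the B.1 instance
```

Under this assumption, a dataset whose hierarchy root has k children yields k smaller hierarchical datasets, each with its own class subtree.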