1. Machado: open source genomics data integration framework
- Author
- Mauricio de Alvarenga Mudadu (CNPTIA) and Adhemar Zerlotini Neto (CNPTIA)
- Subjects
Biological sciences; Computer science; Health Informatics; Genomics; Natural sciences; Transcriptome; World Wide Web; Chado; Medical and health sciences; Annotation; Software; Databases, Genetic; Technical Note; Web navigation; Database; Developmental biology; Health sciences; Genome; Database schema; Python (programming language); Multiomics; Computer Science Applications; Generic Model Organism Database; Search box; Genomic data; Haystack; Plant biology & botany; Data integration
- Abstract
Background: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information should remain accessible and, where possible, browsable afterwards. Computational biologists have been dealing with this scenario for over a decade and have implemented software libraries, toolkits, platforms, and databases to address it. Chado, the biological relational database schema of the GMOD (Generic Model Organism Database) project, is one of the few successful open source initiatives: it is widely adopted, and many software tools can connect to it.
Results: We have been developing Machado (https://github.com/lmb-embrapa/machado), an open source genomics data integration framework implemented in Python, to enable research groups to store, browse, query, and visualize genomics data. The framework relies on the Chado database schema and should therefore be intuitive for current developers to adopt or to run on top of existing databases. It has several data loading tools for genomics and transcriptomics data, and for annotation results from tools such as BLAST, InterProScan, OrthoMCL, and LSTrAP. There is an API to connect to JBrowse, and a web browsing visualisation tool is implemented using Django Views and Templates. The Haystack library, integrated with the Elasticsearch engine, was used to implement a Google-like search, i.e., a single auto-complete search box that provides fast results and incremental filters.
Conclusion: Machado aims to be a modern object-relational framework that uses the latest Python libraries to produce an effective open source resource for genomics research.
- Published
- 2020