10 results for "Mattia Prosperi"
Search Results
2. Annotations of Virus Data for Knowledge Enrichment
- Author
- Patrizia Vizza, Giuseppe Tradigo, Pietro H. Guzzi, Barbara Puccio, Mattia Prosperi, Carlo Torti, and Pierangelo Veltri
- Published
- 2022
- Full Text
- View/download PDF
3. Experimental Survey on Power Dissipation of k-mer-Handling Data Structures for Mobile Bioinformatics
- Author
- Franco Milicchio and Mattia Prosperi
- Published
- 2021
- Full Text
- View/download PDF
4. KARGA: Multi-platform Toolkit for k-mer-based Antibiotic Resistance Gene Analysis of High-throughput Sequencing Data
- Author
- Simone Marini and Mattia Prosperi
- Subjects
- Mutation rate, Contig, Metagenomics, Computer science, k-mer, False positive paradox, Computational biology, Genome, Article, DNA sequencing, Resistome
- Abstract
- High-throughput sequencing is widely used for strain detection and characterization of antibiotic resistance in microbial metagenomic samples. Current analytical tools use curated antibiotic resistance gene (ARG) databases to classify individual sequencing reads or assembled contigs. However, identifying ARGs from raw read data can be time consuming (especially if assembly or alignment is required) and challenging, due to genome rearrangements and mutations. Here, we present the k-mer-based antibiotic gene resistance analyzer (KARGA), a multi-platform Java toolkit for identifying ARGs from metagenomic short read data. KARGA does not perform alignment; it uses an efficient double-lookup strategy, applies statistical filtering on false positives, and provides individual read classification as well as coverage of the database resistome. On simulated data, KARGA's antibiotic resistance class recall is 99.89% for error/mutation rates within 10% and 83.37% for error/mutation rates between 10% and 25%, while it is 99.92% on ARGs with rearrangements. On empirical data, KARGA provides a higher hit score (≥1.5-fold) than AMRPlusPlus, DeepARG, and MetaMARC. KARGA also has faster runtimes than all other tools (2x faster than AMRPlusPlus, 7x than DeepARG, and over 100x than MetaMARC). KARGA is available under the MIT license at https://github.com/DataIntellSystLab/KARGA.
- Published
- 2021
- Full Text
- View/download PDF
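The alignment-free idea in the abstract above (classifying reads by looking up their k-mers in a gene database) can be illustrated with a minimal Python sketch. This is not KARGA's code: KARGA is a Java toolkit, and its double-lookup strategy and statistical filtering are not reproduced here. The function names, the toy gene sequences, and the simple "fraction of matched k-mers" score are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Return the list of all overlapping k-mers in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(genes, k):
    """Map each k-mer to the set of (hypothetical) gene names containing it."""
    index = {}
    for name, seq in genes.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def classify_read(read, index, k):
    """Return (best_gene, fraction of read k-mers found in it), or (None, 0.0)."""
    read_kmers = kmers(read, k)
    hits = {}
    for km in read_kmers:
        for name in index.get(km, ()):
            hits[name] = hits.get(name, 0) + 1
    if not hits:
        return None, 0.0
    # Ties are broken alphabetically so the result is deterministic.
    best = max(sorted(hits), key=hits.get)
    return best, hits[best] / len(read_kmers)


# Toy database; the sequences are invented for this example.
genes = {"geneA": "ACGTACGTGA", "geneB": "TTTTGGGGCC"}
index = build_index(genes, 4)
best, score = classify_read("GTACGTG", index, 4)
```

A real tool would also need to handle reverse complements, sequencing errors, and false-positive control, which this sketch deliberately leaves out.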
5. On the use of clinical based infection data for pandemic case studies
- Author
- Maria Mazzitelli, Gabriel Gabriele, Pietro Hiram Guzzi, Patrizia Vizza, Mattia Prosperi, Pierangelo Veltri, Carlo Torti, and Giuseppe Tradigo
- Subjects
- Computer science, 05 social sciences, Context (language use), computer.software_genre, Data science, Data modeling, Data set, 03 medical and health sciences, Identification (information), Chronic infection, 0302 clinical medicine, Data access, 0502 economics and business, Pandemic, 050211 marketing, 030212 general & internal medicine, computer, Data integration
- Abstract
- Epidemiological models are relevant to the study and analysis of clinical as well as environmental and behavioural data, and are useful in supporting health studies. The goal is to perform epidemiological analyses that provide fast and reliable data access to guide prevention and treatment processes. This is especially true in a pandemic emergency such as the current Covid-19 context. Epidemiological models should support the early identification of pandemic phenomena and make data sets available for studying more accurate drug-based strategies for vaccines or virus containment. In this contribution we present an epidemiology database which integrates different types of clinical data to support research, follow-up and patient monitoring. The idea starts from a cooperative integration of hospital databases, in which available virus data have been integrated to support statistics-based studies. Starting from an available database containing 5 years of data on infection-related viruses (such as HPC, hepatitis) and anonymous patient data, the proposed system provides integrated data access able to (i) extract data filtered by clinical hypotheses based on patient profiles, environment and drugs and (ii) build large-scale geographical data mappings in order to study correlations among chronic infectious diseases and their relations with upcoming pandemic phenomena. Although the application is in its infancy, it is relevant and has important potential applications.
- Published
- 2020
- Full Text
- View/download PDF
6. Multivariate Independence Set Search via Progressive Addition for Conditional Markov Acyclic Networks
- Author
- Mo Wang, Mattia Prosperi, and Jian-Guo Bian
- Subjects
- Markov blanket, 0303 health sciences, Theoretical computer science, Markov chain, Computer science, Dimensionality reduction, Bayesian network, Markov process, 01 natural sciences, 010104 statistics & probability, 03 medical and health sciences, symbols.namesake, Joint probability distribution, Kernel (statistics), symbols, 0101 mathematics, Independence (probability theory), 030304 developmental biology
- Abstract
- Estimation of conditional dependencies over a joint multivariate probability distribution is a difficult task for big data, e.g. –omics datasets, and it quickly becomes intractable when the number of variables involved grows large. For instance, structure learning in Bayesian networks has super-exponential complexity. Dimension reduction techniques such as principal component analysis can be useful but often transform the original space and can still pose problems with scalability. This substantially limits characterization of joint probability: in general, only pairwise or k-level correlations can be analyzed efficiently. We introduce the Multivariate Independence Set Search via Progressive Addition for Conditional Markov Acyclic Networks (MISS-PACMAN), which performs a greedy selection of jointly independent feature sets from a larger set of covariates, given a variable ordering. The method is non-parametric and can be used with any kernel function. MISS-PACMAN is therefore suitable for heterogeneous big data, as it combines flexibility and scalability. In our tests on both simulated and real-world data, using random forests as kernel, MISS-PACMAN was able to select independence feature sets linearly with the number of features. Further, by combining multiple independence sets, MISS-PACMAN approximates well the underlying conditional structure of data variables (according to the generating Bayesian network), and compares favorably with other network structure discovery algorithms, such as the Peter-Clark and the fast incremental association Markov blanket.
- Published
- 2020
- Full Text
- View/download PDF
7. On the identification of long non-coding RNAs from RNA-seq
- Author
- Giuseppe Tradigo, Francesca Cristiano, Mattia Prosperi, and Pierangelo Veltri
- Subjects
- 0301 basic medicine, Genetics, 03 medical and health sciences, 030104 developmental biology, RNA, Identification (biology), RNA-Seq, Biology, Non-coding RNA, DNA sequencing, Coding (social sciences)
- Abstract
- Long non-coding RNAs (lncRNAs) are molecules longer than 200 nucleotides that are involved in several biological processes. Next Generation Sequencing makes it possible to identify transcripts containing both coding and non-coding RNAs, but no strategies have been identified so far to discover the biological functions of ncRNAs (non-coding RNAs); thus, most ncRNA functionalities are still unknown. We propose a new approach to detect putative lncRNA transcripts starting from an RNA-seq analysis performed by a reference-based assembly. The extracted transcripts are then analyzed to filter out protein transcripts, detecting putative, and thus interesting, lncRNAs that are submitted to biologists for further validation.
- Published
- 2016
- Full Text
- View/download PDF
8. A fast and scalable high-throughput sequencing data error correction via oligomers
- Author
- Franco Milicchio, Iain E. Buchan, and Mattia C. F. Prosperi
- Subjects
- error correction, 0301 basic medicine, Computer science, 0206 medical engineering, Hash function, Inference, Genomics, 02 engineering and technology, computer.software_genre, De Bruijn graph, 03 medical and health sciences, symbols.namesake, Genetic, Artificial Intelligence, next generation sequencing, Health Informatic, Sanger sequencing, Agricultural and Biological Sciences (miscellaneous), Range (mathematics), 030104 developmental biology, Computational Mathematic, Scalability, symbols, Data mining, Error detection and correction, computer, 020602 bioinformatics, de Bruijn graph, Biotechnology
- Abstract
- Next-generation sequencing (NGS) technologies have superseded the traditional Sanger sequencing approach in many experimental settings, given their tremendous yield and affordable cost. Nowadays it is possible to sequence any microbial organism or meta-genomic sample within hours, and to obtain a whole human genome in weeks. Nonetheless, NGS technologies are error-prone. Correcting errors is a challenge due to multiple factors, including the data sizes, the machine-specific and non-at-random characteristics of errors, and the error distributions. Errors in NGS experiments can hamper the subsequent data analysis and inference. This work proposes an error correction method based on the de Bruijn graph that can be executed on Gigabyte-sized data sets using normal desktop/laptop computers, ideal for genome sizes in the Megabase range, e.g. bacteria. The implementation makes extensive use of hashing techniques, and implements an A* algorithm for optimal error correction, minimizing the distance between an erroneous read and its possible replacement with the Needleman-Wunsch score. Our approach outperforms other popular methods in terms of both random access memory usage and computing times.
- Published
- 2016
- Full Text
- View/download PDF
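The abstract above mentions comparing an erroneous read with its possible replacement via the Needleman-Wunsch score. As a rough illustration of that score only (not the paper's de Bruijn graph or A* search), here is a minimal Python sketch of the standard global-alignment dynamic programme. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions; the abstract does not state the parameters the authors used.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of strings a and b (higher means more similar).

    Only the previous row of the dynamic-programming table is kept,
    so memory use is proportional to len(b).
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # Column 0: prefix of a against the empty prefix of b.
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Pick the candidate replacement that best matches a (hypothetical) erroneous read.
read = "ACGTTGCA"
candidates = ["ACGTAGCA", "TTTTTTTT"]
best = max(candidates, key=lambda c: needleman_wunsch_score(read, c))
```

Because this variant returns a similarity score, the best replacement maximizes it; a distance formulation, as worded in the abstract, would minimize instead.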
9. HErCoOl: High-Throughput Error Correction by Oligomers
- Author
- Mattia Prosperi and Franco Milicchio
- Subjects
- Computer science, Graph (abstract data type), Sequence assembly, Word error rate, Genomics, Data mining, Ion semiconductor sequencing, computer.software_genre, Error detection and correction, Data science, Throughput (business), computer, Genome
- Abstract
- Next-generation sequencing (NGS) technologies are laying the foundations for a new paradigm in genomics and transcriptomics. Nowadays it is possible to sequence any microbial organism or meta-genomic sample within hours, and to obtain a whole human genome in less than a month. Sequencing prices are decreasing dramatically, opening the way to actual personalised medicine. NGS technologies, however, are error-prone, and correcting errors is a challenge due to multiple factors, including the data sizes (gigabyte scale) and the machine-specific, non-at-random characteristics of errors and error distributions. Several approaches have been proposed, yet the problem remains a challenge, especially when analysing mixtures of (closely related) species, e.g., highly variable viruses infecting a host as a swarm, like hepatitis C or human immunodeficiency virus. This work presents a novel error correction algorithm based on k-mer strings with their associated overlap graph, along with an open-source, multi-threaded implementation. The algorithm, named HErCoOl (High-throughput Error Correction by Oligomers), needs minimal tuning: only an overall error rate and, optionally, information about the genome sizes. HErCoOl was compared against other state-of-the-art methods, using empirical NGS data obtained with Roche 454 technology, focusing the benchmarks on mixtures of related species. Results show that HErCoOl improves significantly over the current methods, and the parallelisation scales well with the size of input NGS genome producing long sequence reads, such as Roche 454 or Ion Torrent. HErCoOl provides fast and efficient error correction of NGS data, especially for mixed samples. Its platform-independent, open-source, multi-threaded implementation ensures that it can be flexibly employed and integrated in any NGS data analysis software.
- Published
- 2014
- Full Text
- View/download PDF
10. HIV-1 Coreceptor Usage Prediction via Indexed Local Kernel Smoothing Methods and Grid-Based Multiple Statistical Validation
- Author
- I. Fanti, Giovanni Ulivi, Alessandro Micarelli, and Mattia C. F. Prosperi
- Subjects
- Sequence, Computer science, Robustness (computer science), Multithreading, Kernel (statistics), Search engine indexing, virus diseases, Genomics, Data mining, computer.software_genre, Grid, computer, Smoothing
- Abstract
- Human immunodeficiency virus type 1 (HIV-1) isolates differ in their use of coreceptors to enter target cells. This has important implications for both viral pathogenicity and susceptibility to entry inhibitors under development. Predicting HIV-1 coreceptor usage on the basis of sequence information is a challenging task due to the high variability of the HIV-1 genome. We present an efficient local smoothing kernel method, enhanced with a BLAST-based distance function and implemented using multithreaded grid procedures and indexing. Robust validation of the model is achieved through multiple cross-validation, along with statistical comparisons of results for performance assessment.
- Published
- 2007
Discovery Service for Jio Institute Digital Library