Artigos

URI permanente para esta coleçãohttps://locus.ufv.br/handle/123456789/11798

Navegar

Resultados da Pesquisa

Agora exibindo 1 - 4 de 4
  • Imagem de Miniatura
    Item
    Erratum to: Mirnacle: machine learning with SMOTE and random forest for improving selectivity in pre-miRNA ab initio prediction
    (BMC Bioinformatics, 2017) Marques, Yuri Bento; Oliveira, Alcione de Paiva; Vasconcelos, Ana Tereza Ribeiro; Cerqueira, Fabio Ribeiro
    MicroRNAs (miRNAs) are key gene expression regulators in plants and animals. Therefore, miRNAs are involved in several biological processes, making the study of these molecules one of the most relevant topics of molecular biology nowadays. However, characterizing miRNAs in vivo is still a complex task. As a consequence, in silico methods have been developed to predict miRNA loci. A common ab initio strategy to find miRNAs in genomic data is to search for sequences that can fold into the typical hairpin structure of miRNA precursors (pre-miRNAs). The current ab initio approaches, however, have selectivity issues, i.e., a high number of false positives is reported, which can lead to laborious and costly attempts to provide biological validation. This study presents an extension of the ab initio method miRNAFold, with the aim of improving selectivity through machine learning techniques, namely, random forest combined with the SMOTE procedure that copes with imbalance datasets. By comparing our method, termed Mirnacle, with other important approaches in the literature, we demonstrate that Mirnacle substantially improves selectivity without compromising sensitivity. For the three datasets used in our experiments, our method achieved at least 97% of sensitivity and could deliver a two-fold, 20-fold, and 6-fold increase in selectivity, respectively, compared with the best results of current computational tools. The extension of miRNAFold by the introduction of machine learning techniques, significantly increases selectivity in pre-miRNA ab initio prediction, which optimally contributes to advanced studies on miRNAs, as the need of biological validations is diminished. Hopefully, new research, such as studies of severe diseases caused by miRNA malfunction, will benefit from the proposed computational tool.
  • Imagem de Miniatura
    Item
    NICeSim: An open-source simulator based on machine learning techniques to support medical research on prenatal and perinatal care decision making
    (Artificial Intelligence in Medicine, 2014-10-05) Cerqueira, Fabio Ribeiro; Ferreira, Tiago Geraldo; Oliveira, Alcione de Paiva; Augusto, Douglas Adriano; Krempser, Eduardo; Barbosa, Helio José Corrêa; Franceschini, Sylvia do Carmo Castro; Freitas, Brunnella Alcantara Chagas de; Gomes, Andreia Patricia; Siqueira-Batista, Rodrigo
    This paper describes NICeSim, an open-source simulator that uses machine learning (ML) techniques to aid health professionals to better understand the treatment and prognosis of premature newborns. The application was developed and tested using data collected in a Brazilian hospital. The available data were used to feed an ML pipeline that was designed to create a simulator capable of predicting the outcome (death probability) for newborns admitted to neonatal intensive care units. However, unlike previous scoring systems, our computational tool is not intended to be used at the patients bedside, although it is possible. Our primary goal is to deliver a computational system to aid medical research in understanding the correlation of key variables with the studied outcome so that new standards can be established for future clinical decisions. In the implemented simulation environment, the values of key attributes can be changed using a user-friendly interface, where the impact of each change on the outcome is immediately reported, allowing a quantitative analysis, in addition to a qualitative investigation, and delivering a totally interactive computational tool that facilitates hypothesis construction and testing. Our statistical experiments showed that the resulting model for death prediction could achieve an accuracy of 86.7% and an area under the receiver operating characteristic curve of 0.84 for the positive class. Using this model, three physicians and a neonatal nutritionist performed simulations with key variables correlated with chance of death. The results indicated important tendencies for the effect of each variable and the combination of variables on prognosis. We could also observe values of gestational age and birth weight for which a low Apgar score and the occurrence of respiratory distress syndrome (RDS) could be less or more severe. For instance, we have noticed that for a newborn with 2000 g or more the occurrence of RDS is far less problematic than for neonates weighing less. The significant accuracy demonstrated by our predictive model shows that NICeSim might be used for hypothesis testing to minimize in vivo experiments. We observed that the model delivers predictions that are in very good agreement with the literature, demonstrating that NICeSim might be an important tool for supporting decision making in medical practice. Other very important characteristics of NICeSim are its flexibility and dynamism. NICeSim is flexible because it allows the inclusion and deletion of variables according to the requirements of a particular study. It is also dynamic because it trains a just-in-time model. Therefore, the system is improved as data from new patients become available. Finally, NICeSim can be extended in a cooperative manner because it is an open-source system.
  • Imagem de Miniatura
    Item
    Mirnacle: machine learning with SMOTE and random forest for improving selectivity in pre-miRNA ab initio prediction
    (BMC Bioinformatics, 2016-12-15) Marques, Yuri Bento; Oliveira, Alcione de Paiva; Vasconcelos, Ana Tereza Ribeiro; Cerqueira, Fabio Ribeiro
    MicroRNAs (miRNAs) are key gene expression regulators in plants and animals. Therefore, miRNAs are involved in several biological processes, making the study of these molecules one of the most relevant topics of molecular biology nowadays. However, characterizing miRNAs in vivo is still a complex task. As a consequence, in silico methods have been developed to predict miRNA loci. A common ab initio strategy to find miRNAs in genomic data is to search for sequences that can fold into the typical hairpin structure of miRNA precursors (pre-miRNAs). The current ab initio approaches, however, have selectivity issues, i.e., a high number of false positives is reported, which can lead to laborious and costly attempts to provide biological validation. This study presents an extension of the ab initio method miRNAFold, with the aim of improving selectivity through machine learning techniques, namely, random forest combined with the SMOTE procedure that copes with imbalance datasets. By comparing our method, termed Mirnacle, with other important approaches in the literature, we demonstrate that Mirnacle substantially improves selectivity without compromising sensitivity. For the three datasets used in our experiments, our method achieved at least 97% of sensitivity and could deliver a two-fold, 20-fold, and 6-fold increase in selectivity, respectively, compared with the best results of current computational tools. The extension of miRNAFold by the introduction of machine learning techniques, significantly increases selectivity in pre-miRNA ab initio prediction, which optimally contributes to advanced studies on miRNAs, as the need of biological validations is diminished. Hopefully, new research, such as studies of severe diseases caused by miRNA malfunction, will benefit from the proposed computational tool.
  • Imagem de Miniatura
    Item
    MUMAL2: Improving sensitivity in shotgun proteomics using cost sensitive artificial neural networks and a threshold selector algorithm
    (BMC Bioinformatics, 2016-12-15) Cerqueira, Fabio Ribeiro; Ricardo, Adilson Mendes; Oliveira, Alcione de Paiva; Graber, Armin; Baumgartner, Christian
    This work presents a machine learning strategy to increase sensitivity in tandem mass spectrometry (MS/MS) data analysis for peptide/protein identification. MS/MS yields thousands of spectra in a single run which are then interpreted by software. Most of these computer programs use a protein database to match peptide sequences to the observed spectra. The peptide-spectrum matches (PSMs) must also be assessed by computational tools since manual evaluation is not practicable. The target-decoy database strategy is largely used for error estimation in PSM assessment. However, in general, that strategy does not account for sensitivity. In a previous study, we proposed the method MUMAL that applies an artificial neural network to effectively generate a model to classify PSMs using decoy hits with increased sensitivity. Nevertheless, the present approach shows that the sensitivity can be further improved with the use of a cost matrix associated with the learning algorithm. We also demonstrate that using a threshold selector algorithm for probability adjustment leads to more coherent probability values assigned to the PSMs. Our new approach, termed MUMAL2, provides a two-fold contribution to shotgun proteomics. First, the increase in the number of correctly interpreted spectra in the peptide level augments the chance of identifying more proteins. Second, the more appropriate PSM probability values that are produced by the threshold selector algorithm impact the protein inference stage performed by programs that take probabilities into account, such as ProteinProphet. Our experiments demonstrate that MUMAL2 reached around 15% of improvement in sensitivity compared to the best current method. Furthermore, the area under the ROC curve obtained was 0.93, demonstrating that the probabilities generated by our model are in fact appropriate. Finally, Venn diagrams comparing MUMAL2 with the best current method show that the number of exclusive peptides found by our method was nearly 4-fold higher, which directly impacts the proteome coverage. The inclusion of a cost matrix and a probability threshold selector algorithm to the learning task further improves the target-decoy database analysis for identifying peptides, which optimally contributes to the challenging task of protein level identification, resulting in a powerful computational tool for shotgun proteomics.