Geminivirus data warehouse: a database enriched with machine learning approaches

dc.contributor.author: Silva, Jose Cleydson F.
dc.contributor.author: Carvalho, Thales F. M.
dc.contributor.author: Basso, Marcos F.
dc.contributor.author: Deguchi, Michihito
dc.contributor.author: Pereira, Welison A.
dc.contributor.author: Vidigal, Pedro M. P.
dc.contributor.author: Brustolini, Otávio J. B.
dc.contributor.author: Silva, Fabyano F.
dc.contributor.author: Dal-Bianco, Maximiller
dc.contributor.author: Fontes, Renildes L. F.
dc.contributor.author: Santos, Anésia A.
dc.contributor.author: Zerbini, Francisco Murilo
dc.contributor.author: Cerqueira, Fabio R.
dc.contributor.author: Fontes, Elizabeth P. B.
dc.contributor.author: R. Sobrinho, Roberto
dc.date.accessioned: 2017-11-06T09:22:19Z
dc.date.available: 2017-11-06T09:22:19Z
dc.date.issued: 2017-05-05
dc.description.abstract: The Geminiviridae family encompasses a group of single-stranded DNA viruses with twinned and quasi-isometric virions, which infect a wide range of dicotyledonous and monocotyledonous plants and are responsible for significant economic losses worldwide. Geminiviruses are divided into nine genera, according to their insect vector, host range, genome organization, and phylogeny reconstruction. Using rolling-circle amplification approaches along with high-throughput sequencing technologies, thousands of full-length geminivirus and satellite genome sequences were amplified and have become available in public databases. As a consequence, many important challenges have emerged, namely, how to classify, store, and analyze massive datasets as well as how to extract information or new knowledge. Data mining approaches, mainly supported by machine learning (ML) techniques, are a natural means for high-throughput data analysis in the context of genomics, transcriptomics, proteomics, and metabolomics. Here, we describe the development of a data warehouse enriched with ML approaches, designated geminivirus.org. We implemented search modules, bioinformatics tools, and ML methods to retrieve high-precision information, demarcate species, and create classifiers for genera and open reading frames (ORFs) of geminivirus genomes. The use of data mining techniques such as ETL (Extract, Transform, Load) to feed our database, as well as algorithms based on machine learning for knowledge extraction, allowed us to obtain a database with quality data and suitable tools for bioinformatics analysis. The Geminivirus Data Warehouse (geminivirus.org) offers a simple and user-friendly environment for information retrieval and knowledge discovery related to geminiviruses. [en]
dc.format: pdf [pt-BR]
dc.identifier.issn: 1471-2105
dc.identifier.uri: https://doi.org/10.1186/s12859-017-1646-4
dc.identifier.uri: http://www.locus.ufv.br/handle/123456789/12748
dc.language.iso: eng [pt-BR]
dc.publisher: BioMed Central Bioinformatics [pt-BR]
dc.relation.ispartofseries: v. 18, n. 240, May. 2017 [pt-BR]
dc.rights: Open Access [pt-BR]
dc.subject: Machine learning [pt-BR]
dc.subject: Knowledge discovery [pt-BR]
dc.subject: Data mining [pt-BR]
dc.subject: Geminivirus [pt-BR]
dc.subject: Data warehouse [pt-BR]
dc.subject: Random forest [pt-BR]
dc.title: Geminivirus data warehouse: a database enriched with machine learning approaches [en]
dc.type: Artigo (Article) [pt-BR]

Files

Original bundle

Name: document.pdf
Size: 1.26 MB
Format: Adobe Portable Document Format
Description: full text
