Browsing by Author "Silva, José Cleydson Ferreira da"
Now showing 1 - 4 of 4
Item Data warehouse enriquecido com métodos de aprendizado de máquina para a família Geminiviridae [Data warehouse enriched with machine learning methods for the family Geminiviridae] (Universidade Federal de Viçosa, 2016-07-25) Silva, José Cleydson Ferreira da; Cerqueira, Fabio Ribeiro; http://lattes.cnpq.br/3083063529099935
Geminiviruses infect a wide range of monocotyledonous and dicotyledonous plants and cause significant economic losses. Geminiviridae is one of the most important families of plant viruses. It currently comprises seven genera, recognized on the basis of insect vector type, host, genome organization, and phylogenetic reconstruction. Rolling circle amplification has allowed thousands of complete and partial sequences to be deposited in public databases. However, these databases lack advanced tools for answering sophisticated questions. Unlike the case for other important viral pathogens, no database integrating all the relevant information has yet been proposed for geminiviruses. In this work, a data warehouse (DW) named geminivirus.com is proposed: a DW extensively enriched by machine learning approaches that aims to ensure the reliability and quality of genomic sequences and their associated metadata. The methodologies for extracting and transforming these sequences and their metadata were implemented in an ETL (Extract, Transform and Load) process specific to geminivirus data. In addition, within this process, machine learning algorithms such as Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Random Forest are used as in silico taxonomic classifiers for complete sequences. Furthermore, machine learning models were proposed for gene classification. The models for both purposes exceed 98% accuracy and precision, using only attributes extracted from the complete genomic sequence, the CDS (Coding DNA Sequence), and the amino acid sequence.
Natural Language Processing techniques based on graph theory were also proposed for extracting information and knowledge from article abstracts. This methodology showed great potential for answering specific questions. Searching the text graph for keywords that represent evolutionary mechanisms showed that recombination is the most studied topic compared with mutation, migration, natural selection, and genetic drift. The technique is therefore well suited to generating new hypotheses. Using it, we observed that gene prediction tools were not mentioned. In view of this opportunity, we suggest a method for gene prediction and classification named Fangorn Forest (F2). As part of this method, we also suggest a gene prediction algorithm named Millau Bridge (MB). This algorithm tests all possible ORFs that a complete genomic sequence can encode, by means of start and stop codons. It also identifies splicing excision sites. geminivirus.com has become a robust database capable of providing good-quality data and advanced tools enriched by machine learning methods that assist researchers in their research activities and decision making.

Item Development of a new machine learning-derived method for high-throughput prediction of plant receptor-like proteins (Universidade Federal de Viçosa, 2020-02-28) Silva, José Cleydson Ferreira da; Fontes, Elizabeth Pacheco Batista; http://lattes.cnpq.br/3083063529099935
Machine learning (ML) is a field of artificial intelligence that has rapidly emerged in plant molecular biology, allowing the exploitation of massive data. The main challenges are to analyze massive datasets and extract new knowledge of cellular systems. Here, we presented a systematic review to disentangle which ML approaches are relevant for plant scientists (Chapter 1).
We presented the main steps for ML development, including data selection, feature extraction, algorithm training, and evaluation of classification/prediction models, indicating the role of ML algorithms in the post-genomic era. Additionally, based on the systematic review, we developed a machine learning framework for predicting cell surface receptors (Chapter 2). Two classes of cell surface receptors, receptor-like protein kinases (RLKs) and receptor-like proteins (RLPs), are essential for perceiving and processing external and internal signals in plants and animals. Both are involved in plant development and pathogen responses and share a similar extracellular domain capable of initially sensing environmental signals. However, RLPs have short, divergent C-terminal regions that are not associated with the conserved kinase domain characteristic of RLKs. The absence of C-terminal phylogenetic relationships between RLKs and RLPs precludes the use of sequence comparison algorithms for high-throughput prediction of the RLP family. Thus, we developed the first RLP predictor in plants, designated RLPredictiOme. RLPredictiOme was implemented based on machine learning models associated with Bayesian inference. The ML models were developed in three stages: to distinguish RLPs from non-RLPs, to distinguish RLPs from RLKs, and to classify new subfamilies of RLPs in plants. The evaluation of the models showed high accuracy, precision, sensitivity, and specificity, with relatively high probabilities, ranging from 0.79 to 0.99, for RLP predictions. In addition, a complete validation of RLPredictiOme was performed with previously characterized LRR-RLPs from Arabidopsis and rice, and more than 90% of known RLPs were correctly predicted. In addition to predicting previously characterized RLPs, RLPredictiOme uncovered new RLP subfamilies in the Arabidopsis genome.
These include the probable lipid transfer (PLT)-RLP, plastocyanin-like-RLP, ring finger-RLP, glycosyl-hydrolase-RLP, and glycerophosphoryl diester phosphodiesterase (GDPDL)-RLP subfamilies, which are yet to be characterized. In comparison with the only Arabidopsis GDPDL-RLK, molecular evolution studies indicated that the ectodomain of GDPDL-RLPs from Arabidopsis may have undergone purifying selection, with a predominance of synonymous substitutions. Expression analyses revealed that the predicted GDPDL-RLPs display a basal level of expression and respond to developmental and biotic signals. The results of these biological assays substantiate the notion that the members of this subfamily have maintained functional domains during evolution and may play relevant roles in development and plant defense. Therefore, RLPredictiOme can provide new insights into the functional role of surface receptors and their relationships with different biological processes. Keywords: Machine learning. Receptor-like protein. RLPredictiOme.

Item A novel, highly divergent ssDNA virus identified in Brazil infecting apple, pear and grapevine (Virus Research, 2015-07-14) Basso, Marcos Fernando; Silva, José Cleydson Ferreira da; Fajardo, Thor Vinícius Martins; Fontes, Elizabeth Pacheco Batista; Zerbini, Francisco Murilo
Fruit trees of temperate and tropical climates are of great economic importance worldwide, and several viruses have been reported to affect their productivity and longevity. Fruit trees from different Brazilian regions displaying virus-like symptoms were evaluated for infection by circular DNA viruses. Seventy-four fruit trees were sampled, and a novel, highly divergent, monopartite circular ssDNA virus was cloned from apple, pear and grapevine trees. Forty-five complete viral genomes were sequenced, with a size of approx. 3.4 kb and organized into five ORFs.
Deduced amino acid sequences showed identities in the range of 38% with unclassified circular ssDNA viruses, nanoviruses and alphasatellites (putative Replication-associated protein, Rep), and begomo-, curto- and mastreviruses (putative coat protein, CP, and movement protein, MP). A large intergenic region contains a short palindromic sequence capable of forming a hairpin-like structure with the loop sequence TAGTATTAC, identical to the conserved nonanucleotide of circoviruses, nanoviruses and alphasatellites. Recombination events were not detected and phylogenetic analysis showed a relationship with circo-, nano- and geminiviruses. PCR confirmed the presence of this novel ssDNA virus in field plants. Infectivity tests using the cloned viral genome confirmed its ability to infect apple and pear tree seedlings, but not Nicotiana benthamiana. The name “Temperate fruit decay-associated virus” (TFDaV) is proposed for this novel virus.

Item Sugar metabolism and developmental stages of rubber tree (Hevea brasiliensis L.) seeds (Physiologia Plantarum, 2017-12-12) Souza, Genaina Aparecida de; Dias, Denise Cunha Fernandes dos Santos; Pimenta, Thaline Martins; Almeida, Andrea Lanna; Picoli, Edgard Augusto de Toledo; Alvarenga, Antônio de Pádua; Silva, José Cleydson Ferreira da
Changes in the concentration of sugars and sucrose metabolism enzymes can characterize the developmental stages of a seed. In recalcitrant species such as Hevea brasiliensis L., little is known about these changes. We aimed to evaluate the three main stages of development of rubber tree seeds: histodifferentiation, cell elongation and accumulation of reserves. The activities of acid and neutral invertases (EC 3.2.1.26) and sucrose synthase (EC 2.4.1.13), and the concentrations of reducing sugars (RS), total soluble sugars (TSS) and sucrose (Suc) were determined concomitantly with the histochemical and anatomical evaluation of seed structure.
Histodifferentiation in rubber tree seeds occurs up to 75 days after anthesis (DAA). The concentration of RS is high and that of Suc is low during seed histodifferentiation, which occurs along with a visible increase in the number of cell divisions. After that period, there is an increase in the concentration of Suc (mg g⁻¹) and in the number and size of starch granules, and a decrease in the concentration of RS (mg g⁻¹). At that point, cell elongation occurs. At 135 DAA, there is an inversion in the concentration of these two sugars and an increase in reserve accumulation. Thus, in seeds of the evaluated clone, the period up to 75 DAA is characterized as the histodifferentiation stage, while from that time up to 120 DAA the cell elongation stage takes place. The final stage of seed maturation and reserve accumulation begins at 135 DAA, and the seed, including the embryo, is completely formed at 175 DAA.