Epidemiological study of arboviral diseases in Brazil and the use of artificial intelligence to build a predictive model of yellow fever lethality
Publisher
Universidade Federal de Viçosa
Abstract
Arboviral diseases are caused by arboviruses and are a growing public health concern in Brazil and worldwide, mainly because of their potential for dispersal, their capacity to adapt to new environments and hosts, and their ability to cause extensive epidemics. Dengue, zika, chikungunya and yellow fever share the same transmission vector. Together with the co-circulation of these diseases and the wide spread of Aedes aegypti across Brazilian territory, this can give rise to emerging epidemics. In addition, the similarity of the symptoms of these diseases makes clinical management difficult. Epidemiological studies and computational resources in the form of predictive models can therefore improve health research, assessment of the quality of care, prevention and prognosis, leading to more effective treatments as well as better disease control, resource allocation and management. The objective of this work was to estimate the prevalence of the arboviral diseases dengue, zika, chikungunya and yellow fever from 2000 to 2018 and to build a predictive model of death from yellow fever in Brazil using artificial intelligence. The first study was a systematic review with meta-analysis based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) protocol. The PubMed/MEDLINE, Scielo (Scientific Electronic Library Online), ScienceDirect and BVS (Biblioteca Virtual de Saúde, the Virtual Health Library) databases were searched with the search term (dengue OR zika OR chikungunya OR "febre amarela") AND prevalência. The meta-analysis used a random-effects model. Heterogeneity was assessed with the chi-square (χ²) test at a significance level of P < 0.10, and its magnitude was quantified with the I-squared (I²) statistic. The analyses were run with the Metaprop and Metareg commands in Stata (version 13.0).
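The heterogeneity assessment and random-effects pooling described above can be sketched in a few lines of Python. This is a minimal illustration, not the Metaprop implementation: it assumes the DerSimonian-Laird estimator applied to untransformed proportions, and the function name `random_effects_pool` and the study counts in the example call are hypothetical.

```python
import math


def random_effects_pool(events, totals):
    """Pool study proportions with a DerSimonian-Laird random-effects model.

    Illustrative sketch only: uses raw (untransformed) proportions, so every
    study must have 0 < events < total, and at least two studies are needed.
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need matching lists with at least two studies")
    props = [e / n for e, n in zip(events, totals)]
    # Within-study variance of a binomial proportion: p(1 - p) / n.
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    if any(v <= 0 for v in variances):
        raise ValueError("proportions of 0 or 1 need a transformation")
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * p for w, p in zip(weights, props)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (p - fixed) ** 2 for w, p in zip(weights, props))
    df = len(props) - 1
    # I2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance (tau2), truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau2 to each within-study variance.
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * p for w, p in zip(re_weights, props)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "Q": q,
        "df": df,
        "I2": i2,
        "tau2": tau2,
    }


# Hypothetical counts (positive cases, sample size) for three studies.
result = random_effects_pool([10, 50, 90], [100, 100, 100])
print(f"pooled={result['pooled']:.3f}  I2={result['I2']:.1f}%")
```

To obtain the P value for the heterogeneity test, Q would be compared against a chi-square distribution with `df` degrees of freedom.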
Of the 8052 articles retrieved by the search, 61 were selected. The selected studies were published between 1992 and 2019, and 65.57% of them (n=40) appeared in the last 10 years. Most of the selected articles describe regional epidemiological data from surveys and studies conducted in specific regions, which are used to support national data on prevalence and serotype. A meta-analysis could not be performed for yellow fever because too few studies were selected. The primary outcome was the prevalence of dengue, zika and chikungunya with a 95% confidence interval (95% CI). The pooled prevalences of dengue, zika and chikungunya under the random-effects model were 35%, 19% and 28%, respectively. For all three diseases, none of the variables examined in the subgroup analysis and meta-regression explained the high heterogeneity found.
The original article used secondary data from SINAN, made available by the Ministry of Health and requested through the Sistema Eletrônico do Serviço de Informação ao Cidadão (e-SIC), the electronic citizen information service system. The data were standardized and organized. Only people with a final classification of confirmed yellow fever were included in the model. The analyses were carried out in Google Colaboratory. Eight algorithms were tested: Naive Bayes, Decision Tree, k-Nearest Neighbor, Logistic Regression, Multilayer Perceptron, Support Vector Machine, Random Forest and CatBoost. The dataset was split into 75% for training and 25% for testing. The algorithms' parameter settings were modified. The model was validated with 10-fold cross-validation. The SHAP library was used for visualization.
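The 75%/25% split and the 10-fold cross-validation mentioned above can be illustrated without any machine learning library. The sketch below only generates row indices. It assumes a shuffled split and that the folds are drawn from the training portion, neither of which the abstract specifies, and the helper names `split_indices` and `k_fold_indices` are hypothetical.

```python
import random


def split_indices(n_rows, test_fraction=0.25, seed=42):
    """Shuffle row indices and return (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    # Deal indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        fit = [row for f, fold in enumerate(folds) if f != held_out for row in fold]
        yield fit, folds[held_out]


# Hypothetical dataset of 200 rows.
train, test = split_indices(200)
print(len(train), len(test))
for fit, validation in k_fold_indices(train, k=10):
    # A model would be fitted on `fit` and scored on `validation` here.
    assert not set(fit) & set(validation)
```

In a workflow like the one described, each candidate algorithm would be fitted and scored once per fold, with the 25% test portion held back for the final evaluation.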
CatBoost was selected for the model because it achieved better performance metrics, reduces the need for hyperparameter tuning, works better with categorical datasets and handles missing values. The variables that most influenced the model were the result of the IgM immunological test, presence of renal disorder, presence of abdominal pain, the confirmation criterion used and presence of hemorrhagic signs. The proposed model for predicting death in patients with yellow fever performed well, with 81.1% accuracy and 79.8% precision. This tool may assist health professionals in clinical practice. Analyzing the model's behavior made it possible to examine important aspects of the SINAN database in more depth, confirming some prognostic factors already reported in the literature and indicating new avenues of study to better understand the prognosis of yellow fever and to build strategies for better clinical management of the disease and prevention of death. Keywords: prevalence. arboviral disease. predictive model. data mining.
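For reference, accuracy and precision as reported above follow directly from the confusion matrix. The sketch below assumes death is the positive class, which the abstract does not state. The function name `accuracy_and_precision` and the labels are hypothetical and are not data from the study.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for binary labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Precision is undefined when nothing is predicted positive; use 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical labels: 1 = death, 0 = survival.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, precision = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f}")
```

Accuracy is the share of all predictions that were correct, while precision is the share of predicted deaths that were actual deaths.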
Citation
FRACALOSSI, Karen Oliveira. Estudo epidemiológico das arboviroses no Brasil e o uso de inteligência artificial na construção de um modelo preditivo de letalidade para febre amarela. 2021. 133 f. Dissertação (Mestrado em Ciência da Nutrição) - Universidade Federal de Viçosa, Viçosa. 2021.
