Proposal for a hybrid geostatistical interpolator with machine learning
Publisher
Universidade Federal de Viçosa
Abstract
Kriging has been a univariate method widely used in the literature for data interpolation. However, it has two disadvantages. It is computationally infeasible for modeling the semivariogram estimator in large datasets, and it discards variables that are important to the study when a pure nugget effect is present. To overcome these disadvantages and improve the predictive capacity of this interpolator, this work presents a study that combines the Geostatistics methodology with machine learning. The study implements, computationally, a hybrid interpolator that uses a multivariate approach to model how the spatial variability of all variables in the study influences the prediction of the spatial variability of the variable of interest. It places no restriction on the number of variables or the size of the dataset. For comparison purposes, the performance of the implemented interpolator was assessed using the mean squared error (MSE; EQM in Portuguese) and the coefficient of determination (R²). To this end, soil samples were collected at a spacing of 50 m × 30 m along all rows of the study area, together with samples of the average yield of the Brazil nut trees from 2007 to 2015. The statistical and geostatistical analyses were carried out in the R software environment, and all points were georeferenced. As a result, the implemented model achieved an improved fit and a significant reduction in mean squared error. It also detailed the degree of importance of each soil attribute for predicting the spatial variability of the average yield of the Brazil nut trees (Castanheiras-da-amazônia). Keywords: Random Forest. FRK. Artificial Intelligence. Bertholletia excelsa. Multivariate Analysis.
Kriging has been a univariate method widely used in the literature for data interpolation. However, it is computationally infeasible for modeling the semivariogram estimator in large datasets, and it discards variables important to the study when a pure nugget effect is present. To overcome these disadvantages and improve the interpolator's predictive capacity, this research combines the geostatistics methodology with machine learning. It implements, computationally, a hybrid interpolator that uses a multivariate approach to determine the degree of importance of each variable under study for predicting the spatial variability of the variable of interest. It places no restriction on the number of variables or the size of the dataset. For comparison purposes, the performance of the implemented interpolator was assessed using the mean squared error (MSE) and the coefficient of determination (R²). For this, soil samples were collected at a spacing of 50 m × 30 m along all rows of the study area, together with samples of the average yield of Brazil nut trees from 2007 to 2015. Statistical and geostatistical analyses were performed in the R software environment, and all points were georeferenced. As a result, the implemented hybrid interpolator yielded an improved model fit and a significant reduction in mean squared error. It also provided the degree of importance of each soil attribute for predicting the spatial variability of the average yield of the Brazil nut tree. Keywords: Random Forest. FRK. Artificial Intelligence. Bertholletia excelsa. Multivariate Analysis.
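The computational-cost argument can be made concrete with the classical (Matheron) empirical semivariogram, γ̂(h) = Σ[z(xᵢ) − z(xⱼ)]² / (2N(h)), where N(h) is the number of point pairs separated by roughly h. The Python code below is a generic textbook illustration, not the thesis's R implementation. The function name `empirical_semivariogram` and the lag-binning rule are assumptions: bin k collects the pairs whose distance rounds to k × `lag_width`. Because every pair is visited, the work grows as n(n − 1)/2, which is what makes this step expensive for large datasets. A semivariogram that stays roughly flat across all lags indicates a pure nugget effect, that is, no detectable spatial dependence.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Classical (Matheron) estimator: gamma(h) = sum((z_i - z_j)^2) / (2 * N(h)).

    Assumed binning: a pair separated by distance d goes to bin
    k = floor(d / lag_width + 0.5), i.e. the lag k * lag_width nearest to d.
    Returns a list of (lag, gamma, n_pairs) for every non-empty bin 1..n_lags.
    """
    if lag_width <= 0:
        raise ValueError("lag_width must be positive")
    sums = [0.0] * (n_lags + 1)
    counts = [0] * (n_lags + 1)
    # Every unordered pair of points is visited once: n*(n-1)/2 iterations.
    for (p, zp), (q, zq) in combinations(zip(coords, values), 2):
        k = math.floor(math.dist(p, q) / lag_width + 0.5)
        if 1 <= k <= n_lags:  # bin 0 (coincident points) is skipped
            sums[k] += (zp - zq) ** 2
            counts[k] += 1
    return [
        (k * lag_width, sums[k] / (2 * counts[k]), counts[k])
        for k in range(1, n_lags + 1)
        if counts[k] > 0
    ]
```

Consider four points on a transect with unit spacing and values 1, 2, 4 and 7. For these points, `empirical_semivariogram([(0, 0), (1, 0), (2, 0), (3, 0)], [1, 2, 4, 7], 1.0, 3)` returns semivariances of about 2.33, 8.5 and 18.0 at lags 1, 2 and 3. These values increase with distance, which is the signature of spatial dependence.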
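The two comparison criteria named in the abstract can be written out directly. The Python code below is a minimal version that assumes R² is computed as 1 − SS_res/SS_tot on validation observations. The thesis may instead report the squared correlation between observed and predicted values, which can differ from this definition. The function names `mse` and `r_squared` are hypothetical.

```python
def mse(observed, predicted):
    """Mean squared error: the average of the squared residuals."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Coefficient of determination, computed here as 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot
```

For observed values 3, 5, 7, 9 and predictions 2.5, 5, 7.5, 9, `mse` gives 0.125 and `r_squared` gives 0.975. When two interpolators are evaluated on the same validation points, the one with the lower MSE and the higher R² predicts better.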
Citation
ILAMBWETSI, Patrícia de Sousa. Proposta de um interpolador geoestatístico híbrido com aprendizado de máquina. 2020. 96 f. Thesis (Doctorate in Applied Statistics and Biometry), Universidade Federal de Viçosa, Viçosa, 2020.
