Soybean yield prediction from vegetation indices: an approach with generalized additive models
Publisher
Universidade Federal de Viçosa
Abstract
Remote sensing makes it possible to gain insight into agronomic traits without direct contact with the object or plant. The remote sensing instruments provide information on five spectral bands: the three visible-light bands (RGB), the red edge, and the near infrared (NIR). Combining two or more of these bands yields a vegetation index (VI), which is associated with agronomic variables, including yield. These associations may be nonlinear. Generalized additive models (GAMs) express the response as a sum of smooth functions of the covariates, so they can flexibly handle nonlinear relationships while keeping an additive structure. In this context, this work aims to evaluate the predictive capacity of GAMs for soybean yield from aerial images, using VIs. The experimental data came from a soybean experiment set up in a randomized block design. Eleven images were captured over the crop cycle, so that each capture could be related to a soybean phenological stage. The study was divided into three stages. In the first stage, variables were selected using Random Forest (RF) for each week of the study. In the second stage, univariate GAMs were fitted to the selected indices and analyzed graphically to check whether each index was linearly or nonlinearly associated with yield. In the third stage, GAMs, multiple linear regression (MLR), and RF were compared in terms of predictive capacity.
Model performance was evaluated by 10-fold cross-validation, using the root mean squared error (RMSE) and the correlation coefficient (r) between observed and predicted values for regression, and accuracy for classification. Among the selected VIs and spectral bands, those most associated with yield were NIR, the Structure Insensitive Pigment Index (SIPI), the Normalized green–red difference index (NGBDI), and the Triangular Greenness Index (TGI). The VIs were grouped into four categories according to their association with yield: strictly linear, moderately nonlinear, a mix of linear and nonlinear associations, and strictly nonlinear. GAMs and MLR fitted with the selected variables performed similarly, both in regression (RMSE and correlation coefficient) and in classification (accuracy). For both models, the end of the vegetative phase and the beginning of grain filling (R5) were the most suitable stages for predicting yield. Keywords: variable selection; spectral bands; nonlinear relationships; random forest; multiple linear regression.
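As an illustration of how spectral bands combine into vegetation indices, the following minimal sketch computes two of the indices named in the abstract from plot-level mean reflectances. The formulas are the commonly published forms of SIPI and TGI, and the band centres (670, 550, and 480 nm) and the reflectance values are assumptions made for this example. The dissertation may use different definitions or sensor-specific wavelengths.

```python
import math


def sipi(nir, red, blue):
    """Structure Insensitive Pigment Index, commonly (NIR - Blue) / (NIR - Red)."""
    denom = nir - red
    if denom == 0:
        # Index is undefined when NIR and red reflectance coincide.
        return math.nan
    return (nir - blue) / denom


def tgi(red, green, blue, wl_red=670.0, wl_green=550.0, wl_blue=480.0):
    """Triangular Greenness Index from the three visible bands.

    The default band centres (in nm) are assumptions; adjust them to the sensor.
    """
    return -0.5 * ((wl_red - wl_blue) * (red - green)
                   - (wl_red - wl_green) * (red - blue))


# Hypothetical mean reflectances (0 to 1) for one experimental plot.
plot = {"nir": 0.45, "red": 0.05, "green": 0.10, "blue": 0.04}

print("SIPI:", sipi(plot["nir"], plot["red"], plot["blue"]))
print("TGI:", tgi(plot["red"], plot["green"], plot["blue"]))
```

Each index then becomes one candidate covariate per plot and per image date.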
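The evaluation described above (10-fold cross-validation scored by RMSE and by the correlation between observed and predicted values) can be sketched as follows. This is only an illustration under stated assumptions: a one-covariate least-squares line stands in for the GAM, MLR, and RF models, the data are synthetic, metrics are computed on pooled out-of-fold predictions rather than averaged per fold, and all function names are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x (stand-in for the real models)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb)


def kfold_cv(x, y, k=10, seed=0):
    """Return (RMSE, r) computed on pooled out-of-fold predictions."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal folds
    obs, pred = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            obs.append(y[i])
            pred.append(b0 + b1 * x[i])
    return rmse(obs, pred), pearson_r(obs, pred)


# Synthetic example: one vegetation index versus yield, with Gaussian noise.
rng = random.Random(1)
vi = [i / 10 for i in range(60)]
yield_obs = [2.0 + 3.0 * v + rng.gauss(0.0, 0.5) for v in vi]

cv_rmse, cv_r = kfold_cv(vi, yield_obs, k=10)
print("RMSE:", cv_rmse, "r:", cv_r)
```

Running the same splits for each candidate model keeps the comparison between models on equal footing.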
Citation
SILVA, Lucas Coelho da. Predição da produtividade da soja por índices de vegetação: uma abordagem com modelos aditivos generalizados. 2025. 87 f. Dissertação (Mestrado em Estatística Aplicada e Biometria) - Universidade Federal de Viçosa, Viçosa. 2025.
