Enriquecendo bases de dados geoespaciais oficiais: uma abordagem híbrida para integração de topônimos colaborativos validados com inteligência artificial
Publisher
Universidade Federal do Paraná
Abstract
Os nomes de lugares, ou topônimos, são centrais para identificação e comunicação do espaço geográfico e para produtos cartográficos, sistemas de informação e serviços baseados em localização. No entanto, a atualização de bases toponímicas oficiais é um desafio: a reambulação, método tradicional de coleta de nomes em campo, é demorada e onerosa, tornando difícil manter dados completos e atuais em países de grande extensão territorial e recursos limitados, como o Brasil. Essa limitação resulta em lacunas e desatualização que afetam tanto a qualidade de dados geoespaciais fundamentais quanto sua utilização em planejamento e políticas públicas. Em contrapartida, plataformas de mapeamento colaborativo, como o OpenStreetMap (OSM), oferecem um fluxo contínuo e dinâmico de topônimos baseados no conhecimento local de voluntários. Contudo, a integração confiável desses dados em produtos oficiais carece de métodos sistemáticos para avaliar qualidade e validade semântica, reduzindo o potencial de aproveitamento desse recurso. Diante desse cenário, esta tese propõe uma metodologia híbrida para enriquecer bases de dados geoespaciais oficiais por meio da priorização, validação e integração verificável de topônimos colaborativos do OSM, combinando análise de qualidade intrínseca com validação extrínseca apoiada em Inteligência Artificial (IA). Esta pesquisa teve por objetivo desenvolver e aplicar essa abordagem de forma reprodutível e em diferentes contextos urbanos. Metodologicamente, o percurso começou com um diagnóstico da Infraestrutura Nacional de Dados Espaciais (INDE) frente aos temas globais de dados geoespaciais fundamentais da UN-GGIM, identificando lacunas de completude e atualização que fundamentaram a necessidade de incorporar novas fontes. 
Em seguida, desenvolveu-se um framework de código aberto capaz de processar o histórico de edições do OSM em grades regulares, extrair informações toponímicas e relacioná-las a indicadores intrínsecos de qualidade, evidenciando padrões espaciais de contribuição. Na sequência, foi criada uma abordagem de validação extrínseca com IA, baseada na análise de imagens ao nível de rua de plataformas como Mapillary e Google Street View (GSV), com técnicas de detecção de texto (YOLO) e reconhecimento óptico de caracteres (Keras-OCR) para extrair evidências visuais, consolidadas pelo Índice de Validação de Topônimos Colaborativos por Evidências Acumuladas (ICTVAE). A integração dessas etapas resultou em um processo unificado, aplicado no município de Belo Horizonte (MG), produzindo um dataset toponímico enriquecido com 937 nomes validados e demonstrando ganhos quantitativos mensuráveis, como aumento de 81,9% na cobertura de nomes em relação à base oficial. Os resultados indicam viabilidade, escalabilidade e reprodutibilidade da metodologia proposta, ainda que o trabalho também evidencie limitações importantes, incluindo cobertura desigual e heterogênea das imagens ao nível de rua, vieses e lacunas herdados da própria comunidade OSM, além de desafios inerentes à ambiguidade semântica de nomes e à heterogeneidade de fontes oficiais. Essas restrições indicam a necessidade de ajustes contextuais e reforçam que a adoção prática da metodologia deve considerar disponibilidade de dados, qualidade das fontes externas e padrões institucionais de cada país. Conclui-se que a abordagem híbrida oferece uma solução documentada e replicável para a integração segura e transparente de informações colaborativas em bases geoespaciais oficiais.
Palavras-chave: Topônimos; Nomes geográficos; OpenStreetMap; VGI; Inteligência Artificial; imagens ao nível de rua.
Place names, or toponyms, are central to the identification and communication of geographic space and to cartographic products, information systems, and location-based services. However, updating official toponymic databases remains a challenge: field survey (reambulação), the traditional method of collecting names on site, is time-consuming and costly, making it difficult to maintain complete and up-to-date data in countries with large territories and limited resources, such as Brazil. This limitation results in gaps and outdated information that affect both the quality of fundamental geospatial data and its use in planning and public policy. In contrast, collaborative mapping platforms, such as OpenStreetMap (OSM), offer a continuous and dynamic flow of place names based on the local knowledge of volunteers. However, the reliable integration of this data into official products lacks systematic methods for assessing quality and semantic validity, reducing the potential for leveraging this resource. To address this scenario, this thesis proposes a hybrid methodology to enrich official geospatial databases through the prioritisation, validation, and verifiable integration of collaborative place names from OSM, combining intrinsic quality analysis with extrinsic validation supported by Artificial Intelligence (AI). The aim of this research was to develop and apply the proposed approach in a reproducible way and across different urban contexts. The methodological journey began with a diagnosis of the National Spatial Data Infrastructure (INDE) in relation to the global fundamental geospatial data themes of UN-GGIM, identifying gaps in completeness and currency that justified the need to incorporate new sources. Next, an open-source framework was developed that can process OSM edit history in regular grids, extract toponymic information, and relate it to intrinsic quality indicators, thereby highlighting spatial contribution patterns. 
Subsequently, an extrinsic validation approach with AI was created, based on the analysis of street-level images from platforms such as Mapillary and Google Street View (GSV), using text detection (YOLO) and optical character recognition (Keras-OCR) to extract visual evidence, consolidated by the Collaborative Toponym Validation Index by Accumulated Evidence (ICTVAE). The integration of these steps resulted in a unified process applied in the municipality of Belo Horizonte (MG), producing an enriched toponymic dataset with 937 validated names and demonstrating measurable quantitative gains, such as an 81.9% increase in name coverage compared to the official database. Although the results indicate feasibility, scalability, and reproducibility, the work also highlights important limitations, including uneven and heterogeneous coverage of street-level images, biases and gaps inherited from the OSM community itself, as well as challenges associated with the semantic ambiguity of names and the heterogeneity of official sources. These restrictions highlight the need for contextual adjustments and underscore that the practical application of the methodology must consider data availability, the quality of external sources, and institutional standards in each country. Overall, the hybrid approach provides a documented and replicable solution for the secure and transparent integration of collaborative toponymic information into official geospatial databases.
Keywords: Toponyms; Geographical names; OpenStreetMap; VGI; Artificial Intelligence; street-level images.
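The abstract does not define how the ICTVAE is computed, so the following is only a minimal illustrative sketch of the general idea of accumulating visual evidence for a candidate name. It assumes the text detection and OCR steps have already produced a list of strings read from street-level images near the feature. The function names, the use of Python's standard `difflib` for string similarity, and the `0.8` threshold are all assumptions made for this example and are not taken from the thesis.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def similarity(candidate: str, ocr_text: str) -> float:
    """Similarity in [0, 1] between a candidate name and one OCR reading."""
    return SequenceMatcher(None, normalize(candidate), normalize(ocr_text)).ratio()


def accumulated_evidence(
    candidate: str, ocr_readings: list[str], threshold: float = 0.8
) -> dict:
    """Summarize how many OCR readings support the candidate name.

    `threshold` is an arbitrary illustrative cut-off (an assumption),
    not a value reported in the thesis.
    """
    scores = [similarity(candidate, reading) for reading in ocr_readings]
    supporting = [s for s in scores if s >= threshold]
    return {
        "n_readings": len(scores),
        "n_supporting": len(supporting),
        "best_score": max(scores, default=0.0),
        "support_ratio": len(supporting) / len(scores) if scores else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical OCR readings for one OSM feature.
    readings = ["PRACA DA LIBERDADE", "Farmácia Central", "praca da liberdad"]
    print(accumulated_evidence("Praça da Liberdade", readings))
```

Normalizing accents and case before comparison matters for Portuguese names, since signage is often written in capitals without diacritics. A real index would likely also weight readings by image distance, capture date, or detector confidence; none of that is modeled here.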
Citation
NUNES, Darlan Miranda. Enriquecendo bases de dados geoespaciais oficiais: uma abordagem híbrida para integração de topônimos colaborativos validados com inteligência artificial. 2025. 131 f. Tese (Doutorado em Ciências Geodésicas) - Universidade Federal do Paraná, Curitiba. 2025.
