Empowering Robotics with Large Language Models: osmAG Map Comprehension with LLMs
CoRR (2024)
Abstract
Recently, Large Language Models (LLMs) have demonstrated great potential in
robotic applications by providing essential general knowledge for situations
that cannot be pre-programmed. Generally speaking, mobile robots
need to understand maps to execute tasks such as localization or navigation. In
this letter, we address the problem of enabling LLMs to comprehend Area Graph,
a text-based map representation, in order to enhance their applicability in the
field of mobile robotics. Area Graph is a hierarchical, topometric, semantic map
representation that uses polygons to demarcate areas such as rooms, corridors, or
buildings. In contrast to commonly used map representations such as occupancy
grid maps or point clouds, osmAG (Area Graph in OpenStreetMap format) is
stored in an XML textual format that is naturally readable by LLMs. Furthermore,
conventional robotic algorithms such as localization and path planning are
compatible with osmAG, making this map representation comprehensible to
LLMs, traditional robotic algorithms, and humans. Our experiments show that with
a proper map representation, LLMs possess the capability to understand maps and
answer queries based on that understanding. After simple fine-tuning,
LLaMA2 models surpass ChatGPT-3.5 in tasks involving topology and
hierarchy understanding. Our dataset, dataset generation code, and fine-tuned LoRA
adapters are available at
https://github.com/xiefujing/LLM-osmAG-Comprehension.
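To make the textual nature of osmAG concrete, here is a minimal sketch of how an OpenStreetMap-style XML area description can be parsed with standard tooling. The snippet and its tag keys (e.g. "osmAG:areaType", "osmAG:parent") are illustrative assumptions, not the exact schema used in the paper; the linked repository defines the actual format.

```python
# Minimal sketch: parsing an osmAG-style (OSM XML) area with Python's standard library.
# The tag keys below are assumed for illustration only; consult the osmAG specification
# in the linked repository for the real schema.
import xml.etree.ElementTree as ET

OSMAG_SNIPPET = """<osm version="0.6">
  <node id="1" lat="31.1790" lon="121.5910"/>
  <node id="2" lat="31.1791" lon="121.5912"/>
  <node id="3" lat="31.1792" lon="121.5911"/>
  <way id="100">
    <nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="1"/>
    <tag k="osmAG:areaType" v="room"/>
    <tag k="name" v="Room 101"/>
    <tag k="osmAG:parent" v="corridor_1"/>
  </way>
</osm>"""

root = ET.fromstring(OSMAG_SNIPPET)
for way in root.findall("way"):
    # Collect the way's key/value tags (semantics and hierarchy) and its polygon outline.
    tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
    outline = [nd.get("ref") for nd in way.findall("nd")]
    print(tags.get("name"), tags.get("osmAG:areaType"),
          "parent:", tags.get("osmAG:parent"), "outline nodes:", outline)
```

Because the whole map is plain XML of this kind, it can be handed to an LLM as text while remaining usable by conventional localization and path-planning code.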