Control Industrial Automation System with Large Language Models

This repository contains detailed information and a video demonstration accompanying the paper "Control Industrial Automation System with Large Language Models", submitted to IEEE ICRA 2025.

A preprint of this paper is available on arXiv.

Y. Xia, N. Jazdi, J. Zhang, C. Shah and M. Weyrich, Control Industrial Automation System with Large Language Models, 2024, arXiv preprint. https://doi.org/10.48550/arXiv.2409.18009

Potential Use Case Demonstration:

In one of our previous works, we demonstrated a use case in which a user interacts with an automation system, enhanced by autonomous LLM agents, through natural language commands.

30s_GPT4Automation

https://github.com/user-attachments/assets/74fb9f78-4511-47de-93e5-3228345fa9e3

The full video is also available on YouTube: Control Automation System with Large Language Models (2:57) https://youtu.be/GhBoxGfjRIE

The System Design

In this new work, we present a refined system design with more comprehensive testing and model fine-tuning.

system_design

Prototypical Implementation in Laboratory Environment

lab_demo_4_scenes

Event-Based Control:

event_based_control
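As a minimal illustration of the event-based control idea, the sketch below shows an automation system pushing events onto a queue, with each event turned into a prompt for the LLM agent, whose reply is taken as the next control command. All function, event, and command names here are hypothetical stand-ins, not the repository's actual interface.

```python
# Hypothetical sketch of event-based LLM control: events from the
# automation system are queued, each one is wrapped in a prompt, and the
# LLM agent's reply is parsed as a control command.
from queue import Queue

def llm_agent(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a control command string."""
    if "workpiece_arrived" in prompt:
        return "start_conveyor"
    return "hold"

def control_loop(events: Queue) -> list[str]:
    """Drain the event queue and collect one command per event."""
    commands = []
    while not events.empty():
        event = events.get()
        prompt = f"Event: {event}. Decide the next control command."
        commands.append(llm_agent(prompt))
    return commands

events = Queue()
events.put("workpiece_arrived")
events.put("idle_timeout")
print(control_loop(events))  # ['start_conveyor', 'hold']
```

In the real system the stubbed `llm_agent` call would be replaced by an actual model invocation, and commands would be dispatched to the automation hardware rather than collected in a list.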

Prompt design:

prompt_design

For a simple experiment, readers can paste the example prompt prompt_example.txt into an arbitrary LLM and inspect the output.
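A rough sketch of how such an agent prompt can be assembled from SOP guidelines and the current event. The section layout and wording are illustrative only; see prompt_example.txt for the actual prompt.

```python
# Illustrative prompt assembly: SOP guidelines plus the current event are
# combined into one agent prompt. Structure is a guess, not the real prompt.
def build_agent_prompt(sop_guidelines: list[str], event: str) -> str:
    """Assemble an agent prompt from SOP rules and the current event."""
    sop_block = "\n".join(f"- {rule}" for rule in sop_guidelines)
    return (
        "You are an LLM agent controlling an automation module.\n"
        f"Standard operating procedures:\n{sop_block}\n"
        f"Current event: {event}\n"
        "Respond with exactly one control command."
    )

prompt = build_agent_prompt(
    ["If a workpiece arrives, start the conveyor.",
     "If the buffer is full, stop intake."],
    "workpiece_arrived",
)
print(prompt)
```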

Model Fine-Tuning:

We apply supervised fine-tuning (SFT) to assess how training open-source models on the collected dataset improves their performance on this specific downstream task. Such training can enable the customization of a general LLM for intelligent control of specialized automation equipment. For GPT-4o, we use OpenAI’s proprietary fine-tuning API to explore the potential capabilities of LLMs, even though the underlying training method may differ.

Full fine-tuning (SFT): Llama-3-70B-Instruct, Llama-3-8B-Instruct, Qwen2-7B-Instruct, Mistral-7Bx8-Instruct-v0.2, Mistral-7B-Instruct-v0.2

Epochs: 1; Learning Rate: 1e-5; Batch Size: 16; Training Data Size: 0.2 million tokens

LoRA fine-tuning (SFT): Qwen2-72B-Instruct

LoRA-Rank: 32; LoRA-alpha: 32; Epochs: 1; Learning Rate: 1e-5; Batch Size: 16; Training Data Size: 0.2 million tokens

OpenAI’s API fine-tuning: GPT-4o

Training Data Size: 0.2 million tokens
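The hyperparameters listed above can be collected into a single configuration sketch. The dictionary layout and field names are illustrative and not tied to any specific training framework; only the values come from the settings above.

```python
# Fine-tuning settings as listed above, gathered into plain dicts.
# Field names are illustrative, not a specific framework's API.
FULL_SFT = {
    "models": ["Llama-3-70B-Instruct", "Llama-3-8B-Instruct",
               "Qwen2-7B-Instruct", "Mistral-7Bx8-Instruct-v0.2",
               "Mistral-7B-Instruct-v0.2"],
    "epochs": 1,
    "learning_rate": 1e-5,
    "batch_size": 16,
    "training_tokens": 200_000,  # 0.2 million tokens
}
LORA_SFT = {
    "models": ["Qwen2-72B-Instruct"],
    "lora_rank": 32,
    "lora_alpha": 32,
    "epochs": 1,
    "learning_rate": 1e-5,
    "batch_size": 16,
    "training_tokens": 200_000,
}
OPENAI_API_SFT = {
    "models": ["GPT-4o"],
    "training_tokens": 200_000,
}
```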

Evaluation Results:

Evaluation of pre-trained LLM

We begin by evaluating the original pre-trained models. Automation tasks fall into two types: 1) routine processes, where the LLM agent can follow the SOP guidelines given in its agent prompt to operate the automation system, and 2) unexpected events, to which the agent must respond autonomously without prompted instructions. We distinguish between these two types in our evaluation. Based on the results (the first three rows of the table), GPT-4o generally outperforms the open-source models in interpreting agent prompts and events to generate control commands. Performance varies significantly across models, and each model also exhibits a distinct “personality” in this use case.
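The two task categories can be scored separately, as in this minimal evaluation sketch. The field names and the interpretation of the paired numbers in the table (a pass rate and a mean quality score) are assumptions for illustration, not the repository's actual evaluation code.

```python
# Hypothetical evaluation split: each test point is tagged as an SOP task
# or an unexpected-event task, with a pass flag and a quality score.
# Field names and sample values are illustrative only.
def summarize(results: list[dict]) -> dict:
    """Return pass rate and mean score per task category and overall."""
    out = {}
    for cat in ("SOP", "Unexpected", "all"):
        subset = [r for r in results if cat == "all" or r["category"] == cat]
        out[cat] = {
            "pass_rate": sum(r["passed"] for r in subset) / len(subset),
            "mean_score": sum(r["score"] for r in subset) / len(subset),
        }
    return out

results = [
    {"category": "SOP", "passed": True, "score": 5.0},
    {"category": "SOP", "passed": True, "score": 4.0},
    {"category": "Unexpected", "passed": False, "score": 2.0},
    {"category": "Unexpected", "passed": True, "score": 4.0},
]
print(summarize(results))
```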

Evaluation of post-trained LLM based on created dataset

The last three rows of the table show the results after supervised fine-tuning (SFT) on the collected dataset. Fine-tuning clearly improves the open-source models' performance on this downstream task, supporting the customization of a general LLM for intelligent control of specialized automation equipment. For GPT-4o, we used OpenAI’s proprietary fine-tuning API, even though the underlying training method may differ.

| Evaluation based on 100 test points | GPT-4o | Llama-3-70B-Instruct | Llama-3-8B-Instruct | Qwen2-72B-Instruct | Qwen2-7B-Instruct | Mistral-7Bx8-Instruct-v0.2 | Mistral-7B-Instruct-v0.2 |
|---|---|---|---|---|---|---|---|
| Pre-trained (all) | 81% / 4.7 | 75% / 4.3 | 37% / 2.8 | 70% / 4.0 | 65% / 3.7 | 29% / 2.4 | 45% / 2.9 |
| Pre-trained (SOP) | 100% / 5.0 | 87% / 4.5 | 53% / 3.1 | 85% / 4.5 | 63% / 3.6 | 34% / 2.4 | 37% / 2.5 |
| Pre-trained (Unexpected) | 41% / 4.0 | 50% / 3.8 | 3% / 2.2 | 38% / 3.0 | 69% / 4.0 | 19% / 2.3 | 63% / 3.7 |
| SFT (all) | 100% / 5.0 | 95% / 4.8 | 96% / 4.9 | 66% / 3.9 * | 97% / 4.9 | 45% / 3.1 | N.A. ** |
| SFT (SOP) | 100% / 5.0 | 94% / 4.8 | 99% / 4.9 | 82% / 4.4 * | 97% / 4.9 | 61% / 3.6 | N.A. ** |
| SFT (Unexpected) | 100% / 5.0 | 97% / 4.9 | 91% / 4.7 | 31% / 2.8 * | 97% / 5.0 | 9% / 2.3 | N.A. ** |

Notes:

Insights

OpenAI’s model and fine-tuning service outperform the others: the GPT-4o model quickly learns from the samples how to control the automation system. The other models also demonstrate reasonably good performance, and interestingly, fine-tuned smaller LLMs did not necessarily underperform in this particular use case. However, the LoRA fine-tuning in our experiments yielded poor results and even decreased model performance.

Technology Readiness Level (TRL)

nasa_trl_meter

Other related papers

This research is a continuation of previous works:

Y. Xia, M. Shenoy, N. Jazdi and M. Weyrich, Towards autonomous system: flexible modular production system enhanced with large language model agents, 2023 IEEE 28th International Conference on Emerging Technologies and Factory Automation (ETFA), Sinaia, Romania, 2023, pp. 1-8, doi: 10.1109/ETFA54631.2023.10275362.

On the related topic of LLM agent systems and simulation models, one of our papers received the Best Paper Award at IEEE ETFA 2024, held on September 10-13, 2024 in Padova, Italy. A preprint of that paper is available on arXiv.

Y. Xia, D. Dittler, N. Jazdi, H. Chen and M. Weyrich, LLM experiments with simulation: Large Language Model Multi-Agent System for Process Simulation Parametrization in Digital Twins, 2024, arXiv preprint, https://doi.org/10.48550/arXiv.2405.18092.

Other related works can be found on Google Scholar: https://scholar.google.de/citations?user=hi1srxkAAAAJ