ML for adaptive response time control
Runtime monitoring is essential for assuring the timing behavior of real-time systems. Adding runtime control can further improve a system's robustness. Analytical approaches to control require difficult modeling or do not scale. We show how reinforcement learning can be used to learn an adaptive control policy without a detailed execution model.
A third paper by Mahshid during her impressive start in the TESTOMAT project. In this work, we try our ideas on using reinforcement learning, more specifically Q-learning, for adaptive control of response time in real-time systems. Our work targets control programs running on PLCs, a typical situation in the railway domain. In this paper, we are happy to share the first experimental results. And I’m happy to finally connect a little bit with the SEAMS community!
Control programs running on PLCs are the backbone of many critical industrial control systems. These systems tend to be real-time systems, i.e., not only do they need to provide the right output – they also have to do it at the right time. Thus, controlling worst-case execution times and response times is a critical activity. Carefully designed execution schedules are used to meet this goal. We show how adaptive runtime control can be used in addition to the schedulers to cope with changing execution conditions.
Industrial control systems are often implemented in one of the languages of the IEC 61131-3 standard – the standard I developed editors and compilers for when I was with ABB. A popular language in the standard is the function block diagram, a kind of graphical programming in which the developer connects boxes. The figure below shows a simple example; in reality, programs can grow quite big and complex.
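To give a flavor of the building blocks involved, here is a minimal Python sketch of the semantics of a TON (on-delay) timer, one of the standard function blocks in IEC 61131-3. The scan-cycle-driven call convention and the millisecond units are my simplifications for illustration – real PLC runtimes handle time bookkeeping internally.

```python
class TON:
    """Simplified on-delay timer (sketch of the IEC 61131-3 TON block).

    Inputs: IN (enable), PT (preset time, ms).
    Outputs: Q (timer expired), ET (elapsed time, ms).
    Called once per scan cycle with the time elapsed since the
    previous call; treat this as an illustration, not a PLC runtime.
    """

    def __init__(self):
        self.Q = False   # output: True once ET has reached PT
        self.ET = 0      # elapsed time since IN went True, in ms

    def __call__(self, IN, PT, dt_ms):
        if IN:
            # Accumulate elapsed time while enabled, saturating at PT.
            self.ET = min(self.ET + dt_ms, PT)
            self.Q = self.ET >= PT
        else:
            # A falling edge on IN resets the timer immediately.
            self.ET = 0
            self.Q = False
        return self.Q
```

In a scan-based execution model, such a block is evaluated on every cycle, which is exactly why the response times of the whole program depend on how the scan thread is scheduled.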
In this work, we run simulation experiments to show how Q-learning works for adaptive response time control at runtime. The execution environment simulates several real-time programs using timer function blocks, and we integrated our control approach in the control scan thread. We then run two evaluation scenarios to enable response time analysis and sensitivity analysis, respectively. Furthermore, we evaluate three different action selection strategies in Q-learning, varying how much the controller trusts previous experience vs. how eagerly it explores the action space.
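The core machinery can be sketched as a tabular Q-learning controller with pluggable action selection. The state/action encoding, parameter values, and the particular strategies shown (epsilon-greedy and softmax) are assumptions for illustration – the paper's exact design may differ.

```python
import math
import random


class ResponseTimeController:
    """Tabular Q-learning controller (illustrative sketch).

    States discretize the observed response-time deviation; actions
    adjust a timing parameter of the scan cycle. All names and the
    state/action encoding are assumptions, not the paper's design.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions

    def select_action(self, state, strategy="epsilon_greedy"):
        qs = self.q[state]
        if strategy == "epsilon_greedy":
            # Explore with probability epsilon, otherwise exploit.
            if random.random() < self.epsilon:
                return random.randrange(self.n_actions)
            return max(range(self.n_actions), key=lambda a: qs[a])
        if strategy == "softmax":
            # Boltzmann exploration: higher Q-value, higher probability.
            temperature = 0.5
            weights = [math.exp(q / temperature) for q in qs]
            r, acc = random.random() * sum(weights), 0.0
            for a, w in enumerate(weights):
                acc += w
                if r <= acc:
                    return a
            return self.n_actions - 1
        # Greedy fallback: always trust previous experience.
        return max(range(self.n_actions), key=lambda a: qs[a])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update.
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])
```

Changing the `strategy` argument (or decaying `epsilon` over time) is one way to trade off exploiting learned behavior against exploring new actions, which is the dimension varied across the three strategies in the evaluation.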
The results were good, and we conclude that using Q-learning for response time control appears promising. In the simulations, we accepted a deviation of at most 25 percent of the (upper-bound) response time, set the default acceptable tolerance to 500 ms, and then randomly injected timing deviations. For all action selection strategies, we show that our approach outperforms an uncontrolled baseline.
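As an illustration of how such a setup could translate into a learning signal, here is one hypothetical reward shaping for the tolerance-based evaluation. The blog post does not spell out the paper's actual reward function, so the shape and constants below are purely my assumptions.

```python
def response_time_reward(response_time_ms, upper_bound_ms, tolerance_ms=500):
    """Hypothetical reward: full reward when the response time meets the
    bound, decaying within the tolerance, and a penalty beyond it.
    (Illustrative only; not the reward function from the paper.)
    """
    deviation = response_time_ms - upper_bound_ms
    if deviation <= 0:
        return 1.0  # bound met: full reward
    if deviation <= tolerance_ms:
        # Within the acceptable tolerance: linearly decaying reward.
        return 1.0 - deviation / tolerance_ms
    return -1.0  # tolerance violated: penalty
```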
Implications for Research
- A proof of concept showing that Q-learning can be used for adaptive control of response time.
- We outline future work on modeling a fuzzy state space and using cooperative agents for faster learning.
Implications for Practice
- Implementing adaptive control mechanisms in the control scan thread has potential to improve real-time performance.
Mahshid Helali Moghadam, Mehrdad Saadatmand, Markus Borg, Markus Bohlin, and Björn Lisper. Adaptive Runtime Response Time Control in PLC-based Real-Time Systems using Reinforcement Learning, In Proc. of the 13th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 2018. (link, preprint)
Timing requirements such as constraints on response time are key characteristics of real-time systems, and violations of these requirements might cause a total failure, particularly in hard real-time systems. Runtime monitoring of system properties is of great importance to check the system status and mitigate such failures. Thus, a runtime control that preserves the system properties could improve the robustness of the system with respect to timing violations. Common control approaches may require a precise analytical model of the system, which is difficult to provide at design time. Reinforcement learning is a promising technique to provide adaptive model-free control when the environment is stochastic and the control problem can be formulated as a Markov Decision Process. In this paper, we propose an adaptive runtime control using reinforcement learning for real-time programs based on Programmable Logic Controllers (PLCs), to meet the response time requirements. We demonstrate through multiple experiments that our approach can control the response time efficiently to satisfy the timing requirements.