Automotive testing in simulators remains a hot topic in academia. Following up on our work on search-based software testing in PreScan and ESI Pro-SiVIC, we entered two tool competitions to test the generalizability of our approach. First, we generated test cases in BeamNG for a lane-keeping assist system by varying the road topology. Second, we tested the robustness of Baidu Apollo’s pedestrian detection system in SVL.
We competed in two different teams. In both teams, the final-year PhD student Mahshid Helali Moghadam drove the development from the RISE side. The first competition was the Cyber-physical systems (CPS) testing competition at SBST21, organized in collaboration with the BeamNG research team. Mahshid and I worked together with Seyed Jalaleddin Mousavirad at Hakim Sabzevari University in Iran. Motivated by good results, we then teamed up with Hamid Ebadi (Infotiv) and Gregory Gay and Afonso Alves (Chalmers) for the 2021 IEEE Autonomous Driving AI Test Challenge. The task was to devise novel ways to test Baidu Apollo in SVL.
In both competitions, we relied on generating fault-provoking test cases using search-based software testing with NSGA-II. As in our previous work, we target a handful of test parameters and explore the possible input space efficiently and effectively – better than random testing and a systematic grid search. If we can provoke the system under test into faulty behavior, this can turn into meaningful information for the underlying development organization. This can then initiate hardening cycles, i.e., efforts to improve the system.
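The selection core of NSGA-II is non-dominated sorting: a candidate test case survives if no other candidate beats it on all objectives at once. The toy sketch below illustrates just that idea in plain Python – it is not the competition code, and `toy_objectives` is a made-up stand-in for running the simulator.

```python
import random

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population, evaluate):
    """Return the non-dominated candidates; the selection core of NSGA-II."""
    scored = [(cand, evaluate(cand)) for cand in population]
    front = []
    for cand, f in scored:
        if not any(dominates(g, f) for _, g in scored if g is not f):
            front.append(cand)
    return front

# Hypothetical stand-in for a simulator run: two objectives to minimize,
# here the negated lane deviation and the negated road length.
def toy_objectives(params):
    deviation, road_length = params
    return (-deviation, -road_length)

random.seed(1)
population = [(random.uniform(0, 2), random.uniform(10, 100)) for _ in range(20)]
front = pareto_front(population, toy_objectives)
```

A full NSGA-II run additionally ranks dominated candidates into further fronts, breaks ties by crowding distance, and evolves the population with crossover and mutation; libraries such as pymoo provide that machinery off the shelf.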
Generating curvature that makes the car leave the lane
The task of the SBST21 competition was to generate roads that a lane-keeping assist ADAS couldn’t cope with. There were a number of restrictions on the road topology – many strange roads would simply result in invalid input, e.g., self-intersections. It wasn’t as simple as generating roads with maximum curvature either – the car would then simply lower the speed substantially and slowly follow the road. One could not simply guess what input would make the car slide out of the lane.
We used an objective function with two components. First, we aimed to maximize the distance between the car and the center of the lane. Second, our test generation process should generate as long test roads as possible. Two major advantages of our solution were that it pre-checked whether the road topology was valid before running any test cases and that we had crafted a high-quality population seed to speed up the search. Our solution Deeper generated the second-highest number of test cases within the time budget in the competition. Deeper also generated the highest fraction of valid test cases. On the other hand, Deeper didn’t outperform the competing tools in terms of test case diversity – a quality measure you typically want to maximize.
Generating conditions under which crossing pedestrians are missed
The AI test challenge was way more open-ended, with a jury selecting the winners based on a set of evaluation criteria. We got a selection of driving tasks that Baidu Apollo could perform, and we decided to pick emergency braking when a pedestrian crosses the road. This is also the scenario we worked with in PreScan and ESI Pro-SiVIC and the focus of our SMIRK development.
Our solution approach was a bit different compared to our previous work. We specified a fixed path for the pedestrian based on waypoints and entered “normal weather conditions.” Then, we added a noise vector to the base scenario, representing variation in, e.g., the coordinates of the waypoints, the time of day, and the level of rain. Finally, we defined an objective function that sought to minimize the distance between the car and the pedestrian, maximize the distance the car travels, and maximize the number of accidents. Then we used NSGA-II to search the input space represented by a set of ranges for the noise vector. Below are some figures that illustrate our approach. We didn’t finish on the podium, but our contribution was selected for a full paper presentation at the AI Test conference.
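The base-scenario-plus-noise idea can be sketched in a few lines: a fixed scenario is perturbed by a noise vector drawn from per-parameter ranges, and NSGA-II then searches over those ranges. The parameter names and ranges below are illustrative assumptions, not the SVL or Apollo API.

```python
import random

# Hypothetical base scenario: a fixed pedestrian path plus "normal" conditions.
BASE_SCENARIO = {
    "pedestrian_waypoints": [(10.0, 0.0), (10.0, 6.0)],
    "time_of_day": 12.0,   # hour of day
    "rain_level": 0.0,     # 0 = none, 1 = heavy
}

# Search ranges for the noise vector, one entry per perturbed parameter.
NOISE_RANGES = {
    "waypoint_dx": (-2.0, 2.0),
    "waypoint_dy": (-1.0, 1.0),
    "time_of_day": (-6.0, 6.0),
    "rain_level": (0.0, 1.0),
}

def sample_noise(rng):
    """Draw one noise vector uniformly from the configured ranges."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in NOISE_RANGES.items()}

def apply_noise(base, noise):
    """Build a concrete test scenario = base scenario + noise vector."""
    scenario = dict(base)
    scenario["pedestrian_waypoints"] = [
        (x + noise["waypoint_dx"], y + noise["waypoint_dy"])
        for x, y in base["pedestrian_waypoints"]
    ]
    scenario["time_of_day"] = base["time_of_day"] + noise["time_of_day"]
    scenario["rain_level"] = noise["rain_level"]
    return scenario

rng = random.Random(42)
scenario = apply_noise(BASE_SCENARIO, sample_noise(rng))
```

In the actual setup, each concrete scenario is executed in SVL and the three objective values (car-to-pedestrian distance, distance traveled, accident count) are computed from the simulation trace and fed back to NSGA-II.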
What did we learn from taking part in two different competitions? Quite a bit. It motivated us (forced us…) to learn two additional simulators: BeamNG and SVL. Also, we got the chance to check how the search-based approach to test case generation generalizes to new problems without too much trouble.
Implications for Practice
- Simulation-based testing of AD/ADAS is not necessarily as deterministic and repeatable as a tester would like. In our case, the experimental setup with Baidu Apollo and SVL did not provide reproducible results given the same test input.
- The simulators we worked with evolved fast. Things change, often faster than the accompanying documentation.
Implications for Research
- SBST is a valid approach to generate effective test cases in different simulators.
- It requires some effort to learn a new simulator. We have now published results on automotive testing based on four simulators, so we can finally start talking about generalizability.
Helali Moghadam, Borg, and Mousavirad. Deeper at the SBST 2021 Tool Competition: ADAS Testing Using Multi-Objective Search. In Proc. of the 14th International Workshop on Search-Based Software Testing, 2021.