Testing in simulators is an essential activity in automotive software engineering. But what happens if you execute the same test scenario in another independent simulator? Do you get the same results? Do you learn the same details about your system under test? This study explores these questions in detail.
Testing in simulation is an essential activity in AD/ADAS development. Compared to on-road testing, it is both more efficient and more effective, and it enables engineers to start testing earlier. Furthermore, when working on ADAS with machine learning-based functionality, the experimental nature of data science makes repeated testing in simulators crucial. Verification and validation (V&V) in simulators indeed receives considerable attention in the evolving SOTIF standard.
In recent years, we have worked with a handful of automotive simulators, such as TASS/Siemens PreScan, ESI Pro-SiVIC, CARLA, and BeamNG. One quickly realizes that they have different pros and cons. Some offer fantastic photorealism, while others provide incredibly detailed simulations of sensor internals. No simulator is the obvious best choice for all applications, and the highly competitive simulator market is developing rapidly.
The jungle of simulators left us wondering… do you learn the same things about your ADAS if you test it in different simulators? How similar are the digital models? Would testers draw the same conclusions if you replace the simulator? We introduce the term cross-simulator reproduction (X-sim reproduction) and ask the overall question “Does ADAS testing generalize x-sim?” We present a study comparing ADAS testing of a pedestrian detection system in two industry-grade simulators: PreScan and Pro-SiVIC.
X-sim Reproduction in Luxembourg
Everything started with me reading interesting work on ADAS testing by the SnT center in Luxembourg. My wife was on maternity leave with our youngest, and we wanted to grab the opportunity to go somewhere and experience something new. I reached out to Luxembourg, and we started planning a replication of the study by Ben Abdessalem et al. (2018). We rented an apartment for two months, and the family drove south in the late summer of 2019. Together with the Luxembourg team, I started working on reproducing the PreScan testing in Pro-SiVIC.
Just like in the original study, the ADAS under test is PeVi, a pedestrian detection system provided by one of SnT's industrial partners in Luxembourg. PeVi is designed to alert the driver if a pedestrian enters specified warning areas in front of the vehicle, especially the Acute Warning Area (AWA) highlighted in blue in the figure at the top of this page.
From a bird's-eye perspective, the system works like this. PeVi uses forward-facing sensors: a radar and a mono-camera. The radar unit provides object tracking with relative speeds. These measurements go into a logic unit that calculates the time to collision (TTC) for the detected objects. If any object has a TTC of less than 4 s (a potentially imminent collision), the camera input is checked to see whether the object in question is a pedestrian. One might wonder whether the driver shouldn't also be warned if there is an elephant or something else on the road. The answer is that an automated vehicle has a set of complementary ADAS systems; for many vehicle manufacturers, large-animal collisions would be detected by a separate system. For safety assurance purposes, it might be wise to treat human beings separately.
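As a rough illustration, the gating logic described above can be sketched in a few lines of Python. This is a hypothetical sketch, not PeVi's actual implementation (which is in Simulink); the constant-closing-speed TTC model and all function names are my own assumptions.

```python
# Hypothetical sketch of PeVi's warning logic (not the actual implementation).
# TTC is approximated under a constant closing speed.

TTC_THRESHOLD_S = 4.0  # warn only if a collision could be imminent

def time_to_collision(distance_m: float, closing_speed_mps: float) -> float:
    """Time until the gap closes; infinite if the object is not approaching."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return distance_m / closing_speed_mps

def should_warn(distance_m: float, closing_speed_mps: float,
                is_pedestrian: bool) -> bool:
    """Radar-derived TTC gates the camera-based pedestrian check."""
    ttc = time_to_collision(distance_m, closing_speed_mps)
    return ttc < TTC_THRESHOLD_S and is_pedestrian
```

Note how the radar-based TTC acts as a cheap filter before the camera-based classification is consulted at all.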
Porting the original study from PreScan to Pro-SiVIC might sound simple, but it required considerable engineering effort. PeVi is implemented in Simulink, and since PreScan is internally Simulink-based, that integration was straightforward. Pro-SiVIC, however, does not use Simulink internally; communication with Pro-SiVIC is instead based on OpenDDS, an open-source implementation of the Data Distribution Service (DDS) standard. In our ported solution, we use asynchronous broadcasting from Pro-SiVIC. This and other engineering decisions can be further explored in the GitHub repo.
Reproducing the Search-Based Software Testing
The original study in PreScan focused on testing PeVi using a minimalistic traffic scenario. In a nutshell: excellent driving conditions, a straight and flat road, and a pedestrian crossing from the right. First, we reproduced the static environment in Pro-SiVIC. As the figure below shows, it looks pretty similar, but there are some differences. The Pro-SiVIC skydome is slightly different, there are mountains on the Pro-SiVIC horizon, there is a dirt shoulder, and the default pedestrian is female. The lady also runs like a (high-heeled) sprinter rather than with the swinging sailor's arms of the PreScan pedestrian.
As in the original study, a test case is characterized by five test parameters. We translated inputs from the coordinate system used in PreScan to the one used in Pro-SiVIC. The five parameters are:
- Starting x of the pedestrian
- Starting y of the pedestrian
- Orientation of the pedestrian
- Speed of the pedestrian
- Speed of the car
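For illustration, the five-parameter input space can be captured as a small data structure from which the search samples scenarios. The bounds below are invented placeholders for the sketch, not the actual ranges used in the study.

```python
from dataclasses import dataclass
import random

# Hypothetical representation of one test scenario. The parameter names follow
# the list above; the bounds are illustrative assumptions only.
@dataclass
class Scenario:
    ped_x: float       # starting x of the pedestrian (m)
    ped_y: float       # starting y of the pedestrian (m)
    ped_theta: float   # orientation of the pedestrian (degrees)
    ped_speed: float   # speed of the pedestrian (m/s)
    car_speed: float   # speed of the car (m/s)

BOUNDS = {
    "ped_x": (0.0, 100.0), "ped_y": (2.0, 10.0), "ped_theta": (0.0, 180.0),
    "ped_speed": (0.5, 3.0), "car_speed": (5.0, 25.0),
}

def random_scenario(rng: random.Random) -> Scenario:
    """Sample one scenario uniformly from the (assumed) input box."""
    return Scenario(**{k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()})
```

A search algorithm then evolves such five-dimensional vectors rather than sampling them purely at random.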
We also reproduced the search-based testing of the original study. All details can be found in the repo, but it essentially involves NSGA-II with a 150 min search budget. We are looking for test inputs that minimize three fitness functions. When there is a crash or a near miss (<1 m), we call it a critical scenario. When PeVi does not detect the pedestrian in a critical scenario, we consider it a safety violation and also refer to it as an unsafe scenario.
- FF1 – the distance between the car and the pedestrian
- FF2 – the distance between the acute warning area and the pedestrian
- FF3 – the time-to-collision
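A sketch of how the fitness values and the critical/unsafe labels could be derived from one simulated run. The trace structure and its field names are assumptions for illustration; only the thresholds (a near miss is <1 m, no detection in a critical scenario means unsafe) come from the text above.

```python
# Hypothetical post-processing of one simulated scenario. A "trace" is a list
# of per-timestep records; the field names below are illustrative assumptions.

CRITICAL_DISTANCE_M = 1.0  # crash or near miss (<1 m)

def fitness(trace):
    """Return (FF1, FF2, FF3); the search minimizes all three."""
    ff1 = min(t["dist_car_ped"] for t in trace)   # car-pedestrian distance
    ff2 = min(t["dist_awa_ped"] for t in trace)   # AWA-pedestrian distance
    ff3 = min(t["ttc"] for t in trace)            # time-to-collision
    return ff1, ff2, ff3

def classify(trace):
    """Critical = crash/near miss; unsafe = critical without a PeVi warning."""
    ff1, _, _ = fitness(trace)
    if ff1 >= CRITICAL_DISTANCE_M:
        return "safe"
    detected = any(t["pevi_warning"] for t in trace)
    return "critical-but-detected" if detected else "unsafe"
```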
The first research question we investigated acts as a sanity check: is SBST an effective approach to testing PeVi also when we replace PreScan with Pro-SiVIC? We used the same SBST setup to generate 400 scenarios each in PreScan and Pro-SiVIC. All 800 scenarios ended up as critical, and roughly 60% of them resulted in safety violations, with the same fractions for both simulators. Furthermore, the measures on the SBST Pareto fronts suggest solutions of similar quality. The sanity check passed: the principal SBST findings from PreScan can be reproduced in Pro-SiVIC.
RQ2: Do we learn the same things about PeVi?
The second RQ focuses on whether we obtain the same information about PeVi when running SBST in PreScan and Pro-SiVIC. We study this by plotting the inputs that lead to safety violations and then training a decision tree classifier to explain the differences. Looking at the plots first, the results are striking: there appear to be safety violations (red dots) all over the input space. Some input intervals stand out, though. First, in both simulators, PeVi fails when the car moves at 20 m/s or faster. For lower speeds, we also see that SBST did not converge to critical scenarios when testing in PreScan.
The figure below shows decision trees for PreScan and Pro-SiVIC. In a decision tree, the most decisive feature ends up in the root. For PreScan, we find the speed of the car in the root: if the car drives faster than 18.92 m/s, PeVi always fails. For lower speeds, the orientation of the pedestrian (the theta angle) matters the most. The Pro-SiVIC decision tree looks different, as the orientation is the most decisive feature there; the remaining features matter much less.
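To give an idea of how such a tree is built, here is a toy, pure-Python version of the root-split selection (a real analysis would use a library such as scikit-learn). It scans every feature/threshold pair and keeps the split with the lowest weighted Gini impurity, which is why a strongly separating feature like the car's speed can end up dominating the root.

```python
# Toy root-split selection for a decision tree over scenario features.
# Illustrative only; not the classifier used in the study.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_root_split(rows, labels):
    """rows: list of dicts feature -> value; labels: 0 = safe, 1 = unsafe.
    Returns the (feature, threshold) minimizing weighted Gini impurity."""
    best_feat, best_thr, best_score = None, None, float("inf")
    for feat in rows[0]:
        for thr in sorted({r[feat] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[feat] <= thr]
            right = [y for r, y in zip(rows, labels) if r[feat] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_feat, best_thr, best_score = feat, thr, score
    return best_feat, best_thr
```

On synthetic data where all failures occur above some speed, the split lands on the car's speed, mirroring the PreScan tree.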
The answer to RQ2 is that the characteristics of safety violations differ between the simulators. The only obvious agreement is that PeVi fails at high vehicle speeds; apart from that, we learn about different PeVi weaknesses when testing in PreScan and Pro-SiVIC.
RQ3: X-sim reproduction of critical scenarios
Finally, we studied what happens when critical scenarios are reproduced in the other simulator. We wanted to know whether they remained critical. Would scenarios that are unsafe in Simulator X be unsafe also in Simulator Y? The results can be a bit hard to digest, but the figure should help.
In PreScan, we had 229 unsafe scenarios. When reproducing those scenarios in Pro-SiVIC, only 78 remain unsafe. Many end up as safe, typically because the distances change so that the scenario is no longer critical, but also because PeVi now works. On the other hand, out of the 171 safe scenarios, 38 are now unsafe in Pro-SiVIC because PeVi fails.
In Pro-SiVIC, we had 236 unsafe scenarios. Reproducing these in PreScan gave us 212 unsafe scenarios, i.e., 24 are no longer critical. Out of the 164 safe scenarios, none is unsafe in PreScan.
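The two paragraphs above are easiest to read as 2x2 cross-tabulations of outcomes (source simulator vs. target simulator). The snippet below tallies the PreScan-to-Pro-SiVIC counts reported above; the safe-to-safe count, 133, follows from 171 - 38.

```python
# Cross-tabulate (source_outcome, target_outcome) pairs into a count table.
def crosstab(pairs):
    table = {}
    for src, tgt in pairs:
        table[(src, tgt)] = table.get((src, tgt), 0) + 1
    return table

# PreScan -> Pro-SiVIC reproduction outcomes, using the counts from the text.
prescan_to_prosivic = (
    [("unsafe", "unsafe")] * 78 + [("unsafe", "safe")] * 151 +
    [("safe", "unsafe")] * 38 + [("safe", "safe")] * 133
)
table = crosstab(prescan_to_prosivic)
```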
What did we learn from this? Our answer to RQ3 is that x-sim reproduction can change most things. There are differences in the minimum distances (in both time and space) between the pedestrian and the car, and there are variations in when PeVi works and when it fails.
Implications for Practice
- System testing in different simulators can give quite different results – also for a minimalistic scene.
- When testing an ADAS, make sure the results generalize to another simulator before initiating on-road testing.
- Simulator license costs can be substantial. Staff training leads to vendor lock-in effects. Nevertheless, companies should try to diversify their simulator portfolios.
Implications for Research
- The concept of cross-simulator reproductions is important and deserves research.
- When applying SBST, fitness functions that depend on the internals of simulators are hard to transfer to other simulators. Investigate their generalizability first.
- Researchers should ideally learn to use more than one simulator.
Markus Borg, Raja Ben Abdessalem, Shiva Nejati, Francois-Xavier Jegeden, Donghwan Shin. Digital Twins Are Not Monozygotic - Cross-Replicating ADAS Testing in Two Industry-Grade Automotive Simulators. In Proc. of the 14th International Conference on Software Testing, Verification and Validation (ICST), 2021. (preprint, code, presentation)
The increasing levels of software- and data-intensive driving automation call for an evolution of automotive software testing. As a recommended practice of the Verification and Validation (V&V) process of ISO/PAS 21448, a candidate standard for safety of the intended functionality for road vehicles, simulation-based testing has the potential to reduce both risks and costs. There is a growing body of research on devising test automation techniques using simulators for Advanced Driver-Assistance Systems (ADAS). However, how similar are the results if the same test scenarios are executed in different simulators? We conduct a replication study of applying a Search-Based Software Testing (SBST) solution to a real-world ADAS (PeVi, a pedestrian vision detection system) using two different commercial simulators, namely, TASS/Siemens PreScan and ESI Pro-SiVIC. Based on a minimalistic scene, we compare critical test scenarios generated using our SBST solution in these two simulators. We show that SBST can be used to effectively generate critical test scenarios in both simulators, and the test results obtained from the two simulators can reveal several weaknesses of the ADAS under test. However, executing the same test scenarios in the two simulators leads to notable differences in the details of the test outputs, in particular, related to (1) safety violations revealed by tests, and (2) dynamics of cars and pedestrians. Based on our findings, we recommend future V&V plans to include multiple simulators to support robust simulation-based testing and to base test objectives on measures that are less dependent on the internals of the simulators.