Using AirSim to develop autonomous driving algorithms

Having taken part in more traditional robotics competitions than you can imagine, we wanted to do something special for our 100th prize. I had recently finished the CS231N course and was eager to apply machine learning algorithms to robotic tasks.

The competition

Electromobility is a six-month-long autonomous driving competition organised by Continental Automotive in Iasi, Romania. The challenge is to build a 1:10 scale RC vehicle that can drive itself, recognise traffic signs and be controlled from a smartphone application. In practice, this means hacking together an unpolished and unsafe version of a mini Tesla.

HBFS Car: RC car (Traxxas Slash) featuring an NVIDIA Jetson TX2

Camera settings

In the qualification round we noticed that the track surface was somewhat reflective, which threw off the buggy line detection algorithm I had written. Using the cheapest webcams available did not help either.

Competition track surface

After discovering in the OBS Studio settings that the Microsoft LifeCam 3000 webcams do support manual exposure control, I used v4l2-ctl on Linux to adjust the corresponding parameters. This helped our three cameras see something other than reflections.

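# List the controls the camera exposes
v4l2-ctl -d /dev/video1 --list-ctrls
# Switch to manual exposure (exposure_auto=1) and fix the image parameters
v4l2-ctl -d /dev/video1 -c exposure_auto=1,brightness=30,saturation=40,exposure_absolute=39,contrast=4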
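With three cameras, the same settings have to be applied to every device. Below is a minimal Python sketch that scripts the command above. The device nodes /dev/video0 to /dev/video2 are hypothetical (check yours with `v4l2-ctl --list-devices`), and the control names must match those printed by `--list-ctrls`, which can differ between drivers and kernel versions.

```python
import subprocess

# Values from the v4l2-ctl command above; exposure_auto=1 selects manual exposure.
CAMERA_CONTROLS = {
    "exposure_auto": 1,
    "brightness": 30,
    "saturation": 40,
    "exposure_absolute": 39,
    "contrast": 4,
}

# Hypothetical device nodes for the three cameras.
CAMERA_DEVICES = ["/dev/video0", "/dev/video1", "/dev/video2"]


def build_set_ctrl_command(device, controls):
    """Return the argv list for `v4l2-ctl -d <device> -c name=value,...`."""
    ctrl_arg = ",".join(f"{name}={value}" for name, value in controls.items())
    return ["v4l2-ctl", "-d", device, "-c", ctrl_arg]


def apply_camera_settings(devices=CAMERA_DEVICES, controls=CAMERA_CONTROLS, dry_run=False):
    """Apply the same manual-exposure settings to every camera.

    With dry_run=True the commands are only built and returned, not executed.
    """
    commands = [build_set_ctrl_command(dev, controls) for dev in devices]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if v4l2-ctl fails
    return commands
```

Calling `apply_camera_settings()` once at startup (for example from the script that launches the driving stack) keeps the three cameras consistent, since some drivers reset these controls when the device is reopened.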

Camera positioning

We mounted our three cameras in a V configuration on the chassis to maximise coverage of the track, especially to the sides and in front of the car. The cameras are tilted at around 30 degrees; inverse perspective mapping is applied to each camera image in OpenCV, and the three results are combined into a single top-down image.
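In OpenCV this step is typically `cv2.getPerspectiveTransform` (a homography from four point correspondences) followed by `cv2.warpPerspective`. The sketch below reimplements both in plain NumPy so the mechanics are visible, using nearest-neighbour sampling instead of OpenCV's default bilinear interpolation. The point correspondences would come from calibrating each camera (they are not given in the post), and averaging the overlapping regions is an assumed blending rule, not necessarily the one used on the car.

```python
import numpy as np


def homography_from_points(src_pts, dst_pts):
    """Solve the 3x3 homography H (with H[2, 2] = 1) that maps four camera-image
    points onto four top-view points, as cv2.getPerspectiveTransform does."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.asarray(A, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_to_top_view(image, H, out_shape):
    """Inverse-map every top-view pixel back into the camera image
    (nearest-neighbour sampling). Returns the warped image and a boolean mask
    of the top-view pixels this camera actually covers."""
    out_h, out_w = out_shape
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    top_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    cam = np.linalg.inv(H) @ top_pts
    w = cam[2]
    in_front = w > 1e-9  # points on or behind the horizon cannot be sampled
    safe_w = np.where(in_front, w, 1.0)
    sx = np.round(cam[0] / safe_w).astype(int)
    sy = np.round(cam[1] / safe_w).astype(int)
    valid = (in_front & (sx >= 0) & (sx < image.shape[1])
             & (sy >= 0) & (sy < image.shape[0]))
    warped = np.zeros((out_h * out_w,) + image.shape[2:], dtype=image.dtype)
    warped[valid] = image[sy[valid], sx[valid]]
    return warped.reshape((out_h, out_w) + image.shape[2:]), valid.reshape(out_h, out_w)


def blend_views(views):
    """Combine (warped_image, mask) pairs into one top view by averaging
    wherever cameras overlap; pixels no camera covers stay 0."""
    total = np.zeros(views[0][0].shape, dtype=float)
    count = np.zeros(views[0][1].shape, dtype=float)
    for warped, mask in views:
        total[mask] += warped[mask]
        count += mask
    covered = count > 0
    divisor = count[covered]
    if total.ndim == 3:
        divisor = divisor[:, None]  # broadcast over the colour channels
    total[covered] /= divisor
    return total
```

Because the same homographies are applied to both the camera images and the label maps, the seams and stretching produced here end up in the training data too, which is what the recording step later in the post relies on.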

Three-camera configuration, top view: inverse perspective mapping (AirSim data)

Solution overview

Our solution for this year's competition uses a segmentation model that turns the noisy camera input into a cleaner segmentation map, from which simple algorithms can derive the vehicle's steering commands. Since the competition track looked considerably different from publicly available self-driving car datasets, it quickly became more interesting to explore simulation-based techniques for generating our own training data.

AirSim is an Unreal Engine plugin with many uses in computer vision, deep learning and reinforcement learning. I really appreciate how the folks at Microsoft and their contributors have abstracted away all the intricate math required to generate RGB-D, LiDAR and segmentation data. The epic visuals of UE4, combined with AirSim's Python and ROS integration, truly make it a Swiss Army knife for machine learning tasks.

HBFS Car steering algorithm pipeline, shared by the simulated environment and the real car

All the code for the steering algorithm was written in Python for portability. For curve fitting we used weighted second-order polynomials. Steering commands are calculated from the lane angle and lateral offset using the formulas from this paper.
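The paper's steering formulas are not reproduced in this post, so the sketch below only illustrates the surrounding steps under stated assumptions: pixels of one class are extracted from the top-view segmentation map (the class id, scale and car position at the bottom centre are hypothetical), a weighted second-order polynomial x = f(y) is fitted with `np.polyfit(..., w=...)`, and the lane angle and lateral offset are read off the fit. The final steering law is a Stanley-style controller used as a clearly swapped-in stand-in, not the formulas from the paper.

```python
import numpy as np


def lane_points(seg_map, lane_class, metres_per_px, car_col):
    """Convert pixels of one class in a top-view segmentation map into
    (x, y) points in metres: y = distance ahead of the bottom image row,
    x = lateral position, positive to the right of the car (hypothetical frame)."""
    rows, cols = np.nonzero(seg_map == lane_class)
    ys = (seg_map.shape[0] - 1 - rows) * metres_per_px
    xs = (cols - car_col) * metres_per_px
    return xs, ys


def fit_lane(xs, ys, weights):
    """Weighted 2nd-order fit x = a*y**2 + b*y + c (coefficients, highest power first)."""
    return np.polyfit(ys, xs, deg=2, w=weights)


def lane_angle_and_offset(coeffs, y_eval=0.0):
    """Lane heading (rad, relative to the car's forward axis) and lateral
    offset (m) of the fitted curve, both evaluated at y_eval."""
    offset = np.polyval(coeffs, y_eval)
    slope = np.polyval(np.polyder(coeffs), y_eval)  # dx/dy
    return float(np.arctan(slope)), float(offset)


def steering_command(angle, offset, speed, k=1.0, max_steer=np.radians(25)):
    """Stanley-style law (stand-in, NOT the paper's formulas): heading error
    plus a cross-track term, clipped to the steering limit. Positive = right."""
    steer = angle + np.arctan2(k * offset, speed + 1e-3)
    return float(np.clip(steer, -max_steer, max_steer))
```

Weighting nearby points more heavily, for example `weights = 1.0 / (1.0 + ys)`, is one plausible choice, since pixels far from the car are the most stretched by the perspective mapping.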

Track reconstruction

Since the track dimensions were published on the competition page, I reconstructed each class of interest as a separate FBX 3D model using Blender and Unreal Editor. Track reflections were recreated by adding normal maps to the road texture.

Reconstructed track in Unreal Editor

Once the map was built, I added the AirSim plugin to the project so I could use its car vehicle and mount the cameras on it.

Training data was recorded from AirSim using the simGetImages API and passed through the first stage of the processing pipeline, since I needed the resulting images to contain the perspective mapping and blending artifacts. Each training pair therefore contains the classes shown below.
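A minimal sketch of grabbing one scene/segmentation pair from a single camera with AirSim's Python client, requesting both images in one `simGetImages` call. The camera name `"front_center"` is one of AirSim's default car cameras and stands in for the custom cameras mounted on the car; the raw-byte layout (3 or 4 channels) has varied between AirSim versions, so the channel count is inferred rather than hard-coded.

```python
import numpy as np


def decode_image(data, height, width):
    """Turn the raw bytes of an uncompressed AirSim image into an
    H x W x C uint8 array, inferring C from the buffer size."""
    buf = np.frombuffer(data, dtype=np.uint8)
    channels = buf.size // (height * width)
    return buf.reshape(height, width, channels)


def record_pair(client, camera_name="front_center"):
    """Request the scene and segmentation images of one camera together."""
    import airsim  # imported here so decode_image works without AirSim installed

    requests = [
        # ImageRequest(camera_name, image_type, pixels_as_float, compress)
        airsim.ImageRequest(camera_name, airsim.ImageType.Scene, False, False),
        airsim.ImageRequest(camera_name, airsim.ImageType.Segmentation, False, False),
    ]
    scene, seg = client.simGetImages(requests)
    return (decode_image(scene.image_data_uint8, scene.height, scene.width),
            decode_image(seg.image_data_uint8, seg.height, seg.width))
```

With a connected `airsim.CarClient()` (after `client.confirmConnection()`), calling `record_pair` for each of the three cameras and pushing both outputs through the inverse perspective mapping stage would yield input/label pairs that carry the same warping and blending artifacts the real pipeline produces. Object IDs for the segmentation classes can be assigned with `client.simSetSegmentationObjectID`.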

Segmentation network classes and labels

Convolutional segmentation model

The network is an encoder-decoder architecture inspired by U-Net; the most significant difference is the reduced number of layers and filters, for faster inference on embedded hardware such as the NVIDIA Jetson TX2.

The advantage of a segmentation model is that you retain full control of the steering commands instead of relying on the network to produce them, as in the end-to-end AirSim cookbook example. The main disadvantage is that you need some additional logic to filter the network output and ignore erroneous predictions. The simulator helped tremendously in developing and testing the complete solution.
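The post does not describe the filtering logic, so the following is only a hypothetical example of what it could look like: frames with too few pixels of the tracked class are rejected as unreliable, sudden jumps in the computed steering angle are treated as glitches, and accepted values are exponentially smoothed. All thresholds are illustrative assumptions.

```python
import math


class SteeringFilter:
    """Hypothetical post-filter for steering angles derived from the segmentation map."""

    def __init__(self, min_lane_pixels=200, max_jump=0.3, alpha=0.5, max_holds=5):
        self.min_lane_pixels = min_lane_pixels  # fewer pixels -> frame is unreliable
        self.max_jump = max_jump  # largest plausible change between frames (rad)
        self.alpha = alpha  # weight of the new value in the exponential smoothing
        self.max_holds = max_holds  # consecutive rejections before the old value is dropped
        self.last = None
        self.rejected = 0

    def update(self, steer, lane_pixels):
        reliable = lane_pixels >= self.min_lane_pixels and math.isfinite(steer)
        jump_ok = self.last is None or abs(steer - self.last) <= self.max_jump
        if reliable and jump_ok:
            self.rejected = 0
            if self.last is None:
                self.last = steer
            else:
                self.last = self.alpha * steer + (1 - self.alpha) * self.last
            return self.last
        # Reject this frame and hold the last accepted command (0.0 if none yet).
        self.rejected += 1
        held = 0.0 if self.last is None else self.last
        if self.rejected >= self.max_holds:
            # Too many rejections in a row: stop trusting the old value so a
            # genuine sharp change (e.g. entering a tight curve) is accepted next.
            self.last = None
        return held
```

The reset after `max_holds` rejections is a design choice: without it, a real and lasting change in curvature larger than `max_jump` would be rejected forever.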

HBFS Car: convolutional network architecture

Around 5000 training examples were recorded over more than an hour of driving; this would certainly go much faster on something more capable than a GTX 1050-based laptop. These samples were then “cooked” into an HDF5 binary file for faster access during training, resulting in more than 6GB of raw data. Some of the techniques used here to improve performance:

  • Recorded data using AirSim's weather effects
  • Added multiple light sources to recreate even more reflections
  • Dataset samples are 144×144, allowing random crops to 128×128
  • Data augmentation through random flips and brightness and contrast changes, applied by extending Keras data generators
  • Input normalization was applied per RGB channel with a minimum variance threshold; without it, outlier samples with only one visible class and low variance (e.g. a track intersection) would have their camera noise amplified
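The augmentation and normalization steps in the list above can be sketched in NumPy as follows. This is an illustration, not the post's Keras generator code: the jitter ranges and the `min_std` floor are assumed values. Clamping the per-channel standard deviation from below is one way to implement the "minimum variance threshold", so that a nearly uniform frame is not blown up into amplified sensor noise.

```python
import numpy as np


def random_crop_flip(image, label, rng, crop=128):
    """Random crop (e.g. 128x128 out of a 144x144 sample) plus a random
    horizontal flip, applied identically to the image and its label map."""
    h, w = label.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    image = image[top:top + crop, left:left + crop]
    label = label[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        image, label = image[:, ::-1], label[:, ::-1]
    return image, label


def jitter_brightness_contrast(image, rng, max_brightness=0.2, max_contrast=0.2):
    """Random contrast scale around the per-channel mean plus a brightness
    offset, for a float image in [0, 1]; the result is clipped to [0, 1]."""
    contrast = 1.0 + rng.uniform(-max_contrast, max_contrast)
    brightness = rng.uniform(-max_brightness, max_brightness)
    mean = image.mean(axis=(0, 1), keepdims=True)
    return np.clip((image - mean) * contrast + mean + brightness, 0.0, 1.0)


def normalize_per_channel(image, min_std=0.05):
    """Zero-mean, unit-variance normalization per RGB channel, with the standard
    deviation clamped to at least min_std (a variance floor of min_std**2)."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True)
    return (image - mean) / np.maximum(std, min_std)
```

Flipping the label map together with the image keeps the pair consistent; if some classes were side-specific (left versus right markings), their ids would also need swapping after a flip.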

Once the final architecture was settled, training the model took approximately 44 minutes per 100 epochs with a batch size of 64.
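As a rough sanity check of those numbers, assuming one epoch is a single pass over the roughly 5000 recorded samples:

$$
\frac{44 \times 60\ \text{s}}{100\ \text{epochs}} = 26.4\ \text{s/epoch},\qquad
\left\lceil \frac{5000}{64} \right\rceil = 79\ \text{batches/epoch},\qquad
\frac{26.4\ \text{s}}{79} \approx 0.33\ \text{s/batch}
$$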


Training with plenty of disturbances was essential for our segmentation network to perform well in real-life scenarios. Being able to recreate these artifacts in simulation proved really helpful for both dataset generation and end-to-end algorithm development and testing. Inference ended up taking 10 to 12 ms on the embedded hardware, which is pretty neat considering that the only optimizations applied were freezing the graph and removing the training nodes.
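Latency figures like these are easy to reproduce with a small timing harness. A sketch, assuming `predict` is any callable that runs one forward pass (for example a wrapper around a session run on the frozen graph; that wrapper is hypothetical). Warm-up calls are discarded because the first runs usually include one-off initialisation and memory allocation.

```python
import statistics
import time


def benchmark_latency(predict, sample, warmup=10, runs=100):
    """Time single-sample inference; returns (median, p95) latency in milliseconds."""
    for _ in range(warmup):
        predict(sample)  # discarded: first calls include one-off setup costs
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    p95 = timings[min(len(timings) - 1, int(0.95 * len(timings)))]
    return statistics.median(timings), p95
```

Reporting the median together with a high percentile is more informative than a single mean on embedded hardware, where occasional slow frames matter for a control loop.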

The effort paid off: our team won first prize in the competition. As expected, there were situations where sunlight washed out the images, leaving only about 20% of the data usable. The model struggled, but I cannot blame it, since I could barely distinguish the track lines myself. Overall, these machine learning techniques reduced our development effort considerably and delivered great results under reasonable noise, all while we kept using the same potato cameras.

Electromobility (Continental) winning team: HBFS

In the weeks to come we will be working to publish most of the code and tools we have used to make this happen, so stay tuned and thanks for reading.