
How to deploy AI software to self driving cars

Authors :
Meenakshi Ravindran
Rod Burns
Nicolas Miller
Gordon Brown
Source :
IWOCL
Publication Year :
2019
Publisher :
ACM, 2019.

Abstract

The automotive industry is embracing new challenges to deliver self-driving cars, and this in turn requires increasingly complex hardware and software. Software developers are leveraging artificial intelligence, and in particular machine learning, to deliver the capabilities required for an autonomous vehicle to operate. This has driven automotive systems to become increasingly heterogeneous, offering multi-core processors and custom co-processors capable of executing the demanding algorithms required for artificial intelligence and machine learning. These new processors can be used to vastly speed up common operations used in AI (Artificial Intelligence) and machine learning algorithms.

The R-Car V3H system-on-chip (SoC) from the Renesas Autonomy platform for ADAS (Advanced Driver Assistance Systems) and automated driving supports Level 3 and above (as defined by SAE's automation level definitions). It follows the heterogeneous IP concept of the Renesas Autonomy platform, giving the developer the choice of high-performance computer vision at low power consumption, as well as the flexibility to implement the latest algorithms such as those used in machine learning.

By examining the architecture of the R-Car hardware we can understand how it differs from HPC and desktop heterogeneous systems, and how it can be mapped to the SYCL and OpenCL programming models. When both power consumption and performance are important, as is the case in the automotive industry, implementing OpenCL and SYCL on these hardware platforms requires a balanced approach. The memory capacity and layout must be used optimally to build a pipeline that provides the best throughput. The R-Car hardware provides DMA engines and on-chip memory that are used for efficient data transfer on the device, and we show how the layers of this memory hierarchy map efficiently onto the OpenCL memory model.

Beyond the programmable processors, the R-Car hardware also offers many fixed-function IP blocks, each performing a specific function such as convolution for deep neural networks or optical flow. The flexibility of OpenCL enables the development of built-in kernels so that developers can take advantage of these architectural features.

The OpenCL model enables extensive use of this heterogeneous hardware, including the fully programmable IP; efficient data transfer to on-chip device memory using DMA, exposed through an OpenCL extension; fixed-function IP blocks such as the CNN engine for high-throughput convolution operations, exposed through OpenCL built-in kernels; and device-triggered DMA using partial sub-groups. We examine the memory mapping used to achieve efficiency, together with software pipelining and parallelism. These hardware architectures include AI accelerator processors specifically designed for the next generation of vehicles. In particular, the processors are designed to tackle complex algorithms whilst limiting overall power consumption. Benchmarks will be presented to show how portable code can also deliver performance for developers using this hardware.

As well as enabling developers to choose OpenCL or SYCL, we will discuss how these standards enable additional high-level frameworks that can be used to target this hardware, including libraries for deep neural networks and linear algebra operations.
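
The abstract's description of exposing fixed-function IP blocks through OpenCL built-in kernels can be illustrated with a minimal host-side sketch. It uses only standard OpenCL 1.2 APIs (clGetDeviceInfo with CL_DEVICE_BUILT_IN_KERNELS, clCreateProgramWithBuiltInKernels); the kernel name "cnn_convolution" is a placeholder for illustration only, and the actual kernel names, argument lists and DMA extensions are specific to the Renesas R-Car OpenCL implementation described in the paper.

/* Minimal sketch of dispatching a fixed-function IP block through an
 * OpenCL built-in kernel. "cnn_convolution" is a placeholder name,
 * not the real kernel exposed by the R-Car driver. */
#include <CL/cl.h>
#include <stdio.h>

int main(void) {
    cl_platform_id platform;
    cl_device_id device;
    cl_int err;

    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ACCELERATOR, 1, &device, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    /* Query which built-in (fixed-function) kernels the device exposes. */
    char names[1024];
    clGetDeviceInfo(device, CL_DEVICE_BUILT_IN_KERNELS,
                    sizeof(names), names, NULL);
    printf("Built-in kernels: %s\n", names);

    /* Create a program from a built-in kernel rather than from
     * OpenCL C source; the hardware block behind it is fixed-function. */
    cl_program prog = clCreateProgramWithBuiltInKernels(
        ctx, 1, &device, "cnn_convolution", &err);
    cl_kernel k = clCreateKernel(prog, "cnn_convolution", &err);

    /* Arguments (input feature map, weights, output buffer) would be set
     * with clSetKernelArg and the kernel enqueued with
     * clEnqueueNDRangeKernel, exactly as for a regular OpenCL kernel. */

    clReleaseKernel(k);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}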

Details

Database :
OpenAIRE
Journal :
Proceedings of the International Workshop on OpenCL
Accession number :
edsair.doi...........e7427c9483455e71bddf83b5ca7f2824
Full Text :
https://doi.org/10.1145/3318170.3318184