How to Implement High-Performance, Power-Efficient Deep Learning on FPGAs
Field Programmable Gate Arrays (FPGAs) are a compelling choice for hardware acceleration at the edge, especially when adding new machine learning inference capabilities. Specialized neural networks called Convolutional Neural Networks (CNNs) are being deployed at the edge in embedded vision systems to perform tasks such as object detection, face and gesture recognition, and pose estimation. The FPGA architecture provides a unique set of features to satisfy the high computational complexity of CNNs along with sufficient memory access (via local and external memories) to realize them efficiently. The inherent programmability of FPGAs provides the flexibility to integrate and upgrade customized functions on a single device.
In this paper we describe how Microchip’s programmable hardware, together with the Core Deep Learning (CDL) framework from ASIC Design Services, enables a power-efficient imaging and video platform for embedded and edge computing applications. The techniques include quantization of the CNN to 8-bit integer precision, neural network optimization tailored to the underlying FPGA architecture, and use of the INT8 dot product mode of the Math block to deploy Microchip FPGAs efficiently for machine learning inference.
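To make the INT8 technique concrete, the sketch below shows symmetric 8-bit quantization and an integer dot product with wide accumulation, which is the arithmetic pattern that FPGA math blocks accelerate. This is an illustrative model only, not the CDL framework's actual implementation; the function names and the choice of symmetric per-tensor scaling are assumptions for the example.

```python
# Illustrative sketch of INT8 quantization and dot-product accumulation.
# Not Microchip/CDL code: function names and the symmetric per-tensor
# scaling scheme are assumptions chosen for clarity.

def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats into [-127, 127]."""
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / 127.0                      # one scale for the whole tensor
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dot(qa, qb):
    """Dot product of two INT8 vectors, accumulated in a wider integer --
    the pattern a Math block's INT8 dot product mode computes in hardware."""
    return sum(a * b for a, b in zip(qa, qb))

# Quantize two small vectors, take the integer dot product, and rescale
# the result back to floating point for comparison with the exact value.
a = [0.5, -1.0, 0.25]
b = [1.0, 0.5, -0.75]
qa, sa = quantize_int8(a)
qb, sb = quantize_int8(b)
approx = int8_dot(qa, qb) * sa * sb   # dequantize the int32-style accumulator
exact = sum(x * y for x, y in zip(a, b))
```

Keeping the multiplies at 8 bits while accumulating in a wider register is what lets each DSP-style math block process more operands per cycle than a full-precision implementation, which is the source of the throughput and power gains described above.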