High-performance implementation of ViT family of ML networks available for SoC designs
BURLINGAME, Calif.–(BUSINESS WIRE)–#Chimera–Quadric® today announced that its Chimera™ general purpose neural processing unit (GPNPU) processor intellectual property (IP) supports vision transformer (ViT) machine learning (ML) inference models. Most NPUs currently in production do not support these newer ViT models, making it impractical to run ViT inference on many existing edge AI system-on-chip (SoC) devices.
ViT models are the latest state-of-the-art ML models for image and vision processing in embedded systems. ViTs were first described in 2021 and now represent the cutting edge of inference algorithms in edge and device silicon. ViTs repeatedly interleave MAC-heavy operations (convolutions and dense layers) with DSP/CPU-centric code (normalization, SoftMax). The general-purpose architecture of the Chimera core family intermixes integer multiply-accumulate (MAC) hardware with a general-purpose 32-bit ALU, which enabled Quadric to rapidly port and optimize the ViT_B transformer model.
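The interleaving pattern can be seen in a minimal NumPy sketch of one ViT encoder block. This is an illustration only, not Quadric's implementation: single-head attention and a ReLU stand-in for GELU are simplifications, and the shapes follow the published ViT_B configuration (197 tokens, embedding dimension 768).

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # DSP/CPU-centric: per-token means, variances, divides, square roots
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    # DSP/CPU-centric: exponentials and divides
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def encoder_block(x, wq, wk, wv, wo, w1, w2):
    # MAC-heavy: projection matmuls, then a SoftMax in the middle of the block
    h = layer_norm(x)
    q, k, v = h @ wq, h @ wk, h @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    x = x + (attn @ v) @ wo
    # MAC-heavy again: two dense layers in the MLP (ReLU used in place of GELU)
    h = layer_norm(x)
    return x + np.maximum(h @ w1, 0.0) @ w2

tokens, dim, mlp = 197, 768, 3072   # ViT_B: 196 patches + class token
rng = np.random.default_rng(0)
x = rng.standard_normal((tokens, dim)).astype(np.float32)
ws = [rng.standard_normal(s).astype(np.float32) * 0.02
      for s in [(dim, dim)] * 4 + [(dim, mlp), (mlp, dim)]]
y = encoder_block(x, *ws)
print(y.shape)  # (197, 768)
```

ViT_B stacks twelve such blocks, so the matmul/SoftMax/LayerNorm alternation repeats throughout the network rather than appearing once at the edges.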
Quadric’s Chimera family of GPNPUs blends the ML performance characteristics of a neural processing accelerator with the full C++ programmability of a modern digital signal processor (DSP). Chimera GPNPUs provide one unified architecture for ML inference plus pre- and post-processing, greatly simplifying both SoC hardware design by the semiconductor developer today and software programming by application developers months and years later.
Existing NPUs Cannot Support Transformers
Most edge silicon solutions in the market today employ heterogeneous architectures that pair convolution NPU accelerators with traditional CPU and DSP cores. The majority of NPUs in silicon today were designed three or more years ago, when the ResNet family of convolution networks was state of the art and Vision Transformers had not yet taken the AI/ML world by storm. ResNet-type networks have at most one normalization layer at the beginning of a model and one SoftMax at the end, with a long chain of convolution operations making up the bulk of the compute. ResNet models therefore map very neatly onto convolution-optimized NPU accelerator cores in heterogeneous SoC architectures.
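For illustration, partitioning a model's operator list by which core runs each operator shows why a ResNet-style chain suits an accelerator hand-off. The operator names and `segments` helper below are hypothetical, not a Quadric or vendor API:

```python
from itertools import groupby

# Hypothetical split: ops the convolution NPU can run vs. ops pushed to DSP/CPU
NPU_OPS = {"conv", "dense", "pool", "add"}

def segments(ops):
    # Group consecutive operators into contiguous segments per target core;
    # every DSP-bound op in the middle splits the accelerator segment in two.
    return [(core, len(list(group)))
            for core, group in groupby(
                ops, key=lambda op: "NPU" if op in NPU_OPS else "DSP")]

resnet_like = ["norm"] + ["conv", "add"] * 8 + ["softmax"]
print(segments(resnet_like))
# One long NPU segment bracketed by a single DSP op at each end
```

A ViT-style operator list, by contrast, alternates cores every few operators, producing many short segments and a data hand-off at each boundary.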
The emergence of ViT networks broke the underlying assumptions of these NPUs. Mapping a ViT workload onto such a heterogeneous SoC would entail repeatedly moving data back and forth between the NPU and the DSP/CPU – 24 to 26 round-trip data transfers for the base-case ViT_B. The system power wasted on all those data transfers wipes out the matrix-compute efficiency gains from having the NPU in the first place.
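A back-of-envelope count lands in the same range, under my own assumption that each encoder block's two LayerNorms force one NPU-to-DSP round trip apiece while the attention SoftMax is folded into an adjacent hand-off:

```python
# Round-trip estimate for ViT_B on a heterogeneous SoC where the NPU runs
# matrix math and the DSP runs normalization and SoftMax. The per-block
# accounting here is an assumption for illustration, not Quadric's breakdown.
blocks = 12            # ViT_B encoder depth
trips_per_block = 2    # one round trip per LayerNorm in each block
base_trips = blocks * trips_per_block
print(base_trips)      # 24 - the low end of the quoted 24-26 range;
                       # the final LayerNorm and classifier add one or two more
```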
Quadric Planned for the Future
“Back in 2018 when we conceived the Chimera GPNPU architecture, we knew the rapidly evolving nature of machine learning meant we had to build processor IP that was both matrix-performance optimized and general purpose,” stated Veerbhan Kheterpal, CEO at Quadric. “We knew something would be invented that pushed the ResNet family of networks into the history bin, but we did not know that transformers would be the current winner. And we do not know what state of the art will look like in 2027, but we do know that Chimera GPNPU licensees will be ready to tackle that next challenge.”
Quadric DevStudio Allows Rapid Exploration of ViT Performance
Quadric’s online DevStudio hosts an example ViT_B implementation that prospective licensees and application developers can examine. Quadric DevStudio speeds software development with the industry’s first integrated ML-plus-DSP development system.
Quadric DevStudio is available today for a limited set of beta users. Visit studio.quadric.io to request access. For more information and details on the Chimera architecture visit the www.quadric.io website.
Quadric Inc. is the leading licensor of general-purpose neural processor IP (GPNPU) that runs both machine learning inference workloads and classic DSP and control algorithms. Quadric’s unified hardware and software architecture is optimized for on-device ML inference. Learn more at www.quadric.io.