DARPA asks industry for SWaP-optimized machine learning real-time ASICs able to learn from data


U.S. military researchers are asking industry to develop real-time machine learning hardware able to interpret and learn from data, solve unfamiliar problems using what it has learned, and operate at power levels on par with or better than those of the human brain.



Officials of the U.S. Defense Advanced Research Projects Agency (DARPA) in Arlington, Va., have released a broad agency announcement (HR001119S0037) for the Real Time Machine Learning (RTML) project, which will develop machine-learning hardware generators and circuit architectures.

Driven by rapidly evolving challenges from U.S. adversaries, future defense systems will need access to low size, weight, and power (SWaP) artificial intelligence (AI) solutions that can move rapidly from idea to practice.

Still, today's machine learning systems generally are trained prior to deployment and cannot adapt to new datasets in the field, which limits real-time performance. The RTML program seeks to develop algorithms and circuits from the ground up for real-time machine learning.
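The contrast the program draws is between models frozen at deployment and models that keep updating as data arrives. A minimal sketch of the latter idea, purely illustrative and not drawn from the DARPA announcement, is a linear model adjusted one streamed sample at a time with stochastic gradient descent:

```python
def predict(w, b, x):
    """Linear prediction y_hat = w*x + b."""
    return w * x + b

def online_update(w, b, x, y, lr=0.05):
    """One SGD step on squared error for a single streamed sample,
    so the model adapts continuously instead of being fixed at deployment."""
    err = predict(w, b, x) - y
    w -= lr * err * x
    b -= lr * err
    return w, b

# Simulated data stream: samples arrive one at a time from y = 2x + 1.
w, b = 0.0, 0.0
for t in range(2000):
    x = (t % 10) / 10.0
    y = 2.0 * x + 1.0
    w, b = online_update(w, b, x, y)
```

After consuming the stream, the parameters drift toward the underlying relationship without any separate offline training phase; real hardware for this style of learning must perform such updates within strict latency and power budgets.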


The project seeks to develop energy-efficient hardware and machine learning architectures that can learn from a continuous stream of new data in real time. It will create no-human-in-the-loop hardware generators and compilers to enable automated creation of machine learning application-specific integrated circuits (ASICs) from high level source code.

DARPA researchers especially are interested in architectures like conventional feed forward (convolutional) neural networks; recurrent networks and their specialized versions; neuroscience-inspired architectures, such as spike time-dependent neural nets including their stochastic counterparts; non-neural machine learning architectures inspired by psychophysics as well as statistical techniques; classical supervised learning (e.g., regression and decision trees); unsupervised learning (e.g., clustering) approaches; semi-supervised learning methods; generative adversarial learning techniques; and other approaches such as transfer learning, reinforcement learning, manifold learning, and/or life-long learning.
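Among the neuroscience-inspired options named above, spiking models are the furthest from conventional neural networks. As a rough illustration only (the announcement does not prescribe any particular model), a leaky integrate-and-fire neuron accumulates input current and emits a discrete spike when its membrane voltage crosses a threshold, which is why such architectures map naturally to event-driven, low-power hardware:

```python
def lif_step(v, i_in, v_rest=0.0, v_thresh=1.0, tau=20.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron.
    Voltage leaks toward v_rest, is driven by input current i_in,
    and resets after crossing the firing threshold.
    Returns (new_voltage, spiked)."""
    v = v + (dt / tau) * (v_rest - v + i_in)
    if v >= v_thresh:
        return v_rest, True   # emit a spike and reset
    return v, False

# Drive the neuron with a constant suprathreshold current and count spikes.
v, spikes = 0.0, 0
for _ in range(200):
    v, fired = lif_step(v, i_in=1.5)
    spikes += fired
```

Because computation happens only when spikes occur, hardware built around this primitive can idle between events, which is one reason spiking architectures are attractive for the SWaP targets the program describes.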

It is likely that ultra-specialized ASICs will be necessary to meet the physical SWaP requirements of autonomous systems with real-time response and low learning latency requirements. Unfortunately, the high cost of design and implementation today has made development of machine learning ASICs impractical for all but the highest-volume applications.


Complex machine learning processor chips take months or years to design, and require a large team of experts with knowledge in machine learning, low-level micro-architectures, and physical chip design. The complexity challenge of modern ASIC design is under investigation in DARPA’s Intelligent Design of Electronic Assets (IDEA), Posh Open Source Hardware (POSH), and Circuit Realization at Faster Timescales (CRAFT) programs.

The DARPA RTML program will capitalize on these approaches by creating no-human-in-the-loop hardware generators optimized for machine learning, enabling automated generation of machine-learning ASICs directly from high-level source code. The RTML program is split into two 18-month research phases: machine learning hardware compiler; and real-time machine learning systems.

The first segment will create automated hardware compilers for state-of-the-art machine learning algorithms and networks using existing machine learning programming frameworks as inputs. The goal is to demonstrate a compiler capable of auto-generating a large catalog of scalable machine learning hardware instances ranked by performance, size, weight, area, power, throughput, and latency.
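Ranking a catalog of generated hardware instances across several metrics usually means no single design wins on everything, so a natural output is the set of non-dominated (Pareto-optimal) trade-offs. The sketch below is hypothetical, with made-up instance names and numbers, showing only power and latency of the several metrics the program lists:

```python
# Hypothetical catalog of auto-generated hardware instances; every name
# and metric value here is invented for illustration.
catalog = [
    {"name": "cnn_small", "power_mw": 120, "latency_us": 40},
    {"name": "cnn_wide",  "power_mw": 400, "latency_us": 15},
    {"name": "rnn_tiny",  "power_mw": 90,  "latency_us": 80},
    {"name": "rnn_fast",  "power_mw": 300, "latency_us": 80},
]

def dominates(a, b):
    """a dominates b if it is no worse on every metric
    and strictly better on at least one (lower is better)."""
    no_worse = (a["power_mw"] <= b["power_mw"]
                and a["latency_us"] <= b["latency_us"])
    strictly_better = (a["power_mw"] < b["power_mw"]
                       or a["latency_us"] < b["latency_us"])
    return no_worse and strictly_better

# Keep only instances not dominated by any other: the Pareto frontier
# a compiler could present, ranked, to a system designer.
pareto = [c for c in catalog if not any(dominates(o, c) for o in catalog)]
```

Here `rnn_fast` drops out because `rnn_tiny` matches its latency at lower power, while the remaining three each represent a distinct power/latency trade-off a designer might select against system requirements.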

The second segment will incorporate state-of-the-art machine learning advances while adding compiler support for hardware optimization driven by system requirements.


The RTML program seeks to answer these questions: can we build an application-specific silicon compiler for machine learning; what hardware architectures are best suited to RTML; what are the lower latency limits for various RTML tasks; and what is the lowest SWaP feasible for various RTML tasks?

Unwanted are research efforts that do not result in deliverable hardware; circuits that cannot be produced in standard CMOS foundries; new domain-specific languages; new approaches to physical layout; and incremental efforts.