More Synthetic Data Needed to Improve AI



Developers of artificial intelligence platforms have promised that such systems will revolutionize the future of warfare, but experts and analysts said that more synthetic data is needed for that promise to be realized.

Pedro Rodriguez, a senior research scientist at the Johns Hopkins University Applied Physics Laboratory, said artificial intelligence and machine learning are being used extensively to analyze full-motion video, but the quality of the data is not as high as many would expect. Oftentimes the footage consists of low-resolution imagery and is degraded by obstructions such as clouds.

“There is clearly a role for simulated data to be played in this environment,” he said during a panel discussion at a defense and security conference hosted by the Association of Unmanned Vehicle Systems International in National Harbor, Maryland.

Gregory Allen, an adjunct fellow at the Center for a New American Security’s Technology and National Security Program, said artificial intelligence platforms will need large amounts of synthetic data in order to improve their functionality.

“The best machine learning and AI algorithms that we have today — the ones with the most amazing superhuman performance — are all incredibly data hungry,” he said. “They require large data sets upon which to train the algorithm in order to have a high-performance system that you can deliver.”

Artificial intelligence systems can only function based on what they know and have been exposed to, he noted.

“If you train your AI system to analyze drone video data at one altitude and one level of cloudiness, then it is going to have very high performance at that altitude and that level of cloudiness,” he said.

“But if that happens to be different from the operational environment you encounter on the field, then the performance is going to fall off a cliff.”

Synthetic data is needed to expose a system to the variety of environments it will be expected to encounter in the field, he said.

An artificial intelligence system can generate its own synthetic data, Allen noted.

For example, AlphaGo — an AI system developed by Google DeepMind — was able to beat the world’s Go champion, he said. Go is a two-player board game.

“A large portion of the data set upon which that system was trained was generated through self-play,” he said. “The system was playing the Go game against itself and then feeding that into the total library data corpus of games upon which it was developing its AI algorithm.” A corpus is a collection of information.
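To make the idea concrete, here is a minimal sketch of a self-play loop. It is illustrative only and not DeepMind's code: a uniform-random policy plays tic-tac-toe against itself, and every finished game is appended to a growing corpus of the kind Allen describes.

```python
# Minimal self-play sketch (illustrative, not AlphaGo's actual training code).
# A random policy plays tic-tac-toe against itself; every finished game is
# added to a growing corpus that a learning algorithm could later train on.
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if a player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play_game():
    """Play one game with a uniform-random policy and record (state, move) pairs."""
    board, player, history = ["."] * 9, "X", []
    while winner(board) is None and "." in board:
        move = random.choice([i for i, s in enumerate(board) if s == "."])
        history.append(("".join(board), move))
        board[move] = player
        player = "O" if player == "X" else "X"
    return {"moves": history, "result": winner(board) or "draw"}

# Grow the corpus: each completed game is fed back into the library of games,
# as Allen describes for AlphaGo's self-play data.
corpus = [self_play_game() for _ in range(1000)]
print(len(corpus), "games;", sum(g["result"] == "draw" for g in corpus), "draws")
```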

"AI does not lend itself naturally to the Pentagon’s development and procurement cycle."

That type of self-play is applicable for the warfare environment as well, Allen said. By using drone footage “you can develop an algorithm to develop modifications to that data set — say increasing the cloudiness, increasing the altitude and then you can analyze that data and add that to the total library,” he said. “If you do this, you can increase the performance of the overall system.”
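As a rough illustration of the kind of modification Allen mentions, the sketch below synthesizes a "cloudier" variant and a "higher-altitude" variant of a frame and adds both to the training library. The function names and the stand-in random image are assumptions made for the example; a production pipeline would rely on physically based simulation or established augmentation tools rather than these toy transforms.

```python
# Illustrative augmentation sketch: derive synthetic variants of a frame
# (more cloud cover, higher apparent altitude) to enlarge the training library.
import numpy as np

def add_cloudiness(frame: np.ndarray, opacity: float = 0.4) -> np.ndarray:
    """Blend a smooth, bright haze layer over the frame to mimic cloud cover."""
    rng = np.random.default_rng(0)
    coarse = rng.random((frame.shape[0] // 8 + 1, frame.shape[1] // 8 + 1))
    haze = np.kron(coarse, np.ones((8, 8)))[: frame.shape[0], : frame.shape[1]]
    haze = (0.7 + 0.3 * haze) * 255.0               # slowly varying bright layer
    return ((1 - opacity) * frame + opacity * haze[..., None]).astype(np.uint8)

def simulate_higher_altitude(frame: np.ndarray, factor: int = 4) -> np.ndarray:
    """Downsample and re-expand the frame to mimic the loss of ground detail."""
    small = frame[::factor, ::factor]
    grown = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return grown[: frame.shape[0], : frame.shape[1]]

# Each synthetic variant is added back to the library alongside the original frame.
original = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)  # stand-in frame
augmented_library = [original, add_cloudiness(original), simulate_higher_altitude(original)]
```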

Jaz Banga, co-founder and CEO of Airspace Systems Inc., a Silicon Valley startup that develops counter-drone systems, said there are still obstacles to developing those data sets.

“One of the challenges going forward is going to be providing those data sets to the folks that are building … improvements to the algorithms,” he said. “There’s a loop that has to develop there and that’s not there now.”

Industry should come up with an artificial intelligence strategy that puts a primary emphasis on data, Allen said. It would also be beneficial for companies to consider carving out a niche area within the technology.

“If you’re thinking about how to develop your own AI strategy … think about a data set where you could develop a real competitive advantage in analyzing and understanding that data set and translate it into AI systems,” Allen said.

For now, only certain aspects of artificial intelligence can be deployed in warfare, Allen noted. Silicon Valley companies are already using the technology for logistics operations.

“When it comes to something like that, I think there’s relatively low stakes in deploying our AI system right away,” he said. “When it comes to something closer to the actual kill decision, the stakes are much higher and we’re going to need more confidence.”

However, AI does not lend itself naturally to the Pentagon’s development and procurement cycle, he added.

“If development stops when operational deployment starts — which is traditional in DoD procurement — then you’re probably going to have a pretty bad, low-quality system,” Allen said. “If you’re not iterating the development of that system based on what you are learning on the field in real time, then you’re probably going to be obsolete before you even deploy.”

Rodriguez said there are low-risk ways to deploy artificial intelligence and deep learning systems right now. Instead of integrating AI into individual platforms, industry could develop small, lightweight devices that attach to a platform and perform the analysis tasks. That way, a company wouldn’t have to remanufacture its entire deployed system, he noted.

Dorothy Engelhardt, director of programs for unmanned systems at the office of the deputy assistant secretary of the Navy, said there is substantial investment globally in artificial intelligence, including in Russia, China and India.

The U.S. government needs industry to help it utilize artificial intelligence more effectively, she said. Pentagon weapon platforms employ thousands of sensors that could offer companies an abundance of data to sift through, she added.
