Google Public Sector and NASA are training AI models to understand the speech, context and instructions needed to get airplanes from the runway to their gates as efficiently and safely as possible.

Today's airport surface management process revolves around quirky acronyms, aviation vocabulary and human voice traffic that's far from perfect. If NASA Aeronautics Research Institute's (NARI) partnership with Google Public Sector pans out, that process could be augmented with data, speech-to-text instructions and automation.

NARI is focused on cutting-edge aeronautics research and operational strategies. NARI connects industry, government, and academia to NASA with a focus on autonomous, high-speed, and electric aircraft. "We're the bridge between NASA researchers in aeronautics and the external community, which can be the FAA, other agencies, universities and industry," said Dr. Krishna Kalyanam, Deputy Director, NARI. "NARI was also set up to seed foundational early-stage research that may pan out and turn into a larger project funded by the government."

NARI's priorities include Advanced Air Mobility (AAM), Wildland Fire Initiatives, Shaping Tomorrow's Aviation Systems and providing a collaborative infrastructure for partners to work with NASA.

We caught up with Dr. Kalyanam at the Google Public Sector Summit in Washington, DC to talk about the research project.

The project. Dr. Kalyanam said the goal of the research project is to leverage speech-to-text models in ground traffic control when planes land and taxi on the tarmac. If the process of getting planes across the airport surface to the gate can be optimized, airlines can improve both safety and their cost structure. The research looked into whether voice content over the radio can be turned into taxi instructions with 100% accuracy so they can be absorbed by automation and provide another layer of instructions for pilots.

"You land and you have to get off the concrete. You don't want aircraft on the runway and the sooner you can get an aircraft to its destination, the more planes you can get on the runway," said Dr. Kalyanam. "As soon as you land, you're getting instructions to the gate assigned to you by your dispatcher. All instructions are provided by the ground controller. It's 'take this route. Turn here. And here.'"

Challenges. The biggest challenge with the project, according to Dr. Kalyanam, was that voice traffic between pilots and the control tower has its own vocabulary and often suffers from poor radio quality.

"Say you're running into some bad weather and need instructions to the gate. Today that's full end-to-end speech. The information could be augmented by text, visual and other inputs to go along with voice that can be converted to a route that's communicated digitally," said Dr. Kalyanam. "Once digital it can be displayed on a map or directly ingested into route planning."

Another challenge is that instructions to pilots use a unique vocabulary, including terms like "Roger" and "Wilco," where humans can easily fill in gaps when interpreting what they hear. Models need to be trained on over-the-air voice traffic to pick up this vocabulary.


The goal. By digitizing the voice traffic over radio, directions can be given via moving maps, text, and color codes. That data can also be used to optimize routes and improve efficiency. "Once you digitize the information you have all this information in one place that can be optimized," said Dr. Kalyanam. "There are 100 tasks that are needed between the time the plane lands, people get off and the plane is ready to take off again."
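To make the idea of digitized instructions concrete, here is a minimal sketch of turning a transcribed taxi instruction into structured data that could drive a moving map or route planner. The phrasing it accepts and the field names are illustrative assumptions, not the project's actual schema.

```python
import re

def parse_taxi_instruction(text: str) -> dict:
    """Parse a simplified ground-control taxi instruction into a
    structured route. The instruction format handled here is a
    hypothetical simplification for illustration."""
    text = text.lower()
    route = {"taxiways": [], "hold_short": None}
    # Capture the taxiway sequence after "taxi via ..."
    via = re.search(r"taxi via ([a-z ,]+?)(?:,? hold short|$)", text)
    if via:
        route["taxiways"] = [t.strip() for t in via.group(1).split(",") if t.strip()]
    # Capture a hold-short point, e.g. "hold short runway 27"
    hold = re.search(r"hold short (?:of )?(runway \d+[lrc]?)", text)
    if hold:
        route["hold_short"] = hold.group(1)
    return route

print(parse_taxi_instruction("Taxi via alpha, charlie, hold short runway 27"))
# -> {'taxiways': ['alpha', 'charlie'], 'hold_short': 'runway 27'}
```

Once an instruction is in this form, it can be rendered as a highlighted path on a map, cross-checked against the airport layout, or fed into route optimization.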

Dr. Kalyanam said this research could also apply to autonomous aircraft and refueling. "The traditional processes are mostly human-centric," he said. "Some of these things can be automated, but at the least you can make it easier for humans to perform tasks.”

He added that the motivation of the research is to provide a secondary source of information for the pilots.

Training models. Google Public Sector used multiple models for training, with a minimum data set of 10 hours of voice instructions. Google's base models were already trained on general English conversations but had to be customized for the vocabulary and use case. The models would pick up voice instructions, transcribe them according to ground control's acronyms and vocabulary, and create digitized instructions.
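One common way to bias a general-English recognizer toward a domain is to boost transcription candidates that contain known vocabulary, similar in spirit to the "phrase hint" adaptation that commercial speech APIs offer. The sketch below is a hypothetical rescoring pass over candidate transcripts; the vocabulary list and scoring scheme are invented for illustration.

```python
# Phrases a ground-control transcript is likely to contain.
# This list is illustrative, not the project's actual vocabulary.
GROUND_VOCAB = {"wilco", "roger", "hold short", "taxi via", "runway"}

def rescore(candidates: list[tuple[str, float]], boost: float = 0.1) -> str:
    """Pick the candidate whose base acoustic score plus a per-phrase
    vocabulary boost is highest."""
    def score(item):
        text, base = item
        hits = sum(1 for phrase in GROUND_VOCAB if phrase in text.lower())
        return base + boost * hits
    return max(candidates, key=score)[0]

best = rescore([
    ("taxi fire alpha", 0.52),  # acoustically plausible, wrong domain
    ("taxi via alpha",  0.48),  # slightly lower acoustic score
])
print(best)  # -> taxi via alpha
```

The domain-aware candidate wins despite a lower raw acoustic score, which is the behavior a vocabulary-customized model is after.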

"It's almost like learning a new language," said Dr. Kalyanam. "There are words that we will never use in English because they mean completely different things. You need to get the right context. If you hear “Dealt” it most probably means “Delta” with the ‘-ah’ sound clipped. Sometimes you can’t hear parts of what is being said. You're training models to be as perfect as they can be in an imperfect environment."

Google Public Sector and NASA worked with retired controllers to verify the ground truth as well as the voice commands and how the models performed. "The goal was to capture the taxi instruction with 100% accuracy," said Dr. Kalyanam. "There may be a conversation, but you want to be able to know that the pilot can't turn left from Charlie to Lima and then use that information. There's local knowledge about the airport layout that can be used to fix errors.”
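The "can't turn left from Charlie to Lima" check can be pictured as validating a parsed route against a graph of which taxiways physically connect. The toy graph below is invented, not a real airport layout.

```python
# Adjacency of taxiways at a hypothetical airport: an edge means the
# two taxiways physically connect. Entirely made up for the example.
TAXIWAY_GRAPH = {
    "charlie": {"alpha", "bravo"},  # Charlie does NOT connect to Lima
    "alpha":   {"charlie", "lima"},
    "bravo":   {"charlie"},
    "lima":    {"alpha"},
}

def route_is_feasible(route: list[str]) -> bool:
    """Every consecutive pair of taxiways in the route must be
    connected in the airport graph."""
    return all(b in TAXIWAY_GRAPH.get(a, set())
               for a, b in zip(route, route[1:]))

print(route_is_feasible(["charlie", "alpha", "lima"]))  # -> True
print(route_is_feasible(["charlie", "lima"]))           # -> False
```

A transcription error that produces an impossible route fails this check immediately, which is one way local airport knowledge can catch and fix recognition mistakes.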

Complicating the training effort is the reality that every airport is different, and the models will need to know local context—say, the differences between airports in Dallas and Tampa. It's possible that models will need more fine-tuning based on the location of the airport. This fine-tuning will be even more important if the research is applied to international airports.
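One lightweight way to structure that per-airport customization is to keep a shared base vocabulary and layer local terms on top per field. The airport codes and phrases below are made up for the example; the actual adaptation mechanism would live inside the speech models.

```python
# Shared ground-control vocabulary used at every airport.
BASE_VOCAB = {"roger", "wilco", "hold short"}

# Local additions per airport; entries and phrases are illustrative.
AIRPORT_VOCAB = {
    "KDFW": {"spine road", "taxiway el"},   # Dallas/Fort Worth (example)
    "KTPA": {"taxiway whiskey", "bridge"},  # Tampa (example)
}

def vocab_for(airport: str) -> set[str]:
    """Merge the shared base vocabulary with local terms for one
    airport; unknown airports fall back to the base set."""
    return BASE_VOCAB | AIRPORT_VOCAB.get(airport, set())
```

The same pattern extends to international fields, where local phraseology and accents would add further entries per airport.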

Working with Google Public Sector. Dr. Kalyanam said partnering with Google Public Sector made sense given Google's experience in AI, speech-to-text use cases and mapping. "We had some internal stuff we developed over the years, but the speech-to-text expertise was with Google," said Dr. Kalyanam. "Google has the models and has done the research."

Dr. Kalyanam added that Google Public Sector also had the engineers available to test multiple models and configurations for the audio. "Not one single model works best," said Dr. Kalyanam. "It took a lot of experimentation. This is custom engineering work. It's a good partnership since we don't have access to what's inside the box, but we can provide feedback so Google can build something. We also have retired controllers and pilots for model validation."

Metrics. Although it's early in the research process, Dr. Kalyanam said time saved and reduced mishaps will be core metrics. "If you end up in the wrong place it's a lot of time wasted because aircraft normally do not go in reverse," he said. "There's a lot of opportunity with digitized data. If you didn't make a turn, automation can alert you and give you new instructions. I think this process can be made simpler and hopefully less prone to error."

What's next? NASA and Google Public Sector are looking to publish their research and work with the FAA and the aviation industry. "There's a lot of interest in this research," said Dr. Kalyanam. "This is exploratory research, so we are ready to accept some failures. We are trying to prove this concept and maybe we'll simulate it in one airport and see how it adapts. We do the research, crunch the numbers and work with the FAA and industry to mature the technology for use."