Voice Recognition to Support Assessment of Cross Platform Situational Awareness and Decision Making
Navy SBIR 20.2 - Topic N202-098
Naval Air Systems Command (NAVAIR) - Ms. Donna Attick email@example.com
Opens: June 3, 2020 - Closes: July 2, 2020 (12:00 pm ET)
N202-098 TITLE: Voice Recognition to Support Assessment of Cross Platform Situational Awareness and Decision Making
RT&L FOCUS AREA(S): General Warfighting Requirements (GWR)
TECHNOLOGY AREA(S): Air Platform, Battlespace, Human Systems
OBJECTIVE: Develop a voice recognition capability that can support analysis and debrief of Carrier Strike Group-level decision-making and Situational Awareness (SA).
DESCRIPTION: Complex, highly coordinated, System-of-Systems Air Defense missions and tactics depend on cross-platform communications. The complexity of coordinating integrated tactics requires a substantial amount of voice communication across platforms to provide SA and elicit decisions. Communication is critical to cross-platform coordination and overall tactic execution, yet it remains one of the most challenging training objectives to meet during Air Defense events. Specifically, it is challenging to recognize when a call or request for communication has been made (i.e., at a specific point in the timeline), to ensure that communications are timely (i.e., time to respond to a request or to provide required information based on an environmental cue), and to use the appropriate brevity terms and standard communications protocols. Timely, diagnostic feedback specific to cross-platform communications is therefore critical. Current practice for assessing communications and overall performance relies solely on qualitative instructor assessment, in large part because the assessor must understand both what is being said and the context in which it is said. This practice is prone to human error and is costly in the manpower and time required to meet training demands. In addition, debriefs can take thirty to ninety minutes to prepare, during which learning points may be lost. Consequently, voice communication feedback that is reliable (i.e., consistently and accurately captures what was said), timely (i.e., data can be synthesized and used for debrief within thirty minutes or less), and diagnostic (i.e., data provided allows instructors to correlate voice communication with tactical execution to provide relevant feedback based on environmental context), and that can be standardized across platforms, is important.
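The timeliness criterion above (time to respond to a request) can be made concrete with a small calculation. The Python sketch below is illustrative only: the `Utterance` fields, the callsigns, the time base, and the rule that a reply is the first later utterance sent by the addressee back to the requester are all assumptions, not an NGTS or ART definition, and a fielded tool would still have to decide which utterances count as requests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Utterance:
    start: float    # seconds from start of the training event (assumed time base)
    end: float
    sender: str     # hypothetical callsigns, e.g. "ALPHA"
    receiver: str
    text: str


def response_latency(request: Utterance, log: List[Utterance]) -> Optional[float]:
    """Seconds from the end of `request` to the start of the first later
    utterance sent by the request's receiver back to its sender.
    Returns None if no such reply exists in the log (assumed reply rule)."""
    for u in sorted(log, key=lambda x: x.start):
        if (u.start >= request.end
                and u.sender == request.receiver
                and u.receiver == request.sender):
            return u.start - request.end
    return None


def is_late(request: Utterance, log: List[Utterance], threshold_s: float) -> bool:
    """Flag a request whose reply is missing or slower than threshold_s seconds."""
    latency = response_latency(request, log)
    return latency is None or latency > threshold_s


if __name__ == "__main__":
    log = [
        Utterance(0.0, 2.0, "ALPHA", "BRAVO", "request status"),
        Utterance(5.5, 7.0, "BRAVO", "ALPHA", "status follows"),
    ]
    print(response_latency(log[0], log))          # 3.5
    print(is_late(log[0], log, threshold_s=3.0))  # True
```

Whether a missing reply counts as late, and what threshold applies, are training-policy decisions; this sketch simply treats a missing reply as late.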
The effort should begin with a proof-of-concept that demonstrates the ability to create logs, together with a plan for prototype development and implementation, and should evolve into a demonstrable capability integrated into the Next Generation Threat System’s (NGTS) Analysis and Reporting Tool (ART). To integrate with ART, a voice tool would have to provide a parse-able “utterance log” in a compatible format (e.g., JSON, XML, HDF5) that logs individual utterances with metadata (e.g., start/end time, sender/receiver, text transcript).
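As an illustration of the kind of parse-able utterance log described above, the Python sketch below writes and reads a JSON log using only the standard library. The field names, the top-level `utterances` key, and the time base (seconds from event start) are assumptions for illustration; an actual log would have to follow whatever schema ART accepts.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class UtteranceRecord:
    start_time: float   # seconds from event start (assumed time base)
    end_time: float
    sender: str
    receiver: str
    transcript: str     # STT output for this utterance

    def __post_init__(self) -> None:
        # Reject records whose timing is internally inconsistent.
        if self.end_time < self.start_time:
            raise ValueError("end_time precedes start_time")


def write_log(records: List[UtteranceRecord]) -> str:
    """Serialize records to a JSON document with a top-level 'utterances' list."""
    return json.dumps({"utterances": [asdict(r) for r in records]}, indent=2)


def read_log(text: str) -> List[UtteranceRecord]:
    """Parse a JSON document produced by write_log back into records."""
    data = json.loads(text)
    return [UtteranceRecord(**item) for item in data["utterances"]]


if __name__ == "__main__":
    records = [UtteranceRecord(12.4, 14.1, "ALPHA", "BRAVO", "example transcript")]
    print(write_log(records))
```

Keeping one record per utterance, with timing and sender/receiver alongside the transcript, is what lets downstream tools join the log against other time-stamped data.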
The development of an innovative speech recognition tool for cross-platform SA and decision-making will benefit the Fleet by significantly decreasing instructor workload, reducing human error and manpower time requirements, and automatically providing instructors with information on communication protocol adherence and timeliness to improve SA and enhance debriefing capabilities. The tool should analyze virtual, and eventually live, training events, using speech-to-text (STT) technologies and natural language processing (NLP) to automatically verify the semantic content of utterances associated with relevant tactical communications. It should provide a parse-able “utterance log” of these utterances, including start/end time, sender/receiver, and text that accurately captures the voice communication, so that the communications data can be linked to objectively captured contextual cues within the tactical environment (e.g., threat location). Applying this technology to integrated Air Defense training will make assessments more robust and accurate. The tool will allow for natural, free-flowing interactions between platforms while still supporting speech recognition and understanding among groups within context. Additionally, the tool should be designed and developed to include debrief visualizations that support diagnosis and feedback of voice communication tied to context in the tactical environment at the time of the communication. Visualizations should also account for timeliness and accuracy. The tool should be easy to use, as determined by documented usability and technology evaluations.
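A very simple baseline for checking protocol adherence is to test whether a transcript contains the terms expected for a given call. The Python sketch below performs only whole-word, case-insensitive keyword matching; it is a stand-in for, not an example of, the NLP-based semantic verification the tool calls for, and the term lists passed to it are hypothetical placeholders rather than actual brevity terms.

```python
import re
from typing import List


def missing_terms(transcript: str, required_terms: List[str]) -> List[str]:
    """Return the required terms that do not appear in the transcript as
    whole words or phrases (case-insensitive keyword match only)."""
    missing = []
    for term in required_terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE) is None:
            missing.append(term)
    return missing


def adherence_score(transcript: str, required_terms: List[str]) -> float:
    """Fraction of required terms present; 1.0 when nothing is required."""
    if not required_terms:
        return 1.0
    found = len(required_terms) - len(missing_terms(transcript, required_terms))
    return found / len(required_terms)


if __name__ == "__main__":
    # Placeholder terms for illustration, not an actual brevity list.
    print(missing_terms("Alpha two, contact north", ["contact", "bearing"]))
    print(adherence_score("Alpha two, contact north", ["contact", "bearing"]))
```

Keyword matching cannot tell a correct call from one that merely contains the right words, which is why the topic asks for semantic verification; terms that begin or end with punctuation would also need a different boundary rule than `\b`.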
Work produced in Phase II may become classified. Note: The prospective contractor(s) must be U.S.-owned and operated with no foreign influence as defined by DoD 5220.22-M, National Industrial Security Program Operating Manual, unless acceptable mitigating procedures can be and have been implemented and approved by the Defense Counterintelligence and Security Agency (DCSA). The selected contractor and/or subcontractor must be able to acquire and maintain a Secret-level facility clearance and Personnel Security Clearances. These clearances will give contractor personnel access to classified information pertaining to the national defense of the United States and its allies, as needed to perform on the advanced phases of this project as set forth by DCSA and NAVAIR; this will be an inherent requirement. The selected company will be required to safeguard classified material IAW DoD 5220.22-M during the advanced phases of this contract.
PHASE I: Define and develop a concept for a standalone voice assessment capability for a single Air Defense platform. Demonstrate the feasibility of applying it within the larger, integrated training system. The concept should include a plan for integration into the NGTS ART that allows voice feedback/assessment to be aligned with unclassified performance data from NGTS Ch.10 log files (to be provided as GFI), and should include assessment visualizations to support diagnosis and feedback. The Phase I effort will include prototype plans to be developed under Phase II.
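Aligning voice data with performance data amounts, at its simplest, to placing each utterance on the same timeline as the recorded events. The Python sketch below assumes the events have already been decoded from the Ch.10 files into (time, label) pairs on a clock shared with the utterance log (it does not read Ch.10 data itself), and attaches to each utterance the most recent preceding event, optionally within a maximum gap. The function name, the `max_gap_s` parameter, and the "latest preceding event" matching rule are assumptions for illustration.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def align_to_events(
    utterance_starts: List[float],
    events: List[Tuple[float, str]],
    max_gap_s: float = float("inf"),
) -> List[Optional[str]]:
    """For each utterance start time, return the label of the latest event
    at or before it, provided that event occurred no more than max_gap_s
    earlier; otherwise None. All times must share one clock (assumed seconds)."""
    ordered = sorted(events)
    times = [t for t, _ in ordered]
    aligned: List[Optional[str]] = []
    for start in utterance_starts:
        i = bisect_right(times, start) - 1   # index of last event with time <= start
        if i >= 0 and start - times[i] <= max_gap_s:
            aligned.append(ordered[i][1])
        else:
            aligned.append(None)
    return aligned


if __name__ == "__main__":
    # Hypothetical event labels; real ones would come from decoded performance data.
    events = [(10.0, "threat detected"), (25.0, "weapon release")]
    print(align_to_events([5.0, 12.0, 30.0], events, max_gap_s=3.0))
```

Matching to the preceding event reflects the idea that a call is often a reaction to a cue that has already occurred; other rules, such as nearest event or events within a window after the utterance, may suit other analyses.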
PHASE II: Develop and demonstrate a prototype voice assessment capability for a single Air Defense platform through execution of the integration plan developed in Phase I. Integration with NGTS ART will enhance the capability by aligning voice feedback with performance data already collected by the ART. Design and develop the prototype to include visualizations, usability documentation, and technology evaluation.
Work in Phase II may become classified. Please see note in the Description section.
PHASE III DUAL USE APPLICATIONS: Extend functionality to multiple platforms integrated in NGTS ART. Final testing and transition will include regression tests, bug fixes, and patching as required by the NGTS ART transition customer to support the Integrated Training Facility’s requirements. Perform an in-depth evaluation of the training effectiveness of the tool and provide return on investment information for program acquisition. Expand the baseline capabilities to other mission sets and domains as needed.
A voice recognition suite that includes performance assessment and visualizations, and that is integrated with various pilot simulations, provides a modular capability. This technology could be used for commercial pilot training as well as in other team-based domains that focus heavily on communication and coordination, particularly within aviation (e.g., Air Traffic Control).
KEYWORDS: Speech To Text, STT Technologies, Natural Language Processing, NLP, Tactical Speech, Decision Making, Speech Recognition, Voice Recognition
TPOC-1: Jennifer Pagan
TPOC-2: Sarah Warnham
TPOC-3: Heather Priest