Since its inception, Cogito has used signal processing and machine learning techniques to analyze the vocal and behavioral cues throughout a conversation. Cogito’s AI Coaching System uses this analysis to give customer service representatives real-time guidance while their conversations with customers are still happening. The aim is to make each call more empathetic and personalized and, ultimately, more positive for both parties.


Effective real-time coaching depends on actionable behavioral guidance delivered in the moment; otherwise, the opportunity for improvement is lost. Conventional machine learning techniques are typically not developed with this strict design constraint in mind. As a result, we had to carry out extensive research and development to create new techniques suited to our purposes. We refer to this class of techniques as Signal-Based Machine Learning.


Signal-Based Machine Learning uses novel neural network architectures specifically designed to make incremental, real-time inferences on streamed signal data. It is a critical ingredient for providing continuous, in-the-moment measurement and intervention that is both timely and contextually appropriate.
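The post does not describe Cogito’s architectures, but the general idea of incremental inference can be sketched with a toy recurrent cell. The class name, the scalar weights, and the tanh update below are illustrative assumptions, not Cogito’s model. The point is that a running state lets each new frame be processed with constant work. It also means that feeding frames one at a time produces the same outputs as processing the finished recording.

```python
import math
from typing import List, Sequence


class StreamingRNNCell:
    """Toy Elman-style recurrent cell (hypothetical, scalar weights for illustration).

    The hidden state summarizes every frame seen so far, so each new frame
    costs constant work and an output is available immediately.
    """

    def __init__(self, w_in: float = 0.5, w_rec: float = 0.8, bias: float = 0.0) -> None:
        self.w_in = w_in
        self.w_rec = w_rec
        self.bias = bias
        self.h = 0.0  # state carried from one frame to the next

    def reset(self) -> None:
        self.h = 0.0

    def step(self, x: float) -> float:
        # Uses only the new frame and the previous state: no future input needed.
        self.h = math.tanh(self.w_in * x + self.w_rec * self.h + self.bias)
        return self.h


def run_offline(cell: StreamingRNNCell, frames: Sequence[float]) -> List[float]:
    """Reference: process a complete, already-recorded sequence."""
    cell.reset()
    return [cell.step(x) for x in frames]


# Simulate frames arriving one at a time from a live call.
frames = [0.2, -0.1, 0.7, 0.3, -0.5]
live = StreamingRNNCell()
streamed = [live.step(x) for x in frames]  # one inference per arriving frame

offline = run_offline(StreamingRNNCell(), frames)
print(streamed == offline)  # True: incremental results match whole-sequence results
```

A real streaming model would use learned vector-valued states, but the property that matters for latency is the same. Each output depends only on past and current frames, so it can be emitted as soon as the current frame arrives.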


For Cogito’s solution, minimizing latency is extremely important, so our machine learning models must be designed to be highly responsive. Many modern model architectures, such as those common in natural language processing, require bidirectional processing of sequence data or access to the entire sequence. Using entire sequences as input improves accuracy because the model can account for the full context of the data, but this approach is prohibitive for very low-latency applications like ours. Take machine translation, for instance: although the input data is sequential (essentially a lexical time series), state-of-the-art models for this problem require the entire sequence (e.g., a sentence or document) as input to make a valid inference. Though accurate, such approaches are not suited to low-latency inference on streaming data.
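To make the latency cost of future context concrete, here is a minimal sketch contrasting a causal computation with one that, like a bidirectional model, needs frames after the current one. A centered moving average stands in for the bidirectional model purely for illustration, and the function names are hypothetical. Each generator yields `(arrival_index, frame_index, value)`, so the gap between the two indices is the delay, in frames, before that frame's result is available.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple


def causal_mean(frames: Iterable[float], window: int) -> Iterator[Tuple[int, int, float]]:
    """Mean of the current frame and up to window-1 past frames.

    The result for frame t is emitted as soon as frame t arrives (zero delay).
    """
    buf: Deque[float] = deque(maxlen=window)
    for t, x in enumerate(frames):
        buf.append(x)
        yield t, t, sum(buf) / len(buf)


def centered_mean(frames: Iterable[float], radius: int) -> Iterator[Tuple[int, int, float]]:
    """Mean of frames t-radius .. t+radius, i.e. it also looks into the future.

    The result for frame t can only be emitted once frame t+radius has arrived.
    """
    history: List[float] = []
    for arrival, x in enumerate(frames):
        history.append(x)
        t = arrival - radius  # newest frame whose full future context is now available
        if t >= 0:
            lo = max(0, t - radius)
            window = history[lo : arrival + 1]
            yield arrival, t, sum(window) / len(window)
    # At end of stream, flush the last frames using truncated future context.
    n = len(history)
    for t in range(max(0, n - radius), n):
        lo = max(0, t - radius)
        window = history[lo:n]
        yield n - 1, t, sum(window) / len(window)


frames = [1.0, 2.0, 3.0, 4.0, 5.0]
for arrival, t, value in centered_mean(frames, radius=1):
    print(f"frame {t} ready after frame {arrival} arrived: {value}")
```

With `radius=1`, every centered result except the final flushed one arrives one frame late; a model that needs the whole utterance would have to wait until the call segment ends. The causal version never waits.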


Cogito’s Signal-Based Machine Learning approach is unique. It is efficient, with a low computational load and extremely low latency, and it is accurate because it processes each input in context. We apply multimodal signal processing, meaning our models consume multiple streams of data, such as audio signals and word signals produced by automatic speech recognition (ASR). A challenging aspect of this multimodal processing is that the acoustic and word signals from the various parties on a telephone call are not synchronized in time. An important part of our machine learning approach is a set of novel methods that synchronize these multimodal inputs early in the model architecture, which lets us fully exploit the synergies between these asynchronous inputs.
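The post does not say how Cogito synchronizes its inputs. As a generic illustration only, one simple way to put an asynchronous word stream on the same time axis as the acoustic stream is to project time-stamped ASR word events onto a fixed acoustic frame grid. The `WordEvent` schema, the 10 ms frame size, and the helper names are hypothetical. A production system would also need a grid per party on the call and would have to handle words that ASR delivers late or revises.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class WordEvent:
    """ASR word hypothesis with times in integer milliseconds (hypothetical schema)."""
    word: str
    start_ms: int
    end_ms: int


def words_to_frame_grid(
    words: List[WordEvent], num_frames: int, frame_ms: int = 10
) -> List[Optional[str]]:
    """Project word events onto the acoustic frame grid.

    Frame i covers [i * frame_ms, (i + 1) * frame_ms). Every frame that overlaps
    a word's [start_ms, end_ms) interval is labeled with that word, so both
    streams share one time axis. Integer milliseconds avoid floating-point
    rounding at frame boundaries. If words overlap, the later event in the
    list overwrites the earlier one.
    """
    labels: List[Optional[str]] = [None] * num_frames
    for w in words:  # arrival order does not matter; timestamps decide placement
        first = w.start_ms // frame_ms
        last = -(-w.end_ms // frame_ms)  # ceiling division: exclusive end frame
        for i in range(max(0, first), min(num_frames, last)):
            labels[i] = w.word
    return labels


def fuse(
    acoustic: List[float], labels: List[Optional[str]]
) -> List[Tuple[float, Optional[str]]]:
    """Pair each acoustic frame with its word label as input to a downstream model."""
    return list(zip(acoustic, labels))


# Word events arrive out of order relative to the audio they describe.
words = [WordEvent("thanks", 25, 48), WordEvent("hello", 0, 20)]
labels = words_to_frame_grid(words, num_frames=6)
fused = fuse([0.1] * 6, labels)
```

Here "hello" covers frames 0 and 1, "thanks" overlaps frames 2 through 4, and frame 5 has no word. Once aligned, each frame carries both acoustic and lexical information, which is what allows a model to combine the two modalities from its earliest layers.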


Cogito continues to use Signal-Based Machine Learning techniques to create richer, context-aware guidance for both contact center agents and supervisors. Especially in our increasingly digital world, organizations need real-time insights to guide their decision making and customer offerings. Innovations such as Cogito’s Signal-Based Machine Learning let organizations use human-aware technology to improve customer touchpoints in real time.