Voice Stream (RTP)
RTP (Real-time Transport Protocol) is a network protocol for delivering voice and video over IP networks. It splits media into small packets, each carrying a sequence number and a timestamp, so that the receiver can put them back in order and play them out as a continuous, natural-sounding stream. RTP usually runs over UDP, which favors low latency over guaranteed delivery. It underpins a wide range of real-time communication applications, including VoIP, video conferencing, WebRTC, telephony, television, and web-based push-to-talk services.
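To make the packet structure concrete, here is a minimal Python sketch that parses the 12-byte fixed RTP header described in RFC 3550. The names RtpHeader and parse_rtp_header are illustrative, not part of any particular library, and the sample packet is hand-built for demonstration (payload type 0 is G.711 mu-law in the standard audio profile). The sketch reads only the fixed header and does not interpret CSRC entries, header extensions, or padding.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    """Fields of the 12-byte fixed RTP header (RFC 3550)."""
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed header only; illustrative helper, not a full RTP stack."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source (SSRC) identifier.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hand-built sample: version 2, payload type 0, followed by 160 payload bytes
# (20 ms of audio at 8 kHz with one byte per sample).
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160, 0x12345678) + b"\xff" * 160
header = parse_rtp_header(sample)
print(header)
```

The sequence number lets a receiver detect loss and restore order, while the timestamp drives playout timing; those two fields are what make real-time reassembly possible.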
In conversational AI, RTP streams carry the audio for voice bot applications. When a customer calls a voice bot, the caller's audio is delivered to the AI platform as an RTP stream, where automatic speech recognition (ASR) transcribes it and the natural language understanding (NLU) engine detects the caller's intent. The bot's synthesized response is delivered back to the caller as an RTP stream in the same way, which enables real-time, two-way voice communication between the caller and the AI.
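The inbound half of that flow can be sketched as follows: collect the caller's RTP packets, order them by sequence number, join the payloads, and hand the audio to a transcription step. Everything named here (make_packet, payload_of, assemble_audio, handle_caller_audio, and the transcribe callback) is hypothetical and stands in for whatever the platform actually provides. The sketch also makes simplifying assumptions: it ignores sequence-number wraparound, duplicates, header extensions, and padding, and it does no codec decoding.

```python
import struct
from typing import Callable, Iterable, Tuple


def make_packet(seq: int, payload: bytes) -> bytes:
    """Build a bare RTP packet for the demo (version 2, payload type 0)."""
    return struct.pack("!BBHII", 0x80, 0, seq, 0, 0x1234) + payload


def payload_of(packet: bytes) -> Tuple[int, bytes]:
    """Return (sequence_number, payload) for one RTP packet."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, _b1, seq = struct.unpack("!BBH", packet[:4])
    csrc_count = b0 & 0x0F
    # Payload starts after the fixed header and any 4-byte CSRC entries.
    # Assumption: no header extension and no padding.
    offset = 12 + 4 * csrc_count
    return seq, packet[offset:]


def assemble_audio(packets: Iterable[bytes]) -> bytes:
    """Order packets by sequence number and concatenate their payloads."""
    ordered = sorted((payload_of(p) for p in packets), key=lambda item: item[0])
    return b"".join(payload for _seq, payload in ordered)


def handle_caller_audio(packets: Iterable[bytes],
                        transcribe: Callable[[bytes], str]) -> str:
    """Pass the reassembled audio to a (hypothetical) ASR callback."""
    return transcribe(assemble_audio(packets))


# Packets arrive out of order, as can happen over UDP.
arrived = [make_packet(2, b"B"), make_packet(1, b"A"), make_packet(3, b"C")]
audio = assemble_audio(arrived)

# Stand-in for a real ASR engine: it only reports how much audio it received.
transcript = handle_caller_audio(arrived, lambda data: f"{len(data)} bytes of audio")
print(audio, transcript)
```

A production system would normally do this incrementally with a jitter buffer rather than sorting a complete batch, so that transcription can begin while the caller is still speaking.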