You've probably seen the tutorials: "Add telephony to your AI agent using Twilio." They make it sound straightforward: buy a number, set up WebSockets, integrate speech-to-text, pipe in text-to-speech, handle the call state.
Then you actually try it and spend three weeks debugging WebSocket connections, transcription lag, and audio quality issues, only to end up with something that mostly works but requires constant maintenance.