Interview: "People no longer perceive that they are talking to an AI"
AI-powered telephone assistants are intended to handle calls. Fonio CEO Daniel Keinrath explains how this works at his company.
(Image: incrediblephoto / Shutterstock.com)
Overloaded lines, long hold queues, and staff shortages are now part of everyday life for many doctors' offices and other healthcare facilities. Especially during peak times, appointment requests, prescription inquiries, and organizational questions pile up – often at the expense of actual patient care. Telephone assistants are intended to help structure calls and automate routine tasks without requiring additional staff.
AI-powered telephone assistants such as 321 MED, Docmedio, Vitas, Medivoice, or Doctolib's Aaron take calls, book appointments, or pre-sort inquiries. The provider Fonio states that it currently serves nearly 400 customers in the healthcare sector.
(Image: Fonio)
In an interview with heise online, Fonio co-founder and CEO Daniel Keinrath explains how the system is structured and where the greatest technical challenges currently lie.
Where is your infrastructure hosted – especially in light of current political and regulatory developments?
We host almost everything at Hetzner in Germany and deliberately do not work with hyperscalers, so all our data remains in the EU. Otherwise, what we are currently doing would hardly be feasible. Especially in the DACH region, there is a high sensitivity regarding data protection – and this has increased significantly in recent months. Previously, it was primarily large enterprise customers who asked for on-premise solutions. Now, we are hearing this from smaller companies as well. Many explicitly want EU data residency and no infrastructure from US providers.
Which AI technically powers your system?
A complete in-house development would be unrealistic. Our ambition is to always offer the best available AI telephony on the market. That's why we've built our system to be model-agnostic. We use models from OpenAI and Google. If a better model comes onto the market, we can integrate it flexibly.
How exactly does your architecture work?
You can think of it like an orchestration layer. First, we establish the telephone connection – figuratively speaking, like a Zoom call without video. When the caller speaks, what they say is transcribed via speech-to-text. A finely tuned LLM then generates a response. In parallel, API requests are made, for example for calendar access or RAG queries against documents. A text-to-speech model then converts the response into natural-sounding speech, which is played back over the telephone connection. All of this needs to happen extremely fast to feel like a real conversation.
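To make the sequence concrete, here is a minimal Python sketch of a single conversational turn passing through such an orchestration layer. It is not Fonio's implementation: every function name is hypothetical, the speech and language models are replaced by trivial stand-ins, and the ordering (lookups run concurrently, then the reply is generated) is only one possible arrangement.

```python
import asyncio

# All functions below are hypothetical stand-ins, not real service clients.

async def speech_to_text(audio: bytes) -> str:
    # Stand-in for a speech-to-text service: treats the audio bytes as text.
    return audio.decode("utf-8")

async def lookup_calendar(text: str) -> list[str]:
    # Stand-in for a calendar API request; the sleep simulates network time.
    await asyncio.sleep(0.01)
    return ["Tue 10:00", "Wed 14:30"]

async def retrieve_documents(text: str) -> list[str]:
    # Stand-in for a RAG query against practice documents.
    await asyncio.sleep(0.01)
    return ["Opening hours: Mon-Fri 8-17"]

async def generate_reply(text: str, slots: list[str], docs: list[str]) -> str:
    # Stand-in for the LLM: offers the first free slot.
    return f"I can offer {slots[0]}."

async def text_to_speech(reply: str) -> bytes:
    # Stand-in for a text-to-speech model: returns the text as bytes.
    return reply.encode("utf-8")

async def handle_turn(audio: bytes) -> bytes:
    text = await speech_to_text(audio)
    # Run both lookups concurrently so their waiting times overlap.
    slots, docs = await asyncio.gather(
        lookup_calendar(text), retrieve_documents(text)
    )
    reply = await generate_reply(text, slots, docs)
    return await text_to_speech(reply)

if __name__ == "__main__":
    print(asyncio.run(handle_turn(b"I need an appointment")))
```

Running the independent lookups concurrently rather than one after another is what keeps the waiting time of a turn close to that of the slowest request instead of the sum of all of them.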
How fast is "extremely fast"?
We are now often under 800 milliseconds of total latency. That's an important threshold – below that, people generally no longer perceive that they are talking to an AI.
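A threshold like this is usually handled as a budget that the individual pipeline stages have to share. The following Python sketch shows the arithmetic; only the 800-millisecond figure comes from the interview, while the per-stage values are invented placeholders for illustration and not measurements from Fonio.

```python
# Threshold cited in the interview.
BUDGET_MS = 800

# Placeholder per-stage latencies in milliseconds (assumed, not measured).
STAGE_MS = {
    "speech_to_text": 200,
    "llm_response": 350,
    "text_to_speech": 150,
    "network": 80,
}

def total_latency(stages: dict[str, int]) -> int:
    # Stages are assumed to run one after another, so latencies add up.
    return sum(stages.values())

def within_budget(stages: dict[str, int], budget: int = BUDGET_MS) -> bool:
    # "Under 800 milliseconds" is read as a strict comparison.
    return total_latency(stages) < budget

def slowest_stage(stages: dict[str, int]) -> str:
    # The stage with the largest share is the first candidate for tuning.
    return max(stages, key=stages.get)

if __name__ == "__main__":
    print(total_latency(STAGE_MS), within_budget(STAGE_MS), slowest_stage(STAGE_MS))
```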
Is latency still the greatest challenge?
Not anymore. Latency used to be our main issue. Today, the AI is sometimes too fast. If it responds too quickly, it interrupts people. That's why we are working hard on "turn detection," i.e., recognizing when someone has actually finished speaking and when it's just a pause for thought. We now have to teach the AI artificial pauses.
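The interview does not say how Fonio implements turn detection. As a rough illustration of the idea, the Python sketch below uses a simple heuristic: it waits for a stretch of silence before answering and waits longer when the transcript trails off on a filler or connecting word. The word list and both thresholds are assumptions chosen for the example.

```python
# Hypothetical list of words that suggest the speaker is not finished yet.
FILLERS = {"um", "uh", "and", "so", "but"}

def end_of_turn(
    transcript: str,
    silence_ms: int,
    base_ms: int = 500,       # assumed silence needed after a complete phrase
    extended_ms: int = 1200,  # assumed silence needed after a trailing filler
) -> bool:
    # Drop trailing punctuation and split the transcript into words.
    words = transcript.lower().rstrip(".,!? ").split()
    # A trailing filler suggests a pause for thought, not the end of a turn.
    trailing_off = bool(words) and words[-1] in FILLERS
    required_ms = extended_ms if trailing_off else base_ms
    return silence_ms >= required_ms

if __name__ == "__main__":
    print(end_of_turn("I would like an appointment", 600))
    print(end_of_turn("I would like an appointment and", 600))
```

With these placeholder values, 600 milliseconds of silence ends the turn after a complete phrase but not after a sentence that stops on "and", which corresponds to the artificial pause described above.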
What are the biggest technical hurdles currently?
The biggest weakness is still speech-to-text, especially in Europe. In the US, telephony is much more strongly based on internet connections with higher audio quality. European telephone networks are often more compressed, and the sound quality is poorer. This makes precise transcription difficult, especially with background noise or unclear pronunciation. That's why we work extensively with probability models and context estimations.
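One common way to combine probabilities with context is to let the recognizer return several candidate transcripts and re-rank them using what the conversation is likely to be about. The Python sketch below illustrates that general technique; it is an assumption about what such an approach could look like, not a description of Fonio's method, and the confidence values, context terms, and boost factor are invented.

```python
def pick_hypothesis(
    hypotheses: list[tuple[str, float]],
    context_terms: set[str],
    boost: float = 0.15,  # assumed bonus per matching context term
) -> str:
    """Return the candidate transcript with the best context-adjusted score.

    Each hypothesis is a (text, confidence) pair as a hypothetical
    speech-to-text service might return it.
    """
    def score(hypothesis: tuple[str, float]) -> float:
        text, confidence = hypothesis
        # Count how many expected terms appear in this candidate.
        hits = sum(1 for term in context_terms if term in text.lower())
        return confidence + boost * hits

    return max(hypotheses, key=score)[0]

if __name__ == "__main__":
    candidates = [
        ("I need a new description", 0.62),
        ("I need a new prescription", 0.58),
    ]
    print(pick_hypothesis(candidates, {"prescription", "appointment"}))
```

In this example the acoustically stronger candidate loses to the one that fits a doctor's office, because "prescription" is among the terms expected in that setting.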
The market is growing rapidly. What distinguishes your system from other providers?
There are hundreds of providers now. But only a few have their own orchestration layer. Many rely on existing platforms and merely add an interface on top. Furthermore, many models are primarily optimized for English. Non-English languages require different fine-tuning – for example, when spelling out email addresses. To make this work, we specifically built our own system.
How open is your system to integrations?
Very open. We can execute API requests before, during, and after the call. Any API-capable system can be connected – whether Shopify, Salesforce, HubSpot, or industry-specific software. Common calendar solutions are natively integrated.
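Requests before, during, and after a call can be pictured as hooks attached to three phases of the call. The Python sketch below shows that pattern in its simplest form; the class, the phase names, and the example hooks are hypothetical and do not reflect Fonio's actual interface.

```python
from collections import defaultdict
from typing import Callable

# The three phases named in the interview.
PHASES = ("before", "during", "after")

class CallHooks:
    """Hypothetical registry of callbacks per call phase."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def on(self, phase: str, hook: Callable[[dict], None]) -> None:
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        self._hooks[phase].append(hook)

    def run(self, phase: str, call: dict) -> dict:
        # Hooks run in registration order and may update the call record.
        for hook in self._hooks[phase]:
            hook(call)
        return call

if __name__ == "__main__":
    hooks = CallHooks()
    # In practice each hook would issue an API request; here they only
    # update a dictionary so the example stays self-contained.
    hooks.on("before", lambda call: call.update(customer="known"))
    hooks.on("during", lambda call: call.update(slot="Tue 10:00"))
    hooks.on("after", lambda call: call.update(crm_synced=True))

    call = {"caller": "+49 30 000000"}
    for phase in PHASES:
        hooks.run(phase, call)
    print(call)
```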
In healthcare, there are often closed systems. Do you notice that?
Yes. We regularly receive inquiries from practices that use existing solutions and want to switch. The problem is typically the lack of APIs. Some systems are completely closed. This significantly hinders competition. However, changing practice software is a major undertaking – which is why we do not actively pursue such system changes.
What is behind the cooperation with Easybell?
Previously, it often worked like this: customers received a separate AI telephone number from us and forwarded their main number. With Easybell – a Berlin-based telecommunications provider for VoIP, SIP trunks, and cloud telephone systems – their business customers can integrate our AI directly into their existing Cloud PBX without additional call forwarding. This is particularly helpful for medium-sized companies with many extensions. The AI can route calls directly or book appointments without calls having to be technically routed out of and back into the system.
You operate across industries. Why no specialization?
Typically, a SaaS company focuses on a niche. We consciously decided against it. Structurally, telephone conversations hardly differ from one industry to the next: whether it's a dental appointment or a tire change, it's about appointment scheduling, information retrieval, and routing. The conversation logic is very similar. Our goal is for every SME to be able to set up an AI telephone assistant within five to ten minutes.
What does the system cost?
We offer three price tiers, starting at 99 euros per month. The entry-level plan is aimed at smaller businesses and includes one user and 1,000 minutes per month. There is also a team plan for medium-sized businesses with extended features, such as integration into existing telephone systems. For larger companies, we create individual offers depending on call volume and integration needs. Additional minutes can be booked flexibly. On-premise solutions are generally possible but are only implemented in exceptional cases, as EU hosting is usually sufficient in practice and involves significantly less complexity.
(mack)