Introduction
This article isn’t a detailed list of every difference I’ve found, but it does describe some specific differences and gives some guidance on how to check for differences between platforms. It is also useful to know that many IVR vendors base their VoiceXML implementations on common code bases, so behaviors you find in one browser are likely to show up in another.
Common VoiceXML Enhancements
Objects
The VoiceXML standard has a built-in extension point. The VoiceXML object element was designed to allow vendors a place where they could add their own extensions. Usage varies significantly and most of the features exposed are very platform specific.
Transfer Element
The VoiceXML standard only supported bridge and blind transfers. In many call centers and IVR deployments, this isn’t sufficient. Supervised and other transfers are usually necessary and many platforms enhanced the transfer element to support alternate types.
Tone Playing
The VoiceXML standard didn’t provide a mechanism for playing tones. These are often needed for network transfers or to drive other devices. For traditional TDM networks, you usually just play recordings of DTMF tones. However, in most SIP environments you can’t play tones this way, because telephony tone data is typically carried outside of the audio band.
CTI
Neither the VoiceXML standard nor CCXML provided a sufficient mechanism for attaching data to a call or getting a unique call reference ID. This was probably because the availability of this information is highly environment dependent.
VoiceXML Differences
Sloppy rule enforcement
As platform vendors and behind-the-scenes software vendors created the initial VoiceXML browsers, many of them left holes where the standard wasn’t strictly enforced. Once those holes existed, they often went unfixed to avoid breaking customer applications that had already been deployed.
For example, the VoiceXML var element, which is documented in the VoiceXML specification, can’t be used to initialize a property of an object (e.g. obj.prop1). However, many, but not all, browsers allow this behavior.
These types of problems exist in most implementations and, as far as I’ve seen, are never documented.
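If you want to find out whether an application depends on this particular hole, a small check along these lines can help. This is only a rough sketch in Python: the function name find_dotted_vars and the sample document are made up for illustration, and it only looks at the name attribute of var elements.

```python
import xml.etree.ElementTree as ET


def find_dotted_vars(vxml_text):
    """Return the name of every <var> whose name attribute contains a period."""
    root = ET.fromstring(vxml_text)
    dotted = []
    for element in root.iter():
        # Strip any namespace prefix such as {http://www.w3.org/2001/vxml}var.
        local_name = element.tag.rsplit("}", 1)[-1]
        name = element.get("name", "")
        if local_name == "var" and "." in name:
            dotted.append(name)
    return dotted


# Hypothetical document that only lenient browsers would accept.
SAMPLE = """<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <var name="obj" expr="new Object()"/>
    <var name="obj.prop1" expr="'hello'"/>
  </form>
</vxml>"""

print(find_dotted_vars(SAMPLE))  # prints ['obj.prop1']
```

Running a check like this over an existing application before moving it to a stricter browser is cheaper than discovering the problem on a live call.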
(Embedded) Grammars
Given the way many platforms handle speech recognition, you may not be able to use embedded speech grammars within the VoiceXML. Most platforms allow inline DTMF grammars if the platform is only running in DTMF mode. A few platforms don’t allow external grammar references at all; these are nearly always platforms that don’t support speech recognition.
Interaction with Speech Recognition Engine
The interface between IVR platforms and speech engines has, in the past, left a lot to be desired. Often the platforms had to implement output and input as a turn-taking exercise, which limited a platform’s ability to support barge-in. This problem can also manifest itself as a platform not supporting the VoiceXML 2.1 mark element.
Bridges to Legacy Applications
Many of the IVR platforms on the market weren’t built specifically for VoiceXML. This means that applications often need some sort of bridge to the existing application framework. This might mean a proprietary application to launch the VoiceXML application or it just may mean some extensions to allow a call to move back and forth between the legacy, proprietary application and the VoiceXML application.
Invalid XML
This one is a personal pain point. I’ve come across a few platforms that support, and sometimes require, invalid XML. I’ve seen a few flavors to date. First, a platform whose applications didn’t need to encode common symbols, like &. This means that you either need to preprocess the XML before using an XML editor or not use any XML tools. The second involved incorrect encoding declarations. The document would declare one encoding but actually be in another. Some platforms, I believe, ignore the encoding declaration entirely. And finally, I’ve come across platforms that don’t need the XML declaration at all.
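For the unencoded & case, the preprocessing step can be quite small. The following is a minimal sketch in Python, not a complete fix: the names BARE_AMPERSAND and escape_bare_ampersands are made up for illustration, and it assumes the document has no CDATA sections and that anything shaped like &name; is an intentional entity reference.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does not already start an entity or character reference.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text):
    """Replace each bare & with &amp; so standard XML tools can parse the text."""
    return BARE_AMPERSAND.sub("&amp;", text)


# Hypothetical prompt as a lenient platform might accept it.
SLOPPY = "<prompt>Press 1 for sales & support, or 2 for R&amp;D.</prompt>"

fixed = escape_bare_ampersands(SLOPPY)
ET.fromstring(fixed)  # raises ParseError if the result is still not well formed
print(fixed)
```

A regular expression is used here because the input can’t be handed to an XML parser until it has been repaired. If the platform requires the unencoded form, the output of your tools has to go through the reverse transformation before deployment.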
Note, I’ve listed this under VoiceXML differences, but this issue often also applies to the supporting standards.
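The encoding mismatch mentioned above can sometimes be caught before a document reaches an XML tool. Here is a rough sketch in Python, with made-up function names. It assumes the declaration itself is written in an ASCII-compatible encoding, so it won’t handle UTF-16 documents, and it falls back to UTF-8 when no encoding is declared.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of a file.
DECLARED = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')


def declared_encoding(raw):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = DECLARED.match(raw)
    return match.group(1).decode("ascii") if match else None


def decodes_as_declared(raw):
    """Return True if the bytes decode cleanly under the declared encoding."""
    name = declared_encoding(raw) or "utf-8"
    try:
        raw.decode(name)
    except (UnicodeDecodeError, LookupError):
        # LookupError covers encoding names Python does not recognize.
        return False
    return True


# Hypothetical document: declares UTF-8, but the accented character is Latin-1.
MISLABELED = b'<?xml version="1.0" encoding="UTF-8"?><prompt>caf\xe9</prompt>'

print(declared_encoding(MISLABELED))    # prints UTF-8
print(decodes_as_declared(MISLABELED))  # prints False
```

This only catches one direction of the problem. A UTF-8 document that is labeled as ISO-8859-1 will decode without error and simply produce the wrong characters, so a clean result here doesn’t prove the declaration is right.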
Note, I mentioned sloppy rule enforcement above. This also applies to following the VoiceXML DTD. I’ve come across elements that are empty, that shouldn’t be. I’ve seen extra attributes and missing attributes. From a developer’s perspective, many of these hassles are minor. If you’re a tool vendor trying to support multiple platforms, it can be a bit maddening at times.
Return statements without a subdialog
I’ve come across at least one platform that ignores return statements if the call flow wasn’t currently in a subdialog. This can hide an application bug that will show up on other platforms.
On the topic of subdialogs, I’ve seen a case or two where platforms can’t or don’t properly support nested subdialog calls.
How undefined variables are handled
Some platforms generate errors, some ignore them, and some treat them as empty strings or 0 depending on usage.
Variable posting
How variables are posted back to the application server can vary if you stray from simple usage. As indicated above, undefined variables can yield different behavior. And, there isn’t any standard for how objects with properties should be rendered and transmitted. I’ve seen at least two different formats.
Built-in variables
Platform information, such as a channel number, is often provided through built-in variables. However, the names of these variables, and whether they exist at all, vary significantly from platform to platform.
Recordings
Not all platforms follow the standard HTTP POST mechanism for transmitting recording results back to the application server.
Built-ins
Some platforms have built-in audio and/or grammars.
The Supporting Stuff
VoiceXML, by itself, is insufficient to write an application. There is a set of defined and assumed dependencies that can cause you some difficulties.
ECMAScript
The VoiceXML specification references ECMA-262. Not all of the browsers on the market implement an ECMAScript engine that meets this standard. However, most embed either Rhino or SpiderMonkey, depending on whether the platform is built on Java or C++. Both ECMAScript engines have their own extensions, and few developers understand what is in the specification and what is not. Most authors writing about ECMAScript focus on the engines implemented in web browsers, which leaves developers a bit unprepared to question how they should be writing their applications.
There are a variety of other differences associated with the ECMAScript engine. Here’s a quick list:
- Many platforms add custom ECMAScript functions to their implementation
- How undefined variables are handled
- Whether or not variables even need to be defined before use, and what their default values are
HTTP
While HTTP has been around for a while, many of the VoiceXML browser implementations were clearly homegrown and not fully functional. Some things you may experience:
- No support for HTTPS, or support for encryption but not certificates.
- No support for proxies.
- Session management is handled separately by the VoiceXML engine and the grammar engine. This may also apply to other features such as audio fetching, data fetching, etc.
- Caching rules can often be difficult to determine or control.
Grammars
The standards for grammar development are SRGS and SISR. These standards came later in the process and have been adopted very slowly. It’s still more common to see proprietary implementations in the field than ones that follow the standards. And, as a tool vendor, you can’t count on the formatting information provided in the documents.
SSML
Text to speech engines are even more out of sync with the standard than speech recognition engines. You should expect to see only minimal adherence to the standards and little consistency in complex text to audio conversions.
In Conclusion
If you care about portability or are writing tools to process VoiceXML, your best chance is to get access to as many applications and platforms as you can. Even then, you won’t find all of the differences. You will only be able to minimize the issues that occur.
The next few articles are going to discuss the different ways VoiceXML applications are built.