
11.3.1: Building a technology base

Edwin Blake, William Tucker, Meryl Glaser and Adinda Freudenthal

Building a technology base (2001-2003)


Telgo323
Telgo323 was our first attempt at a Deaf telephony solution. During this cycle, we engaged an intermediary to the Deaf community rather than the community itself. Together we devised an overview of telephony options for members of DCCT (Glaser & Tucker, 2001). An MSc student at Rhodes University designed and built an initial prototype (Penton et al., 2002). Telgo323 converted a Deaf user's text on the Teldem to speech on the telephone. We felt that a two-way conversation was necessary to demonstrate the prototype to Deaf users, and therefore did not show it to them. The following table provides an overview of the cycle.

| Cycle overview | Description |
| --- | --- |
| Timeframe | Early 2002 |
| Community | N/A |
| Local champion | N/A |
| Intermediary | Meryl Glaser (UCT) |
| Prototype | Telgo323 |
| Coded by | Jason Penton (Rhodes) |
| Supervised by | William Tucker (UCT/UWC), Alfredo Terzoli (Rhodes) |
| Technical details | (Glaser & Tucker, 2001; Penton et al., 2002; Penton, 2003; Tucker et al., 2002) |

Telgo323 cycle overview

Diagnosis

A prior Teldem field study showed that ICT could be designed with adequate functionality and the best of intentions yet still not be adopted by the target community (Glaser, 2000). Deaf users approved of the Teldem concept, but could not, or would not, integrate the Teldem into their lives. The Teldem exhibited frequent technical faults that required a tedious reset process. It was difficult to discern whether displayed text came from a person or from the Teldem itself. When it came from a person, the Teldem user had no way to identify the other caller. When it came from the Teldem, the text was often incomprehensible. There were also financial issues. A Deaf user needed access to a Teldem and had to pay for calls. Text-based calls lasted longer than voice calls because of typing time, and were therefore more expensive. In addition, some Deaf users felt that hearing users would use their phone plugs when the Teldem was absent, and that they would have to pay for those calls. Such problems resulted in Deaf users not trusting the Teldem.

At the time, we considered the Teldem to be the only real-time communication option to give DCCT members an independent means to communicate with anyone, both hearing and Deaf users. One could argue that voice-based synchronous communication could be substituted by asynchronous text-based mechanisms. However, we felt synchronous communication was the only way to reliably ensure that a communicative exchange was actually happening. It was therefore imperative to use synchronous mechanisms. The main problem, as we saw it, was to bridge synchronous communication from Deaf to hearing users. The technical challenge was to convert text on the Teldem to voice on the PSTN, and back again.

Plan Action

At the time, the Internet offered many text and video-based opportunities for Deaf-friendly communication. The Teldem had no connection to the Internet. If the Teldem could interface to the Internet via the PSTN, the Deaf participants' connectivity circle could grow and the prospects of communication independence could increase as well. To work toward this aim, we planned a series of bridges utilising the Teldem (Glaser & Tucker, 2001). The first Deaf telephony bridge would be a system where a human operator relayed communication from a Teldem user to a hearing party on a telephone. We also conceived of various software solutions that offered connectivity between the Teldem and the Internet. A final bridge would automatically convert and relay text and speech between text and voice users in real time.

Implement Action

Directly after presenting the ideas for these bridges at a local telecommunications conference (Glaser & Tucker, 2001), Jason Penton, a Masters student at Rhodes University, offered to code the last bridge because it fitted within his Masters project to create VoIP services with H.323 (Penton, 2003). We called his prototype Telgo323 because it enabled a Teldem to connect to a telephone on the PSTN with ISDN via H.323 (Penton et al., 2002; Tucker et al., 2002). The bridge was implemented in one direction (see figure below). The Deaf user typed text on a Teldem that was converted to speech and delivered as voice to a hearing user. The reverse direction, from speech to text, was beyond the scope of Jason's thesis, and was left as future work. At the time, the Softbridge concept was gestating. The next figure shows the Softbridge stack design of Telgo323 in hindsight.

Telgo323 architecture
The Telgo323 prototype relayed text from a Deaf user to voice for a hearing user. The Teldem issued a real-time stream of Baudot-encoded text characters to an H.323 ISDN gateway enhanced with an open source text-to-speech (TTS) engine from Festival. The converted voice was relayed to a telephone with an H.323 gateway. The gateways resided in the IP space where we were able to make software modifications.

Telgo323 Softbridge stack
Telgo323's aim was to provide a two-way automated translation bridge between a Deaf user using a Teldem and a hearing person with a telephone or soft phone. Telgo323 only implemented the communication in one direction, using an open source text-to-speech engine called Festival to relay text typed on the Teldem to a handset connected to a PSTN-based PBX or an H.323 endpoint, a soft phone running on a PC.
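The first step of the text-to-voice path described above, turning a stream of 5-bit Baudot codes into characters that a TTS engine can speak, can be sketched as follows. This is only an illustration: it uses the standard ITA2 code table, whereas the Teldem's encoding was non-standard and its actual table is not given here. The function name `decode_baudot` and the reduced figures table are assumptions made for the example.

```python
# Sketch of decoding a Baudot (ITA2) character stream into text.
# Assumption: standard ITA2 stands in for the Teldem's non-standard variant.

LTRS_SHIFT = 0x1F  # switch to the letters table
FIGS_SHIFT = 0x1B  # switch to the figures table

# Letters table indexed by 5-bit code; "\x00" marks null and shift positions.
LETTERS = "\x00E\nA SIU\rDRJNFCKTZLWHYPQOBG\x00MXV\x00"

# Reduced figures table: digits, whitespace and a few punctuation marks only.
FIGURES = {
    0x01: "3", 0x02: "\n", 0x03: "-", 0x04: " ", 0x06: "8", 0x07: "7",
    0x08: "\r", 0x0A: "4", 0x0C: ",", 0x10: "5", 0x13: "2", 0x15: "6",
    0x16: "0", 0x17: "1", 0x18: "9", 0x19: "?", 0x1C: ".", 0x1D: "/",
}


def decode_baudot(codes):
    """Decode an iterable of 5-bit codes, tracking the letters/figures shift."""
    out = []
    figures = False
    for code in codes:
        code &= 0x1F  # keep only the five data bits
        if code == LTRS_SHIFT:
            figures = False
        elif code == FIGS_SHIFT:
            figures = True
        elif code == 0x00:
            continue  # null carries no character
        elif figures:
            out.append(FIGURES.get(code, "?"))  # unknown figure codes become "?"
        else:
            out.append(LETTERS[code])
    return "".join(out)


text_for_tts = decode_baudot([0x14, 0x01, 0x12, 0x12, 0x18])
```

The decoded string would then be handed to the TTS engine. Because the shift state persists between characters, a single corrupted shift code garbles everything until the next shift, which is one reason Baudot streams can be awkward to work with.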

Evaluate Action

The Telgo323 prototype only worked in one direction, from a Deaf user to a hearing user. Because the Teldem's Baudot encoding was non-standard, returning text to the Teldem proved so difficult that we abandoned the task. Another reason to postpone the speech-to-text delivery was that TTS technology worked fairly well at the time, but ASR was difficult to train over the phone. Furthermore, open source ASR tools did not easily recognize South African-accented English. Telgo323 was designed to 'plug and play' the ASR and TTS tools so that as technology improved, a new tool could slot into the architecture.
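The 'plug and play' idea can be illustrated with a minimal sketch in which the bridge depends only on two small interfaces, so that an engine can be swapped without touching the relay logic. The class and method names below (`TextToSpeech`, `SpeechToText`, `RelayBridge`, `synthesize`, `recognize`) are hypothetical and do not come from the Telgo323 code; the echo engines are stand-ins, not wrappers around Festival or any real ASR tool.

```python
from dataclasses import dataclass
from typing import Protocol


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class SpeechToText(Protocol):
    def recognize(self, audio: bytes) -> str: ...


class EchoTTS:
    """Stand-in engine: the 'audio' is simply the UTF-8 bytes of the text."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class EchoASR:
    """Stand-in engine: 'recognises' audio by decoding it back to text."""

    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperASR:
    """A second stand-in, used to show that engines can be swapped."""

    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8").upper()


@dataclass
class RelayBridge:
    tts: TextToSpeech
    asr: SpeechToText

    def deaf_to_hearing(self, text: str) -> bytes:
        return self.tts.synthesize(text)

    def hearing_to_deaf(self, audio: bytes) -> str:
        return self.asr.recognize(audio)


bridge = RelayBridge(tts=EchoTTS(), asr=EchoASR())
audio = bridge.deaf_to_hearing("hello")
bridge.asr = UpperASR()  # slot in a different recogniser; the bridge is unchanged
```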

Reflection/Diagnosis

Telgo323 work ceased because Jason completed his Masters (Penton, 2003). In order to test the tool with a Deaf user we needed to convert speech to text, but first we wanted to move the implementation of VoIP to SIP (Handley et al., 1999). SIP was an IETF competitor to the ITU's H.323 protocol family and has since replaced H.323 as the VoIP protocol of choice. We decided the next step would be to port Telgo323 to SIP.

One concern for using VoIP, with the intention of eventually deploying this solution over the Internet instead of just inside our laboratory, was that VoIP in South Africa was illegal (DoC, 1996; DoC, 2001). We felt that the legislation hindered the take up of tools like Telgo323 and kept South African Deaf people even more disadvantaged. We hoped the situation would change and indeed VoIP eventually became legal in 2005 (DoC, 2005).


TelgoSIP
Later in 2002, two interns from the Indian Institute of Technology (IIT), Harsh Vardhan and Nitin Das, came to UWC to port Telgo323 to SIP for their final year project. We still did not engage the Deaf community. However, this prototype is included as an action research cycle for completeness. The table below provides a brief overview.

Plan action

The intention for the interns was only to port Telgo323 to SIP, and not be concerned with implementing the reverse direction (with ASR and text-to-Baudot conversion). We knew this prototype would not be sufficient to trial with a Deaf user, so we did not even plan to do so. We would, however, test the technical performance in the laboratory.

| Cycle overview | Description |
| --- | --- |
| Timeframe | Mid 2002 |
| Community | N/A |
| Local champion | N/A |
| Intermediary | Meryl Glaser (UCT) |
| Prototype | TelgoSIP |
| Coded by | Nitin Das and Harsh Vardhan (IIT/UWC) |
| Supervised by | William Tucker (UCT/UWC) |
| Technical details | Not published |

TelgoSIP cycle overview

Implement action

We completed the port from H.323 to SIP and called the new prototype TelgoSIP. TelgoSIP used the VOCAL open source SIP platform from Vovida (http://www.voip-info.org/wiki/view/VOVIDA+SIP). The PSTN gateway was implemented with the VOCAL stack on a QuickNet card. TelgoSIP offered a cleaner solution with modification of a user agent (UA) client instead of modifying gateways as with Telgo323. We still used open source Festival to convert text to speech. The speech was then sent via the QuickNet card using an unmodified VOCAL gateway. The TelgoSIP architecture is shown below. The Softbridge architecture was virtually identical to that of Telgo323.

Evaluate action

Because neither Telgo323 nor TelgoSIP implemented the reverse direction, we did not test either prototype with a Deaf person. We brought out another pair of Indian interns to work on that problem in 2003, but they were not able to solve it. We therefore aborted what would have been a second TelgoSIP cycle. We had learned that the Teldem was a failure in terms of social take-up in South Africa, and was also so difficult to work with on a technical level that further development was unjustified.

Reflection

We decided to abandon the Teldem, but did not abandon the goal for a Deaf-to-hearing relay bridge. The Telgo323 and TelgoSIP prototypes demonstrated a software bridge concept that, when completed in the opposite direction, could enable Deaf people to communicate synchronously with hearing people on the PSTN and Internet. The most obvious bridge was between Deaf and hearing people accomplished by bridging between voice and text modalities. Another bridge was at the network level between the PSTN and the Internet, and consequently between physical devices, e.g. handset, Teldem and soft phone. Another network bridge was between high and low bandwidth, because most of the data transport between gateways and/or user agents was text that required much less bandwidth than voice. All of this bridging meant that anybody, hearing or Deaf, with access only to a PSTN line via a handset or Teldem could communicate via IP on the Internet.
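The bandwidth gap between text and voice can be made concrete with a rough calculation. The figures below are assumptions for illustration only: a 64 kbit/s voice payload (the rate of a G.711-style codec, ignoring packet headers) and a typing rate of 5 characters per second at 8 bits per character. Neither figure comes from the prototypes.

```python
# Rough comparison of voice and typed-text bit rates (illustrative assumptions).

VOICE_BITS_PER_SECOND = 64_000  # assumed G.711-style payload rate, no headers


def text_bits_per_second(chars_per_second, bits_per_char=8):
    """Bit rate of a typed text stream, ignoring protocol overhead."""
    return chars_per_second * bits_per_char


typing_rate = 5  # assumed characters per second
text_rate = text_bits_per_second(typing_rate)
ratio = VOICE_BITS_PER_SECOND / text_rate

print(f"text: {text_rate} bit/s, voice: {VOICE_BITS_PER_SECOND} bit/s, ratio: {ratio:.0f}x")
```

Under these assumptions the text stream needs about three orders of magnitude less bandwidth than the voice stream, although real protocol overhead would narrow the gap.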


TelgoSIP architecture
TelgoSIP ported Telgo323 to SIP. TelgoSIP modified a SIP client that acted as a proxy for the PSTN-connected Teldem. Modification at the client level significantly decreased the complexity of the overall design. TelgoSIP did not provide hearing-to-Deaf communication, and therefore was not tested with end-users.


Softbridge v1
While the Telgo prototypes were in development, another MSc student at UCT started working on an approach based on generic adaptors between various disparities. The most obvious adaptation was between text and voice. The preliminary Softbridge abstraction emerged from this effort. This cycle still did not engage the Deaf community, but abandoning the Teldem encouraged IP-based opportunities to build Softbridge reference implementations. The table below provides an overview.

Diagnosis

To perform action research, we needed to test a fully functional bridge with actual end-users. We considered establishing a manual bridge with a human relay operator. Human relay call centres were already in place throughout developed regions, mostly subsidised by government and a respective telco. However, in South Africa, as in many developing regions, the incumbent telco did not provide this service simply because it was perceived not to be able to generate revenue. Our research group did not have the funding or the clout to establish a relay service. The Teldem, produced by Telkom, was difficult for us to integrate into a solution. We had to consider other devices besides the Teldem, and revisited our conceptual bridges (Glaser & Tucker, 2001). The chat interface on the PC appealed because it resembled an Instant Messaging approach.

| Cycle overview | Description |
| --- | --- |
| Timeframe | Mid 2002 |
| Community | N/A |
| Local champion | N/A |
| Intermediary | Meryl Glaser (UCT) |
| Prototype | Softbridge |
| Coded by | John Lewis (UCT) |
| Supervised by | William Tucker (UCT/UWC), Edwin Blake (UCT) |
| Technical details | (Lewis et al., 2002) |

Softbridge v1 cycle overview

Plan action

A preliminary Softbridge abstraction shown below emerged from the design specifications of the Telgo and Softbridge prototypes (Lewis et al., 2002). The model concentrated on media and temporality bridging. However, Deaf telephony also required adaptation at other layers in the Softbridge stack (see next figure).

Implement action

A reference implementation was initiated with CORBA (Common Object Request Broker Architecture) as the platform for inter-process communication, but was not completed. Therefore, this cycle emphasised preliminary conception of the Softbridge abstraction. Unlike in the OSI model, the user application itself, not the application network API, was included in the design. The text relay bridge application required media and temporality bridging. Another significant difference from the OSI model was that communication modalities could be converted, or adapted, from one to another based on user capabilities. Thus, it was clear that the user must be included in the Softbridge stack.


Generic Softbridge approach to Deaf telephony
The early Softbridge approach adapted various forms of media in and out, e.g. text, voice or video. Softbridge also adapted delivery of media between synchronous and asynchronous temporalities, creating a semi-synchronous form of delivery. The adaptations did have negative quality consequences, however. Text-to-speech and vice versa incurred quality degradation of the message content, and semi-synchronous communication incurred latency in message delivery.

Fully automated text relay Softbridge stack
Whereas the Telgo prototypes implemented bridging as part of the Deaf telephony application itself, the initial Softbridge prototype moved the media and temporality adaptation out of the application into middleware. This distinction was significant to the design process in order to hide bridging from the users. Media adaptation could not be hidden so easily. The Deaf user knew the text was being converted to speech, and the hearing user was also painfully aware that a TTS engine was being used. The temporality bridging was more interesting. Both Deaf and hearing users would be aware of the delays incurred due to the TTS and ASR adaptation even though they most probably would not realise the extent to which asynchronous Instant Messaging was being converted to synchronous VoIP.

Evaluate action

Softbridge design implied that bridging synchronous and asynchronous temporalities could produce semi-synchronous exchange. Communication quality in the conventional QoS sense was obviously degraded by the large latencies incurred by media adaptation with TTS and ASR. One research aim was to determine if and how those quality problems could be overcome. While TTS functioned reasonably well, we knew ASR would be problematic. Furthermore, temporality adaptation between asynchronous text and synchronous voice introduced delays into the system, especially for the hearing user. Later, we would instrument the code to record these delays, and also explore ways to measure the quality of the TTS and ASR output.
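Recording the delay contributed by each adaptation stage might look like the following sketch. It is not the instrumentation used in the prototypes: `DelayLog`, `stage` and `fake_tts` are hypothetical names, and the sleeping functions merely stand in for real TTS conversion and message delivery.

```python
import time
from contextlib import contextmanager


class DelayLog:
    """Collects (stage name, elapsed seconds) pairs for one message."""

    def __init__(self):
        self.records = []

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raises an exception.
            self.records.append((name, time.perf_counter() - start))

    def total(self):
        return sum(duration for _, duration in self.records)


def fake_tts(text):
    """Stand-in for a TTS call; the sleep simulates conversion time."""
    time.sleep(0.01)
    return text.encode("utf-8")


log = DelayLog()
with log.stage("tts"):
    audio = fake_tts("hello")
with log.stage("delivery"):
    time.sleep(0.005)  # stand-in for network delivery

for name, duration in log.records:
    print(f"{name}: {duration * 1000:.1f} ms")
```

Keeping per-stage records rather than a single end-to-end figure makes it possible to see whether typing time, media adaptation or transport dominates the delay that a hearing user experiences.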

Diagnosis/Reflection

This cycle produced an initial Softbridge abstraction that was mostly concerned with media and temporality bridging. However, other forms of bridging were also needed between Deaf and hearing users with different types of interfaces, devices, networks and their power systems. These considerations would coalesce in the Softbridge models to come. The task remained to build a Softbridge prototype that could be tested with the Deaf community.


Softbridge v2


The preliminary Softbridge abstraction began to take shape in a second Softbridge reference implementation that successfully provided a bi-directional Deaf telephony system. The action research method also improved during this cycle because of the first engagement of the Deaf community with a Softbridge prototype. We tested the prototype with a single literate Deaf person who provided valuable input about the prototype and more importantly about the social processes surrounding the introduction of such technology. The table below provides an overview.

| Cycle overview | Description |
| --- | --- |
| Timeframe | Late 2002 - mid 2003 |
| Community | DCCT |
| Local champion | N/A |
| Intermediary | Meryl Glaser (UCT) |
| Prototype | Softbridge v2 |
| Coded by | John Lewis (UCT) |
| Supervised by | William Tucker (UCT/UWC), Edwin Blake (UCT) |
| Technical details | (Lewis et al., 2003; Tucker, 2003; Tucker et al., 2003a, 2003b) |

Softbridge v2 cycle overview

Plan Action

The plan was to provide Deaf-to-hearing communication in both directions. Leveraging the ease of IP-based tools, we planned to employ an IM interface on a PC for the Deaf user. The system would convert IM text to voice for the hearing user on some form of audio device, and convert the hearing user's spoken voice back to text for delivery to the Deaf user. We designed a bi-directional prototype based on the early stages of the evolving Softbridge concept (see below). We named the prototype Softbridge, but this should not be confused with the Softbridge abstraction. Here, the Softbridge prototype was an instance of the Softbridge abstraction (as were all subsequent Deaf telephony prototypes in one form or another).

Instant Messaging-based Softbridge architecture
The Softbridge prototype would replace the Teldem device with an Instant Messaging client running on a PC. The Softbridge would automatically adapt between voice and text, and also between asynchronous Instant Messaging for the Deaf user and synchronous VoIP for the hearing user. The adaptation between text and voice would be implemented with web services.

Implement Action

The actual architecture for the implemented prototype is shown below and was described by Lewis et al. (2003). The system architecture was based on a generic Softbridge core with clients specifically built for Deaf telephony. A Jabber-based IM client coded with .Net provided text in and out for the Deaf user. The hearing user also used a Jabber client, but for speech. We also built a hearing client that had both voice and text so the hearing user could see what the Deaf user typed in addition to hearing the TTS-generated speech. We also wanted the hearing user to be able to type messages to the Deaf user when the ASR failed to deliver intelligible text. The TTS and ASR engines were wrapped as loosely coupled web services, thus making them very easy to plug and play.


Softbridge v2 architecture
The actual architecture differed slightly from the planned approach shown earlier. Both text and voice transport were implemented with an asynchronous protocol. That meant no adaptation to synchronous VoIP. We reasoned this was acceptable because there were already large delays introduced by TTS and ASR adaptation. Those adapters were loosely coupled web services. We built a variety of clients for the hearing user. For example, one had voice only and another had both voice and text so the hearing user could see exactly what the Deaf user typed.

Instead of using real-time VoIP, a speech clip of maximum 2MB was packaged into XML (eXtensible Mark-up Language) messages to and from a modified Joey server (an open source Jabber server). The modified Joey server effectively became a Softbridge server. The speech XML packets were delivered with XMPP, Jabber's asynchronous protocol. Therefore, no temporality adaptation was required or performed. The intention was to deliver the voice messages as quickly as possible to appear semi-synchronous. However, the TTS and ASR media adaptation in both directions introduced additional delay and also often distorted the message content.
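Packaging a speech clip into an XML message, as described above, could be sketched as follows. The element and attribute names (`speech`, `encoding`) and the use of base64 are assumptions for illustration; the message format actually used by the prototype is not specified here. The 2MB limit is taken from the text and interpreted as 2 × 1024 × 1024 bytes.

```python
import base64
import xml.etree.ElementTree as ET

MAX_CLIP_BYTES = 2 * 1024 * 1024  # the 2MB limit from the text, read as 2 MiB


def pack_speech(sender, recipient, clip):
    """Wrap raw audio bytes in a message-style XML element (hypothetical schema)."""
    if len(clip) > MAX_CLIP_BYTES:
        raise ValueError("speech clip exceeds the 2MB limit")
    message = ET.Element("message", {"from": sender, "to": recipient})
    speech = ET.SubElement(message, "speech", {"encoding": "base64"})
    speech.text = base64.b64encode(clip).decode("ascii")
    return ET.tostring(message, encoding="unicode")


def unpack_speech(xml_text):
    """Recover the audio bytes from a message produced by pack_speech."""
    message = ET.fromstring(xml_text)
    speech = message.find("speech")
    if speech is None:
        raise ValueError("message carries no speech element")
    return base64.b64decode(speech.text or "")


packet = pack_speech("hearing@example.org", "deaf@example.org", b"\x00\x01RIFF")
clip = unpack_speech(packet)
```

Base64 inflates the payload by roughly a third, so a clip near the size limit produces a noticeably larger message. That is one cost of carrying binary audio inside a text-based protocol.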

We tested the prototype with a Deaf user who was unusual in that he was PC literate and also had some experience with research studies (Tucker et al., 2003b). We built several clients to experiment with different types of bridging, mostly at the media layer (see below). The Deaf user client was text in/text out. The hearing client allowed us to pick and choose media capabilities. We experimented with the following combinations: text & TTS in/text out, TTS in/text out or TTS in/ASR & text out. The reason for the combinations was to use text to clarify the output of the TTS and ASR. We began the experiment with instructions interpreted into SASL, then used the tool itself to convey instructions to the Deaf user.


Softbridge v2 Softbridge stack
For the Softbridge v2 prototype, bridging was not necessary at the lower three Softbridge stack layers. We used this prototype and several client interfaces to conduct an experiment with a Deaf user.

Evaluate Action

We found that the Deaf user treated the IM client like a Teldem. He was not aware that characters were not being sent one at a time, but rather in page mode, or one message at a time. He also used the GA terminator as if in half-duplex mode. He informed us that Deaf users would type English slowly, and likely with SASL grammar. He gave us some examples, and of course, the TTS engine could only pass on what it received. On the hearing side, TTS messages arrived unexpectedly. The natural rhythmic exchanges in voice conversation were disrupted by the delays caused by typing time and TTS conversion.
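The GA ("go ahead") convention mentioned above marks the end of a turn in half-duplex text telephony. A client could recognise it with something like the following sketch; `split_turn` is a hypothetical helper and was not part of the prototype.

```python
def split_turn(message):
    """Return (text, hands_over): the message without a trailing GA, and
    whether the sender signalled the end of their turn."""
    words = message.strip().split()
    if words and words[-1].upper() == "GA":
        return " ".join(words[:-1]), True
    return " ".join(words), False


text, hands_over = split_turn("how are you GA")
```

Stripping the terminator before text-to-speech conversion would avoid the engine reading "GA" aloud to the hearing user, and the flag could serve as a cue that the Deaf user has finished typing.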

Reflection

Our intermediary had long urged us to test the prototype with real Deaf users, arguing that building solutions in the laboratory was pointless. The compromise was to test a solution in the laboratory with a Deaf user. We had emphasised the Deaf user interface, but learned that we also had to address the needs of hearing users. A case in point was the way that the TTS speech appeared without warning. The Deaf user had a persistent textual representation of the conversation in the IM interface. However, a hearing user with an audio-only interface was at a disadvantage, not being able to see the conversation in order to rescue the misunderstandings due to spelling/grammar mistakes and poor ASR conversion.

The MSc student had devoted a substantial amount of time to the technical implementation of the Softbridge prototype and its clients. He managed to bridge voice from the Jabber-based platform to an H.323 soft phone client. Unfortunately, the effort ceased prematurely before an intended H.323 gateway was provided to connect the system to the PSTN. Implementation interruptions of this sort had happened several times already because the lead programmer was an MSc student who either graduated or terminated their studies.