This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
Telephone communication is a challenge for many hearing-impaired individuals. One important technical reason for this difficulty is the restricted frequency range (0.3–3.4 kHz) of conventional landline telephones. Internet telephony (voice over Internet protocol [VoIP]) is transmitted with a larger frequency range (0.1–8 kHz) and therefore includes more frequencies relevant to speech perception. According to a recently published, laboratory-based study, the theoretical advantage of ideal VoIP conditions over conventional telephone quality has translated into improved speech perception by hearing-impaired individuals. However, the speech perception benefits of nonideal VoIP network conditions, which may occur in daily life, have not been explored. VoIP use cannot be recommended to hearing-impaired individuals before its potential under more realistic conditions has been examined.
To compare realistic VoIP network conditions, under which digital data packets may be lost, with ideal conventional telephone quality with respect to their impact on speech perception by hearing-impaired individuals.
We assessed speech perception using standardized test material presented under simulated VoIP conditions with increasing digital data packet loss (from 0% to 20%) and compared the results with those obtained under simulated ideal conventional telephone quality. We monaurally tested 10 adult users of cochlear implants, 10 adult users of hearing aids, and 10 normal-hearing adults in the free sound field, both in quiet and with background noise.
Across all participant groups, mean speech perception scores using VoIP with 0%, 5%, and 10% packet loss were 15.2% (range 0%–53%), 10.6% (4%–46%), and 8.8% (7%–33%) higher, respectively, than with ideal conventional telephone quality. Speech perception did not differ between VoIP with 20% packet loss and conventional telephone quality. The maximum benefits were observed under ideal VoIP conditions without packet loss and were 36% (
VoIP offers a speech perception benefit over conventional telephone quality, even when mild or moderate packet loss scenarios are created in the laboratory. VoIP, therefore, has the potential to significantly improve telecommunication abilities for the large community of hearing-impaired individuals.
Engaging in telephone conversations is a challenge for many hearing-impaired individuals, including hearing aid and cochlear implant users [
The two main technical limitations of conventional telephones for hearing-impaired individuals are, first, the frequency restriction and second, the digital data compression used in conventional telephony to maximize the network infrastructure utilization.
In addition, issues related to coupling the telephone to a hearing aid or a cochlear implant may further reduce speech perception by the hearing-impaired end user, even when assistive telephone listening devices are used [
From a theoretical perspective, Internet telephony (voice over Internet protocol [VoIP]) should offer improved speech perception to the end user, as the transmitted frequency range is double that of conventional telephones (0.1–8 kHz vs 0.3–3.4 kHz). The association between improved speech perception and the presentation of speech at higher bandwidths has been repeatedly shown in the literature [
These technical advantages of Internet telephony translate into improved speech perception by hearing-impaired and normal-hearing adults, at least when simulated under ideal laboratory conditions [
Each data packet originating in a voice over Internet protocol (VoIP)-sending device (not shown) may take a different route through the Internet (TCP/IP network) before arriving at the receiver. Data packets may be delayed or lost on the way. The VoIP software includes two mechanisms to improve audio quality in these cases: the jitter buffer collects as many data packets as possible by waiting as long as needed while keeping the time delay to a minimum, and packet loss concealment (PLC) aims to reconstruct lost data packets. Finally, digital data packets are decoded and delivered to a VoIP-compatible user interface, such as a VoIP handheld telephone, a headset, or external loudspeakers.
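The two receiver-side mechanisms described above can be illustrated with a toy sketch. It is not the algorithm of any particular VoIP product: the function name `play_out`, the one-sample frames, and the "repeat the last good frame" concealment rule are assumptions chosen for illustration, and a real jitter buffer works with a bounded delay rather than waiting for all packets as this offline example does.

```python
from typing import Dict, List, Optional, Tuple

Frame = List[int]  # one frame of audio samples (toy representation)


def play_out(arrivals: List[Tuple[int, Frame]], n_frames: int,
             frame_len: int = 1) -> List[Frame]:
    """Reorder received packets by sequence number and conceal missing ones.

    arrivals:  (sequence_number, frame) pairs in arrival order; they may be
               out of order and may skip sequence numbers (lost packets).
    n_frames:  number of frames the sender transmitted (assumed known here).
    frame_len: samples per frame, used only to build a silent frame.
    """
    # Jitter buffer: hold everything that arrived, keyed by sequence number.
    buffer: Dict[int, Frame] = {seq: frame for seq, frame in arrivals}

    output: List[Frame] = []
    last_good: Optional[Frame] = None
    for seq in range(n_frames):
        frame = buffer.get(seq)
        if frame is None:
            # Packet loss concealment (simplest possible rule): repeat the
            # last good frame, or play silence if nothing has arrived yet.
            frame = list(last_good) if last_good is not None else [0] * frame_len
        else:
            last_good = frame
        output.append(frame)
    return output


# Packets 0..4 were sent; packet 2 was lost and packets 1 and 3 arrived swapped.
received = [(0, [10]), (3, [40]), (1, [20]), (4, [50])]
print(play_out(received, n_frames=5))
```

In this example the missing frame 2 is filled with a copy of frame 1, so the listener hears a repeated segment instead of a gap.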
We conducted all tests between January and June 2009 at the University Department of Otorhinolaryngology, Head and Neck Surgery, Inselspital, Bern, Switzerland. The study protocol was approved by the local institutional review board. All patients gave written informed consent.
The study included 20 hearing-impaired adults (10 cochlear implant users and 10 hearing aid users) and 10 normal-hearing adults. All test participants were at least 18 years old and were selected from the institution’s clinical database. Mean age was 46 years in the cochlear implant group, 68 years in the hearing aid group, and 35 years in the normal-hearing group. A total of 90% of participants who were tested in our previous experimental study [
Clinical data for cochlear implant users and hearing aid users.
Participant | Sex | Age (years) | Hearing loss | Device brand | Ear | Aided German monosyllable score at 60 dB | at 75 dB | at 80 dB
Cochlear implant users
1 | M^a | 77 | Progressive | MED-EL Pulsar/Opus 2 | Left | 77.5 | NA^b | 97.5
2 | M | 17 | Postmeningitic | MED-EL Pulsar/Opus 2 | Left | 77.5 | NA | 87.5
3 | F^c | 39 | Congenital | MED-EL Pulsar/Opus 2 | Right | 62.5 | NA | 72.5
4 | F | 69 | Progressive | MED-EL C40+ Tempo+ | Left | 72.5 | NA | 85
5 | F | 48 | SHL^d | MED-EL C40+ Tempo+ | Right | 77.5 | NA | 77.5
6 | F | 61 | Progressive | MED-EL Pulsar/Opus 2 | Left | 50 | NA | 65
7 | F | 22 | SHL | MED-EL C40+ Tempo+ | Right | 55 | NA | 75
8 | M | 50 | Congenital | MED-EL C40C Tempo+ | Right | 85 | NA | 80
9 | M | 58 | Meningitis | MED-EL C40C Tempo+ | Left | 70 | NA | 65
10 | F | 23 | Progressive | MED-EL C40C Tempo+ | Right | 55 | NA | 52.5
Hearing aid users
1 | F | 66 | Presbycusis | BTE/Oticon Tego Pro VC | Left | 100 | 100 | NA
2 | M | 77 | Presbycusis | ITE/Bernafon Symbio XT | Left | 100 | 100 | NA
3 | M | 86 | Presbycusis | BTE/Phonak Piconet 2 | Left | 90 | 95 | NA
4 | F | 79 | Presbycusis | ITE/Bernafon Neo 315 | Left | 75 | 95 | NA
5 | M | 91 | Presbycusis | BTE/Widex Inteo | Left | 25 | 85 | NA
6 | M | 62 | Presbycusis | BTE/Phonak Una M AZ | Right | 30 | 80 | NA
7 | M | 76 | Progressive | BTE/Phonak Extra | Left | 70 | 85 | NA
8 | F | 36 | Congenital | BTE/GN ReSound Air | Right | 100 | 100 | NA
9 | M | 63 | Progressive | BTE/Phonak micro eXtra | Left | 90 | 100 | NA
10 | M | 41 | Progressive | BTE/Phonak Audéo | Right | 70 | 95 | NA
a Male.
b Not applicable.
c Female.
d Sudden hearing loss.
Lower quartile, median, upper quartile, and 1.5*interquartile range (X = outliers) of aided hearing thresholds in the free sound field for cochlear implant (CI) and hearing aid (HA) users. The analog telephone (public switched telephone network [PSTN]) speech signals are shown in dB hearing level as dotted lines. VoIP = voice over Internet protocol.
All tests were performed in the free sound field in a sound-treated room. The speech test material was played on an audio compact disc (CD) player connected to an audiometer (GSI 61; Grason-Stadler, Milford, NH, USA) and was reproduced using a pair of loudspeakers (Type 1030 A; Genelec Oy, Iisalmi, Finland) situated 1 m from the front of the patient’s head.
All tests were conducted monaurally. The most suitable ear was selected based on the inclusion criteria; if both ears were equally suitable, the ear commonly used for telephony was used. The opposite ear canal was occluded with an earplug (E.A.R. Classic, Aearo Technologies, Stockport, UK). The specified average attenuation of these earplugs is 24.6–41.6 dB in the 250- to 4000-Hz range. Bilateral cochlear implant users had to switch off one of their devices to produce homogeneous and comparable data. Monaural testing was necessary because not all individuals had the same degree of hearing loss on both sides, and speech perception performance varied between the first and second listening device. In addition, monaural testing more realistically reflects the use of a conventional telephone handset.
We used the standardized German Hochmair-Schulz-Moser (HSM) Sentence Test [
Frequency and filter characteristics for each audio quality
Codec | Frequency range (kHz) | Sampling rate | Fp (Hz)^a | Fs (Hz)^b | Ap (dB)^c | As (dB)^d
PSTN^e codec G.711 | 0.1–3.4 | 8 kHz G.711 A-Law | 3900 | 4400 | 1 | 60
iPCMwb codec | 0.1–8 | 16 kHz PCM^f | 8000 | 8500 | 1 | 60
CD low-pass filtered | 0.1–8 | 16 kHz PCM | 8000 | 8500 | 1 | 60
a Frequency at the edge of the pass band.
b Frequency at the beginning of the stop band.
c Amount of ripple allowed in the pass-band (also called Apass).
d Stop-band attenuation.
e Public switched telephone network.
f Pulse code modulation.
The original audio CD files of the HSM Sentence Test were converted into a wave-format audio file using the Switch Audio File Converter software, version 1.05 (NCH Software Pty Ltd, Canberra, ACT, Australia). The speech and noise channels were mixed to mono wave files with a sampling rate of 44.1 kHz, thereby allowing identical signal processing for speech and noise. Before encoding, the audio files were low-pass filtered using MATLAB software, version 7.9.0.529 (The MathWorks, Inc, Natick, MA, USA).
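The text does not state which filter design was used in MATLAB. As a minimal sketch, assuming a Kaiser-window FIR design (chosen here for illustration only), the wideband specification from the filter table (pass-band edge 8000 Hz, stop-band edge 8500 Hz, 60 dB stop-band attenuation) can be turned into filter taps at the 44.1 kHz rate of the source files. The function names `kaiser_lowpass` and `gain_db` are hypothetical helpers, and the design formulas are Kaiser's standard empirical estimates.

```python
import math
from typing import List


def bessel_i0(x: float) -> float:
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term, k = 1.0, 1.0, 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total


def kaiser_lowpass(fs: float, f_pass: float, f_stop: float,
                   atten_db: float) -> List[float]:
    """Design linear-phase FIR low-pass taps with the Kaiser window method."""
    delta_w = 2.0 * math.pi * (f_stop - f_pass) / fs          # transition width (rad/sample)
    order = math.ceil((atten_db - 7.95) / (2.285 * delta_w))  # Kaiser's length estimate
    order += order % 2                                        # even order, odd tap count
    beta = 0.1102 * (atten_db - 8.7)                          # valid for atten_db > 50
    cutoff = 0.5 * (f_pass + f_stop) / fs                     # cutoff in cycles/sample
    mid = order / 2
    taps = []
    for n in range(order + 1):
        m = n - mid
        # Ideal (sinc) low-pass impulse response, centred on the middle tap.
        ideal = 2.0 * cutoff if m == 0 else math.sin(2.0 * math.pi * cutoff * m) / (math.pi * m)
        shape = max(0.0, 1.0 - (m / mid) ** 2)
        window = bessel_i0(beta * math.sqrt(shape)) / bessel_i0(beta)
        taps.append(ideal * window)
    return taps


def gain_db(taps: List[float], fs: float, freq: float) -> float:
    """Magnitude response of the FIR filter at one frequency, in dB."""
    re = sum(h * math.cos(2.0 * math.pi * freq * n / fs) for n, h in enumerate(taps))
    im = sum(h * math.sin(2.0 * math.pi * freq * n / fs) for n, h in enumerate(taps))
    return 20.0 * math.log10(max(math.hypot(re, im), 1e-12))


taps = kaiser_lowpass(fs=44100.0, f_pass=8000.0, f_stop=8500.0, atten_db=60.0)
for f in (1000.0, 8000.0, 8500.0, 12000.0):
    print(f"{f:7.0f} Hz: {gain_db(taps, 44100.0, f):8.2f} dB")
```

A Kaiser design yields roughly equal ripple in the pass band and stop band, so a 60 dB stop-band target also keeps the pass-band ripple far below the 1 dB allowed in the table.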
The audio files were then converted again into raw files using Switch Audio File Converter software. To generate a VoIP simulation with different extents of packet loss, the raw data were encoded using simulation software (a voice engine demonstration application) in conjunction with a modern iPCMwb codec (0.1–8 kHz; Global IP Solutions, San Francisco, CA, USA).
Parameters of the voice over Internet protocol (VoIP) simulations.
Condition | Description | Loss rate | Loss length | BurstR | Frame length | q
0 | Perfect | 0.0 | 1.0 | | |
1 | Mild loss (5%) | 0.05 | 1.15 | 1.1 | 640 | 0.87
2 | Medium loss (10%) | 0.10 | 1.30 | 1.2 | 640 | 0.77
3 | Severe loss (20%) | 0.20 | 2.0 | 1.43 | 640 | 0.5
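The loss rate, loss length, and q columns above are consistent with a two-state (Gilbert) loss model, in which q is the probability of leaving the loss state; for conditions 1 to 3 the tabulated q equals 1 / loss length to two decimals. Whether the simulation software used exactly this model is an assumption. The following sketch, with hypothetical function names, generates a loss pattern from a target loss rate and mean loss length and then measures both quantities in the result.

```python
import random
from typing import List, Tuple


def gilbert_loss_pattern(loss_rate: float, mean_loss_length: float,
                         n_packets: int, seed: int = 0) -> List[bool]:
    """Return a list in which True marks a lost packet (two-state Gilbert model)."""
    if loss_rate <= 0.0:
        return [False] * n_packets
    q = 1.0 / mean_loss_length              # P(lost -> received)
    p = loss_rate * q / (1.0 - loss_rate)   # P(received -> lost), so p / (p + q) = loss_rate
    rng = random.Random(seed)
    lost = False
    pattern = []
    for _ in range(n_packets):
        if lost:
            lost = rng.random() >= q        # stay in the loss state with probability 1 - q
        else:
            lost = rng.random() < p         # enter the loss state with probability p
        pattern.append(lost)
    return pattern


def loss_statistics(pattern: List[bool]) -> Tuple[float, float]:
    """Return (observed loss rate, mean length of consecutive-loss bursts)."""
    bursts, run = [], 0
    for lost in pattern:
        if lost:
            run += 1
        elif run:
            bursts.append(run)
            run = 0
    if run:
        bursts.append(run)
    rate = sum(pattern) / len(pattern)
    mean_burst = sum(bursts) / len(bursts) if bursts else 0.0
    return rate, mean_burst


# Loss rate and loss length taken from conditions 1 to 3 in the table.
for rate, length in ((0.05, 1.15), (0.10, 1.30), (0.20, 2.0)):
    observed = loss_statistics(gilbert_loss_pattern(rate, length, n_packets=100_000))
    print(rate, length, observed)
```

The longer mean loss length in the severe condition means that losses cluster into bursts, which is harder to conceal than the same number of isolated losses.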
We simulated 4 different scenarios: 1 scenario without packet loss and 3 scenarios with packet losses of 5%, 10%, and 20%. In
To simulate conventional telephone audio quality, we implemented a PSTN G.711 A-Law codec, which is a standard in PSTNs, in the Switch Audio File Converter software
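G.711 A-law transmits speech sampled at 8 kHz with 8 bits per sample, using logarithmic companding so that quiet samples are quantized more finely than loud ones. The sketch below uses the continuous A-law formula with A = 87.6 followed by uniform 8-bit quantization. It is an approximation for illustration: the standard itself defines a segmented, bit-exact encoding, and the helper names here are hypothetical.

```python
import math

A = 87.6  # A-law compression parameter


def alaw_compress(x: float) -> float:
    """Map a linear sample in [-1, 1] to the companded domain [-1, 1]."""
    ax = min(abs(x), 1.0)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))                    # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)


def alaw_expand(y: float) -> float:
    """Inverse of alaw_compress."""
    ay = min(abs(y), 1.0)
    if ay < 1.0 / (1.0 + math.log(A)):
        x = ay * (1.0 + math.log(A)) / A
    else:
        x = math.exp(ay * (1.0 + math.log(A)) - 1.0) / A
    return math.copysign(x, y)


def quantize_8bit(y: float) -> int:
    """Uniformly quantize a companded value to a signed 8-bit code."""
    return max(-127, min(127, round(y * 127)))


def roundtrip(x: float) -> float:
    """Compress, quantize to 8 bits, and expand again."""
    return alaw_expand(quantize_8bit(alaw_compress(x)) / 127)


for sample in (0.001, 0.01, 0.1, 0.5, -0.9):
    print(sample, roundtrip(sample))
```

Because the quantization step is uniform in the companded domain, the relative error stays roughly constant across loud and quiet samples in the logarithmic segment.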
All 5 audio CDs (4 VoIP simulations and 1 PSTN quality simulation) were reproduced using an active loudspeaker system (Genelec Type 1030 A). They were calibrated in the free sound field using a Type 2636 measuring amplifier and a Type 4133 FF measuring microphone connected to a Type 2619 preamplifier (all from Brüel & Kjær Sound & Vibration Measurement A/S, Nærum, Denmark). We used a Pistonphone Brüel & Kjær 4288 precision calibrator to calibrate the measurement arrangement. The final measurements showed no difference in the sound pressure levels of speech signals across different audio signals.
We used a 2-tailed Wilcoxon matched-pairs signed rank test to compare the scores obtained under different VoIP versus telephone quality simulations. For the condition with no packet loss (condition 0,
Across all test groups, speech perception scores assessed with different VoIP qualities versus conventional telephone quality were higher in 39 out of 60 test conditions (
Cochlear implant users showed improved speech perception using VoIP in 19 out of 20 test conditions (
Hearing aid users had improved speech perception scores with different VoIP qualities in half of the test conditions (
Normal-hearing adults showed an average benefit of 20.8% (range 1%–53%) with different VoIP qualities under half of the conditions (
When we experimentally increased VoIP packet loss from 0% to 10%, speech perception scores dropped only mildly (
Mean differences in speech perception scores (D%) assessed with different voice over Internet protocol qualities (degree of packet loss) versus conventional telephone quality using the Hochmair-Schulz-Moser (HSM) Sentence Test.
Participant group | SNR^a (dB) | None (0%): D% | None (0%): P | Mild (5%): D% | Mild (5%): P | Medium (10%): D% | Medium (10%): P | Severe (20%): D% | Severe (20%): P
CI^b users | 0 | +2 | NA^c | +1 | NA | +2 | NA | +2 | NA
CI users | 5 | +21 | .002^d | +11 | .01^d | +9 | .008^d | +1 | .81
CI users | 10 | +36 | .001^d | +34 | .002^d | +30 | .002^d | +4 | .04^e
CI users | 15 | +36 | .001^d | +23 | .01^e | +25 | .006^d | +6 | .25
CI users | Quiet | +16 | .001^d | +16 | .002^d | +16 | .002^d | –1 | .92
HA^f users | 0 | +4 | .16 | +2 | .69 | +0 | >.99 | –1 | .02^e
HA users | 5 | +18 | .002^d | +4 | .56 | +6 | .38 | –6 | .16
HA users | 10 | +16 | .003^d | +9 | .08 | +4 | .32 | –12 | .02^e
HA users | 15 | +6 | .004^d | –4 | .65 | –7 | .25 | –17 | .004^d
HA users | Quiet | +2 | .16 | –1 | .31 | –1 | .56 | –7 | .004^d
NHA^g | 0 | +53 | .001^d | +46 | .002^d | +33 | .002^d | +22 | .002^d
NHA | 5 | +17 | .003^d | +16 | .002^d | +16 | .004^d | –3 | .49
NHA | 10 | +1 | .21 | +3 | .06 | +1 | .47 | –6 | .009^d
NHA | 15 | 0 | NA | 0 | NA | –1 | NA | –1 | >.99
NHA | Quiet | 0 | NA | 0 | NA | –1 | NA | –1 | NA
a Signal to noise ratio.
b Cochlear implant.
c Not applicable.
d Statistically significant with Bonferroni correction.
e Statistically significant without Bonferroni correction.
f Hearing aid.
g Normal-hearing adults.
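The P values above come from a 2-tailed Wilcoxon matched-pairs signed rank test on groups of 10 participants. As a sketch of how such a value can be computed exactly for a small sample, the function below enumerates the null distribution of the rank sum. The authors' statistical software, any normal approximation it applied, and the number of comparisons used for the Bonferroni correction are not stated, so `n_comparisons` and the example scores are purely hypothetical.

```python
from typing import Sequence


def wilcoxon_signed_rank_exact(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided exact P value for paired samples (zero differences are dropped)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Average ranks of |d|, stored doubled so that tied ranks remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2   # 2 * mean of ranks (i + 1) .. (j + 1)
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    total2 = sum(ranks2)
    # counts[s] = number of sign assignments whose doubled positive-rank sum is s.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    extreme = min(w_plus2, total2 - w_plus2)
    tail = sum(counts[: extreme + 1])
    return min(1.0, 2.0 * tail / 2 ** n)


def bonferroni_significant(p_value: float, n_comparisons: int,
                           alpha: float = 0.05) -> bool:
    """Compare a P value against the Bonferroni-adjusted threshold alpha / n_comparisons."""
    return p_value < alpha / n_comparisons


# Hypothetical percent-correct scores for 10 listeners (not study data).
voip = [62, 71, 55, 80, 47, 66, 90, 58, 73, 69]
pstn = [50, 65, 40, 79, 45, 58, 81, 54, 63, 50]
p = wilcoxon_signed_rank_exact(voip, pstn)
print(p, bonferroni_significant(p, n_comparisons=4))
```

With 10 pairs that all differ in the same direction, the exact two-sided P value is 2/2^10, or about .002, which is the smallest value this exact test can return for a group of this size.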
Speech perception scores assessed with the HSM Sentence Test are plotted against different signal to noise ratios (SNRs) for cochlear implant users (A), hearing aid users (C), and normal-hearing adults (E) for 4 different VoIP qualities (0%, 5%, 10%, and 20% packet loss) and 1 ideal conventional telephone quality. The impact of different network conditions with increasing data packet loss (x-axis) on word discrimination scores is shown for different SNRs in B, D, and F. The black triangle indicates the speech perception level corresponding to a conventional telephone with a constant and stable transmission. VoIP = voice over Internet protocol.
The present study confirmed that simulations of Internet versus conventional telephony quality are associated with improved speech perception by hearing-impaired and normal-hearing adults under perfect network conditions without packet loss or delay in the laboratory. The advantage for cochlear implant users and normal-hearing adults persists, even when the VoIP quality is reduced by 5% and 10% packet loss. Similarly, hearing aid users also scored higher with VoIP than with conventional telephony, but the differences did not reach statistical significance.
In general, speech perception scores assessed with VoIP quality remained good up to a packet loss of 10%. Interestingly, VoIP simulation under severely adverse network conditions (with a packet loss of 20%) was not inferior to a perfect conventional telephone simulation in the majority of test conditions and for most of the tested participants.
Our research group earlier showed the superiority of VoIP versus conventional telephony simulations under perfect network conditions [
Cochlear implant users had the lowest speech perception scores of all 3 tested groups in our study (
The second reason for the advantage of VoIP is the conservation of high audio quality through digital signal processing using the chosen iPCMwb codec. In the first study by our group, speech perception scores that were assessed with VoIP quality were equal to scores obtained with frequency-restricted (0.1–8 kHz), uncompressed audio CD quality [
Speech perception becomes more challenging as competing noise increases, that is, as the SNR decreases. Elderly hearing-impaired individuals with predominantly high-frequency hearing loss are particularly affected in noisy test conditions: in addition to the interference from competing noise, the high-frequency content of speech is missing. VoIP may be helpful because it transmits the high-frequency content of speech and because it offers the possibility of presenting the speech signal simultaneously to both ears through external loudspeakers, thereby allowing binaural hearing, which is a well-known advantage for speech perception in noise. Additionally, wired or wireless links enabling binaural hearing from 1 telephone signal are already available for hearing aids and cochlear implants.
The measurement of packet loss under real VoIP transmission is a challenge for many VoIP companies because there is no constant data transmission over the Internet [
A decade ago, when VoIP telephony was less developed, the average packet loss across a large number of measurement traces was reported to be below 8% (p < 0.08 and q > 0.8) [
It can therefore be assumed that telecommunication using VoIP should substantially improve speech perception compared with conventional telephony under real network conditions, since the benefit of VoIP was measurable for most of the test participants up to an experimental packet loss of 10% in our study. A packet loss of more than 10% has a significant impact on sound quality with tone bursts, interruptions, extended time delay, and jitter of the audio signal. This is shown in
The calculations and models of packet loss depend on the measuring method used [
To our knowledge, no other group has assessed the speech perception of hearing-impaired individuals using Internet telephony under adverse network conditions. The results of the present study therefore fill an important gap. Measuring packet loss under controlled laboratory conditions offers the opportunity to systematically address a highly variable phenomenon in the real network.
Many technical parameters that may further influence speech perception by the end user have not been addressed in the present study. Every conversation can be disturbed when data packets arrive late to the receiver [
Our test results may be important for hearing-impaired individuals, including hearing aid and cochlear implant users, because there is now strong experimental evidence for real improvement in speech perception when using VoIP instead of conventional telephones. The study is also important for physicians, audiologists, cochlear implant engineers, speech therapists, and other professionals who care for hearing-impaired individuals. Professionals should encourage hearing-impaired individuals to try VoIP software, which can typically be downloaded at no cost from most providers. Patients who already own a computer may therefore be able to gain the benefits of VoIP without additional expense. Patients should also be told that external loudspeakers connected to the computer may further improve speech perception by permitting bilateral hearing and additional amplification through the volume control. Hearing aid and cochlear implant accessories, such as an FM transmitter and 3.5-mm audio jack, may also be helpful for coupling the computer directly to the hearing device. Patients should be advised that both sender and receiver need a good microphone and loudspeaker system to benefit from VoIP’s broadband advantage over conventional telephony.
Speech perception by hearing-impaired individuals and normal-hearing adults is improved when using perfect VoIP versus perfect conventional telephony transmission under controlled laboratory conditions. The superiority of VoIP persists even under experimental adverse network conditions, but not to the same extent and not for all tested individuals. Cochlear implant users seem to benefit more than hearing aid users because their devices are better suited to exploit VoIP’s broadband frequency range.
CD: compact disc
HSM: Hochmair-Schulz-Moser
PSTN: public switched telephone network
SNR: signal to noise ratio
VoIP: voice over Internet protocol
The authors would like to thank Christoph Schmid, engineer, for technical support and Aisha Spring for editing the figures. The authors would like to thank Global IP Solutions (GIPS, San Francisco, CA, USA) for providing their simulation software and the iPCMwb codec. We are very grateful to Michael Poppler and Tina le Grand, GIPS engineers, for their suggestions regarding the manuscript.
Financial support for this study was obtained from the foundation Stiftung für die Erforschung von Hör- und Sprachstörungen, Bern.
None declared.