The articulation index, or, as it is now known, the speech intelligibility index, was originally developed by early telephone engineer-scientists to describe and predict the quality of telephone circuits (Fletcher, 1921; Collard, 1930; French and Steinberg, 1947). Initially their motivation was to provide a method that would reduce the need for lengthy (and expensive) human articulation tests to evaluate the merit of telephone circuit modifications. However, by 1947, with the appearance of the watershed papers of French and Steinberg (1947) and Beranek (1947), it was evident that articulation theory had the potential for much greater significance and broader application than was implied by these early goals.
The term articulation index (AI) was coined at the Bell Telephone Laboratories and first appeared in company memos as early as 1926 (French, 1926). It replaced the term quality index that Harvey Fletcher (1921) had proposed earlier. The new term better reflected the relationship between the index and what were then called articulation tests. Articulation tests were speech tests in which listeners were asked to identify speech sounds spoken by a caller under conditions of interest to the experimenter. The exact makeup of the articulation tests varied, but they usually consisted of a carrier sentence with a nonsense syllable test item at the end. Several callers would in turn utter the sentences via the test circuit to crews of six to eight listeners. The listeners would record what they heard phonetically. The circuit articulation was equal to the average proportion of the sounds (or syllables) heard correctly by the listeners. The articulation index was devised as an alternative to this procedure.
The articulation index describes the proportion of the total importance-weighted speech signal that is audible to the listener under specified conditions. Normally, the index's value (ranging from 0 to 1) is derived from physical measurements, including principally the speech signal's intensity level, the level of any noise that may be present, and the characteristics of the transmission system that delivers the speech and noise to the listener's ear. Reverberation effects are sometimes included. Index values are often modified based on certain well-known performance characteristics of the human auditory system. Most commonly these include pure-tone thresholds and cross-frequency band spread of masking. The negative effects of listening at high signal levels might also be included, as well as positive factors, such as the effect of the listener being able to see the talker's face. The articulation index value can be used directly as an indicator of the relative quality of a communications system, or, through the use of an appropriate transfer function, it can be used to predict average speech recognition success for particular types of speech or speech elements under specified listening conditions.
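The core of this definition can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any published standard: the band importance weights and levels are hypothetical, and the 15 dB speech-peak offset and 30 dB dynamic range are assumed values of the kind commonly cited for such calculations. Threshold, spread-of-masking, and level-distortion corrections are omitted.

```python
from typing import Sequence


def band_audibility(speech_db: float, noise_db: float,
                    peak_offset_db: float = 15.0,
                    dynamic_range_db: float = 30.0) -> float:
    """Fraction of the speech dynamic range in one band that lies above the noise.

    peak_offset_db and dynamic_range_db are assumed illustrative constants.
    The result is clipped to the range 0..1.
    """
    snr_db = speech_db - noise_db
    fraction = (snr_db + peak_offset_db) / dynamic_range_db
    return min(1.0, max(0.0, fraction))


def audibility_index(speech_db: Sequence[float],
                     noise_db: Sequence[float],
                     importance: Sequence[float]) -> float:
    """Importance-weighted sum of band audibilities, ranging from 0 to 1."""
    if not (len(speech_db) == len(noise_db) == len(importance)):
        raise ValueError("all band sequences must have the same length")
    total = sum(importance)
    if total <= 0:
        raise ValueError("importance weights must sum to a positive value")
    # Normalize the weights so that fully audible speech yields an index of 1.
    return sum((w / total) * band_audibility(s, n)
               for s, n, w in zip(speech_db, noise_db, importance))


# Hypothetical five-band example (levels in dB, weights sum to 1).
example_importance = [0.10, 0.20, 0.30, 0.25, 0.15]
example_speech = [60.0, 62.0, 58.0, 52.0, 45.0]
example_noise = [50.0, 50.0, 50.0, 50.0, 50.0]
example_index = audibility_index(example_speech, example_noise, example_importance)
print(round(example_index, 3))
```

With these made-up inputs the index is about 0.685. Predicting a speech recognition score from it would still require a transfer function appropriate to the speech material, which this sketch does not attempt.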
The early history of development of the articulation index is difficult to follow with certainty because much of this work was done at the Bell Telephone Laboratories and was described only in internal company memos and reports. Most was not published in the scientific literature for 10–25 years after it was carried out. Some was never published. Fortunately, copies of many of the original documents from this early period recently became available on a compact disk through the efforts of C. M. Rankovic and J. B. Allen (Rankovic and Allen, 2000).
The first credible effort to evaluate the effectiveness of a communication system based on physical measurements was that of H. Fletcher (1921). In an unpublished Western Electric Laboratory report, Fletcher pointed out that a suitable measure (index) of circuit quality must have the property of additivity. By this he meant that if a particular frequency range of speech heard alone has a quality value of Q1 and if a second frequency range has a value Q2, then the value of both ranges heard at the same time should equal Q1 + Q2. It was clear that articulation test scores did not possess this property (i.e., A1 + A2 ≠ A12). Therefore, Fletcher proposed an intermediate variable, related to articulation, that would at least approximate this additivity property. He also proposed methods to derive the index's value for a circuit from measures of received speech intensity, frequency distortion, room and line noise, “asymmetric distortion,” and other factors (Fletcher, 1921).
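The additivity requirement can be written out as a short worked example. The exponential link between the index and the articulation score used below is an illustrative assumption chosen to show how a transformed variable can be additive when the scores themselves are not; it is not taken from Fletcher's 1921 report, and the score values are made up.

```latex
% Requirement on the index: contributions of separate bands add.
\[
  Q_{12} = Q_1 + Q_2 , \qquad \text{whereas in general} \qquad A_{12} \neq A_1 + A_2 .
\]
% Assumed illustrative link (c > 0 is an arbitrary constant):
\[
  A = 1 - e^{-cQ}
  \quad\Longrightarrow\quad
  Q = -\tfrac{1}{c}\,\ln(1 - A) .
\]
% Additivity of Q is then equivalent to multiplication of the error proportions:
\[
  1 - A_{12} = e^{-c(Q_1 + Q_2)} = (1 - A_1)(1 - A_2) .
\]
% Hypothetical scores: A_1 = 0.6 and A_2 = 0.7.
\[
  A_1 + A_2 = 1.3 \;(\text{impossible for a proportion}), \qquad
  A_{12} = 1 - (0.4)(0.3) = 0.88 .
\]
```

Under this assumed link, the scores combine sensibly only after transformation to the additive variable, which is the role the intermediate index plays.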
The term articulation index first appeared in the published literature in the landmark paper by Bell Telephone Laboratories scientists N. R. French and J. C. Steinberg in 1947. It had, however, been used in internal Bell Labs documents as a replacement for quality index as early as 1926. The expression quality theory nevertheless continued in use for several more years before finally being replaced by articulation theory.
In a parallel development in England, John Collard, working for International Telephone and Telegraph, published a detailed description of an index with properties similar to those of the articulation index in a series of papers beginning in 1930. He called his index band articulation (or frequently just “the new unit”). Speech test scores were called sound articulation. By 1939 Collard had created a mechanical band articulation calculator (Pocock, 1939) that included many of the features of more recent articulation index methods. It is unclear whether the Bell Telephone Laboratories scientists were aware of Collard's work; it is not cited in either their published or unpublished reports.
During World War II, Fletcher, then director of physical research at Bell Telephone Laboratories and chairman of the National Defense Research Committee (NDRC), provided a description of at least some of the articulation index methods that had been developed at Bell Labs to Leo Beranek, then at Harvard University (Allen, 1996). Beranek was working on methods for improving communications for aircraft pilots as a part of the war effort under a contract from the NDRC. Allen (1996) reports that following the war, Beranek persuaded the Bell Labs group to finally publish a description of their work on the articulation index. The classic 1947 paper by N. R. French and J. C. Steinberg was the result. This paper was soon followed by Beranek's frequently cited paper on the articulation index (Beranek, 1947). Together, these two papers were to play highly influential roles in the future of the articulation index.
In 1950 Fletcher and his long-time Bell Labs associate Rogers Galt published a detailed description of their conception of the articulation index. Their goal in this paper was to cover a broader range of conditions than had ever been attempted before. “Telephony” was defined by the authors as referring to “any talker-listener combination” (p. 90). The results were seen as applicable to sound recording and reproduction systems, public address systems, and even hearing aids. In 1952, Fletcher further extended the applications to persons with hearing loss. Unfortunately, Fletcher and Galt's attempt to account for all aspects of the problem resulted in a conceptualization and method that was too complex for most people to understand, and it was seldom used. Recently, there has been a renewed interest in the method (Rankovic, 1997, 1998). A computer program by H. Müsch (2001) that implements the procedure now makes its practical use feasible.
The next major steps in the history of the articulation index were taken by Karl Kryter. In two landmark papers (Kryter, 1962a, 1962b), he described and validated a comprehensive method for calculating the articulation index under a broad range of conditions. This work was based directly on the publications of Beranek (1947) and French and Steinberg (1947). Kryter's methods became even more influential in 1969 when they were adopted, virtually intact, by the American National Standards Institute (ANSI) as the American National Standard Methods for the Calculation of the Articulation Index (ANSI S3.5-1969). For some 30 years this method quite literally defined the articulation index.
An important development taking place in Europe during this period was that of the speech transmission index by T. Houtgast and H. J. M. Steeneken (Houtgast and Steeneken, 1973; Steeneken and Houtgast, 1980). Similar in many ways to the articulation index of French and Steinberg (1947), the speech transmission index added a unique method for incorporating and combining the effects of noise and reverberation into the calculation through measurements of the modulation transfer function (MTF). At first called the weighted MTF, the speech transmission index has found relatively widespread use in the area of architectural acoustics in an abbreviated form known as the rapid speech transmission index, or RASTI.
In 1997 ANSI published the American National Standard Method for Calculation of the Speech Intelligibility Index (ANSI S3.5-1997). This renamed standard is the direct successor of articulation index standard ANSI S3.5-1969 and it is similar to the earlier standard in basic concept. However, there are many differences in detail. One of the more obvious is the change in the name of the index to speech intelligibility index. The new name finally severs the connection with the now obsolete term, articulation test, and also avoids confusion between its abbreviation, AI, and the newer but more general use of this abbreviation to mean artificial intelligence.
Of the procedural differences, one of the more fundamental ones concerns the frequency importance function. The frequency importance function describes the relative importance of different frequency regions along the frequency scale for speech intelligibility. ANSI S3.5-1997 provides a function for average speech, but it also provides different functions for particular types of speech. This change from the earlier standard implies that frequency importance is significantly dependent on the characteristics of the speech material and not only dependent on the characteristics of the auditory system. In addition, ANSI S3.5-1997 provides for calculations based on the auditory critical band, uses newer methods for calculating spread of masking, includes a speech level distortion factor, uses different speech spectra for raised, loud, and shouted speech, and provides for use of the modulation transfer function methods of Steeneken and Houtgast (1980). To ensure accuracy, a computer program implementing the S3.5-1997 method was made available. This program lacks a user-friendly interface but provides an invaluable test of other implementations of the method.