User:Keffy/Transcription essay

Transcription types and the IPA

This essay is about the International Phonetic Alphabet and the most appropriate type of transcription to use in Wiktionary's pronunciation entries. I wrote it in response to a number of misconceptions that I've heard repeated, ever since the day I got here, by several people whose opinions are bang on in almost every other facet of lexicography.

I suppose it's inevitable. The concepts of linguistics have been poorly explained to the general public at the best of times, and it doesn't help that the idealistic hopes of phoneticians in the 1880s (which, like so many other hopes of 19th-century science, never panned out because the world just doesn't work that way) have continued to be presented as what phonetics is all about. Hopefully, this essay will clear up some of the confusions before we make any more policy decisions based on those confusions.

The short version of the following is:

Narrow phonetic transcriptions would be a nightmare to use in pronunciation entries.
Broad or phonemic transcriptions are the right level of generalization, which is why all other dictionaries use them.
It's not only possible to do phonemic transcriptions in IPA -- it's what the IPA was invented for.
Fortunately, we're mostly already doing that. We just need to stop misusing terms.

Why narrow phonetic transcriptions are inappropriate

I just now spoke the word Canada in a sentence. [Imagine the audio here.] Using the full arsenal of the IPA, I could make a "general phonetic" or "impressionistic" transcription (the IPA's terms, not mine) of my utterance as follows:

[k̟ʰæ̠͋ːn̞ɐ̝ɾ̬ɞ̟̆]
(It's not actually the full arsenal, since I have no reliable way to stack multiple diacritics and have you actually see them, or to type the vertical-staff tone letters that I'd love to have used for relative pitch levels. But this is ugly enough to make my point.)

Should I run off and add this to our article on Canada? Of course not.

The practical problem: It's utterly useless. Absolutely nobody would benefit from seeing that mess. Even an experienced phonetician would need to put in a lot of work to decipher what on earth I meant.

The conceptual problem: A "general phonetic" transcription is only possible for a single utterance or "speech event". If I were to say Canada again, there would be some details that are different the second time around. If my sister said it once, it would be slightly different again. If I grabbed a random English speaker from the streets of my city and had them say it once, there'd be more differences still. We certainly don't want a separate transcription for every single time the word has been spoken in the history of the universe. We want something more general, something that abstracts away from minor differences and shows what all the utterances have in common. But what counts as a "minor" difference? And exactly what do all these pronunciations have in common? At the "general phonetic" level, nothing relevant or useful. We're stuck.

We're faced with a ton of variation in one "sound" in actual speech -- variation that for many purposes (including ours) is completely irrelevant.

random variation: When people say the last vowel of Canada, their tongues will be in slightly different positions in their mouths each time, with little rhyme or reason. I could use a bunch of different phonetic transcriptions to capture as many of the subtle variations as I can: [ɘ], [ɜ], [ɵ], [ɐ̝], [ɛ̆], [ɛ̽], [a̽], [ɪ̈], [ɯ̙], [ɯ̽], and many, many more. But what I want is to ignore all of that and to write them all as [ə].
contextual variation: My brain is firmly convinced that there's a "T" sound in a number of different words: tie, still, tree, straw, water, battle, hat, eighth. But what my mouth is actually doing is systematically different in each: [tʰ], [t], [ʈʰ], [ʈ], [ɾ], [tˡ], [ʔt], and [t̪]. Worse, the pattern doesn't always yield a single result -- I usually say something close to [ˈwɑːɾə˞], but sometimes it's [wɑːʈə˞], sometimes [wɑːʈʰə˞]. Putting all those differences into my transcriptions would help nobody. It misses the point that, for the purposes dictionary users are interested in, all those piddly variants count as the same thing.

So how do we justify ignoring the piddly variants and writing something that's psychologically "the same thing" the same way every time? How do we decide which details are important enough and which aren't? How do we make sure we stay consistent about it?

For the record, here are some bad justifications: I'm too busy to worry about all of that. They all kinda sound similar to me. It just doesn't seem important enough. It doesn't really count as POV if I ignore the way my sister (or my friend, or the entire state of Alabama) says it and transcribe it the way I say it -- um, at least the way I think I usually say it. I said Canada a hundred times and counted: I used an [ɐ̝] 12% of the time and the next runner-up was [ɛ̽] with 7%, so I'll just write [ɐ̝] for all of them.

Contrast as a criterion for including details

The only tried-and-true justification for not caring about the piddly variants is that they don't change the meaning. [ˈwɑːɾə˞], [wɑːʈə˞], and [wɑːʈʰə˞] are all names for the same liquid. [ˈkænɜɾɘ], [ˈkænɜɾɜ], [ˈkænɜɾɵ], [ˈkænɜɾɐ̝], [ˈkænɜɾɛ̆], and the rest are all names for the same country. The different sounds don't contrast, so for many purposes I get to ignore the differences and write them all the same way. Dictionary pronunciation entries are one of those lucky purposes.

Pronunciation entries need to give their users just enough information so that they can pronounce the word they intend to pronounce, rather than some other word, possibly resulting in acute embarrassment. The user wanting to pronounce water only needs to make sure they don't accidentally say walker, waller, waiter, potter, and so on. Pronunciation entries that provide exactly the level of detail necessary to do this are ... [drumroll] ... phonemic transcriptions. (Also known as a "broad transcription", since you can never have too many terms for the same thing.)

In a phonemic transcription system, you choose one symbol to represent the whole set of detailed sounds that are interchangeable. Since the set is a mental category, not an actual sound, there's no reason you couldn't choose your symbol to be an integer or a Zapf dingbat or a Chinese character or even an AHD kludge. Sane linguists, however, choose an IPA symbol that is reminiscent of some of the members of the set and put it between slashes to emphasize that this is an even higher level of generalization. For the mess of [tʰ], [t], [ʈʰ], [ʈ], [ɾ], [tˡ], [ʔt], and [t̪], it makes sense to choose the symbol /t/. Getting rid of the piddly ugly details in [k̟ʰæ̠͋ːn̞ɐ̝ɾ̬ɞ̟̆], I can instead write /ˈkænədə/. Others might even have a hope in hell of reading that and understanding it.
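To make the "one symbol for the whole set" idea concrete, here's a minimal Python sketch. The table and the function name broad are hypothetical, made up purely for illustration; the table covers only the /t/ variants listed above, and it assumes the narrow transcription has already been chopped into segments by hand. Real phonemic analysis can't be a context-free lookup table: in many North American accents [ɾ] also turns up for /d/ (as in ladder), so which phoneme a variant belongs to depends on the word, not just on the sound.

```python
# Hypothetical allophone table: the narrow ("general phonetic") /t/ variants
# from this essay, each mapped to the one phoneme symbol chosen for the set.
T_VARIANTS = {
    "tʰ": "t",   # tie
    "t": "t",    # still
    "ʈʰ": "t",   # tree
    "ʈ": "t",    # straw
    "ɾ": "t",    # water (though [ɾ] can also belong to /d/, as in ladder)
    "tˡ": "t",   # battle
    "ʔt": "t",   # hat
    "t̪": "t",    # eighth
}


def broad(segments: list[str]) -> str:
    """Collapse a pre-segmented narrow transcription into a phonemic one.

    Segments missing from the table (including the stress mark) are passed
    through unchanged, on the simplifying assumption that they are already
    phoneme symbols.
    """
    return "/" + "".join(T_VARIANTS.get(seg, seg) for seg in segments) + "/"


# Three of my pronunciations of "water" collapse to the same broad form.
for narrow in (["ˈ", "w", "ɑː", "ɾ", "ə˞"],
               ["ˈ", "w", "ɑː", "ʈ", "ə˞"],
               ["ˈ", "w", "ɑː", "ʈʰ", "ə˞"]):
    print(broad(narrow))  # prints /ˈwɑːtə˞/ each time
```

The point of the sketch is just the direction of the mapping: many narrow variants in, one symbol out. Deciding what goes into the table in the first place is the hard part, and that's exactly what the contrast criterion is for.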

Judiciously choosing your symbols at this stage will buy you a whole lot of convenience later on. The following are all useful and entirely kosher considerations when choosing between potential symbols for a phoneme:

Pick a symbol whose narrow, "general phonetic" meaning is one of the range of variants, or at least somewhere in the neighbourhood.
Pick something easy to write or type.
Pick something easy to read.
Pick the same symbol that others have previously used for that sound in that language.

You can't succeed on all of these at the same time. (You can't even succeed on just the "be consistent with your predecessors" one in English, since they aren't even consistent with themselves.) Any reasonable choice you make in balancing these goals is legitimate, as long as you explain it explicitly to your readers and as long as you proceed to use it consistently.

I don't happen to agree with every single symbol choice that the Wiktionary community has made (or accidentally stumbled into). But the system as a whole is workable. It's far more important that the system be used consistently than for it to live up to my every aesthetic foible. Arguing about which way up the /r/ or /ɹ/ should be is a waste of time, even if I personally prefer /ɹ/.

Where do we go from here?

The good news is that existing Wiktionary pronunciation entries are predominantly phonemic. The slashes around our entries are usually not lies.

So our two choices are:

admit that we're already doing phonemic transcriptions and should continue to for a ton of good reasons; fix a large, but not unmanageable, number of entries where transcriptions aren't currently phonemic; communicate all interesting non-phonemic detail using the appropriate no-phonetics-necessary technology of audio files.
try to be unique among the dictionaries of the English-speaking world; claim not to be phonemic; deliberately aim to record irrelevant phonetic details; update the overwhelming majority of entries by adding those irrelevant phonetic details along with a tangle of very narrow region tags; get into edit wars over whether the region tags are right; get into constant arguments over the inherently undecidable question of which irrelevant phonetic details are so irrelevant they should be ignored; explain the rationale for all this to every sensible newbie who comes along and starts entering phonemic transcriptions -- and do all of this mountain of extra work with an under-trained and woefully under-sized pool of volunteers.

If the community consensus is honestly in favour of #2, could we at least try to minimize the damage to Wiktionary's credibility by agreeing to use the technical terms "phonemic" and "phonetic" in the same way as they're used in the disciplines where they're technical terms?

Some objections anticipated, caricatured, and responded to

Below I anticipate a few objections and preemptively try to defuse them. (If you think you recognize yourself in one of the objections, it's not just you. I've heard these several times from several people on Wiktionary, Wikipedia, and elsewhere.)

For those who may be tempted to think I'm making it all up about IPA, I'm including liberal quotations from the latest publication of the guidelines -- Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet; Cambridge University Press, 1999.

(The "Principles" are the organization's frequently revised two-page mini-mission-statement, found in appendix 1. Everything else is from the 36-page introduction that tries to explain the principles.)

IPA is only for phonetic transcriptions, not phonemic transcriptions.

Absolutely not.

The very first principle of the original 1888 Principles of the IPA was: "There should be a separate sign for each distinctive sound; that is, for each sound which, being used instead of another, in the same language, can change the meaning of a word." In short, the main original purpose was precisely to do phonemic transcriptions.

In the following century, it became increasingly clear that the phonemic transcription goal was incompatible with other stated goals of the IPA (such as universal meanings for symbols). Instead of jettisoning one goal or another, the IPA evolved into a flexible system which could be used to transcribe speech at several different levels of generalization, where different goals are relevant. The current Principles acknowledge the continuing importance of phonemic transcription: "The use of symbols in representing the sounds of a particular language is usually guided by the principles of phonological contrast." Much of the handbook's introduction is devoted to the issues involved in using IPA for phonemic transcriptions.

But an IPA symbol has to mean exactly the same sound in every language.

Outdated marketing hype. Get over it.

In the nineteenth century, the IPA wanted to do several things at the same time, without realizing the goals were incompatible. Since then, they've learned to live with doing incompatible things at different times. "Same sound = same symbol" is no longer an absolute for every use of the IPA.

Here's the current handbook, illustrating the more flexible interpretation of "same sound = same symbol" (and implicitly admitting that it's always been more honoured in the breach):

"When a symbol is said to be suitable for the representation of sounds in two languages, it does not necessarily mean that the sounds in the two languages are identical. Thus [p] is shown as being suitable for the transcription of pea in English, and also for pis in French; similarly [b] is shown as being suitable for the transcription of bee in English, and also for bis in French; but the corresponding sounds are not the same in the two languages. The IPA has the resources for denoting the differences, if it is necessary to do so, as illustrated below in section 4; but at a more general level of description the symbols can be used as a representation in either language." (p. 18)

But it's still wrong to use IPA symbols in a different way from how they show up on the chart, even in a phonemic transcription.

No. It's a time-honoured and accepted practice.

Here's Daniel Jones, the most famous phonetician of the first half of the twentieth century, advocating the practice more eloquently than I could hope to:

"The right solution of the problem of international phonetic transcription for language learners is, I believe, that the number of letters should be as small as practicable, and that each letter should have an elastic value. Among the values which each letter can represent is naturally a 'cardinal' one. But it should, in my opinion, be a recognized principle that when a letter is not needed to denote a 'cardinal' sound, it may be used for representing other sounds having reasonably near relationships to that cardinal sound. And cases may occur where it is convenient to assign to a given letter other values than the cardinal whether or not the cardinal sound occurs as a member of a phoneme of the language to be transcribed." (page 224, The Phoneme: Its Nature and Use, 3rd edition, Heffer and Sons, 1967 [originally 1950].)

Here's the current IPA handbook discussing how to write subtle phonetic details (the discussion it promised in the passage quoted above), and in the process confirming the normal practice of people who are not fussing about details:

"In providing the means to show the detail of phonetic realization in a given language, the IPA also achieves the delicacy of notation needed to compare the phonetic detail of different languages. For instance, although a phonemic representation /tru/ might be suitable for the English word true or the French word trou, the difference in pronunciation of the two words is reflected in phonetically more detailed representations such as [t̠ɹ̥ʉ] (true) and [t̪ʁ̥u] (trou)." (p. 28)

Oh no! The dreaded right-side-up /r/ being used for English by the IPA handbook itself!

(And keep in mind that even [t̠ɹ̥ʉ] and [t̪ʁ̥u] represent only tiny subsets of the delicate details you could -- do you really still believe you should? -- record.)

But I really want to look down on dictionary X for using the IPA differently from me.

Go right ahead. I certainly do. Just accept that it's an entirely aesthetic judgment. It's got nothing to do with the rules.

"There can be many systems of phonemic transcription for the same variety of a language, all of which conform fully to the principles of the IPA." (p. 30)

"The IPA does not provide a phonological analysis for a particular language, let alone a single 'correct' transcription, but rather the resources to express any analysis so that it is widely understood." (p. 30)

But a phonemic transcription applies to several dialects at the same time.

No. Strictly, a phonemic transcription applies only to the mental system of a single speaker. If several speakers share isomorphic mental systems, you can get away with using the same transcription.

The cool part is that there are many regions of the English-speaking world where most speakers' mental systems are close enough to being isomorphic that the same phonemic transcription can be usable for several dialects/accents, assuming you make your symbol choices judiciously.

For example, a DC bureaucrat might pronounce ban as [bæˑn]. A younger Canadian or Californian might say [ba̟n]. A teenager in a Detroit suburb might say [be̞ə̯n]. But the system of meaning contrasts is the same for each of them -- ban is different from Ben and Bonn and bun, etc. So if I've chosen /æ/ as my symbol for the DCer's phoneme that includes [æˑ], /æ/ for the Canadian or Californian's phoneme that includes [a̟], and /æ/ for the Detroit teenager's phoneme that includes [e̞ə̯], then I can simply put /bæn/ as the "North American" pronunciation entry on the ban page, and it's maybe 90% true -- not to mention more useful than finding "(Canada, western) [ba̟n]; (parts of N.E. US) [be̞ə̯n]".
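Here's the same trick as a rough Python sketch, assuming (hypothetically) that we hand-build one small table per speaker mapping their narrow vowel in ban to the shared symbol /æ/. The names SPEAKER_TABLES, NARROW_BAN, and phonemic are made up for illustration. Note what the sketch does not check: whether the three speakers' systems of contrast really are isomorphic. That's the precondition for sharing one entry, and it has to be established by comparing words like ban, Ben, Bonn, and bun, not by any lookup table.

```python
# Hypothetical per-speaker tables: each maps that speaker's narrow vowel in
# "ban" to the phoneme symbol chosen for it. Other segments pass through.
SPEAKER_TABLES = {
    "DC bureaucrat": {"æˑ": "æ"},
    "younger Canadian or Californian": {"a̟": "æ"},
    "Detroit-suburb teenager": {"e̞ə̯": "æ"},
}

# The narrow transcriptions of "ban" from the essay, segmented by hand.
NARROW_BAN = {
    "DC bureaucrat": ["b", "æˑ", "n"],
    "younger Canadian or Californian": ["b", "a̟", "n"],
    "Detroit-suburb teenager": ["b", "e̞ə̯", "n"],
}


def phonemic(speaker: str, segments: list[str]) -> str:
    """Map one speaker's narrow segments to phoneme symbols using their own table."""
    table = SPEAKER_TABLES[speaker]
    return "/" + "".join(table.get(seg, seg) for seg in segments) + "/"


broad_forms = {phonemic(s, segs) for s, segs in NARROW_BAN.items()}
print(broad_forms)  # prints {'/bæn/'}: one entry serves all three speakers
```

If one speaker's table had to send their vowel to some other symbol (because it merges with, or splits from, a different set of words), the set would contain more than one form, and we'd be back to regionally tagged entries.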

However, this was only possible because all the speakers had nearly identical mental systems of contrasts between their vowels. When their systems of contrasts aren't isomorphic, as they aren't for me and Widsith (to pick a random British Wiktionarian), the trick won't work and we'll need different regionally-tagged phonemic transcriptions. The ability to perform mildly trans-dialectal transcriptions is a wonderful accident when it happens, and we should opportunistically seize on every one of them. But it's neither a goal of phonemic transcription nor, alas, perfectly possible.

But shouldn't we be trying to cover all the dialects in a single transcription?

Maybe. Probably not.

It wouldn't be an unworthy goal for a dictionary to try to base its pronunciation entries on what Daniel Jones called "diaphones". But both the conceptual and practical problems are enormous. The most obvious practical problem is that doing it right requires more expertise in English dialectology than any single editor will ever likely have.

Take a quick look at the "lexical sets" that Jimp has been adding to Appendix:List of dialect-independent homophones, or at Wikipedia's w:Lexical_set. We'd need a different vowel symbol for each of those sets. The entry for bath would need a vowel symbol -- "a₃" perhaps? -- that's different from both the one in trap (since it doesn't have that vowel in many British dialects) and the one in father (since it doesn't have that vowel in most N.A. dialects). Suddenly almost every word you thought you knew the pronunciation of has become ambiguous. Sure, ban has the /æ/ phoneme in my dialect, but knowing for sure whether it belongs in the TRAP set or in the BATH set (and therefore knowing which transdialectal symbol should be used) would require me to know more about British dialects than I'm sure I do. I'd be even more lost at deciding between the LOT, PALM, CLOTH, and THOUGHT sets, and don't even get me started on NORTH vs. FORCE.

I'm not against transdialectal transcriptions, but IMHO they're a far more ambitious project than we're equipped to handle. We're already struggling with phonemic transcriptions.

But your way means we'll need a separate key for every language/dialect we do.

Sorry, but yeah, it does mean that. Of course, we'd need language-specific keys anyway, out of pure decency to our readers. Right? [insistently] Right?

IPA isn't magic. It might be great if users could bring a casual knowledge of IPA to bear language-neutrally and successfully decode any language's pronunciation entries, but it's not going to happen. Here's Daniel Jones again, elaborating on the passage quoted earlier:

"Stated in other terms the principle amounts to this: a particular letter cannot have precisely the same value in all the languages in which it is used; the learner of any particular language must familiarize himself with the value (or values when the phoneme has more than one member) assigned to it in that language." (p. 225)

Even the IPAssoc admits that casual universal inter-operability is a pipe-dream. The current Principles again:

"7. A transcription always consists of a set of symbols and a set of conventions for their interpretation. Furthermore, the IPA consists of symbols and diacritics whose meaning cannot be learned entirely from written descriptions of the phonetic categories involved. The Association strongly recommends that anyone intending to use the symbols should receive training in order to learn how to produce and recognize the corresponding sounds with a reasonable degree of accuracy." (p. 170)

If you honestly think it's fine to tell casual Wiktionary users to go away and take a year of university-level phonetics courses before they'll have the right to understand our transcriptions, Microsoft's department for user-experience quality control may have a job for you. Feel free to list me as a reference.

Meanwhile we need user-friendly keys, preferably with several audio examples of each phoneme in different word-positions and different accents.

But including subtle phonetic details that don't change meaning would help ESL students improve their accents.

Extremely unlikely. For this to happen, it would require:

us to transcribe the subtle details correctly and consistently on every single page. This just plain isn't going to happen. We're having a hard enough time making sure our broad transcriptions aren't howlingly inaccurate in the already tiny fraction of articles that have pronunciation entries.
readers to know enough IPA and phonetic theory to understand and use the gory details. Somewhat likelier, but there are few readers with the skills to do it, and vanishingly few of those who couldn't also predict the details on their own at least as accurately as our editors. That's a mighty small target audience to justify redundancies, inconsistencies, and inaccuracies on a massive scale.

Audio recordings are the appropriate medium to communicate subtle phonetic details to ESL users and others.

But gory phonetic details are interesting. And We're Not Paper™.

I agree 100%.

And since We're Not Paper™, we have no excuse to tie ourselves to a bad application of a paper-based knowledge representation system in a domain that paper was always and will always be inherently inappropriate for.

We have audio.