Orthography

The Orthography tier in Phon encodes the spoken form of an utterance. Phon's orthography format is based on the CHAT transcription system used by TalkBank and CHILDES. The CHAT main line maps directly to Phon's Orthography tier, with one difference: CHAT embeds media segment timing on the main line, while Phon stores media segments in a separate Segment tier.

Within the Orthography tier, words are separated by spaces. Each word may be modified using the prefixes and suffixes defined below. Utterances end with a terminator (e.g., period, question mark). Events, annotations, and other coding may also appear inline.

For the complete CHAT specification, see the CHAT Manual.

Words

Words are the basic units of the Orthography tier. A word is a series of characters separated by spaces. The first word of an utterance is not capitalized unless it is a proper noun or a word normally capitalized on its own (e.g., "I" in English, nouns in German).

Word Prefixes

Word prefixes identify words with special status. Prefixed words appear in a secondary color in the Transcript view to distinguish them from regular word text.

Prefix  Category                                      Example
0       word omission (excluded from word alignment)  0word
&+      fragment (incomplete word)                    &+ba
&-      filler                                        &-um
&~      nonword                                       &~ba
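
As an illustration only, the prefix table can be expressed as a small lookup. The sketch below is Python and is not part of Phon: the function name split_prefix and the category labels are invented for this example, and it assumes a prefix only counts when word text follows it.

```python
from typing import Optional, Tuple

# Prefixes from the table above; the labels are shorthand chosen for this sketch.
WORD_PREFIXES = {
    "&+": "fragment",
    "&-": "filler",
    "&~": "nonword",
    "0": "omission",
}


def split_prefix(token: str) -> Tuple[Optional[str], str]:
    """Return (category, bare_word); category is None for an unprefixed word."""
    for prefix, category in WORD_PREFIXES.items():
        # Assumption: a prefix only counts when some word text follows it.
        if token.startswith(prefix) and len(token) > len(prefix):
            return category, token[len(prefix):]
    return None, token


print(split_prefix("&+ba"))   # ('fragment', 'ba')
print(split_prefix("0word"))  # ('omission', 'word')
print(split_prefix("house"))  # (None, 'house')
```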

Special Form Markers

Special form markers identify words that are not found in standard dictionaries or have some special status. They are placed at the end of a word after the @ symbol. Special form markers appear in a secondary color in the Transcript view to distinguish them from regular word text.

Suffix  Category                   Example
@a      addition                   xxx@a
@b      babbling                   abame@b
@c      child-invented form        gumma@c
@d      dialect form               younz@d
@e      echolalia                  want@e more@e
@f      family-specific form       bunko@f
@fp     filled pause               um@fp
@g      general special form       gonga@g
@i      interjection               uhhuh@i
@k      multiple letters (kana)    abcd@k
@l      letter                     b@l
@n      neologism                  breaked@n
@nv     no voice                   ha@nv
@o      onomatopoeia               woofwoof@o
@p      phonology consistent form  aga@p
@q      quoted metareference       no@q
@sas    sign & speech              apple@sas
@si     singing                    lalala@si
@sl     signed language            apple@sl
@t      test word                  wug@t
@u      UNIBET transcription       binga@u
@wp     word play                  goobarumba@wp
@x      words to be excluded       stuff@x
@z:*    user-defined code          word@z:rtfd
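
A rough Python sketch of separating a word from its special form marker follows. It is not Phon's parser. The name split_special_form is hypothetical, the marker set is copied from the table, and the regular expression accepts a colon-separated code after any marker even though the table documents one only for @z.

```python
import re
from typing import Optional, Tuple

# Markers listed in the table above ("z" takes a user-defined code after a colon).
SPECIAL_FORM_MARKERS = {
    "a", "b", "c", "d", "e", "f", "fp", "g", "i", "k", "l", "n", "nv",
    "o", "p", "q", "sas", "si", "sl", "t", "u", "wp", "x", "z",
}

# word text, "@", marker letters, then an optional ":" plus a code.
_FORM_RE = re.compile(r"^(?P<word>[^@\s]+)@(?P<marker>[a-z]+)(?::(?P<code>\S+))?$")


def split_special_form(token: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (word, marker, code); marker is None when no listed marker is found."""
    m = _FORM_RE.match(token)
    if m is None or m["marker"] not in SPECIAL_FORM_MARKERS:
        return token, None, None
    return m["word"], m["marker"], m["code"]


print(split_special_form("wug@t"))        # ('wug', 't', None)
print(split_special_form("word@z:rtfd"))  # ('word', 'z', 'rtfd')
```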

Untranscribed Material

These special words represent material that cannot be transcribed normally. Untranscribed words should be aligned with an asterisk (*) in the IPA Target and IPA Actual tiers.

Code  Meaning
xxx   unintelligible speech
yyy   unintelligible with phonological coding on %pho line
www   untranscribed material

Incomplete Words

When a word is incomplete but the intended meaning is clear, the missing material is enclosed in parentheses within the word: (be)cause, sit(ting). The word is treated as the complete form for analysis.
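
To make the convention concrete, the following Python sketch derives both readings of an incomplete word. The function names are invented here, and the pattern assumes the omitted material consists of letters only, which also keeps pause tokens such as (.) and (1.5) from being rewritten.

```python
import re

# Parenthesized letters inside a word. Pauses such as (.) or (1.5) contain
# no letters, so they are left untouched.
_OMITTED_RE = re.compile(r"\(([^\W\d_]+)\)")


def full_form(word: str) -> str:
    """Form used for analysis: keep the omitted material, drop the parentheses."""
    return _OMITTED_RE.sub(r"\1", word)


def produced_form(word: str) -> str:
    """What was actually said: drop the omitted material entirely."""
    return _OMITTED_RE.sub("", word)


print(full_form("(be)cause"))      # because
print(produced_form("sit(ting)"))  # sit
```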

Compound Words and Clitics

Two types of word concatenation are supported:

  • + joins compound words, e.g., bird+house
  • ~ joins clitics to their host word, e.g., is~n't
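
A short Python sketch of splitting a joined word into its parts is given below. The helper name is hypothetical, and it assumes the token is an ordinary word: prefixes such as &+ and the special terminators also contain these characters and would need to be filtered out beforehand.

```python
import re
from typing import List, Tuple


def split_components(word: str) -> Tuple[List[str], List[str]]:
    """Return (components, joiners) for a compound or cliticized word."""
    # The capturing group makes re.split keep the joiners, so components
    # and joiners alternate in the resulting list.
    pieces = re.split(r"([+~])", word)
    return pieces[0::2], pieces[1::2]


print(split_components("bird+house"))  # (['bird', 'house'], ['+'])
print(split_components("is~n't"))      # (['is', "n't"], ['~'])
```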

Language Specification

Words from a secondary language are marked with @s followed by a language code, e.g., istenem@s:hu for a Hungarian word in an English transcript. Language codes follow the ISO 639 standard.

Utterance Terminators

Every utterance must end with a terminator. The three basic terminators are the period, question mark, and exclamation point. Special terminators begin with + and end with a basic terminator.

Terminator  Name                        Description
.           period                      end of a declarative utterance
?           question                    end of a question
!           exclamation                 end of an imperative or emphatic utterance
+.          broken for coding           utterance broken at a phrasal boundary to mark overlap
+...        trail off                   incomplete utterance where speaker trails off
+..?        trail off question          question that trails off
+!?         question with exclamation   question spoken with amazement
+/.         interruption                utterance interrupted by another speaker
+/?         interruption question       question interrupted by another speaker
+//.        self interruption           speaker breaks off and starts a new utterance
+//?        self interruption question  question where speaker self-interrupts
+"/.        quotation next line         quoted material follows on next line
+".         quotation precedes          quoted material preceded this utterance
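
Because every special terminator ends in a basic one, a reader of the tier has to test the longest candidates first. The Python sketch below shows one way to do that. It is an illustration, not Phon's implementation: split_terminator is a hypothetical name, the terminators are written without internal spaces, and the utterance is assumed to carry nothing (such as a postcode) after its terminator.

```python
from typing import Optional, Tuple

# Terminators from the table above, mapped to their names.
TERMINATORS = {
    ".": "period",
    "?": "question",
    "!": "exclamation",
    "+.": "broken for coding",
    "+...": "trail off",
    "+..?": "trail off question",
    "+!?": "question with exclamation",
    "+/.": "interruption",
    "+/?": "interruption question",
    "+//.": "self interruption",
    "+//?": "self interruption question",
    '+"/.': "quotation next line",
    '+".': "quotation precedes",
}


def split_terminator(utterance: str) -> Tuple[str, Optional[str]]:
    """Return (body, terminator); terminator is None when none is found."""
    text = utterance.rstrip()
    # Longest first, so "+..." is not mistaken for a plain period.
    for terminator in sorted(TERMINATORS, key=len, reverse=True):
        if text.endswith(terminator):
            return text[: -len(terminator)].rstrip(), terminator
    return text, None


print(split_terminator("I don't know ."))  # ("I don't know", '.')
print(split_terminator("and then +..."))   # ('and then', '+...')
```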

Utterance Linkers

Linkers appear at the beginning of an utterance to indicate how it connects to a preceding utterance. All linkers begin with +.

Linker  Name              Description
+"      quoted utterance  marks an utterance being directly quoted
+^      quick uptake      utterance follows immediately after previous speaker
+<      lazy overlap      utterance overlaps previous utterance (without specifying extent)
+,      self completion   completion of own utterance after interruption
++      other completion  completion of another speaker's utterance
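
Detecting a linker amounts to checking the first space-separated token. A minimal Python sketch, with a hypothetical helper name and assuming the linker is separated from the first word by a space:

```python
from typing import Optional, Tuple

# Linkers from the table above, mapped to their names.
LINKERS = {
    '+"': "quoted utterance",
    "+^": "quick uptake",
    "+<": "lazy overlap",
    "+,": "self completion",
    "++": "other completion",
}


def split_linker(utterance: str) -> Tuple[Optional[str], str]:
    """Return (linker, rest); linker is None when the utterance has none."""
    text = utterance.strip()
    first, _, rest = text.partition(" ")
    if first in LINKERS:
        return first, rest.lstrip()
    return None, text


print(split_linker("+< I know ."))  # ('+<', 'I know .')
print(split_linker("I know ."))     # (None, 'I know .')
```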

Pauses

Unfilled pauses between words are coded with parentheses containing periods or a duration. Pauses are included in word alignment and should be aligned with the corresponding pause in the IPA Target and IPA Actual tiers.

Notation  Length                   Example
(.)       simple (short pause)     I don't (.) know .
(..)      long pause               I don't (..) know .
(...)     very long pause          (...) what do you think ?
(1.5)     numeric (exact seconds)  I don't (0.15) know .

Numeric pauses may include minutes using a colon, e.g., (1:05.15) for one minute and 5.15 seconds.
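
The pause notations can be read mechanically. The Python sketch below classifies a pause token and converts numeric pauses to seconds. It is illustrative only: parse_pause is a hypothetical name, and the pattern also accepts whole-second values such as (2), which is an assumption rather than something stated above.

```python
import re
from typing import Optional, Tuple

SYMBOLIC_PAUSES = {"(.)": "short", "(..)": "long", "(...)": "very long"}

# Optional "minutes:" part, then seconds with an optional decimal fraction.
_NUMERIC_PAUSE_RE = re.compile(r"^\((?:(\d+):)?(\d+(?:\.\d+)?)\)$")


def parse_pause(token: str) -> Optional[Tuple[str, Optional[float]]]:
    """Return (kind, seconds) for a pause token, or None if it is not a pause."""
    if token in SYMBOLIC_PAUSES:
        return SYMBOLIC_PAUSES[token], None
    m = _NUMERIC_PAUSE_RE.match(token)
    if m is None:
        return None
    minutes = int(m.group(1)) if m.group(1) else 0
    return "numeric", minutes * 60 + float(m.group(2))


print(parse_pause("(..)"))   # ('long', None)
print(parse_pause("(1.5)"))  # ('numeric', 1.5)
```

With this reading, (1:05.15) comes out as 60 seconds plus 5.15 seconds.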

Events

Events describe actions, sounds, or other non-word occurrences that happen at a specific point within an utterance.

Simple Events (Happenings)

Simple events use the &= prefix to mark actions and sounds such as coughs, laughs, and sneezes: &=laughs, &=coughs, &=sneezes.

Events may include an object after a colon: &=imit:motor, &=points:car.

Interposed Words

An interposed word from another speaker is marked with &* followed by the speaker's three-letter ID, a colon, and the word:

when I was at my friend's house &*MOT:mhm the dog tried to lick me .
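
Simple events and interposed words both start with an ampersand, so a reader of the tier has to tell them apart. A Python sketch under stated assumptions: the function name is invented, and speaker IDs are taken to be three uppercase letters.

```python
import re
from typing import Optional, Tuple

# &=action or &=action:object
_EVENT_RE = re.compile(r"^&=(?P<action>[^:\s]+)(?::(?P<object>\S+))?$")
# &*SPK:word (assumption: the three-letter speaker ID is uppercase)
_INTERPOSED_RE = re.compile(r"^&\*(?P<speaker>[A-Z]{3}):(?P<word>\S+)$")


def classify_ampersand_token(token: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return ('event', action, object) or ('interposed', speaker, word), else None."""
    m = _EVENT_RE.match(token)
    if m:
        return "event", m["action"], m["object"]
    m = _INTERPOSED_RE.match(token)
    if m:
        return "interposed", m["speaker"], m["word"]
    return None


print(classify_ampersand_token("&=imit:motor"))  # ('event', 'imit', 'motor')
print(classify_ampersand_token("&*MOT:mhm"))     # ('interposed', 'MOT', 'mhm')
```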

Long Events

Events spanning multiple words use begin/end markers:

  • Vocal: &{l=laughs ... &}l=laughs
  • Nonvocal: &{n=waving:hands ... &}n=waving:hands

Scoped Annotations

Scoped annotations are enclosed in square brackets and refer to stretches of speech. When a stretch enclosed in angle brackets immediately precedes a scoped annotation, the annotation applies to everything inside the angle brackets. Without angle brackets, the annotation applies only to the single preceding word.

Example: <I wanted> [/] I wanted cereal .
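
The scoping rule can be sketched in Python as below. This is a simplified reading, not Phon's parser: scoped_annotations is a hypothetical name, nested angle brackets are not handled, and when several annotations follow one another only the first is attached to the preceding material.

```python
import re
from typing import List, Tuple

# Either an angle-bracketed group or a single word, followed by an annotation.
_SCOPED_RE = re.compile(
    r"(?:<(?P<group>[^<>]+)>|(?P<word>[^\s<>\[\]]+))\s*(?P<ann>\[[^\]]+\])"
)


def scoped_annotations(utterance: str) -> List[Tuple[str, List[str]]]:
    """Return (annotation, words_in_scope) pairs in order of appearance."""
    results = []
    for m in _SCOPED_RE.finditer(utterance):
        scope = m["group"] if m["group"] is not None else m["word"]
        results.append((m["ann"], scope.split()))
    return results


print(scoped_annotations("<I wanted> [/] I wanted cereal ."))
# [('[/]', ['I', 'wanted'])]
```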

Retracing and Repetition Markers

Marker  Name                       Description
[/]     retracing (repetition)     speaker repeats preceding material without change
[//]    retracing with correction  speaker repeats and corrects preceding material
[///]   retracing reformulation    complete reformulation of the message
[/?]    retracing unclear          type of retracing is uncertain
[/-]    false start                speaker abandons utterance and starts a new one

Stressing, Guessing, and Exclusion

Marker  Description
[!]     stressing of preceding word(s)
[!!]    contrastive stressing
[?]     best guess (transcription uncertain)
[e]     excluded material (omitted from analysis)

Group Annotations

Group annotations provide explanatory text within square brackets:

Notation   Name                       Example
[= text]   explanation (target word)  etymologist [= entomologist]
[=! text]  paralinguistics            that's mine [=! cries] .
[=? text]  alternative transcription  one or two [=? one too]
[% text]   inline comment             wouldn't [% said with emphasis] do that

Replacements

A replacement gives the standard form of the nonstandard word that precedes it:

  • [: text] — replacement (e.g., whyncha [: why don't you])
  • [:: text] — real-word replacement

Error Marking

Errors are marked by placing [*] after the error, optionally with an error code as [* text], e.g., goed [: went] [*] .

Duration

Duration of a preceding event or word can be marked as [# M:S.ms], e.g., [# 1:05.3].

Overlap Markers

Overlap markers indicate simultaneous speech between speakers:

  • [>] — overlap follows (material overlaps with next speaker)
  • [<] — overlap precedes (material overlaps with previous speaker)

When multiple overlaps occur in a single utterance, they are numbered: [>1], [<1], [>2], [<2].
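
As a small illustration, the overlap markers in an utterance can be collected with a regular expression. The Python below is a sketch with a hypothetical function name. An unnumbered marker is reported with index None.

```python
import re
from typing import List, Optional, Tuple

# "[>" or "[<", an optional index, then "]".
_OVERLAP_RE = re.compile(r"\[([<>])(\d*)\]")


def overlap_markers(utterance: str) -> List[Tuple[str, Optional[int]]]:
    """Return (direction, index) for each overlap marker, in order."""
    return [
        ("follows" if direction == ">" else "precedes", int(index) if index else None)
        for direction, index in _OVERLAP_RE.findall(utterance)
    ]


print(overlap_markers("<I know> [>1] but <wait> [<2] ."))
# [('follows', 1), ('precedes', 2)]
```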

Postcodes and Freecodes

  • [+ code] — postcode: utterance-level code placed after the terminator
  • [^ code] — freecode: marks a local event at the point of insertion

Separators and Tag Markers

Separators provide conventional punctuation within utterances. Unlike terminators, separators do not end the utterance. Tag markers indicate pragmatic function.

Symbol  Description
,       comma (pause, syntactic juncture)
;       semicolon
[^c]    clause delimiter
‡       vocative marker (double dagger, U+2021)
„       tag marker (double low quote, U+201E)

Quotation

Short quoted stretches within an utterance are enclosed in curly (typographic) quotation marks: “ (begin, U+201C) and ” (end, U+201D). For longer quoted material spanning multiple utterances, use the quotation terminators and linkers (+"/. and +").

Tone Direction Markers

Tone markers indicate intonation contour at the end of or within utterances:

Symbol      Direction
⇗ (U+21D7)  rising to high
↗ (U+2197)  rising to mid
→ (U+2192)  level
↘ (U+2198)  falling to mid
⇘ (U+21D8)  falling to low

Prosody Within Words

Prosodic features can be marked within words:

Symbol      Feature                      Example
:           drawl (lengthened syllable)  bana:nas
^           pause between syllables      rhi^noceros
ˈ (U+02C8)  primary stress               baˈna:nas
ˌ (U+02CC)  secondary stress             ˌbaˈna:nas
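
When the plain spelling of a word is needed, for example for a dictionary lookup, the four prosody symbols can be removed. A Python sketch with a hypothetical helper name. It assumes the input is an ordinary word, since colons also appear in other codes such as event objects.

```python
# Drawl, syllable pause, primary stress (U+02C8), secondary stress (U+02CC).
_PROSODY_MARKS = str.maketrans("", "", ":^\u02c8\u02cc")


def strip_prosody(word: str) -> str:
    """Return the word with in-word prosody symbols removed."""
    return word.translate(_PROSODY_MARKS)


print(strip_prosody("\u02ccba\u02c8na:nas"))  # bananas
print(strip_prosody("rhi^noceros"))           # rhinoceros
```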

Phonetic Groups

When multiple orthographic words map to a single phonetic transcription, they can be grouped using angle quotation marks: ‹ (U+2039) and › (U+203A). For example, ‹going to› represents two orthographic words with a single phonetic form.

Conversation Analysis (CA) Coding

Phon supports an extensive set of Conversation Analysis symbols for detailed transcription of speech features. These are specialized symbols used primarily in CA research.

CA Elements Within Words

The following Unicode symbols can appear within words to mark articulatory features:

Symbol  Unicode  Feature
≠       U+2260   blocked segments
∾       U+223E   constriction
∙       U+2219   inhalation
↓       U+2193   pitch down
↻       U+21BB   pitch reset
↑       U+2191   pitch up
⁑       U+2051   hardening
⤇       U+2907   hurried start
⤆       U+2906   sudden stop

CA Scope Delimiters

These symbols enclose a stretch of speech with a particular quality. The same symbol is used at both the beginning and the end of the stretch:

Symbol  Unicode  Quality
⁎       U+204E   creaky voice
∆       U+2206   faster
◉       U+25C9   louder
°       U+00B0   softer
∇       U+2207   slower
∬       U+222C   whisper
☺       U+263A   smile voice
∮       U+222E   singing
§       U+00A7   precise
↫       U+21AB   repeated segment
⁇       U+2047   unsure
♋       U+264B   breathy voice
▔       U+2594   high pitch
▁       U+2581   low pitch
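
Because the opening and closing delimiters are identical, a stretch can only be recovered by pairing occurrences in order. The Python sketch below does this for one symbol at a time. The function name is hypothetical, and treating an unpaired delimiter as an error is a choice made for this example.

```python
from typing import List


def delimited_stretches(text: str, symbol: str) -> List[str]:
    """Return the stretches of text enclosed by pairs of the given symbol."""
    pieces = text.split(symbol)
    # A balanced string has an even number of delimiters, hence an odd
    # number of pieces; the odd-indexed pieces lie inside a pair.
    if len(pieces) % 2 == 0:
        raise ValueError(f"unpaired {symbol!r} delimiter")
    return [piece.strip() for piece in pieces[1::2]]


print(delimited_stretches("he said \u00b0I don't know\u00b0 and left", "\u00b0"))
# ["I don't know"]
```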

CA Overlap Points

Precise overlap onset and offset can be marked within words using bracket-like symbols:

Symbol  Unicode  Position
⌈       U+2308   top start (first speaker overlap begins)
⌉       U+2309   top end (first speaker overlap ends)
⌊       U+230A   bottom start (second speaker overlap begins)
⌋       U+230B   bottom end (second speaker overlap ends)