Orthography
The Orthography tier in Phon encodes the spoken form of an utterance. Phon's orthography format is based on the CHAT transcription system used by TalkBank and CHILDES. The CHAT main line maps directly to Phon's Orthography tier, with one difference: CHAT embeds media segment timing on the main line, while Phon stores media segments in a separate Segment tier.
Within the Orthography tier, words are separated by spaces. Each word may be modified using the prefixes and suffixes defined below. Utterances end with a terminator (e.g., period, question mark). Events, annotations, and other coding may also appear inline.
For the complete CHAT specification, see the CHAT Manual.
Words
Words are the basic units of the Orthography tier. A word is a series of characters separated by spaces. The first word of an utterance is not capitalized unless it is a proper noun or a word normally capitalized on its own (e.g., "I" in English, nouns in German).
Word Prefixes
Word prefixes identify words with special status. Prefixed words appear in a secondary color in the Transcript view to distinguish them from regular word text.
| Prefix | Category | Example |
|---|---|---|
| 0 | word omission (excluded from word alignment) | 0word |
| &+ | fragment (incomplete word) | &+ba |
| &- | filler | &-um |
| &~ | nonword | &~ba |
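As an illustration of how the prefixes in the table above might be recognized programmatically, here is a minimal Python sketch. It is not part of Phon; the names `WORD_PREFIXES` and `classify_prefix` are hypothetical, and treating a bare `0` as a regular token is an assumption made for this example.

```python
# Hypothetical helper, not Phon's API: maps each prefix in the table to its category.
WORD_PREFIXES = {
    "&+": "fragment",
    "&-": "filler",
    "&~": "nonword",
    "0": "omission",
}

def classify_prefix(word):
    """Return (category, bare_word); category is "regular" when no prefix applies."""
    for prefix, category in WORD_PREFIXES.items():
        # Require at least one character after the prefix (assumption: a bare
        # "0" or "&+" is left alone and reported as a regular token).
        if word.startswith(prefix) and len(word) > len(prefix):
            return category, word[len(prefix):]
    return "regular", word
```

For example, `classify_prefix("&+ba")` separates the fragment marker from the word text `ba`.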
Special Form Markers
Special form markers identify words that are not found in standard
dictionaries or that have some other special status. Each marker is placed
at the end of a word, introduced by the @ symbol. Special form markers appear
in a secondary color in the Transcript
view to distinguish them from regular word text.
| Suffix | Category | Example |
|---|---|---|
| @a | addition | xxx@a |
| @b | babbling | abame@b |
| @c | child-invented form | gumma@c |
| @d | dialect form | younz@d |
| @e | echolalia | want@e more@e |
| @f | family-specific form | bunko@f |
| @fp | filled pause | um@fp |
| @g | general special form | gonga@g |
| @i | interjection | uhhuh@i |
| @k | multiple letters (kana) | abcd@k |
| @l | letter | b@l |
| @n | neologism | breaked@n |
| @nv | no voice | ha@nv |
| @o | onomatopoeia | woofwoof@o |
| @p | phonology consistent form | aga@p |
| @q | quoted metareference | no@q |
| @sas | sign & speech | apple@sas |
| @si | singing | lalala@si |
| @sl | signed language | apple@sl |
| @t | test word | wug@t |
| @u | UNIBET transcription | binga@u |
| @wp | word play | goobarumba@wp |
| @x | words to be excluded | stuff@x |
| @z:* | user-defined code | word@z:rtfd |
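The suffix table above can be read as a simple lookup. The following Python sketch shows one way a script might split a word from its special form marker; `SPECIAL_FORMS` and `split_special_form` are hypothetical names, not Phon functions, and the sketch deliberately leaves unrecognized suffixes (including the `@s` language marker) untouched.

```python
# Hypothetical lookup built from the suffix table; not Phon's own data structure.
SPECIAL_FORMS = {
    "a": "addition", "b": "babbling", "c": "child-invented form",
    "d": "dialect form", "e": "echolalia", "f": "family-specific form",
    "fp": "filled pause", "g": "general special form", "i": "interjection",
    "k": "multiple letters (kana)", "l": "letter", "n": "neologism",
    "nv": "no voice", "o": "onomatopoeia", "p": "phonology consistent form",
    "q": "quoted metareference", "sas": "sign & speech", "si": "singing",
    "sl": "signed language", "t": "test word", "u": "UNIBET transcription",
    "wp": "word play", "x": "words to be excluded",
}

def split_special_form(word):
    """Return (base_word, category); category is None if no known marker is found."""
    base, sep, marker = word.rpartition("@")
    if not sep or not base:
        return word, None
    if marker.startswith("z:") and len(marker) > 2:
        # @z:* carries a user-defined code after the colon.
        return base, "user-defined code"
    category = SPECIAL_FORMS.get(marker)
    if category is None:
        # Unknown suffix (e.g. a language marker): leave the word unchanged.
        return word, None
    return base, category
```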
Untranscribed Material
These special words represent material that cannot be transcribed
normally. Untranscribed words should be aligned with
* in IPA Target and IPA Actual tiers.
| Code | Meaning |
|---|---|
| xxx | unintelligible speech |
| yyy | unintelligible with phonological coding on %pho line |
| www | untranscribed material |
Incomplete Words
When a word is incomplete but the intended meaning is clear, the missing
material is enclosed in parentheses within the word:
(be)cause, sit(ting). The word is
treated as the complete form for analysis.
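To make the two readings of an incomplete word concrete, here is a small Python sketch. The function names are hypothetical, and the sketch assumes it is given a single word token, so pause codes such as (.) are not passed to it.

```python
import re

def complete_form(word):
    """Form used for analysis: keep the parenthesized material, drop the parentheses."""
    return word.replace("(", "").replace(")", "")

def spoken_form(word):
    """Form actually produced: drop the parenthesized (unspoken) material."""
    return re.sub(r"\([^()]*\)", "", word)
```

With `(be)cause`, `complete_form` yields `because` and `spoken_form` yields `cause`.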
Compound Words and Clitics
Two types of word concatenation are supported:
- + joins compound words, e.g., bird+house
- ~ joins clitics to their host word, e.g., is~n't
Language Specification
Words from a secondary language are marked with @s
followed by a language code, e.g., istenem@s:hu for a
Hungarian word in an English transcript. Language codes follow the ISO
639 standard.
Utterance Terminators
Every utterance must end with a terminator. The three basic terminators are
the period, question mark, and exclamation point. Special terminators begin
with + and end with a basic terminator.
| Terminator | Name | Description |
|---|---|---|
| . | period | end of a declarative utterance |
| ? | question | end of a question |
| ! | exclamation | end of an imperative or emphatic utterance |
| +. | broken for coding | utterance broken at a phrasal boundary to mark overlap |
| +... | trail off | incomplete utterance where speaker trails off |
| +..? | trail off question | question that trails off |
| +!? | question with exclamation | question spoken with amazement |
| +/. | interruption | utterance interrupted by another speaker |
| +/? | interruption question | question interrupted by another speaker |
| +//. | self interruption | speaker breaks off and starts a new utterance |
| +//? | self interruption question | question where speaker self-interrupts |
| +"/. | quotation next line | quoted material follows on next line |
| +". | quotation precedes | quoted material preceded this utterance |
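Because every special terminator ends with a basic terminator, a script that looks for terminators needs to test the longest candidates first. The Python sketch below illustrates this; `TERMINATORS` and `split_terminator` are hypothetical names, and the sketch assumes any postcodes following the terminator have already been removed.

```python
# Terminators from the table above (hypothetical constant, not part of Phon).
TERMINATORS = [
    ".", "?", "!", "+.", "+...", "+..?", "+!?",
    "+/.", "+/?", "+//.", "+//?", '+"/.', '+".',
]

def split_terminator(utterance):
    """Return (text_without_terminator, terminator), or (text, None) if none is found."""
    text = utterance.rstrip()
    # Longest first, so "+..." is not mistaken for a plain ".".
    for term in sorted(TERMINATORS, key=len, reverse=True):
        if text.endswith(term):
            return text[: -len(term)].rstrip(), term
    return text, None
```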
Utterance Linkers
Linkers appear at the beginning of an utterance to indicate how it connects
to a preceding utterance. All linkers begin with +.
| Linker | Name | Description |
|---|---|---|
| +" | quoted utterance | marks an utterance being directly quoted |
| +^ | quick uptake | utterance follows immediately after previous speaker |
| +< | lazy overlap | utterance overlaps previous utterance (without specifying extent) |
| +, | self completion | completion of own utterance after interruption |
| ++ | other completion | completion of another speaker's utterance |
Pauses
Unfilled pauses between words are coded with parentheses containing periods. Pauses are included in word alignment and should be aligned with the corresponding pause in IPA Target and IPA Actual tiers.
| Notation | Length | Example |
|---|---|---|
| (.) | simple (short pause) | I don't (.) know . |
| (..) | long pause | I don't (..) know . |
| (...) | very long pause | (...) what do you think ? |
| (1.5) | numeric (exact seconds) | I don't (0.15) know . |
Numeric pauses may include minutes using a colon, e.g.,
(1:05.15) for one minute and 5.15 seconds.
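The pause notations above can be told apart with a single regular expression. The following Python sketch is one possible reading of the format, not Phon's parser; `parse_pause` is a hypothetical name, and numeric pauses are converted to seconds.

```python
import re

# One to three periods, or an optional "minutes:" part followed by seconds.
_PAUSE = re.compile(r"^\((?:(\.{1,3})|(?:(\d+):)?(\d+(?:\.\d+)?))\)$")

def parse_pause(token):
    """Return ("symbolic", dot_count) or ("numeric", seconds); None if not a pause."""
    match = _PAUSE.match(token)
    if match is None:
        return None
    dots, minutes, seconds = match.groups()
    if dots:
        return ("symbolic", len(dots))
    return ("numeric", int(minutes or 0) * 60 + float(seconds))
```

Under this reading, `(1:05.15)` is reported as a numeric pause of roughly 65.15 seconds, and a token such as `(be)` is rejected.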
Events
Events describe actions, sounds, or other non-word occurrences that happen at a specific point within an utterance.
Simple Events (Happenings)
Simple events use the &= prefix to mark actions and
sounds such as coughs, laughs, and sneezes:
&=laughs, &=coughs,
&=sneezes.
Events may include an object after a colon:
&=imit:motor,
&=points:car.
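A simple event token therefore has two parts: the action and an optional object. The short Python sketch below shows how such a token could be taken apart; `parse_event` is a hypothetical helper, not a Phon function.

```python
def parse_event(token):
    """Return (action, object_or_None) for a simple event; None for other tokens."""
    if not token.startswith("&="):
        return None
    # Everything before the first colon is the action; the rest is the object.
    action, _, obj = token[2:].partition(":")
    if not action:
        return None
    return action, (obj or None)
```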
Interposed Words
An interposed word from another speaker is marked with
&* followed by the speaker's three-letter ID, a
colon, and the word:
when I was at my friend's house &*MOT:mhm the dog tried to lick me .
Long Events
Events spanning multiple words use begin/end markers:
- Vocal: &{l=laughs ... &}l=laughs
- Nonvocal: &{n=waving:hands ... &}n=waving:hands
Scoped Annotations
Scoped annotations are enclosed in square brackets and refer to stretches of speech. When angle brackets precede a scoped annotation, the annotation applies to the enclosed material. Without angle brackets, the annotation applies to the single preceding word.
Example: <I wanted> [/] I wanted cereal .
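The scoping rule can be sketched in Python as follows. This is an illustrative approximation with a hypothetical function name: it pairs each annotation with either the angle-bracketed group or the single word directly before it, and it does not handle several annotations chained after the same material.

```python
import re

# Either <group of words> or a single word, followed by a [bracketed] annotation.
_SCOPED = re.compile(r"(?:<([^<>]+)>|(\S+))\s+(\[[^\]]+\])")

def scoped_annotations(utterance):
    """Return a list of (scoped_material, annotation) pairs found in the utterance."""
    return [(group or word, annotation)
            for group, word, annotation in _SCOPED.findall(utterance)]
```

For the example above, the sketch pairs `I wanted` with `[/]`; for `that's mine [=! cries] .`, only `mine` is in scope.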
Retracing and Repetition Markers
| Marker | Name | Description |
|---|---|---|
| [/] | retracing (repetition) | speaker repeats preceding material without change |
| [//] | retracing with correction | speaker repeats and corrects preceding material |
| [///] | retracing reformulation | complete reformulation of the message |
| [/?] | retracing unclear | type of retracing is uncertain |
| [/-] | false start | speaker abandons utterance and starts a new one |
Stressing, Guessing, and Exclusion
| Marker | Description |
|---|---|
| [!] | stressing of preceding word(s) |
| [!!] | contrastive stressing |
| [?] | best guess (transcription uncertain) |
| [e] | excluded material (omitted from analysis) |
Group Annotations
Group annotations provide explanatory text within square brackets:
| Notation | Name | Example |
|---|---|---|
| [= text] | explanation (target word) | etymologist [= entomologist] |
| [=! text] | paralinguistics | that's mine [=! cries] . |
| [=? text] | alternative transcription | one or two [=? one too] |
| [% text] | inline comment | wouldn't [% said with emphasis] do that |
Replacements
A replacement substitutes a standard form for a nonstandard form on the preceding word:
- [: text] marks a replacement (e.g., whyncha [: why don't you])
- [:: text] marks a real-word replacement
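As a rough illustration of what a replacement means for analysis, the Python sketch below substitutes the standard form for the preceding word. It is a simplification with a hypothetical function name: it only handles replacements that apply to a single preceding word and leaves all other annotations in place.

```python
import re

# A word, whitespace, then "[: text]" or "[:: text]".
_REPLACEMENT = re.compile(r"(\S+)\s+\[::? ([^\]]+)\]")

def apply_replacements(utterance):
    """Return the utterance with each replaced word swapped for its standard form."""
    return _REPLACEMENT.sub(r"\2", utterance)
```

Applied to `whyncha [: why don't you] go ?`, the sketch returns `why don't you go ?`.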
Error Marking
Errors are marked by placing [* text] after the error,
e.g., goed [: went] [*] .
Duration
Duration of a preceding event or word can be marked as
[# M:S.ms], e.g., [# 1:05.3].
Overlap Markers
Overlap markers indicate simultaneous speech between speakers:
- [>]: overlap follows (material overlaps with next speaker)
- [<]: overlap precedes (material overlaps with previous speaker)
When multiple overlaps occur in a single utterance, they are numbered:
[>1], [<1],
[>2], [<2].
Postcodes and Freecodes
- [+ code]: postcode, an utterance-level code placed after the terminator
- [^ code]: freecode, which marks a local event at the point of insertion
Separators and Tag Markers
Separators provide conventional punctuation within utterances. Unlike terminators, separators do not end the utterance. Tag markers indicate pragmatic function.
| Symbol | Description |
|---|---|
| , | comma (pause, syntactic juncture) |
| ; | semicolon |
| [^c] | clause delimiter |
| ‡ (double dagger) | vocative marker |
| „ (double low quote) | tag marker |
Quotation
Short quoted stretches within an utterance are enclosed in curly
(typographic) quotation marks: “ (begin,
U+201C) and ” (end, U+201D). For longer
quoted material spanning multiple utterances, use the quotation
terminators and linkers (+"/. and
+").
Tone Direction Markers
Tone markers indicate intonation contour at the end of or within utterances:
| Symbol | Direction |
|---|---|
| ⇗ (U+21D7) | rising to high |
| ↗ (U+2197) | rising to mid |
| → (U+2192) | level |
| ↘ (U+2198) | falling to mid |
| ⇘ (U+21D8) | falling to low |
Prosody Within Words
Prosodic features can be marked within words:
| Symbol | Feature | Example |
|---|---|---|
| : | drawl (lengthened syllable) | bana:nas |
| ^ | pause between syllables | rhi^noceros |
| ˈ (U+02C8) | primary stress | baˈna:nas |
| ˌ (U+02CC) | secondary stress | ˌbaˈna:nas |
Phonetic Groups
When multiple orthographic words map to a single phonetic transcription, they
can be grouped using angle quotation marks: ‹
(U+2039) and › (U+203A). For example,
‹going to› represents two orthographic words
with a single phonetic form.
Conversation Analysis (CA) Coding
Phon supports an extensive set of Conversation Analysis symbols for detailed transcription of speech features. These are specialized symbols used primarily in CA research.
CA Elements Within Words
The following Unicode symbols can appear within words to mark articulatory features:
| Symbol | Unicode | Feature |
|---|---|---|
| ≠ | U+2260 | blocked segments |
| ∾ | U+223E | constriction |
| ∙ | U+2219 | inhalation |
| ↓ | U+2193 | pitch down |
| ↻ | U+21BB | pitch reset |
| ↑ | U+2191 | pitch up |
| ⁑ | U+2051 | hardening |
| ⤇ | U+2907 | hurried start |
| ⤆ | U+2906 | sudden stop |
CA Scope Delimiters
These symbols mark the beginning of a stretch of speech with a particular quality. The same symbol marks the end of the stretch:
| Symbol | Unicode | Quality |
|---|---|---|
| ⁎ | U+204E | creaky voice |
| ∆ | U+2206 | faster |
| ◉ | U+25C9 | louder |
| ° | U+00B0 | softer |
| ∇ | U+2207 | slower |
| ∬ | U+222C | whisper |
| ☺ | U+263A | smile voice |
| ∮ | U+222E | singing |
| § | U+00A7 | precise |
| ↫ | U+21AB | repeated segment |
| ⁇ | U+2047 | unsure |
| ♋ | U+264B | breathy voice |
| ▔ | U+2594 | high pitch |
| ▁ | U+2581 | low pitch |
CA Overlap Points
Precise overlap onset and offset can be marked within words using bracket-like symbols:
| Symbol | Unicode | Position |
|---|---|---|
| ⌈ | U+2308 | top start (first speaker overlap begins) |
| ⌉ | U+2309 | top end (first speaker overlap ends) |
| ⌊ | U+230A | bottom start (second speaker overlap begins) |
| ⌋ | U+230B | bottom end (second speaker overlap ends) |
