Groups
Phonex allows defining groups by placing any subpattern between the parenthesis metacharacters ( and ).
Some reasons to use groups:
- Repeat subpatterns
- Extract information for further processing
- Exclude part of the pattern from the final match
- Denote different possible subpatterns
Capture Groups
Capture groups are used to extract portions of matches for further processing. Groups are numbered left to right starting at 1 (group 0 is the entire match). In Phon this is often used to create a new column in query result listings containing the data matched by the group subpattern.
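The numbering convention is the same one used by most regular-expression engines. As a rough analogy only, the sketch below uses Python's `re` module rather than Phonex. The character classes `C` and `V` are assumed ASCII stand-ins for \c and \v, and the sample word is made up for illustration.

```python
import re

# Assumed toy stand-ins for the phonex matchers \c and \v (ASCII only).
C = "[bdgptkmnszlr]"
V = "[aeiou]"

# Two capture groups, numbered left to right starting at 1.
m = re.search(f"({C})({V})", "stop")

group0 = m.group(0)  # the entire match: "to"
group1 = m.group(1)  # first group, the consonant: "t"
group2 = m.group(2)  # second group, the vowel: "o"
```

In a Phon query report, the contents of groups 1 and 2 would correspond to the two extra columns described above.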
For example, say you are searching for any CV pattern (e.g., \c\v) but want the consonant and vowel
in their own separate columns in a Phon query report. You would place each phone matcher into a group using parentheses.
(The groups must also be named in this situation; see 'Group Names' below.)
E.g.
(\c)(\v)
Capture groups may be quantified. The following expression will match one or more consecutive consonant-vowel sequences:
(\c\v)+
Lookahead and Lookbehind Groups
Lookahead and lookbehind groups allow matching subpatterns around a pattern without including
the content matched by the lookahead or lookbehind group. These groups are considered to be
zero-width assertions (i.e., the length of matched content is zero) like the start-of-input ^
and end-of-input $ boundary matchers.
Lookahead patterns are contained within parentheses like regular groups, with the special prefix ?>.
An example of using a lookahead group would be to search for all consonants \c which
are followed by a high vowel {v, high}.
\c(?>{v, high})
Lookbehind patterns are specified by the group prefix ?<. They behave in the same manner as
lookahead groups, but look backwards in the input rather than forwards. An example would be to search
for all vowels \v which are preceded by a b.
(?<b)\v
Lookahead and lookbehind groups can be used together in the same pattern:
(?<\c)\v(?>\c)
This matches vowels between two consonants.
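For readers who know other regex engines, the zero-width behaviour can be sketched with Python's `re` module. This is an analogy, not Phonex: Python spells lookbehind `(?<=...)` and lookahead `(?=...)`, and the classes `C` and `V` are assumed ASCII stand-ins for \c and \v.

```python
import re

# Assumed toy stand-ins for the phonex matchers \c and \v (ASCII only).
C = "[bdgptkmnszlr]"
V = "[aeiou]"

# Vowel preceded by a consonant and followed by a consonant.
pattern = re.compile(f"(?<={C}){V}(?={C})")

# Each span covers only the vowel; the surrounding consonants are
# checked but are not part of the match.
spans = [m.span() for m in pattern.finditer("banana")]
vowels = pattern.findall("banana")
```

The final `a` of the sample word is not matched because no consonant follows it.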
Alternation
Alternation allows for choices within patterns. To specify choices, subpatterns in a group
are separated by the logical-or (or pipe) | metacharacter. The following example will match
the sequence bab as well as dib.
(ba|di)b
Alternation groups may be quantified.
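Because this example uses only literal phones, the same pattern text happens to be valid in Python's `re` module, which makes for a quick sanity check of the alternation logic. The word list and the quantified variant are assumptions added for illustration, and Phonex matching on IPA transcripts may differ in detail.

```python
import re

# Same pattern text as the phonex example; here it runs on plain strings.
pattern = re.compile(r"(ba|di)b")

candidates = ["bab", "dib", "dab", "bib"]
matched = [w for w in candidates if pattern.fullmatch(w)]

# A quantified alternation group: one or more of "ba" or "di", then "b".
quantified = re.compile(r"(ba|di)+b")
repeated_ok = quantified.fullmatch("badib") is not None
```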
Group Numbers
Groups in a phonex pattern are numbered left to right. Each open parenthesis ( metacharacter will increment the group index by 1 unless
the group is 'non-capturing' such as for lookbehind and lookahead groups. The following example pattern has two groups,
the first group includes both the consonant \c and vowel \v matchers; the second group includes only the
vowel \v matcher:
(\c(\v))
The next example also has two groups, as the lookbehind group is not included in group indexing:
(?<^\S)(\c(\v))
Phonex includes syntax to exclude a group from indexing (the group's content will not be stored). These groups are
called non-capturing or organizational groups. To exclude a group from indexing, the group content must start with ?=.
The following phonex pattern has two capturing groups: group 1 includes a syllable boundary \S, consonant \c, and
vowel \v matcher; group 2 includes just the vowel \v matcher. There is one non-capturing group
containing the consonant \c matcher.
(\S(?=\c)(\v))
Note that while the consonant is part of a non-capturing group, it is still included in the enclosing group's matched data.
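The numbering rule can be checked against an analogous pattern in Python's `re` module. Note one syntax difference that is easy to trip over: Python writes non-capturing groups as `(?:...)`, while `(?=...)` in Python is a lookahead. In this sketch the classes `C` and `V` and the use of `.` as a syllable boundary are assumptions made for illustration.

```python
import re

# Assumed toy stand-ins: ASCII classes for \c and \v, "." for \S.
C = "[bdgptkmnszlr]"
V = "[aeiou]"

# Outer group (1), non-capturing consonant group, inner vowel group (2).
pattern = re.compile(rf"(\.(?:{C})({V}))")
m = pattern.search("ba.na")

group_count = pattern.groups  # the non-capturing group is not counted
outer = m.group(1)            # still contains the consonant
inner = m.group(2)            # just the vowel
```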
Group Names
Capturing groups may also be named. To name a group the group content should start with the desired group name followed by an
equals = metacharacter. The group name must start with a letter and consist of only letters, numbers, and
underscore _. The following expression has two named groups; the first group name is 'onset'
and will match a consonant in the onset position \c:O; the second group name is 'nucleus'
and will match a vowel in the nucleus position \v:N.
(onset=\c:O)(nucleus=\v:N)
When used in Phon queries, named groups are added to result listings in a new column whose title
matches the phonex group name. The group name X is reserved in Phon queries to mark
the portion of the phonex pattern to be used as the query result.
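The way named groups turn into report columns can be pictured with Python's `re` module, where named groups are written `(?P<name>...)`. This is only an analogy: the classes `C` and `V` are assumed ASCII stand-ins, syllable positions (:O, :N) are not modelled, and the row dictionaries are a hypothetical representation of a result listing.

```python
import re

# Assumed toy stand-ins for the phonex matchers \c and \v (ASCII only).
C = "[bdgptkmnszlr]"
V = "[aeiou]"

pattern = re.compile(rf"(?P<onset>{C})(?P<nucleus>{V})")

# One row per match; the group names act as column titles.
rows = [m.groupdict() for m in pattern.finditer("banana")]
columns = list(pattern.groupindex)
```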
Back References
Back references are used to match a subpattern previously matched by a capture group. Back references compare the base phone character only; they do not match syllable constituent type or supplementary matcher information.
Numbered Back References
\n matches the same content previously captured by group n. The following pattern will match a consonant, store the value of the
matched consonant in group number 1, and then match the value of group 1 again (i.e., it will match repeated consonants).
(\c)\1
Named Back References
\{name} matches the same content captured by the named group. The following pattern will match a consonant, store the value of the matched consonant in a group named C1, and then match
the sequence stored in group C1.
(C1=\c)\{C1}
Group names are case sensitive, so in the above example \{c1} would result in an error, as there is no group named c1 with a lower-case c. Quantifiers may be applied to back references, but supplementary matchers are not allowed.
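Numbered and named back references behave much like their counterparts in Python's `re` module, where a named reference is written `(?P=name)`. The sketch below is an analogy under that assumption, with `C` as an assumed ASCII stand-in for \c and made-up sample words. It also shows that Python, like Phonex, treats group names as case sensitive.

```python
import re

# Assumed toy stand-in for the phonex matcher \c (ASCII only).
C = "[bdgptkmnszlr]"

numbered = re.compile(rf"({C})\1")
named = re.compile(rf"(?P<C1>{C})(?P=C1)")

by_number = numbered.search("hammer").group(0)
by_name = named.search("hammer").group(0)
no_repeat = numbered.search("hamper")  # no doubled consonant here

# Referring to the group with the wrong case is rejected at compile time.
try:
    re.compile(rf"(?P<C1>{C})(?P=c1)")
    wrong_case_accepted = True
except re.error:
    wrong_case_accepted = False
```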
Negated Back References
Prefix a back reference with ^ to match content different from the referenced group:
(\c)\{^1}
This matches two different consonants. Named negated references work the same way:
(C1=\c)\{^C1}
Advanced Group References
Ignore Diacritics
Prefix a group name with # to compare base characters only, ignoring all diacritics:
(C=\c)\{#C}
For example, if group C captured tʰ, then \{#C} matches t, tʰ, t̪, etc.
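A base-character comparison of this kind can be approximated outside Phon by stripping Unicode combining marks and modifier letters before comparing. The sketch below is an assumption about how such a comparison might look, not Phon's implementation: the helper names are hypothetical, and the Unicode categories used (Mn and Lm) are only a rough proxy for what Phon treats as a diacritic.

```python
import unicodedata


def strip_diacritics(phone: str) -> str:
    """Drop combining marks (Mn) and modifier letters (Lm) from a phone."""
    decomposed = unicodedata.normalize("NFD", phone)
    return "".join(
        ch for ch in decomposed
        if unicodedata.category(ch) not in ("Mn", "Lm")
    )


def same_base(captured: str, candidate: str) -> bool:
    """Compare two phones by base characters only."""
    return strip_diacritics(captured) == strip_diacritics(candidate)


aspirated_t = "t\u02b0"  # t followed by MODIFIER LETTER SMALL H
dental_t = "t\u032a"     # t followed by COMBINING BRIDGE BELOW
```

With these helpers, `same_base(aspirated_t, "t")` and `same_base(aspirated_t, dental_t)` both hold, while `same_base(aspirated_t, "d")` does not.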
Element Indexing
When a group captures multiple phones, use [n] to reference a specific element by index (0-based). Use [$] for the last element:
(O=\c+)\v\{O}[$]
This matches the last consonant of the onset group. Use [0] for the first element:
(O=\c+)\v\{O}[0]
Compound Phone Part Selection
When a group contains a compound phone (e.g., an affricate), use [n_] or [_n] to select a specific part:
| Syntax | Selects |
|---|---|
| \{V}[0_] | First part of compound phone at index 0 |
| \{V}[_0] | Second part of compound phone at index 0 |

(V={v}_{v})\{V}[0_]
This matches the first vowel of a captured diphthong.
Group Reference Feature Sets
Use {features({group})} to match a phone that has the same features as a captured group. The features portion is a dimension function: a string of characters that selects which phonological dimensions to compare against the referenced group. Only the features in the selected dimensions are checked; all other features are ignored.
(C=\c){PV({#C})}
This matches a phone whose primary place and voicing features match those of group C (ignoring diacritics).
Dimension Characters
Dimension functions are formed by combining one or more of the following characters. Characters can be combined in any order.
Consonant dimensions:
| Character | Dimension | Features included |
|---|---|---|
| P | Primary place | labial, coronal, dorsal, lingual, anterior, posterior, guttural |
| p | Secondary place | bilabial, labiodental, interdental, alveolar, alveopalatal, retroflex, palatal, velar, uvular, pharyngeal, laryngeal, epiglottal, dental, apical, laminal, distributed, grooved, subapical, velopharyngeal |
| M | Primary manner | obstruent, nasal, liquid, glide, approximant, continuant, sonorant |
| m | Secondary manner | stop, affricate, fricative, nasal, oral, lateral, rhotic, click, implosive, flap, trill, ejective, prenasalized, strident, quasiresonant, semiresonant, raspberry, transition, narealfricative, percussive |
| V | Primary voicing | voiced, voiceless |
| v | Secondary voicing | aspirated, plain, unreleased, weaklyaspirated, unaspirated |
Vowel dimensions:
| Character | Dimension | Features included |
|---|---|---|
| H or h | Height | high, mid, low |
| B or b | Backness | front, central, back |
| T or t | Tenseness | tense, lax |
| R or r | Rounding | rounded, unrounded |
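One way to read the dimension tables above is that a dimension function restricts both phones to the features in the selected dimensions and then compares what remains. The sketch below models that reading in Python. It is an assumption about the logic, not Phon's implementation: `same_in_dimensions` is a hypothetical helper, only three consonant dimensions are included, and the feature assignments in `PHONES` are illustrative rather than taken from Phon's feature matrix.

```python
# Subset of the dimension tables above (feature names copied from them).
DIMENSIONS = {
    "P": {"labial", "coronal", "dorsal", "lingual",
          "anterior", "posterior", "guttural"},
    "M": {"obstruent", "nasal", "liquid", "glide",
          "approximant", "continuant", "sonorant"},
    "V": {"voiced", "voiceless"},
}

# Hypothetical feature assignments, for illustration only.
PHONES = {
    "p": {"labial", "obstruent", "voiceless"},
    "b": {"labial", "obstruent", "voiced"},
    "f": {"labial", "obstruent", "continuant", "voiceless"},
    "t": {"coronal", "obstruent", "voiceless"},
}


def same_in_dimensions(dims: str, a: str, b: str) -> bool:
    """True if phones a and b agree on every feature in the selected dimensions."""
    selected = set().union(*(DIMENSIONS[d] for d in dims))
    return (PHONES[a] & selected) == (PHONES[b] & selected)


place_and_voicing = same_in_dimensions("PV", "p", "f")  # same place, same voicing
voicing_differs = same_in_dimensions("PV", "p", "b")    # voicing differs
manner_included = same_in_dimensions("PMV", "p", "f")   # manner differs
```

Features outside the selected dimensions never affect the result, which is why p and f compare as equal under `PV` but not under `PMV`.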
Negation
Prefix the feature set with ^ to match phones whose selected dimensions differ from the referenced group:
(?< {^ PV({C1})} )
This checks that the preceding phone's place and voicing features differ from group C1.
Examples
Common dimension function combinations for phonological processes:
Consonant patterns:
| Pattern | Meaning |
|---|---|
| {PV({#C})} | Match place + voicing of group C (e.g., affrication, deaffrication) |
| {MV({#C})} | Match manner + voicing of group C (e.g., backing, depalatalization) |
| {MmV({#C})} | Match manner (primary + secondary) + voicing of group C (e.g., backing, fronting) |
| {PpV({#C})} | Match place (primary + secondary) + voicing of group C (e.g., denasalization, spirantization) |
| {PpMm({#C})} | Match place + manner of group C (e.g., devoicing, voicing) |
| {PpMV({#C})} | Match place + primary manner + voicing of group C (e.g., spirantization) |
| {V({#C})} | Match only voicing of group C (e.g., deaffrication, aspiration changes) |
| {P({#C})} | Match only primary place of group C (e.g., nasalization, stopping) |
| {mV({#C})} | Match secondary manner + voicing of group C (e.g., fronting) |
Vowel patterns:
| Pattern | Meaning |
|---|---|
| {HTRB({V})} | Match height + tenseness + rounding + backness of group V (e.g., centralization) |
| {HT({V})} | Match height + tenseness of group V (e.g., fronting) |
| {BHR({V})} | Match backness + height + rounding of group V (e.g., laxing, tensing) |
| {BRT({V})} | Match backness + rounding + tenseness of group V (e.g., lowering, raising) |
| {HBT({V})} | Match height + backness + tenseness of group V (e.g., rounding, unrounding) |
See Aligned Phonex for how group reference feature sets are used with the alignment operator.
