Groups

Phonex defines a group by placing any subpattern between the parenthesis metacharacters ( and ).

Some reasons to use groups:

  • Repeat subpatterns
  • Extract information for further processing
  • Exclude part of the pattern from the final match
  • Denote different possible subpatterns

Capture Groups

Capture groups are used to extract portions of matches for further processing. Groups are numbered left to right starting at 1 (group 0 is the entire match). In Phon this is often used to create a new column in query result listings containing the data matched by the group subpattern.

For example, suppose you are searching for any CV pattern (e.g., \c\v) but want the consonant and vowel in separate columns of a Phon query report. You would place each phone matcher in its own group using parentheses. (The groups must also be named in this situation; see 'Group Names' below.)

E.g.

(\c)(\v)

Capture groups may be quantified. The following expression will match a consonant followed by a vowel repeatedly:

(\c\v)+
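
For readers more familiar with conventional regular expressions, the numbering and quantification described above can be sketched with Python's re module. This is only an analogy and not Phonex itself: the character classes C and V are hypothetical stand-ins for \c and \v, which in Phonex match phones by feature rather than by letter.

```python
import re

# Hypothetical stand-ins; Phonex's \c and \v match phones by feature, not by letter.
C = "[bdgkmnpst]"
V = "[aeiou]"

# Analog of (\c)(\v): two capture groups, numbered left to right from 1.
m = re.search(f"({C})({V})", "bado")
print(m.group(0), m.group(1), m.group(2))  # whole match, consonant, vowel

# Analog of (\c\v)+: the quantifier applies to the whole group.
m2 = re.fullmatch(f"({C}{V})+", "bado")
print(m2.group(0))
```

Group 0 is the entire match in both Python and Phonex, which is why the first captured group is number 1.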

Lookahead and Lookbehind Groups

Lookahead and lookbehind groups allow matching subpatterns around a pattern without including the content matched by the lookahead or lookbehind group. These groups are considered to be zero-width assertions (i.e., the length of matched content is zero) like the start-of-input ^ and end-of-input $ boundary matchers.

Lookahead patterns are contained within parentheses like regular groups, with the special prefix ?>. For example, a lookahead group can be used to search for all consonants \c which are followed by a high vowel {v, high}.

\c(?>{v, high})

Lookbehind patterns are specified by the group prefix ?<. They behave in the same manner as lookahead groups, but look backwards in the input rather than forwards. An example would be to search for all vowels \v which are preceded by a b.

(?<b)\v

Lookahead and lookbehind groups can be used together in the same pattern:

(?<\c)\v(?>\c)

This matches vowels between two consonants.
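
The zero-width behaviour can be illustrated with an analogous pattern in Python's re module. This is a sketch under assumptions, not Phonex: Python spells lookbehind as (?<=...) and lookahead as (?=...), whereas Phonex uses ?< and ?> (and reserves ?= for non-capturing groups). The classes C and V are hypothetical stand-ins for \c and \v.

```python
import re

C = "[bdgkmnpst]"  # hypothetical stand-in for \c
V = "[aeiou]"      # hypothetical stand-in for \v

# Analog of (?<\c)\v(?>\c): a vowel with a consonant on each side.
pattern = re.compile(f"(?<={C}){V}(?={C})")

# Each match contains only the vowel; the surrounding consonants are
# checked but contribute nothing to the matched content.
matches = [(m.start(), m.group(0)) for m in pattern.finditer("batuk")]
print(matches)
```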

Alternation

Alternation allows for choices within patterns. To specify choices, subpatterns in a group are separated by the logical-or (or pipe) | metacharacter. The following example will match the sequence bab as well as dib.

(ba|di)b

Alternation groups may be quantified.

Group Numbers

Groups in a phonex pattern are numbered left to right. Each open parenthesis ( metacharacter will increment the group index by 1 unless the group is 'non-capturing' such as for lookbehind and lookahead groups. The following example pattern has two groups, the first group includes both the consonant \c and vowel \v matchers; the second group includes only the vowel \v matcher:

(\c(\v))

The next example also has two groups as the lookbehind group is not included in group indexing:

(?<^\S)(\c(\v))

Phonex includes syntax to exclude a group from indexing (the group's content will not be stored). These groups are called non-capturing or organizational groups. To exclude a group from indexing, the group content must start with ?=. The following phonex pattern has two capturing groups: group 1 includes a syllable boundary \S, consonant \c, and vowel \v matcher; group 2 includes just the vowel \v matcher. There is one non-capturing group containing the consonant \c matcher.

(\S(?=\c)(\v))

Note that while the consonant is considered part of a non-capturing group it will still be included in the enclosing group's matched data.
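
The numbering rule above can be checked mechanically. The following Python function is a simplified sketch, not part of Phon: count_capturing_groups is a hypothetical helper that counts each ( unless it opens a lookbehind (?<), lookahead (?>), or non-capturing (?=) group. It assumes a backslash always escapes the next character and that parentheses inside {...} feature sets do not open groups.

```python
def count_capturing_groups(pattern: str) -> int:
    """Count capturing groups using the numbering rule described above (simplified)."""
    non_capturing_prefixes = ("?<", "?>", "?=")
    count = 0
    brace_depth = 0
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # assumption: a backslash escapes the next character (e.g. \c)
            continue
        if ch == "{":
            brace_depth += 1
        elif ch == "}":
            brace_depth -= 1
        elif ch == "(" and brace_depth == 0:
            # Only parentheses outside feature sets are treated as group openers.
            if not pattern.startswith(non_capturing_prefixes, i + 1):
                count += 1
        i += 1
    return count


for example in (r"(\c(\v))", r"(?<^\S)(\c(\v))", r"(\S(?=\c)(\v))"):
    print(example, count_capturing_groups(example))
```

All three example patterns from this section yield two capturing groups under this rule.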

Group Names

Capturing groups may also be named. To name a group, start the group content with the desired group name followed by an equals = metacharacter. The group name must start with a letter and consist only of letters, numbers, and underscores _. The following expression has two named groups: the first is named 'onset' and matches a consonant in the onset position \c:O; the second is named 'nucleus' and matches a vowel in the nucleus position \v:N.

(onset=\c:O)(nucleus=\v:N)

When used in Phon queries named groups will be added to result listings in a new column with a title matching the phonex group name. The group name X is reserved in Phon queries to mark the portion of the phonex pattern to be used as the query result.
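
The naming rule can be expressed as a short check. The Python sketch below is illustrative only: is_valid_group_name and named_groups are hypothetical helpers, and the code assumes "letters" means ASCII letters, which the text above does not specify. It also does not account for escaped parentheses.

```python
import re

# Assumption: ASCII letters only; the rule above just says "letters".
GROUP_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
NAMED_GROUP_OPEN = re.compile(r"\((" + GROUP_NAME.pattern + r")=")


def is_valid_group_name(name: str) -> bool:
    """True if the name starts with a letter and uses only letters, digits, and _."""
    return GROUP_NAME.fullmatch(name) is not None


def named_groups(pattern: str) -> list[str]:
    """List group names in order of appearance (these become result column titles)."""
    return NAMED_GROUP_OPEN.findall(pattern)


print(named_groups(r"(onset=\c:O)(nucleus=\v:N)"))
```

Because a name must start with a letter, a non-capturing group such as (?=\c) is never mistaken for a named group by this check.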

Back References

Back references are used to match a subpattern previously matched by a capture group. Back references compare the base phone character only; they do not match syllable constituent type or supplementary matcher information.

Numbered Back References

\n matches the same content previously captured by group n. The following pattern will match a consonant, store the value of the matched consonant in group number 1, and then match the value of group 1 again (i.e., it will match repeated consonants).

(\c)\1

Named Back References

\{name} matches the same content captured by the named group. The following pattern will match a consonant, store the value of the matched consonant in a group named C1, and then match the sequence stored in group C1.

(C1=\c)\{C1}

Group names are case sensitive, so in the above example \{c1} would result in an error as there is no group named c1 with a lower-case C. Quantifiers may be applied to back references but supplementary matchers are not allowed.
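
Numbered and named back references have close analogs in Python's re module, sketched below. This is an analogy only: Python compares the literal captured text, whereas Phonex back references compare the base phone character, and Python writes named groups as (?P<name>...) with (?P=name) as the reference. The class C is a hypothetical stand-in for \c.

```python
import re

C = "[bdgkmnpst]"  # hypothetical stand-in for \c

# Analog of (\c)\1
numbered = re.compile(f"({C})\\1")

# Analog of (C1=\c)\{C1}
named = re.compile(f"(?P<C1>{C})(?P=C1)")

print(numbered.search("atta").group(0))   # the repeated consonant pair
print(named.search("atta").group("C1"))   # the consonant stored in group C1
print(numbered.search("atka"))            # no repeated consonant, so no match
```

Python group names are also case sensitive: referring to (?P=c1) when only C1 is defined raises an error when the pattern is compiled.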

Negated Back References

Prefix a back reference with ^ to match content different from the referenced group:

(\c)\{^1}

This matches two different consonants. Named negated references work the same way:

(C1=\c)\{^C1}
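
Python's re module has no negated back reference, but a similar effect can be sketched with a negative lookahead placed before the second consonant. This is a rough analogy under assumptions, not Phonex syntax, and C is a hypothetical stand-in for \c.

```python
import re

C = "[bdgkmnpst]"  # hypothetical stand-in for \c

# Analog of (\c)\{^1}: a consonant followed by a different consonant.
# (?!\1) rejects the position if the next character repeats group 1.
different = re.compile(f"({C})(?!\\1){C}")

print(different.search("atka"))  # matches "tk"
print(different.search("atta"))  # no match: the two consonants are the same
```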

Advanced Group References

Ignore Diacritics

Prefix a group name with # to compare base characters only, ignoring all diacritics:

(C=\c)\{#C}

For example, if group C captured , then \{#C} matches t, , , etc.

Element Indexing

When a group captures multiple phones, use [n] to reference a specific element by index (0-based). Use [$] for the last element:

(O=\c+)\v\{O}[$]

This matches the last consonant of the onset group. Use [0] for the first element:

(O=\c+)\v\{O}[0]
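
The indexing rule can be pictured by treating a multi-phone capture as a list. In the Python sketch below, select_element and the onset list are hypothetical illustrations and not part of Phon; the index is either a 0-based number or $ for the last element.

```python
def select_element(captured: list[str], index: str) -> str:
    """Pick one phone from a multi-phone capture; index is "$" or a 0-based number."""
    if index == "$":
        return captured[-1]
    return captured[int(index)]


onset = ["s", "t", "r"]  # hypothetical content captured by (O=\c+)

print(select_element(onset, "0"))  # first consonant of the onset
print(select_element(onset, "$"))  # last consonant of the onset
```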

Compound Phone Part Selection

When a group contains a compound phone (e.g., an affricate), use [n_] or [_n] to select a specific part:

Syntax      Selects
\{V}[0_]    First part of compound phone at index 0
\{V}[_0]    Second part of compound phone at index 0

(V={v}_{v})\{V}[0_]

This matches the first vowel of a captured diphthong.

Group Reference Feature Sets

Use {features({group})} to match a phone that has the same features as a captured group. The features portion is a dimension function — a string of characters that selects which phonological dimensions to compare against the referenced group. Only the features in the selected dimensions are checked; all other features are ignored.

(C=\c){PV({#C})}

This matches a phone whose primary place and voicing features match those of group C (ignoring diacritics).

Dimension Characters

Dimension functions are formed by combining one or more of the following characters. Characters can be combined in any order.

Consonant dimensions:

Character  Dimension          Features included
P          Primary place      labial, coronal, dorsal, lingual, anterior, posterior, guttural
p          Secondary place    bilabial, labiodental, interdental, alveolar, alveopalatal, retroflex, palatal, velar, uvular, pharyngeal, laryngeal, epiglottal, dental, apical, laminal, distributed, grooved, subapical, velopharyngeal
M          Primary manner     obstruent, nasal, liquid, glide, approximant, continuant, sonorant
m          Secondary manner   stop, affricate, fricative, nasal, oral, lateral, rhotic, click, implosive, flap, trill, ejective, prenasalized, strident, quasiresonant, semiresonant, raspberry, transition, narealfricative, percussive
V          Primary voicing    voiced, voiceless
v          Secondary voicing  aspirated, plain, unreleased, weaklyaspirated, unaspirated

Vowel dimensions:

Character  Dimension  Features included
H or h     Height     high, mid, low
B or b     Backness   front, central, back
T or t     Tenseness  tense, lax
R or r     Rounding   rounded, unrounded

Negation

Prefix the feature set with ^ to match phones whose selected dimensions differ from the referenced group:

(?< {^ PV({C1})} )

This checks that the preceding phone's place and voicing features differ from group C1.
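
One way to picture how a dimension function restricts the comparison is the Python sketch below. It is an illustration under stated assumptions, not Phon's implementation: the names DIMENSIONS, selected_features, and matches_reference are hypothetical, the secondary dimensions (p, m, v) are omitted for brevity, the feature assignments for the sample phones are invented for the example, and "same features" is assumed to mean that both phones carry exactly the same features within the selected dimensions.

```python
# Feature lists copied from the dimension tables above (primary and vowel dimensions only).
DIMENSIONS = {
    "P": {"labial", "coronal", "dorsal", "lingual", "anterior", "posterior", "guttural"},
    "M": {"obstruent", "nasal", "liquid", "glide", "approximant", "continuant", "sonorant"},
    "V": {"voiced", "voiceless"},
    "H": {"high", "mid", "low"},
    "B": {"front", "central", "back"},
    "T": {"tense", "lax"},
    "R": {"rounded", "unrounded"},
}


def selected_features(phone_features: set[str], dims: str) -> set[str]:
    """Keep only the features that belong to the selected dimensions."""
    selected = set()
    for ch in dims:
        key = ch.upper() if ch in "hbtr" else ch  # vowel dimensions accept either case
        if key not in DIMENSIONS:
            raise ValueError(f"dimension {ch!r} is not covered by this sketch")
        selected |= phone_features & DIMENSIONS[key]
    return selected


def matches_reference(candidate: set[str], reference: set[str],
                      dims: str, negate: bool = False) -> bool:
    """Compare two phones on the selected dimensions; negate mirrors the ^ prefix."""
    same = selected_features(candidate, dims) == selected_features(reference, dims)
    return same != negate


# Hypothetical feature assignments, for illustration only.
t = {"coronal", "obstruent", "voiceless"}
s = {"coronal", "obstruent", "continuant", "voiceless"}
d = {"coronal", "obstruent", "voiced"}

print(matches_reference(s, t, "PV"))               # same place and voicing
print(matches_reference(d, t, "PV"))               # voicing differs
print(matches_reference(d, t, "PV", negate=True))  # negated comparison
```

Features outside the selected dimensions, such as continuant when only P and V are chosen, play no part in the comparison.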

Examples

Common dimension function combinations for phonological processes:

Consonant patterns:

Pattern         Meaning
{PV({#C})}      Match place + voicing of group C (e.g., affrication, deaffrication)
{MV({#C})}      Match manner + voicing of group C (e.g., backing, depalatalization)
{MmV({#C})}     Match manner (primary + secondary) + voicing of group C (e.g., backing, fronting)
{PpV({#C})}     Match place (primary + secondary) + voicing of group C (e.g., denasalization, spirantization)
{PpMm({#C})}    Match place + manner of group C (e.g., devoicing, voicing)
{PpMV({#C})}    Match place + primary manner + voicing of group C (e.g., spirantization)
{V({#C})}       Match only voicing of group C (e.g., deaffrication, aspiration changes)
{P({#C})}       Match only primary place of group C (e.g., nasalization, stopping)
{mV({#C})}      Match secondary manner + voicing of group C (e.g., fronting)

Vowel patterns:

Pattern         Meaning
{HTRB({V})}     Match height + tenseness + rounding + backness of group V (e.g., centralization)
{HT({V})}       Match height + tenseness of group V (e.g., fronting)
{BHR({V})}      Match backness + height + rounding of group V (e.g., laxing, tensing)
{BRT({V})}      Match backness + rounding + tenseness of group V (e.g., lowering, raising)
{HBT({V})}      Match height + backness + tenseness of group V (e.g., rounding, unrounding)

See Aligned Phonex for how group reference feature sets are used with the alignment operator.