Alignment Specification
[Tutorial: Word Alignment] [Tutorial: Audio–Text Alignment] [Example: Word Alignment] [Example: Audio–Text Alignment]
This page covers fields specific to the Alignment flavor. For fields common to all burritos, see Scripture Burrito Structure.
An alignment burrito contains word-level or timecode-level alignment data between two texts or between audio and text. Content files follow the Scripture Burrito Alignment Format.
Type Fields
type.flavorType.name MUST be "alignment".
type.flavorType.flavor.name MUST be "alignment".
type.flavorType.currentScope is OPTIONAL. If present, keys MUST be valid USFM book codes and values MUST be arrays of chapter strings, or an empty array meaning the whole book.
Ingredients
At least one alignment ingredient MUST be present. Alignment ingredients
MUST use "mimeType": "application/json".
Alignment ingredients SHOULD include a scope indicating which Scripture
content they cover.
Each alignment ingredient MUST be a valid alignment format file. A valid alignment format file:
MUST have "format": "alignment"
MUST have "version": "0.4"
MUST have a "groups" array
Within each group, every record MUST have either a "references" array or named role keys (e.g. "source" and "target")
Reference units MUST be lists of string selectors when scheme is hoisted to the group’s documents array; otherwise they MUST be objects with scheme, docid, and selectors
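The file-level rules above can be approximated in code. The sketch below rests on several assumptions that this page does not state: records are taken to live under a group key named "records", the presence of a "documents" key is taken to mean the scheme is hoisted, every key of a record without "references" is treated as a role, and the sample document's "type" field, document entries, and docid values are invented for illustration. The Scripture Burrito Alignment Format itself is the authority on the real layout.

```python
REQUIRED_UNIT_KEYS = {"scheme", "docid", "selectors"}


def unit_problem(unit, hoisted):
    """Return a message if a reference unit has the wrong shape, else None."""
    if hoisted:
        ok = isinstance(unit, list) and all(isinstance(s, str) for s in unit)
        return None if ok else "expected a list of string selectors"
    ok = isinstance(unit, dict) and REQUIRED_UNIT_KEYS <= unit.keys()
    return None if ok else "expected an object with scheme, docid, and selectors"


def check_alignment_file(doc):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if doc.get("format") != "alignment":
        problems.append('"format" must be "alignment"')
    if doc.get("version") != "0.4":
        problems.append('"version" must be "0.4"')
    groups = doc.get("groups")
    if not isinstance(groups, list):
        problems.append('"groups" must be an array')
        return problems

    for gi, group in enumerate(groups):
        hoisted = "documents" in group  # assumption: presence means hoisted scheme
        for ri, record in enumerate(group.get("records", [])):  # "records" is assumed
            if "references" in record:
                units = record["references"]
            else:
                units = list(record.values())  # assumption: all other keys are roles
            if not isinstance(units, list) or not units:
                problems.append(f"group {gi} record {ri}: no references or role keys")
                continue
            for unit in units:
                message = unit_problem(unit, hoisted)
                if message:
                    problems.append(f"group {gi} record {ri}: {message}")
    return problems


doc = {
    "format": "alignment",
    "version": "0.4",
    "groups": [
        {
            "type": "translation",
            "documents": [
                {"scheme": "BCVWP", "docid": "source-text"},
                {"scheme": "BCVWP", "docid": "target-text"},
            ],
            "records": [
                {"source": ["400010010011"], "target": ["400010010021", "400010010031"]},
            ],
        }
    ],
}
print(check_alignment_file(doc))  # prints []
```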
Alignment Types
The alignment format is extensible. The following alignment types are defined in the specification and MAY be used:
"translation"— target is a translation of source; roles:"source","target""audio-reference"— maps audio timecodes to text references; roles:"timecode","text-reference""related"— generic undirected relationship; no roles"directed"— generic directed relationship; roles:"from","to"
Custom alignment types MAY be used; they SHOULD be prefixed with "x-".
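One way to work with the defined types and the "x-" convention is a lookup table keyed by type name. The sketch below is illustrative: `DEFINED_ROLES`, `describe_alignment_type`, and the custom type name "x-interlinear" are hypothetical, and the "custom-unprefixed" label is simply how this sketch flags a custom type that ignores the SHOULD recommendation.

```python
# Roles for the alignment types defined in the specification.
DEFINED_ROLES = {
    "translation": ("source", "target"),
    "audio-reference": ("timecode", "text-reference"),
    "related": (),  # undirected, so no roles
    "directed": ("from", "to"),
}


def describe_alignment_type(name):
    """Return (kind, roles); roles is None when the type is custom."""
    if name in DEFINED_ROLES:
        return "defined", DEFINED_ROLES[name]
    if name.startswith("x-"):
        return "custom", None
    # Permitted, since the prefix is a SHOULD, but worth a warning.
    return "custom-unprefixed", None


print(describe_alignment_type("directed"))       # prints ('defined', ('from', 'to'))
print(describe_alignment_type("x-interlinear"))  # prints ('custom', None)
```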
Reference Schemes
Reference schemes are likewise extensible. The following reference schemes MAY be used:
"BCVWP"— word-level biblical reference using a 12-characterBBCCCVVVWWWPstring (book, chapter, verse, word, part)"vtt-timecode"— WebVTT timecode range"MM:SS.mmm --> MM:SS.mmm";docidis the audio filename"u23003"— scripture reference with sub-verse granularity;docidis embedded in the selector (e.g."en+ulb.EPH 1:1")"ws-token"— whitespace-tokenised offset;docidis the filename"nfc-char"— Unicode NFC character offset range;docidis the filename
Custom reference schemes MAY be used.
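To make two of the selector layouts concrete, the sketch below splits a "BCVWP" selector into its five fields and converts a "vtt-timecode" range to seconds. It assumes that all twelve BCVWP characters are decimal digits, which this page does not state, and it handles only the MM:SS.mmm form shown above. The names `Bcvwp`, `parse_bcvwp`, and `parse_vtt_range` are hypothetical.

```python
import re
from typing import NamedTuple


class Bcvwp(NamedTuple):
    book: int
    chapter: int
    verse: int
    word: int
    part: int


def parse_bcvwp(selector):
    """Split a 12-character BBCCCVVVWWWP selector into its fields."""
    # Assumption: every position is a decimal digit.
    if not re.fullmatch(r"[0-9]{12}", selector):
        raise ValueError(f"not a 12-digit BCVWP selector: {selector!r}")
    return Bcvwp(
        book=int(selector[0:2]),
        chapter=int(selector[2:5]),
        verse=int(selector[5:8]),
        word=int(selector[8:11]),
        part=int(selector[11]),
    )


_VTT_RANGE = re.compile(
    r"([0-9]{2}):([0-9]{2})\.([0-9]{3}) --> ([0-9]{2}):([0-9]{2})\.([0-9]{3})"
)


def parse_vtt_range(selector):
    """Return (start, end) in seconds for an MM:SS.mmm --> MM:SS.mmm range."""
    match = _VTT_RANGE.fullmatch(selector)
    if match is None:
        raise ValueError(f"not an MM:SS.mmm range: {selector!r}")
    m1, s1, ms1, m2, s2, ms2 = (int(g) for g in match.groups())
    return (m1 * 60 + s1 + ms1 / 1000, m2 * 60 + s2 + ms2 / 1000)


print(parse_bcvwp("400010010011"))
# prints Bcvwp(book=40, chapter=1, verse=1, word=1, part=1)
print(parse_vtt_range("00:01.500 --> 00:04.250"))
# prints (1.5, 4.25)
```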