Text model: IDS-XCES
Corpora of written language
IDS text model: differences from XCES
The IDS text model is formally implemented by IDS-XCES. IDS-XCES is based on the international encoding standard XCES but adds a number of extensions and modifications. Most of these follow the TEI P5 standard, and some are motivated by the specific corpus structure of the IDS corpora. The extensions and modifications are described below, separately for the DTD file and for the associated header file.
Only those elements whose element or attribute declaration differs between XCES and IDS-XCES are listed. If the differences concern only the element declaration, the corresponding attribute declaration is not shown.
Differences: xcesDoc.dtd vs. ids.xcesdoc.dtd
XCES (Revision 4.3) | IDS-XCES (Version 1.0) | Comment |
ENTITY DECLARATIONS / Sub-paragraph elements | ||
<!ENTITY % x.token ' ' > | <!ENTITY % x.token 'gloss | byline | head | ' > | |
<!ENTITY % m.token '%x.token; abbr | date | num | measure | name | term | time | ' > | <!ENTITY % m.token '%x.token; abbr | date | num | dateRange | numRange | timeRange | measure | name | term | time | w | ' > | |
<!ENTITY % m.phrase '%m.token; corr | distinct | foreign | gap | hi | list | mentioned | ptr | q | ref | reg | s | title' > | <!ENTITY % m.phrase '%m.token; corr | distinct | foreign | gap | hi | list | mentioned | orig | q | ref | reg | s | title | table | xref' > | |
(not in XCES) | <!ENTITY % ids.milestones 'pb | lb | ptr | xptr' > | Allows page and line breaks (pb and lb) and pointer elements (ptr and xptr) to be embedded |
Content model declarations | ||
<!ENTITY % base.seq '(%x.token; #PCDATA | num | abbr)*' > | <!ENTITY % base.seq '#PCDATA | %x.token; num | numRange | abbr | hi' > | |
ELEMENT DECLARATIONS / HIGH-LEVEL COMPONENTS (top-level structure) | ||
<!ELEMENT cesCorpus (cesHeader, (cesDoc+ | cesCorpus+)) > <!ATTLIST cesCorpus %a.global; type CDATA #IMPLIED version CDATA #REQUIRED TEIform CDATA 'teiCorpus.2' > | <!ELEMENT idsCorpus (idsHeader, (idsDoc+)) > <!ATTLIST idsCorpus %a.global; type CDATA #IMPLIED version CDATA #REQUIRED TEIform CDATA 'teiCorpus.2' > <!ELEMENT idsDoc (idsHeader, idsText+) > <!ATTLIST idsDoc %a.global; type CDATA "text" version CDATA #REQUIRED TEIform CDATA 'TEI.2' > <!ELEMENT idsText ((idsHeader , text)) > <!ATTLIST idsText %a.global; version CDATA #REQUIRED > | Compared with XCES, IDS-XCES introduces an intermediate level: it distinguishes between documents and texts and defines a document as a grouping of one or more texts (cf. corpus structure). |
WRITTEN TEXTS | ||
<!ELEMENT text (body | group) > <!ATTLIST text %a.global; complete (y | n ) "y" decls IDREFS #IMPLIED > | <!ELEMENT text (front | body | back | %ids.milestones;)* > <!ATTLIST text %a.global; > | Note: the body element itself is unchanged. |
<!ELEMENT group (%par.seq;, body+) > <!ATTLIST group %a.text; decls IDREFS #IMPLIED > | The group element does not exist in IDS-XCES. | |
<!ELEMENT front (titlePage?, div*) > <!ATTLIST front %a.global; > <!ELEMENT titlePage ((docTitle | byline | docEdition | docImprint | epigraph)+) > <!ATTLIST titlePage %a.global; > <!ELEMENT docTitle (titlePart+) > <!ATTLIST docTitle %a.global; type (main | sub) #IMPLIED > <!ELEMENT epigraph (quote) > <!ATTLIST epigraph %a.global; > <!ELEMENT docEdition (#PCDATA) > <!ATTLIST docEdition %a.global; > <!ELEMENT docImprint (#PCDATA) > <!ATTLIST docImprint %a.global; > <!ELEMENT titlePart (#PCDATA | s)* > <!ATTLIST titlePart %a.global; type (main | sub | desc | unspecified) #IMPLIED > <!ELEMENT back (%par.seq;, div*) > <!ATTLIST back %a.text; > | The internal structure of the text element has been largely redesigned in IDS-XCES. | |
<!ELEMENT div ((opener | head | byline)*, (((p | sp | %m.inter;)+, div*) | div+), (closer | byline)* ) > | <!ELEMENT div (opener | head | byline | p | sp | stage | %m.inter; | div | closer | %ids.milestones; )* > | |
Opening elements | ||
<!ELEMENT opener (%phrase.seq; | dateline | keywords )* > <!ATTLIST opener %a.text; > | <!ELEMENT opener (%phrase.seq; | dateline | keywords | salute | %ids.milestones;)* > <!ATTLIST opener %a.text; type (lead | unspecified) "unspecified" > | |
<!ELEMENT head (%phrase.seq;)* > <!ATTLIST head %a.text; type CDATA #IMPLIED > | <!ELEMENT head (%phrase.seq; | ptr)* > <!ATTLIST head %a.text; type (top | main | sub | cross | desc | unspecified) "unspecified" > | |
Keyword lists, bylines, datelines | ||
<!ELEMENT byline (%phrase.seq; | docAuthor)* > | <!ELEMENT byline (%phrase.seq; | docAuthor | %ids.milestones;)* > | |
<!ELEMENT docAuthor (%base.seq;)* > | <!ELEMENT docAuthor (%base.seq; | %ids.milestones; )* > | |
<!ELEMENT dateline (%base.seq; | date | time | name | address)* > | <!ELEMENT dateline (%base.seq; | date | time | dateRange | timeRange | name | address | %ids.milestones;)* > | |
(not in XCES) | <!ELEMENT salute (#PCDATA | %ids.milestones;)* > <!ATTLIST salute %a.text; > | |
Closing element | ||
<!ELEMENT closer (%phrase.seq; | dateline | keywords)* > | <!ELEMENT closer (%phrase.seq; | dateline | keywords | salute | signed | %ids.milestones;)* > | |
(not in XCES) | <!ELEMENT signed (#PCDATA | %ids.milestones;)* > <!ATTLIST signed %a.text; > | |
Written paragraphs | ||
<!ELEMENT p (%phrase.seq;)* > | <!ELEMENT p (%phrase.seq; | %ids.milestones;)* > | |
Quotations | ||
<!ELEMENT quote ((%phrase.seq;) | (p | poem)+)* > | <!ELEMENT quote (%phrase.seq; | p | poem | %ids.milestones; )* > | |
Lists | ||
<!ELEMENT list (head?, (item+ | (label, item)+)) > <!ATTLIST list %a.text; > | <!ELEMENT list (head?, (item | (label, (%ids.milestones;)*, item) | %ids.milestones;)*) > <!ATTLIST list %a.text; type CDATA #IMPLIED > | |
<!ELEMENT item ((%phrase.seq;) | p+)* > <!ATTLIST item %a.text; > | <!ELEMENT item (%phrase.seq; | p | %ids.milestones;)* > <!ATTLIST item %a.text; > | |
Annotations | ||
<!ELEMENT note (%phrase.seq; | p)* > | <!ELEMENT note (%phrase.seq; | p | bibl | poem | quote | sp | %ids.milestones;)* > | |
<!ELEMENT bibl (%phrase.seq; | author)* > | <!ELEMENT bibl (%phrase.seq; | author | %ids.milestones;)* > | |
Poems | ||
<!ELEMENT poem (head?, (lg | l )+ ) > | <!ELEMENT poem (head?, (lg | l | %ids.milestones;)+ ) > | |
<!ELEMENT lg (l | lg)+ > | <!ELEMENT lg (l | lg | %ids.milestones;)+ > | |
Figures | ||
<!ELEMENT figure (head?, p*, figDesc?, text?) > | <!ELEMENT figure (head?, (p | %m.inter; | %ids.milestones; )*, figDesc?, text?) > | |
Tables | ||
<!ELEMENT table (head?, row+) > | <!ELEMENT table (head?, (row | %ids.milestones;)+ ) > | |
<!ELEMENT cell (%phrase.seq;)* > | <!ELEMENT cell (%phrase.seq; | %ids.milestones;)* > | |
Captions | ||
<!ELEMENT caption (%phrase.seq;)* > | <!ELEMENT caption ( head*, (p | %m.inter; | %ids.milestones; )+ ) > | |
Transcriptions of dialogues, speeches, debates, interviews, etc., and drama | ||
<!ELEMENT sp (speaker | p | stage)+ > <!ATTLIST sp %a.text; who NMTOKEN #IMPLIED > | <!ELEMENT sp (speaker | p | quote | poem | stage | %ids.milestones; )* > <!ATTLIST sp %a.text; who CDATA #IMPLIED > | |
<!ELEMENT speaker (%base.seq;)* > | <!ELEMENT speaker (%base.seq; | %ids.milestones; )* > | |
<!ELEMENT stage (%base.seq;)* > | <!ELEMENT stage (%base.seq; | p | %ids.milestones; )* > | |
SENTENCES, QUOTED DIALOGUE WITHIN PARAGRAPHS | ||
<!ELEMENT s (%phrase.seq;)* > | <!ELEMENT s (%phrase.seq; | %ids.milestones; | stage )* > | |
<!ELEMENT q (%phrase.seq;)* > <!ATTLIST q %a.text; next IDREF #IMPLIED prev IDREF #IMPLIED type CDATA #IMPLIED direct (y | n | unspecified) "unspecified" who CDATA #IMPLIED broken (yes | no) "no" > | <!ELEMENT q (%phrase.seq; | %ids.milestones; )* > <!ATTLIST q %a.text; type (w | o | unspec) "unspec" next IDREF #IMPLIED prev IDREF #IMPLIED direct (y | n | unspecified) "unspecified" who CDATA #IMPLIED broken (yes | no) "no" > | |
PHRASE-LEVEL ELEMENTS / THE CLASS M.PHRASE / Editorial Changes | ||
(not in XCES) | <!ELEMENT orig (%phrase.seq;)* > <!ATTLIST orig %a.text; reg CDATA #IMPLIED regalt CDATA #IMPLIED resp CDATA #IMPLIED cert CDATA #IMPLIED > | |
Highlighted text | ||
<!ELEMENT hi (%phrase.seq;)* > | <!ELEMENT hi (%phrase.seq; | %ids.milestones;)* > | |
Other Phrase-level Elements | ||
<!ELEMENT foreign (%phrase.seq;)* > | <!ELEMENT foreign (%phrase.seq; | %ids.milestones;)* > | |
<!ELEMENT distinct (%phrase.seq;)* > | <!ELEMENT distinct (%phrase.seq; | %ids.milestones;)* > | |
<!ELEMENT mentioned (%phrase.seq;)* > | <!ELEMENT mentioned (%phrase.seq; | %ids.milestones;)* > | |
<!ELEMENT name (%base.seq;)* > | <!ELEMENT name (%base.seq; | %ids.milestones;)* > | |
<!ELEMENT term (%base.seq;)* > | <!ELEMENT term (%base.seq; | %ids.milestones;)* > | |
<!ELEMENT time (%base.seq;)* > | <!ELEMENT time (%base.seq; | %ids.milestones;)* > | |
<!ELEMENT title (%phrase.seq;)* > | <!ELEMENT title (%phrase.seq; | %ids.milestones;)* > | |
(not in XCES) | <!ELEMENT gloss (%phrase.seq;)* > <!ATTLIST gloss %a.global; target IDREF #IMPLIED > <!ELEMENT w (#PCDATA) > <!ATTLIST w %a.text; ana CDATA #IMPLIED ctag CDATA #IMPLIED type CDATA #IMPLIED > <!ELEMENT dateRange (%base.seq;)* > <!ATTLIST dateRange %a.text; from CDATA #IMPLIED to CDATA #IMPLIED > <!ELEMENT numRange (%base.seq;)* > <!ATTLIST numRange %a.text; from CDATA #IMPLIED to CDATA #IMPLIED type CDATA #IMPLIED > <!ELEMENT timeRange (%base.seq;)* > <!ATTLIST timeRange %a.text; from CDATA #IMPLIED to CDATA #IMPLIED > | |
SEGMENTATION, LINKING, ALIGNMENT / Simple cross references | ||
<!ELEMENT ref (%phrase.seq;)* > <!ATTLIST ref %a.text; corresp IDREFS #IMPLIED next IDREF #IMPLIED prev IDREF #IMPLIED type CDATA #IMPLIED resp CDATA #IMPLIED crdate CDATA #IMPLIED targType NMTOKENS #IMPLIED targOrder (y | n | u) "u" evaluate (all | one | none) #IMPLIED target IDREFS #IMPLIED > | <!ELEMENT ref (%phrase.seq;)* > <!ATTLIST ref %a.text; corresp IDREFS #IMPLIED next IDREF #IMPLIED prev IDREF #IMPLIED type CDATA #IMPLIED resp CDATA #IMPLIED crdate CDATA #IMPLIED targType NMTOKENS #IMPLIED targOrder (y | n | u) "u" evaluate (all | one | none) #IMPLIED target CDATA #IMPLIED > | Following the TEI convention, the target attribute of the ref element accepts CDATA instead of IDREFS. |
(not in XCES) | <!ELEMENT xptr EMPTY > <!ATTLIST xptr corresp IDREFS #IMPLIED next IDREF #IMPLIED prev IDREF #IMPLIED ana IDREFS #IMPLIED id ID #IMPLIED n CDATA #IMPLIED lang IDREF #IMPLIED rend CDATA #IMPLIED type CDATA #IMPLIED resp CDATA #IMPLIED crdate CDATA #IMPLIED targType CDATA #IMPLIED targOrder (y | n | u) "u" evaluate (all | one | none) #IMPLIED doc CDATA #IMPLIED from CDATA "ROOT" to CDATA "DITTO" TEIform CDATA "xptr" > | |
(not in XCES) | <!ELEMENT xref (%phrase.seq;)* > <!ATTLIST xref %a.text; corresp IDREFS #IMPLIED next IDREF #IMPLIED prev IDREF #IMPLIED ana IDREFS #IMPLIED type CDATA #IMPLIED resp CDATA #IMPLIED crdate CDATA #IMPLIED targType CDATA #IMPLIED targOrder (y | n | u) "u" evaluate (all | one | none) #IMPLIED doc ENTITY #IMPLIED from CDATA "ROOT" to CDATA "DITTO" TEIform CDATA "xref" > | |
Milestone tags (newly added section) | ||
(not in XCES) | <!ELEMENT pb EMPTY > <!ATTLIST pb id ID #IMPLIED lang IDREF #IMPLIED rend CDATA #IMPLIED ed CDATA #IMPLIED n CDATA #IMPLIED TEIform CDATA "pb" > | Encoding of page breaks |
(not in XCES) | <!ELEMENT lb EMPTY > <!ATTLIST lb id ID #IMPLIED lang IDREF #IMPLIED rend CDATA #IMPLIED ed CDATA #IMPLIED n CDATA #IMPLIED TEIform CDATA "lb" > | Encoding of line breaks |
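The table above introduces two structural changes: an intermediate idsDoc level between idsCorpus and idsText, and the pb/lb milestones that IDS-XCES admits in almost every element. The following Python sketch illustrates both. It walks a tiny hand-made instance with the standard-library module `xml.etree.ElementTree`. All sigles, titles and page numbers are invented, and the helper names `words_by_page` and `texts_by_document` are hypothetical. The headers are cut down to titleStmt, so the sample is not valid against ids.xcesdoc.dtd and only shows the nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature IDS-XCES instance (sigles, titles and pages are invented).
# Each idsHeader is reduced to fileDesc/titleStmt for brevity, so this sample
# would NOT validate against ids.xcesdoc.dtd; it only shows the nesting.
SAMPLE = """\
<idsCorpus version="1.0">
  <idsHeader version="1.0"><fileDesc><titleStmt>
    <korpusSigle>K01</korpusSigle><c.title>Example corpus</c.title>
  </titleStmt></fileDesc></idsHeader>
  <idsDoc version="1.0">
    <idsHeader version="1.0"><fileDesc><titleStmt>
      <dokumentSigle>K01/D01</dokumentSigle><d.title>Example document</d.title>
    </titleStmt></fileDesc></idsHeader>
    <idsText version="1.0">
      <idsHeader version="1.0"><fileDesc><titleStmt>
        <textSigle>K01/D01.00001</textSigle><t.title>Example text</t.title>
      </titleStmt></fileDesc></idsHeader>
      <text>
        <pb n="1"/>
        <body><div><p>first line<lb/>second line <pb n="2"/>third line</p></div></body>
      </text>
    </idsText>
  </idsDoc>
</idsCorpus>
"""


def words_by_page(text_elem):
    """Group the words of a <text> element by the most recent <pb n="..."/>.

    pb is an empty milestone, so the page it opens is only known by walking
    the tree in document order; words before the first <pb/> go under None.
    """
    pages = {}
    current = None

    def add(chunk):
        words = chunk.split() if chunk else []
        if words:  # skip whitespace-only text between elements
            pages.setdefault(current, []).extend(words)

    def walk(elem):
        nonlocal current
        if elem.tag == "pb":
            current = elem.get("n")
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)  # tail text follows the child inside the parent

    walk(text_elem)
    return pages


def texts_by_document(corpus):
    """List (dokumentSigle, textSigle) pairs along idsCorpus > idsDoc > idsText."""
    pairs = []
    for doc in corpus.findall("idsDoc"):
        doc_sigle = doc.findtext("idsHeader/fileDesc/titleStmt/dokumentSigle")
        for txt in doc.findall("idsText"):
            text_sigle = txt.findtext("idsHeader/fileDesc/titleStmt/textSigle")
            pairs.append((doc_sigle, text_sigle))
    return pairs


if __name__ == "__main__":
    corpus = ET.fromstring(SAMPLE)
    print(texts_by_document(corpus))
    for txt in corpus.iter("idsText"):
        print(words_by_page(txt.find("text")))
```

pb and lb are empty milestone elements rather than containers. The page a word belongs to can therefore only be determined in a document-order traversal, which is why the sketch carries the current page number along instead of looking for a page wrapper element.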
Differences: xheader.ent vs. ids.xheader.ent
XCES (Revision 4.3) | IDS-XCES (Version 1.0) | Comment |
<!ELEMENT cesHeader (fileDesc, encodingDesc?, profileDesc?, revisionDesc?) > <!ATTLIST cesHeader %a.header; type CDATA "text" creator CDATA #IMPLIED status (new | update) "new" date.created CDATA #IMPLIED date.updated CDATA #IMPLIED version CDATA #REQUIRED TEIform CDATA "teiHeader" > | <!ELEMENT idsHeader (fileDesc, encodingDesc?, profileDesc?, revisionDesc?) > <!ATTLIST idsHeader %a.header; type CDATA "text" pattern CDATA "text" creator CDATA #IMPLIED status (new | update) "new" date.created CDATA #IMPLIED date.updated CDATA #IMPLIED version CDATA #REQUIRED TEIform CDATA 'teiHeader' > | |
Title statement | ||
<!ELEMENT titleStmt (h.title, respStmt* ) > | <!ELEMENT titleStmt ((korpusSigle , c.title , respStmt*) | (dokumentSigle , d.title , respStmt* ) | (textSigle , t.title , respStmt* ) | (x.title , respStmt* )) > | |
<!ELEMENT h.title (#PCDATA) > <!ATTLIST h.title %a.header; > | <!ELEMENT h.title (#PCDATA) > <!ATTLIST h.title %a.header; type (main | sub | abbr) "main" level (m | a) #IMPLIED > | |
(not in XCES) | <!ELEMENT korpusSigle (#PCDATA) > <!ATTLIST korpusSigle %a.header; > <!ELEMENT c.title (#PCDATA) > <!ATTLIST c.title %a.header; > <!ELEMENT dokumentSigle (#PCDATA) > <!ATTLIST dokumentSigle %a.header; > <!ELEMENT d.title (#PCDATA) > <!ATTLIST d.title %a.header; > <!ELEMENT textSigle (#PCDATA) > <!ATTLIST textSigle %a.header; > <!ELEMENT t.title (#PCDATA) > <!ATTLIST t.title %a.header; assemblage (external | regular | non-automatic) #IMPLIED > <!ELEMENT x.title (#PCDATA) > <!ATTLIST x.title %a.header; > | These additional elements reflect the three levels of the corpus structure (corpus, document, text). |
Publication statement | ||
<!ELEMENT pubDate (#PCDATA) > <!ATTLIST pubDate %a.header; value CDATA #IMPLIED > | <!ELEMENT pubDate (#PCDATA) > <!ATTLIST pubDate %a.header; type (year | month | day) #IMPLIED > | |
Source description | ||
<!ELEMENT sourceDesc ((biblFull | biblStruct)+) > | <!ELEMENT sourceDesc ((biblFull | biblStruct)+, reference*) > | |
(not in XCES) | <!ELEMENT reference (#PCDATA) > <!ATTLIST reference %a.header; type (complete | super | short | former) #IMPLIED assemblage (external | regular | non-automatic) #IMPLIED existence (no | yes) #IMPLIED origin (BOTfile | notBOTfile) #IMPLIED > | |
Bibliographic citation for non-electronic source | ||
<!ELEMENT analytic (h.author | respStmt | h.title)* > | <!ELEMENT analytic (h.title+, (h.author | editor)*, (biblScope | biblNote)*, (edition, respStmt?)*, imprint+, idno*, (biblNote | biblScope)* ) > | |
<!ELEMENT monogr (h.title+, (h.author | respStmt)*, (edition, respStmt?)*, imprint+, idno*, (biblNote | biblScope)* ) > | <!ELEMENT monogr (h.title+, (h.author | editor)*, (biblScope | biblNote)*, (edition, respStmt?)*, imprint+, idno*, (biblNote | biblScope)* ) > | |
(not in XCES) | <!ELEMENT editor (#PCDATA) > <!ATTLIST editor %a.header; > | |
<!ELEMENT edition (#PCDATA ) > <!ATTLIST edition %a.header; > | <!ELEMENT edition (further, kind, appearance) > <!ATTLIST edition %a.header; > | |
(not in XCES) | <!ELEMENT further (#PCDATA) > <!ATTLIST further %a.header; > <!ELEMENT kind (#PCDATA) > <!ATTLIST kind %a.header; > <!ELEMENT appearance (#PCDATA) > <!ATTLIST appearance %a.header; > | |
<!ELEMENT biblScope (#PCDATA) > <!ATTLIST biblScope %a.header; type (pp | vol | issue) #IMPLIED > | <!ELEMENT biblScope (#PCDATA) > <!ATTLIST biblScope %a.header; type (subsume | pp | vol | issue | issueplace | suppl | suppltitle | volume-title) #IMPLIED > | |
Encoding description | ||
<!ELEMENT encodingDesc (projectDesc, samplingDecl*, editorialDecl*, tagsDecl?, refsDecl*, classDecl?) > <!ATTLIST encodingDesc %a.header; > | <!ELEMENT encodingDesc (projectDesc?, samplingDecl*, editorialDecl*, tagsDecl?, refsDecl*, classDecl?) > <!ATTLIST encodingDesc %a.header; > | |
Editorial declaration | ||
<!ELEMENT editorialDecl (correction | quotation | hyphenation | segmentation | transduction | normalization | conformance)+ > <!ATTLIST editorialDecl %a.header; %a.declarable; > | <!ELEMENT editorialDecl (pagination | correction | quotation | hyphenation | segmentation | transduction | normalization | conformance)+ > <!ATTLIST editorialDecl %a.header; %a.declarable; > | |
(not in XCES) | <!ELEMENT pagination (#PCDATA) > <!ATTLIST pagination %a.header; type (yes | no) #IMPLIED > | |
<!ELEMENT hyphenation (#PCDATA) > <!ATTLIST hyphenation %a.header; %a.declarable; > | <!ELEMENT hyphenation (p+) > <!ATTLIST hyphenation %a.global; %a.declarable; eol (all | some | none) "some" > | |
References declaration | ||
<!ELEMENT refsDecl (#PCDATA) > | <!ELEMENT refsDecl (state) > | |
(not in XCES) | <!ELEMENT state EMPTY > <!ATTLIST state %a.global; ed CDATA #IMPLIED unit CDATA #REQUIRED length NMTOKEN #IMPLIED delim CDATA #IMPLIED > | |
Profile description | ||
<!ELEMENT profileDesc (creation?, langUsage?, wsdUsage?, textClass?, translations?, annotations?) > | <!ELEMENT profileDesc (creation?, langUsage?, wsdUsage?, textClass?, translations?, annotations?, textDesc ) > | |
Creation element | ||
<!ELEMENT creation (#PCDATA ) > <!ATTLIST creation %a.header; date CDATA #REQUIRED > | <!ELEMENT creation (creatDate, creatRef?, creatRefShort?) > <!ATTLIST creation %a.header; > | |
(not in XCES) | <!ELEMENT creatDate (#PCDATA) > <!ATTLIST creatDate %a.header; > <!ELEMENT creatRef (#PCDATA) > <!ATTLIST creatRef %a.header; > <!ELEMENT creatRefShort (#PCDATA) > <!ATTLIST creatRefShort %a.header; > | |
<!ELEMENT language (#PCDATA) > <!ATTLIST language id ID #IMPLIED wsd CDATA #IMPLIED n CDATA #IMPLIED type CDATA #IMPLIED iso639 CDATA #REQUIRED > | <!ELEMENT language (#PCDATA) > <!ATTLIST language id ID #IMPLIED usage CDATA #IMPLIED > | |
TextDesc (newly added section) | ||
(not in XCES) | <!ELEMENT textDesc ((textType?, textTypeRef?), (textTypeArt?, textDomain?, column?)) > <!ATTLIST textDesc %a.header; > <!ELEMENT textType (#PCDATA) > <!ATTLIST textType %a.header; > <!ELEMENT textTypeRef (#PCDATA) > <!ATTLIST textTypeRef %a.header; > <!ELEMENT textTypeArt (#PCDATA) > <!ATTLIST textTypeArt %a.header; > <!ELEMENT textDomain (#PCDATA) > <!ATTLIST textDomain %a.header; > <!ELEMENT column (#PCDATA) > <!ATTLIST column %a.header; > | |
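The header changes can be sketched in the same way. In the titleStmt content model above, the sigle element (korpusSigle, dokumentSigle or textSigle, otherwise x.title) is always the first child, so the level of a header can be read off that element. textDesc collects the text-type information. The following Python sketch assumes a single hand-made header. The sigle and the textType/textDomain values are invented, and other header parts (for example the publication statement and the source description) are omitted for brevity. The names `header_level` and `text_desc` are hypothetical helpers, not part of IDS-XCES.

```python
import xml.etree.ElementTree as ET

# Hypothetical text-level idsHeader; the sigle and textDesc values are invented,
# and other header parts (publication statement, sourceDesc, ...) are omitted.
SAMPLE_HEADER = """\
<idsHeader version="1.0">
  <fileDesc>
    <titleStmt>
      <textSigle>K01/D01.00001</textSigle>
      <t.title>Example text</t.title>
    </titleStmt>
  </fileDesc>
  <profileDesc>
    <textDesc>
      <textType>newspaper</textType>
      <textDomain>politics</textDomain>
    </textDesc>
  </profileDesc>
</idsHeader>
"""

# In the titleStmt content model the first child identifies the header level.
LEVELS = {"korpusSigle": "corpus", "dokumentSigle": "document", "textSigle": "text"}


def header_level(header):
    """Return (level, first titleStmt value); level is 'other' for the x.title variant."""
    title_stmt = header.find("fileDesc/titleStmt")
    if title_stmt is None or len(title_stmt) == 0:
        raise ValueError("idsHeader without a non-empty fileDesc/titleStmt")
    first = title_stmt[0]
    return LEVELS.get(first.tag, "other"), (first.text or "").strip()


def text_desc(header):
    """Collect the (all optional) children of profileDesc/textDesc into a dict."""
    desc = header.find("profileDesc/textDesc")
    if desc is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in desc}


if __name__ == "__main__":
    header = ET.fromstring(SAMPLE_HEADER)
    print(header_level(header))
    print(text_desc(header))
```

The sketch relies on the first child instead of searching for all three sigle elements. This mirrors the content model, in which exactly one of the alternatives opens the titleStmt of a given header.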