Corpora of written language
IDS text model: differences from XCES
The IDS text model is formally implemented as IDS-XCES. IDS-XCES is based on the international encoding standard XCES and additionally contains a number of extensions and modifications, most of which follow the TEI P5 standard and some of which are motivated by the specific corpus structure of the IDS corpora. These extensions and modifications are described below, separately for the DTD file and the associated header file.
Only elements whose element or attribute declarations differ between XCES and IDS-XCES are listed. Where the differences concern only the element declaration, the corresponding attribute declaration is not shown.
Differences: xcesDoc.dtd vs. ids.xcesdoc.dtd
XCES (Revision 4.3) | IDS-XCES (Version 1.0) | Comment |
ENTITY DECLARATIONS: Sub-paragraph elements | ||
<!ENTITY % x.token ' ' > | <!ENTITY % x.token 'gloss | byline | head | ' > | |
<!ENTITY % m.token '%x.token; abbr | date | num | measure | name | term | time | ' > | <!ENTITY % m.token '%x.token; abbr | date | num | dateRange | numRange | timeRange | measure | name | term | time | w | ' > | |
<!ENTITY % m.phrase '%m.token; corr | distinct | foreign | gap | hi | list | mentioned | ptr | q | ref | reg | s | title' > | <!ENTITY % m.phrase '%m.token; corr | distinct | foreign | gap | hi | list | mentioned | orig | q | ref | reg | s | title | table | xref' > | |
(not present) | <!ENTITY % ids.milestones 'pb | lb | ptr | xptr' > | Allows page and line breaks (pb and lb) and pointer elements (ptr and xptr) to be embedded. |
Content model declarations | ||
<!ENTITY % base.seq '(%x.token; #PCDATA | num | abbr)*' > | <!ENTITY % base.seq '#PCDATA | %x.token; num | numRange | abbr | hi' > | |
ELEMENT DECLARATIONS: HIGH-LEVEL COMPONENTS | ||
<!ELEMENT cesCorpus (cesHeader, (cesDoc+ | cesCorpus+)) > | <!ELEMENT idsCorpus (idsHeader, (idsDoc+)) > | Compared with XCES, an intermediate level has been introduced: IDS-XCES distinguishes between documents and texts and defines a document as a grouping of several texts (cf. corpus structure). |
WRITTEN TEXTS | ||
<!ELEMENT text (body | group) > | <!ELEMENT text (front | body | back | %ids.milestones;)* > | Note: the body element itself is unchanged. |
<!ELEMENT group (%par.seq;, body+) > | The group element does not exist in IDS-XCES. | |
<!ELEMENT front (titlePage?, div*) > | The internal structure of the text element has been largely redesigned in IDS-XCES. | |
<!ELEMENT div ((opener | head | byline)*, (((p | sp | %m.inter;)+, div*) | div+), (closer | byline)* ) > | <!ELEMENT div (opener | head | byline | p | sp | stage | %m.inter; | div | closer | %ids.milestones; )* > | |
Opening elements | ||
<!ELEMENT opener (%phrase.seq; | dateline | keywords )* > | <!ELEMENT opener (%phrase.seq; | dateline | keywords | salute | %ids.milestones;)* > | |
<!ELEMENT head (%phrase.seq;)* > | <!ELEMENT head (%phrase.seq; | ptr)* > | |
Keyword lists, bylines, datelines | ||
<!ELEMENT byline (%phrase.seq; | docAuthor)* > | <!ELEMENT byline (%phrase.seq; | docAuthor | %ids.milestones;)* > | |
<!ELEMENT docAuthor (%base.seq;)* > | <!ELEMENT docAuthor (%base.seq; | %ids.milestones; )* > | |
<!ELEMENT dateline (%base.seq; | date | time | name | address)* > | <!ELEMENT dateline (%base.seq; | date | time | dateRange | timeRange | name | address | %ids.milestones;)* > | |
(not present) | <!ELEMENT salute (#PCDATA | %ids.milestones;)* > | |
Closing element | ||
<!ELEMENT closer (%phrase.seq; | dateline | keywords)* > | <!ELEMENT closer (%phrase.seq; | dateline | keywords | salute | signed | %ids.milestones;)* > | |
(not present) | <!ELEMENT signed (#PCDATA | %ids.milestones;)* > | |
Written paragraphs | ||
<!ELEMENT p (%phrase.seq;)* > | <!ELEMENT p (%phrase.seq; | %ids.milestones;)* > | |
Quotations | ||
<!ELEMENT quote ((%phrase.seq;) | (p | poem)+)* > | <!ELEMENT quote (%phrase.seq; | p | poem | %ids.milestones; )* > | |
Lists | ||
<!ELEMENT list (head?, (item+ | (label, item)+)) > | <!ELEMENT list (head?, (item | (label, (%ids.milestones;)*, item) | %ids.milestones;)*) > | |
<!ELEMENT item ((%phrase.seq;) | p+)* > | <!ELEMENT item (%phrase.seq; | p | %ids.milestones;)* > | |
Annotations | ||
<!ELEMENT note (%phrase.seq; | p)* > | <!ELEMENT note (%phrase.seq; | p | bibl | poem | quote | sp | %ids.milestones;)* > | |
<!ELEMENT bibl (%phrase.seq; | author)* > | <!ELEMENT bibl (%phrase.seq; | author | %ids.milestones;)* > | |
Poems | ||
<!ELEMENT poem (head?, (lg | l )+ ) > | <!ELEMENT poem (head?, (lg | l | %ids.milestones;)+ ) > | |
<!ELEMENT lg (l | lg)+ > | <!ELEMENT lg (l | lg | %ids.milestones;)+ > | |
Figures | ||
<!ELEMENT figure (head?, p*, figDesc?, text?) > | <!ELEMENT figure (head?, (p | %m.inter; | %ids.milestones; )*, figDesc?, text?) > | |
Tables | ||
<!ELEMENT table (head?, row+) > | <!ELEMENT table (head?, (row | %ids.milestones;)+ ) > | |
<!ELEMENT cell (%phrase.seq)* > | <!ELEMENT cell (%phrase.seq; | %ids.milestones;)* > | |
Captions | ||
<!ELEMENT caption (%phrase.seq;)* > | <!ELEMENT caption ( head*, (p | %m.inter; | %ids.milestones; )+ ) > | |
Transcriptions of dialogues, speeches, debates, interviews, etc., and drama | ||
<!ELEMENT sp (speaker | p | stage)+ > | <!ELEMENT sp (speaker | p | quote | poem | stage | %ids.milestones; )* > | |
<!ELEMENT speaker (%base.seq;)* > | <!ELEMENT speaker (%base.seq; | %ids.milestones; )* > | |
<!ELEMENT stage (%base.seq;)* > | <!ELEMENT stage (%base.seq; | p | %ids.milestones; )* > | |
SENTENCES, QUOTED DIALOGUE WITHIN PARAGRAPHS | ||
<!ELEMENT s (%phrase.seq;)* > | <!ELEMENT s (%phrase.seq; | %ids.milestones; | stage )* > | |
<!ELEMENT q (%phrase.seq;)* > | <!ELEMENT q (%phrase.seq; | %ids.milestones; )* > | |
PHRASE-LEVEL ELEMENTS: THE CLASS M.PHRASE: Editorial Changes | ||
(not present) | <!ELEMENT orig (%phrase.seq;)* > | |
Highlighted text | ||
<!ELEMENT hi (%phrase.seq)* > | <!ELEMENT hi (%phrase.seq; | %ids.milestones;)* > | |
Other Phrase-level Elements | ||
<!ELEMENT foreign (%phrase.seq;)* > | <!ELEMENT foreign (%phrase.seq; | %ids.milestones;)* > | |
<!ELEMENT distinct (%phrase.seq;)* > | <!ELEMENT distinct (%phrase.seq; | %ids.milestones;)* > | |
<!ELEMENT mentioned (%phrase.seq;)* > | <!ELEMENT mentioned (%phrase.seq; | %ids.milestones;)* > | |
<!ELEMENT name (%base.seq;)* > | <!ELEMENT name (%base.seq; | %ids.milestones;)* > | |
<!ELEMENT term (%base.seq;)* > | <!ELEMENT term (%base.seq; | %ids.milestones;)* > | |
<!ELEMENT time (%base.seq;)* > | <!ELEMENT time (%base.seq; | %ids.milestones;)* > | |
<!ELEMENT title (%phrase.seq;)* > | <!ELEMENT title (%phrase.seq; | %ids.milestones;)* > | |
(not present) | <!ELEMENT gloss (%phrase.seq;)* > | |
SEGMENTATION, LINKING, ALIGNMENT: Simple cross references | ||
<!ELEMENT ref (%phrase.seq;)* > | <!ELEMENT ref (%phrase.seq;)* > | Following the TEI convention, CDATA is permitted as the value of the target attribute of the ref element. |
(not present) | <!ELEMENT xptr EMPTY > | |
(not present) | <!ELEMENT xref (%phrase.seq;)* > | |
Milestone tags (newly added section) | ||
(not present) | <!ELEMENT pb EMPTY > | Encodes page breaks. |
(not present) | <!ELEMENT lb EMPTY > | Encodes line breaks. |
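Because a DTD content model is in effect a regular expression over the names of child elements, the practical effect of the %ids.milestones; extension can be illustrated with Python's re module. The following sketch hand-translates the XCES and IDS-XCES declarations of list into regular expressions, assuming that only the sequence of child element names matters (text content, attributes and all other content models are ignored). The names alt, matches, XCES_LIST and IDS_LIST are hypothetical helpers, not part of either DTD.

```python
import re

# Replacement text of %ids.milestones; as declared in ids.xcesdoc.dtd (table above).
IDS_MILESTONES = "pb | lb | ptr | xptr"


def alt(names: str) -> str:
    """Turn a DTD alternation such as 'pb | lb' into a regex group over 'name ' tokens."""
    return "(?:" + "|".join(re.escape(n.strip()) + " " for n in names.split("|")) + ")"


MILESTONE = alt(IDS_MILESTONES)

# XCES:     <!ELEMENT list (head?, (item+ | (label, item)+)) >
XCES_LIST = re.compile(r"(?:head )?(?:(?:item )+|(?:label item )+)")

# IDS-XCES: <!ELEMENT list (head?, (item | (label, (%ids.milestones;)*, item) | %ids.milestones;)*) >
IDS_LIST = re.compile(
    r"(?:head )?(?:item |label " + MILESTONE + r"*item |" + MILESTONE + r")*"
)


def matches(model: re.Pattern, children: list[str]) -> bool:
    """Check a sequence of child element names against a hand-translated content model.

    Each name is terminated by a space so that adjacent names cannot run together.
    """
    return model.fullmatch("".join(name + " " for name in children)) is not None


# A page break between the list heading and the first item:
print(matches(XCES_LIST, ["head", "pb", "item"]))  # False
print(matches(IDS_LIST, ["head", "pb", "item"]))   # True
```

The XCES model rejects the page break between head and the first item, while the IDS-XCES model accepts it. The IDS-XCES model also allows items with and without labels to be mixed, which the XCES alternation item+ | (label, item)+ does not.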
Differences: xheader.ent vs. ids.xheader.ent
XCES (Revision 4.3) | IDS-XCES (Version 1.0) | Comment |
<!ELEMENT cesHeader (fileDesc, encodingDesc?, profileDesc?, revisionDesc?) > | <!ELEMENT idsHeader (fileDesc, encodingDesc?, profileDesc?, revisionDesc?) > | |
Title statement | ||
<!ELEMENT titleStmt (h.title, respStmt* ) > | <!ELEMENT titleStmt ((korpusSigle , c.title , respStmt*) | (dokumentSigle , d.title , respStmt* ) | (textSigle , t.title , respStmt* ) | (x.title , respStmt* )) > | |
<!ELEMENT h.title (#PCDATA) > | <!ELEMENT h.title (#PCDATA) > | |
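(not present) | <!ELEMENT korpusSigle (#PCDATA) > | korpusSigle, like the other additional elements of titleStmt (dokumentSigle, textSigle), reflects the three levels of the corpus structure: corpus, document and text. |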
Publication statement | ||
<!ELEMENT pubDate (#PCDATA) > | <!ELEMENT pubDate (#PCDATA) > | |
Source description | ||
<!ELEMENT sourceDesc ((biblFull | biblStruct)+) > | <!ELEMENT sourceDesc ((biblFull | biblStruct)+, reference*) > | |
(not present) | <!ELEMENT reference (#PCDATA) > | |
Bibliographic citation for non-electronic source | ||
<!ELEMENT analytic (h.author | respStmt | h.title)* > | <!ELEMENT analytic (h.title+, (h.author | editor)*, (biblScope | biblNote)*, (edition, respStmt?)*, imprint+, idno*, (biblNote | biblScope)* ) > | |
<!ELEMENT monogr (h.title+, (h.author | respStmt)*, (edition, respStmt?)*, imprint+, idno*, (biblNote | biblScope)* ) > | <!ELEMENT monogr (h.title+, (h.author | editor)*, (biblScope | biblNote)*, (edition, respStmt?)*, imprint+, idno*, (biblNote | biblScope)* ) > | |
(not present) | <!ELEMENT editor (#PCDATA) > | |
<!ELEMENT edition (#PCDATA ) > | <!ELEMENT edition (further, kind, appearance) > | |
(not present) | <!ELEMENT further (#PCDATA) > | |
<!ELEMENT biblScope (#PCDATA) > | <!ELEMENT biblScope (#PCDATA) > | |
Encoding description | ||
<!ELEMENT encodingDesc (projectDesc, samplingDecl*, editorialDecl*, tagsDecl?, refsDecl*, classDecl?) > | <!ELEMENT encodingDesc (projectDesc?, samplingDecl*, editorialDecl*, tagsDecl?, refsDecl*, classDecl?) > | |
Editorial declaration | ||
<!ELEMENT editorialDecl (correction | quotation | hyphenation | segmentation | transduction | normalization | conformance)+ > | <!ELEMENT editorialDecl (pagination | correction | quotation | hyphenation | segmentation | transduction | normalization | conformance)+ > | |
(not present) | <!ELEMENT pagination (#PCDATA) > | |
<!ELEMENT hyphenation (#PCDATA) > | <!ELEMENT hyphenation (p+) > | |
References declaration | ||
<!ELEMENT refsDecl (#PCDATA) > | <!ELEMENT refsDecl (state) > | |
(not present) | <!ELEMENT state EMPTY > | |
Profile description | ||
<!ELEMENT profileDesc (creation?, langUsage?, wsdUsage?, textClass?, translations?, annotations?) > | <!ELEMENT profileDesc (creation?, langUsage?, wsdUsage?, textClass?, translations?, annotations?, textDesc ) > | |
Creation element | ||
<!ELEMENT creation (#PCDATA ) > | <!ELEMENT creation (creatDate, creatRef?, creatRefShort?) > | |
(not present) | <!ELEMENT creatDate (#PCDATA) > | |
<!ELEMENT language (#PCDATA) > | <!ELEMENT language (#PCDATA) > | |
TextDesc (newly added section) | ||
(not present) | <!ELEMENT textDesc ((textType?, textTypeRef?), (textTypeArt?, textDomain?, column?)) > | |
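The IDS-XCES titleStmt declaration chooses one of four alternatives by its first child: korpusSigle, dokumentSigle or textSigle for the three levels of the corpus structure, each followed by the matching c.title, d.title or t.title, or x.title without a sigle. The following sketch uses Python's standard xml.etree.ElementTree to determine which alternative a given titleStmt uses. The function name title_level, the level labels and the sample sigle ABC are hypothetical; only the first two children are inspected, so this is not a full validation against the DTD.

```python
import xml.etree.ElementTree as ET

# Alternatives of <!ELEMENT titleStmt ...> in ids.xheader.ent:
# first child -> (level label, required second child or None).
# The level labels are illustrative, not part of IDS-XCES.
ALTERNATIVES = {
    "korpusSigle": ("corpus", "c.title"),
    "dokumentSigle": ("document", "d.title"),
    "textSigle": ("text", "t.title"),
    "x.title": ("other", None),
}


def title_level(title_stmt: ET.Element) -> str:
    """Return the corpus-structure level that a titleStmt element describes."""
    children = [child.tag for child in title_stmt]
    if not children or children[0] not in ALTERNATIVES:
        raise ValueError(f"unexpected titleStmt content: {children}")
    level, title_tag = ALTERNATIVES[children[0]]
    # The sigle alternatives require the matching title as the second child.
    if title_tag is not None and (len(children) < 2 or children[1] != title_tag):
        raise ValueError(f"{children[0]} must be followed by {title_tag}")
    return level


sample = ET.fromstring(
    "<titleStmt><korpusSigle>ABC</korpusSigle>"
    "<c.title>Example corpus</c.title></titleStmt>"
)
print(title_level(sample))  # corpus
```

Keying the check on the first child mirrors the DTD: each alternative of the choice begins with a different element, so the first child alone determines which branch applies.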