corpusTextNgramInfo
definition
Groups together the information required for n-gram resources. Information can be provided both on features drawn from the source corpus (e.g. language coverage, size, format, domains) and on features of the n-gram output itself (e.g. range of n-grams, type of item included).
type
component
elements
- mediaType
  - Status: mandatory with value textNgram
  - Repeatability: 1
- ngramInfo
  - Status: mandatory
  - Repeatability: 1
- lingualityInfo
  - Status: mandatory
  - Repeatability: 1
- languageInfo
  - Status: mandatory
  - Repeatability: unbounded
- modalityInfo
  - Status: recommended
  - Repeatability: 1
- sizeInfo
  - Status: mandatory
  - Repeatability: unbounded
- textFormatInfo
  - Status: recommended
  - Repeatability: unbounded
- characterEncodingInfo
  - Status: recommended
  - Repeatability: unbounded
- annotationInfo
  - Status: recommended
  - Repeatability: unbounded
- domainInfo
  - Status: recommended
  - Repeatability: unbounded
- textClassificationInfo
  - Status: recommended
  - Repeatability: unbounded
- timeCoverageInfo
  - Status: recommended
  - Repeatability: unbounded
- geographicCoverageInfo
  - Status: recommended
  - Repeatability: unbounded
- creationInfo
  - Status: recommended
  - Repeatability: 1
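The status and repeatability rules listed above can be checked mechanically. The following is a minimal sketch, not part of the schema: the dictionary representation of a record, the function name `check_component` and the reading of "Repeatability: 1" as "at most one occurrence" are assumptions made for illustration. Only the element names, statuses and repeatability values are taken from the list.

```python
# Element rules transcribed from the list above: (status, maximum occurrences).
# None stands for "unbounded".
ELEMENT_RULES = {
    "mediaType": ("mandatory", 1),
    "ngramInfo": ("mandatory", 1),
    "lingualityInfo": ("mandatory", 1),
    "languageInfo": ("mandatory", None),
    "modalityInfo": ("recommended", 1),
    "sizeInfo": ("mandatory", None),
    "textFormatInfo": ("recommended", None),
    "characterEncodingInfo": ("recommended", None),
    "annotationInfo": ("recommended", None),
    "domainInfo": ("recommended", None),
    "textClassificationInfo": ("recommended", None),
    "timeCoverageInfo": ("recommended", None),
    "geographicCoverageInfo": ("recommended", None),
    "creationInfo": ("recommended", 1),
}


def check_component(record):
    """Check a hypothetical record (dict: element name -> list of values).

    Returns (errors, warnings): errors for rule violations, warnings for
    recommended elements that are absent.
    """
    errors, warnings = [], []
    for name, (status, max_occurs) in ELEMENT_RULES.items():
        values = record.get(name, [])
        if not values:
            if status == "mandatory":
                errors.append(f"missing mandatory element: {name}")
            else:
                warnings.append(f"missing recommended element: {name}")
        elif max_occurs is not None and len(values) > max_occurs:
            errors.append(
                f"{name} occurs {len(values)} times, at most {max_occurs} allowed"
            )
    # Elements that are not in the list above are reported as errors.
    for name in record:
        if name not in ELEMENT_RULES:
            errors.append(f"unknown element: {name}")
    # mediaType is mandatory with the fixed value textNgram.
    if any(value != "textNgram" for value in record.get("mediaType", [])):
        errors.append('mediaType must have the value "textNgram"')
    return errors, warnings


# Empty dicts are placeholders: the inner structure of each component
# is not described in this section.
example = {
    "mediaType": ["textNgram"],
    "ngramInfo": [{}],
    "lingualityInfo": [{}],
    "languageInfo": [{}, {}],
    "sizeInfo": [{}],
}
errors, warnings = check_component(example)
```

With only the five mandatory elements present, the example record produces no errors and one warning for each recommended element that is missing.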
tips
This component is intended to describe the n-gram corpus, not the source corpus. The annotation and creation components must therefore describe the procedure, methods and tools used to create and enrich the n-gram corpus itself, rather than those used to annotate and create the source corpus. The source corpus can be described as a separate resource, and the element "originalSource" (in the creationInfo component) can be used to link to it.