Romanian Balanced Corpus (ROMBAC)
ID: ELRA-W0088
ROMBAC is a Romanian corpus containing equal shares of text from five genres: journalism, legalese, fiction, medicine, and biographical data on Romanian literary personalities. Around 7,000,000 words were selected for each genre, and the entire corpus totals around 41,000,000 tokens, counting punctuation marks as well as words.