GeRedE is a 270-million-token German CMC corpus containing approximately 380,000 submissions and 6,800,000 comments posted on Reddit between 2010 and 2018. Reddit is …
This project will bring together two important areas of study in linguistics and the digital humanities, namely bilingualism and the analysis and processing of corpora. To this end, …
B Kabashi - of the 11th Conference on computer-mediated …, 2024 - shs.hal.science
In addition to the standard variant of a language, a lot is also spoken and written in non-standard variants. The processing of data that is available in a non-standard variant is …