[Histling-l] New Release of the Reference Corpus of Middle High German (ReM)

Stefanie Dipper stefanie.dipper at rub.de
Fri Dec 5 05:57:32 EST 2025


We are pleased to announce Version 2 of the Reference Corpus of
Middle High German (ReM), which is available for download via the project
website:

https://linguistics.rub.de/rem

The Reference Corpus of Middle High German (1050–1350) consists of more
than two million tokens, providing a mostly complete collection of written
records from Early Middle High German (1050–1200) as well as a careful
selection of Middle High German texts from 1200 to 1350. The corpus was
compiled in the context of a series of projects at the Universities of
Cologne, Bonn, and Bochum, beginning in the mid-1980s.

This new version of the corpus contains numerous corrections and
improvements to both the tokenization and the linguistic annotations,
and adds several new documents.

In addition to CorA-XML, several new formats are available for download,
including TEI XML and GraphML; the latter can be used, among other things,
with a local ANNIS 4 instance. There is also a JSON-based format that
contains all available annotations and is easy to read from data analysis
scripts.

The new version of the corpus can be accessed via ANNIS 4 at the following
URL:

https://newannis.linguistics.rub.de/rem

The Reference Corpus of Middle High German is licensed under the Creative
Commons Attribution-ShareAlike 4.0 license (CC BY-SA 4.0).

--
Prof. Dr. Stefanie Dipper (she/her) - Professur fuer Computerlinguistik
Sprachwiss. Institut - Ruhr-Universitaet Bochum - 44780 Bochum - Germany
Email: stefanie.dipper at rub.de - https://www.linguistics.rub.de/~dipper/

