[Histling-l] Announcing the release of two parsed corpora of historical French: MCVF and Penn-BFM Parsed Corpus of Historical French

Santorini, Beatrice beatrice at sas.upenn.edu
Wed Jun 2 11:19:48 EDT 2021


Dear friends and colleagues,

I'm pleased to announce the release of two parsed corpora of Old and Middle
French, containing a total of roughly 1.6 million words:

- Modéliser le changement: les voies du français (MCVF) (ca. 850,000 words)
- Penn-BFM Parsed Corpus of Historical French (PPCHF) (ca. 750,000 words)

The MCVF is distributed in two versions:

- version 1.0, the original 2009 release (with XML, tagged, and parsed files)
- version 2.0, containing revised and corrected versions of the parsed files

The PPCHF contains additional texts, especially from the Early Old French
period before 1200.  More than half of the material is based on online editions
published by the Base de Français Médiéval (BFM).

The corpora are distributed under the Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0,
https://creativecommons.org/licenses/by-nc-sa/4.0/), which allows you to
add information of your own to the data (say, lemmatization).  The corpora
can be downloaded at

       https://github.com/beatrice57/mcvf-plus-ppchf/

The following information is included in the GitHub repository, but is also
available online for stand-alone reference, so that you can get a sense of
the contents before downloading:

Source information about the texts

       https://www.ling.upenn.edu/~beatrice/corpus-ling/french-corpora-sources/index.html

Annotation guidelines

       https://www.ling.upenn.edu/~beatrice/corpus-ling/annotation-french


Best regards,

Beatrice Santorini






