GerManC Plus: the complete project
Work on the complete GerManC project commenced in September 2008 with Professor Martin Durrell as Principal Investigator, Dr Paul Bennett as Co-Investigator and Dr Silke Scheible and Dr Richard J. Whitt as Research Associates.
In the first instance this will involve extending the corpus to include the remaining genres: drama, sermons, personal letters, journals, narrative prose (fiction and biographies), and academic, medical and legal texts. These genres will follow the parameters established in the pilot project: three 2,000-word samples will be taken for each of the five regions within each of the three fifty-year sub-periods.
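The arithmetic implied by this sampling scheme can be sketched as follows. This is only an illustration: the genre list is read off the sentence above, and treating "narrative prose (fiction and biographies)" as a single genre is an assumption, so the totals are indicative rather than official project figures.

```python
# Sampling parameters as stated in the text.
SAMPLES_PER_CELL = 3      # samples per region per sub-period
WORDS_PER_SAMPLE = 2000
REGIONS = 5
SUB_PERIODS = 3           # fifty-year sub-periods

# Assumption: narrative prose (fiction and biographies) counts as one genre.
GENRES = [
    "drama", "sermons", "personal letters", "journals",
    "narrative prose", "academic", "medical", "legal",
]


def samples_per_genre() -> int:
    """Number of samples collected for a single genre."""
    return SAMPLES_PER_CELL * REGIONS * SUB_PERIODS


def words_per_genre() -> int:
    """Approximate word count for a single genre."""
    return samples_per_genre() * WORDS_PER_SAMPLE


def total_words() -> int:
    """Approximate word count across all genres listed above."""
    return words_per_genre() * len(GENRES)


if __name__ == "__main__":
    print(samples_per_genre())  # 45
    print(words_per_genre())    # 90000
    print(total_words())        # 720000
```

Under these assumptions each genre contributes 45 samples, or about 90,000 words, and the eight genres listed add about 720,000 words to the corpus.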
Building on the achievements of the pilot project, software programs will be developed to enable full analysis of the corpus material. In particular, in collaboration with colleagues at the Institute for German Language (IDS) and other institutions in Germany working on the Deutsch Diachron Digital Project, we shall be aiming to find ways in which all occurrences of a particular word in the corpus can be found automatically despite the considerable variation in spelling in this period. It is also intended to adapt existing software which identifies the part of speech (noun, verb, adjective, etc.) of each word and classifies it according to grammatical category (case, gender, tense, etc.), as well as automatically specifying the basic structure of each sentence. Such programs will ideally have the potential for wider application to other languages whose grammar is as complex as that of German. The whole corpus will be set up with interfaces to ensure maximum ease of access.
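One simple way to find all occurrences of a word despite spelling variation is to map every token to a normalised form and index the corpus by that form. The sketch below is not the project's method: the rewrite rules, the function names and the sample tokens are all hypothetical, chosen only to illustrate the idea with a few well-known historical spellings (vnd, jhr, seyn, thun).

```python
import re
from collections import defaultdict

# Hypothetical rewrite rules for illustration only; a real system would
# need a far larger, empirically derived rule set or a trained model.
RULES = [
    (re.compile(r"^v(?=[^aeiouäöü])"), "u"),  # vnd  -> und
    (re.compile(r"^j(?=[^aeiouäöü])"), "i"),  # jhr  -> ihr
    (re.compile(r"th"), "t"),                 # thun -> tun
    (re.compile(r"y"), "i"),                  # seyn -> sein
]


def normalise(token: str) -> str:
    """Lower-case a token and apply each rewrite rule in order."""
    form = token.lower()
    for pattern, replacement in RULES:
        form = pattern.sub(replacement, form)
    return form


def build_index(tokens):
    """Map each normalised form to the positions where it occurs."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        index[normalise(token)].append(position)
    return index


def find_all(index, word):
    """Return the positions of every spelling variant of `word`."""
    return index.get(normalise(word), [])


if __name__ == "__main__":
    tokens = ["Vnd", "jhr", "solt", "seyn", "und", "Ihr", "thun", "sein"]
    index = build_index(tokens)
    print(find_all(index, "und"))   # [0, 4]
    print(find_all(index, "sein"))  # [3, 7]
```

Because the query is normalised with the same rules as the corpus, a search for the modern spelling also retrieves the historical variants. The trade-off is that broad rules can conflate distinct words, so in practice the rules would need checking against the corpus.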