Ongoing projects

Multi-CAST: Multilingual Corpus of Annotated Spoken Texts

The official Multi-CAST website is available here.

For a comprehensive one-stop overview, see the following document.

Personal pronouns and person clitics in Tabasaran: Towards a theory of person

Funded by the Deutsche Forschungsgemeinschaft (DFG)

Researcher: Dr. Natalia Bogomolova

Principal investigators: Dr. Natalia Bogomolova, Prof. Dr. Geoffrey Haig

Funding period: 01.09.2022 to 31.08.2025 (36 months)

The project pursues two main goals: an in-depth investigation of the grammatical category of person in Tabasaran (Nakh-Daghestanian), and the collection of new empirical data from this under-studied language in order to advance the syntactic theory of person. The category of person is expressed in Tabasaran through two systems: independent personal pronouns and a complex system of person clitics, which display a number of interesting properties. First, both subject and non-subject arguments can be marked on the finite verb by a clitic; clusters of subject and non-subject clitics are also permitted. Second, canonical subjects behave differently from non-canonical subjects with respect to clitics. In declarative main clauses, canonical subjects are always doubly expressed (that is, also by a clitic), whereas in clauses with non-canonical subjects either the subject or the second, non-subject argument is doubled by a clitic. Third, Tabasaran exhibits a property known as the Person-Case Constraint, which is found in similar form in Romance languages, although there are interesting differences from Romance. Fourth, both pronouns and clitics show indexical shift in reported-speech constructions: they lose their indexical semantics and refer to the arguments of the matrix clause. The project will collect and analyse extensive new data that pose a challenge for syntactic theories and put current approaches to the test. The aim is thus also to modify these approaches in order to gain a better understanding of how information about the category of person is encoded in human language.

Post-predicate Elements in Iranian: Inheritance, Contact, and Information Structure

Funded by the Alexander-von-Humboldt-Stiftung

Funding period: 01.07.2019-30.06.2022
PIs: Geoffrey Haig (Bamberg); Mohammad Rasekh-Mahand (Hamedan)

Iranian languages are routinely classified as "verb final". While this holds for (non-pronominal) direct objects, which are generally pre-verbal, certain other constituents occur more or less systematically after the verb in several West Iranian languages. The result is a typologically unusual and hitherto largely ignored OVX word order type within West Iranian. Furthermore, OVX word order has been identified in unrelated languages in contact with Iranian, including Turkic and Neo-Aramaic.

This project brings together leading international experts on Iranian and neighbouring languages in order to explore

  • the extent of OVX word order within Iranian, and its genesis within the family
  • the areal spread of OVX word order in neighbouring languages, and the pathways of transmission
  • information-structural correlates of OVX word order
  • typological implications of OVX word order.

For more information click here

Previous projects

Does morphosyntactic alignment shape discourse? Implementing a corpus-based approach to linguistic typology

This project is a proof-of-concept study for corpus-based approaches to typology. We address the question of whether typological differences in the morphosyntax of individual languages are reflected in the organization of spontaneous spoken discourse of those languages, with a special focus on so-called ergative languages. While claims of a co-dependence between grammar and discourse have regularly been made in the literature (Hopper 1983, Du Bois 2003, Durie 2003), the issue has never been systematically investigated on a more representative language sample.

The project builds on an existing language archive architecture (Multi-CAST, The Multilingual Corpus of Annotated Spoken Texts, online here), and implements an expanded version of the syntactic annotation system GRAID (Grammatical Relations and Animacy in Discourse, Haig & Schnell 2014, manual here). The existing language sample in Multi-CAST is being extended by the inclusion of ergative languages from the Nakh-Daghestanian language family and from Australia, and of data from Philippine-type languages. All corpora are subjected to a standardized annotation procedure, and the resulting data feed into quantitative cross-corpus analysis in order to identify significant statistical patterns in connected discourse, for example:

  • the distribution of referential expressions across syntactic functions,
  • the density of zero-anaphora,
  • patterns of new-referent introduction,
  • division of labour among pronouns and lexical expressions,
  • the impact of animacy on syntactic configurations

The resulting dataset, the first of its kind worldwide, aids the detection of possible correlations between the alignment of morphosyntax and probabilistic patterning in the way connected spoken language is organized.

The project is being coordinated by Geoffrey Haig, Stefan Schnell, and Nils Schiborr at the University of Bamberg, and runs in collaboration with researchers from the Centre of Excellence for Dynamics of Language, Canberra and Melbourne (Nick Thieberger), and the University of Jena (Diana Forker).

The project is supported by a DFG grant (project number 323627599), for an initial period of 2017–2020.

Bamberg Lexical Database for Contemporary Iranian Languages (BLDCIL)

Background and aims

The sub-classification of Iranian languages has proven to be a particularly recalcitrant problem in historical linguistics (see Korn 2016 for recent proposals, DOI: 10.1515/if-2016-0021). This project aims to complement and extend existing scholarship by applying a phylogenetic approach, based on lexical comparison, to the problem; see e.g. Heggarty et al. (2010) for the background to this kind of approach (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2981917/pdf/rstb20100099.pdf).

The project's point of departure is the question of the sub-grouping of Zazaki, a West Iranian language spoken in Central Anatolia (in today's Turkey), but historically not closely related to its current geographical neighbours (Northern Kurdish); see Gippert (2007/2008) for a summary of relevant scholarship (titus.fkidg1.uni-frankfurt.de/personal/jg/pdf/jg2008e.pdf).

The aims of the project are thus two-fold: (i) to apply a novel methodology to an old problem in the sub-classification of Iranian languages; (ii) to serve as a proof-of-concept for the efficacy (or otherwise) of phylogenetic models in resolving classic problems of philology. The first phase, beginning in November 2016, involves the compilation of standardized lexical data sets, together with sound files, from a representative set of Iranian languages, focussing initially on the West Iranian languages.


Cooperation

The project is closely linked with two existing initiatives: 

The CoBL (Cognacy in Basic Lexicon) database at the Max-Planck-Institute in Jena (Paul Heggarty and Cormac Anderson, see www.shh.mpg.de/207610/cobldatabase).

The Atlas of the Languages of Iran project (Editor Erik Anonby, http://iranatlas.net)


The Jena/Bamberg Iranian List (JBIL) of meanings

The JBIL-list is a list of meanings that includes the 200 items used in the CoBL project and 80 items used in the Atlas of the Languages of Iran, plus a number of other items deemed of interest for Iranian languages. The items themselves, plus explanations and instructions for investigators, are available as downloads below:

  • The JBIL-list, with explanations and example sentences, and instructions for investigators (pdf, 247.0 KB, 20 pages)
  • The JBIL-list, with Persian translations and Persian example sentences (pdf, 470.0 KB, 25 pages)
  • The Data Entry Form, into which the actual forms for each language may be entered (doc, 98.5 KB, 21 pages)

 

Languages

Data sets have been compiled, or are in the process of compilation, for the following languages:

Kumzari

Behdinî Kurdish

Mazanderani

Persian

Jafi Kurdish

Tati

Bakhtiari

Delvari

 

Sample Data

Sample data sets will be made available here shortly.

 

Team

The BLDCIL project is coordinated by Geoffrey Haig (Bamberg) and Erik Anonby (Carleton/Bamberg). Data collection and handling is undertaken with the assistance of (in alphabetical order): Shirin Adibifar (Bamberg), Raheleh Izadifar (Hamedan), Mina Salehi (Bamberg), Mortaza Taheri-Ardali (Shahr-e Kord University).

 

Support

The project gratefully acknowledges the financial and technical support of the Max-Planck Institute for the Science of Human History (CoBL-Database), the University of Bamberg for departmental funding, and the Dept. of Linguistics at the University of Hamedan as a cooperation partner in the Islamic Republic of Iran.

Atlas of the Languages of Iran (Chief Editor: Erik Anonby)

Project for the creation of a digital language map of Iran. For more information click here

Documenting Dargi languages in Daghestan - Shiri and Sanzhi

In this project, three linguists (Diana Forker, Rasul Mutalov and Oleg Belyaev) and an ethnographer (Iwona Kaliszewska) will document and analyze Shiri and Sanzhi and the culture of the Shiri and Sanzhi people. Shiri and Sanzhi belong to two different Dargi languages (Nakh-Daghestanian), spoken in the central part of Daghestan in the Caucasus (Russia). The languages are heavily endangered. We estimate that there are only about 200 Shiri families and about 100 Sanzhi families left.

The project aims at a detailed and in-depth documentation of Shiri and Sanzhi through the collection of texts from a wide range of genres. These texts will be made available to the public via the DoBeS archive (http://www.mpi.nl/DOBES/).

In the linguistic documentation and analysis of Shiri and Sanzhi we will pay special attention to those features that are unusual for the Nakh-Daghestanian language family and of broader typological interest. Two of these features are person agreement, which is based on the person hierarchy and not determined by grammatical roles, and extraordinarily rich TAM and evidentiality paradigms.

In our project we will collaborate with Russian colleagues (e.g. Nina Sumbatova) and colleagues from the University of Jena (Kevin Tuite, Florian Mühlfried). But our main cooperation partners will be Daghestanian researchers, students and the Shiri and Sanzhi communities.

The project is funded by the DoBeS program of the VW foundation (http://www.volkswagenstiftung.de/service/aktuelles/article/129/chancen-fuer.html?no_cache=1&cHash=fefa1ac99f). It started in summer 2012 and runs for three years.

For further information please visit our project page: http://www.kaukaz.net/cgi-bin/blosxom.cgi/english/dargwa.

Chirag Documentation Project

Researchers:

Prof. Dr. Geoffrey Haig, Dr. Dmitry Ganenkov, Dr. Natasha Bogomolova

Project Details:

Major Documentation Project. Duration: 2014-2017. 126,000 EUR

Project Summary:

The project will document Chirag, an endangered language from the Dargwa branch of the East Caucasian (Nakh-Daghestanian) family, spoken in Daghestan, Russia (2100-2400 speakers). The main goal of the project is to collect a rich corpus of audio/video data from both traditional narratives and everyday communication. The plan is to record about 110 hours of Chirag (spontaneous speech, lexical and grammatical elicitation), of which at least 25 hours of spontaneous speech will be transcribed, morphologically analyzed and translated to produce an annotated corpus of Chirag available on the internet.

Information can be found here.

Compilation and critical edition of pre-19th century Kurmanji Kurdish

Researcher:

Dr. Ergin Öpengin

Project Details:

Deutsche Forschungsgemeinschaft (DFG). Duration: 10/2014 - 03/2016. 134,767 EUR

Project Summary:

Kurmanji Kurdish is one of the most widely-spoken languages of the Middle East, but research on its history and development is severely hampered by the lack of written attestation prior to the 15th century. Furthermore, the few samples of Kurdish prose that can reliably be ascribed to the period between the 15th and 19th centuries are largely inaccessible to a wider scholarly audience, and lack a reliable critical apparatus. This project will compile a selection of 10 Kurdish texts from before 1800, transliterated in a standardized format and supplied with English translations and an authoritative critical apparatus. The texts will also be made fully accessible as a digital corpus, accompanied by a concordance, and the resulting two volumes will be published on the open-access portal of the University of Bamberg. Issues of authorship and localization of the texts will also be assessed in the light of the applicant’s ongoing research on regional variation in Kurdish, which allows a much finer-grained evaluation than has previously been possible. The project will thus lay the foundation for serious academic research on the history of Kurdish by creating an open-source research resource for questions relating to the history of the Kurdish language(s), the position of Kurdish within the West Iranian languages, the linguistic ecology of Kurdistan in the Ottoman period, the timing of contact phenomena and of language change, and literary and religious scribal practices in the period.

Agreement in Discourse

This project explores the function of agreement in natural texts. The concept of agreement has played a key role for various domains of linguistic theory (morphology, syntax, semantics), and there are a number of different approaches to modeling it. However, there is still no generally accepted explanation of its function: Why should languages so often develop agreement in their grammars? In his seminal work on agreement, Corbett (2006: 274-275; see also Lehmann 1988, Levin 2001: 21-27, Kibrik 2011) proposes four possible functions of agreement, among which the most important are:

  1. Agreement provides additional redundant (repeated) information to facilitate understanding for the hearer.
  2. Agreement helps the hearer to keep track of the different referents in a discourse.

Remarkably, the two central claims (agreement is redundant, and agreement is referential) continue to be repeated in the literature, despite the fact that, with very few exceptions, neither has ever been subjected to more rigorous testing (cf. Siewierska 1998, Bickel 2003), and both clearly admit counterexamples.

Thus, the aim of this project is to test the proposed functions of agreement against a sample of 20 languages from all around the world. The results will be of central importance to the language sciences, and a test case for the applicability of text-based, as opposed to grammar-based, typology.

The project is conducted by Diana Forker and financially supported by the Daimler and Benz Foundation (http://www.daimler-benz-stiftung.de/cms/index.php?page=postdoc-stipendiaten-2012).

As part of the project, Diana Forker and Geoffrey Haig organized a workshop at the University of Bamberg (1-2 February 2013).

Documentation of Gorani, an endangered language of West Iran

This is a collaborative project, funded by the VW-foundation’s Programme Dokumentation Bedrohter Sprachen (DoBeS) and conducted together with Professor Ludwig Paul (Hamburg) and Professor Philip Kreyenbroek (Göttingen). The project was originally granted for three years (2007-2010), but has been extended till 2012. Information on the project can be found here.