Resources
Emotion Corpora for appraisal theories and emotion component process model
Emotion Coping (2024)
We created a corpus based on crowdworkers who role-play personalities that apply different coping strategies.
Authors: Enrica Troiano, Sofie Labat, Marco Antonio Stranisci, Rossana Damiano, Viviana Patti, and Roman Klinger
- Corpus Name: COPING
- Data source: Role-playing based experiment
- Annotation procedure: Crowdsourcing
- Paper: will come soon
- More information
Cumulated Emotion Progression Analysis in Dreams and Customer Service Dialogues (2024)
We created this corpus to analyze emotions as they accumulate in dreams and customer service dialogues. The assigned labels represent the emotion up until the respective part of the instance, in the context of the previous content.
Authors: Eileen Wemmer, Sofie Labat, Roman Klinger
- Corpus name: EmoProgress
- Data source: Dreams and customer agent dialogues
- Annotation procedure: Crowdsourcing
- Paper preprint
- Original Location
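The cumulative labeling described above can be pictured as annotating growing prefixes of an instance. The following is only a minimal sketch, assuming an instance has already been split into parts; the function name `cumulative_prefixes` and the example segments are hypothetical and not taken from the corpus or its release format.

```python
from typing import List


def cumulative_prefixes(parts: List[str]) -> List[str]:
    """Return the text seen so far after each part of an instance.

    Assumption: a label for position i would refer to the whole
    prefix parts[0..i], not to part i in isolation.
    """
    prefixes = []
    for i in range(1, len(parts) + 1):
        prefixes.append(" ".join(parts[:i]))
    return prefixes


# Made-up example segments (not from the corpus).
parts = [
    "I was walking home.",
    "Suddenly the street vanished.",
    "Then I woke up.",
]
for prefix in cumulative_prefixes(parts):
    print(prefix)
```

Each printed line is one unit that an annotator would see, so later units always include the earlier context.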
Appraisal Theories for Dimensional Modelling of Emotions in Text (2022/2023)
We created this corpus with emotion and appraisal dimensions with labels from two perspectives – the person who lived through a described event and readers who only have access to the text. Each text has been generated by asking people on Prolific to complete the sentence (for a given emotion): I felt [emotion] when/that/if…
Authors: Enrica Troiano, Laura Oberlaender, Roman Klinger
- Corpus name: crowd-enVENT
- Data source: Emotion self-reports
- Annotation procedure: Self-annotation by authors and by external readers via crowdsourcing
- Paper
- Download
Emotion Component Process Model Reannotation of REMAN and TEC (KONVENS 2021)
We reannotate parts of the TEC corpus and the REMAN corpus following the emotion component process model by Scherer, namely that the emotion is communicated by describing an event appraisal, a bodily reaction, an action tendency, a subjective feeling or an expression.
When using this corpus, please make sure to also cite the original publications by Saif Mohammad on the TEC corpus and our REMAN corpus publication.
Authors: Felix Casel, Amelie Heindl, Roman Klinger
- Corpus name: CPM-Corpus
- Data source: Twitter (TEC) and Literature (REMAN)
- Annotation procedure: Postannotation of existing emotion corpora
- Paper
- Download
Experiencer-specific Emotion and Appraisal Annotation (2022)
We reannotate event descriptions with 22 appraisal dimensions and emotions, for each person mentioned in an event description. This enables joint modelling experiments across multiple people in an event and analyses that are person-specific.
Authors: Enrica Troiano, Laura Oberlaender, Maximilian Wegge, Roman Klinger
- Corpus name: x-enVENT
- Data source: Self reports
- Annotation procedure: Postannotation with 4 annotators
- Paper
- Paper with experiments (to be published soon at NLPCSS@EMNLP 2022)
- Data Download
Appraisal enISEAR: A reannotation of the enISEAR corpus with Cognitive Appraisal (2020, 2021)
We reannotate the enISEAR corpus with cognitive appraisal dimensions following the Smith/Ellsworth model. The corpus consists of 1001 English event descriptions, annotated with the emotion the event has been described for and the appraisal dimensions of pleasantness, insecurity, self- and situational control, attention, and effort.
Authors: Jan Hofmann, Enrica Troiano, Roman Klinger
- Corpus name: Appraisal-enISEAR
- Data source: Self reports
- Annotation procedure: Postannotation
- Original Paper which introduces the concept of appraisal for emotion analysis.
- Paper which describes experiments on different annotation strategies.
- Data Download
- Data Download including different annotation strategies
- Repository with code and data
Emotion Communication Channels (2019)
The author of a fictional text can decide to let the characters of a story express emotions in different ways, for instance through facial expressions, body movements, or voice. With this corpus, we provide a resource in which we annotated these communication channels. This corpus is an extension of the emotion relation corpus listed below.
Authors: Evgeny Kim, Roman Klinger
- Corpus name: Fanfic
- Data source: Fan Fiction
- Annotation procedure: Expert Annotation
- Paper
- Download
- Alternative location
Emotion Classification Corpora
MMEmo: MultiModal Emotion Analysis on Reddit (2022)
The MMEmo Corpus is a corpus of Reddit posts that contain images and text, annotated for the emotion the posts express, an emotion stimulus category, and the relation between the image and the text.
Authors: Anna Khlyzova, Carina Silberer, Roman Klinger
- Corpus name: MMEmo
- Data source: Reddit
- Annotation procedure: Crowdsourcing
- Paper
- Download
- Download with image data: please contact us
deISEAR, enISEAR: Self-reports of events associated with given emotions (2019)
deISEAR and enISEAR are a German and an English corpus created in the spirit of the original ISEAR data set, but via crowdsourcing in a two-step procedure, to ensure quality. The corpora consist of 1001 event descriptions which are associated with a predefined emotion.
Authors: Enrica Troiano, Sebastian Pado, Roman Klinger
- Corpus name: enISEAR and deISEAR (not to be confused with the original ISEAR corpora!)
- Data source: Self reports
- Annotation procedure: Crowdsourcing
- Paper
- Download
- Alternative location
Unified Emotions (2018)
Several emotion corpora exist nowadays, many in different file formats and with different label sets. We aggregated these corpora with an automatic download and conversion pipeline so that these resources are easier to use and compare.
Authors: Laura Bostan, Roman Klinger
- Corpus name: Unified Emotions
- Data source: Different existing corpora
- Annotation procedure: diverse
- Paper
- Download
- Alternative location
Implicit Emotions Shared Task (2018)
For this shared task, which took place with WASSA 2018, we collected data to have similar properties as the ISEAR data, but via distant supervision on Twitter. These data therefore mainly consist of event descriptions without the explicit mention of an emotion word.
The test data is freely available. Contact me for a password to directly access the training data.
Authors: Roman Klinger, Saif Mohammad, Alexandra Balahur, Veronique Hoste, Orphee de Clercq
SSEC Corpus: Annotation of SemEval 2016 Stance Sentiment Corpus with Emotions (2018)
We reannotated the existing SemEval 2016 corpus, a resource already labeled with stances and sentiment, with emotions in a multiclass setting. This enables comparisons of these annotation layers. We publish all annotations of all annotators.
Authors: Hendrik Schuff, Jeremy Barnes, Julian Mohme, Sebastian Pado, Roman Klinger
- Corpus Name: SSEC
- Data source: Tweets from SemEval Stance Corpus 2016
- Annotation procedure: Experts
- Paper
- Direct Download
- More information
- Alternative location
Emotion Analysis from Text and Images (2017)
Emotion analysis in social media might need to consider images together with the text which refers to them, for instance on Twitter. For analyzing this complementarity, we collected a corpus of Tweets which contain images. It is automatically labeled based on hashtags. We only provide Tweet-IDs. If you need help with downloading the corresponding data via the Twitter API, contact us.
Author: Roman Klinger
- Data source: Twitter
- Annotation procedure: Distant labeling
- Paper
- Paper preprint
- Downloads:
- Alternative location
Relational Emotion and Emotion Stimulus Corpora
GerSti: A German Emotion Stimulus Corpus of News Headlines (KONVENS 2021)
Emotion stimulus detection has recently become a popular task in emotion analysis, but most resources are only available in Mandarin and English. We contribute a German resource of token-level emotion stimulus annotations in a novel German news headline corpus and perform cross-lingual experiments in which we train on an English corpus and apply the model to our German resource.
Authors: Bao Minh Doan Dang, Laura Oberländer, Roman Klinger
Emotion relation corpus for the recognition of emotional relations of characters in fan fiction (2019)
Semantic role labeling of emotion events is a challenging task. In this corpus, we simplify this to a binary relation extraction task, in which character mention pairs are labeled with directed emotional relations between them, i.e., a character is either an emotion experiencer or the cause of an emotion.
Authors: Evgeny Kim, Roman Klinger
- Corpus name: FanFic
- Data source: Fan Fiction
- Annotation procedure: Expert Annotation
- Paper
- Download
- Alternative location
REMAN and GoodNewsEveryone: Emotion Corpora for Semantic Roles of Emotion Events (2019)
Emotions are commonly expressed in the context of a mention of an experiencer (which can be the author of a text), with specific trigger words, and can describe the target and the stimulus of the emotion. We publish two corpora with such annotations, one of literature from Project Gutenberg and one of news headlines (additionally annotated with the reader perspective of emotions).
Authors: Laura Bostan, Evgeny Kim, Roman Klinger
Corpus 1: REMAN
- Data source: Literature
- Annotation procedure: Expert Annotation
- Paper
- Download
- Alternative location
Corpus 2: GoodNewsEveryone
- Data source: News headlines
- Annotation procedure: Expert Annotation
- Paper at ArXiv
- Paper
- Download
Resources and Dictionaries for Emotion Analysis
Emotion Intensity Lexicon of Nonsense Words
The goal in this study was to understand if nonsense words are reliably attributed emotions of particular intensity. To study this, we asked annotators in a best-worst-scaling setup to assign emotion intensities to nonsense words.
Authors: Valentino Sabbatino, Enrica Troiano, Antje Schweitzer, Roman Klinger
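Best-worst scaling annotations are commonly turned into real-valued scores by counting: the share of times an item was picked as best minus the share of times it was picked as worst. The following is a minimal sketch of that counting procedure only; the tuple format, the function name `bws_scores`, and the example words are made up for illustration, and this is not necessarily the scoring used for this lexicon.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# One annotation (assumed format): the items shown together, the item
# picked as most intense, and the item picked as least intense.
Annotation = Tuple[List[str], str, str]


def bws_scores(annotations: Iterable[Annotation]) -> Dict[str, float]:
    """Counting-based score per item: (#best - #worst) / #appearances."""
    seen: Counter = Counter()
    best: Counter = Counter()
    worst: Counter = Counter()
    for items, picked_best, picked_worst in annotations:
        seen.update(items)
        best[picked_best] += 1
        worst[picked_worst] += 1
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


# Made-up nonsense words and choices (not from the lexicon).
annotations = [
    (["florp", "snarv", "quibe", "zorp"], "snarv", "quibe"),
    (["florp", "snarv", "quibe", "daxu"], "snarv", "florp"),
]
scores = bws_scores(annotations)
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, scores[word])
```

Scores fall in the range from -1 (always picked as worst) to 1 (always picked as best); items that were never picked either way receive 0.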
IMS Participation in EmoInt 2018
We participated in the shared task on emotion intensity prediction at WASSA in 2018 and scored second. Our model and results consist of a comparably standard neural architecture informed with different dictionaries of emotions, abstractness, concreteness, valence, arousal. We make all these resources and our implementation available.
Authors: Maximilian Koeper, Evgeny Kim, Roman Klinger
- Data source: EmoInt data set, automatically generated dictionaries
- Annotation procedure: Automatic
- Paper
- Resource download
- Code
- Alternative location
German Emotion Dictionaries created for the Analysis of Franz Kafka's Texts (2016)
We manually created German dictionaries for emotion analysis in Kafka’s Schloss and Amerika. These dictionaries are more specific than general dictionaries and might perform worse on other texts; however, they might be a good starting point for related text analyses.
Authors: Roman Klinger, Surayya Samat Suliya
Irony, Sarcasm and Satire
Twitter Corpus to compare irony to sarcasm (2016)
The concepts of irony and sarcasm are often used interchangeably, though they are not the same. With this corpus (and paper), we analyze whether a difference between these concepts can be found empirically on Twitter. We publish the Tweets themselves, together with meta information.
Authors: Jennifer Ling, Roman Klinger
- Data source: social media
- Annotation procedure: Distant labeling
- Paper
- Paper Preprint
- Download
- More information
German Satire Detection Corpus (2019)
We publish the first German corpus for satire detection. It is also the first corpus that records the source each article came from, which enables training models with adversarial methods so that they do not overfit to such confounding variables.
Authors: Robert McHardy, Heike Adel, Roman Klinger
- Data source: Regular and satirical news
- Annotation procedure: Distant labeling
- Paper
- Download
- Alternative location
Resources for Sentiment Analysis, Opinion Mining, Hate Speech Detection, Claim Detection, Fact Checking, Deception
Hierarchical Detox
Toxic language sometimes contains the mention of a target entity, particularly in the case of hate speech. A model that detects toxic language may learn that a particular entity is an indicator for the presence of toxic language. Correcting for such biases might harm the performance of a classifier, but improve the generalization. We improve this situation by correcting for specific targets, while keeping general target categories intact.
Authors: Johannes Schäfer, Ulrich Heid, Roman Klinger
Science Communication Distortions
Distorted science communication harms individuals and society as it can lead to unhealthy behavior change and decrease trust in scientific institutions. Given the rapidly increasing volume of science communication in recent years, a fine-grained understanding of how findings from scientific publications are reported to the general public, and methods to detect distortions from the original work automatically, are crucial. This dataset contains 1,600 instances of scientific findings from academic papers paired with corresponding findings as reported in news articles and tweets, annotated with respect to four characteristics: causality, certainty, generality, and sensationalism.
Authors: Amelie Wuehrl, Dustin Wright, Roman Klinger, Isabelle Augenstein
Can Factual Statements be Deceptive? The DeFaBel Corpus of Belief-based Deception (2024)
A data set of German argumentative texts that argue for or against a specific statement, independent of the authors' own belief. If the author does not believe in the statement, the text is marked as deceptive. Therefore, this corpus is the first resource (that we are aware of) that combines factuality, belief, arguments, and deception.
Authors: Aswathy Velutharambath, Amelie Wuehrl, Roman Klinger
BEAR-Fact: A Twitter dataset with fact-checking labels, evidence texts and entity and relation annotations (2024)
A dataset of tweets annotated with fact-checking labels, evidence texts and structured knowledge, i.e., biomedical entities and relations.
Authors: Amelie Wührl, Yarik Menchaca Resendiz, Lara Grimminger, and Roman Klinger
UniDecor: A Unified Deception Corpus for Cross-Corpus Deception Detection (2023)
We aggregated a set of existing deception datasets in a unified format and performed cross-corpus and in-corpus experiments.
Authors: Aswathy Velutharambath, Roman Klinger
- Download
- Original data location
- Paper will come soon
- Original location
CoVERT: A Corpus of Crowdsourced Fact-checking Verdicts for Biomedical COVID-19 Tweets (2022)
A corpus of 300 Twitter posts with claims about Covid-19. All tweets are annotated with crowdsourced fact-checking verdicts (supports, refutes, not enough info) and evidence texts supporting the verdicts.
Authors: Isabelle Mohr, Amelie Wuehrl, Roman Klinger
Biomedical Claims in Social Media (2021)
This corpus consists of Tweets regarding a set of medical conditions. We annotated the Tweets for containing an argumentative claim (or not). If the claim is explicitly mentioned, we also mark the claim phrase.
Authors: Amelie Wuehrl, Roman Klinger
Stance/HOF in the US 2020 Elections (2021)
This corpus consists of Tweets from the US 2020 elections and is annotated for hate-speech and stance towards the two main candidates.
Authors: Lara Grimminger, Roman Klinger
SCARE: German Corpus for Aspect-based Sentiment Analysis in App-Reviews (2016)
There are not many resources for aspect-based sentiment analysis in German. We contribute a corpus of Google Play reviews annotated with subjective phrases, aspects, and their relation.
Authors: Mario Saenger, Roman Klinger
USAGE: German and English Corpora for Aspect-based Sentiment Analysis in Product Reviews (2014)
There are not many resources for aspect-based sentiment analysis in German. We contribute a corpus of Amazon reviews annotated with subjective phrases, aspects, and their relation.
Authors: Roman Klinger
Resources for Biomedical and Chemical Text Mining
BEAR: Biomedical Entities and Relations in Tweets (2022)
A dataset of 2100 Twitter posts annotated with 14 different types of biomedical entities (e.g., disease, treatment, risk factor, etc.) and 20 relation types (including caused, treated, worsens, etc.).
Authors: Amelie Wuehrl, Roman Klinger
Corpus and resources for the detection of miRNA mentions in scientific text (2014)
Authors: Shweta Bagewadi, Tamara Bobic, Martin Hofmann-Apitius, Juliane Fluck, Roman Klinger
Weakly labeled corpus for protein-protein and drug-drug interactions (2012)
Authors: Philippe Thomas, Tamara Bobic, Martin Hofmann-Apitius, Ulf Leser, Roman Klinger
Corpus for testing normalization of variation mentions (2011)
Authors: Philippe E Thomas, Roman Klinger, Laura I Furlong, Martin Hofmann-Apitius, Christoph Friedrich
Corpus of Medline abstracts annotated with chemical entities (2008)
Authors: Corinna Kolarik, Roman Klinger, Christoph M. Friedrich, Martin Hofmann-Apitius, and Juliane Fluck
Corpus of Medline abstracts annotated with IUPAC entities (2009)
Authors: Roman Klinger, Corinna Kolářik, Juliane Fluck, Martin Hofmann-Apitius, Christoph M. Friedrich
- Paper
- Download train and test
- Original data location
Other Resources
Obituary Corpus Annotated for Logical Zones (2020)
Authors: Valentino Sabbatino, Laura Bostan, Roman Klinger
- Paper (LREC 2020)
- Alternative location
- Data Download
- You need a password to access the data. Send us an email clearly stating that you will not redistribute the data and that you will only use it for research.