Simone Conia | publications

2022

LREC

Universal Semantic Annotator: the First Unified API for WSD, SRL and Semantic Parsing

Orlando Riccardo, Conia Simone, Faralli Stefano, and Navigli Roberto

In Proceedings of LREC, 2022.

In this paper, we present the Universal Semantic Annotator (USeA), which offers the first unified API for high-quality automatic annotations of texts in 100 languages through state-of-the-art systems for Word Sense Disambiguation, Semantic Role Labeling and Semantic Parsing. Together, such annotations can be used to provide users with rich and diverse semantic information, help second-language learners, and allow researchers to integrate explicit semantic knowledge into downstream tasks and real-world applications.
ACL

Probing for Predicate Argument Structures in Pretrained Language Models

Conia Simone, and Navigli Roberto

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022.

Thanks to the effectiveness and wide availability of modern pretrained language models (PLMs), recently proposed approaches have achieved remarkable results in dependency- and span-based, multilingual and cross-lingual Semantic Role Labeling (SRL). These results have prompted researchers to investigate the inner workings of modern PLMs with the aim of understanding how, where, and to what extent they encode information about SRL. In this paper, we follow this line of research and probe for predicate argument structures in PLMs. Our study shows that PLMs do encode semantic structures directly into the contextualized representation of a predicate, and also provides insights into the correlation between predicate senses and their structures, the degree of transferability between nominal and verbal structures, and how such structures are encoded across languages. Finally, we look at the practical implications of such insights and demonstrate the benefits of embedding predicate argument structure information into an SRL model.
ACL

SRL4E – Semantic Role Labeling for Emotions: A Unified Evaluation Framework

Campagnano Cesare, Conia Simone, and Navigli Roberto

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022.

In the field of sentiment analysis, several studies have highlighted that a single sentence may express multiple, sometimes contrasting, sentiments and emotions, each with its own experiencer, target and/or cause. To this end, over the past few years researchers have started to collect and annotate data manually, in order to investigate the capabilities of automatic systems not only to distinguish between emotions, but also to capture their semantic constituents. However, currently available gold datasets are heterogeneous in size, domain, format, splits, emotion categories and role labels, making comparisons across different works difficult and hampering progress in the area. In this paper, we tackle this issue and present a unified evaluation framework focused on Semantic Role Labeling for Emotions (SRL4E), in which we unify several datasets tagged with emotions and semantic roles by using a common labeling scheme. We use SRL4E as a benchmark to evaluate how modern pretrained language models perform and analyze where we currently stand in this task, hoping to provide the tools to facilitate studies in this complex area.
ACL

Nibbling at the Hard Core of Word Sense Disambiguation

Maru Marco, Conia Simone, Bevilacqua Michele, and Navigli Roberto

In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022.

With state-of-the-art systems having finally attained estimated human performance, Word Sense Disambiguation (WSD) has now joined the array of Natural Language Processing tasks that have seemingly been solved, thanks to the vast amounts of knowledge encoded into Transformer-based pre-trained language models. And yet, if we look below the surface of raw figures, it is easy to realize that current approaches still make trivial mistakes that a human would never make. In this work, we provide evidence showing why the F1 score metric should not simply be taken at face value and present an exhaustive analysis of the errors that seven of the most representative state-of-the-art systems for English all-words WSD make on traditional evaluation benchmarks. In addition, we produce and release a collection of test sets featuring (a) an amended version of the standard evaluation benchmark that fixes its lexical and semantic inaccuracies, (b) 42D, a challenge set devised to assess the resilience of systems with respect to least frequent word senses and senses not seen at training time, and (c) hardEN, a challenge set made up solely of instances which none of the investigated state-of-the-art systems can solve. We make all of the test sets and model predictions available to the research community at https://github.com/SapienzaNLP/wsd-hard-benchmark.

2021

EMNLP

InVeRo-XL: Making Cross-Lingual Semantic Role Labeling Accessible with Intelligible Verbs and Roles

Conia Simone, Orlando Riccardo, Brignone Fabrizio, Cecconi Francesco, and Navigli Roberto

In Proceedings of EMNLP: System Demonstrations, 2021.

Notwithstanding the growing interest in cross-lingual techniques for Natural Language Processing, there has been a surprisingly small number of efforts aimed at the development of easy-to-use tools for cross-lingual Semantic Role Labeling. In this paper, we fill this gap and present InVeRo-XL, an off-the-shelf state-of-the-art system capable of annotating text with predicate sense and semantic role labels from 7 predicate-argument structure inventories in more than 40 languages. We hope that our system, with its easy-to-use RESTful API and Web interface, will become a valuable tool for the research community, encouraging the integration of sentence-level semantics into cross-lingual downstream tasks. InVeRo-XL is available online at http://nlp.uniroma1.it/invero.
EMNLP

AMuSE-WSD: An All-in-one Multilingual System for Easy Word Sense Disambiguation

Orlando Riccardo, Conia Simone, Brignone Fabrizio, Cecconi Francesco, and Navigli Roberto

In Proceedings of EMNLP: System Demonstrations, 2021.

Over the past few years, Word Sense Disambiguation (WSD) has received renewed interest: recently proposed systems have shown the remarkable effectiveness of deep learning techniques in this task, especially when aided by modern pretrained language models. Unfortunately, such systems are still not available as ready-to-use end-to-end packages, making it difficult for researchers to take advantage of their performance. The only alternative for a user interested in applying WSD to downstream tasks is to use currently available end-to-end WSD systems, which, however, still rely on graph-based heuristics or non-neural machine learning algorithms. In this paper, we fill this gap and propose AMuSE-WSD, the first end-to-end system to offer high-quality sense information in 40 languages through a state-of-the-art neural model for WSD. We hope that AMuSE-WSD will provide a stepping stone for the integration of meaning into real-world applications and encourage further studies in lexical semantics. AMuSE-WSD is available online at http://nlp.uniroma1.it/amuse-wsd.
EMNLP

Named Entity Recognition for Entity Linking: What Works and What’s Next

Tedeschi Simone, Conia Simone, Cecconi Francesco, and Navigli Roberto

In Findings of the Association for Computational Linguistics: EMNLP 2021, 2021.

Entity Linking (EL) systems have achieved impressive results on standard benchmarks, mainly thanks to the contextualized representations provided by recent pretrained language models. However, such systems still require massive amounts of data (millions of labeled examples) to perform at their best, with training times that often exceed several days, especially when limited computational resources are available. In this paper, we look at how Named Entity Recognition (NER) can be exploited to narrow the gap between EL systems trained on high and low amounts of labeled data. More specifically, we show how and to what extent an EL system can benefit from NER to enhance its entity representations, improve candidate selection, select more effective negative samples and enforce hard and soft constraints on its output entities. We release our software (code and model checkpoints) at https://github.com/Babelscape/ner4el.
EMNLP

UniteD-SRL: A Unified Dataset for Span- and Dependency-Based Multilingual and Cross-Lingual Semantic Role Labeling

Tripodi Rocco, Conia Simone, and Navigli Roberto

In Findings of the Association for Computational Linguistics: EMNLP 2021, 2021.

Multilingual and cross-lingual Semantic Role Labeling (SRL) have recently garnered increasing attention as multilingual text representation techniques have become more effective and widely available. While recent work has attained growing success, results on gold multilingual benchmarks are still not easily comparable across languages, making it difficult to grasp where we stand. For example, in CoNLL-2009, the standard benchmark for multilingual SRL, language-to-language comparisons are affected by the fact that each language has its own dataset which differs from the others in size, domains, sets of labels and annotation guidelines. In this paper, we address this issue and propose UNITED-SRL, a new benchmark for multilingual and cross-lingual, span- and dependency-based SRL. UNITED-SRL provides expert-curated parallel annotations using a common predicate-argument structure inventory, allowing direct comparisons across languages and encouraging studies on cross-lingual transfer in SRL. We release UNITED-SRL v1.0 at https://github.com/SapienzaNLP/united-srl.
IJCAI

Ten Years of BabelNet: A Survey

Navigli Roberto, Bevilacqua Michele, Conia Simone, Montagnini Dario, and Cecconi Francesco

In Proceedings of IJCAI, 2021.

The intelligent manipulation of symbolic knowledge has been a long-sought goal of AI. However, when it comes to Natural Language Processing (NLP), symbols have to be mapped to words and phrases, which are not only ambiguous but also language-specific: multilinguality is indeed a desirable property for NLP systems, and one which enables the generalization of tasks where multiple languages need to be dealt with, without translating text. In this paper we survey BabelNet, a popular wide-coverage lexical-semantic knowledge resource obtained by merging heterogeneous sources into a unified semantic network that helps to scale tasks and applications to hundreds of languages. Over its ten years of existence, thanks to its promise to interconnect languages and resources in structured form, BabelNet has been employed in countless ways and directions. We first introduce the BabelNet model, its components and statistics, and then overview its successful use in a wide range of tasks in NLP as well as in other fields of AI.
IJCAI

Generating Senses and RoLes: An End-to-End Model for Dependency- and Span-based Semantic Role Labeling

Blloshmi Rexhina, Conia Simone, Tripodi Rocco, and Navigli Roberto

In Proceedings of IJCAI, 2021.

Despite the recent great success of the sequence-to-sequence paradigm in Natural Language Processing, the majority of current studies in Semantic Role Labeling (SRL) still frame the problem as a sequence labeling task. In this paper we go against the flow and propose GSRL (Generating Senses and RoLes), the first sequence-to-sequence model for end-to-end SRL. Our approach benefits from recently proposed decoder-side pretraining techniques to generate both sense and role labels for all the predicates in an input sentence at once, in an end-to-end fashion. Evaluated on standard gold benchmarks, GSRL achieves state-of-the-art results in both dependency- and span-based English SRL, proving empirically that our simple generation-based model can learn to produce complex predicate-argument structures. Finally, we propose a framework for evaluating the robustness of an SRL model in a variety of synthetic low-resource scenarios which can aid human annotators in the creation of better, more diverse, and more challenging gold datasets. We release GSRL at github.com/SapienzaNLP/gsrl.
NAACL

Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources

Conia Simone, Bacciu Andrea, and Navigli Roberto

In Proceedings of NAACL, 2021.

While cross-lingual techniques are finding increasing success in a wide range of Natural Language Processing tasks, their application to Semantic Role Labeling (SRL) has been strongly limited by the fact that each language adopts its own linguistic formalism, from PropBank for English to AnCora for Spanish and PDT-Vallex for Czech, inter alia. In this work, we address this issue and present a unified model to perform cross-lingual SRL over heterogeneous linguistic resources. Our model implicitly learns a high-quality mapping for different formalisms across diverse languages without resorting to word alignment and/or translation techniques. We find that our cross-lingual system is not only competitive with the current state of the art but also robust to low-data scenarios. Most interestingly, our unified model is able to annotate a sentence in a single forward pass with all the inventories it was trained with, providing a tool for the analysis and comparison of linguistic theories across different languages.
EACL

Framing Word Sense Disambiguation as a Multi-Label Problem for Model-Agnostic Knowledge Integration

Conia Simone, and Navigli Roberto

In Proceedings of EACL, 2021.

Recent studies treat Word Sense Disambiguation (WSD) as a single-label classification problem in which one is asked to choose only the best-fitting sense for a target word, given its context. However, gold data labelled by expert annotators suggest that maximizing the probability of a single sense may not be the most suitable training objective for WSD, especially if the sense inventory of choice is fine-grained. In this paper, we approach WSD as a multi-label classification problem in which multiple senses can be assigned to each target word. Not only does our simple method bear a closer resemblance to how human annotators disambiguate text, but it can also be extended seamlessly to exploit structured knowledge from semantic networks to achieve state-of-the-art results in English all-words WSD.
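The contrast between the two framings can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names, the example scores, and the choice of binary cross-entropy as the multi-label loss are assumptions made here.

```python
import math


def softmax_cross_entropy(logits, gold_index):
    """Single-label objective: push probability mass onto one gold sense."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[gold_index]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def multi_label_bce(logits, gold_indices):
    """Multi-label objective: every candidate sense is an independent
    yes/no decision, so several senses can be correct at once."""
    gold = set(gold_indices)
    total = 0.0
    for i, x in enumerate(logits):
        p = sigmoid(x)
        total += -math.log(p) if i in gold else -math.log(1.0 - p)
    return total / len(logits)


# Hypothetical scores for three candidate senses of one target word.
logits = [2.0, 1.5, -3.0]

# Single-label: only sense 0 counts as correct.
single = softmax_cross_entropy(logits, gold_index=0)

# Multi-label: senses 0 and 1 are both accepted as gold.
multi_two_gold = multi_label_bce(logits, gold_indices=[0, 1])
multi_one_gold = multi_label_bce(logits, gold_indices=[0])
```

With these scores, the multi-label loss is lower when both high-scoring senses are accepted as gold than when only one is, which is the behaviour a multi-label objective is meant to allow for fine-grained inventories.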

2020

COLING

Bridging the Gap in Multilingual Semantic Role Labeling: A Language-Agnostic Approach

Conia Simone, and Navigli Roberto

In Proceedings of COLING, 2020.

Recent research indicates that taking advantage of complex syntactic features leads to favorable results in Semantic Role Labeling. Nonetheless, an analysis of the latest state-of-the-art multilingual systems reveals the difficulty of bridging the wide gap in performance between high-resource (e.g., English) and low-resource (e.g., German) settings. To overcome this issue, we propose a fully language-agnostic model that does away with morphological and syntactic features to achieve robustness across languages. Our approach outperforms the state of the art in all the languages of the CoNLL-2009 benchmark dataset, especially whenever a scarce amount of training data is available. Our objective is not to reject approaches that rely on syntax, but rather to set a strong and consistent language-independent baseline for future innovations in Semantic Role Labeling. We release our model code and checkpoints at https://github.com/SapienzaNLP/multi-srl.
COLING

Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations

Conia Simone, and Navigli Roberto

In Proceedings of COLING, 2020.

To date, the most successful word, word sense, and concept modelling techniques have used large corpora and knowledge resources to produce dense vector representations that capture semantic similarities in a relatively low-dimensional space. Most current approaches, however, suffer from a monolingual bias, with their strength depending on the amount of data available across languages. In this paper we address this issue and propose Conception, a novel technique for building language-independent vector representations of concepts which places multilinguality at its core while retaining explicit relationships between concepts. Our approach results in high-coverage representations that outperform the state of the art in multilingual and cross-lingual Semantic Word Similarity and Word Sense Disambiguation, proving particularly robust on low-resource languages. Conception (its software and the complete set of representations) is available at https://github.com/SapienzaNLP/conception.
EMNLP

InVeRo: Making Semantic Role Labeling Accessible with Intelligible Verbs and Roles

Conia Simone, Brignone Fabrizio, Zanfardino Davide, and Navigli Roberto

In Proceedings of EMNLP, 2020.

Semantic Role Labeling (SRL) is deeply dependent on complex linguistic resources and sophisticated neural models, which makes the task difficult to approach for non-experts. To address this issue we present a new platform named Intelligible Verbs and Roles (InVeRo). This platform provides access to a new verb resource, VerbAtlas, and a state-of-the-art pre-trained implementation of a neural, span-based architecture for SRL. Both the resource and the system provide human-readable verb sense and semantic role information, with an easy-to-use Web interface and RESTful APIs available at http://nlp.uniroma1.it/invero.

2019

EMNLP

VerbAtlas: A Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling

Di Fabio Andrea, Conia Simone, and Navigli Roberto

In Proceedings of EMNLP, 2019.

We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration into a high-performance system leads to improvements on both the in-domain and out-of-domain test sets of CoNLL-2009. VerbAtlas is available at http://verbatlas.org.