International Identifier for serials
and other continuing resources, in the electronic and print world

2022/05/09

ParaNames: A Massively Multilingual Entity Name Corpus

This preprint describes work in progress on ParaNames, a multilingual parallel name resource containing names for approximately 14 million entities. The names span over 400 languages, and almost all entities are mapped to standardized entity types. Built from Wikidata, ParaNames is the largest resource of its type to date. It is useful for multilingual language processing, both for defining name translation and transliteration tasks and as supplementary data for tasks such as named entity recognition and entity linking.