International Identification of Serial Publications
and other continuing resources, electronic and print

2022/05/09

ParaNames: A Massively Multilingual Entity Name Corpus

This preprint describes work in progress on ParaNames, a multilingual parallel name resource containing names for approximately 14 million entities. The names span over 400 languages, and almost all entities are mapped to standardized entity types. Built from Wikidata, it is the largest resource of this type to date. ParaNames is useful for multilingual language processing, both for defining name translation and transliteration tasks and as supplementary data for tasks such as named entity recognition and entity linking.
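
To make the idea of a parallel name resource concrete, the following minimal Python sketch shows how per-language names that share an entity identifier could be paired into source/target examples for a transliteration task. It is only an illustration: the `NameRecord` fields, the type labels, the `name_pairs` helper, and the toy rows are assumptions introduced here, not the actual ParaNames file format or tooling.

```python
from collections import defaultdict
from typing import NamedTuple


class NameRecord(NamedTuple):
    """One name of one entity in one language (hypothetical schema)."""
    entity_id: str    # Wikidata-style identifier
    language: str     # language code
    name: str         # the entity's name in that language
    entity_type: str  # assumed type label such as PER, LOC, or ORG


# Toy rows for illustration only; they are not taken from ParaNames.
RECORDS = [
    NameRecord("Q64", "en", "Berlin", "LOC"),
    NameRecord("Q64", "ru", "Берлин", "LOC"),
    NameRecord("Q64", "el", "Βερολίνο", "LOC"),
    NameRecord("Q90", "en", "Paris", "LOC"),
    NameRecord("Q90", "ru", "Париж", "LOC"),
]


def name_pairs(records, source_lang, target_lang):
    """Return (source_name, target_name) pairs for entities named in both languages."""
    # Group names by entity so that names of the same entity line up.
    by_entity = defaultdict(dict)
    for rec in records:
        by_entity[rec.entity_id][rec.language] = rec.name

    # Keep only entities that have a name in both requested languages.
    return [
        (names[source_lang], names[target_lang])
        for _, names in sorted(by_entity.items())
        if source_lang in names and target_lang in names
    ]


if __name__ == "__main__":
    for source, target in name_pairs(RECORDS, "en", "ru"):
        print(source, "->", target)
```

Grouping by entity identifier rather than by surface string is the design choice that matters here: it is what makes the names parallel, and entities that lack a name in one of the two languages simply drop out of the pair list.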