Languages

TextRazor supports the automatic detection of 142 languages.

TextRazor also supports named entity recognition, entity disambiguation and linking, topic detection and taxonomy classification for the following languages:

  • Arabic
  • English
  • Chinese
  • Danish
  • Dutch
  • Finnish
  • French
  • German
  • Greek
  • Italian
  • Japanese
  • Korean
  • Norwegian
  • Polish
  • Portuguese
  • Russian
  • Spanish
  • Swedish
  • Ukrainian

Linking

TextRazor disambiguates and links entities in every supported language. Where an entity can be linked to an English equivalent, both the localized and the English Wikipedia IDs are returned.
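As a rough illustration of reading both IDs from a single entity in a parsed JSON response, here is a minimal sketch. The field names "entityId" (localized ID) and "entityEnglishId" (English ID) are assumptions not stated on this page, and the helper wikipedia_ids is hypothetical, so check both against the API reference before relying on them.

```python
from typing import Dict, Optional


def wikipedia_ids(entity: dict) -> Dict[str, Optional[str]]:
    """Return the localized and English Wikipedia IDs for one entity.

    Assumes (not confirmed by this page) that the entity dict carries
    "entityId" for the localized ID and "entityEnglishId" for the
    English ID. A missing or empty value is reported as None.
    """
    localized = entity.get("entityId") or None
    english = entity.get("entityEnglishId") or None
    return {"localized": localized, "english": english}


if __name__ == "__main__":
    # Hand-written sample entity, not real API output.
    sample = {"entityId": "Allemagne", "entityEnglishId": "Germany"}
    print(wikipedia_ids(sample))
```

An entity with no English equivalent would simply yield None for the "english" key, so callers can fall back to the localized ID.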

Language Detection

TextRazor can automatically detect 142 languages from the contents of your text. The detected language selects the appropriate processing logic and is returned to you in the TextRazor response. Short text (a Tweet, for example) may not provide enough context to determine the language accurately. If this is likely to be a problem, pass "languageOverride" with the request to specify a processing language and skip the automatic detection stage.
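The sketch below shows one way a request carrying "languageOverride" might be assembled. Only the "languageOverride" parameter name comes from this page; the endpoint URL, the "X-TextRazor-Key" header, the "text" and "extractors" parameters, and the helper build_request are assumptions to verify against the API reference. The code builds the request object but does not send it.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Assumed endpoint; confirm against the API reference.
API_URL = "https://api.textrazor.com/"


def build_request(text: str, api_key: str,
                  language_override: Optional[str] = None,
                  extractors: str = "entities,topics") -> Request:
    """Build (but do not send) a form-encoded POST request.

    When language_override is given, it is passed as "languageOverride"
    so that automatic language detection is skipped. The value should be
    one of the three-letter codes from the table of supported languages.
    """
    params = {"text": text, "extractors": extractors}
    if language_override is not None:
        params["languageOverride"] = language_override
    body = urlencode(params).encode("utf-8")
    # Assumed authentication header name.
    return Request(API_URL, data=body, headers={"X-TextRazor-Key": api_key})


if __name__ == "__main__":
    # A short text where detection may be unreliable, forced to French ("fre").
    req = build_request("Bonjour", "YOUR_API_KEY", language_override="fre")
    print(parse_qs(req.data.decode("utf-8")))
```

Leaving language_override as None omits the parameter entirely, so longer texts still go through automatic detection.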

Language Detection - Supported Languages

Language ISO-639-2 Code
English eng
Danish dan
Dutch dut
Finnish fin
French fre
German ger
Hebrew heb
Italian ita
Japanese jpn
Korean kor
Norwegian nor
Polish pol
Portuguese por
Russian rus
Spanish spa
Swedish swe
Chinese chi
Czech cze
Greek gre
Icelandic ice
Latvian lav
Lithuanian lit
Romanian rum
Hungarian hun
Estonian est
Bulgarian bul
Croatian scr
Serbian scc
Irish gle
Galician glg
Turkish tur
Ukrainian ukr
Hindi hin
Macedonian mac
Bengali ben
Indonesian ind
Latin lat
Malay may
Malayalam mal
Welsh wel
Nepali nep
Telugu tel
Albanian alb
Tamil tam
Belarusian bel
Javanese jav
Occitan oci
Urdu urd
Bihari bih
Gujarati guj
Thai tha
Arabic ara
Catalan cat
Esperanto epo
Basque baq
Interlingua ina
Kannada kan
Punjabi pan
Scots_Gaelic gla
Swahili swa
Slovenian slv
Marathi mar
Maltese mlt
Vietnamese vie
Frisian fry
Slovak slo
Faroese fao
Sundanese sun
Uzbek uzb
Amharic amh
Azerbaijani aze
Georgian geo
Tigrinya tir
Persian per
Bosnian bos
Sinhalese sin
Norwegian_N nno
Xhosa xho
Zulu zul
Guarani grn
Sesotho sot
Turkmen tuk
Kyrgyz kir
Breton bre
Twi twi
Yiddish yid
Somali som
Uighur uig
Kurdish kur
Mongolian mon
Armenian arm
Laothian lao
Sindhi snd
Rhaeto_Romance roh
Afrikaans afr
Luxembourgish ltz
Burmese bur
Khmer khm
Tibetan tib
Dhivehi div
Oriya ori
Assamese asm
Corsican cos
Interlingue ine
Kazakh kaz
Lingala lin
Moldavian mol
Pashto pus
Quechua que
Shona sna
Tajik tgk
Tatar tat
Tonga tog
Yoruba yor
Maori mao
Wolof wol
Abkhazian abk
Afar aar
Aymara aym
Bashkir bak
Bislama bis
Dzongkha dzo
Fijian fij
Greenlandic kal
Hausa hau
Inupiak ipk
Inuktitut iku
Kashmiri kas
Kinyarwanda kin
Malagasy mlg
Nauru nau
Oromo orm
Rundi run
Samoan smo
Sango sag
Sanskrit san
Siswant ssw
Tsonga tso
Tswana tsn
Volapuk vol
Zhuang zha
Ganda lug
Manx glv