TextRazor goes Multilingual

Today we're excited to announce a big TextRazor upgrade.

We've had a great response to our entity recognition and linking system, and one of the biggest requests we've had is for expansion beyond English. Today we're extending support to French, German, Spanish, Portuguese, Dutch, Polish, Russian, Swedish and Italian, with support for up to 15 more available on demand. Entity recognition, disambiguation, linking and topic detection are now available natively in all of these languages.

Language Independence

Our aim is to make multilingual text analytics completely transparent to your application. Typically internationalization is prohibitively expensive: generating multilingual training data requires native-speaking linguists to mark up a large amount of text, and a language-specific model then has to be built for each language. With English speakers representing only 30% of the world's population, English-only support can be a costly constraint.

TextRazor makes it possible to build multilingual classifiers and text extraction applications without any knowledge of the underlying language and without relying on any noisy translation steps. We do this with the help of linked data and the semantic web, building on the links between different language Wikipedias and Freebase.

Our algorithms automatically detect your content's language and select a language-specific analysis model that has been optimized on localized content.

We then use this model to extract a set of localized results, link them to English DBpedia and Freebase IDs wherever available, and return them all to your application. This makes it possible to work directly with English tags regardless of the language of the underlying content. Inghilterra, Англия, Engeland, Anglia, Inglaterra, Angleterre and England - we know that these are all the same place so your application doesn't have to.
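To illustrate what working with language-independent tags can look like on the application side, here is a minimal sketch in Python. It does not call the TextRazor API: the record layout, the field names (matched_text, language, canonical_id) and the ID value are all hypothetical placeholders chosen for this example, standing in for whatever linked ID your results carry.

```python
from collections import defaultdict

# Placeholder identifier for illustration only. It is NOT a real Freebase or
# DBpedia ID; substitute whichever linked ID your results actually contain.
ENGLAND_ID = "example:England"

# Hypothetical entity records. The field names and the two-letter language
# codes are assumptions for this sketch, not TextRazor's response format.
mentions = [
    {"matched_text": "Inghilterra", "language": "it", "canonical_id": ENGLAND_ID},
    {"matched_text": "Англия", "language": "ru", "canonical_id": ENGLAND_ID},
    {"matched_text": "Engeland", "language": "nl", "canonical_id": ENGLAND_ID},
    {"matched_text": "Anglia", "language": "pl", "canonical_id": ENGLAND_ID},
    {"matched_text": "Inglaterra", "language": "es", "canonical_id": ENGLAND_ID},
    {"matched_text": "Angleterre", "language": "fr", "canonical_id": ENGLAND_ID},
    {"matched_text": "England", "language": "en", "canonical_id": ENGLAND_ID},
    # A result with no linked ID ("wherever available"): fall back to its text.
    {"matched_text": "Exempelstad", "language": "sv", "canonical_id": None},
]


def group_by_canonical_id(records):
    """Group localized mentions under one language-independent key.

    Records without a canonical ID are keyed by their own matched text,
    so nothing is silently dropped.
    """
    grouped = defaultdict(list)
    for record in records:
        key = record["canonical_id"] or record["matched_text"]
        grouped[key].append(record["matched_text"])
    return dict(grouped)


tags = group_by_canonical_id(mentions)
for key, surface_forms in tags.items():
    print(key, "<-", ", ".join(surface_forms))
```

Because downstream code only sees the canonical key, a classifier or tag filter written against that key needs no per-language logic. The fallback for unlinked results is one possible design choice; an application could equally choose to discard them.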

And More

We've taken this opportunity to use multilingual data to boost our English accuracy and greatly improve topic recall across the board. We've also rolled out a new grammatical parser and improved the speed and robustness of the platform as a whole. If you're already running TextRazor you don't need to do anything to pick up these changes; you should already be analyzing text faster and more accurately than ever. Thanks to all who have been sending in their valuable feedback and bug reports during our beta. Keep it coming!

If you still need to get an API key, sign up for free and you'll be running in minutes!
