
Language Update: TextRazor for Japanese

Thu 09 November 2017

Earlier this year we launched TextRazor support for Chinese. Since then we've analyzed millions of Chinese documents for our customers around the world.

Next up is Japanese. We've just enabled the new language in beta - you can try it right now through the API, or paste some Japanese documents into our demo.
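For readers who want to try Japanese through the API, here is a minimal sketch of an entity extraction request using only the Python standard library. It assumes the public REST endpoint at https://api.textrazor.com/, the `x-textrazor-key` header, and the `extractors` and `languageOverride` form parameters; check these against the current API documentation. The helper names `build_request` and `analyze`, and the `"jpn"` language code, are our own illustrative choices rather than anything specific to this release.

```python
import json
import urllib.parse
import urllib.request

# Assumed REST endpoint; verify against the API documentation.
API_URL = "https://api.textrazor.com/"


def build_request(text, api_key, language=None):
    """Build (but do not send) an entity extraction request.

    Hypothetical helper for illustration only.
    """
    params = {"text": text, "extractors": "entities"}
    if language is not None:
        # Assumption: a three-letter language code such as "jpn" forces the
        # language instead of relying on automatic detection.
        params["languageOverride"] = language
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"x-textrazor-key": api_key},
        method="POST",
    )


def analyze(text, api_key, language=None):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(text, api_key, language)
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Calling `analyze("東京は日本の首都です。", "YOUR_API_KEY", language="jpn")` with a valid key should return the parsed JSON response. Omitting `language` leaves language detection to the service.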

We've also been busy over the past few months with a number of general enhancements to the Entity Extraction system, designed to help us hold on to our accuracy crown in this core feature. We've augmented our knowledge base with information from several new sources to increase our recall of smaller unlisted companies, startups, people and products. You should also have noticed significant improvements to our "coreference" system, which links up and disambiguates multiple mentions of the same entity in a document.

We're also starting to roll out a brand new multilingual Deep Entity Tagger. This new system augments our current tagger by learning the meanings of rarer words across different languages and in noisier English documents. This boosts our entity accuracy by several percent in such documents - we're excited to deploy it to production for everyone over the next few weeks.

Thanks to everyone whose feedback on our results has helped shape these enhancements. Please get in touch if you have any questions or comments.