com.textrazor.annotations
Class Entity
java.lang.Object
com.textrazor.annotations.Annotation
com.textrazor.annotations.Entity
public class Entity
- extends Annotation
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Entity
public Entity()
getMatchingWords
public java.util.List<Word> getMatchingWords()
- Returns:
- List of the Word objects that make up this entity.
getId
public int getId()
- Returns:
- The ID of this annotation.
getEntityId
public java.lang.String getEntityId()
- Returns:
- the ID for this entity, or null if this entity could not be disambiguated. This ID is from the localized Wikipedia for this document's language.
getFreebaseId
public java.lang.String getFreebaseId()
- Returns:
- the disambiguated Freebase ID for this entity, or null if either this entity could not be disambiguated, or a Freebase link doesn't exist.
getWikiLink
public java.lang.String getWikiLink()
- Returns:
- the full canonical link to Wikipedia for this entity, or null if either this entity could not be disambiguated or a Wikipedia link doesn't exist.
getMatchedText
public java.lang.String getMatchedText()
- Returns:
- the source text string that matched this entity.
getStartingPos
public int getStartingPos()
- Returns:
- The start offset in the input text for this entity. Note that TextRazor treats each multi-byte UTF-8 character as a single position.
getEndingPos
public int getEndingPos()
- Returns:
- The end offset in the input text for this entity. Note that TextRazor treats each multi-byte UTF-8 character as a single position.
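Because each multi-byte character counts as a single position, these offsets may not line up with Java String indices, which count UTF-16 code units. The sketch below shows one way to convert them with the standard String.offsetByCodePoints method. It assumes that a position corresponds to one Unicode code point and that the ending position is exclusive; neither is stated above, so check both against real responses. The class and method names are hypothetical.

```java
public class EntityOffsets {
    /**
     * Returns the text between startPos (inclusive) and endPos (assumed exclusive),
     * where both positions count Unicode code points rather than Java chars.
     * In real use the positions would come from getStartingPos() and getEndingPos().
     */
    public static String sliceByCodePoints(String text, int startPos, int endPos) {
        // Translate code point positions into UTF-16 indices.
        int from = text.offsetByCodePoints(0, startPos);
        int to = text.offsetByCodePoints(from, endPos - startPos);
        return text.substring(from, to);
    }

    public static void main(String[] args) {
        // The emoji is one code point but two Java chars (a surrogate pair).
        String text = "I \uD83D\uDC96 Paris";
        System.out.println(sliceByCodePoints(text, 4, 9)); // prints "Paris"
        System.out.println(text.substring(4, 9));          // prints " Pari" (shifted by one char)
    }
}
```

For text that stays within the Basic Multilingual Plane the two index schemes agree, so the conversion only matters when the input may contain characters such as emoji.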
getFreebaseTypes
public java.util.List<java.lang.String> getFreebaseTypes()
- Returns:
- List of Freebase types for this entity, or an empty list if there are none.
getDBPediaTypes
public java.util.List<java.lang.String> getDBPediaTypes()
- Returns:
- List of DBPedia types for this entity, or an empty list if there are none.
getRelevanceScore
public double getRelevanceScore()
- Returns:
- The relevance of this entity to the source text, as a double on a scale of 0 to 1, with 1 being the most relevant. Relevance is determined by the contextual similarity between the entity's context and facts in the TextRazor knowledgebase.
getConfidenceScore
public double getConfidenceScore()
- Returns:
- The confidence that TextRazor has that this is a valid entity. TextRazor uses an ever-increasing number of signals to help spot valid entities, all of which contribute to this score. These include the contextual agreement between the words in the source text and our knowledgebase, agreement between other entities in the text, agreement between the expected entity type and context, and prior probabilities of having seen this entity across Wikipedia and other web datasets. The score ranges from 0.5 to 10, with 10 representing the highest confidence that this is a valid entity.
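A common use of the two scores is to drop low-confidence entities and rank the rest by relevance. The following is a minimal sketch of that idea, not part of the TextRazor client. The Scored class is a hypothetical stand-in that mirrors the getMatchedText, getConfidenceScore and getRelevanceScore getters so the example runs without the library, and the threshold of 2.0 is an arbitrary illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class EntityFilter {
    /** Hypothetical stand-in exposing the same getter names as Entity. */
    public static final class Scored {
        private final String matchedText;
        private final double confidence;
        private final double relevance;

        public Scored(String matchedText, double confidence, double relevance) {
            this.matchedText = matchedText;
            this.confidence = confidence;
            this.relevance = relevance;
        }

        public String getMatchedText() { return matchedText; }
        public double getConfidenceScore() { return confidence; }
        public double getRelevanceScore() { return relevance; }
    }

    /** Keeps entities at or above minConfidence, most relevant first. */
    public static List<Scored> confidentByRelevance(List<Scored> entities, double minConfidence) {
        List<Scored> kept = new ArrayList<>();
        for (Scored e : entities) {
            if (e.getConfidenceScore() >= minConfidence) {
                kept.add(e);
            }
        }
        kept.sort(Comparator.comparingDouble(Scored::getRelevanceScore).reversed());
        return kept;
    }

    public static void main(String[] args) {
        List<Scored> entities = Arrays.asList(
                new Scored("London", 8.2, 0.4),
                new Scored("it", 0.6, 0.9),
                new Scored("Thames", 5.0, 0.7));
        for (Scored e : confidentByRelevance(entities, 2.0)) {
            System.out.println(e.getMatchedText()); // prints "Thames" then "London"
        }
    }
}
```

With the real client, the same loop would operate on Entity objects directly; a suitable threshold depends on the application and should be tuned against sample documents.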
getMatchingTokens
public java.util.List<java.lang.Integer> getMatchingTokens()
- Returns:
- List of the word positions in the current sentence that make up this entity.
getType
public java.util.List<java.lang.String> getType()
- Returns:
- List of DBPedia types for this entity, or an empty list if there are none.
getEntityEnglishId
public java.lang.String getEntityEnglishId()
- Returns:
- The disambiguated entity ID in the English Wikipedia, where a link between the localized and English IDs could be found. Null if the entity could not be linked or no language link exists.
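Since getEntityEnglishId and getEntityId can both return null, code that needs a stable key per entity has to choose a fallback order. The sketch below shows one possible order: English ID first, then the localized ID, then the matched text. The class name, method name and key prefixes are hypothetical, and the arguments are plain strings so the example runs without the TextRazor library.

```java
public class EntityKey {
    /**
     * Builds a grouping key from values that would come from getEntityEnglishId(),
     * getEntityId() and getMatchedText(). Either ID may be null.
     */
    public static String canonicalKey(String englishId, String localizedId, String matchedText) {
        if (englishId != null) {
            return "en:" + englishId;        // comparable across document languages
        }
        if (localizedId != null) {
            return "local:" + localizedId;   // disambiguated, but language-specific
        }
        return "text:" + matchedText;        // entity was not disambiguated
    }

    public static void main(String[] args) {
        System.out.println(canonicalKey("London", "Londres", "Londres")); // prints "en:London"
        System.out.println(canonicalKey(null, "Londres", "Londres"));     // prints "local:Londres"
        System.out.println(canonicalKey(null, null, "Londres"));          // prints "text:Londres"
    }
}
```

The prefixes keep keys from the three sources apart, so an undisambiguated mention never collides with a Wikipedia ID that happens to share its spelling.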