16.6 aztext entities

THIS SECTION IS UNDER DEVELOPMENT. PLEASE COME BACK LATER

Named Entity Recognition

The entities command identifies the entities from the text together with other information, including the type of entity and a Wikipedia link. For each entity identified the output consists of a single line reporting the entity name, the type of entity and sub-type, the confidence of the type of entity, the offset to the entity in the original text, then text length of the entity, the Wikipedia confidence, language, entity name, and URL.

$ ml entities aztext I had a wonderful trip to Seattle last week and even visited the Space Needle 2 times!
Seattle,Location,,0.82,26,7,0.24,en,Seattle,https://en.wikipedia.org/wiki/Seattle
last week,DateTime,DateRange,0.80,34,9,,,,
Space Needle,Location,,0.80,65,12,0.39,en,Space Needle,https://en.wikipedia.org/wiki/Space_Needle
Space Needle,Organization,,0.94,65,12,,,,
2,Quantity,Number,0.80,78,1,,,,

As part of a command line we could count the number of unique entities in the text:

$ ml entities aztext I had a wonderful trip to Seattle last week and even visited the Space Needle 2 times! |
  cut -d, -f1 |
  sort -u |
  wc -l
4

How many unique locations are identified in the text:

$ ml entities aztext I had a wonderful trip to Seattle last week and even visited the Space Needle 2 times! |
  awk -F, '$2=="Location"{print}' |
  cut -d, -f1 |
  sort -u |
  wc -l
2


Your donation will support ongoing availability and give you access to the PDF version of this book. Desktop Survival Guides include Data Science, GNU/Linux, and MLHub. Books available on Amazon include Data Mining with Rattle and Essentials of Data Science. Popular open source software includes rattle, wajig, and mlhub. Hosted by Togaware, a pioneer of free and open source software since 1984. Copyright © 1995-2021 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0.