23.6 aztext entities 🚧

Named Entity Recognition

The entities command identifies the entities from the text together with other information, including the type of entity and a Wikipedia link. For each entity identified the output consists of a single line reporting the entity name, the type of entity and sub-type, the confidence of the type of entity, the offset to the entity in the original text, then text length of the entity, the Wikipedia confidence, language, entity name, and URL.

$ ml entities aztext I had a wonderful trip to Seattle last week and even visited the Space Needle 2 times!
Seattle,Location,,0.82,26,7,0.24,en,Seattle,https://en.wikipedia.org/wiki/Seattle
last week,DateTime,DateRange,0.80,34,9,,,,
Space Needle,Location,,0.80,65,12,0.39,en,Space Needle,https://en.wikipedia.org/wiki/Space_Needle
Space Needle,Organization,,0.94,65,12,,,,
2,Quantity,Number,0.80,78,1,,,,

As part of a command line we could count the number of unique entities in the text:

$ ml entities aztext I had a wonderful trip to Seattle last week and even visited the Space Needle 2 times! |
  cut -d, -f1 |
  sort -u |
  wc -l
4

How many unique locations are identified in the text:

$ ml entities aztext I had a wonderful trip to Seattle last week and even visited the Space Needle 2 times! |
  awk -F, '$2=="Location"{print}' |
  cut -d, -f1 |
  sort -u |
  wc -l
2


Your donation will support ongoing availability and give you access to the PDF version of this book. Desktop Survival Guides include Data Science, GNU/Linux, and MLHub. Books available on Amazon include Data Mining with Rattle and Essentials of Data Science. Popular open source software includes rattle, wajig, and mlhub. Hosted by Togaware, a pioneer of free and open source software since 1984. Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0