21.33 Quantitative Analysis of Text
The qdap (Rinker 2023) package provides an extensive suite of functions to support the quantitative analysis of text.
We can obtain simple summaries of a list of words. To illustrate, we use the terms from our Term Document Matrix tdm. We first extract the shorter terms from our documents into one long word list: we convert tdm into a matrix, extract the column names (the terms), and retain those shorter than 20 characters.
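The extraction step might be sketched as follows. This is a minimal illustration rather than the book's own code: the small matrix m is a hypothetical stand-in for as.matrix(tdm), and it assumes, as the text describes, that the terms are the column names. If your matrix has terms as rows, use rownames() instead.

```r
# Hypothetical stand-in for as.matrix(tdm): two documents, four terms.
# With the real object you would start from: m <- as.matrix(tdm)
m <- matrix(
  c(1, 0, 2, 1,
    0, 3, 1, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Docs  = c("doc1", "doc2"),
    Terms = c("act", "algorithm", "accuraci",
              "averyveryverylongconcatenatedtoken")
  )
)

# The terms are the column names (assumed orientation, see above).
terms <- colnames(m)

# Retain only the terms shorter than 20 characters.
words <- terms[nchar(terms) < 20]
words
```

The 20-character cut-off is a simple way to drop overly long tokens, which in scraped or PDF-derived text are often several words run together.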
We can then summarise the word list. Notice, in particular, the use of qdap::dist_tab() to generate frequencies and percentages, together with their cumulative values.
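A sketch of calls that would give summaries of this kind is shown here. The short words vector is a hypothetical sample, so its numbers differ from the output for the full corpus. The data frame built in base R is only an illustration of how the frequency, percentage and cumulative columns relate to one another; it is not qdap's implementation. The qdap::dist_tab() call is guarded so that the sketch still runs where qdap is not installed.

```r
# Hypothetical sample; in the text, `words` comes from the terms of tdm.
words <- c("act", "acsi", "adopt", "accur",
           "advanc", "advers", "address", "abstract")
len <- nchar(words)          # length of each word in characters

length(words)                # how many words we have
head(words, 15)              # the first few words
summary(len)                 # five-number summary plus the mean
table(len)                   # count of words at each length

# Base R illustration of a frequency distribution table.
counts <- table(len)
freq   <- as.vector(counts)
tab <- data.frame(
  interval    = names(counts),
  freq        = freq,
  cum.freq    = cumsum(freq),
  percent     = round(100 * freq / sum(freq), 2),
  cum.percent = round(100 * cumsum(freq) / sum(freq), 2)
)
tab

# The qdap version, run only if the package is available.
if (requireNamespace("qdap", quietly = TRUE)) {
  print(qdap::dist_tab(len))
}
```

Reading the cum.percent column is a quick way to see what share of the vocabulary falls at or below a given word length.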
## [1] 6456
##  [1] "abstract"  "academi"   "accur"     "accuraci"  "acnntex"   "acsi"
##  [7] "act"       "address"   "adjust"    "adopt"     "advanc"    "advantag"
## [13] "advers"    "affect"    "algorithm"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   3.000   5.000   6.000   6.644   8.000  19.000
##
##    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
##  579  867 1044 1114  935  651  397  268  200  138   79   63   34   28   22   21
##   19
##   16
##    interval freq cum.freq percent cum.percent
## 1         3  579      579    8.97        8.97
## 2         4  867     1446   13.43       22.40
## 3         5 1044     2490   16.17       38.57
## 4         6 1114     3604   17.26       55.82
## 5         7  935     4539   14.48       70.31
## 6         8  651     5190   10.08       80.39
## 7         9  397     5587    6.15       86.54
## 8        10  268     5855    4.15       90.69
## 9        11  200     6055    3.10       93.79
## 10       12  138     6193    2.14       95.93
....
References
Rinker, Tyler W. 2023. qdap: Bridging the Gap Between Qualitative Data and Quantitative Analysis. https://trinker.github.io/qdap/.
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0