21.3 Corpus Sources and Readers
The tm package supports a variety of document sources. We can use tm::getSources() to list them.
## [1] "DataframeSource" "DirSource" "URISource" "VectorSource"
## [5] "XMLSource" "ZipSource"
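As a minimal sketch of how one of these sources might be used, the following builds a small corpus from an in-memory character vector with VectorSource(). The sample sentences and the variable names texts and docs are made up for illustration and are not from the original text.

```r
library(tm)

# List the source types known to the installed version of tm.
getSources()

# Hypothetical sample data: each element becomes one document.
texts <- c("The quick brown fox.", "Jumps over the lazy dog.")

# VectorSource() treats every element of the vector as a document;
# VCorpus() then builds an in-memory corpus from that source.
docs <- VCorpus(VectorSource(texts))

length(docs)        # number of documents in the corpus
content(docs[[1]])  # text of the first document
```

The other sources follow the same pattern: construct the source object, then pass it to VCorpus().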
In addition to coming from different kinds of sources, our documents for text analysis will come in many different formats. The tm package supports a variety of formats through its reader functions, which tm::getReaders() lists:
## [1] "readDataframe" "readDOC"
## [3] "readPDF" "readPlain"
## [5] "readRCV1" "readRCV1asPlain"
## [7] "readReut21578XML" "readReut21578XMLasPlain"
## [9] "readTagged" "readXML"
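A reader is typically chosen through the readerControl argument when the corpus is built. The sketch below assumes two throwaway plain-text files written to a temporary directory; the directory name, file names, and the variable docs_dir are hypothetical and chosen only for illustration.

```r
library(tm)

# List the reader functions known to the installed version of tm.
getReaders()

# Hypothetical demo data: two small text files in a temporary directory.
tmp <- file.path(tempdir(), "tm-reader-demo")
dir.create(tmp, showWarnings = FALSE)
writeLines("First sample document.", file.path(tmp, "a.txt"))
writeLines("Second sample document.", file.path(tmp, "b.txt"))

# DirSource() picks up the matching files in the directory, and
# readerControl tells VCorpus() which reader to apply to each one.
docs_dir <- VCorpus(DirSource(tmp, pattern = "\\.txt$"),
                    readerControl = list(reader = readPlain,
                                         language = "en"))

length(docs_dir)        # number of documents read
content(docs_dir[[1]])  # text of the first file
```

For other formats the same call would name a different reader, such as readPDF or readDOC, though those readers generally rely on external tools being installed.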
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0