21.5 PDF Documents
If instead of text documents we have a corpus of PDF documents, then we can use the tm::readPDF() reader function to convert each PDF into text and have that loaded as our Corpus.
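By default this has used the pdftotext command from xpdf to convert the PDF into text format, and the xpdf application needs to be installed for that to work. More recent versions of tm let us choose the conversion engine through the engine= argument of tm::readPDF() (for example, engine="xpdf") and default to the pdftools package instead.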
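As a minimal sketch of how this might look, assuming a recent version of tm that supports the engine= argument and that the pdftotext and pdfinfo commands are on the PATH, we can wrap the load in a small function. The directory corpus/pdf and the function name load_pdf_corpus() are hypothetical, introduced only for illustration.

```r
# A sketch only: load every PDF in a directory into a tm corpus.
library(tm)

# Hypothetical location of the PDF files. Change this to suit.
pdf_dir <- file.path("corpus", "pdf")

# DirSource() lists the files and readPDF() returns a reader function
# that converts each PDF to text. mode="" stops tm from reading the raw
# file content itself, since the PDF reader only needs the file path.
# engine="xpdf" assumes pdftotext and pdfinfo are installed.
load_pdf_corpus <- function(path, engine = "xpdf") {
  VCorpus(DirSource(path,
                    pattern = "\\.pdf$",
                    ignore.case = TRUE,
                    mode = ""),
          readerControl = list(reader = readPDF(engine = engine)))
}

# Only attempt the load if the hypothetical directory exists.
if (dir.exists(pdf_dir)) {
  docs <- load_pdf_corpus(pdf_dir)
  print(docs)
}
```

If xpdf is not installed, calling load_pdf_corpus(pdf_dir, engine = "pdftools") is one alternative, provided the pdftools package is available.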
Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0