66.17 PDF Properties (MetaData)

20190829

A pdf document can have metadata associated with it, typically recording the author, the creator (the software used to create it), and dates such as the creation date. The standalone pdfinfo command from poppler-utils shows the document metadata, as does the dump_data operation of pdftk from the pdftk package:

$ pdfinfo survivor.pdf
Title:          GNU/Linux: Survival Guide
Subject:        Linux Open Source
Keywords:       linux,ubuntu,debian,open source,gnu/linux
Author:         Graham Williams
Creator:        LaTeX with hyperref
Producer:       pdfTeX-1.40.20
CreationDate:   Sun Mar  8 10:44:03 2020 AEDT
ModDate:        Sun Mar  8 10:44:03 2020 AEDT
[...]
Pages:          1300
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
PDF version:    1.5

$ pdftk mydoc.pdf dump_data
InfoBegin
InfoKey: Keywords
InfoValue: linux,ubuntu,debian,open source,gnu/linux
InfoBegin
InfoKey: Creator
InfoValue: LaTeX with hyperref
InfoBegin
InfoKey: ModDate
InfoValue: D:20200308105101+11'00'
InfoBegin
InfoKey: CreationDate
InfoValue: D:20200308105101+11'00'
InfoBegin
InfoKey: Subject
InfoValue: Linux Open Source
InfoBegin
InfoKey: Producer
InfoValue: pdfTeX-1.40.20
[...]
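
The dump_data records are more verbose than the pdfinfo listing. As a minimal sketch, assuming each InfoKey line is followed by its InfoValue line as in the output above, a small filter can flatten them into one line per key. The name pdf_info_pairs is a hypothetical helper, not part of pdftk.

```shell
# Flatten pdftk dump_data Info records into "Key: Value" lines.
# pdf_info_pairs is a hypothetical helper name, not a pdftk feature.
# Assumes each "InfoKey:" line is followed by its "InfoValue:" line.
pdf_info_pairs() {
  awk '
    # Remember the most recent key (text after "InfoKey: ").
    /^InfoKey: /   { key = substr($0, 10) }
    # Print the key with its value (text after "InfoValue: ").
    /^InfoValue: / { print key ": " substr($0, 12) }
  '
}
```

With the function defined in the current shell, it reads the dump from standard input and ignores every line that is not part of an Info record:

$ pdftk mydoc.pdf dump_data | pdf_info_pairs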

The metadata dumped from this pdf file can be used to replace the metadata of another pdf file using the update_info operation of pdftk:

$ pdftk mydoc.pdf dump_data > mydoc.info
$ pdftk newdoc.pdf update_info mydoc.info output updated.pdf

The updated.pdf file will now have the same metadata as mydoc.pdf. Whether that metadata is appropriate for the new document is a separate issue.
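
Since the dumped metadata is plain text, it can be adjusted before it is applied so that it better suits the new document. The following is a sketch only: set_info_value is a hypothetical helper that rewrites the value of one existing key in a dump read from standard input, again assuming each InfoKey line is followed by its InfoValue line.

```shell
# set_info_value KEY VALUE
# Hypothetical helper: replace the InfoValue of an existing InfoKey in
# pdftk dump_data text read from stdin. A key that is absent from the
# dump is not added. Note that awk -v interprets backslash escapes in
# VALUE, so this sketch suits plain text values only.
set_info_value() {
  awk -v key="$1" -v val="$2" '
    # Track which key the next InfoValue line belongs to.
    /^InfoKey: /   { current = substr($0, 10) }
    # Swap in the new value for the requested key only.
    /^InfoValue: / && current == key { print "InfoValue: " val; next }
    # Pass every other line through unchanged.
    { print }
  '
}
```

A possible use is to change the Subject while copying the rest of the metadata across:

$ pdftk mydoc.pdf dump_data | set_info_value Subject "Release Notes" > newdoc.info
$ pdftk newdoc.pdf update_info newdoc.info output updated.pdf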

Another useful tool for adding relevant metadata to a pdf file is exiftool:

$ exiftool -Title="My Doc Title" \
           -Author="Kayon Toga" \
           -Subject="My Doc Subject" \
           -Keywords="data science;open source;linux" \
           -Creator="Handmade PDF Tech" \
           -Producer="Togaware Productions" \
  mydoc.pdf

The tags that can be modified within a pdf document are listed in exiftool’s documentation.



Copyright © 1995-2022 Graham.Williams@togaware.com Creative Commons Attribution-ShareAlike 4.0