2019-09-26 16:20:05 +02:00
2019-09-26 16:41:37 +02:00

pdfgrab

What is it?

This is a reborn tool, used during the epoche dinosaurs were traipsing the earth. Basically it analyses PDF files for Metadata. You can direct it to a file or directory with pdfs. You can show it the url of a pdf or use the integrated googlesearch (thanx to mario vilas) class to search for pdfs at target site, download and analyse them

What information can be gathered?

This depends on the software used to create the pdf. And if it has been cleaned. However, common are the following things:

  • Producer
  • Creator
  • CreationDate
  • ModificationDate
  • Author
  • Title
  • Subject

and some more :)

How does it work?

Every more complex filetype above .txt or alike uses metadata for convinience, customer support or only to spread it has been used. There is a lot information about metadata in different sort of files like pictures, documents, videos, music online. This tool focuses on pdf only. If you are new to that term have a look here: https://en.wikipedia.org/wiki/Metadata

Usage

Those are your options major options:

  • grab pdf from url and analyse
  • search site for pdfs via google, grab and analyse
  • analyse a local pdf
  • analyse a local directory with pdfs in it

Single Url Mode

# ./pdfgrab.py -u https://www.kernel.org/doc/mirror/ols2004v2.pdf

Result:

[+] Grabbing https://www.kernel.org/doc/mirror/ols2004v2.pdf
[+] Written 3893173 bytes for File: pdfgrab/ols2004v2.pdf
[+] Opening pdfgrab/ols2004v2.pdf
--------------------------------------------------------------------------------
File: pdfgrab/ols2004v2.pdf
/Producer pdfTeX-0.14h
/Creator TeX
/CreationDate D:20040714015300
--------------------------------------------------------------------------------

Single File Mode

# ./pdfgrab.py -f pdfgrab/ols2004v2.pdf 

Result:

[+] Parsing pdfgrab/ols2004v2.pdf
[+] Opening pdfgrab/ols2004v2.pdf
--------------------------------------------------------------------------------
File: pdfgrab/ols2004v2.pdf
/Producer pdfTeX-0.14h
/Creator TeX
/CreationDate D:20040714015300
--------------------------------------------------------------------------------

Google Search Mode

# ./pdfgrab.py -s site:kernel.org

Result:

[+] Seek and analysing site:kernel.org
http://vger.kernel.org/lpc_bpf2018_talks/bpf_global_data_and_static_keys.pdf
http://vger.kernel.org/netconf2018_files/JiriPirko_netconf2018.pdf
http://vger.kernel.org/netconf2018_files/PaoloAbeni_netconf2018.pdf
http://vger.kernel.org/lpc_net2018_talks/LPC_XDP_Shirokov_paper_v1.pdf
http://vger.kernel.org/netconf2018_files/FlorianFainelli_netconf2018.pdf
http://vger.kernel.org/lpc_net2018_talks/tc_sw_paper.pdf
https://www.kernel.org/doc/mirror/ols2009.pdf
https://www.kernel.org/doc/mirror/ols2004v2.pdf
http://vger.kernel.org/lpc_net2018_talks/ktls_bpf.pdf
http://vger.kernel.org/lpc_net2018_talks/ktls_bpf_paper.pdf

[+] Written 211391 bytes for File: pdfgrab/bpf_global_data_and_static_keys.pdf
[+] Opening pdfgrab/bpf_global_data_and_static_keys.pdf
--------------------------------------------------------------------------------
File: pdfgrab/bpf_global_data_and_static_keys.pdf
/Author 
/Title 
/Subject 
/Creator LaTeX with Beamer class version 3.36
/Producer pdfTeX-1.40.17
/Keywords 
/CreationDate D:20181102231821+01'00'
/ModDate D:20181102231821+01'00'
/Trapped /False
/PTEX.Fullbanner This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) kpathsea version 6.2.2

Google

Search: filetype:pdf site:com Results: 264.000.000

Disclaimer

Have fun!

Description
No description provided
Readme 144 KiB
Languages
Python 100%