| Developer's Daily | Unix by Example |
| main | java | perl | unix | dev directory | web log |
|
pdftotext − Portable Document Format (PDF) to text converter (version 0.91) |
|
pdftotext [options] [PDF-file [text-file]] |
|
Pdftotext converts Portable Document Format (PDF) files to plain text. Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is ´-’, the text is sent to stdout. |
|
−f number |
|
Specifies the first page to convert. |
|
−l number |
|
Specifies the last page to convert. |
|
−ascii7 |
|
Convert the text to 7-bit ASCII; the default is to use the 8-bit ISO Latin-1 character set. |
|
−latin2 |
|
Convert the text to the Latin-2 (ISO-8859-2) character set. (This will only be useful if the font encodings are specified correctly in the PDF file.) |
|
−latin5 |
|
Convert the text to the Latin-5 (ISO-8859-9) character set. (This will only be useful if the font encodings are specified correctly in the PDF file.) |
|
−eucjp |
Convert Japanese text to EUC-JP. This is currently the only option for converting Japanese text -- the only effect is to switch to 7-bit ASCII for non-Japanese text, in order to fit into the EUC-JP encoding. (This option is only available if pdftotext was compiled with Japanese support.) |
||
|
−raw |
Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. This option will likely be replaced with something more sophisticated when pdftotext is rewritten to use a smarter text placement algorithm. |
|
−upw password |
|
Specify the user password for the PDF file. |
|
−q |
Don’t print any messages or errors. |
|||
|
−v |
Print copyright and version information. |
|||
|
−h |
Print usage information. (−help is equivalent.) |
|
Some PDF files contain fonts whose encodings have been mangled beyond recognition. There is no way (short of OCR) to extract text from these files. |
|
The pdftotext software and documentation are copyright 1996-2000 Derek B. Noonburg (derekn@foolabs.com). |
|
xpdf(1), pdftops(1), pdfinfo(1),
pdftopbm(1), pdfimages(1) |