Developer's Daily | Unix by Example
pdftotext - Portable Document Format (PDF) to text converter (version 0.91)
pdftotext [options] [PDF-file [text-file]]
Pdftotext converts Portable Document Format (PDF) files to plain text. Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.
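The three output forms described above can be sketched as follows. This is a minimal illustration, not part of the original manual: report.pdf and notes.txt are placeholder file names, and count_pdf_words is a hypothetical helper that relies only on the '-' (stdout) form.

```shell
# Placeholder invocations (report.pdf and notes.txt are assumed names):
#   pdftotext report.pdf notes.txt   # write the text to notes.txt
#   pdftotext report.pdf             # no text-file given: writes report.txt
#   pdftotext report.pdf -           # send the text to stdout

# Hypothetical helper: count the words in a PDF by sending the
# converted text to stdout ('-') and piping it into wc.
count_pdf_words() {
    pdftotext "$1" - | wc -w
}
```

Calling count_pdf_words report.pdf would print the word count of the extracted text, without leaving a .txt file behind.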
-f number
Specifies the first page to convert.
-l number
Specifies the last page to convert.
-ascii7
Convert the text to 7-bit ASCII; the default is to use the 8-bit ISO Latin-1 character set.
-latin2
Convert the text to the Latin-2 (ISO-8859-2) character set. (This will only be useful if the font encodings are specified correctly in the PDF file.)
-latin5
Convert the text to the Latin-5 (ISO-8859-9) character set. (This will only be useful if the font encodings are specified correctly in the PDF file.)
-eucjp
Convert Japanese text to EUC-JP. This is currently the only option for converting Japanese text; its only effect is to switch to 7-bit ASCII for non-Japanese text, in order to fit into the EUC-JP encoding. (This option is only available if pdftotext was compiled with Japanese support.)
-raw
Keep the text in content stream order. This is a hack which often "undoes" column formatting, etc. This option will likely be replaced with something more sophisticated when pdftotext is rewritten to use a smarter text placement algorithm.
-upw password
Specify the user password for the PDF file.
-q
Don’t print any messages or errors.
-v
Print copyright and version information.
-h
Print usage information. (-help is equivalent.)
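The options listed above can be combined on one command line. The following is a small sketch under stated assumptions: pdf_pages_to_text is a hypothetical wrapper (not part of pdftotext) that uses only the -q, -f and -l options documented here, together with '-' as the text-file so the result goes to stdout.

```shell
# Hypothetical wrapper: print pages FIRST..LAST of a PDF to stdout,
# suppressing messages and errors.
#   $1 = PDF file, $2 = first page, $3 = last page
pdf_pages_to_text() {
    pdftotext -q -f "$2" -l "$3" "$1" -
}
```

For example, pdf_pages_to_text manual.pdf 2 5 would run pdftotext -q -f 2 -l 5 manual.pdf - (manual.pdf being a placeholder name). Other options, such as -raw or -upw password, could be added to the same command line in the same way.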
Some PDF files contain fonts whose encodings have been mangled beyond recognition. There is no way (short of OCR) to extract text from these files.
The pdftotext software and documentation are copyright 1996-2000 Derek B. Noonburg (derekn@foolabs.com).
xpdf(1), pdftops(1), pdfinfo(1), pdftopbm(1), pdfimages(1)