Optical Character Recognition w/ Tesseract
Using Tesseract OCR with a PDF file:
-
Tesseract doesn’t read PDFs.
-
However, ImageMagick can convert a PDF to TIFF.
-
Warning: TIFF files can be very large and require a lot of RAM.
First try a 72 dpi TIFF, which is ImageMagick's default density. If this does not work, go to 300 dpi.
convert file.pdf -depth 8 file.tiff
tesseract file.tiff output
- 300 dpi version. Tesseract adds the .txt suffix to the output name by default, so the text is written to output.txt.
convert -density 300 file.pdf -depth 8 file.tiff
tesseract file.tiff output
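The two steps above can be wrapped in a small shell function so that the density is just a parameter. This is only a sketch: pdf_to_text is a hypothetical helper name (it is not part of ImageMagick or Tesseract), and it assumes both convert and tesseract are on the PATH and that the input name ends in .pdf.

```shell
#!/bin/sh
# Sketch: OCR a PDF by rendering it to TIFF first, then running Tesseract.
# pdf_to_text is a hypothetical helper name, not part of either tool.
# Usage: pdf_to_text input.pdf [dpi]   (dpi defaults to 72)
pdf_to_text() {
    pdf=$1
    dpi=${2:-72}
    base=${pdf%.pdf}    # strip the .pdf suffix: file.pdf -> file

    # Render the PDF to an 8-bit TIFF at the requested density.
    convert -density "$dpi" "$pdf" -depth 8 "$base.tiff" || return 1

    # Tesseract appends .txt itself, so pass the base name only.
    tesseract "$base.tiff" "$base" || return 1

    # Report where the recognized text was written.
    echo "$base.txt"
}
```

For example, run pdf_to_text file.pdf first, and if the text in file.txt is poor, rerun it as pdf_to_text file.pdf 300.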