This was referenced in my ScanSnap workflow series, but I thought I would provide it in its own article as well.
I have a ScanSnap S300M and Adobe Acrobat, and was getting pretty tired of sitting there OCRing the PDFs manually in Acrobat.
I came across this article by MacWorld which had a great Applescript Folder Action that would kick off Acrobat’s OCR whenever a PDF was dropped into the folder.
It worked well but I found that then I had to sit there and watch the OCR go after each document, and it seemed to have problems if I scanned another file while the OCR was still going.
I wanted a solution where I could just throw a bunch of PDFs at Acrobat and walk away.
Thanks to this thread on MacScripter, I turned the Macworld script into a droplet. Now I just go through and scan a bunch of PDFs to a folder, then drag the files onto the droplet, and go to bed. Acrobat OCRs each one one by one.
Here is the script. You can download it for free, but make sure you go to the Macworld article because it is 90% his work.
To use it:
- Download and uncompress the file and save it to your Desktop, Dock or wherever
- Drag one or more PDFs onto the icon
- Enjoy
Update: User nodis in the comments pointed out a great optimization that makes significantly smaller PDFs. Thanks!
Here is the source code:
property mytitle : "ocrIt-Acrobat"
-- Modified from a script created by Macworld http://www.macworld.com/article/60229/2007/10/nov07geekfactor.html
-- I am called when the user open the script with a double click
on run
tell me
activate
display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of PDF files onto my icon to batch OCR them." buttons {"OK"} default button 1 with title mytitle with icon note
end tell
end run
-- I am called when the user drops Finder items onto the script icon
-- Timeout of 36000 seconds to allow for OCRing really big documents
on open droppeditems
with timeout of 36000 seconds
try
repeat with droppeditem in droppeditems
set the item_info to info for droppeditem
tell application "Adobe Acrobat Professional"
activate
open droppeditem
end tell
tell application "System Events"
tell application process “Acrobat”
click the menu item “Recognize Text Using OCR…” of menu 1 of menu item “OCR Text Recognition” of the menu “Document” of menu bar 1
try
click radio button “All pages” of group 1 of group 2 of group 1 of window “Recognize Text”
end try
click button “OK” of window “Recognize Text”
end tell
end tell
tell application “Adobe Acrobat Professional”
save the front document with linearize
close the front document
end tell
end repeat
— catching unexpected errors
on error errmsg number errnum
my dsperrmsg(errmsg, errnum)
end try
end timeout
end open
-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
tell me
activate
display dialog "Sorry, an error occured:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
end tell
end dsperrmsg
Update 2: If you use Acrobat X, please see this post about OCR AppleScript for Acrobat X.