Acrobat AppleScript For ScanSnap OCR

This was referenced in my ScanSnap workflow series, but I thought I would provide it in its own article as well.

I have a ScanSnap S300M and Adobe Acrobat, and was getting pretty tired of sitting there OCRing the PDFs manually in Acrobat.

I came across this article by Macworld, which had a great AppleScript Folder Action that would kick off Acrobat's OCR whenever a PDF was dropped into the folder.

It worked well, but I found that I had to sit there and watch the OCR run after each document, and it seemed to have problems if I scanned another file while the OCR was still running.

I wanted a solution where I could just throw a bunch of PDFs at Acrobat and walk away.

Thanks to this thread on MacScripter, I turned the Macworld script into a droplet. Now I just scan a bunch of PDFs to a folder, drag the files onto the droplet, and go to bed. Acrobat OCRs them one by one.

Here is the script. You can download it for free, but make sure you visit the Macworld article, because 90% of the work is the original author's.

OCRIt-Acrobat – Droplet to batch OCR PDFs in Adobe Acrobat

To use it:

  • Download and uncompress the file and save it to your Desktop, Dock or wherever
  • Drag one or more PDFs onto the icon
  • Enjoy

Update: User nodis pointed out in the comments a great optimization that produces significantly smaller PDFs. Thanks!

Here is the source code:

property mytitle : "ocrIt-Acrobat" -- Modified from a script created by Macworld http://www.macworld.com/article/60229/2007/10/nov07geekfactor.html

-- I am called when the user opens the script with a double-click
on run
	tell me
		activate
		display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of PDF files onto my icon to batch OCR them." buttons {"OK"} default button 1 with title mytitle with icon note
	end tell
end run

-- I am called when the user drops Finder items onto the script icon
-- Timeout of 36000 seconds to allow for OCRing really big documents
on open droppeditems
	with timeout of 36000 seconds
		try
			repeat with droppeditem in droppeditems
				set the item_info to info for droppeditem
				tell application "Adobe Acrobat Professional"
					activate
					open droppeditem
				end tell
				-- Drive Acrobat's OCR dialog through GUI scripting (needs assistive access);
				-- menu and window titles must match the English Acrobat UI exactly
				tell application "System Events"
					tell application process "Acrobat"
						click the menu item "Recognize Text Using OCR…" of menu 1 of menu item "OCR Text Recognition" of the menu "Document" of menu bar 1
						try
							click radio button "All pages" of group 1 of group 2 of group 1 of window "Recognize Text"
						end try
						click button "OK" of window "Recognize Text"
					end tell
				end tell
				-- Saving "with linearize" produces a much smaller, web-optimized PDF
				tell application "Adobe Acrobat Professional"
					save the front document with linearize
					close the front document
				end tell
			end repeat
			-- catching unexpected errors
		on error errmsg number errnum
			my dsperrmsg(errmsg, errnum)
		end try
	end timeout
end open

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
	tell me
		activate
		display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg
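
Several commenters below hit error -1719 ("access for assistive devices is disabled" or "not allowed assistive access"). The droplet clicks Acrobat's menus through System Events GUI scripting, so that access has to be switched on before it can work. Here is a minimal sketch of a guard you could add; it is not part of the downloadable droplet, and the handler name uiScriptingEnabled is made up for illustration. It assumes System Events' UI elements enabled property reflects the Universal Access "Enable access for assistive devices" checkbox, as it did on the Mac OS X releases discussed here. On newer systems I believe the droplet itself has to be allowed under Security & Privacy > Privacy > Accessibility instead, so treat this check as a hint rather than a guarantee.

```applescript
-- Hypothetical helper (not in the original droplet): reports whether
-- GUI scripting through System Events is currently enabled.
on uiScriptingEnabled()
	tell application "System Events"
		set isEnabled to UI elements enabled
	end tell
	return isEnabled
end uiScriptingEnabled

-- Possible use at the top of "on open droppeditems", before the repeat loop:
-- if not my uiScriptingEnabled() then
-- 	my dsperrmsg("Please enable access for assistive devices and try again.", -1719)
-- 	return
-- end if
```

Checking up front lets the droplet stop with one clear message instead of failing after Acrobat has already opened the first PDF.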

Update 2: If you use Acrobat X, please see this post about OCR AppleScript for Acrobat X.

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

51 comments

macfred - May 17, 2014

Having trouble with OCR-iT Acrobat, for a while now.

Sorry, an error occured:

OCRIt-Acrobat is not allowed assistive access. (-1719)

Workarounds? Alternatives?

theDuck - March 3, 2014

I get the same error….

tt slow - August 27, 2013

I get the following error and am not sure how to correct it…

Sorry, an error occurred:

System Events got an error: Can't get application process "Acrobat". (-1728)

dave - August 15, 2012

Can this be automated with Hazel? I'd like to scan documents into a folder watched by Hazel, which would automatically OCR them using Adobe Acrobat. However, I'm not sure how to set that up. Can anyone point me in the right direction? Many thanks.

Johannes - April 10, 2012

Is there something similar for the Windows version of Adobe Acrobat Professional?

    Brooks Duncan - April 10, 2012

    Not that I'm aware of.

Brooks Duncan - August 5, 2011

For those asking about Acrobat X, please see this post: http://www.documentsnap.com/ocr-applescript-for-a

j-lon - April 19, 2011

Anyone have a copy of the Acrobat 7 script? The link above appears to be dead.

Etienne - April 15, 2011

Hi,

Did you get the chance to make it work with Acrobat X Pro?

Thanks.

Orin - January 21, 2011

As the last commenter mentioned, I just upgraded to Acrobat X (10) and the script no longer works. If someone with some scripting know-how could make a new/updated OCR script droplet, it would be MUCH appreciated. I really miss it!
Thanks!

    Brooks Duncan - January 21, 2011

    This is going to sound weird but could you send me some screenshots of the OCR steps in Acrobat X? I might be able to fix the script without actually having it yet. If you (or anyone) can do this, mail it to brooks@documentsnap.com and I'll try to get it fixed up.

      Orin - January 24, 2011

      That is a great offer. I will email you the info. However I think there could be a bigger problem at work here. The catch seems to be that the new Acrobat X does not allow you to access the OCR tool from the menu system. It is now accessed from the toolbar via a "Tools" button, "Recognize Text" button, and finally "In This File" button. Makes it easier to access within Acrobat, but does not work with a script that uses the program's menus. I tried to edit the existing AppleScript without any luck. I was able to make an Acrobat action that ran the OCR but it has a success dialog box with an OK button to click at the end of the action that I can't seem to make the script wait for and click when it appears. The action is cool because it will save the file as well. I currently have the droplet send the PDF to Acrobat and run the action. I then have to manually click the OK button and close the file and I can only send one file to Acrobat at a time. All bummers!

        Etienne - April 15, 2011

        Did you finally get it to work with Acrobat X Pro?

        Thanks!

@hackeron - December 19, 2010

Doesn't work on Acrobat X 🙁

Karl - August 10, 2010

Hi,
does anyone know how to change the script so that it saves the text-recognized file into a specific folder, like the DevonThink Inbox?

    Karl - August 13, 2010

    I got it:

    tell application "Adobe Acrobat Pro"
    set theName to name of front document
    set file_path to "Platon:Users:name:Library:Application Support:Devonthink Pro 2:inbox:" & theName

    save the front document to file file_path with linearize
    close the front document
    end tell
    tell application "Finder"
    delete droppeditem
    end tell

      Brooks Duncan - August 13, 2010

      Nice work, Karl, thanks for posting this!

Nick S. - July 23, 2010

Hi, I am using Snow Leopard and Adobe Acrobat Professional 8.

Getting the following error message:
"Sorry, an error occured: System Events got an error: Can't get window 'Recognize Text' of application process 'Acrobat' (-1728)"

    Brooks Duncan - July 23, 2010

    Hi Nick, can you go to System Preferences > Universal Access and check to see if "Enable access for assistive devices" is checked? If so, are you using an English version of Acrobat or some other language?

    Karl - August 10, 2010

    Hi Nick,
    I have the German Acrobat 9 for Mac, and for the script to run I needed to translate the corresponding menu commands, as you can see in the following script excerpt:

    tell application process "Acrobat"

    click the menu item "Text mit OCR erkennen…" of menu 1 of menu item "OCR-Texterkennung" of the menu "Dokument" of menu bar 1
    try
    click radio button "Alle Seiten" of group 1 of group 2 of group 1 of window "Text erkennen"
    end try
    click button "OK" of window "Text erkennen"

    end tell

sims - February 17, 2010

The linked app requires me to install Rosetta. Since I do not need it for anything else, I have avoided it. Is there a way to save the script in AppleScript editor and use it without Rosetta?

I tried to copy and paste it into AppleScript Editor. Upon saving the script, it asked me to help it locate Acrobat, which I did by pointing it to "Adobe Acrobat Pro.app". But then it failed and pointed me to this line in the script: tell application process “Acrobat”.

The cursor is at the quote mark before Acrobat in the line above, and I receive a syntax error: "Expected expression, property or key form, etc. but found unknown token."

Any ideas? Thanks!

    nodis - February 19, 2010

    Hmmm. You're right. The revised script is saved as a PPC app, as opposed to Universal. You can fix that easily though.

    In Snow Leopard, launch AppleScript Editor (you'll find it in your /Applications/Utilities folder). Open the revised droplet from within AppleScript Editor. Choose "Save As…" to re-save it and, voila, you have a Universal version of the app.

      Brooks Duncan - February 22, 2010

      Hi guys, I've replaced the linked file with one that should be Universal. Sorry about that!

        sims - February 25, 2010

        thank you guys.
        this one works and works beautifully!

Brooks Duncan - February 16, 2010

OK everyone, I have updated the post and posted the updated version at http://www.documentsnap.com/files/OCRIt-Acrobat-1… with nodis' changes. Thanks again!!!

nodis - February 12, 2010

Not a big deal. I must say that the difference in size and quality between OCR-ing PDFs with the latest ABBYY FineReader for ScanSnap and using Acrobat 9 (ClearScan OCR, downsampling at 600 dpi) with your droplet is amazing.

I just scanned a 25 page B&W Word document. The original image-only PDF is 6.2 MB. ABBYY produces an OCR-ed PDF that weighs in at 10.5 MB choosing "High" quality; it is 1.8 MB at Medium quality (and looks like crap visually at Medium quality, I should add — lots of compression artifacts).

By comparison, using your Applescript droplet, with the edit I suggested, along with Acrobat 9 and the "ClearScan" and 600 dpi OCR options, my OCR-d PDF comes out at 356 KB. This version looks superb, and the OCR quality seems fine.

I don't in any way mean to knock ABBYY, but the size/quality ratio of Acro 9 + ClearScan 600 dpi + your AppleScript is just absurdly good. I'm frankly surprised not to see Adobe market this feature more, or Fujitsu, their OEM scanner partner.

    Chris - November 5, 2010

    I've got the script working with my ScanSnap 1300M. However, my OCR'd PDFs are much larger than yours – even though I'm starting with a lower resolution scan (at least I think I am; I have "Auto" selected in the file resolution drop-down menu in ScanSnap Manager). For example, a 2-page (front and back) document ended up being 2.5 MB. Any ideas?

    Chris - November 5, 2010

    As a matter of fact, I just noticed that the output after OCR in Acrobat 9 is actually 3 TIMES LARGER than it was before the OCR operation. What the heck?

      Brooks Duncan - November 8, 2010

      Hi Chris, do you have ClearScan selected in Acrobat? (Sorry I can't send you a screenshot; I don't have Acrobat 9.)

nodis - February 12, 2010

Great tip, but the script needs one modification. The line in the AppleScript where the document is saved should be changed to:

save the front document with linearize

This is the scriptese equivalent of saving as "optimized" (or doing a "Save As…"), which throws away a lot of cruft and optimizes the saved PDF for progressive web download. When I run the script on test OCR-d PDFs, both with and without this change, I get ~95% smaller PDFs with it (note that this is also with Acrobat's new ClearScan OCR method selected, which is the new default).

    msim - February 12, 2010

    Hi @nodis: can you point out exactly which line should be changed?
    I have not been using this script, mainly because the PDFs become bloated.
    And I am unable to understand why adding scanned text should lead to any bloating! Perhaps you have found the answer!

      nodis - February 12, 2010

      Msim,

      The line that needs to be changed is towards the end:

      save the front document

      One simply adds "with linearize" at the end. As you will see below, our intrepid host already plans to make this change.

      On Adobe's Web site, linearizing a PDF is described as follows:

      "A linearized PDF document is organized to enable incremental access in a network environment. For example, a linearized PDF document can be displayed in a web browser before the entire PDF document is downloaded."

      Old versions of Acrobat (like version 3) described this same thing as "Optimizing."

      For whatever reason, "linearizing" the save from within the AppleScript causes a bunch of redundant data in the OCR-d PDF to be thrown away, resulting in a much smaller file.

        Michael - March 21, 2010

        I am trying this with Acrobat Pro 7 and getting the error message Rodger mentioned: "System Events got an error: Can't get menu item "OCR Text Recognition" of menu "Document" of menu bar 1 of application process "Acrobat". (-1728)"
        Also in the batch processing of my software it is changing the document to RTF file. Don't see option to keep it as .pdf. Please let me know what you know about either of these issues. Thank you.

        Michael - March 28, 2010

        Are these files all supposed to be getting smaller than the original in Acrobat 9 with ClearScan? If so, with what settings? Even with ClearScan and 72 dpi downsampling? I am still getting larger files (50% more than the original), although not the 4x bloated sizes I got before. What am I missing here?

Phil Boardman - September 14, 2009

Hacked script to work with Acrobat 7.
http://phil.boardman.id.au/journal/489/

    Brooks Duncan - September 14, 2009

    Nice, thank you so much Phil!

      Justine - September 8, 2010

      The link to the hacked script for Acrobat 7 no longer works… help!

Rodger - August 8, 2009

Thanks for writing this applet and making it available, I am looking forward to using it.
I am having trouble using Adobe Acrobat 7.0 Professional. Dragging a PDF file onto the droplet opens the file in Acrobat, but an error window quickly appears saying

"System Events got an error: Can't get menu item "OCR Text Recognition" of menu "Document" of menu bar 1 of application process "Acrobat". (-1728)"

Could you please tell me what this means and perhaps offer some advice on how I might fix it? Would using Acrobat 9 solve my problem? I have never written or edited an AppleScript, but I'd be glad to have a go if necessary.
Thanks

    Brooks Duncan - August 8, 2009

    Hi Rodger,
    I don't have Acrobat 7 so I can't look into it, but I am guessing that it must be something to do with differences in the menu between 7 and 8.

    Basically if you look at the script, the code needs to match the menu titles exactly.

    If you can send me screenshots of each step of the Acrobat 7 OCR process, I can probably make you a 7.0 version (assuming Acrobat 7 can be scripted the same way).

    You can send them in an email to brooks@documentsnap.com.

pendolino - July 22, 2009

This looks great, but it does not work for Acrobat 7.0 Standard. I tried to modify the script, but it kept throwing up errors. Can you please attach the script file as text? I suspect copying and pasting from the site may be messing up the formatting, although I can't be sure.

antony - July 16, 2009

Hi,
Thanks for the nice script.
I have a problem with the saving command.
How can I save this file as a .txt file?
thanks in advance.
regards
Antony

    Brooks Duncan - July 16, 2009

    @antony
    You can just highlight the code in the post above, Copy, and then Paste it into your text editor of choice. Then just File Save As to a .txt extension if that is what you want.

kenny - July 9, 2009

Hi. Does anybody know why the size of the document increases by as much as 6x when you run a PDF through this script? It works great, but the size increase is substantial. Unless I am wrong, only text plus some position information should be added, so the size should not go up by that much.

Is it possible to have a version of this script where the document size does not balloon?

Thanks.

Brooks Duncan - June 7, 2009

Hah thanks! Glad to hear it helped out.

barbara - June 7, 2009

you rock the house. thank you so much for this. This saved my life.

Brooks Duncan - March 26, 2009

Hi Barry, you got it: Universal Access. Good job!

For multiple files, assuming you've saved the droplet somewhere, all you need to do is drag a bunch of files onto the icon. It'll then OCR them one by one.

Let me know how it goes!

Barry - March 26, 2009

OK, I figured it out. I had to adjust the Universal Access settings on my Mac. Now, how do I do multiple files at once???

Barry - March 26, 2009

I get an error message when I drag a file into the Droplet. It says "Sorry an error occurred. Access for assistive devices is disabled (-1719)" How do I get this droplet to automatically OCR my files? Thanks for your help!

Brooks Duncan - November 28, 2008

Ha, no worries at all Dave. I mostly just built on what other people had done, but glad I could help!

Dave - November 28, 2008

Man, I love you!!!!!!! I can’t tell you how long I’ve been looking for something like this. It’s just perfect!! THANK YOU! THANK YOU! THANK YOU!!!!!!!!!
