Posted by virantha on Thu 25 October 2012

Getting rid of paper clutter

Note

A lot of this information is out of date now. Please see my PyPDFOCR package for how I do everything described in this article.


I've tried to go paperless for a long time now, but the overhead of starting my laptop, scanning, importing, and then filing in a folder always made me go back to just keeping stacks of paper around. But now I've finally found a solution that works well enough that I haven't abandoned it after a few months. The key is a stand-alone scanner that does OCR, converts to PDF, and then uploads to Evernote automatically. Details below:

Hardware required

  1. Fujitsu ScanSnap 1500 - This was the game-changer; it's a little pricey, but after having gone through a bunch of inferior flatbed and Neatworks scanners, this stand-alone scanner really makes things simple.
  2. Dedicated PC running on your network - You could just use your laptop, but I already have a bunch of desktops running as file servers and the like, which makes things simpler.

Software required

  1. Evernote - I'm a big fan of this service, and use it for all my notes/receipts/documents/lists.  Get the premium service, it's definitely worth it!
  2. ABBYY FineReader for OCR (this ships with the Fujitsu ScanSnap)
  3. A program that can watch a folder for file changes and run a script. Since the server I'm using at home runs Windows, I use Watch 4 Folder, but this should be much easier on Mac OS X.
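
If you don't have a folder watcher handy, the idea can be sketched with a simple polling loop in Python, using only the standard library. The names `poll_once` and `watch`, the 5-second interval, and the `max_polls` parameter are assumptions made for illustration; this is not how Watch 4 Folder works internally.

```python
import os
import time


def poll_once(folder, seen):
    """Return names in `folder` that are not yet in `seen`, and record them."""
    current = set(os.listdir(folder))
    added = sorted(current - seen)
    seen.update(added)
    return added


def watch(folder, on_new, interval=5.0, max_polls=None):
    """Call `on_new(path)` for every file that appears in `folder`.

    Files already present when watching starts are ignored.
    `max_polls=None` means poll forever.
    """
    seen = set(os.listdir(folder))
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        for name in poll_once(folder, seen):
            on_new(os.path.join(folder, name))
        polls += 1
```

Calling `watch("Incoming", print)` would print the path of each new file as it shows up. A dedicated tool is likely to be more efficient than polling, but the behaviour is the same from the script's point of view: it gets handed the path of the file that changed.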

Workflow Scripts

  • Set up a profile on your ScanSnap image scanner that scans to ABBYY FineReader (the OCR that comes with the software), produces a searchable PDF, and writes the file to a specific folder, which I'll refer to as "Incoming". The raw scan will first show up as "YYYY_MM_DD_HH_MM_SS.pdf", and once ABBYY finishes the OCR in the background, it will replace it with "YYYY_MM_DD_HH_MM_SS_OCR.pdf"
  • Set up the following batch file "move.bat". When it is handed a file from the "Incoming" folder whose name ends with "_OCR.pdf", it waits 5 minutes and then moves the file to a folder called "To evernote"
[ccw lang="dos" width="100%" strict="true"]
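rem The first argument is assumed to be the quoted path of the changed file
set file=%1
rem Drop the leading quote and the trailing 9 characters (_OCR.pdf")
set noext=%file:~1,-9%
set ocr="%noext%_OCR.pdf"
rem Only finished OCR output exists under this name; raw scans fall through
IF EXIST %ocr% (
echo "Found %ocr%! Waiting 5 minutes before doing anything"
rem About 5 minutes: 301 pings to localhost, roughly one second apart
ping -n 301 127.0.0.1 >NUL
move %ocr% "c:\users\virantha\Documents\ScanSnap\To evernote"
)
exit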
[/cc]
  • Since each run waits 5 minutes and we want multiple instances of this batch file running at once, we need to create another batch file "start.bat" to invoke it as a separate process:
[cc lang="dos"] start c:\Users\virantha\Documents\move.bat %1 [/cc]
  • Configure Watch 4 Folder to run minimized at startup, monitor your "Incoming" folder for any changes, and execute "start.bat" when one occurs
  • In Evernote, open "Tools -> Import Folders" and add your "To evernote" directory to the list of folders Evernote watches for file imports.
  • You're done! One press on your scanner, and your OCR'ed PDF documents will arrive in your default Evernote notebook in a few minutes. You can then use Evernote's search to find whatever you need at a later date, even if you don't bother to file these new notes.
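
For anyone not on Windows, the logic of "move.bat" can be sketched in Python. This is only an illustration under a few assumptions: the function names `ocr_candidate` and `move_if_ocr` are made up here, and the argument is assumed to arrive wrapped in double quotes, as the batch substring `~1,-9` implies. The slice `[1:-9]` does the same thing as that substring: it drops the leading quote and the last 9 characters, which for finished OCR output are exactly `_OCR.pdf"`.

```python
import os
import shutil
import time


def ocr_candidate(quoted_arg):
    """Drop the leading quote and the last 9 characters, then re-append
    `_OCR.pdf`. Only a name that already ended in `_OCR.pdf"` survives
    this round trip unchanged."""
    return quoted_arg[1:-9] + "_OCR.pdf"


def move_if_ocr(quoted_arg, dest_dir, delay_seconds=300):
    """Move finished OCR output to `dest_dir` after a delay.

    Returns the new path, or None when the candidate file does not exist
    (which is what happens for a raw, pre-OCR scan).
    """
    candidate = ocr_candidate(quoted_arg)
    if not os.path.exists(candidate):
        return None
    time.sleep(delay_seconds)  # same 5 minute wait as the batch file
    target = os.path.join(dest_dir, os.path.basename(candidate))
    return shutil.move(candidate, target)
```

A raw scan such as `"2012_10_25_09_30_00.pdf"` (quotes included) turns into the candidate `2012_10_25_09_3_OCR.pdf`, which does not exist, so nothing is moved until the real `_OCR.pdf` file appears and triggers the script again.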

© Virantha Ekanayake.