16 Aug

Reading and writing Microsoft Word docx files with Python


I've wanted to script simple text scanning and substitution in Microsoft Word documents for a while now, and after a little digging, it turns out that reading and editing .docx files is fairly straightforward. The format is Office Open XML, originally standardized as ECMA-376 and now under ISO as ISO/IEC 29500 ...
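A .docx file is a ZIP archive, and the body text lives in `word/document.xml` as `<w:t>` text elements inside `<w:p>` paragraphs. The sketch below uses only the standard library to read paragraph text and do a simple substitution; it is not necessarily how the full post does it, and the helper names `read_paragraphs` and `replace_text` are hypothetical. It assumes the document uses Word's conventional `w:` namespace prefix.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML namespace used for <w:p> (paragraph) and <w:t> (text) elements.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Matches one <w:t> element, capturing opening tag, text, and closing tag.
# Assumes the conventional "w:" prefix Word writes; self-closing <w:t/> is skipped.
_T_RE = re.compile(r"(<w:t(?:\s[^>]*?)?(?<!/)>)(.*?)(</w:t>)", re.DOTALL)


def read_paragraphs(path):
    """Return the plain text of each paragraph in the main body of a .docx file."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter("{%s}t" % W_NS))
        for p in root.iter("{%s}p" % W_NS)
    ]


def replace_text(src, dst, old, new):
    """Copy src to dst, replacing `old` with `new` inside individual <w:t> elements."""
    # Compare in XML-escaped form, since "&" is stored as "&amp;" in the file.
    old_x, new_x = escape(old), escape(new)

    def _sub(match):
        return match.group(1) + match.group(2).replace(old_x, new_x) + match.group(3)

    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = _T_RE.sub(_sub, data.decode("utf-8")).encode("utf-8")
            # Reusing the original ZipInfo keeps each member's name and order.
            zout.writestr(item, data)
```

Editing the XML text in place, rather than re-serializing a parsed tree, avoids renaming namespace prefixes that Word relies on. One caveat: Word often splits a visible phrase across several runs (spell-check marks, formatting changes), so a phrase that spans runs won't be replaced by this sketch, even though `read_paragraphs` shows it joined.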

20 Apr

Python auto sort of OCR'ed PDFs

I'd previously written about how I was using a Fujitsu ScanSnap 1500 to reduce paper clutter and move to a paperless workflow at home. So far, this system has been working great for me, with every scanned document getting OCR'ed and uploaded to my default Evernote notebook as ...
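The excerpt stops before describing how the sorting works. One simple approach, sketched here purely as an assumption, is to match keywords against each document's OCR text and move the PDF into the first matching folder. Text extraction from the PDF is assumed to have happened already, and the `RULES` table, folder names, and the `choose_folder`/`file_pdf` helpers are all hypothetical.

```python
import os
import shutil

# Hypothetical filing rules: destination folder -> keywords that identify it.
# Rules are checked in insertion order (dicts keep it in Python 3.7+), so the
# first folder with a matching keyword wins.
RULES = {
    "bills": ["invoice", "amount due"],
    "bank": ["statement", "account number"],
}


def choose_folder(text, rules=RULES, default="unsorted"):
    """Return the first folder whose keywords appear in the OCR text (case-insensitive)."""
    lowered = text.lower()
    for folder, keywords in rules.items():
        if any(keyword.lower() in lowered for keyword in keywords):
            return folder
    return default


def file_pdf(pdf_path, text, dest_root, rules=RULES):
    """Move pdf_path into dest_root/<folder> and return its new path."""
    folder = os.path.join(dest_root, choose_folder(text, rules))
    os.makedirs(folder, exist_ok=True)
    return shutil.move(pdf_path, os.path.join(folder, os.path.basename(pdf_path)))
```

Falling back to an "unsorted" folder keeps unmatched scans in one place for manual review instead of guessing.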

01 Mar

Better VIM for Python

As someone who spends a large fraction of their day editing text and code, I've often thought about investing a few days in learning the more advanced time-saving features of my text editor, VIM. Unfortunately, "a few days" just never happens, and the few times I did learn some ...

02 Nov

Class-based decorators in Python

I recently started using decorators in Python (2.7) to clean up some existing code, and one big hurdle I had to surmount was the dearth of accurate information on class-based decorators. The few examples I found were quite buggy, and it seemed that most people did not use ...
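As a rough illustration of what a class-based decorator looks like, here is a minimal sketch (not necessarily the approach the full post takes). `count_calls` is a hypothetical example that counts calls to the wrapped function. It inherits from `object` so it also works as a new-style class on Python 2.7, and it implements `__get__` so it still works on methods, a detail that is easy to miss.

```python
import functools


class count_calls(object):
    """Hypothetical class-based decorator that counts calls to the wrapped function."""

    def __init__(self, func):
        self.func = func
        self.calls = 0
        # Copy __name__, __doc__, etc. from the wrapped function onto this instance.
        functools.update_wrapper(self, func)

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.func(*args, **kwargs)

    def __get__(self, instance, owner):
        # Without __get__, a decorated method would never receive its bound `self`,
        # because instances of this class are not descriptors by default.
        if instance is None:
            return self
        return functools.partial(self.__call__, instance)


@count_calls
def greet(name):
    """Return a greeting."""
    return "Hello, %s" % name


class Greeter(object):
    @count_calls
    def shout(self, name):
        return name.upper()
```

Accessing `Greeter.shout` on the class returns the decorator instance itself, so its `calls` counter stays inspectable.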

25 Oct

Getting rid of paper clutter


A lot of this information is now outdated. Please see my PyPDFOCR package for how I do everything described in this article.


I've been trying to go paperless for a long time now, but the overhead of starting my laptop, scanning, importing, and then filing everything in a folder always ...

