Scripting Readability and Markdownify for clipping web pages

I wanted to share a handy tool that I realized I use daily but rarely talk about. I call it Read2Text, but it’s really just a Frankenstein script which combines Python Readability (license) with html2text (license). The combination allows you to grab web pages, process them with a port of Arc90’s Readability and convert the HTML to Markdown, ready for pasting or piping to a text file.
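Roughly speaking, Readability (Arc90's original and the Python port) scores candidate elements on cues such as text length, comma count, link density, and class/id names, then keeps the best-scoring block. The sketch below is a much cruder stand-in that uses only the standard library. It illustrates the idea of discarding page chrome and is not Read2Text's code or Readability's algorithm. BOILERPLATE_TAGS, BoilerplateStripper, and strip_boilerplate are hypothetical names, and the tag list is an assumption.

```python
from html.parser import HTMLParser

# Hypothetical illustration, not Read2Text's or Readability's code.
# Hypothetical list of tags whose contents are treated as page chrome.
BOILERPLATE_TAGS = {"script", "style", "nav", "aside", "footer",
                    "form", "noscript", "iframe"}


class BoilerplateStripper(HTMLParser):
    """Re-emit HTML, dropping every element listed in BOILERPLATE_TAGS.

    HTML comments and doctype declarations are dropped as well, because
    the default handle_comment/handle_decl methods do nothing.
    """

    def __init__(self):
        # Keep entities as written so the re-emitted HTML stays valid.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a boilerplate element

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no matching end tag.
        if not self.skip_depth and tag not in BOILERPLATE_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_boilerplate(html: str) -> str:
    """Return html with navigation, sidebars, scripts, etc. removed."""
    parser = BoilerplateStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Because it only drops whole elements by tag name, ads and sidebars built from ordinary div elements survive. Those are the cases that Readability's scoring is meant to catch.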
nvALT has this built in, but it’s been a little crashy lately. I find it more reliable to just do this from the command line. If you install it in your path (both the read2text script and the “readability” folder), you can run read2text http://brettterpstra.com/keybinding-madness/ | pbcopy.
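Before any extraction can happen, the page has to be fetched and decoded to text, and that step is where character encoding comes into play (one commenter below reports trouble with Central European characters). The following is a hedged sketch of that step, not how Read2Text is known to fetch pages. It assumes the charset comes from the HTTP Content-Type header, decodes with it, and falls back to UTF-8 with replacement characters. It ignores charsets declared only in a meta tag, which many pages rely on. The names fetch, decode_body, and charset_from_content_type are hypothetical.

```python
import email.message
import sys
import urllib.request
from typing import Optional

# Hypothetical helpers for illustration; not Read2Text's actual code.


def charset_from_content_type(content_type: Optional[str],
                              default: str = "utf-8") -> str:
    """Pull the charset parameter out of a Content-Type header value."""
    msg = email.message.Message()
    msg["Content-Type"] = content_type or "text/html"
    return msg.get_content_charset() or default


def decode_body(raw: bytes, content_type: Optional[str]) -> str:
    """Decode response bytes using the declared charset, else UTF-8."""
    charset = charset_from_content_type(content_type)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong declaration: fall back to UTF-8 and
        # substitute U+FFFD for undecodable bytes instead of crashing.
        return raw.decode("utf-8", errors="replace")


def fetch(url: str) -> str:
    """Download url and return its body as text (charset from HTTP header only)."""
    req = urllib.request.Request(url, headers={"User-Agent": "read2text-sketch"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Type"))


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.stdout.write(fetch(sys.argv[1]))
```

Saved under any file name, say fetch_sketch.py, running python3 fetch_sketch.py http://brettterpstra.com/keybinding-madness/ | pbcopy copies the decoded HTML. Because the extraction and Markdown steps are left out, the result is HTML rather than Markdown.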
You’ll get a Markdown-ified version of the page, with links, image links, headers, code blocks and text intact, but no comments, sidebars, ads, etc. It’s not perfect, but it does a solid job and cleanup only takes me a minute, even on huge sites. I use this most of the time instead of clipping to Evernote these days.
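In the real tool, html2text handles the conversion. The toy converter below, built on Python's html.parser, shows roughly what that stage does with the elements listed above. It is a hypothetical stand-in, not html2text's code, and MarkdownWriter and html_to_markdown are made-up names. It covers only headings, paragraphs, links, images, inline code, pre blocks (emitted as 4-space-indented Markdown code), list items, and line breaks.

```python
import re
from html.parser import HTMLParser


class MarkdownWriter(HTMLParser):
    """Hypothetical toy HTML-to-Markdown converter (a stand-in, not html2text)."""

    HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
    BLOCKS = {"p", "div", "ul", "ol", "blockquote"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.hrefs = []  # hrefs of currently open <a> tags
        self.pre = None  # text collected inside <pre>, or None outside it

    def handle_starttag(self, tag, attrs):
        if self.pre is not None:
            return  # ignore markup such as <code> inside <pre>
        attrs = dict(attrs)
        if tag in self.HEADINGS:
            self.out.append("\n\n" + "#" * self.HEADINGS[tag] + " ")
        elif tag in self.BLOCKS:
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n* ")
        elif tag == "br":
            self.out.append("  \n")  # two trailing spaces = Markdown line break
        elif tag == "pre":
            self.pre = []
        elif tag == "code":
            self.out.append("`")
        elif tag == "a":
            self.hrefs.append(attrs.get("href") or "")
            self.out.append("[")
        elif tag == "img":
            alt, src = attrs.get("alt") or "", attrs.get("src") or ""
            self.out.append(f"![{alt}]({src})")

    def handle_endtag(self, tag):
        if self.pre is not None:
            if tag == "pre":
                code = "".join(self.pre).strip("\n")
                # Indent every line by four spaces: a Markdown code block.
                block = "\n".join("    " + line for line in code.split("\n"))
                self.out.append("\n\n" + block + "\n\n")
                self.pre = None
            return
        if tag in self.HEADINGS or tag in self.BLOCKS:
            self.out.append("\n\n")
        elif tag == "code":
            self.out.append("`")
        elif tag == "a" and self.hrefs:
            self.out.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        if self.pre is not None:
            self.pre.append(data)  # keep whitespace inside <pre> verbatim
            return
        text = re.sub(r"\s+", " ", data)  # collapse whitespace like a browser
        if not self.out or self.out[-1].endswith((" ", "\n")):
            text = text.lstrip()  # avoid leading or doubled spaces
        if text:
            self.out.append(text)


def html_to_markdown(html: str) -> str:
    writer = MarkdownWriter()
    writer.feed(html)
    writer.close()
    text = "".join(writer.out)
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line in a row
    return text.strip("\n") + "\n"
```

Emphasis, tables, blockquote markers, and nested lists are not handled here. Their text comes through as plain paragraphs, and a real converter such as html2text covers far more of HTML.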
I alias it in my .bash_profile to rtt, and often redirect the output straight to a text file in my nvALT folder: rtt http://grml.org/zsh/zsh-lovers.html > ~/Dropbox/Notes/nvALT2.1/zsh\ lovers.md
Now I have a new note that automatically shows up in nvALT with the text of the zsh-lovers page (yeah, I tried switching to zsh this morning. I’ll have to come back to that). Anyway, I thought others might find this hack of use, so I’m making the download available below.
Read2Text: A Frankensteinian combination of html2text and Python Readability. This command line tool makes it simple to clip web pages into Markdown text without ads and comments.
By the way, I also have a web service for this. You can get raw markdown or a nice interface for previewing and copying. There’s also an API and bookmarklets for integration into your favorite browser. Have fun!

Awesome, but not working with CE characters. :(
Thanks so much, Brett. This is a really incredible tool and I'm really grateful for it. I was searching for hours for a reliable webpage-to-Markdown tool just a couple of weeks ago. There are a few out there, but your web version is by far the slickest and smartest. My problem with bookmarks, Instapaper, or RSS feeds is the lack of easy naming, filing, and retrieval, but pasting into nvALT as Markdown is a great way to store things like articles or recipes.
Thanks again!
Thanks from me too — I’ve also been looking for something like this. Now if I could just find a .doc -> MMD converter, I’d be really happy.
Well, here’s a start: https://gist.github.com/1746898
@BG — you could also try saving your word docs as html files and then running them through Grabber or some other html to markdown converter.
Wow! I wasn’t expecting you to actually write something for me. Thanks and thanks to ErgoOrgo for his suggestion too.
I have just very sexistly assumed ErgoOrgo is a man…
[…] I’d share. I’m calling it “Gather,” and it’s basically an “appified” version of my Readability/Markdownify work. A Cocoa version of Marky the Markdownifier, if you will. You can paste in a URL and it will […]