Just the text please!

What is this?

This site attempts to extract the relevant text from a webpage. It doesn't always work, but when it does, it can display the results as Markdown or plain HTML.

Why?

I wanted to be able to read articles easily from GNU Emacs without having to load up a browser such as w3m. I wrote a simple mode, available on GitHub, which uses this app to pull the article text and show it in a buffer. That mode is still evolving, but it's usable now.

You said some pages will not work. What else should I know?

There is some rate limiting in place, by IP address, which amounts to about 20 unique URLs per hour. However, if someone else has requested the same URL and it's still cached, it doesn't count against your limit.
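As a rough illustration of that policy, here is a minimal Python sketch. It is not the site's actual code: the RateLimiter class, the sliding one-hour window, and the never-expiring cache are all assumptions made for the example, and the limit of 20 simply mirrors the "about 20" above.

```python
import time

WINDOW_SECONDS = 3600   # one hour
MAX_UNIQUE_URLS = 20    # "about 20" in the text; the exact value is an assumption


class RateLimiter:
    """Hypothetical per-IP limiter where cached URLs are free."""

    def __init__(self, limit=MAX_UNIQUE_URLS, window=WINDOW_SECONDS, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.cache = set()  # URLs already extracted (assumed never to expire here)
        self.seen = {}      # ip -> {url: time of first uncached request}

    def allow(self, ip, url):
        now = self.clock()
        # A URL that is still cached does not count against anyone's limit.
        if url in self.cache:
            return True
        # Keep only this IP's requests that fall inside the current window.
        recent = {u: t for u, t in self.seen.get(ip, {}).items()
                  if now - t < self.window}
        self.seen[ip] = recent
        if url not in recent:
            if len(recent) >= self.limit:
                return False
            recent[url] = now
        # Pretend the page was fetched and its text cached.
        self.cache.add(url)
        return True
```

With a limit of 2, an IP could request two new URLs, be refused a third, and still re-request either of the first two, since those are served from the cache.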

To get HTML, just change /text/extract.md to /text/extract.html.
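If you are building these URLs from a script, the swap can be done on the path alone so that anything after it is left untouched. This is a small sketch, not part of the site; the to_html_url name, the example.invalid host, and the x=1 query string are placeholders, and only the two paths come from the text above.

```python
from urllib.parse import urlsplit, urlunsplit

MD_PATH = "/text/extract.md"
HTML_PATH = "/text/extract.html"


def to_html_url(md_url):
    """Return the same request URL, pointed at the HTML endpoint."""
    parts = urlsplit(md_url)
    if not parts.path.endswith(MD_PATH):
        raise ValueError("not a Markdown extract URL: %r" % md_url)
    # Replace only the trailing endpoint path; query and host are preserved.
    new_path = parts.path[: -len(MD_PATH)] + HTML_PATH
    return urlunsplit(parts._replace(path=new_path))


# Placeholder host and query string, for illustration only.
print(to_html_url("https://example.invalid/text/extract.md?x=1"))
```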

Who's behind this?

Andrew Gwozdziewycz is, though it uses libraries written by other people, such as html2text.py and python-readability.