Meet Edward / Blog update

Published: 2013-01-18
Tagged: django

I was contacted by a trusted personage about an interesting project. I quickly understood the urgency of the matter - an online community wanting to be set free from the shackles of an authoritarian forum hosting service. At least two other people tried to help but didn't get far - there is no way to export the thousands upon thousands of posts, which are too valuable to simply abandon. Completing this project would require the ability to somehow move these posts without database access and set up a new community on the bedrock of freedom - private hosting. All of this is a bit above my current skill level.

How could I possibly refuse?

My first step is to build an automaton who will scrape all those posts from the existing forums. His name is Edward. He works by using lxml to parse pages and Requests to handle communicating with the server. He's still not complete - all he can do now is produce a list of dictionaries, where each dictionary is a post. But in the near future he will be able to traverse the whole forum, saving the content and throwing it into SQLite. A little further along in time he will also be able to use that database to repopulate the new forum.
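To make the "list of dictionaries into SQLite" step a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. This is not Edward's actual code: the table name, the column names, and the dictionary keys (author, posted_at, body) are all assumptions made up for illustration, as is the sample data.

```python
import sqlite3

# Hypothetical shape of Edward's output: one dictionary per post.
# The keys (author, posted_at, body) are assumptions for illustration.
posts = [
    {"author": "alice", "posted_at": "2012-11-03 14:02", "body": "First post!"},
    {"author": "bob", "posted_at": "2012-11-03 14:10", "body": "Welcome aboard."},
]


def save_posts(conn, posts):
    """Create the table if needed and insert every post dictionary."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            id INTEGER PRIMARY KEY,
            author TEXT NOT NULL,
            posted_at TEXT NOT NULL,
            body TEXT NOT NULL
        )
        """
    )
    # Named placeholders let sqlite3 pull values straight from each dict.
    conn.executemany(
        "INSERT INTO posts (author, posted_at, body) "
        "VALUES (:author, :posted_at, :body)",
        posts,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
save_posts(conn, posts)
rows = conn.execute("SELECT author, body FROM posts ORDER BY id").fetchall()
```

Keeping each post as a plain dictionary means the scraping side and the storage side only have to agree on the key names, so either half can change without touching the other.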

So far work is coming along handsomely, thanks to some previous web-scraping experience as well as Using SQLite by Jay A. Kreibich.

Blog update

After deploying this blog yesterday, I already see a number of improvements to be made such as:

Looks like an awesome opportunity to learn!

Hi, I'm Matt.

This blog is an unordered set of thoughts extracted from the mind of a software developer.
