RECENT BLOG ENTRIES

I Love Python: BBC Language web scrape and encode to disk in 54 lines.

April 17th, 2008 by gloriajw

This module scrapes the BBC language web site (http://www.bbc.co.uk/worldservice/languages/) for sample text from all 35 languages offered. It encodes the text snippets, writes each one to its own file, then test-reads one sample file.

The encoding requirements took some digging through obscure docs, but the rest wasn’t so bad. If you want to know how to write Unicode text in many languages to files in Python, this is for you.

# Python 2 code (print statements, urllib2, unicode())
import urllib2
import codecs
import BeautifulSoup
import re
import os

class GetBBC:
	def __init__(self):
		print "In constructor"
		self.language_links = []
		self.dir = 'BBC_Language_pages'
		try:
			os.makedirs(self.dir)
		except OSError:
			pass

	def getLanguageChoices(self):
		lang_page = urllib2.urlopen("http://www.bbc.co.uk/worldservice/languages/").read()
		self.soup = BeautifulSoup.BeautifulSoup(lang_page)
		# Match any class that starts with "langtext", so "langtexttop" is included too.
		links = self.soup.findAll(attrs={'class':re.compile('^langtext')})
		for x in links:
			self.language_links.append(x)
			print "Appending %s with link %s " % (x.a.string,x.a['href'])

		print "There are %d language choices for the BBC news page!" % len(self.language_links)

	def archiveLanguagePages(self):
		os.chdir(self.dir)
		for x in self.language_links:
			lang_page = urllib2.urlopen('http://www.bbc.co.uk' + x.a['href']).read()
			clean_page = BeautifulSoup.BeautifulSoup(lang_page).prettify()
			rawfile = codecs.open(x.a.string,'wb+','ISO8859-1')
			rawfile.write(unicode(clean_page,'ISO8859-1'))
			rawfile.close()
			print "Saved the %s page." % x.a.string
		os.chdir('..')

	def readLanguagePage(self,language):
		os.chdir(self.dir)
		rawfile = codecs.open(language,'rb','ISO8859-1')
		contents = rawfile.read()
		rawfile.close()
		os.chdir('..')
		# Return the decoded text, not the closed file object.
		return contents

if __name__ == "__main__":
	x=GetBBC()
	x.getLanguageChoices()
	x.archiveLanguagePages()
	y = x.readLanguagePage('Portuguese')
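
For readers without BeautifulSoup installed, here is a rough sketch of the same "class starts with langtext" link matching using only the standard library. It is written for Python 3 rather than the Python 2 of the listing above. The sample markup, the LanguageLinkParser class and the CLASS_PATTERN name are all made up for illustration; the real BBC page may be laid out differently.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup, invented for this sketch; the real page may differ.
SAMPLE_HTML = """
<div class="langtexttop"><a href="/arabic/">Arabic</a></div>
<div class="langtext"><a href="/portuguese/">Portuguese</a></div>
<div class="other"><a href="/help/">Help</a></div>
"""

# Same idea as re.compile('^langtext') in the listing: a prefix match.
CLASS_PATTERN = re.compile(r"^langtext")


class LanguageLinkParser(HTMLParser):
    """Collect (name, href) pairs from links inside 'langtext*' elements.

    Assumes well-nested markup with no unclosed tags (such as a bare <br>)
    inside the matching elements.
    """

    def __init__(self):
        super().__init__()
        self.links = []
        self._depth = 0      # nesting depth inside a matching element
        self._href = None    # href of the <a> currently being read
        self._text = []      # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._depth:
            self._depth += 1
        elif CLASS_PATTERN.search(attrs.get("class") or ""):
            self._depth = 1
        if self._depth and tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None
        if self._depth:
            self._depth -= 1


parser = LanguageLinkParser()
parser.feed(SAMPLE_HTML)
language_links = parser.links
for name, href in language_links:
    print("Found %s with link %s" % (name, href))
```

The "Help" link is skipped because its parent's class does not start with "langtext", which is the same filtering that getLanguageChoices relies on.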

ISO8859-1 encoding will not work for every language, so you may need to experiment with other codecs, especially for languages not supported by the BBC.
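
One way to do that experimenting is to try a short list of codecs in order and keep the first one that can encode the text. The sketch below is Python 3 (where the built-in open takes an encoding argument, much like codecs.open in the listing). The codec list, the sample words and the helper names pick_codec and save_and_reload are my own assumptions, not anything taken from the BBC pages.

```python
import os
import tempfile

# Candidate codecs, tried in order. This list is an assumption for the sketch.
CANDIDATE_CODECS = ["iso8859-1", "utf-8"]


def pick_codec(text, candidates=CANDIDATE_CODECS):
    """Return the first codec name that can encode all of `text`."""
    for name in candidates:
        try:
            text.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    raise ValueError("no candidate codec can encode this text")


def save_and_reload(text, path):
    """Write `text` with a codec that fits it, then read it back."""
    codec = pick_codec(text)
    with open(path, "w", encoding=codec) as handle:
        handle.write(text)
    with open(path, "r", encoding=codec) as handle:
        return codec, handle.read()


# Sample words chosen for illustration only.
samples = {"Portuguese": "Notícias", "Russian": "Новости"}
results = {}
with tempfile.TemporaryDirectory() as folder:
    for language, text in samples.items():
        results[language] = save_and_reload(text, os.path.join(folder, language))

for language, (codec, text) in results.items():
    print("%s round-tripped with %s" % (language, codec))
```

The Portuguese sample fits in ISO8859-1, while the Cyrillic sample does not and falls through to UTF-8. Putting UTF-8 last in the list means there is always a codec that works, at the cost of files that are not all in the same encoding.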

I wrote this in May 2007, as a language support test for GrrlCamp, which is an online Open Source development group for women. We will be recruiting again in late June. If you are female, interested in volunteering development effort in exchange for learning, and have at least 6 hours free each week to do fun, cutting-edge Python design and development in a great, supportive online community, please post your email address and we will get back to you.

Gloria

The unmodified code


Programming from the (under)ground up

January 5th, 2008 by lisa

Hello. Welcome to my first article.

And my brand spankin new, made-from-scratch stab at programming. It’s going to be a bumpy ride: bumpy like fun-old-rollercoaster-bumpy not trainwreck-bumpy (universe willing).

Please allow me to rattle off some quick background facts so you know what planet I’m coming from. I’m a 26-year-old retired bartender. I did that for more years than I care to say (ok fine, 8). I fancy myself an amateur artist; basically, I paint for therapy and fun. I’ve always liked things of a nerdy nature (e.g. writing very basic HTML in a webshell on Angelfire when I was 13, Magic: The Gathering, guys who majored in Astrophysics, etc.). I consider myself very confident and intelligent, and it’s a shame that went to waste for so many years. That being said, years of bartending with no substantial plans for the future wore me out and made me feel quite desperate for a while.

Then something changed. I got beat down so much by the universe’s way of telling me to stop f’ing around, that I got fed up with being fed up. Well, Desi McAdam happens to be one of my favorite people on the planet and a very close friend, and she had always offered to teach me programming…intensively. She and my other longtime friend/ROR evangelist Obie Fernandez had always told me they thought I’d be a great programmer. I didn’t know what they were talking about. So I called up my dear Desi and said “I’ll do whatever it takes. Let’s do this thing.”

I thought I was going to be learning in my off time while still bartending and getting tutored whenever Desi was in town. I knew this would take a very, very, very long time, but I felt ready for the challenge.

As it turned out, she and Obie were down here in Florida on the beach working with this fabulous guy Mark Smith. I had met him some weeks before, and we all had a great time together. They wanted to bring an apprentice on to the small team, so voila! Here I am. I am now in full-on training, starting with nothing but my instinctual and intellectual abilities and no experience. I am extremely grateful for the opportunity I have, and I intend to give back to Desi and Obie by trying hard to be a badass programmer.

Desi is putting a lot of effort into being my personal, full-time tutor, and I think she rocks socks for it.

So I’m offering up myself, my victories, and my many future foibles here for your musing and amusement.

Cheers and enjoy
