Parallel Python Review: In a Nutshell: Wow!


Python, Thoughts

A colleague came to me about a month ago and asked: “Have you seen this?” “No, but it looks great, if it really works the way it claims,” I replied. We were ogling this link, with expressions of “Ooo”, “Ahhh”, and “Wow”:

http://www.parallelpython.com/

He installed it on his XP desktop, I installed it on my Fedora Core 6 Desktop, and on our Linux test server with eight CPUs. The installation in an existing Python distribution took five minutes. Reading the daemon instructions and starting Parallel Python daemons on all of these machines took another three minutes.

We modified a test example, a distributed sum of primes, to run for many minutes by giving it a huge upper limit. On the Linux machines, running the top command and then pressing 1 reveals the activity on each CPU. We started the modified test example, closely watching top on Linux and the task table on XP. We saw an even distribution of jobs across machines, and a consistently even balance amongst all CPUs. Parallel Python is written so well that it balances jobs according to a machine’s capacity and speed.
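To make the shape of that test concrete, here is a minimal sketch of a chunked sum of primes. It is not the Parallel Python example itself: it swaps in the standard library's concurrent.futures as a stand-in for the job server, and the names is_prime, sum_primes, split_range and distributed_sum_primes are made up for illustration. The idea it shows is only the general one: cut the range into many small jobs, so that whichever worker finishes first simply picks up the next job.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(n):
    """Return True if n is prime (plain trial division)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    i = 3
    while i * i <= n:
        if n % i == 0:
            return False
        i += 2
    return True


def sum_primes(start, end):
    """Sum the primes in the half-open range [start, end)."""
    return sum(n for n in range(start, end) if is_prime(n))


def split_range(start, end, parts):
    """Split [start, end) into at most `parts` contiguous chunks."""
    step = max(1, (end - start + parts - 1) // parts)  # ceiling division
    return [(lo, min(lo + step, end)) for lo in range(start, end, step)]


def distributed_sum_primes(limit, parts, executor):
    """Submit one job per chunk to `executor` and add up the partial sums.

    `executor` is any concurrent.futures executor; it stands in for the
    Parallel Python job server here (an assumption of this sketch).
    """
    futures = [executor.submit(sum_primes, lo, hi)
               for lo, hi in split_range(0, limit, parts)]
    return sum(f.result() for f in futures)
```

Calling distributed_sum_primes(100, 8, ThreadPoolExecutor(max_workers=4)) exercises the logic; for real CPU-bound runs a ProcessPoolExecutor would be the natural choice, since threads in CPython do not run Python bytecode on several cores at once.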

I decided to integrate it into my web scraper project. I need to spawn jobs to scrape different pages in parallel. The problem I had was mapping my OO model to the existing procedural examples. I have a scraper class, and I want Parallel Python to run its run() method, and see its self variables.
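The OO question can be sketched roughly like this. The Scraper class, its link counting and the scrape_all helper are hypothetical, and the standard library's ThreadPoolExecutor again stands in for Parallel Python. The point is simply that a bound method such as scraper.run carries its instance along, so the job sees the self variables.

```python
from concurrent.futures import ThreadPoolExecutor


class Scraper:
    """Hypothetical scraper that keeps its state on self and exposes run()."""

    def __init__(self, url, fetch):
        self.url = url
        self.fetch = fetch  # callable: url -> page text (injected for testing)

    def run(self):
        """Fetch the page and count its links, using only self variables."""
        page = self.fetch(self.url)
        return self.url, page.count("href=")


def scrape_all(urls, fetch, max_workers=4):
    """Run one Scraper.run job per URL in parallel; return {url: link_count}."""
    scrapers = [Scraper(url, fetch) for url in urls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submitting the bound method is enough: it already refers to its instance.
        futures = [pool.submit(scraper.run) for scraper in scrapers]
        return dict(f.result() for f in futures)
```

With a thread pool the instance is shared in memory. If the jobs were moved to a process pool, each Scraper instance would have to be picklable, which is worth keeping in mind when a class holds open sockets or file handles.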

Looking for help, I found a help forum on the Parallel Python web site. Not a bad start. I posted my question, and got a response within 24 to 48 hours from the most helpful geek I have ever dealt with.

Vitalii Vanovschi, the author of the examples, and probably also the author of most of the Parallel Python code, offers damned near flawless, quick, and ego-free assistance with this software. To answer my question, he rewrote an example ‘on the fly’, to show how classes can be used in Parallel Python. His responses to everyone are consistently respectful, concise, and helpful, with a strong undertone of genuine love for the Python language and its capabilities. This is the best tech support I have ever gotten for any product, EVER.

To top it off, there is a link on his main page, asking people to talk about what drew them to Python. This adds a ‘Ruby-like’ social element to his web site which is still unfortunately rare in the Python realm. I was thrilled to find some social aspect to this web site and its subject matter. It is refreshing to be able to talk about the technical and social aspects of the language and the tools, simultaneously, with the same crowd of people. Rubyists seem to understand this, and Vitalii’s hope to bring this aspect into the Python world does not go unnoticed.

Vitalii has brought so many good things together in one application, on one web site. He has created a simple, easy-to-navigate web interface with related forums, great documentation and examples. The software itself is robust, well-written and so easy to use that one can integrate it into an existing project very quickly.

This project is an excellent example of good software, good presentation, good tech support, and good social support. I think Python programmers can learn a lot from this example. My hope is that more Python project developers will pay attention to the social aspect of their projects, asking people to share their experiences with the project and the related languages in an ego-free environment. Maybe user communities will form from these environments. And maybe, just maybe, Python programmers can one day experience the sense of community that Ruby developers already enjoy.

Vitalii, kudos, and thank you for such a great, well-rounded project. Thank you for setting the technical, social and presentation precedents. May all new projects follow this model.


11 Responses to “Parallel Python Review: In a Nutshell: Wow!”

  1. Vitalii

    Gloria,

    You’ve made a very relevant comparison between the Python and Ruby communities. It’s a valuable insight for all software developers.
    Thanks to your article, there is now independent confirmation of what I thought were the essential components of the Parallel Python project.
    It is nice to see that this and many other articles on devchix motivate people to create better development communities.

    Thank you for this excellent review!

    Best regards,
    Vitalii

  2. Alexander Botero-Lowry

    Wow. I like the way this looks. Usually parallel processing packages in any language seem to try to copy Erlang. Sometimes they do so in a way that fits into the language, but other times they try to jam Erlang in so hard that they totally lose some of the beauty of the language they’re working in. I’ve even seen one guy hack things up to overload ‘!’, which Erlang uses for message sending, in Python.

  3. gloriajw

    True, Python does not have a unary operator “!”. But it sounds like Alexander was talking about overloading the “!” operator in Erlang to write back to Python.

  4. Doug

    Very nice review, and it complements what I was thinking of Parallel Python as I was reading the site. I was wondering if you could post the class example that you mention Vitalii coded for you. I prefer to work in OO and would like to use PP in that manner. :)

    Thanks!
    Doug

  5. gloriajw

    Yes, it can be found on Vitalii’s forum:

    http://www.parallelpython.com/component/option,com_smf/Itemid,29/topic,45.0

    and also appears below:

    #!/usr/bin/env python
    # File: callback-new.py
    # Author: Vitalii Vanovschi
    # Desc: This program demonstrates parallel computations with pp module
    # using callbacks (available since pp 1.3).
    # Program calculates the partial sum 1-1/2+1/3-1/4+1/5-1/6+... (in the limit it is ln(2))
    # Parallel Python Software: http://www.parallelpython.com

    import math, time, thread, sys
    import pp

    #class for callbacks
    class Sum:
        def __init__(self):
            self.value = 0.0
            self.lock = thread.allocate_lock()
            self.count = 0

        #the callback function
        def add(self, value):
            # we must use lock here because += is not atomic
            self.lock.acquire()
            self.count += 1
            self.value += value
            self.lock.release()

    #class which instances will be passed to the ppworkers
    class Eval:
        def __init__(self, seed):
            self.seed = seed

        def part_sum(self, start, end):
            """Calculates partial sum"""
            sum = self.seed
            for x in xrange(start, end):
                if x % 2 == 0:
                    sum -= 1.0 / x
                else:
                    sum += 1.0 / x
            return sum

    print """Usage: python callback.py [ncpus]
    [ncpus] - the number of workers to run in parallel,
    if omitted it will be set to the number of processors in the system
    """

    start = 1
    end = 20000000

    # Divide the task into 128 subtasks
    parts = 128
    step = (end - start) / parts + 1

    # tuple of all parallel python servers to connect with
    ppservers = ()
    #ppservers = ("10.0.0.1",)

    if len(sys.argv) > 1:
        ncpus = int(sys.argv[1])
        # Creates jobserver with ncpus workers
        job_server = pp.Server(ncpus, ppservers=ppservers)
    else:
        # Creates jobserver with automatically detected number of workers
        job_server = pp.Server(ppservers=ppservers)

    print "Starting pp with", job_server.get_ncpus(), "workers"

    # Create an instance of callback class
    sum = Sum()
    eval = Eval(0)

    # Execute the same task with different amount of active workers and measure the time
    start_time = time.time()
    for index in xrange(parts):
        starti = start+index*step
        endi = min(start+(index+1)*step, end)
        # Submit a job which will calculate partial sum
        # part_sum - the function
        # (starti, endi) - tuple with arguments for part_sum
        # callback=sum.add - callback function

        job_server.submit(eval.part_sum, (starti, endi), callback=sum.add)
    #
