Python 3 is making great strides toward easy concurrency, and some of those have been backported to Python 2.7. The concurrent.futures module is available after you `pip install futures`. This package provides very convenient executors for threading (ThreadPoolExecutor) and multiprocessing (ProcessPoolExecutor).
Threads are useful when the code is blocked by something other than bytecode execution, such as I/O or external process execution (C code, system calls, etc.). If bytecode execution is holding things up, ProcessPoolExecutor starts multiple interpreter processes that can execute in parallel. However, there is more overhead in spinning up these processes, and they communicate with the main process through serialized representations (pickled objects sent over pipes).
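As a quick sketch of the process-pool side: a CPU-bound function run across several worker processes. The count_primes workload here is hypothetical (not from any library), and the sketch uses Python 3 print syntax:

    import math
    from concurrent.futures import ProcessPoolExecutor

    def count_primes(limit):
        # deliberately naive CPU-bound work: count primes below limit
        return sum(1 for n in range(2, limit)
                   if all(n % d for d in range(2, int(math.sqrt(n)) + 1)))

    if __name__ == '__main__':
        # each call runs in its own interpreter process, so they
        # execute in parallel instead of fighting over one GIL
        limits = [20000, 20000, 20000, 20000]
        with ProcessPoolExecutor(max_workers=4) as pool:
            counts = list(pool.map(count_primes, limits))
        print(counts)

The `if __name__ == '__main__'` guard matters here: on platforms that spawn rather than fork, worker processes re-import the module, and unguarded executor code would recurse.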
Here is an example with requests, which is I/O bound:
    import requests
    from concurrent.futures import ThreadPoolExecutor, as_completed
    from time import time

    urls = ['google.com', 'cnn.com', 'reddit.com', 'imgur.com', 'yahoo.com']
    urls = ["http://" + url for url in urls]

    # Time requests running synchronously
    then = time()
    sync_results = map(requests.get, urls)
    print "Synchronous done in %s" % (time() - then)

    # Time requests running in threads
    then = time()
    pool = ThreadPoolExecutor(len(urls))  # for many urls, this should probably be capped at some value
    futures = [pool.submit(requests.get, url) for url in urls]
    results = [f.result() for f in as_completed(futures)]
    print "Threadpool done in %s" % (time() - then)
Synchronous done in 46.8979928493
Threadpool done in 14.2200219631
With a longer list of urls, these numbers become:
Synchronous done in 164.506973982
Threadpool done in 16.3909759521
This is because the synchronous code takes the sum of all of the request times to complete, while the threadpool takes only about as long as the slowest single request.
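One gotcha in the snippet above: as_completed yields futures in completion order, so the results list does not necessarily line up with urls. Keeping a dict from future to url recovers the pairing. Here is a minimal sketch using a sleep-based stand-in for requests.get (the fetch function is hypothetical), in Python 3 syntax:

    import time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def fetch(url):
        # stand-in for requests.get: simulate blocking I/O with a short sleep
        time.sleep(0.1)
        return "response for %s" % url

    urls = ['http://google.com', 'http://cnn.com', 'http://reddit.com']

    with ThreadPoolExecutor(max_workers=3) as pool:
        # map each future back to the url it was submitted with
        future_to_url = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            # results arrive in completion order, but the dict
            # tells us which url each one belongs to
            print(future_to_url[future], future.result())

Using the executor as a context manager also shuts the pool down cleanly, which the original snippet skips.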