Python multithreading vs. gevent

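I had never looked into gevent before. While researching ways to improve Flask performance recently, I came across it and took the opportunity to learn about it. Why not Twisted? Twisted implements asynchrony through callbacks, which is not a style I like.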

Let's start with a simple performance test:

a.py

import sys
import requests
import urllib2
from timeit import Timer
import threading as thread

urls = [
    'http://www.douban.com',
] * 5

def call_back(resp):
    # Extract the text between <title> and </title> from the response body.
    content = resp.content
    title = content.split('<title>')[1].split('</title>')[0].strip()
    return title

def by_requests():
    for u in urls:
        call_back(requests.get(u))

class doWorker(thread.Thread):
    def __init__(self, url, use_urllib2):
        thread.Thread.__init__(self)
        self.url = url
        self.use_urllib2 = use_urllib2

    def run(self):
        if self.use_urllib2:
            # urllib2 path: lowercase the body so the lowercase tag match works.
            content = urllib2.urlopen(self.url).read().lower()
            title = content.split('<title>')[1].split('</title>')[0].strip()
        else:
            resp = requests.get(self.url)
            call_back(resp)

def by_mutiRequests():
    threads = []
    for url in urls:
        threads.append(doWorker(url, False))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def by_mutiUrllib2():
    threads = []
    for url in urls:
        threads.append(doWorker(url, True))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == '__main__':
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by requests: %s seconds' % t.timeit(number=3)
    t = Timer(stmt="by_mutiRequests()", setup="from __main__ import by_mutiRequests")
    print 'by mutiRequests: %s seconds' % t.timeit(number=3)
    t = Timer(stmt="by_mutiUrllib2()", setup="from __main__ import by_mutiUrllib2")
    print 'by mutiUrllib2: %s seconds' % t.timeit(number=3)

b.py

import sys

import gevent
from gevent import monkey

monkey.patch_all()

import grequests
import urllib2

def call_back(resp):
    # Extract the text between <title> and </title> from the response body.
    content = resp.content
    title = content.split('<title>')[1].split('</title>')[0].strip()
    return title

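def worker(url, use_urllib2=False):
    if use_urllib2:
        # `url` is a single URL; by_urllib2() spawns one greenlet per URL.
        content = urllib2.urlopen(url).read().lower()
        title = content.split('<title>')[1].split('</title>')[0].strip()
    else:
        # `url` is the whole list of URLs; grequests.map() sends them concurrently.
        rs = [grequests.get(u) for u in url]
        resps = grequests.map(rs)
        for resp in resps:
            call_back(resp)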

urls = ['http://www.douban.com']*5

def by_requests():
    worker(urls)
def by_urllib2():
    jobs = [gevent.spawn(worker, url, True) for url in urls]
    gevent.joinall(jobs)

if __name__=='__main__':
    from timeit import Timer
    t = Timer(stmt="by_requests()", setup="from __main__ import by_requests")
    print 'by grequests: %s seconds'%t.timeit(number=3)
    t = Timer(stmt="by_urllib2()", setup="from __main__ import by_urllib2")
    print 'by gurllib2: %s seconds'%t.timeit(number=3)
    sys.exit(0)
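
Both scripts pull the title out with chained split() calls, which raise an IndexError as soon as a page has no title tag. As a rough sketch only, a more tolerant helper could look like the following. The name extract_title is hypothetical and not part of the original scripts, and the code assumes Python 3 with the page already decoded to str (for example resp.text rather than resp.content).

```python
import re

# Hypothetical helper for illustration; it is not part of a.py or b.py.
# The pattern is case-insensitive, so no .lower() call is needed, and
# DOTALL lets the title span several lines.
_TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the stripped text of the first <title> element, or None."""
    match = _TITLE_RE.search(html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(extract_title("<html><head><TITLE> Douban </TITLE></head></html>"))
    print(extract_title("<html><body>no title here</body></html>"))
```

Returning None instead of raising keeps one odd page from killing a worker thread or greenlet in the middle of a benchmark run.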

The results are as follows (timeit reports the total time for all 3 runs):

$python a.py

by requests: 2.89695906639 seconds

by mutiRequests: 1.29679989815 seconds

by mutiUrllib2: 4.26026391983 seconds

$python b.py

by grequests: 1.18827795982 seconds

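by gurllib2: 2.52291297913 seconds

With requests the improvement is only slight (about 1.30 s with threads vs. about 1.19 s with grequests), probably because requests is already heavily optimized internally. With urllib2, however, there is a real gain: about 4.26 s with threads vs. about 2.52 s with gevent.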
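
Both the threaded and the gevent versions beat the sequential loop, presumably because the work is dominated by waiting on the network and that waiting can overlap. The sketch below reproduces the effect without any network access. It is only an illustration: fake_fetch, DELAY and the other names are made up for this example, time.sleep stands in for the HTTP request, and the code targets Python 3 with the standard library only.

```python
import threading
import time

DELAY = 0.05  # assumed per-request latency in seconds, chosen for illustration


def fake_fetch(url, results, index):
    """Stand-in for an HTTP request: wait, then store a fake title."""
    time.sleep(DELAY)  # sleeping releases the GIL, as a blocking socket read does
    results[index] = "title of %s" % url


def run_sequential(urls):
    """Fetch one URL after another, like by_requests() in a.py."""
    results = [None] * len(urls)
    for i, url in enumerate(urls):
        fake_fetch(url, results, i)
    return results


def run_threaded(urls):
    """Fetch every URL in its own thread, like by_mutiRequests() in a.py."""
    results = [None] * len(urls)
    threads = [threading.Thread(target=fake_fetch, args=(url, results, i))
               for i, url in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def timed(func, urls):
    """Return (results, elapsed seconds) for one call of func(urls)."""
    start = time.perf_counter()
    results = func(urls)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    urls = ["http://www.douban.com"] * 5
    _, sequential_time = timed(run_sequential, urls)
    _, threaded_time = timed(run_threaded, urls)
    print("sequential: %.3f seconds" % sequential_time)
    print("threaded:   %.3f seconds" % threaded_time)
```

The sequential run pays the delay five times, while the threaded run pays it roughly once. gevent aims for the same overlap with greenlets instead of OS threads.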

As for gevent's drawbacks, here is my impression so far:

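  • It is built on libev/libevent and could not run on Windows, whereas Twisted is cross-platform. A Windows build has since become available for download: gevent1.0
  • Although it is implemented in Cython, its performance is not necessarily better than Tornado's.
  • Twisted ships with many more built-in protocols, and gevent still lags behind there; then again, Twisted's sheer size is a burden of its own.
  • gevent went a long time without an official 1.0 release, and the older versions still had bugs, so 1.0rc was the only option. 1.0 has since been released, so go ahead and use it!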