Friday, December 5, 2014

Serializing that convoluted cookielib.CookieJar


The Python cookielib.CookieJar object is a convenient way to manage cookies automatically as you work through a series of HTTP requests and responses. However, the internal data structure of the class is a convoluted nest of Python dicts.

cookielib.CookieJar has a private _cookies attribute, which is a dictionary (keyed by domain) of dictionaries (keyed by path) of dictionaries (keyed by name) of cookielib.Cookie objects.

To understand the data structure in the CookieJar object cj, try:

for domain in cj._cookies:
    for path in cj._cookies[domain]:
        for name in cj._cookies[domain][path]:
            cookie = cj._cookies[domain][path][name]
            print domain, path, cookie.name, '=', cookie.value
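
To see the nesting without making any web requests, here is a minimal sketch that builds a jar by hand. The make_cookie helper, the cookie values, and the example.com domain are all made up for illustration. The try/except import is only there so the sketch also runs on Python 3, where cookielib was renamed http.cookiejar.

```python
try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name for the same module


def make_cookie(name, value, domain, path='/'):
    """Hypothetical helper: build a Cookie with placeholder metadata."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


cj = cookielib.CookieJar()
cj.set_cookie(make_cookie('session', 'abc123', 'example.com'))
cj.set_cookie(make_cookie('theme', 'dark', 'example.com', path='/prefs'))

# Level 1 is keyed by domain, level 2 by path, level 3 by cookie name.
domains = sorted(cj._cookies)
paths = sorted(cj._cookies['example.com'])
names = sorted(cj._cookies['example.com']['/'])
cookie = cj._cookies['example.com']['/']['session']
```

Here cookie is the Cookie object stored under domain 'example.com', path '/', and name 'session'.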

However, CookieJar defines an __iter__ method that makes the nested loops above unnecessary if you just want to find the value of a cookie. Each iteration yields a cookielib.Cookie object, so you can simply write:

for cookie in cj:
    print cookie.domain, cookie.path, cookie.name, cookie.value # etc
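
Building on that iteration, a lookup by cookie name might be sketched as follows. The find_cookie_value and make_cookie functions and the sample cookies are hypothetical names invented for this example, not part of cookielib. The import fallback again just lets the sketch run on Python 3.

```python
try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name for the same module


def make_cookie(name, value, domain, path='/'):
    """Hypothetical helper: build a Cookie with placeholder metadata."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


def find_cookie_value(cj, name, domain=None):
    """Return the value of the first cookie with this name, or None.

    If domain is given, only cookies stored under that domain match.
    """
    for cookie in cj:
        if cookie.name == name and (domain is None or cookie.domain == domain):
            return cookie.value
    return None


cj = cookielib.CookieJar()
cj.set_cookie(make_cookie('session', 'abc123', 'example.com'))
cj.set_cookie(make_cookie('session', 'xyz789', 'example.org'))
cj.set_cookie(make_cookie('theme', 'dark', 'example.com'))

theme = find_cookie_value(cj, 'theme')
org_session = find_cookie_value(cj, 'session', domain='example.org')
missing = find_cookie_value(cj, 'no-such-cookie')
```

When the same name exists under several domains, pass the domain explicitly, since the order in which the jar yields cookies should not be relied on.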

If you want your CookieJar to persist in a file that can later be read back to create a new CookieJar object, the following two functions should work. They require the cPickle and base64 modules.

import cPickle, base64, cookielib

def writeCookieJarFile(cj, cookieJarFile):
    # Write one base64-encoded pickle per domain, one domain per line.
    with open(cookieJarFile, 'w') as f:
        for domain in cj._cookies:
            serialized = cPickle.dumps(cj._cookies[domain])
            f.write(base64.b64encode(serialized) + '\n')

def readCookieJarFile(cookieJarFile):
    cj = cookielib.CookieJar()
    try:
        with open(cookieJarFile, 'r') as f:
            text = f.read()
    except Exception as exception:
        print "readCookieJarFile: %s" % exception
        return None
    for line in text.split('\n'):
        if line == '':
            continue
        # Each line holds one domain's {path: {name: Cookie}} dictionary.
        cookieObject = cPickle.loads(base64.b64decode(line))
        # Recover the domain key from the first cookie found in the dictionary.
        firstPath = cookieObject.keys()[0]
        firstName = cookieObject[firstPath].keys()[0]
        domain = cookieObject[firstPath][firstName].domain
        cj._cookies[domain] = cookieObject
    return cj

Note that cookieObject in the read function above is not a cookielib.Cookie object. It is one domain's entry from _cookies: a dictionary (keyed by path) of dictionaries (keyed by name) of Cookie objects.
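
As a sanity check on this file format, here is a rough sketch of the same round trip against a temporary file. It is a variation on the two functions above, not a copy of them. It uses pickle instead of cPickle and opens the file in binary mode so that it also runs on Python 3, and it re-adds each cookie with the public set_cookie method rather than assigning to _cookies. The make_cookie, write_jar, and read_jar names and the sample cookies are invented for the example.

```python
import base64
import os
import pickle
import tempfile

try:
    import cookielib  # Python 2
except ImportError:
    import http.cookiejar as cookielib  # Python 3 name for the same module


def make_cookie(name, value, domain, path='/'):
    """Hypothetical helper: build a Cookie with placeholder metadata."""
    return cookielib.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith('.'),
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={})


def write_jar(cj, filename):
    # One base64-encoded pickle of {path: {name: Cookie}} per domain, per line.
    with open(filename, 'wb') as f:
        for domain in cj._cookies:
            serialized = pickle.dumps(cj._cookies[domain])
            f.write(base64.b64encode(serialized) + b'\n')


def read_jar(filename):
    cj = cookielib.CookieJar()
    with open(filename, 'rb') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            by_path = pickle.loads(base64.b64decode(line))
            # set_cookie rebuilds the domain/path/name nesting for us.
            for by_name in by_path.values():
                for cookie in by_name.values():
                    cj.set_cookie(cookie)
    return cj


original = cookielib.CookieJar()
original.set_cookie(make_cookie('session', 'abc123', 'example.com'))
original.set_cookie(make_cookie('theme', 'dark', 'example.org', path='/prefs'))

fd, jar_path = tempfile.mkstemp()
os.close(fd)
try:
    write_jar(original, jar_path)
    restored = read_jar(jar_path)
finally:
    os.remove(jar_path)

restored_cookies = sorted((c.domain, c.path, c.name, c.value) for c in restored)
```

The base64 step keeps each pickled record on a single line, which is what allows the reader to split the file on newlines. Because the file contains pickles, only load cookie files that you wrote yourself: unpickling untrusted data can execute arbitrary code.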

