I’m working on a Django app which is able to serve content on a number of subdomains. The app has a number of sites, which appear as subdomains of the main domain. There’s some middleware to look things up from the request and do the right routing.
In the wild the subdomains will be done with DNS, but for local development, I’m creating entries in my /etc/hosts such as demosite.local, using .local as my ‘main domain’ locally. After a colleague integrated some authentication code, I suddenly found I couldn’t log in on my development environment. It didn’t work with either the custom login screen or the Django admin. Very odd.
A tale of two sites
Our staging version, running on a real (virtual) server with real DNS, still worked though. I examined the staging version and compared it to my local development environment. I looked at the changelists, inspected the installed code, diffed, re-checked out and reloaded the Django process. It was exactly the same code running.
On the admin login screen, I was being redirected back to the login screen. On our own login screen, I was being redirected to the ‘correct’ location that you would expect after authenticating properly. It appeared that the authentication was working server side at least. I verified that I was logging in with the correct credentials by deliberately entering incorrect ones and confirming that the form complained.
Poking the WSGIRequest
Nonetheless, my first thought was that it must be an authentication issue, to do with bcrypt or the authentication back end. I stuck a debugger in some middleware that we had and I noticed something very strange.
MY_PROJECT/MY_APP/middleware.py(188)process_request()
print request.POST
(Pdb) from django.contrib.auth import authenticate, login
(Pdb) admin = authenticate(username="admin-username", password="admin-password")
(Pdb) print admin
(Pdb) admin2 = login(request, admin)
(Pdb) print admin2
(Pdb) admin2.is_authenticated()
True
(Pdb) c
[29/May/2012 19:06:37] "POST /backoffice/admin/ HTTP/1.1" 200 14003 (time: 127.68s; sql: 46ms (15q))
MY_PROJECT/MY_APP/middleware.py(188)process_request()
(Pdb) print request.user
Even though I was logging in manually, request.user was still being set to AnonymousUser after the redirect.
I ran through a quick mental check-list:
- I created a whole new user account in case something was funny with that user.
- I reset the password for the user.
- I blitzed the django install from virtualenv and reinstalled in case I’d done something untoward.
- I emptied the django_session table.
Still no dice.
Someone else’s problem?
Getting desperate, I put breakpoints in the Django code. I checked that django.contrib.admin.forms.AdminAuthenticationForm was indeed calling authenticate to get the right user. I checked that django.contrib.auth.views.login called django.contrib.auth.login. Everything checked out.
Last port of call was the browser. I looked at the cookies. Lo and behold, the session cookie was set for the staging server but not for the local server.
I was thrown off-course by a totally unrelated Chrome bug which suggested that the cookies file was corrupt.
That ate a few minutes as I quit the browser and ditched the ~/Library/Application Support/Google/Chrome/Default/cookies file. There are a lot of bug reports in the Chrome forums and my problem (with its limited specificity of symptoms) seemed to fit a number of other, similarly vague reported problems. I read through a number of these posts, without success.
I tried other browsers and they did the same thing. If this was a Chrome bug, it was a WebKit problem too. That seemed unlikely.
Having ruled out a bug in Chrome, I compared the headers coming back from the server for the authentication endpoint.
The session cookie was being set correctly, by the looks of things.
But so was the local version. Curiouser and curiouser.
On a diet
It appeared that Chrome was being offered cookies by both sites, but declining them from the local server. I couldn’t quite see why it would reject them, since they appeared to be identical. A bit of inspection showed that Chrome was indeed rejecting the session cookie from the local server, whereas it was accepting it from the staging server. Strangely, the CSRF token was OK in both cases.
Staging server:
Local:
So what was it about the CSRF token that worked and the sessionid token that didn’t? On closer inspection, the domain for the CSRF token was the specific domain of the URL: demo-site.local. Looking at the headers, the domain that was set for the sessionid was wider: .local. If I hadn’t blanked out the domain for the staging site, you’d see that the same pattern followed: the CSRF domain was mysite.staging-domain.com and the sessionid was for .staging-domain.com.
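To make the difference concrete, here is a minimal sketch that pulls the Domain attribute out of a pair of Set-Cookie headers using Python's standard library. The header strings and cookie values are hypothetical placeholders modelled on the local responses described above. It also assumes that CSRF_COOKIE_DOMAIN was left at None, in which case the CSRF cookie carries no Domain attribute at all and the browser binds it to the exact host.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header values; the tokens are placeholders.
set_cookie_headers = [
    "csrftoken=abc123; Max-Age=31449600; Path=/",
    "sessionid=def456; Domain=.local; Path=/",
]


def cookie_domains(headers):
    """Map each cookie name to its Domain attribute ('' means none was sent)."""
    domains = {}
    for header in headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            domains[name] = morsel["domain"]
    return domains


print(cookie_domains(set_cookie_headers))
```

An empty string for csrftoken means the browser scopes that cookie to the requesting host, which is why the cookie viewer shows the full hostname for it.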
From this point it was a short hop and a Google to see what was up. I quickly found this question on StackOverflow in which Ralph Buchfelder helpfully pointed out that
by design domain names must have at least two dots otherwise browser will say they are invalid
Secret binges
So, because the domain I was using was demosite.local rather than demosite.something.local, Chrome was silently eating my cookies instead of storing them and sending them back. This also explained why the user seemed to evaporate almost instantaneously from the request session after the 302 redirect.
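The rule from that answer can be approximated in a couple of lines. This is only a rough sketch of the "at least two dots" heuristic as quoted, not the algorithm any browser actually implements, and the function name is made up for illustration.

```python
def has_two_dots(cookie_domain):
    """Rough check for the quoted rule: a cookie domain needs two dots."""
    return cookie_domain.count(".") >= 2


# The domains mentioned in this post, checked against the heuristic.
for domain in (".local", ".something.local", ".staging-domain.com"):
    print(domain, has_two_dots(domain))
```

Only .local fails the check, which matches the one cookie the browsers refused to keep.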
Looking into the merge, I saw that it had introduced a value for settings.SESSION_COOKIE_DOMAIN (which is None by default) in order to achieve cross-domain authentication between the subdomains. The value of settings.SESSION_COOKIE_DOMAIN was set to be the main domain (.staging-domain.com for the staging server, .local for local development). Before the merge it was one session cookie per domain, which meant re-authenticating between subdomains.
I changed my development domain to use two dots and it worked.
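In settings terms, the change might look something like the sketch below. The APP_ENV variable, the MAIN_DOMAINS mapping and the dev.local name are all assumptions made for illustration; the project's real settings layout isn't shown in this post.

```python
import os

# Hypothetical environment switch; the real project may select settings differently.
ENVIRONMENT = os.environ.get("APP_ENV", "development")

# Hypothetical names. The development entry has an extra label so that the
# resulting cookie domain contains two dots, unlike the rejected ".local".
MAIN_DOMAINS = {
    "staging": "staging-domain.com",
    "development": "dev.local",
}

MAIN_DOMAIN = MAIN_DOMAINS.get(ENVIRONMENT, MAIN_DOMAINS["development"])

# Leading dot so that every subdomain shares the same session cookie.
SESSION_COOKIE_DOMAIN = "." + MAIN_DOMAIN
```

With a layout like this, the /etc/hosts entries would need to match, for example demosite.dev.local instead of demosite.local.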
So thank you Chrome for silently failing and not flagging up or even logging the error. It would be remiss of me not to mention that other browsers did the same thing, but I use Chrome for development.
And I’m sorry Django. I apologise for all the breakpoints and the doubt.
The Moral of the Story
Is if you’re developing on localhost, and you’re setting your SESSION_COOKIE_DOMAIN to anything other than None (which I now realise was introduced in the new feature integration), make sure you put two dots in it.
And I now realise that django.contrib.sessions.middleware.SessionMiddleware respects SESSION_COOKIE_DOMAIN whereas django.middleware.csrf.CsrfViewMiddleware respects CSRF_COOKIE_DOMAIN.
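Because the two settings are independent, the two cookies can end up with different scopes on the same response. The sketch below models that, assuming the local values after the merge: the settings dictionary and the helper function are hypothetical, and the only browser behaviour relied on is that a cookie sent without a Domain attribute is bound to the exact host that set it.

```python
def effective_cookie_scope(configured_domain, request_host):
    """Return the scope a browser would apply to a cookie.

    With no configured domain, no Domain attribute is sent and the cookie
    is host-only. Otherwise the configured domain is what the browser sees.
    """
    if configured_domain is None:
        return request_host
    return configured_domain


# Hypothetical values mirroring the local setup after the merge.
settings = {"SESSION_COOKIE_DOMAIN": ".local", "CSRF_COOKIE_DOMAIN": None}
host = "demosite.local"

print("sessionid:", effective_cookie_scope(settings["SESSION_COOKIE_DOMAIN"], host))
print("csrftoken:", effective_cookie_scope(settings["CSRF_COOKIE_DOMAIN"], host))
```

The session cookie gets the wide .local scope that the browsers rejected, while the CSRF cookie stays on the full hostname and is accepted.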
It seems obvious when you put it like that.