I've always had a problem with this xkcd: most sites lock an account after a number of bad guesses (which makes online guessing rates matter much less), and I also don't see where 2^44 comes from. Is it taking the average length of the selected words and computing the entropy from that length four times over? Why not take the number of candidate dictionary words and raise it to the 4th power?
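For what it's worth, 44 bits is exactly what you get from drawing four words uniformly at random from a list of 2^11 = 2048 common words (11 bits per word, times 4), which I assume is where the comic's figure comes from:

echo "2048^4; 2^44" | bc
17592186044416
17592186044416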
Using the actual dictionary size would give a better exponent and a more accurate estimate. Granted, one would want to filter the word list down to words people actually know (an exceptional person knows about 75,000 words, but most people know only around 50,000 [1]):
echo "l(75000^4)/l(2)"|bc -l
64.77841190063187168389
echo "l(50000^4)/l(2)"|bc -l
62.43856189774724695805
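If those figures are right, the gap between the comic's 44 bits and the ~62-bit number above is itself substantial: every extra bit doubles the search space, so 18 extra bits means roughly a quarter of a million times more work for the attacker:

echo "2^(62-44)" | bc
262144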
So I guess the passphrase approach still comes out ahead, but assuming a word length (especially a uniform one) seems like a pretty big oversimplification, and it shouldn't be much harder to calculate the actual value. Maybe I did something wrong, I don't know.
True, an account should be disabled after x bad guesses. But securing against a "brute force" attack, for me, is more a case of cracking hashes from a db dump. It's easy to churn through vast numbers of hashes in no time at all these days. Here, hashing algorithm speed, number of iterations, unique salting, and original password length all have a part to play.
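To put rough numbers on that (the guess rates below are my assumptions, not benchmarks): a GPU rig churning through an unsalted fast hash like MD5 at ~10^10 guesses per second exhausts a 44-bit space in about half an hour, whereas an iterated KDF like bcrypt or PBKDF2 that slows the attacker to ~10^4 guesses per second stretches the same search past 50 years:

echo "2^44/10^10/60" | bc
29
echo "2^44/10^4/86400/365" | bc
55

(The first result is in minutes, the second in years; both are full-search times, so halve them for the average case.)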
Online brute-forcing is not feasible except against extremely weak passwords or sites with security vulnerabilities. What is feasible is brute-forcing stolen password hashes (think the Adobe leak) or, say, a hard drive encrypted with a memorable password, and this is where secure passphrases come into play.
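As a rough illustration (again assuming the drive's KDF limits an attacker to ~10^4 guesses per second, a made-up but plausible rate for something like LUKS with a high iteration count), the ~62-bit four-word passphrase from the dictionary math above would take millions of years to search exhaustively:

echo "2^62/10^4/86400/365" | bc
14623560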
[1] http://news.bbc.co.uk/2/hi/uk_news/magazine/8013859.stm