501 million 'Pwned Passwords'

'Password strength' is a phrase that almost everyone will be familiar with to some degree. There is no universally agreed way to measure the strength of a password (itself part of a wider problem), so every online service uses a different system to gauge whether or not a given password is secure enough to be used. Normally, when discussing password security, the conversation will look something like this:

"Make all of your passwords at least 16 characters long and never use them more than once".

I am going to take a different approach: I'm going to show you how not to protect yourself online, how to crack weak passwords with little to no effort, and how to check whether a password is already compromised before you use it online. With any luck, you'll be a little wiser about password security by the end of this post.

To achieve all of this I'm going to use data from the recently released "Pwned Passwords" list, which immediately raises a valid question - what on earth are "pwned passwords"?

"Password1", if you were wondering.

What are pwned passwords?

Pwned (pronounced 'Owned') passwords are passwords used 'in the wild' that have been compromised by some method. Generally these passwords are compromised via data breaches, along with other subsets of data including email addresses and personal details. When a password is compromised in this manner it is reasonable to assume that it is no longer secure, and any accounts protected by the same password are equally at risk of being compromised.

Where did the data come from?

The data is taken from the recently released "Pwned Passwords" list available from https://haveibeenpwned.com/. Pwned Passwords is a collection of 501 million unique passwords collected from multiple data breaches over a number of years. This information represents known data breaches where compromised user data was made publicly available.

Are the passwords in clear text?

No. Each entry in the list is the SHA-1 hash of a password, followed by a colon and a number indicating how many times that password has been seen in the compromised data.

For example:

7C4A8D09CA3762AF61E59520943DC26494F8941B:20760336

This is the SHA-1 hash of the password '123456', which has been compromised 20,760,336 times in known data breaches.
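As a quick check, the entry above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `sha1_upper` and `parse_entry` are illustrative and are not part of any Pwned Passwords tooling.

```python
import hashlib


def sha1_upper(password: str) -> str:
    """Return the uppercase hex SHA-1 digest of a password (UTF-8 encoded)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def parse_entry(line: str) -> tuple[str, int]:
    """Split a 'HASH:COUNT' line into its hash and its breach count."""
    digest, count = line.strip().split(":")
    return digest, int(count)


digest, count = parse_entry("7C4A8D09CA3762AF61E59520943DC26494F8941B:20760336")
print(sha1_upper("123456") == digest)  # True
print(count)  # 20760336
```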

Why is this data so valuable?

The data is valuable for a few reasons. As already mentioned, it is collected from multiple data breaches involving actively used passwords in the wild. This means each password seen on this list is actively used somewhere online. Secondly, the data itself has been sorted to only include unique entries, ordered by most frequently compromised to least frequently compromised. This means less noise in the data and a better understanding of how people are trying to protect themselves online. The opportunity to learn from data taken directly from data breaches is not often presented, so any chance to do so should be taken!

What can be learnt from this data?

A lot! The most obvious lesson to be learnt is "how not to protect your online presence". The structure of the passwords seen on this list should serve as a guideline on what not to do when securing your online presence; after all, if a password is on this list it has already been compromised. There are other lessons to be learnt as well: why password reuse is so dangerous, how password cracking works, and why it is so important to have this data readily available in the first place.

What do you do with 501 million SHA-1 hashes?

Crack them with Hashcat! Well, some of them.

Trying to crack all 501 million hashes is simply not practical within my lab environment. Taking a more scientific approach and working with controlled samples of the data is much more practical and still allows me to reach educated, reasonable conclusions based on the results.
There are two data sets I will be working with for this post. The first data set is a simple 'top 100' sample. The second data set consists of 1000 hashes generated by sampling the first 10 million entries. For each 'block' of 1 million passwords, up to and including 10 million, I extracted a sample of 100 hashes.

#M     Times seen   #H
1M     185 times    100
2M     93 times     100
3M     65 times     100
4M     51 times     100
5M     42 times     100
6M     35 times     100
7M     31 times     100
8M     27 times     100
9M     24 times     100
10M    22 times     100

The #M field represents each 'block' of 1 million passwords up to the 10th million block, while times seen is a count of how frequently the hashes taken from that block were seen in data breaches. #H is a count of how many password hashes were copied to use as sample data for that block.
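The sampling described above can be sketched in Python. The post does not say how the 100 hashes were picked within each block, so random selection with a fixed seed is an assumption here, as are the function name `sample_blocks` and its parameters. The input is assumed to be the ordered 'HASH:COUNT' lines of the list.

```python
import random
from itertools import islice
from typing import Iterable, Iterator


def sample_blocks(lines: Iterable[str], block_size: int = 1_000_000,
                  blocks: int = 10, per_block: int = 100,
                  seed: int = 0) -> Iterator[list[str]]:
    """Yield one random sample of hashes for each block of the ordered list."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    it = iter(lines)
    for _ in range(blocks):
        # Read the next block and keep only the hash part of each line.
        block = [line.split(":")[0].strip() for line in islice(it, block_size)]
        if not block:
            break  # the input ran out before all blocks were read
        yield rng.sample(block, min(per_block, len(block)))


# Tiny synthetic stand-in for the real list: 100 fake entries, blocks of 10.
fake_lines = [f"{i:040X}:{1000 - i}" for i in range(100)]
samples = list(sample_blocks(fake_lines, block_size=10, per_block=3))
print(len(samples), len(samples[0]))  # 10 3
```

Reading the file one block at a time keeps memory use bounded to a single block of 1 million lines rather than the whole list.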

I decided to sample up to the 10th million block for two reasons.
The first reason is password frequency; beyond 10 million the frequency of seen passwords takes a very slow but steady decline. The overall objective of this post is to learn from the most common mistakes made when using passwords online so it made little sense to spend time on data less frequent than that seen at 10 million.
The second reason is resources; while working in a lab with limited GPU capacity (a single Nvidia 900 series card) it is only possible to work with word lists, and unfortunately not possible to brute force any passwords. By way of example, brute forcing all possible 8 character passwords using only lower case letters would take upwards of 30 days.
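The arithmetic behind a brute force estimate is simple enough to sketch. The function names below are illustrative, and `guesses_per_second` is a placeholder: the post does not state a guess rate, so it should be replaced with a rate measured on the hardware in question.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def days_to_exhaust(candidates: int, guesses_per_second: float) -> float:
    """Worst-case days needed to try every candidate at a fixed guess rate."""
    return candidates / guesses_per_second / 86_400  # 86,400 seconds per day


lowercase_8 = keyspace(26, 8)
print(lowercase_8)  # 208827064576

# Adding uppercase letters and digits (62 symbols) grows the search space
# by a factor of (62 / 26) ** 8, roughly a thousandfold.
print(keyspace(62, 8) // lowercase_8)
```

Because the keyspace grows exponentially with length, each extra character multiplies the work by the size of the alphabet, which is why word lists are so much cheaper than exhaustive search.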

Word-lists, word-lists, and more word-lists...

There are a number of word lists I'll be using with Hashcat in this post. The majority of these lists are relatively generic and can be found online within minutes of searching, but one is of particular interest to me - 'RockYou'.

The (in)famous RockYou word list was created using clear text passwords obtained in the December 2009 hack of the RockYou! website. It was the largest collection of 'in the wild' passwords of its time. The RockYou! data breach is not tracked by "Have I been pwned", so the passwords in the RockYou word list are not already in the Pwned Passwords list by default. This means we can safely rule out the possibility of false positives and can assume any hashes that are cracked by RockYou are due to legitimate password reuse and persistently bad password habits.

Worth noting is that the RockYou word list is also included by default in every distribution of Kali Linux, making it extremely easy to access.

The top 100

Unsalted SHA-1 hashes, soon to be clear text passwords.

A quick note about exactly what is about to happen. The image above shows several hashed passwords. A hashed password is a clear text word, such as "password", that has had a well known, irreversible mathematical algorithm applied to it which generates a 'hash'. Since the mathematical algorithm is irreversible, we can't replay the same formula against the hash to find out its clear text value. This irreversible property is one reason hashes are so secure.

So how does an attacker turn a hash into a clear text word? By guessing.
While there are many different hashing algorithms (SHA-1, SHA-256, MD5, to name a few), each generates output of a characteristic length; a SHA-1 hash, for example, is always 40 hexadecimal characters. These differences make it possible for attackers to take an educated guess at what algorithm was used.
The attacker then takes a clear text word that they believe might be the password, hashes it with the chosen algorithm, and compares the output of their hash to the hash they are trying to crack. If the hash matches exactly then the attacker has successfully guessed the password. If the hash is not an exact match then the attacker repeats the process with a new word.
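The guess, hash, and compare loop just described can be sketched in a few lines of Python. This is an illustration of the idea only, not how Hashcat is implemented; the function name `crack_sha1` and the three-word list are made up for the example.

```python
import hashlib
from typing import Iterable, Optional


def crack_sha1(target_hash: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose SHA-1 digest matches, else None."""
    target = target_hash.strip().lower()
    for word in candidates:
        guess = hashlib.sha1(word.encode("utf-8")).hexdigest()
        if guess == target:
            return word  # exact match: the password has been guessed
    return None  # word list exhausted without a match


wordlist = ["password", "letmein", "123456"]  # tiny illustrative word list
print(crack_sha1("7C4A8D09CA3762AF61E59520943DC26494F8941B", wordlist))  # 123456
```

Nothing is ever reversed here: the hash is only ever compared against freshly computed hashes of guesses, which is why a password that appears in a word list offers no protection.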

This process is automated by tools like Hashcat. Hashcat is free to download, simple to use, compatible with Windows and Linux, and uses a computer's GPU to crack passwords at astonishing speeds. https://hashcat.net/hashcat/

Running Hashcat with the Rockyou word list is achieved by specifying a word list attack (-a 0) on SHA-1 hashes (-m 100) against the specified hashes (E:\location\path) using the selected word list (.\Rockyou.txt):
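A minimal sketch of that invocation, assembled in Python from the flags and placeholder paths named above. The helper name `build_hashcat_command` is illustrative, the executable name `hashcat` is an assumption (it varies between installs), and the paths are the placeholders from the text rather than real locations.

```python
def build_hashcat_command(hash_file: str, wordlist: str) -> list[str]:
    """Assemble a word list attack (-a 0) on SHA-1 hashes (-m 100)."""
    return ["hashcat", "-a", "0", "-m", "100", hash_file, wordlist]


# Placeholder paths taken from the text; substitute real file locations.
command = build_hashcat_command(r"E:\location\path", r".\Rockyou.txt")
print(" ".join(command))  # hashcat -a 0 -m 100 E:\location\path .\Rockyou.txt
```

The resulting list can be passed to `subprocess.run` on a machine where Hashcat is installed.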

The results speak for themselves; 92 out of 100 passwords were cracked in 3 seconds. This result is certainly impressive for a word list that is 9 years old. We can reasonably assume that in the course of 9 years since the RockYou! data breach our collective password habits have not changed much.

The clear text passwords do not paint a pretty picture. While looking at this data keep in mind that all of these passwords have been compromised multiple times in online data breaches. The data here is actively being used in the wild.

The top 1000

Things start to improve as we work through the data into the less frequently seen passwords. Keep in mind that these 1000 hashes have been selected by sampling data in the 1 to 10 million range. None of the previously seen passwords from the top 100 list are present in this test. The RockYou word list is the first to be used on this data.

The RockYou word list successfully cracks 26.73% of the hashes seen in the top 1000 list. Looking at the clear text data reveals an interesting difference in the structure of passwords being used. When compared with the clear text results from the top 100, we see more passwords using a combination of letters and numerals. Nine of the 270 cracked passwords used an uppercase letter, a single password used a special symbol, and basic letter substitution was used in 4 passwords.
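A tally like the one above can be produced with a short script. This is a rough sketch rather than the method used for the post: the function names are illustrative, the sample passwords are made up, and it does not attempt to detect letter substitution, which needs a judgement about what the original word was.

```python
import string
from collections import Counter


def character_classes(password: str) -> set[str]:
    """Return the set of character classes that appear in a password."""
    classes = set()
    for ch in password:
        if ch.islower():
            classes.add("lower")
        elif ch.isupper():
            classes.add("upper")
        elif ch.isdigit():
            classes.add("digit")
        elif ch in string.punctuation:
            classes.add("symbol")
        else:
            classes.add("other")
    return classes


def tally_classes(passwords: list[str]) -> Counter:
    """Count how many passwords contain each character class."""
    tally = Counter()
    for password in passwords:
        tally.update(character_classes(password))
    return tally


sample = ["password", "Password1", "letmein!", "123456"]  # made-up examples
print(tally_classes(sample))
```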

RockYou remains a formidable word list, if only because our collective password habits have not significantly improved since 2009. Keeping in mind that the 270 cracked passwords have all been seen in the wild multiple times (anywhere between 22 and 185 times), there is a very real chance that each password could lead to multiple account compromises across a number of services.

But what of the remaining hashes yet to crack? Running more word lists is still a viable option at this point. Four additional word lists, containing a combined total of 16 million passwords, bring the total of cracked passwords to 34.95%. Another sample of clear text data reveals that the majority of newly cracked passwords still rarely mix uppercase and lowercase letters, contain no special symbols, and use basic letter substitution at best.

There are still plenty of hashes yet to crack. I will focus my attention on the final 50 hashes in the top 1000 list. These hashes have all been taken from the 10 million range and should be relatively difficult to crack given their low compromised frequency.
It is also time to move on from word lists and start making use of several free online services that specialise in cracking password hashes.

The first online service used is Hash Killer (https://hashkiller.co.uk/sha1-decrypter.aspx [please note this site does ask if it can use your CPU for crypto mining upon visiting it]), a site containing over 312 billion unique, cracked SHA-1 hashes. Utilising this service reveals an additional 18 passwords.

While seen much less frequently, these passwords are not necessarily any more secure than the earlier examples. Their lower frequency indicates less password reuse as opposed to any additional password hardening measures. Across the entire sample of 18 newly cracked hashes only lowercase letters and numerals were used. No uppercase letters, special symbols, or letter substitutions were observed.

The remaining 32 hashes were checked against the database of cracked SHA-1 hashes maintained by Crack Station (https://crackstation.net/ [no crypto mining on this site]). An additional 8 hashes were cracked and again the overall password strength was low (lowercase letters and numerals at best). The lower frequency again indicates less password reuse overall and does not indicate any additional password hardening.

The final 24 hashes could not be cracked by any of the other online services tried. This may come as a surprise given the sheer volume of unique, cracked hashes stored in various online databases. The reality is that casting a wide net only reveals the weakest or most common passwords. Cracking the remaining hashes would require word lists with much more specialised, situation-specific passwords as opposed to the most commonly seen or compromised ones. Brute forcing the remaining hashes is also a viable option if the resources to do so are available.

Lessons learned

The data seems to indicate that lower frequency does not necessarily make the remaining passwords any more secure than those that have already been cracked. Passwords with a lower compromised frequency may still eventually be cracked if their overall password strength is trivial at best. By considering the format of the previously cracked passwords it would be possible to brute force a number of the remaining hashes.

Don't use passwords - use passphrases instead. Of all the compromised data, not one password was a true passphrase. At best we can see two dictionary words concatenated with a numeral or two at the end. Think about what hasn't been compromised and consider whether or not your password/passphrase is more or less complex than the passwords cracked in this post.

Before using a password or passphrase, check the online database at https://haveibeenpwned.com/Passwords to make sure it hasn't already been compromised. Reusing an already compromised password, no matter how secure it used to be, does not provide any protection at all.
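The same check can be scripted. The sketch below assumes the Pwned Passwords range API at api.pwnedpasswords.com, which is not mentioned elsewhere in this post: only the first 5 characters of the SHA-1 hash are sent, and the service returns the matching hash suffixes with their counts, one 'SUFFIX:COUNT' pair per line. The function names and the User-Agent string are illustrative.

```python
import hashlib
import urllib.request

# Assumption: the Pwned Passwords range endpoint (not named in this post).
RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password: str) -> tuple[str, str]:
    """Return the 5 character prefix and 35 character suffix of the SHA-1."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(suffix: str, body: str) -> int:
    """Find `suffix` in a 'SUFFIX:COUNT' response body; 0 if it is absent."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0


def pwned_count(password: str) -> int:
    """Query the range endpoint; the full hash never leaves this machine."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix, headers={"User-Agent": "pwned-check-sketch"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8")
    return count_in_range(suffix, body)
```

Calling `pwned_count("123456")` requires network access; any result above zero means the password has already appeared in a breach and should not be used.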

Try to think outside the box. The RockYou word list is 9 years old, but still formidable. Its success is due to the fact that the majority of users online are not creating truly unique, complex passphrases.
