Guido van Rossum | 1 Jan 2012 01:11
Favicon

Re: Hash collision security issue (now public)

On Sat, Dec 31, 2011 at 4:56 PM, Guido van Rossum <guido <at> python.org> wrote:
PS. I would propose a specific fix but I can't seem to build a working CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this error late in the build:

./python.exe -SE -m sysconfig --generate-posix-vars
Fatal Python error: Py_Initialize: can't initialize sys standard streams
Traceback (most recent call last):
  File "/Users/guido/cpython/Lib/io.py", line 60, in <module>
make: *** [Lib/_sysconfigdata.py] Abort trap

FWIW I managed to build Python 2.6, and a trivial mutation of the string/unicode hash function (add 1 initially) made only three tests fail; test_symtable and test_json both have a dependency on dictionary order, test_ctypes I can't quite figure out what's going on.

Oh, and an unrelated failure in test_sqlite:

  File "/Users/guido/pythons/p26/Lib/sqlite3/test/types.py", line 355, in CheckSqlTimestamp
    self.failUnlessEqual(ts.year, now.year)
AssertionError: 2012 != 2011

I betcha that's because it's still 2011 here in Texas but already 2012 in UTC-land. Happy New Year everyone! :-)

--
--Guido van Rossum (python.org/~guido)
<div>
<div class="gmail_quote">On Sat, Dec 31, 2011 at 4:56 PM, Guido van Rossum <span dir="ltr">&lt;<a href="mailto:guido <at> python.org">guido <at> python.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote">

PS. I would propose a specific fix but I can't seem to build a working CPython from the trunk on my laptop (OS X 10.6, Xcode 4.1). I get this error late in the build:<br><br>./python.exe -SE -m sysconfig --generate-posix-vars<br>Fatal Python error: Py_Initialize: can't initialize sys standard streams<br>Traceback (most recent call last):<br>&nbsp; File "/Users/guido/cpython/Lib/io.py", line 60, in &lt;module&gt;<br>

make: *** [Lib/_sysconfigdata.py] Abort trap<span class="HOEnZb"></span><br>
</blockquote>
</div>
<br>FWIW I managed to build Python 2.6, and a trivial mutation of the string/unicode hash function (add 1 initially) made only three tests fail; test_symtable and test_json both have a dependency on dictionary order, test_ctypes I can't quite figure out what's going on.<br><br>Oh, and an unrelated failure in test_sqlite:<br><br>&nbsp; File "/Users/guido/pythons/p26/Lib/sqlite3/test/types.py", line 355, in CheckSqlTimestamp<br>&nbsp;&nbsp;&nbsp; self.failUnlessEqual(ts.year, now.year)<br>AssertionError: 2012 != 2011<br><br>I betcha that's because it's still 2011 here in Texas but already 2012 in UTC-land. Happy New Year everyone! :-)<br clear="all"><br>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br>
</div>
Paul McMillan | 1 Jan 2012 04:29
Gravatar

Re: Hash collision security issue (now public)

> I'm not too concerned about a 3rd party being able to guess the random seed
> -- this would require much more effort on their part, since they would have
> to generate a new set of colliding keys each time they think they have
> guessed the hash

This is incorrect. Once an attacker has guessed the random seed, any
operation which reveals the ordering of hashed objects can be used to
verify the answer. JSON responses would be ideal. In fact, an attacker
can do a brute-force attack of the random seed offline. Once they have
the seed, generating collisions is a fast process.

The goal isn't perfection, but we need to do better than a simple
salt. I propose we modify the string hash function like this:

https://gist.github.com/0a91e52efa74f61858b5

This code is based on PyPy's implementation, but the concept is
universal. Rather than choosing a single short random seed per
process, we generate a much larger random seed (r). As we hash, we
deterministically choose a portion of that seed and incorporate it
into the hash process. This modification is a minimally intrusive
change to the existing hash function, and so should not introduce
unexpected side effects which might come from switching to a different
class of hash functions.

I've worked through this code with Alex Gaynor, Antoine Pitrou, and
Victor Stinner, and have asked several mathematicians and security
experts to review the concept. The reviewers who have gotten back to
me thus far have agreed that if the initial random seed is not flawed,
this should not overly change the properties of the hash function, but
should make it quite difficult for an attacker to deduce the necessary
information to predictably cause hash collisions. This function is not
designed to protect against timing attacks, but should be nontrivial
to reverse even with access to timing data.

Empirical testing shows that this unoptimized python implementation
produces ~10% slowdown in the hashing of ~20 character strings. This
is probably an acceptable trade off, and actually provides better
performance in the case of short strings than a high-entropy
fixed-length seed prefix.

-Paul
martin | 1 Jan 2012 04:36
Picon
Gravatar

Re: Hash collision security issue (now public)

> (Well, technically, you could use trees or some other O log n data
> structure as a fallback once you have too many collisions, for some value
> of "too many".  Seems a bit wasteful for the purpose, though.)

I don't think that would be wasteful. You wouldn't just use the tree for
the case of too many collisions, but for any collision. You might special-case
the case of a single key, i.e. start using the tree only if there is a
collision.

The issue is not the effort, but the need to support ordering if you want
to use trees. So you might restrict this to dicts that have only str keys
(which in practice should never have any collision, unless it's a deliberate
setup).

I'd use the tagged-pointer trick to determine whether a key is an object
pointer (tag 0) or an AVL tree (tag 1). So in the common case of interned
strings, the comparison for pointer equality (which is the normal case
if the keys are interned) will succeed quickly; if pointer comparison fails,
check the tag bit.

Regards,
Martin

Antoine Pitrou | 1 Jan 2012 05:11

Re: Hash collision security issue (now public)

On Sat, 31 Dec 2011 16:56:00 -0700
Guido van Rossum <guido <at> python.org> wrote:
> ISTM the only reasonable thing is to have a random seed picked very early
> in the process, to be used to change the hash() function of
> str/bytes/unicode (in a way that they are still compatible with each other).

Do str and bytes still have to be compatible with each other in 3.x?

Merry hashes, weakrefs and thread-local memoryviews to everyone!

cheers

Antoine.

Guido van Rossum | 1 Jan 2012 05:22
Favicon

Re: Hash collision security issue (now public)

On Sat, Dec 31, 2011 at 9:11 PM, Antoine Pitrou <solipsis <at> pitrou.net> wrote:
On Sat, 31 Dec 2011 16:56:00 -0700
Guido van Rossum <guido <at> python.org> wrote:
> ISTM the only reasonable thing is to have a random seed picked very early
> in the process, to be used to change the hash() function of
> str/bytes/unicode (in a way that they are still compatible with each other).

Do str and bytes still have to be compatible with each other in 3.x?

Hm, you're right, that's no longer a concern. (Though ATM the hashes still *are* compatible.)
 
Merry hashes, weakrefs and thread-local memoryviews to everyone!

:-)
 
--
--Guido van Rossum (python.org/~guido)
<div>
<div class="gmail_quote">On Sat, Dec 31, 2011 at 9:11 PM, Antoine Pitrou <span dir="ltr">&lt;<a href="mailto:solipsis <at> pitrou.net">solipsis <at> pitrou.net</a>&gt;</span> wrote:<br><blockquote class="gmail_quote">

On Sat, 31 Dec 2011 16:56:00 -0700<br><div class="im">Guido van Rossum &lt;<a href="mailto:guido <at> python.org">guido <at> python.org</a>&gt; wrote:<br>
</div>
<div class="im">&gt; ISTM the only reasonable thing is to have a random seed picked very early<br>
&gt; in the process, to be used to change the hash() function of<br>
&gt; str/bytes/unicode (in a way that they are still compatible with each other).<br><br>
</div>Do str and bytes still have to be compatible with each other in 3.x?<br>
</blockquote>
<div>
<br>Hm, you're right, that's no longer a concern. (Though ATM the hashes still *are* compatible.)<br>&nbsp;</div>
<blockquote class="gmail_quote">

Merry hashes, weakrefs and thread-local memoryviews to everyone!<br>
</blockquote>
<div>
<br>:-)<br>&nbsp;</div>
</div>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br>
</div>
Guido van Rossum | 1 Jan 2012 05:31
Favicon

Re: Hash collision security issue (now public)

On Sat, Dec 31, 2011 at 8:29 PM, Paul McMillan <paul <at> mcmillan.ws> wrote:
> I'm not too concerned about a 3rd party being able to guess the random seed
> -- this would require much more effort on their part, since they would have
> to generate a new set of colliding keys each time they think they have
> guessed the hash

This is incorrect. Once an attacker has guessed the random seed, any
operation which reveals the ordering of hashed objects can be used to
verify the answer. JSON responses would be ideal. In fact, an attacker
can do a brute-force attack of the random seed offline. Once they have
the seed, generating collisions is a fast process.

Still, it would represent an effort for the attacker of a much greater magnitude than the current attack. It's all a trade-off -- at some point it'll just be easier for the attacker to use some other vulnerability. Also the class of vulnerable servers would be greatly reduced.
 
The goal isn't perfection, but we need to do better than a simple
salt.

Perhaps.
 
I propose we modify the string hash function like this:

https://gist.github.com/0a91e52efa74f61858b5

This code is based on PyPy's implementation, but the concept is
universal. Rather than choosing a single short random seed per
process, we generate a much larger random seed (r). As we hash, we
deterministically choose a portion of that seed and incorporate it
into the hash process. This modification is a minimally intrusive
change to the existing hash function, and so should not introduce
unexpected side effects which might come from switching to a different
class of hash functions.

I'm not sure I understand this. What's the worry about "a different class of hash functions"? (It may be clear that I do not have a deep mathematical understanding of hash functions.)
 
I've worked through this code with Alex Gaynor, Antoine Pitrou, and
Victor Stinner, and have asked several mathematicians and security
experts to review the concept. The reviewers who have gotten back to
me thus far have agreed that if the initial random seed is not flawed,

I forget -- what do we do on systems without urandom()? (E.g. Windows?)
 
this should not overly change the properties of the hash function, but
should make it quite difficult for an attacker to deduce the necessary
information to predictably cause hash collisions. This function is not
designed to protect against timing attacks, but should be nontrivial
to reverse even with access to timing data.

Let's worry about timing attacks another time okay?
 
Empirical testing shows that this unoptimized python implementation
produces ~10% slowdown in the hashing of ~20 character strings. This
is probably an acceptable trade off, and actually provides better
performance in the case of short strings than a high-entropy
fixed-length seed prefix.

Hm. I'm not sure I like the idea of extra arithmetic for every character being hashed. But I like the idea of a bigger random seed from which we deterministically pick some part. How about just initializing x to some subsequence of the seed determined by e.g. the length of the hashed string plus a few characters from it?

--
--Guido van Rossum (python.org/~guido)
<div>
<div class="gmail_quote">On Sat, Dec 31, 2011 at 8:29 PM, Paul McMillan <span dir="ltr">&lt;<a href="mailto:paul <at> mcmillan.ws">paul <at> mcmillan.ws</a>&gt;</span> wrote:<br><blockquote class="gmail_quote">

<div class="im">&gt; I'm not too concerned about a 3rd party being able to guess the random seed<br>
&gt; -- this would require much more effort on their part, since they would have<br>
&gt; to generate a new set of colliding keys each time they think they have<br>
&gt; guessed the hash<br><br>
</div>This is incorrect. Once an attacker has guessed the random seed, any<br>
operation which reveals the ordering of hashed objects can be used to<br>
verify the answer. JSON responses would be ideal. In fact, an attacker<br>
can do a brute-force attack of the random seed offline. Once they have<br>
the seed, generating collisions is a fast process.<br>
</blockquote>
<div>
<br>Still, it would represent an effort for the attacker of a much greater magnitude than the current attack. It's all a trade-off -- at some point it'll just be easier for the attacker to use some other vulnerability. Also the class of vulnerable servers would be greatly reduced.<br>

&nbsp;<br>
</div>
<blockquote class="gmail_quote">
The goal isn't perfection, but we need to do better than a simple<br>
salt.</blockquote>
<div>
<br>Perhaps.<br>&nbsp;</div>
<blockquote class="gmail_quote">I propose we modify the string hash function like this:<br><br><a href="https://gist.github.com/0a91e52efa74f61858b5" target="_blank">https://gist.github.com/0a91e52efa74f61858b5</a><br><br>
This code is based on PyPy's implementation, but the concept is<br>
universal. Rather than choosing a single short random seed per<br>
process, we generate a much larger random seed (r). As we hash, we<br>
deterministically choose a portion of that seed and incorporate it<br>
into the hash process. This modification is a minimally intrusive<br>
change to the existing hash function, and so should not introduce<br>
unexpected side effects which might come from switching to a different<br>
class of hash functions.<br>
</blockquote>
<div>
<br>I'm not sure I understand this. What's the worry about "a different class of hash functions"? (It may be clear that I do not have a deep mathematical understanding of hash functions.)<br>

&nbsp;</div>
<blockquote class="gmail_quote">
I've worked through this code with Alex Gaynor, Antoine Pitrou, and<br>
Victor Stinner, and have asked several mathematicians and security<br>
experts to review the concept. The reviewers who have gotten back to<br>
me thus far have agreed that if the initial random seed is not flawed,</blockquote>
<div>
<br>I forget -- what do we do on systems without urandom()? (E.g. Windows?)<br>&nbsp;</div>
<blockquote class="gmail_quote">

this should not overly change the properties of the hash function, but<br>
should make it quite difficult for an attacker to deduce the necessary<br>
information to predictably cause hash collisions. This function is not<br>
designed to protect against timing attacks, but should be nontrivial<br>
to reverse even with access to timing data.<br>
</blockquote>
<div>
<br>Let's worry about timing attacks another time okay?<br>&nbsp;</div>
<blockquote class="gmail_quote">

Empirical testing shows that this unoptimized python implementation<br>
produces ~10% slowdown in the hashing of ~20 character strings. This<br>
is probably an acceptable trade off, and actually provides better<br>
performance in the case of short strings than a high-entropy<br>
fixed-length seed prefix.<span class="HOEnZb"></span><br>
</blockquote>
</div>
<br>Hm. I'm not sure I like the idea of extra arithmetic for every character being hashed. But I like the idea of a bigger random seed from which we deterministically pick some part. How about just initializing x to some subsequence of the seed determined by e.g. the length of the hashed string plus a few characters from it?<br clear="all"><br>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br>
</div>
Paul McMillan | 1 Jan 2012 06:57
Gravatar

Re: Hash collision security issue (now public)

> Still, it would represent an effort for the attacker of a much greater
> magnitude than the current attack. It's all a trade-off -- at some point
> it'll just be easier for the attacker to use some other vulnerability. Also
> the class of vulnerable servers would be greatly reduced.

I agree that doing anything is better than doing nothing. If we use
the earlier suggestion and prepend everything with a fixed-length
seed, we need quite a bit of entropy (and so a fairly long string) to
make a lasting difference.

> I'm not sure I understand this. What's the worry about "a different class of
> hash functions"? (It may be clear that I do not have a deep mathematical
> understanding of hash functions.)

This was mostly in reference to earlier suggestions of switching to
cityhash, or using btrees, or other more invasive changes. Python 2.X
is pretty stable and making large changes like that to the codebase
can have unpredictable effects. We know that the current hash function
works well (except for this specific problem), so it seems like the
best fix will be as minimal a modification as possible, to avoid
introducing bugs.

> I forget -- what do we do on systems without urandom()? (E.g. Windows?)
Windows has CryptGenRandom which is approximately equivalent.

> Let's worry about timing attacks another time okay?
Agreed. As long as there isn't a gaping hole, I'm fine with that.

> Hm. I'm not sure I like the idea of extra arithmetic for every character
> being hashed.

>From a performance standpoint, this may still be better than adding 8
or 10 characters to every single hash operation, since most hashes are
over short strings. It is important that this function touches every
character - if it only interacts with a subset of them, an attacker
can fix that subset and vary the rest.

> But I like the idea of a bigger random seed from which we
> deterministically pick some part.

Yeah. This makes it much harder to attack, since it very solidly
places the attacker outside the realm of "just brute force the key".

> How about just initializing x to some
> subsequence of the seed determined by e.g. the length of the hashed string
> plus a few characters from it?

We did consider this, and if performance is absolutely the prime
directive, this (or a variant) may be the best option. Unfortunately,
the collision generator doesn't necessarily vary the length of the
string. Additionally, if we don't vary based on all the letters in the
string, an attacker can fix the characters that we do use and generate
colliding strings around them.

Another option to consider would be to apply this change to some but
not all of the rounds. If we added the seed lookup xor operation for
only the first and last 5 values of x, we would still retain much of
the benefit without adding much computational overhead for very long
strings.

We could also consider a less computationally expensive operation than
the modulo for calculating the lookup index, like simply truncating to
the correct number of bits.

-Paul
Guido van Rossum | 1 Jan 2012 16:09
Favicon

Re: Hash collision security issue (now public)

On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan <paul <at> mcmillan.ws> wrote:
> Still, it would represent an effort for the attacker of a much greater
> magnitude than the current attack. It's all a trade-off -- at some point
> it'll just be easier for the attacker to use some other vulnerability. Also
> the class of vulnerable servers would be greatly reduced.

I agree that doing anything is better than doing nothing. If we use
the earlier suggestion and prepend everything with a fixed-length
seed, we need quite a bit of entropy (and so a fairly long string) to
make a lasting difference.

Ah, but the effect of that long string is summarized in a single (32- or 64-bit) integer.
 
> I'm not sure I understand this. What's the worry about "a different class of
> hash functions"? (It may be clear that I do not have a deep mathematical
> understanding of hash functions.)

This was mostly in reference to earlier suggestions of switching to
cityhash, or using btrees, or other more invasive changes. Python 2.X
is pretty stable and making large changes like that to the codebase
can have unpredictable effects.

Agreed.
 
We know that the current hash function
works well (except for this specific problem), so it seems like the
best fix will be as minimal a modification as possible, to avoid
introducing bugs.

Yup.
 
> I forget -- what do we do on systems without urandom()? (E.g. Windows?)
Windows has CryptGenRandom which is approximately equivalent.

> Let's worry about timing attacks another time okay?
Agreed. As long as there isn't a gaping hole, I'm fine with that.

> Hm. I'm not sure I like the idea of extra arithmetic for every character
> being hashed.

From a performance standpoint, this may still be better than adding 8
or 10 characters to every single hash operation, since most hashes are
over short strings.

But how about precomputing the intermediate value (x)? The hash is (mostly) doing x = f(x, c) for each c in the input.

It is important that this function touches every
character - if it only interacts with a subset of them, an attacker
can fix that subset and vary the rest.

I sort of see your point, but I still think that if we could add as little per-character overhead as possible it would be best -- sometimes people *do* hash very long strings.
 
> But I like the idea of a bigger random seed from which we
> deterministically pick some part.

Yeah. This makes it much harder to attack, since it very solidly
places the attacker outside the realm of "just brute force the key".

> How about just initializing x to some
> subsequence of the seed determined by e.g. the length of the hashed string
> plus a few characters from it?

We did consider this, and if performance is absolutely the prime
directive, this (or a variant) may be the best option. Unfortunately,
the collision generator doesn't necessarily vary the length of the
string. Additionally, if we don't vary based on all the letters in the
string, an attacker can fix the characters that we do use and generate
colliding strings around them.

Still, much more work for the attacker.
 
Another option to consider would be to apply this change to some but
not all of the rounds. If we added the seed lookup xor operation for
only the first and last 5 values of x, we would still retain much of
the benefit without adding much computational overhead for very long
strings.

I like that.
 
We could also consider a less computationally expensive operation than
the modulo for calculating the lookup index, like simply truncating to
the correct number of bits.

Sure. Thanks for thinking about all the details here!!

--
--Guido van Rossum (python.org/~guido)
<div>
<div class="gmail_quote">On Sat, Dec 31, 2011 at 10:57 PM, Paul McMillan <span dir="ltr">&lt;<a href="mailto:paul <at> mcmillan.ws">paul <at> mcmillan.ws</a>&gt;</span> wrote:<br><blockquote class="gmail_quote">

<div class="im">&gt; Still, it would represent an effort for the attacker of a much greater<br>
&gt; magnitude than the current attack. It's all a trade-off -- at some point<br>
&gt; it'll just be easier for the attacker to use some other vulnerability. Also<br>
&gt; the class of vulnerable servers would be greatly reduced.<br><br>
</div>I agree that doing anything is better than doing nothing. If we use<br>
the earlier suggestion and prepend everything with a fixed-length<br>
seed, we need quite a bit of entropy (and so a fairly long string) to<br>
make a lasting difference.<br><div class="im"></div>
</blockquote>
<div>
<br>Ah, but the effect of that long string is summarized in a single (32- or 64-bit) integer.<br>&nbsp;</div>
<blockquote class="gmail_quote">

<div class="im">
&gt; I'm not sure I understand this. What's the worry about "a different class of<br>
&gt; hash functions"? (It may be clear that I do not have a deep mathematical<br>
&gt; understanding of hash functions.)<br><br>
</div>This was mostly in reference to earlier suggestions of switching to<br>
cityhash, or using btrees, or other more invasive changes. Python 2.X<br>
is pretty stable and making large changes like that to the codebase<br>
can have unpredictable effects.</blockquote>
<div>
<br>Agreed.<br>&nbsp;<br>
</div>
<blockquote class="gmail_quote">We know that the current hash function<br>

works well (except for this specific problem), so it seems like the<br>
best fix will be as minimal a modification as possible, to avoid<br>
introducing bugs.<br><div class="im"></div>
</blockquote>
<div>
<br>Yup.<br>&nbsp;<br>
</div>
<blockquote class="gmail_quote">
<div class="im">

&gt; I forget -- what do we do on systems without urandom()? (E.g. Windows?)<br>
</div>Windows has CryptGenRandom which is approximately equivalent.<br><div class="im">
<br>
&gt; Let's worry about timing attacks another time okay?<br>
</div>Agreed. As long as there isn't a gaping hole, I'm fine with that.<br><div class="im">
<br>
&gt; Hm. I'm not sure I like the idea of extra arithmetic for every character<br>
&gt; being hashed.<br><br>
</div>From a performance standpoint, this may still be better than adding 8<br>
or 10 characters to every single hash operation, since most hashes are<br>
over short strings.</blockquote>
<div>
<br>But how about precomputing the intermediate value (x)? The hash is (mostly) doing x = f(x, c) for each c in the input.<br><br>
</div>
<blockquote class="gmail_quote">

 It is important that this function touches every<br>
character - if it only interacts with a subset of them, an attacker<br>
can fix that subset and vary the rest.<br><div class="im"></div>
</blockquote>
<div>
<br>I sort of see your point, but I still think that if we could add as little per-character overhead as possible it would be best -- sometimes people *do* hash very long strings.<br>

&nbsp;</div>
<blockquote class="gmail_quote">
<div class="im">
&gt; But I like the idea of a bigger random seed from which we<br>
&gt; deterministically pick some part.<br><br>
</div>Yeah. This makes it much harder to attack, since it very solidly<br>
places the attacker outside the realm of "just brute force the key".<br><div class="im">
<br>
&gt; How about just initializing x to some<br>
&gt; subsequence of the seed determined by e.g. the length of the hashed string<br>
&gt; plus a few characters from it?<br><br>
</div>We did consider this, and if performance is absolutely the prime<br>
directive, this (or a variant) may be the best option. Unfortunately,<br>
the collision generator doesn't necessarily vary the length of the<br>
string. Additionally, if we don't vary based on all the letters in the<br>
string, an attacker can fix the characters that we do use and generate<br>
colliding strings around them.<br>
</blockquote>
<div>
<br>Still, much more work for the attacker.<br>&nbsp;</div>
<blockquote class="gmail_quote">

Another option to consider would be to apply this change to some but<br>
not all of the rounds. If we added the seed lookup xor operation for<br>
only the first and last 5 values of x, we would still retain much of<br>
the benefit without adding much computational overhead for very long<br>
strings.<br>
</blockquote>
<div>
<br>I like that. <br>
</div>
<div>&nbsp;</div>
<blockquote class="gmail_quote">
We could also consider a less computationally expensive operation than<br>
the modulo for calculating the lookup index, like simply truncating to<br>
the correct number of bits.<span class="HOEnZb"></span><br>
</blockquote>
</div>
<br>Sure. Thanks for thinking about all the details here!!<br clear="all"><br>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br>
</div>
Guido van Rossum | 1 Jan 2012 16:13
Favicon

Re: Hash collision security issue (now public)

Different concern. What if someone were to have code implementing an external, persistent hash table, using Python's hash() function? They might have a way to rehash everything when a new version of Python comes along, but they would not be happy if hash() is different in each process. I somehow vaguely remember possibly having seen such code, or something else where a bit of random data was needed and hash() was used since it's so easily available.

--
--Guido van Rossum (python.org/~guido)

<div><p>Different concern. What if someone were to have code implementing an external, persistent hash table, using Python's hash() function? They might have a way to rehash everything when a new version of Python comes along, but they would not be happy if hash() is different in each process. I somehow vaguely remember possibly having seen such code, or something else where a bit of random data was needed and hash() was used since it's so easily available.<br clear="all"><br>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br></p></div>
Guido van Rossum | 1 Jan 2012 16:13
Favicon

Re: Hash collision security issue (now public)

PS. Is the collision-generator used in the attack code open source?

--
--Guido van Rossum (python.org/~guido)

<div><p>PS. Is the collision-generator used in the attack code open source?<br clear="all"><br>-- <br>--Guido van Rossum (<a href="http://python.org/~guido">python.org/~guido</a>)<br></p></div>

Gmane