John Wilson | 5 Feb 2012 15:00

Re: Buckinghamshire CC ANPR cameras

Rather to my surprise Bucks CC have given me the details of the
hashing scheme used by ANPR cameras which implement the UTMC protocol
(which is, I think, all of the civil and police ANPR cameras). This
was the result of an FoI request.

D, 0 (zero) and Q are replaced with O (Q isn't used in the current numbering scheme)
1 is replaced with I (I isn't used in the current numbering scheme)
5 is replaced with S
Y is replaced with V
8 and B are replaced with 3 (this may cause problems after 2030)
Z is replaced with 2
F is replaced with E
C is replaced with G
M, N and W are replaced with H

In the scheme used since 2002 replacing a number by a letter or a
letter by a number will not cause extra collisions.
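A minimal Python sketch of the substitution step as listed above. The message doesn't say how spaces or lower case are handled, so the sketch assumes the plate is upper-cased and has its space removed first. `SUBSTITUTIONS` and `normalise` are illustrative names, not taken from TR007.001b.

```python
# Character substitutions from the list above; every other character passes through unchanged.
# Assumption (not stated in the source): spaces are stripped and letters upper-cased first.
SUBSTITUTIONS = str.maketrans({
    "D": "O", "0": "O", "Q": "O",
    "1": "I",
    "5": "S",
    "Y": "V",
    "8": "3", "B": "3",
    "Z": "2",
    "F": "E",
    "C": "G",
    "M": "H", "N": "H", "W": "H",
})


def normalise(plate: str) -> str:
    """Return the substituted form of a plate, ready for hashing (hypothetical helper)."""
    return plate.upper().replace(" ", "").translate(SUBSTITUTIONS)


if __name__ == "__main__":
    # Two different plates that become the same string before any hashing happens.
    print(normalise("MW08 BDY"))  # HHO33OV
    print(normalise("NN03 BOV"))  # HHO33OV
```

The two sample plates differ in five characters but are identical after substitution, so they collide whatever hash follows.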

The transformed plate number is then hashed with the one-at-a-time
hash function described here
http://www.burtleburtle.net/bob/hash/doobs.html
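For reference, a Python transcription of the one-at-a-time function from the page linked above. Python integers don't wrap, so each step is masked back to 32 bits to mimic C's unsigned 32-bit arithmetic. How TR007.001b turns the substituted plate into bytes isn't stated here; the sketch assumes plain ASCII with no terminator or padding.

```python
def one_at_a_time(key: bytes) -> int:
    """Bob Jenkins' one-at-a-time hash, emulating 32-bit unsigned wraparound."""
    h = 0
    for byte in key:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    # Final mixing after all bytes have been absorbed.
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    print(hex(one_at_a_time(b"a")))  # 0xca2e9442
    # Assumption: a substituted plate is hashed as its ASCII bytes.
    print(hex(one_at_a_time("HHO33OV".encode("ascii"))))
```

The value for the single byte "a" can be worked through by hand (or compared with a C build of the same function) to confirm the transcription.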

The 32-bit result is reduced to 24 or 18 bits simply by masking.
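A small sketch of the reduction, assuming "masking" means keeping the low-order bits (which bits are kept isn't stated). The second helper is plain arithmetic: if a hash spread N plates perfectly evenly over 2^k values, each value would be shared by N / 2^k plates on average. `reduce_hash` and `mean_plates_per_value` are hypothetical names.

```python
def reduce_hash(h32: int, bits: int) -> int:
    """Keep the low `bits` bits of a 32-bit hash (assumed interpretation of 'masking')."""
    if not 0 < bits <= 32:
        raise ValueError("bits must be between 1 and 32")
    return h32 & ((1 << bits) - 1)


def mean_plates_per_value(n_plates: int, bits: int) -> float:
    """Average number of plates per output value for an ideally even hash."""
    return n_plates / (1 << bits)


if __name__ == "__main__":
    h = 0xCA2E9442  # one-at-a-time hash of b"a", used only as a sample value
    print(hex(reduce_hash(h, 24)))  # 0x2e9442
    print(hex(reduce_hash(h, 18)))  # 0x29442
    # Roughly 130 million post-2001 numbers, the figure used later in this thread.
    print(round(mean_plates_per_value(130_000_000, 24), 1))  # 7.7
    print(round(mean_plates_per_value(130_000_000, 18), 1))  # 495.9
```

So even a perfect hash leaves several plates per value at 24 bits and hundreds at 18 bits.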

This is described in the UTMC Technical Guide TR007.001b which, as far
as I can tell, is not published on the UTMC site.

If anybody would like a copy of the document please contact me off list.

It would appear that the Highways Agency's statement that a large
[...]

Ian Batten | 5 Feb 2012 22:56

Re: Buckinghamshire CC ANPR cameras


On 5 Feb 2012, at 14:00, John Wilson wrote:

> 
> The transformed plate number is then hashed with the one-at-a-time
> hash function described here
> http://www.burtleburtle.net/bob/hash/doobs.html

You have to wonder how on earth an obscure, unreviewed algorithm published in a hobbyist magazine ends up
being used in a production system, don't you?

ian

Brian Morrison | 5 Feb 2012 23:55

Re: Buckinghamshire CC ANPR cameras

On Sun, 5 Feb 2012 21:56:38 +0000
Ian Batten <igb@...> wrote:

> You have to wonder how on earth an obscure, unreviewed algorithm published
> in a hobbyist magazine ends up being used in a production system,
> don't you?

Sadly no, I imagine that the reason it was used was because someone
found it and didn't do any more thinking about what was needed and how
the algorithm would affect those needs.

-- 

Brian Morrison

bdm at fenrir dot org dot uk

   "Arguing with an engineer is like wrestling with a pig in the mud;
    after a while you realize you are muddy and the pig is enjoying it."

GnuPG key ID DE32E5C5 - http://wwwkeys.uk.pgp.net/pgpnet/wwwkeys.html

Florian Weimer | 5 Feb 2012 23:55

Re: Buckinghamshire CC ANPR cameras

* Ian Batten:

> On 5 Feb 2012, at 14:00, John Wilson wrote:
>
>> 
>> The transformed plate number is then hashed with the one-at-a-time
>> hash function described here
>> http://www.burtleburtle.net/bob/hash/doobs.html
>
> You have to wonder how on earth an obscure, unreviewed algorithm
> published in a hobbyist magazine ends up being used in a production
> system, don't you?

This particular function wasn't even published in Dr. Dobb's.
However, Bob Jenkins' hash functions are widely used.  Your own
computer probably uses them in some fashion, too.

John Wilson | 6 Feb 2012 11:24

Re: Buckinghamshire CC ANPR cameras

On 5 February 2012 22:55, Brian Morrison <bdm@...> wrote:
> On Sun, 5 Feb 2012 21:56:38 +0000
> Ian Batten <igb@...> wrote:
>
>> You have to wonder how on earth an obscure, unreviewed algorithm published
>> in a hobbyist magazine ends up being used in a production system,
>> don't you?
>
> Sadly no, I imagine that the reason it was used was because someone
> found it and didn't do any more thinking about what was needed and how
> the algorithm would affect those needs.

The document claims they tested a range of hash functions with 100,000
valid registration numbers. It seems quite a small test sample.

I'm not sniffy about contributors to Dr Dobbs. Back in the day it was
a good deal more useful to the working programmer than the ACM
Communications.

I've done some quick tests using generated registration numbers. If
you try all the possible valid registration numbers between 51 and 12
(that's about 130 million) 24% have 10 collisions or fewer, 53% have
25 collisions or fewer, 92% have 100 collisions or fewer. 1% of the
numbers have 246 collisions or more. The highest number of collisions
is 2080. Of course, in the field you will have pre 2001 registration
numbers, "cherished numbers" and foreign numbers none of which I take
into account with this test.
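One way to reproduce this kind of count, as a sketch: "collisions" for a plate is assumed to mean the number of other plates with the same reduced hash, and the percentages are the share of plates at or below a threshold. The hash is a parameter, so a substitute/hash/mask pipeline can be plugged in; the toy hash in the usage lines only demonstrates the bookkeeping. Function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable


def collisions_per_plate(plates: Iterable[str], hash_fn: Callable[[str], int]) -> list[int]:
    """For each plate, the number of *other* plates with the same hash value."""
    values = [hash_fn(p) for p in plates]
    bucket_sizes = Counter(values)
    return [bucket_sizes[v] - 1 for v in values]


def share_at_or_below(collisions: list[int], threshold: int) -> float:
    """Fraction of plates with `threshold` collisions or fewer."""
    return sum(c <= threshold for c in collisions) / len(collisions)


if __name__ == "__main__":
    sample = ["AB12CDE", "AX12CDF", "XY51ZZZ"]
    counts = collisions_per_plate(sample, lambda p: ord(p[0]))  # toy hash: first letter only
    print(counts)                        # [1, 1, 0]
    print(share_at_or_below(counts, 0))  # 0.3333333333333333
    print(max(counts))                   # 1
```

For the full ~130 million plates a list of counts is memory-hungry; a first pass filling an array of 2^24 counters and a second pass reading them back would do the same job more cheaply.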

It would be interesting to know if the DVLA manages the numbers it
issues to minimise the number of collisions. They know all the current
registration numbers so could suppress new numbers which would have a
high collision rate.

Ian Batten | 6 Feb 2012 12:11

Re: Buckinghamshire CC ANPR cameras


On 6 Feb 2012, at 10:24, John Wilson wrote:
> 
> I'm not sniffy about contributors to Dr Dobbs. Back in the day it was
> a good deal more useful to the working programmer than the ACM
> Communications.

That's not the point.  The hash function that you cited was designed for table lookup.  There are a lot of such
functions, and their properties are fairly well understood.  But the application it's being used for here
isn't a table lookup: it's data anonymisation.  It's different.  Before worrying about which journal
you're going to look in, you need to know what it is you want to achieve.  And a table-lookup hash function
probably doesn't have those properties.  For example, it has no need to make the average Hamming distance
between two outputs independent of the Hamming distance between the inputs, whereas for anonymisation
I'd say that was likely to be essential.  It's fine if the output from a table-lookup function sorts into the
same order as its inputs (i.e., f(x) > f(y) => x > y) but that would be suicidal in anonymisation.
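The Hamming-distance property can be checked empirically: change one character of the input and count how many output bits flip. For a well-mixed 32-bit output the average should sit near 16. In this sketch SHA-256 truncated to 32 bits is a swapped-in comparison function, not what TR007.001b specifies; any hash taking bytes (one-at-a-time included) can be passed as `hash_fn`. The sample plates and function names are hypothetical.

```python
import hashlib
import string
from typing import Callable


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def sha256_32(data: bytes) -> int:
    """First 32 bits of SHA-256; a stand-in for comparison, not the UTMC scheme."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def mean_flip_distance(hash_fn: Callable[[bytes], int], plates: list[str]) -> float:
    """Average output Hamming distance over all single-character changes to each plate."""
    total, pairs = 0, 0
    for plate in plates:
        base = hash_fn(plate.encode("ascii"))
        for i, ch in enumerate(plate):
            for repl in string.ascii_uppercase + string.digits:
                if repl == ch:
                    continue
                variant = plate[:i] + repl + plate[i + 1:]
                total += hamming(base, hash_fn(variant.encode("ascii")))
                pairs += 1
    return total / pairs


if __name__ == "__main__":
    sample = ["AB51CDE", "MW08BDY", "NN03BOV"]
    # A value close to 16 (half of 32 bits) suggests good avalanche behaviour.
    print(mean_flip_distance(sha256_32, sample))
```

No result for one-at-a-time is claimed here; running it through the same harness would show how it behaves on short plate strings.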

ian

Roland Perry | 6 Feb 2012 12:10

Re: Buckinghamshire CC ANPR cameras

In article <CAJkBBXVsZSC9Wun8XCz_UqpKePe9TDrBNVnGNBOF3aacC4RiDA@...>,
John Wilson <tugwilson@...> writes
>It would be interesting to know if the DVLA manages the numbers it
>issues to minimise the number of collisions. The know all the current
>registration numbers so could suppress new numbers which would have a
>high collision rate.

Given that the "replacements" are also the sort of characters that 
humans might confuse, reducing the number of collisions would seem to be 
useful, or are more of the collisions caused by 'weakness' of the hash, 
rather than two numberplates having the same pre-hash text after the 
replacement function has been run?
-- 
Roland Perry

Ian Batten | 6 Feb 2012 12:40

Regulatory Impact Assessment on LA use of RIPA


http://www.parliament.uk/documents/impact-assessments/IA12-004E.pdf

John Wilson | 6 Feb 2012 12:49

Re: Buckinghamshire CC ANPR cameras

On 6 February 2012 11:10, Roland Perry <lists@...> wrote:
> In article <CAJkBBXVsZSC9Wun8XCz_UqpKePe9TDrBNVnGNBOF3aacC4RiDA@...>,
> John Wilson <tugwilson@...> writes
>
>> It would be interesting to know if the DVLA manages the numbers it
>> issues to minimise the number of collisions. The know all the current
>> registration numbers so could suppress new numbers which would have a
>> high collision rate.
>
>
> Given that the "replacements" are also the sort of characters that humans
> might confuse, reducing the number of collisions would seem to be useful, or
> are more of the collisions caused by 'weakness' of the hash, rather than two
> numberplates having the same pre-hash text after the replacement function
> has been run?

The "weakness" is caused by the substitution mechanism. Running the
test on the same dataset without substitution gives:

70% with 10 collisions or fewer
99% with 25 collisions or fewer
the highest number of collisions is 32 (2 instances)
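The substitution alone merges plates before any hashing. A rough upper bound, assuming the post-2001 "LL NN LLL" layout (letters in five positions, the two-digit age identifier in the middle) and that I and Q never appear: a substituted plate can come from as many raw strings as the product of the character-class sizes at each position. The class tables are derived from the substitution list in the first message; the count covers strings in that format, not plates the DVLA has actually issued, and `raw_preimages` is a hypothetical name.

```python
# Which raw characters map to each substituted character, split by position type.
# Derived from the substitution list in the thread; I and Q are assumed unused.
LETTER_CLASSES = {"O": "DO", "V": "VY", "3": "B", "2": "Z", "E": "EF", "G": "CG", "H": "HMNW"}
DIGIT_CLASSES = {"O": "0", "I": "1", "S": "5", "3": "38"}
DIGIT_POSITIONS = (2, 3)  # zero-based positions of the age identifier in "LLNNLLL"


def raw_preimages(substituted: str) -> int:
    """Upper bound on how many LLNNLLL strings share this substituted form."""
    if len(substituted) != 7:
        raise ValueError("expected a 7-character LLNNLLL plate without spaces")
    total = 1
    for pos, ch in enumerate(substituted):
        classes = DIGIT_CLASSES if pos in DIGIT_POSITIONS else LETTER_CLASSES
        total *= len(classes.get(ch, ch))  # unlisted characters map only to themselves
    return total


if __name__ == "__main__":
    print(raw_preimages("HHO3HHH"))  # 2048: 4**5 letter choices times {03, 08}
    print(raw_preimages("LK62XPT"))  # 1: nothing merges
```

A worst case of 2048 strings sharing one substituted form is at least consistent with the 2080 maximum reported in the earlier test, though whether all of those strings are real issued plates isn't checked here.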

It's possible that the DVLA tries to avoid issuing numbers which can be
easily confused. I might try an FoI request about this.

John Wilson


Roland Perry | 6 Feb 2012 14:18

Re: Buckinghamshire CC ANPR cameras

In article <CAJkBBXW1Ecoo7_4RqEs78bkcR4G1Qzsa3H6RDsz8BMg0=FGUyg@...>,
John Wilson <tugwilson@...> writes
>
>It's possible that the DVLA to avoid issuing numbers which can be
>easily confused. I might try an FoI request about this.

They do have some kind of programme for not issuing "rude" numbers, but 
PEN <fifteen> was one that got away.

We can only speculate why they apparently haven't issued PEN <one>S.
-- 
Roland Perry

