Thor Lancelot Simon | 20 Apr 09:18 2014

cprng_fast implementation benchmarks

I have done some benchmarks of various cprng_fast implementations:

	arc4-mtx		The libkern implementation from
				netbsd-current, which uses a spin mutex to
				serialize access to a single, shared arc4
				state.

	arc4-nomtx		Mutex calls #ifdeffed out.  What was in
				NetBSD prior to 2012.  This implementation
				is not correct.

	arc4-percpu		New implementation of cprng_fast using percpu
				state and arc4 as the core stream cipher.
				Uses the arc4 implementation from
				sys/crypto/arc4, slightly modified to give an
				entry point that skips the xor.

	hc128-percpu		Same new implementation but with hc128 as the
				core stream cipher.  Differs from what I
				posted earlier in that all use of inline
				functions in the public API has been removed.

	hc128-inline		Percpu implementation I posted earlier with all
				noted bugs fixed; uses inlines in header file
				which expose some algorithm guts to speed up
				cprng_fast32().
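
For readers who have not seen the patches, here is a rough userspace
sketch of the structural idea behind the -percpu variants: each CPU owns
its own generator state, so no lock is shared between CPUs.  Everything
in it is an assumption made for illustration: the names, the fixed CPU
count, and the toy xorshift step that stands in for arc4 or hc128.  It is
not the code that was benchmarked, and kernel code would additionally
have to keep a thread from migrating between CPUs while it touches the
state.

```c
#include <assert.h>
#include <stdint.h>

#define NCPU_SKETCH 4	/* hypothetical CPU count for this sketch */

/* Toy stand-in for the stream cipher state; NOT cryptographic. */
struct fast_state {
	uint32_t x;
};

/* One state per CPU instead of one shared state behind a mutex. */
struct fast_state percpu_state[NCPU_SKETCH];

/* Give every CPU a different starting state. */
void
fast_init(uint32_t seed)
{
	for (unsigned i = 0; i < NCPU_SKETCH; i++)
		percpu_state[i].x = seed ^ (0x9e3779b9u * (i + 1));
}

/* Return 32 bits using only the state that belongs to the given CPU. */
uint32_t
fast32_on(unsigned cpu)
{
	struct fast_state *s;

	assert(cpu < NCPU_SKETCH);
	s = &percpu_state[cpu];
	if (s->x == 0)		/* xorshift must not run from zero */
		s->x = 1;
	s->x ^= s->x << 13;
	s->x ^= s->x >> 17;
	s->x ^= s->x << 5;
	return s->x;
}
```

Draws made on one CPU never touch another CPU's state, which is why the
shared mutex of arc4-mtx is not needed in this arrangement.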

PROCEDURE

	All implementations were modified to rekey every 512MB of output and
[...]

Thor Lancelot Simon | 18 Apr 22:12 2014

Towards design criteria for cprng_fast()

I would like to offer some observations about the use of cprng_fast()
(once known as arc4random()) in our kernel and, from these, express
what I believe are reasonable design criteria for that function.

O1) cprng_fast() is used in some performance-critical parts of the kernel:

	A) It's used to permute memory mappings

	B) It's used, indirectly, at every program startup, because its
	   output is sucked down by userspace via sysctl() for use by SSP.

	C) It's used to generate initialization vectors for other ciphers,
	   including for use by line-rate crypto accelerators.

	D) It appears that it can be called per-packet by a few parts of
	   the networking stack in some cases -- ALTQ, possibly ip_id.

O2) cprng_fast() is *never* used to encrypt data.

	A) This would seem to imply that IV generation for other ciphers
	   is its most sensitive use.

	B) We can swap out the algorithm underlying cprng_fast() at any
	   time with no compatibility concerns.

O3) cprng_fast() is usually used via cprng_fast32() to generate just 32 bits
    of data at once.

	A) This suggests that whatever algorithm we use, we'll need a way
	   to use it to service short requests efficiently.
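
One common way to meet O3(A), sketched here purely as an illustration,
is to run the cipher core a whole block at a time and hand out 32 bits
per call from the buffered keystream.  The buffer size, the names, and
the toy generator used in place of a real stream cipher are all
assumptions of this sketch, not part of any proposed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KS_BUFSZ 64	/* hypothetical keystream block size in bytes */

struct short_rng {
	uint32_t counter;	/* toy cipher state, NOT cryptographic */
	uint8_t buf[KS_BUFSZ];	/* one block of buffered keystream */
	size_t avail;		/* unread bytes left at the end of buf */
	unsigned refills;	/* how many times the core has run */
};

/* Stand-in for one run of the stream cipher core: fill the buffer. */
void
short_rng_refill(struct short_rng *g)
{
	for (size_t i = 0; i < KS_BUFSZ; i++) {
		g->counter = g->counter * 1664525u + 1013904223u;
		g->buf[i] = (uint8_t)(g->counter >> 24);
	}
	g->avail = KS_BUFSZ;
	g->refills++;
}

/* Serve 32 bits from the buffer; run the core only when it is empty. */
uint32_t
short_rng32(struct short_rng *g)
{
	uint32_t out;

	assert(g != NULL);
	if (g->avail < sizeof(out))
		short_rng_refill(g);
	memcpy(&out, &g->buf[KS_BUFSZ - g->avail], sizeof(out));
	g->avail -= sizeof(out);
	return out;
}
```

With a 64-byte buffer the core runs once per sixteen 32-bit requests, so
the per-call cost is mostly a copy and a counter update.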
[...]

Taylor R Campbell | 30 Sep 09:22 2013

cgd(4) ciphers

The set of ciphers supported by cgd is showing its age.  It would be
nice if cgd supported a block cipher that

(a) has high public confidence;

(b) can be easily implemented without timing side channels;

(c) has 256-bit blocks, so we don't need to worry about birthday
bounds for 128-bit block ciphers on multi-terabyte disks; and

(d) is fast in software without hardware acceleration, because cgd
can't take advantage of AES-NI at the moment and not all the world is
a modern high-end x86 system.

All of the ciphers cgd supports -- Blowfish, 3DES, and AES -- fail
(b), (c), and (d), and Blowfish and 3DES fail (c) and (d) badly, being
very slow 64-bit block ciphers.
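
To put rough numbers on criterion (c), the usual birthday estimate can
be worked out as below.  This is only a back-of-the-envelope sketch: it
assumes the cipher blocks on the disk behave like independent uniformly
random n-bit values, uses the approximation q^2 / 2^(n+1) for q blocks,
and the function name is made up for the example.

```c
#include <assert.h>

/*
 * Approximate log2 of the birthday bound q^2 / 2^(n+1), where q is the
 * number of n-bit blocks on a disk of 2^log2_disk_bytes bytes.
 * A result >= 0 means the bound is vacuous: collisions are expected.
 * block_bits must be a power of two and at least 8.
 */
int
log2_birthday_bound(unsigned log2_disk_bytes, unsigned block_bits)
{
	unsigned log2_block_bytes = 0;
	unsigned bytes;
	int log2_q;

	assert(block_bits >= 8 && (block_bits & (block_bits - 1)) == 0);
	for (bytes = block_bits / 8; bytes > 1; bytes >>= 1)
		log2_block_bytes++;
	assert(log2_disk_bytes >= log2_block_bytes);

	log2_q = (int)(log2_disk_bytes - log2_block_bytes);
	return 2 * log2_q - (int)(block_bits + 1);
}
```

For a 16 TiB disk (2^44 bytes) this gives about 2^-49 with 128-bit
blocks and about 2^-179 with 256-bit blocks; for a 1 TiB disk with
64-bit blocks the result is positive, i.e. collisions are expected.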

The best two candidates that come to mind are Serpent, which fails
only (c) and (d), and Threefish, which seems like a good candidate.
Both were designed to avoid using data-dependent branches and memory
references.  Both have been subjected to thorough scrutiny and were
finalists in NIST competitions.

Thoughts?

(Side notes:
- cgd still needs renovation for MP safety and hardware acceleration,
but that's a bigger task than adding one or two new ciphers.
- We could use Threefish tweaks instead of CBC mode, but that would
[...]

Taylor R Campbell | 17 Jun 05:30 2013

rework rndsink and cprng locking

The attached patch reworks the rndsink(9) abstraction to simplify it
and fix various races, and rework the implementation (but not the
crypto) of cprng(9) in the process.  Comments?

After this, if I find the time, I'd like to (a) rework the cprng(9)
API a trifle to simplify it and make its blocking behaviour clearer,
(b) fix some MP issues in /dev/u?random, (c) rework the locking scheme
for rnd sources and the rnd sample queue, and (d) make it all run in
rump so we can easily have automatic tests for this stuff.

commit 8667a3927cc3d1f46300dbce8a70cd7135f3c2e1
Author: Taylor R Campbell <riastradh <at> NetBSD.org>
Date:   Wed Apr 10 20:40:27 2013 +0000

    Rework rndsink(9) abstraction and adapt arc4random(9) and cprng(9).
    
    rndsink(9):
    - Simplify API.
    - Simplify locking scheme.
    - Add a man page.
    - Avoid races in destruction.
    - Avoid races in requesting entropy now and scheduling entropy later.
    
    Periodic distribution of entropy to sinks reduces the need for the
    last one, but this way we don't need to rely on periodic distribution
    (e.g., in a future tickless NetBSD).
    
    rndsinks_lock should probably eventually merge with the rndpool lock,
    but we'll put that off for now.
[...]

Taylor R Campbell | 4 Sep 00:32 2012

default sshd host keys

(I am not subscribed to these lists, so please cc me in replies.)

If you enable sshd on stock NetBSD 6.0_RC1, then by default on boot
you will get an RSA host key with a 1024-bit modulus, a DSA host key
with 1024/160-bit parameters, and an ECDSA host key from the nistp521
curve.  All this is decided by the defaults specified in
/etc/rc.d/sshd and /etc/defaults/rc.conf.

But these days, 1024-bit RSA moduli and 1024/160-bit DSA parameters
are much too small for comfort[1].  ssh-keygen itself will generate
2048-bit RSA moduli by default, and the only reason that we end up
with 1024-bit RSA moduli is that we set

   ssh_keygen_flags="-b 1024"

in /etc/defaults/rc.conf.  I would like at least to replace this by

   ssh_keygen_flags=""

so that we get the defaults in ssh-keygen without our having to update
/etc/defaults/rc.conf every time the default key sizes are updated in
ssh-keygen.  Objections?

Going a little further, we could use `ssh-keygen -A' to generate all
the keys, instead of the script in /etc/rc.d/sshd.  However, that's a
bigger change, and I am also nervous about using 1024/160-bit DSA
parameters, which are much too small these days; or even using (EC)DSA
at all, because it requires an entropy source not only for key
generation but also to make signatures.  So if we make any bigger
change, I'd like to discuss using only RSA keys with >=2048-bit moduli
[...]

Thor Lancelot Simon | 6 Apr 16:59 2012

Kernel entropy pool / cprng race fix

The attached patch should fix a race that exists when an entropy sink
(such as a CPRNG instance that is being reseeded) is destroyed while
it is currently being reseeded.

I've minimally tested it but since the problem is hard to reproduce
I'd appreciate additional testers or additional eyes.

The diff is also at http://www.panix.com/~tls/seed-destroy.diff .

Thanks!

-- 
Thor Lancelot Simon	                                     tls <at> panix.com
  "The liberties...lose much of their value whenever those who have greater
   private means are permitted to use their advantages to control the course
   of public debate."					-John Rawls
? rh
? seed-destroy.diff
? arch/amd64/conf/RNDVERBOSE
? arch/arm/xscale/.iopaau.c.swp
? kern/1
? kern/lastbatch.txt
? kern/separate-mutex.diff
Index: kern/kern_rndq.c
===================================================================
RCS file: /cvsroot/src/sys/kern/kern_rndq.c,v
retrieving revision 1.1
diff -u -r1.1 kern_rndq.c
[...]

Thor Lancelot Simon | 4 Mar 04:49 2012

OpenSSH/OpenSSL patches to stop excessive entropy consumption

When applied along with revisions 1.10 and 1.11 of libc/gen/arc4random.c,
these patches should stop the excessive entropy consumption observed with
OpenSSH on current and NetBSD 6-branch systems.

I note that the cause of the problem is complex and somewhat amusing.

Let's start from this question: why on earth are there calls to
arc4random_stir() in unexpected places all over the OpenSSH sources?
Before and after every fork, after exec, in the key generation routines --
in places where there are no calls to arc4random() itself and where one
would hope there never had been (particularly for key generation!).

The reason turns out to be that, at some point, OpenSSL (not OpenSSH)
was patched for OpenBSD to make it use libc arc4random() as the source
of startup key material for its own RNG.  In an application like OpenSSH
that does not use the SSL parts of the library and does not call RAND_seed(),
that is the *only* key material for the generator.

I can only guess this was done because OpenSSL was "using too much entropy"
from /dev/random or /dev/urandom.  But the result was that programs like
OpenSSH, which call OpenSSL crypto functions in both halves after fork(),
would get the exact same bytes back from the generator (same primes for
ephemeral RSA or DH keys, same... *shudder*).  The pervasive calls to
arc4random_stir() paper over this problem.
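
The fork problem is easy to reproduce with any generator that keeps its
state in process memory.  The following is a minimal sketch with a toy,
non-cryptographic generator and made-up names; it is not OpenSSL or
OpenSSH code, and mixing in the pid is only a crude stand-in for what a
real re-stir does.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Toy generator standing in for a userspace RNG; NOT cryptographic. */
struct toy_rng {
	uint32_t state;
};

uint32_t
toy_next(struct toy_rng *g)
{
	uint32_t z;

	g->state += 0x9e3779b9u;
	z = g->state;
	z ^= z >> 16; z *= 0x85ebca6bu;
	z ^= z >> 13; z *= 0xc2b2ae35u;
	z ^= z >> 16;
	return z;
}

/*
 * Fork, draw one value on each side, and report whether they match.
 * If stir_in_child is nonzero, the child first mixes its pid into its
 * copy of the state.  Returns 1 on a match, 0 on a mismatch, -1 on error.
 */
int
fork_outputs_match(struct toy_rng *g, int stir_in_child)
{
	int fds[2];
	uint32_t mine, theirs;
	ssize_t n;
	pid_t pid;

	assert(g != NULL);
	if (pipe(fds) == -1)
		return -1;
	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: works on an exact copy of the parent's state. */
		close(fds[0]);
		if (stir_in_child)
			g->state ^= (uint32_t)getpid();
		theirs = toy_next(g);
		if (write(fds[1], &theirs, sizeof(theirs)) !=
		    (ssize_t)sizeof(theirs))
			_exit(1);
		_exit(0);
	}
	/* Parent: draw from its own state, then collect the child's value. */
	close(fds[1]);
	mine = toy_next(g);
	n = read(fds[0], &theirs, sizeof(theirs));
	close(fds[0]);
	(void)waitpid(pid, NULL, 0);
	if (n != (ssize_t)sizeof(theirs))
		return -1;
	return mine == theirs;
}
```

Without the stir, parent and child return the same value; with it, the
two streams diverge from the first draw.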

This hack was not applied for NetBSD so OpenSSL continued to draw down
new entropy after every fork-exec OpenSSH performed: one per connection,
at least.  That's a lot of entropy.

The attached patch adjusts OpenSSL so its generator can be explicitly
[...]

Greg Troxel | 25 Feb 15:42 2012

openssl x509 -hash


Some colleagues have been finding that "openssl x509 -hash" produces
different results on netbsd-5 vs -current (late 2011).  The results are
consistent between i386/amd64.

(The hashes are used as symlinks in a CA directory to allow finding
trust anchor CA certs; we are using a private CA.)

1) Is anyone else seeing this?

2) Is there a notion that these hashes are meant to be computed/used on
a single machine, or are they meant to be broadly portable?  The man
page doesn't explain this very well.

Thor Lancelot Simon | 9 Dec 05:14 2011

Patch: new random pseudodevice

I have placed a patch at http://www.panix.com/~tls/rndpseudo.diff
which removes direct userspace access to the kernel entropy pool.  It
is replaced with the NIST SP 800-90 CTR_DRBG generator, separately
keyed per pseudodevice open (actually, keyed on first read or select so
opens don't themselves consume entropy).

The urandom device node will key the generator and output data even if the
kernel entropy pool estimates that it does not have enough bits to
provide an AES-128 key with full entropy.  The random device node will block
until sufficient bits are available from the pool to key the generator.
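
The keying rule described above can be summarised in a few lines of C.
This is a sketch of the decision logic only, under loud assumptions: the
structure and function names are invented, the pool is reduced to a
single entropy estimate that is debited on keying, reseeding is ignored,
and none of it is taken from rndpseudo.diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KEY_BITS 128	/* bits wanted for an AES-128 key */

/* Hypothetical stand-in for the kernel pool's entropy estimate. */
struct pool_sketch {
	unsigned entropy_bits;
};

/* Hypothetical per-open state; opening draws nothing from the pool. */
struct rnd_open_sketch {
	bool blocking;	/* true for "random", false for "urandom" */
	bool keyed;	/* set on the first successful read */
};

enum read_result { READ_OK, READ_WOULD_BLOCK };

enum read_result
rnd_read_sketch(struct rnd_open_sketch *h, struct pool_sketch *p)
{
	assert(h != NULL && p != NULL);
	if (!h->keyed) {
		/* "random" waits until the pool can supply a full key. */
		if (h->blocking && p->entropy_bits < KEY_BITS)
			return READ_WOULD_BLOCK;
		/* "urandom" keys with whatever the pool has right now. */
		p->entropy_bits -= (p->entropy_bits < KEY_BITS) ?
		    p->entropy_bits : KEY_BITS;
		h->keyed = true;
	}
	return READ_OK;	/* output comes from the keyed generator */
}
```

Because keying is deferred to the first read, a program that opens the
device and never reads from it costs the pool nothing.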

Nonblocking/select/poll semantics should be the same as with the old
code -- I have test cases for this.

This generator is approximately 20 times as fast as the old generator
(dd with bs=64K yields 53MB/sec on a 2 GHz Core2 instead of 2.5MB/sec)
and also uses a separate mutex per instance so concurrency is greatly
improved.

I have also fixed various bugs (including some missing locking and a
reseed-counter overflow in the CTR_DRBG code) while testing this.  I
am sure there are new bugs too.

I intend to check this in by Monday, December 12, and then, in a
separate step, move the remaining code from "rnd.c" and "rndpool.c"
to sys/kern from sys/dev, since it is no longer device code.  So, if
you have comments -- soon, please.

Thanks!

[...]

Thor Lancelot Simon | 21 Nov 16:50 2011

Re: Patch: rework kernel random number subsystem

On Mon, Nov 21, 2011 at 09:20:36AM +0100, Pawel Jakub Dawidek wrote:
> 
> Could you tell more about performance characteristics of your
> implementation? If I read the code correctly, you also use single mutex
> in cprng_strong() around all the work. The simplest scalability test is
> to run 'dd if=/dev/random of=/dev/null bs=1m count=1024' N times in
> parallel, where N is the number of CPUs.

As you can see from what I checked in, I haven't replaced the
pseudodevice implementation -- yet.  On my test system the performance
of the stream generator is about 50% better than the old direct
extraction from the entropy pool, and that's for small requests; it
can probably get better with some work.

When I replace the existing pseudodevice code, the way it will work is
that there will be one instance of cprng_strong per instance of the
pseudodevice -- which will clone on open.  So the problem you describe
should not exist.  Also, one separately-keyed/"personalized" instance
of the stream generator per client is really how these generators are
intended to be used, so I am more comfortable with it on those grounds
too.

At present there are only two instances of the cprng_strong used by
the kernel itself.  However, there is no reason why there could not
be one per CPU -- and there should.

Also on the near-term horizon is a replacement for cprng_fast()
which is much stronger, faster, and avoids contention by using
per-cpu state.  You may have noticed I had to add a mutex to
the underlying arc4random() implementation -- this is a temporary
[...]

