David Leon Gil via RT | 2 Oct 02:37 2014

Re: [openssl.org #3554] [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix

(Many thanks to Lutz for pointing out that I omitted the subject line;
hopefully this isn't a duplicate.)

> ---------- Forwarded message ----------
> From: David Leon Gil <coruus <at> gmail.com>
> To: "rt <at> openssl.org" <rt <at> openssl.org>
> Cc:
> Date: Wed, 1 Oct 2014 09:45:10 -0400
> Subject: Re: [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix
 On Wednesday, October 1, 2014, Andy Polyakov via RT <rt <at> openssl.org> wrote:
>
> > All internal exports: Zeroize XMM registers that may contain secret
> > data before returning. (At 4x pxors per cycle, the overhead is
> > negligible.)
> >
> > _ctr32: Zeroize $key0 and $ctr.
>
> Question is why just aesni module? Why not everywhere?

It should be done everywhere. But I only have limited time to spend on
this, so...one file at a time.

> Why not demand that compiler does it too?

See, e.g., http://www.daemonology.net/blog/2014-09-06-zeroing-buffers-is-insufficient.html
for someone who is demanding just that.

I have heard there is some work on a compiler pass to do so, in fact.

> Why just registers, and not stack too?

Andy Polyakov via RT | 1 Oct 23:51 2014

Re: [openssl.org #3149]

For public reference. The final version was committed to the repository with
the following changes:

- naming re-biased to ecp_nistz256, both filenames and functions;
- assembly optimized for processors other than Intel Core family;
- assembly modules adapted even for non-ELF platforms;
- some of the higher-level subroutines moved to assembly to improve
performance even further;
- code adapted even for 32-bit platforms.

As for the latter: effort is ongoing to initially support ARM and x86, and
preliminary results indicate a ~2x performance improvement. A point worth
mentioning in this context is that I'm considering switching back to the
scatter-gather method, basically for the sake of 32-bit, because the current
method is a bit too slow on non-SIMD platforms. But then it would be simpler to
maintain the same method in all cases, including the x86_64 one. The objection
against the scatter-gather method was based on the assertion in
http://cryptojedi.org/peter/data/chesrump-20130822.pdf that timing
within a cache line varies. While the phenomenon is real, one has to recognize
that its effect on the gather procedure is either 0 or at most a few cycles.
This is because if a conflict can occur, it occurs only once per gather
procedure. And since it's only a few cycles, it's not a given that you can
actually measure it, because tick counter resolution is actually limited
on contemporary processors. Not to mention that the conflict can be avoided
by aligning the stack frame in a specific manner relative to the scatter table
(the method used in the x86_64-mont5 module, by the way).
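
To make the scatter-gather idea above concrete, here is a minimal sketch in C.
It is not the ecp_nistz256 or x86_64-mont5 code; the names scatter/gather, the
table size N_ENTRIES and the entry length ENTRY_LEN are all hypothetical. It
assumes 64-byte cache lines and a 64-byte-aligned table, so that each
16-byte row sits inside one cache line. The gather then touches the same
sequence of cache lines for every index; only the offset within each line
depends on the secret index, which is exactly the residual effect discussed
above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_ENTRIES 16 /* hypothetical number of table entries */
#define ENTRY_LEN 32 /* hypothetical size of one entry in bytes */

/*
 * Scatter: byte j of entry idx is stored at table[j * N_ENTRIES + idx],
 * i.e. row j holds byte j of every entry.
 */
static void scatter(uint8_t *table, const uint8_t *entry, size_t idx)
{
    size_t j;

    assert(idx < N_ENTRIES);
    for (j = 0; j < ENTRY_LEN; j++)
        table[j * N_ENTRIES + idx] = entry[j];
}

/*
 * Gather: reads one byte from every row. Which rows are read does not
 * depend on idx; only the position inside each row does.
 */
static void gather(uint8_t *entry, const uint8_t *table, size_t idx)
{
    size_t j;

    assert(idx < N_ENTRIES);
    for (j = 0; j < ENTRY_LEN; j++)
        entry[j] = table[j * N_ENTRIES + idx];
}
```

The table passed to both functions must hold N_ENTRIES * ENTRY_LEN bytes. The
alternative mentioned above (reading every entry and masking out all but the
wanted one) avoids even the intra-line dependence, at the cost of touching
N_ENTRIES times more data per lookup.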

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
Development Mailing List                       openssl-dev <at> openssl.org

Andy Polyakov via RT | 1 Oct 20:02 2014

Re: [openssl.org #3554] [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix

>> If you can present
>> coherent argument and consensus is reached, then it would have to be
>> implemented universally, not only in aesni-x86_64 module.
> 
> So, hopefully my cross-posted message convinced you.

No, not really. What I meant was that as long as you can't ask even the
compiler to wipe used registers and the stack frame, it doesn't really make
sense to strive for this in assembly. In other words, if we set out on
such a quest, then we should make the corresponding case to all compiler
developers/manufacturers.

> To summarize the
> argument briefly:
> 
> - Library users may be performing a mix of private cryptographic
> operations and operations controlled by untrusted code.

The same holds for compiler-generated code.

> - A large "API" may be exposed to the untrusted code.

The same holds for compiler-generated code.

> - It's easier to sanitize secrets from memory and registers when we
> know they contain secrets than it is to ensure that *no* other
> functions may leak register contents to untrusted code.

The same holds for compiler-generated code.


Rich Salz via RT | 1 Oct 17:36 2014

[openssl.org #2779] OpenSSL 1.0.1 doesn't compile with NO_STDIO/NO_FP_API

Some patches that let the build get further along, before I gave up.
Er, stopped working on this. :)
--
Rich Salz, OpenSSL dev team; rsalz <at> openssl.org

David Leon Gil via RT | 1 Oct 16:08 2014

Re: [openssl.org #3554] [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix

On Wed, Oct 1, 2014 at 9:40 AM, Andy Polyakov via RT <rt <at> openssl.org> wrote:
> If you can present
> coherent argument and consensus is reached, then it would have to be
> implemented universally, not only in aesni-x86_64 module.

So, hopefully my cross-posted message convinced you. To summarize the
argument briefly:

- Library users may be performing a mix of private cryptographic
operations and operations controlled by untrusted code.
- A large "API" may be exposed to the untrusted code.
- It's easier to sanitize secrets from memory and registers when we
know they contain secrets than it is to ensure that *no* other
functions may leak register contents to untrusted code.
- The cost is negligible. (And it's lower for us than library clients:
They have no way of knowing what registers have been used, so they
would need to do the equivalent of OPENSSL_wipe_cpu.)
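
For readers less familiar with the "sanitize when we know it's secret" point,
here is a minimal C sketch of the memory half of the argument. Both function
names are hypothetical and this is not OpenSSL code (OpenSSL itself provides
OPENSSL_cleanse for buffers). Writing through a volatile pointer is a common
way to discourage the compiler from dropping the stores as dead; it does
nothing for copies the compiler may have left in registers or spill slots,
which is why the patch under discussion works at the assembly level.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: overwrite len bytes at buf with zeros through a
 * volatile pointer, so the stores are not treated as removable.
 */
static void wipe_bytes(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;

    assert(buf != NULL || len == 0);
    while (len--)
        *p++ = 0;
}

/*
 * Hypothetical routine that copies a secret into a local buffer, uses it
 * (here: XORs the bytes together as a stand-in for real work), and wipes
 * the local copy before returning, while it still knows where the secret is.
 */
static unsigned char checksum_with_cleanup(const unsigned char *key,
                                           size_t key_len)
{
    unsigned char local[32];
    unsigned char acc = 0;
    size_t i, n = key_len < sizeof(local) ? key_len : sizeof(local);

    memcpy(local, key, n);
    for (i = 0; i < n; i++)
        acc ^= local[i];
    wipe_bytes(local, sizeof(local));
    return acc;
}
```

The callee knows exactly which buffer held the secret; a caller trying to
clean up afterwards would have to guess, which is the asymmetry the last
bullet point describes.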

The reasons for targeting AES-NI first: I was working on another patch
(not yet submitted) to that file.

The reason for doing this one-file-at-a-time: A single huge patch
would likely see little meaningful review (reviewing assembler is
fairly tiring even at the scope of a single file).


Andy Polyakov via RT | 1 Oct 15:40 2014

Re: [openssl.org #3554] [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix

>> All internal exports: Zeroize XMM registers that may contain secret
>> data before returning. (At 4x pxors per cycle, the overhead is
>> negligible.)
>>
>> _ctr32: Zeroize $key0 and $ctr.
> 
> Question is why just aesni module? Why not everywhere? Why not demand
> that compiler does it too? Why just registers, and not stack too? The
> answer is that it doesn't make much sense, because the code you are
> trying to "protect" against resides in same process context and can read
> all the secrets from memory much more reliably than from registers or
> stack. I'm not saying that it makes no sense to clean up, only that *if*
> we do choose to do it, then it should be done for right reason and
> consistently.

Well, I'm being a little bit inconsistent here, because there are
*stack* cleanups in some other modules, most notably in BN. But question
why registers and why in just aesni still stands. If you can present
coherent argument and consensus is reached, then it would have to be
implemented universally, not only in aesni-x86_64 module.


Andy Polyakov via RT | 1 Oct 15:24 2014

Re: [openssl.org #3554] [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix

> All internal exports: Zeroize XMM registers that may contain secret
> data before returning. (At 4x pxors per cycle, the overhead is
> negligible.)
> 
> _ctr32: Zeroize $key0 and $ctr.

Question is why just aesni module? Why not everywhere? Why not demand
that compiler does it too? Why just registers, and not stack too? The
answer is that it doesn't make much sense, because the code you are
trying to "protect" against resides in same process context and can read
all the secrets from memory much more reliably than from registers or
stack. I'm not saying that it makes no sense to clean up, only that *if*
we do choose to do it, then it should be done for right reason and
consistently.

> (By the way, is OPENSSL_wipe_cpu used or tested anywhere?)

No. For reference, the original idea was to seek an opportunity to deploy
it on the border of the libcrypto shared library, i.e. not upon return from
some specific subroutines, but automatically upon any return from within
libcrypto to application code. The idea was never realized, though. The
only case where it makes sense is when some other code called afterwards
reads a register or ex-local variable used in libcrypto as an
uninitialized value and unintentionally sends that data to another process or
over the network. Even then one can argue that it's that code's vulnerability
that should be fixed. It might make sense to use OPENSSL_wipe_cpu in
specialized cases, when OpenSSL components are used in a non-multitasking
environment, but that would be something for the specific developer to
reason about and do.


Andy Polyakov via RT | 1 Oct 14:56 2014

Re: [openssl.org #3552] aesni_ecb_encrypt clobbers Win64 callee-save registers

> crypto/aes/asm/aesni-x86_64.pl: aesni_ecb_encrypt (unlike the other
> AES-NI functions) does not save and restore the Win64 ABI callee-save
> XMM registers.

Oh! The reason must be that originally the module used a lower instruction
interleave factor and ECB didn't need to off-load any XMM registers. And
when the interleave factor was increased, ECB was overlooked and the problem
remained unnoticed, because ECB is not actually used in any real-life
application. Thanks for the report! But the solution would be different from
the one proposed in the next report: we don't need to off-load that many
registers, 4 is sufficient, and one has to harmonize the SEH handler too...

Attachment (aesni-x86_64.diff): text/x-patch, 2218 bytes

Andy Polyakov via RT | 1 Oct 14:42 2014

Re: [openssl.org #3550] patch

Hi,

> This is the ppccap.c's patch for Mac OS X with PowerPC G3.
> OpenSSL with PowerPC G3 is working fine. But always clash without OPENSSL_ppccap=0.

If by crash you mean http://www.openssl.org/support/faq.html#PROG17,
i.e. you suffer from it when debugging, then the answer is to deal with it,
e.g. by setting the environment variable when debugging. Otherwise you
shouldn't notice it, because the exception is handled. But in either
case the suggested patch would not qualify as a solution. Note that with
the compiler flags used to build OpenSSL on MacOS X for PPC, __ALTIVEC__
is *not* defined. This means that the proposed patch would disable the probe
and consequently the Altivec code even on Altivec-capable processors.
Altivec works without __ALTIVEC__ defined at compile time, because all
Altivec code resides in assembly modules. Well, "modules" is not really
correct, because there is only one for the moment, the
Vector-Permutation AES.
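
As an illustration of the environment-variable override mentioned above
(OPENSSL_ppccap=0), here is a minimal C sketch. The function name
resolve_cap_mask is hypothetical, and the assumption that the variable is
parsed as an integer mask with strtoul is mine; the actual ppccap.c is not
reproduced here. The point it shows is that when the variable is set, its
value replaces the probed capability mask, so no instruction probe (and hence
no handled exception under a debugger) is needed.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: if env_value is non-NULL, parse it as an integer
 * (decimal, octal or 0x-prefixed hex) and use it as the capability mask;
 * otherwise fall back to the mask obtained by probing the CPU.
 */
static unsigned long resolve_cap_mask(const char *env_value,
                                      unsigned long probed_mask)
{
    if (env_value != NULL)
        return strtoul(env_value, NULL, 0);
    return probed_mask;
}
```

A caller would pass getenv("OPENSSL_ppccap") as env_value; with the value "0"
every optional capability is reported as absent.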

Anyway, if there is a problem besides when debugging, then describe it. If
a solution is indeed required, then sysctl should be the way to go about
fixing it.

> -----------------------------------------------------------------------------------
> diff -urNd openssl-1.0.1i.orig/crypto/ppccap.c openssl-1.0.1i/crypto/ppccap.c
> --- openssl-1.0.1i.orig/crypto/ppccap.c	2014-08-07 06:10:56.000000000 +0900
> +++ openssl-1.0.1i/crypto/ppccap.c	2014-09-30 20:20:05.000000000 +0900
>  <at>  <at>  -84,6 +84,10  <at>  <at> 
>  
>  	OPENSSL_ppccap_P = 0;
>  

Timo Teräs via RT | 1 Oct 13:47 2014

Re: [openssl.org #3505] rewrite c_rehash in C

Updated c_rehash.c based on feedback from the mailing list and Rich
Salz.

It would be nice to get additional review. It is not yet a patch for
the 'openssl' subcommand, but this is on the list. Just posting current
progress.

Most notable changes:
- removal of non-portable libc usage such as fnmatch() and the *at() functions
- atomic update of hashes is improved, but not perfect: symlinks
  are only touched if needed. Trying to take care of all possible race
  conditions looks tricky, and probably not feasible.
- added 'crt' extension
- hashes only files containing exactly one certificate or CRL. This is
  because 1) the openssl library core checks only the first cert, and 2)
  distros often create a ca-certificates.crt containing *all*
  certs; since this now matches the known extensions, it would easily make
  the openssl core break when all symlinks point to this same .crt file.
- code to create old-style hashes is there, but no command line switch
  parsing is added yet
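
The "exactly one certificate or CRL" rule from the list above can be
sketched as follows. This is only an illustration: the attached c_rehash.c is
not shown here, the helper names count_pem_objects and should_hash are
hypothetical, and counting "-----BEGIN " lines in the text is an assumed
stand-in for whatever PEM parsing the real code does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count the lines that start with "-----BEGIN " in a
 * NUL-terminated PEM text.
 */
static size_t count_pem_objects(const char *pem)
{
    static const char marker[] = "-----BEGIN ";
    size_t count = 0;
    const char *p = pem;

    assert(pem != NULL);
    while ((p = strstr(p, marker)) != NULL) {
        /* Only count markers that sit at the start of a line. */
        if (p == pem || p[-1] == '\n')
            count++;
        p += sizeof(marker) - 1;
    }
    return count;
}

/* Hypothetical policy check: hash the file only if it holds one object. */
static int should_hash(const char *pem)
{
    return count_pem_objects(pem) == 1;
}
```

With this rule a bundle such as ca-certificates.crt, which holds many
certificates, is skipped instead of receiving a symlink for its first entry.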

Some other minor cleanups and changes were made too.

Feedback would be appreciated.

Thanks,
Timo

Attachment (c_rehash.c): text/x-c++src, 8 KiB

David Leon Gil via RT | 1 Oct 09:09 2014

[openssl.org #3554] [PATCH] aesni-x86_64.pl: zeroize registers, Win64 ABI fix

All internal exports: Zeroize XMM registers that may contain secret
data before returning. (At 4x pxors per cycle, the overhead is
negligible.)

_ctr32: Zeroize $key0 and $ctr.

aesni_ecb_encrypt: If $win64, saves and restores xmm registers with
callee-save status under the Win64 ABI. The code is adapted fairly
directly from _ctr32.

The Win64 fix is untested! I don't have a Windows development machine
at the moment.

(By the way, is OPENSSL_wipe_cpu used or tested anywhere?)

