Jussi Kivilinna | 2 Sep 17:06 2014

Re: [PATCH 1/1] whirlpool hash amd64 assembly

On 02/09/14 04:02, And Sch wrote:
> That is very impressive. The goal is accomplished then, I just wanted a
> faster whirlpool hash in gnupg. I'm no good with assembly, so I have no
> hope of doing better than the compiler. You may want to title the
> assembly as sse-amd64 now.
> 
> Thanks

Did you have a chance to run the implementation on Atom? I'd be very interested to know how it performs there.

-Jussi

PS: Please keep the mailing list in CC.

> 
>> -----Original Message-----
>> From: jussi.kivilinna <at> iki.fi
>> Sent: Mon, 01 Sep 2014 19:15:03 +0300
>> To: gcrypt-devel <at> gnupg.org
>> Subject: Re: [PATCH 1/1] whirlpool hash amd64 assembly
>>
>> On 29/08/14 18:45, And Sch wrote:
>> <snip>
>>>
>>> That is more than twice as fast as the original on the Atom system.
>>>
>>> I tried to find a way to use macros to sort out parts of the loop, but
>>> any change in the order of the instructions slows it down a lot. There
>>> are also only 7 registers available at one time in most parts of the
>>> loop, so that makes macros and rearrangements even more difficult.
>>>

by Werner Koch | 2 Sep 09:26 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-111-g5eec04a

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  5eec04a43e6c562e956353449be931dd43dfe1cc (commit)
      from  708a3a72cc0608ed4a38ff78d8843c1b46ebf633 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 5eec04a43e6c562e956353449be931dd43dfe1cc
Author: Werner Koch <wk <at> gnupg.org>
Date:   Tue Sep 2 09:25:20 2014 +0200

    asm: Allow building x86 and amd64 using old compilers.

    * src/hwf-x86.c (get_xgetbv): Build only if AVX support is enabled.
    --

    Old as(1) versions do not support the xgetbv instruction.  Thus build
    this function only if asm support has been requested.

    GnuPG-bug-id: 1708

diff --git a/src/hwf-x86.c b/src/hwf-x86.c
index 0591b4f..7ee246d 100644
--- a/src/hwf-x86.c
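The guard pattern described in the commit message can be sketched roughly as follows. This is not the actual patch (the diff is truncated here): the macro `SKETCH_ENABLE_AVX` and both function names are hypothetical, and the sketch assumes a GCC-compatible compiler for the inline assembly.

```c
/* Hypothetical switch: a build system would define SKETCH_ENABLE_AVX only
   when the toolchain is known to assemble the xgetbv mnemonic. */
#if defined(SKETCH_ENABLE_AVX) && (defined(__i386__) || defined(__x86_64__))
# define SKETCH_HAVE_XGETBV 1
#else
# define SKETCH_HAVE_XGETBV 0
#endif

#if SKETCH_HAVE_XGETBV
/* Read the low 32 bits of XCR0.  The caller must first confirm via CPUID
   that OSXSAVE is set, otherwise the instruction faults. */
static unsigned int
sketch_get_xgetbv (void)
{
  unsigned int eax, edx;

  __asm__ volatile ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (0));
  return eax;
}
#endif

/* Report whether the helper was compiled in, so that feature detection
   can skip the AVX check instead of failing to build. */
static int
sketch_xgetbv_compiled_in (void)
{
  return SKETCH_HAVE_XGETBV;
}
```

With the macro left undefined, an old assembler never sees the mnemonic and `sketch_xgetbv_compiled_in ()` returns 0.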

by Werner Koch | 1 Sep 11:40 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-110-g708a3a7

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  708a3a72cc0608ed4a38ff78d8843c1b46ebf633 (commit)
      from  db3c0286bf159568aa315d15f9708fe2de02b022 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 708a3a72cc0608ed4a38ff78d8843c1b46ebf633
Author: Werner Koch <wk <at> gnupg.org>
Date:   Mon Sep 1 11:40:31 2014 +0200

    Add DCO entries for Andrei Scherer and Stefan Mueller.

    --

diff --git a/AUTHORS b/AUTHORS
index 2c92998..860dea2 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -136,6 +136,9 @@ phcoder <at> gmail.com
 Authors with a DCO
 ==================

+Andrei Scherer <andsch <at> inbox.com>

And Sch | 29 Aug 17:45 2014

[PATCH 1/1] whirlpool hash amd64 assembly

* cipher/whirlpool.c (whirlpool_transform, sbox, added macros): Added macros to support little endian AMD64 assembly implementation. Added prototype for assembly function and wrapped transform function in macro.
* cipher/whirlpool-amd64.S (_gcry_whirlpool_transform_amd64): Originally generated by gcc with optimization options, I've cleaned it up a bit.
* configure: Added build option for AMD64 assembly implementation.
* configure.ac: Added build option for AMD64 assembly implementation.
--

Benchmark on different systems:

Intel(R) Atom(TM) CPU N570   @ 1.66GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     63.40 ns/B     15.04 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     29.29 ns/B     32.56 MiB/s         - c/B


Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      7.75 ns/B     123.0 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      5.91 ns/B     161.3 MiB/s         - c/B

That is more than twice as fast as the original on the Atom system.
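For readers who want to check the figures, the two columns of the benchmark tables and the "more than twice as fast" claim are related by simple arithmetic. The helper names below are hypothetical and only illustrate the conversion; they are not part of the benchmark tool.

```c
#include <assert.h>

/* Convert a "nanosecs/byte" figure into "mebibytes/sec". */
static double
ns_per_byte_to_mib_per_sec (double ns_per_byte)
{
  assert (ns_per_byte > 0.0);
  return (1e9 / ns_per_byte) / (1024.0 * 1024.0);
}

/* Speedup factor: how many times faster "after" is than "before". */
static double
speedup (double before_ns, double after_ns)
{
  assert (before_ns > 0.0 && after_ns > 0.0);
  return before_ns / after_ns;
}
```

With the numbers above, `speedup (63.40, 29.29)` is about 2.16 on the Atom and `speedup (7.75, 5.91)` is about 1.31 on the i5, while `ns_per_byte_to_mib_per_sec (63.40)` gives roughly the 15.04 MiB/s shown in the table.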

by Werner Koch | 29 Aug 14:54 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-109-gdb3c028

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  db3c0286bf159568aa315d15f9708fe2de02b022 (commit)
      from  e606d5f1bada1f2d21faeedd3fa2cf2dca7b274c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit db3c0286bf159568aa315d15f9708fe2de02b022
Author: Werner Koch <wk <at> gnupg.org>
Date:   Fri Aug 29 14:54:11 2014 +0200

    mpi: Re-indent longlong.h.
    
    --
    Indenting the cpp statements should make longlong.h more readable.

diff --git a/mpi/longlong.h b/mpi/longlong.h
index 4f33937..db98e47 100644
--- a/mpi/longlong.h
+++ b/mpi/longlong.h
@@ -1,5 +1,6 @@
 /* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
-   Note: I added some stuff for use with gnupg
+   Note: This is the Libgcrypt version

And Sch | 28 Aug 20:02 2014

[PATCH 1/1] Improved whirlpool hash performance

* cipher/whirlpool.c (whirlpool_transform, sbox, added macro): Added macro and rearranged round
function to alternate between reading from and writing to different state and key variables. Two
whirlpool_context_t variables removed, two were replaced, the sizes of state and key doubled, so
overall the burn stack stays the same. buffer_to_block and block_xor were combined into one operation.
The sbox was converted to one large table, because it is faster than many small tables.
--

Benchmark on different systems:

Intel(R) Atom(TM) CPU N570   @ 1.66GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     63.40 ns/B     15.04 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     46.21 ns/B     20.64 MiB/s         - c/B

Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      7.75 ns/B     123.0 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      6.70 ns/B     142.3 MiB/s         - c/B

This one actually shows greater improvement on the Atom system.
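The remark that buffer_to_block and block_xor were combined into one operation can be illustrated with a small sketch. It assumes 512-bit blocks held as eight big-endian 64-bit words; all function names are hypothetical, and calling the XOR operand `key` is an assumption, so this shows the general idea of fusing the two passes rather than the code in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLOCK_WORDS 8   /* 512-bit block = 8 x 64-bit words */

/* Load one big-endian 64-bit word from a byte buffer. */
static uint64_t
sketch_load_be64 (const unsigned char *p)
{
  uint64_t w = 0;
  int i;

  for (i = 0; i < 8; i++)
    w = (w << 8) | p[i];
  return w;
}

/* Two-pass variant: convert the buffer to words, then XOR with the key. */
static void
sketch_load_then_xor (uint64_t *state, const unsigned char *buf,
                      const uint64_t *key)
{
  uint64_t block[SKETCH_BLOCK_WORDS];
  size_t i;

  assert (state && buf && key);
  for (i = 0; i < SKETCH_BLOCK_WORDS; i++)
    block[i] = sketch_load_be64 (buf + 8 * i);
  for (i = 0; i < SKETCH_BLOCK_WORDS; i++)
    state[i] = block[i] ^ key[i];
}

/* Fused variant: a single pass with no intermediate block array. */
static void
sketch_load_xor_fused (uint64_t *state, const unsigned char *buf,
                       const uint64_t *key)
{
  size_t i;

  assert (state && buf && key);
  for (i = 0; i < SKETCH_BLOCK_WORDS; i++)
    state[i] = sketch_load_be64 (buf + 8 * i) ^ key[i];
}
```

Both variants produce the same state; the fused one saves a temporary array and one loop over the block.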

Jussi Kivilinna | 28 Aug 18:35 2014

[PATCH] Add new Poly1305 MAC test vectors

* tests/basic.c (check_mac): Add new test vectors for Poly1305 MAC.
--

Patch adds new test vectors for Poly1305 MAC from Internet Draft
draft-irtf-cfrg-chacha20-poly1305-01.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
---
 tests/basic.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/tests/basic.c b/tests/basic.c
index 6d70cfd..e406db4 100644
--- a/tests/basic.c
+++ b/tests/basic.c
@@ -6008,6 +6008,72 @@ check_mac (void)
         "\xf3\x47\x7e\x7c\xd9\x54\x17\xaf\x89\xa6\xb8\x79\x4c\x31\x0c\xf0",
         NULL,
         0, 32 },
+      /* draft-irtf-cfrg-chacha20-poly1305-01 */
+      /* TV#5 */
+      { GCRY_MAC_POLY1305,
+        "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF",
+        "\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        NULL,
+        16, 32 },
+      /* TV#6 */
+      { GCRY_MAC_POLY1305,
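As a sanity check on TV#5, the expected tag can be worked out by hand. The derivation below reads the 16-byte string as the message and the 32-byte string as the key (r followed by s), which is what the trailing `16, 32` lengths suggest; that reading of the test table layout is an assumption. All values are little-endian integers, and clamping leaves r = 2 unchanged.

```latex
\begin{align*}
p &= 2^{130} - 5, \qquad r = 2, \qquad s = 0 \\
c &= (2^{128} - 1) + 2^{128} = 2^{129} - 1
  && \text{(16 bytes of 0xFF plus the pad bit)} \\
h &= (0 + c) \cdot r \bmod p
   = (2^{130} - 2) \bmod (2^{130} - 5) = 3 \\
\text{tag} &= (h + s) \bmod 2^{128} = 3
\end{align*}
```

The result 3 matches the expected tag `\x03\x00...\x00`. Since the unreduced product 2^130 - 2 is larger than p, an implementation that skips the final full reduction would produce a different tag, so this vector is a useful check of that step.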

And Sch | 28 Aug 19:45 2014

[PATCH 1/1] Improved ripemd160 performance

* cipher/rmd160.c (transform): Interleave the left and right lane rounds to introduce more instruction
level parallelism.
--

The benchmarks on different systems:

Intel(R) Atom(TM) CPU N570   @ 1.66GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |     13.07 ns/B     72.97 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |     11.37 ns/B     83.84 MiB/s         - c/B

Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |      3.31 ns/B     288.0 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |      2.08 ns/B     458.5 MiB/s         - c/B

Signed-off-by: Andrei Scherer <andsch <at> inbox.com>

---
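The idea of interleaving the left and right lane rounds can be shown with a toy example. The step function below is a stand-in and NOT a real RIPEMD-160 round, and all names are hypothetical; the point is only that two independent dependency chains give the same result whether they run one after the other or alternate, while the alternating form lets the CPU overlap the two chains.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; n must be in 1..31. */
static uint32_t
sketch_rol (uint32_t x, unsigned int n)
{
  assert (n >= 1 && n <= 31);
  return (x << n) | (x >> (32 - n));
}

/* One toy step on a lane variable: a stand-in for a real round. */
#define SKETCH_STEP(v, w, s)  ((v) = sketch_rol ((v) + (w), (s)) ^ (w))

/* Sequential: finish the whole left lane, then the whole right lane.
   Each step depends on the previous one, so little can overlap. */
static uint32_t
sketch_lanes_sequential (const uint32_t x[16])
{
  uint32_t l = 0x67452301u, r = 0xefcdab89u;
  int i;

  for (i = 0; i < 16; i++)
    SKETCH_STEP (l, x[i], 5);
  for (i = 0; i < 16; i++)
    SKETCH_STEP (r, x[15 - i], 7);
  return l + r;
}

/* Interleaved: alternate left and right steps.  Adjacent steps touch
   different variables, so they are independent of each other. */
static uint32_t
sketch_lanes_interleaved (const uint32_t x[16])
{
  uint32_t l = 0x67452301u, r = 0xefcdab89u;
  int i;

  for (i = 0; i < 16; i++)
    {
      SKETCH_STEP (l, x[i], 5);
      SKETCH_STEP (r, x[15 - i], 7);
    }
  return l + r;
}
```

Both functions return the same value for any input; only the instruction order differs.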


Stephan Mueller | 25 Aug 12:37 2014

Cipher FIPS flag enforcement

Hi,

all of the cipher definitions contain the flag

	unsigned int fips:1;

in their _spec_t types.

Up to 1.4.x, that field was enforced in the cipher init functions (i.e. 
if the FIPS mode is set, only fips=1 ciphers are allowed).

The random/ code still contains such logic. But all other fips flag 
enforcement code is gone.

Is this intentional? Note, FIPS can live without such restrictions, but 
then why keep the fips flag lingering?
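For illustration, the kind of enforcement described above (in FIPS mode, only fips=1 ciphers are allowed) might look roughly like this. The struct is a hypothetical, trimmed-down stand-in for a _spec_t type, and the function name and return convention are assumptions, not the 1.4.x code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical spec type; only the bit-field mirrors the flag quoted
   in the mail. */
typedef struct
{
  const char *name;
  unsigned int fips:1;   /* 1 = may be used while in FIPS mode */
} sketch_cipher_spec_t;

/* Return 0 if the cipher may be initialized, -1 if FIPS mode forbids it. */
static int
sketch_cipher_init_check (const sketch_cipher_spec_t *spec, int fips_mode)
{
  assert (spec != NULL);
  if (fips_mode && !spec->fips)
    return -1;
  return 0;
}
```

Outside FIPS mode every cipher passes the check; in FIPS mode only specs with the flag set do.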

Ciao
Stephan

Stephan Mueller | 22 Aug 22:45 2014

DCO signed

Libgcrypt Developer's Certificate of Origin.  Version 1.0
=========================================================

By making a contribution to the Libgcrypt project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the free software license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the
    best of my knowledge, is covered under an appropriate free
    software license and I have the right under that license to
    submit that work with modifications, whether created in whole
    or in part by me, under the same free software license
    (unless I am permitted to submit under a different license),
    as indicated in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including
    all personal information I submit with it, including my
    sign-off) is maintained indefinitely and may be redistributed
    consistent with this project or the free software license(s)
    involved.

Signed-off-by: Stephan Mueller <smueller <at> chronox.de>

And Sch | 22 Aug 18:32 2014

[PATCH 1/1] Improved ripemd160 performance

Here is the ripemd160 performance patch, signed.
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |      3.31 ns/B     288.0 MiB/s         - c/B

after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 RIPEMD160      |      2.08 ns/B     458.5 MiB/s         - c/B

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

diff -ruNp libgcrypt-1.6.2/cipher/rmd160.c libgcrypt-1.6.3/cipher/rmd160.c
- --- libgcrypt-1.6.2/cipher/rmd160.c	2014-08-21 07:50:39.000000000 -0500
+++ libgcrypt-1.6.3/cipher/rmd160.c	2014-08-21 17:15:13.678664524 -0500
@@ -178,8 +178,7 @@ static unsigned int
 transform ( void *ctx, const unsigned char *data )
 {
   RMD160_CONTEXT *hd = ctx;
- -  register u32 a,b,c,d,e;
- -  u32 aa,bb,cc,dd,ee,t;
+  register u32 al, ar, bl, br, cl, cr, dl, dr, el, er;
   u32 x[16];
   int i;

@@ -201,196 +200,186 @@ transform ( void *ctx, const unsigned ch
 #define F2(x,y,z)   ( ((x) | ~(y)) ^ (z) )
 #define F3(x,y,z)   ( ((x) & (z)) | ((y) & ~(z)) )

