Dmitry Eremin-Solenikov | 5 Sep 00:14 2014

Libgcrypt and libgmp

Hello,

I was looking into how to improve the performance of the public key
implementation in libgcrypt. One idea was to check whether using libgmp
can improve the situation. Nowadays GMP is dual-licensed under GPLv2+ or
LGPLv3+. As far as I understand, this licensing is compatible with
libgcrypt's LGPLv2.1+.

Surprisingly, even replacing several asm-coded functions resulted in a
nearly 20-25% speed increase (according to tests/benchmark pubkey). Do
such patches have a chance of being reviewed and accepted, or is it a
waste of time because you would prefer to keep libgcrypt independent of
libgmp? I'm not changing the gcry_mpi_t internals, or removing secure
allocation/reallocation, only replacing the computational code.
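
For readers unfamiliar with what "computational code" means here: the message does not say which asm-coded functions were replaced, so the following is only a sketch under the assumption that they are low-level limb primitives (GMP exposes this kind of routine as mpn_add_n). The names limb_t and limbs_add_n are hypothetical, and this is portable C rather than the code from the patches.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical limb type, one machine word */

/* Add two n-limb numbers stored least significant limb first.
   Writes the n-limb sum to res and returns the carry out (0 or 1).
   res may alias a or b, because each limb is read before it is written. */
limb_t limbs_add_n(limb_t *res, const limb_t *a, const limb_t *b, size_t n)
{
    limb_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        limb_t s  = a[i] + carry;
        limb_t c1 = s < carry;      /* wrapped while adding the carry in */
        limb_t t  = s + b[i];
        limb_t c2 = t < s;          /* wrapped while adding b[i] */

        res[i] = t;
        carry  = c1 | c2;           /* at most one of the two can be set */
    }
    return carry;
}
```

A wrapper of this shape leaves the caller's limb arrays and their allocation untouched, which matches the stated intent of swapping only the arithmetic underneath.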

Before:
Algorithm         generate 1000*priv 1000*public
------------------------------------------------
RSA 1024 bit          10ms     770ms        40ms
RSA 2048 bit         320ms    4850ms       130ms
RSA 3072 bit        1460ms   13640ms       220ms
RSA 4096 bit        1690ms   31830ms       390ms
ELG 1024 bit             -    1360ms      1190ms
ELG 2048 bit             -    4840ms      5260ms
ELG 3072 bit             -   10050ms     11400ms

And Sch | 3 Sep 17:54 2014

Re: [PATCH 1/1] whirlpool hash amd64 assembly

Sorry, I hit the wrong button when replying. Here are benchmarks on the Atom system:

Intel(R) Atom(TM) CPU N570   @ 1.66GHz

original, no patches:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     63.40 ns/B     15.04 MiB/s         - c/B

my C only optimization:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     46.21 ns/B     20.64 MiB/s         - c/B

my edited GCC x64 assembly:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     29.29 ns/B     32.56 MiB/s         - c/B

the SSE assembly by Jussi Kivilinna:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     41.19 ns/B     23.15 MiB/s         - c/B

It is weird that the SSE assembly is much faster than the non-SSE on the i5, but unexpectedly slower on the
Atom system. The bswap does not explain the difference because I also tested an SSE version with bswap
removed with same results.
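
To put the four Atom measurements above on a common scale, here is a small sketch that converts nanoseconds per byte to MiB/s and computes the speedup over the unpatched build. The function names are made up for illustration; only the ns/B figures come from the benchmark output.

```c
#include <stdio.h>

/* Convert a cost in nanoseconds per byte to throughput in MiB/s. */
double mib_per_sec(double ns_per_byte)
{
    return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* Speedup factor of a variant relative to a baseline (both in ns/byte). */
double speedup(double baseline_ns, double variant_ns)
{
    return baseline_ns / variant_ns;
}

/* Print the Atom N570 rows quoted in this message. */
void print_atom_summary(void)
{
    static const struct { const char *name; double ns; } rows[] = {
        { "original",         63.40 },
        { "C only",           46.21 },
        { "edited GCC asm",   29.29 },
        { "SSE asm",          41.19 },
    };
    const double baseline = rows[0].ns;

    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
        printf("%-16s %6.2f ns/B  %6.2f MiB/s  x%.2f\n",
               rows[i].name, rows[i].ns,
               mib_per_sec(rows[i].ns), speedup(baseline, rows[i].ns));
}
```

Calling print_atom_summary() from a main function reproduces the MiB/s column and shows that, on this Atom, the SSE variant lands between the C-only and the edited GCC assembly versions.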

-Andrei
> -----Original Message-----

by Werner Koch | 3 Sep 08:54 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-113-g8b960a8

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  8b960a807d168000d2690897a7634bd384ac1346 (commit)
      from  8a2a328742012a7c528dd007437185e4584c1e48 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 8b960a807d168000d2690897a7634bd384ac1346
Author: Werner Koch <wk@gnupg.org>
Date:   Wed Sep 3 08:53:43 2014 +0200

    Add a constant for a forthcoming new RNG.

    * src/gcrypt.h.in (GCRYCTL_DRBG_REINIT): New constant.

diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in
index 9d64b22..65d9ef6 100644
--- a/src/gcrypt.h.in
+++ b/src/gcrypt.h.in
@@ -330,7 +330,8 @@ enum gcry_ctl_cmds
     GCRYCTL_CLOSE_RANDOM_DEVICE = 70,
     GCRYCTL_INACTIVATE_FIPS_FLAG = 71,
     GCRYCTL_REACTIVATE_FIPS_FLAG = 72,
-    GCRYCTL_SET_SBOX = 73

by Jussi Kivilinna | 2 Sep 19:42 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-112-g8a2a328

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  8a2a328742012a7c528dd007437185e4584c1e48 (commit)
      from  5eec04a43e6c562e956353449be931dd43dfe1cc (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 8a2a328742012a7c528dd007437185e4584c1e48
Author: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Date:   Tue Sep 2 20:40:07 2014 +0300

    Add new Poly1305 MAC test vectors

    * tests/basic.c (check_mac): Add new test vectors for Poly1305 MAC.
    --

    Patch adds new test vectors for Poly1305 MAC from Internet Draft
    draft-irtf-cfrg-chacha20-poly1305-01.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>

diff --git a/tests/basic.c b/tests/basic.c
index 6d70cfd..e406db4 100644
--- a/tests/basic.c

Jussi Kivilinna | 2 Sep 17:06 2014

Re: [PATCH 1/1] whirlpool hash amd64 assembly

On 02/09/14 04:02, And Sch wrote:
> That is very impressive. The goal is accomplished then, I just wanted a faster whirlpool hash in gnupg. I'm
> no good with assembly, so I have no hope of doing better than the compiler. You may want to title the assembly
> as sse-amd64 now.
> 
> Thanks

Did you have a chance to run the implementation on Atom? I'd be very interested to know how the performance is there.

-Jussi

P.S. Please keep the mailing list in CC.

> 
>> -----Original Message-----
>> From: jussi.kivilinna@iki.fi
>> Sent: Mon, 01 Sep 2014 19:15:03 +0300
>> To: gcrypt-devel@gnupg.org
>> Subject: Re: [PATCH 1/1] whirlpool hash amd64 assembly
>>
>> On 29/08/14 18:45, And Sch wrote:
>> <snip>
>>>
>>> That is more than twice as fast as the original on the Atom system.
>>>
>>> I tried to find a way to use macros to sort out parts of the loop, but
>>> any change in the order of the instructions slows it down a lot. There
>>> are also only 7 registers available at one time in most parts of the
>>> loop, so that makes macros and rearrangements even more difficult.
>>>

by Werner Koch | 2 Sep 09:26 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-111-g5eec04a

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  5eec04a43e6c562e956353449be931dd43dfe1cc (commit)
      from  708a3a72cc0608ed4a38ff78d8843c1b46ebf633 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 5eec04a43e6c562e956353449be931dd43dfe1cc
Author: Werner Koch <wk@gnupg.org>
Date:   Tue Sep 2 09:25:20 2014 +0200

    asm: Allow building x86 and amd64 using old compilers.

    * src/hwf-x86.c (get_xgetbv): Build only if AVX support is enabled.
    --

    Old as(1) versions do not support the xgetbv instruction.  Thus build
    this function only if asm support has been requested.

    GnuPG-bug-id: 1708
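
The pattern described in this log entry can be sketched roughly as follows. This is not the contents of src/hwf-x86.c: HYPOTHETICAL_ENABLE_AVX is a made-up stand-in for whatever macro the build system defines, and the helper names are invented for illustration. The point is only that the xgetbv mnemonic never reaches the assembler unless the feature was requested.

```c
/* HYPOTHETICAL_ENABLE_AVX is an assumed macro name, not taken from the
   patch. When it is undefined, no xgetbv instruction is emitted at all,
   so an old assembler never sees the mnemonic. */
#if defined(HYPOTHETICAL_ENABLE_AVX) \
    && (defined(__i386__) || defined(__x86_64__))
# define XGETBV_HELPER_BUILT 1

/* Read the low 32 bits of XCR0. The caller must first confirm through
   CPUID that the OS has enabled XSAVE; otherwise xgetbv faults. */
static unsigned int read_xcr0_low(void)
{
    unsigned int eax, edx;

    __asm__ __volatile__("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    (void)edx;
    return eax;
}
#else
# define XGETBV_HELPER_BUILT 0
#endif

/* Report whether the helper was compiled in. */
int xgetbv_helper_built(void)
{
    return XGETBV_HELPER_BUILT;
}

/* Return the low XCR0 bits when available and confirmed safe, else 0. */
unsigned int xcr0_low_or_zero(int osxsave_confirmed)
{
#if XGETBV_HELPER_BUILT
    if (osxsave_confirmed)
        return read_xcr0_low();
#else
    (void)osxsave_confirmed;
#endif
    return 0;
}
```

Callers then treat a result of 0 as "no AVX state enabled", so a build without the helper simply behaves like a machine without AVX.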

diff --git a/src/hwf-x86.c b/src/hwf-x86.c
index 0591b4f..7ee246d 100644
--- a/src/hwf-x86.c

by Werner Koch | 1 Sep 11:40 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-110-g708a3a7

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  708a3a72cc0608ed4a38ff78d8843c1b46ebf633 (commit)
      from  db3c0286bf159568aa315d15f9708fe2de02b022 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 708a3a72cc0608ed4a38ff78d8843c1b46ebf633
Author: Werner Koch <wk@gnupg.org>
Date:   Mon Sep 1 11:40:31 2014 +0200

    Add DCO entries for Andrei Scherer and Stefan Mueller.

    --

diff --git a/AUTHORS b/AUTHORS
index 2c92998..860dea2 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -136,6 +136,9 @@ phcoder@gmail.com
 Authors with a DCO
 ==================

+Andrei Scherer <andsch@inbox.com>

And Sch | 29 Aug 17:45 2014

[PATCH 1/1] whirlpool hash amd64 assembly

* cipher/whirlpool.c (whirlpool_transform, sbox, added macros): Added macros to support little endian AMD64 assembly implementation. Added prototype for assembly function and wrapped transform function in macro.
* cipher/whirlpool-amd64.S (_gcry_whirlpool_transform_amd64): Originally generated by gcc with optimization options, I've cleaned it up a bit.
* configure: Added build option for AMD64 assembly implementation.
* configure.ac: Added build option for AMD64 assembly implementation.
--

Benchmark on different systems:

Intel(R) Atom(TM) CPU N570   @ 1.66GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     63.40 ns/B     15.04 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     29.29 ns/B     32.56 MiB/s         - c/B


Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      7.75 ns/B     123.0 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      5.91 ns/B     161.3 MiB/s         - c/B

That is more than twice as fast as the original on the Atom system.

by Werner Koch | 29 Aug 14:54 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-109-gdb3c028

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  db3c0286bf159568aa315d15f9708fe2de02b022 (commit)
      from  e606d5f1bada1f2d21faeedd3fa2cf2dca7b274c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit db3c0286bf159568aa315d15f9708fe2de02b022
Author: Werner Koch <wk@gnupg.org>
Date:   Fri Aug 29 14:54:11 2014 +0200

    mpi: Re-indent longlong.h.
    
    --
    Indenting the cpp statements should make longlong.h better readable.

diff --git a/mpi/longlong.h b/mpi/longlong.h
index 4f33937..db98e47 100644
--- a/mpi/longlong.h
+++ b/mpi/longlong.h
@@ -1,5 +1,6 @@
 /* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
-   Note: I added some stuff for use with gnupg
+   Note: This is the Libgcrypt version

And Sch | 28 Aug 20:02 2014

[PATCH 1/1] Improved whirlpool hash performance

* cipher/whirlpool.c (whirlpool_transform, sbox, added macro): Added macro and rearranged round
function to alternate between reading from and writing to different state and key variables. Two
whirlpool_context_t variables were removed and two were replaced; the sizes of state and key doubled, so
overall the burn stack stays the same. buffer_to_block and block_xor were combined into one operation.
The sbox was converted to one large table, because it is faster than many small tables.
--
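
One of the changes listed above, combining buffer_to_block and block_xor into a single operation, can be illustrated with a sketch like the one below. It is not the code from the patch: the function names are hypothetical, and it assumes the input bytes are read as big-endian 64-bit words and XORed into an eight-word state.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS 8   /* one 512-bit block as eight 64-bit words */

/* Load eight bytes as one big-endian 64-bit word. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Two passes: convert the buffer to a temporary block, then XOR it in. */
void two_pass_xor(uint64_t state[BLOCK_WORDS], const unsigned char *buf)
{
    uint64_t block[BLOCK_WORDS];

    for (size_t i = 0; i < BLOCK_WORDS; i++)
        block[i] = load_be64(buf + 8 * i);
    for (size_t i = 0; i < BLOCK_WORDS; i++)
        state[i] ^= block[i];
}

/* One pass: load each word and XOR it into the state immediately,
   with no temporary block and no second loop. */
void fused_load_xor(uint64_t state[BLOCK_WORDS], const unsigned char *buf)
{
    for (size_t i = 0; i < BLOCK_WORDS; i++)
        state[i] ^= load_be64(buf + 8 * i);
}
```

Both functions leave the state identical; the fused form just drops the intermediate array and one pass over the data.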

Benchmark on different systems:

Intel(R) Atom(TM) CPU N570   @ 1.66GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     63.40 ns/B     15.04 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     46.21 ns/B     20.64 MiB/s         - c/B

Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      7.75 ns/B     123.0 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      6.70 ns/B     142.3 MiB/s         - c/B

This one actually shows greater improvement on the Atom system.

Jussi Kivilinna | 28 Aug 18:35 2014

[PATCH] Add new Poly1305 MAC test vectors

* tests/basic.c (check_mac): Add new test vectors for Poly1305 MAC.
--

Patch adds new test vectors for Poly1305 MAC from Internet Draft
draft-irtf-cfrg-chacha20-poly1305-01.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
---
 tests/basic.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
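
As a sanity check on the first vector in the diff below (TV#5), the expected tag can be worked out by hand. This reading assumes the three strings are, in order, the 16-byte message, the 32-byte key (r followed by s), and the expected tag, so that r = 2 (unchanged by clamping), s = 0, and the message is sixteen 0xFF bytes, i.e. m = 2^128 - 1 as a little-endian integer.

```latex
\begin{aligned}
p &= 2^{130} - 5, \qquad r = 2, \qquad s = 0, \\
c &= m + 2^{128} = (2^{128} - 1) + 2^{128} = 2^{129} - 1, \\
h &= (0 + c) \cdot r \bmod p = (2^{130} - 2) \bmod p = 3, \\
\text{tag} &= (h + s) \bmod 2^{128} = 3 .
\end{aligned}
```

The value 3, serialized as 16 little-endian bytes, is 03 followed by fifteen 00 bytes, which agrees with the expected tag in the vector. The intermediate product 2^130 - 2 lies between p and 2^130, so this vector appears aimed at implementations that must still perform a final full reduction after a partial one.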

diff --git a/tests/basic.c b/tests/basic.c
index 6d70cfd..e406db4 100644
--- a/tests/basic.c
+++ b/tests/basic.c
@@ -6008,6 +6008,72 @@ check_mac (void)
         "\xf3\x47\x7e\x7c\xd9\x54\x17\xaf\x89\xa6\xb8\x79\x4c\x31\x0c\xf0",
         NULL,
         0, 32 },
+      /* draft-irtf-cfrg-chacha20-poly1305-01 */
+      /* TV#5 */
+      { GCRY_MAC_POLY1305,
+        "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF",
+        "\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        NULL,
+        16, 32 },
+      /* TV#6 */
+      { GCRY_MAC_POLY1305,

