Chris Ballinger | 18 Sep 02:42 2014

gen-posix-lock-obj for iOS

I had trouble compiling libgpg-error 1.15 for iOS and needed to apply a patch, but I read that it was already committed upstream.

When cross-compiling for arm-apple-darwin and aarch64-apple-darwin I also needed to generate these files, so here they are. I made a little iOS utility to help people generate them in case Apple adds any more architectures in the future: https://github.com/chrisballinger/gen-posix-lock-obj-iOS

Cheers!
_______________________________________________
Gcrypt-devel mailing list
Gcrypt-devel <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gcrypt-devel
Dmitry Eremin-Solenikov | 5 Sep 00:14 2014

Libgcrypt and libgmp

Hello,

I was looking into how to improve the performance of the public key
implementation in libgcrypt. One idea was to check whether using libgmp
can improve the situation. Nowadays GMP is dual-licensed under GPLv2+ or
LGPLv3+. As far as I understand, this licensing is compatible with
libgcrypt's LGPLv2.1+.

Surprisingly, replacing even just several asm-coded functions resulted in
a nearly 20-25% speed increase (according to tests/benchmark pubkey). Do
such patches have a chance of being reviewed and accepted, or would it be
a waste of time because you prefer to keep libgcrypt independent of
libgmp? I'm not changing the gcry_mpi_t internals or removing secure
allocation/reallocation, only replacing the computational code.

Before:
Algorithm         generate 1000*priv 1000*public
------------------------------------------------
RSA 1024 bit          10ms     770ms        40ms
RSA 2048 bit         320ms    4850ms       130ms
RSA 3072 bit        1460ms   13640ms       220ms
RSA 4096 bit        1690ms   31830ms       390ms
ELG 1024 bit             -    1360ms      1190ms
ELG 2048 bit             -    4840ms      5260ms
ELG 3072 bit             -   10050ms     11400ms
DSA 1024/160             -     560ms       690ms
DSA 2048/224             -    2060ms      2860ms
DSA 3072/256             -    4220ms      5890ms
ECDSA 192 bit          0ms    1270ms      2170ms
ECDSA 224 bit         10ms    2060ms      3800ms
ECDSA 256 bit          0ms    1800ms      3200ms
ECDSA 384 bit         20ms    3770ms      6830ms
ECDSA 521 bit         30ms    8990ms     16490ms
EdDSA Ed25519          0ms    4010ms      5650ms
GOST  256 bit         10ms    1730ms      3270ms
GOST  512 bit         30ms    8030ms     15250ms

After:
Algorithm         generate 1000*priv 1000*public
------------------------------------------------
RSA 1024 bit          10ms     550ms        30ms
RSA 2048 bit         100ms    3270ms        80ms
RSA 3072 bit         120ms    8980ms       150ms
RSA 4096 bit         850ms   21110ms       250ms
ELG 1024 bit             -    1020ms       850ms
ELG 2048 bit             -    3400ms      3530ms
ELG 3072 bit             -    6960ms      7850ms
DSA 1024/160             -     380ms       470ms
DSA 2048/224             -    1390ms      1850ms
DSA 3072/256             -    2870ms      4030ms
ECDSA 192 bit          0ms    1200ms      2100ms
ECDSA 224 bit         10ms    1860ms      3470ms
ECDSA 256 bit         10ms    1730ms      3070ms
ECDSA 384 bit         10ms    3150ms      5800ms
ECDSA 521 bit         30ms    6880ms     13090ms
EdDSA Ed25519         10ms    3550ms      5110ms
GOST  256 bit          0ms    1680ms      3200ms
GOST  512 bit         20ms    6400ms     12590ms
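
One way to read the two tables is to turn each before/after pair into a relative change. The following is a minimal sketch, not part of the proposed patches; the function names are made up for illustration, and the inputs are the millisecond figures from the tables above.

```c
#include <stdio.h>

/* Percentage by which the runtime shrank: 100 * (before - after) / before. */
double time_reduction_percent(double before_ms, double after_ms)
{
    return 100.0 * (before_ms - after_ms) / before_ms;
}

/* Throughput gain: how many times more operations fit in the same time. */
double speedup_factor(double before_ms, double after_ms)
{
    return before_ms / after_ms;
}

/* Print both figures for one table row; the label is free-form text. */
void report_row(const char *label, double before_ms, double after_ms)
{
    printf("%-16s %5.1f%% less time, %.2fx throughput\n", label,
           time_reduction_percent(before_ms, after_ms),
           speedup_factor(before_ms, after_ms));
}
```

For example, report_row("RSA 2048 priv", 4850, 3270) reports about 32.6% less time (roughly 1.48x throughput), while the ECDSA 192 bit private-key row (1270 ms to 1200 ms) comes out at about 5.5%, so the gain varies noticeably by algorithm.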

-- 
With best wishes
Dmitry
And Sch | 3 Sep 17:54 2014

Re: [PATCH 1/1] whirlpool hash amd64 assembly

Sorry, I hit the wrong button when replying. Here are benchmarks on the Atom system:

Intel(R) Atom(TM) CPU N570 @ 1.66GHz

original, no patches:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     63.40 ns/B     15.04 MiB/s         - c/B

my C only optimization:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     46.21 ns/B     20.64 MiB/s         - c/B

my edited GCC x64 assembly:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     29.29 ns/B     32.56 MiB/s         - c/B

the SSE assembly by Jussi Kivilinna:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     41.19 ns/B     23.15 MiB/s         - c/B

It is odd that the SSE assembly is much faster than the non-SSE version on the i5 but slower on the
Atom system. The bswap does not explain the difference, because I also tested an SSE version with the
bswap removed and got the same results.
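
The cycles/byte column above is empty because no clock frequency was given to the benchmark. As a rough sketch, the columns relate as follows; the helper names are hypothetical, and the 1660 MHz figure assumes the Atom runs at the nominal clock from its CPU string.

```c
/* MiB/s from ns/byte: (1e9 / ns_per_byte) bytes per second, divided by 2^20. */
double mib_per_sec(double ns_per_byte)
{
    return 1e9 / ns_per_byte / (1024.0 * 1024.0);
}

/* cycles/byte = ns/byte * cycles/ns, where cycles/ns = MHz / 1000. */
double cycles_per_byte(double ns_per_byte, double cpu_mhz)
{
    return ns_per_byte * cpu_mhz / 1000.0;
}
```

Under that assumption, cycles_per_byte(29.29, 1660) gives roughly 48.6 c/B for the GCC-derived assembly and cycles_per_byte(41.19, 1660) roughly 68.4 c/B for the SSE version, and mib_per_sec(63.40) reproduces the 15.04 MiB/s shown for the unpatched code.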

-Andrei
> -----Original Message-----
> From: jussi.kivilinna <at> iki.fi
> Sent: Tue, 02 Sep 2014 18:06:28 +0300
> To: andsch <at> inbox.com
> Subject: Re: [PATCH 1/1] whirlpool hash amd64 assembly
> 
> On 02/09/14 04:02, And Sch wrote:
>> That is very impressive. The goal is accomplished then, I just wanted a
>> faster whirlpool hash in gnupg. I'm no good with assembly, so I have no
>> hope of doing better than the compiler. You may want to title the
>> assembly as sse-amd64 now.
>> 
>> Thanks
> 
> Did you have a chance to run the implementation on Atom? I'd be very
> interested to know how the performance is there.
> 
> -Jussi
> 
> ps. Please keep mailing-list in CC.
> 
>> 
>>> -----Original Message-----
>>> From: jussi.kivilinna <at> iki.fi
>>> Sent: Mon, 01 Sep 2014 19:15:03 +0300
>>> To: gcrypt-devel <at> gnupg.org
>>> Subject: Re: [PATCH 1/1] whirlpool hash amd64 assembly
>>> 
>>> On 29/08/14 18:45, And Sch wrote:
>>> <snip>
>>>> 
>>>> That is more than twice as fast as the original on the Atom system.
>>>> 
>>>> I tried to find a way to use macros to sort out parts of the loop, but
>>>> any change in the order of the instructions slows it down a lot. There
>>>> are also only 7 registers available at one time in most parts of the
>>>> loop, so that makes macros and rearrangements even more difficult.
>>>> 
>>>> I used a little endian version of the last patch I posted and gcc
>>>> -funroll-loops to generate this assembly. I've looked through it and
>>>> tried to organize it as best I can. Suggestions on how to clean it up
>>>> further would be helpful.
>>>> 
>>> 
>>> I don't agree that this is good method for creating assembly
>>> implementations. As I see it, the main point with assembly
>>> implementations is that you can do optimizations that compiler has no
>>> way
>>> of finding. For example, you could load indexes to rax/rbx/rcx/rdx
>>> registers that allow extracting not only first index byte but also
>>> second
>>> byte with just one instruction. Or, use XMM registers to store the
>>> key[]
>>> and state[] arrays instead of stack.
>>> 
>>> Well, I ended up making such implementation, which I've attached. On
>>> Intel i5-4570 (3.6 Ghz turbo), I get:
>>> 
>>>> tests/bench-slope --cpu-mhz 3600 hash whirlpool
>>> Hash:
>>>                 |  nanosecs/byte   mebibytes/sec   cycles/byte
>>>  WHIRLPOOL      |      4.28 ns/B     222.7 MiB/s     15.42 c/B
>>> 
>>> -Jussi
>>> 
by Werner Koch | 3 Sep 08:54 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-113-g8b960a8

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  8b960a807d168000d2690897a7634bd384ac1346 (commit)
      from  8a2a328742012a7c528dd007437185e4584c1e48 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 8b960a807d168000d2690897a7634bd384ac1346
Author: Werner Koch <wk <at> gnupg.org>
Date:   Wed Sep 3 08:53:43 2014 +0200

    Add a constant for a forthcoming new RNG.

    * src/gcrypt.h.in (GCRYCTL_DRBG_REINIT): New constant.

diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in
index 9d64b22..65d9ef6 100644
--- a/src/gcrypt.h.in
+++ b/src/gcrypt.h.in
@@ -330,7 +330,8 @@ enum gcry_ctl_cmds
     GCRYCTL_CLOSE_RANDOM_DEVICE = 70,
     GCRYCTL_INACTIVATE_FIPS_FLAG = 71,
     GCRYCTL_REACTIVATE_FIPS_FLAG = 72,
-    GCRYCTL_SET_SBOX = 73
+    GCRYCTL_SET_SBOX = 73,
+    GCRYCTL_DRBG_REINIT = 74
   };

 /* Perform various operations defined by CMD. */

-----------------------------------------------------------------------

Summary of changes:
 src/gcrypt.h.in |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

hooks/post-receive
--

-- 
The GNU crypto library
http://git.gnupg.org

_______________________________________________
Gnupg-commits mailing list
Gnupg-commits <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits
by Jussi Kivilinna | 2 Sep 19:42 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-112-g8a2a328

The branch, master has been updated
       via  8a2a328742012a7c528dd007437185e4584c1e48 (commit)
      from  5eec04a43e6c562e956353449be931dd43dfe1cc (commit)

- Log -----------------------------------------------------------------
commit 8a2a328742012a7c528dd007437185e4584c1e48
Author: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
Date:   Tue Sep 2 20:40:07 2014 +0300

    Add new Poly1305 MAC test vectors

    * tests/basic.c (check_mac): Add new test vectors for Poly1305 MAC.
    --

    Patch adds new test vectors for Poly1305 MAC from Internet Draft
    draft-irtf-cfrg-chacha20-poly1305-01.

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>

diff --git a/tests/basic.c b/tests/basic.c
index 6d70cfd..e406db4 100644
--- a/tests/basic.c
+++ b/tests/basic.c
@@ -6008,6 +6008,72 @@ check_mac (void)
         "\xf3\x47\x7e\x7c\xd9\x54\x17\xaf\x89\xa6\xb8\x79\x4c\x31\x0c\xf0",
         NULL,
         0, 32 },
+      /* draft-irtf-cfrg-chacha20-poly1305-01 */
+      /* TV#5 */
+      { GCRY_MAC_POLY1305,
+        "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF",
+        "\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        NULL,
+        16, 32 },
+      /* TV#6 */
+      { GCRY_MAC_POLY1305,
+        "\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF",
+        "\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        NULL,
+        16, 32 },
+      /* TV#7 */
+      { GCRY_MAC_POLY1305,
+        "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"
+        "\xF0\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"
+        "\x11\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        NULL,
+        48, 32 },
+      /* TV#8 */
+      { GCRY_MAC_POLY1305,
+        "\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"
+        "\xFB\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE\xFE"
+        "\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01",
+        "\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        NULL,
+        48, 32 },
+      /* TV#9 */
+      { GCRY_MAC_POLY1305,
+        "\xFD\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF",
+        "\x02\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\xFA\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF",
+        NULL,
+        16, 32 },
+      /* TV#10 */
+      { GCRY_MAC_POLY1305,
+        "\xE3\x35\x94\xD7\x50\x5E\x43\xB9\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x33\x94\xD7\x50\x5E\x43\x79\xCD\x01\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x01\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x14\x00\x00\x00\x00\x00\x00\x00\x55\x00\x00\x00\x00\x00\x00\x00",
+        NULL,
+        64, 32 },
+      /* TV#11 */
+      { GCRY_MAC_POLY1305,
+        "\xE3\x35\x94\xD7\x50\x5E\x43\xB9\x00\x00\x00\x00\x00\x00\x00\x00"
+        "\x33\x94\xD7\x50\x5E\x43\x79\xCD\x01\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x01\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00"
+        "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        "\x13\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
+        NULL,
+        48, 32 },
       /* from http://cr.yp.to/mac/poly1305-20050329.pdf */
       { GCRY_MAC_POLY1305,
         "\xf3\xf6",

-----------------------------------------------------------------------

Summary of changes:
 tests/basic.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)
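
The new vectors can be cross-checked with an independent implementation. Below is a byte-wise, textbook-style Poly1305 sketch; it is not libgcrypt's code, and the names poly1305_sketch and add1305 are made up for illustration. It assumes the fields of each vector are message, 32-byte key (r followed by s) and expected tag, which is what the trailing lengths 16/48/64 and 32 suggest.

```c
#include <stddef.h>
#include <stdint.h>

/* h += c over 17 little-endian base-256 digits. */
static void add1305(uint32_t h[17], const uint32_t c[17])
{
    uint32_t u = 0;
    for (int j = 0; j < 17; j++) {
        u += h[j] + c[j];
        h[j] = u & 255;
        u >>= 8;
    }
}

/* Compute the 16-byte tag of msg under a 32-byte one-time key (r || s). */
void poly1305_sketch(uint8_t tag[16], const uint8_t *msg, size_t len,
                     const uint8_t key[32])
{
    /* 2^136 - (2^130 - 5): adding it subtracts p modulo 2^136. */
    static const uint32_t minusp[17] =
        {5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 252};
    uint32_t r[17], h[17], c[17], x[17], g[17], u, mask;
    size_t j;
    int i, k;

    for (i = 0; i < 17; i++) r[i] = h[i] = 0;
    for (i = 0; i < 16; i++) r[i] = key[i];
    /* Clamp r as the Poly1305 definition requires. */
    r[3] &= 15; r[7] &= 15; r[11] &= 15; r[15] &= 15;
    r[4] &= 252; r[8] &= 252; r[12] &= 252;

    while (len > 0) {
        for (i = 0; i < 17; i++) c[i] = 0;
        for (j = 0; j < 16 && j < len; j++) c[j] = msg[j];
        c[j] = 1;                       /* pad byte after the block */
        msg += j;
        len -= j;
        add1305(h, c);
        /* x = h * r; digits at or above 2^136 wrap around with factor
           320, because 2^136 is congruent to 320 modulo 2^130 - 5. */
        for (i = 0; i < 17; i++) {
            x[i] = 0;
            for (k = 0; k < 17; k++)
                x[i] += h[k] * ((k <= i) ? r[i - k] : 320 * r[i + 17 - k]);
        }
        for (i = 0; i < 17; i++) h[i] = x[i];
        /* Propagate carries and fold bits above 2^130 back in, times 5. */
        u = 0;
        for (i = 0; i < 16; i++) { u += h[i]; h[i] = u & 255; u >>= 8; }
        u += h[16]; h[16] = u & 3;
        u = 5 * (u >> 2);
        for (i = 0; i < 16; i++) { u += h[i]; h[i] = u & 255; u >>= 8; }
        u += h[16]; h[16] = u;
    }

    /* Final reduction: keep h - p unless that went negative. */
    for (i = 0; i < 17; i++) g[i] = h[i];
    add1305(h, minusp);
    mask = -(h[16] >> 7);
    for (i = 0; i < 17; i++) h[i] ^= mask & (g[i] ^ h[i]);

    /* tag = (h + s) mod 2^128 */
    for (i = 0; i < 16; i++) c[i] = key[16 + i];
    c[16] = 0;
    add1305(h, c);
    for (i = 0; i < 16; i++) tag[i] = (uint8_t)h[i];
}
```

For TV#5 (sixteen 0xFF message bytes, r = 2, s = 0) the block value is 2^129 - 1, so h = 2 * (2^129 - 1) = 2^130 - 2, which reduces to 3 modulo 2^130 - 5; the sketch returns 03 followed by fifteen zero bytes, matching the expected tag in the vector.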

Jussi Kivilinna | 2 Sep 17:06 2014

Re: [PATCH 1/1] whirlpool hash amd64 assembly

On 02/09/14 04:02, And Sch wrote:
> That is very impressive. The goal is accomplished then, I just wanted a faster whirlpool hash in gnupg. I'm
> no good with assembly, so I have no hope of doing better than the compiler. You may want to title the assembly
> as sse-amd64 now.
> 
> Thanks

Did you have a chance to run the implementation on Atom? I'd be very interested to know how the performance is there.

-Jussi

ps. Please keep mailing-list in CC.

> 
>> -----Original Message-----
>> From: jussi.kivilinna <at> iki.fi
>> Sent: Mon, 01 Sep 2014 19:15:03 +0300
>> To: gcrypt-devel <at> gnupg.org
>> Subject: Re: [PATCH 1/1] whirlpool hash amd64 assembly
>>
>> On 29/08/14 18:45, And Sch wrote:
>> <snip>
>>>
>>> That is more than twice as fast as the original on the Atom system.
>>>
>>> I tried to find a way to use macros to sort out parts of the loop, but
>>> any change in the order of the instructions slows it down a lot. There
>>> are also only 7 registers available at one time in most parts of the
>>> loop, so that makes macros and rearrangements even more difficult.
>>>
>>> I used a little endian version of the last patch I posted and gcc
>>> -funroll-loops to generate this assembly. I've looked through it and
>>> tried to organize it as best I can. Suggestions on how to clean it up
>>> further would be helpful.
>>>
>>
>> I don't agree that this is good method for creating assembly
>> implementations. As I see it, the main point with assembly
>> implementations is that you can do optimizations that compiler has no way
>> of finding. For example, you could load indexes to rax/rbx/rcx/rdx
>> registers that allow extracting not only first index byte but also second
>> byte with just one instruction. Or, use XMM registers to store the key[]
>> and state[] arrays instead of stack.
>>
>> Well, I ended up making such implementation, which I've attached. On
>> Intel i5-4570 (3.6 Ghz turbo), I get:
>>
>>> tests/bench-slope --cpu-mhz 3600 hash whirlpool
>> Hash:
>>                 |  nanosecs/byte   mebibytes/sec   cycles/byte
>>  WHIRLPOOL      |      4.28 ns/B     222.7 MiB/s     15.42 c/B
>>
>> -Jussi
>>
by Werner Koch | 2 Sep 09:26 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-111-g5eec04a

The branch, master has been updated
       via  5eec04a43e6c562e956353449be931dd43dfe1cc (commit)
      from  708a3a72cc0608ed4a38ff78d8843c1b46ebf633 (commit)

- Log -----------------------------------------------------------------
commit 5eec04a43e6c562e956353449be931dd43dfe1cc
Author: Werner Koch <wk <at> gnupg.org>
Date:   Tue Sep 2 09:25:20 2014 +0200

    asm: Allow building x86 and amd64 using old compilers.

    * src/hwf-x86.c (get_xgetbv): Build only if AVX support is enabled.
    --

    Old as(1) versions do not support the xgetbv instruction.  Thus build
    this function only if asm support has been requested.

    GnuPG-bug-id: 1708

diff --git a/src/hwf-x86.c b/src/hwf-x86.c
index 0591b4f..7ee246d 100644
--- a/src/hwf-x86.c
+++ b/src/hwf-x86.c
@@ -96,6 +96,7 @@ get_cpuid(unsigned int in, unsigned int *eax, unsigned int *ebx,
     *edx = regs[3];
 }

+#if defined(ENABLE_AVX_SUPPORT) || defined(ENABLE_AVX2_SUPPORT)
 static unsigned int
 get_xgetbv(void)
 {
@@ -109,6 +110,7 @@ get_xgetbv(void)

   return t_eax;
 }
+#endif /* ENABLE_AVX_SUPPORT || ENABLE_AVX2_SUPPORT */

 #endif /* i386 && GNUC */

@@ -145,6 +147,7 @@ get_cpuid(unsigned int in, unsigned int *eax, unsigned int *ebx,
     *edx = regs[3];
 }

+#if defined(ENABLE_AVX_SUPPORT) || defined(ENABLE_AVX2_SUPPORT)
 static unsigned int
 get_xgetbv(void)
 {
@@ -158,6 +161,7 @@ get_xgetbv(void)

   return t_eax;
 }
+#endif /* ENABLE_AVX_SUPPORT || ENABLE_AVX2_SUPPORT */

 #endif /* x86-64 && GNUC */

-----------------------------------------------------------------------

Summary of changes:
 src/hwf-x86.c |    4 ++++
 1 file changed, 4 insertions(+)
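
The guard pattern in this commit only works if every caller of the helper sits behind the same condition. A minimal sketch of that idea follows; it does not execute xgetbv at all (a stand-in returns a fixed value so the example runs anywhere), and fake_get_xgetbv and os_supports_avx are hypothetical names, as is the hand-written #define that configure would normally provide.

```c
/* Hypothetical: normally defined by configure, set by hand for this sketch. */
#define ENABLE_AVX_SUPPORT 1

#if defined(ENABLE_AVX_SUPPORT) || defined(ENABLE_AVX2_SUPPORT)
/* Stand-in for the real helper: returns a fixed XCR0 value instead of
   executing the instruction. */
static unsigned int fake_get_xgetbv(void)
{
    return 0x7;  /* x87, SSE and AVX state bits set */
}
#endif

/* Returns 1 when the (simulated) OS has enabled SSE and AVX register state. */
int os_supports_avx(void)
{
#if defined(ENABLE_AVX_SUPPORT) || defined(ENABLE_AVX2_SUPPORT)
    /* XCR0 bit 1 covers SSE state and bit 2 covers AVX state. */
    return (fake_get_xgetbv() & 0x6) == 0x6;
#else
    return 0;  /* helper not compiled in, so AVX is never reported */
#endif
}
```

With neither macro defined, the helper disappears from the translation unit, so an assembler that lacks the instruction never sees it, and the caller falls back to reporting no AVX support.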

by Werner Koch | 1 Sep 11:40 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-110-g708a3a7

The branch, master has been updated
       via  708a3a72cc0608ed4a38ff78d8843c1b46ebf633 (commit)
      from  db3c0286bf159568aa315d15f9708fe2de02b022 (commit)

- Log -----------------------------------------------------------------
commit 708a3a72cc0608ed4a38ff78d8843c1b46ebf633
Author: Werner Koch <wk <at> gnupg.org>
Date:   Mon Sep 1 11:40:31 2014 +0200

    Add DCO entries for Andrei Scherer and Stefan Mueller.

    --

diff --git a/AUTHORS b/AUTHORS
index 2c92998..860dea2 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -136,6 +136,9 @@ phcoder@gmail.com
 Authors with a DCO
 ==================

+Andrei Scherer <andsch@inbox.com>
+2014-0822:BF7CEF794F9.000003F0andsch@inbox.com:
+
 Christian Aistleitner <christian@quelltextlich.at>
 2013-02-26:20130226110144.GA12678@quelltextlich.at:

@@ -163,6 +166,9 @@ Rafaël Carré <funman@videolan.org>
 Sergey V. <sftp.mtuci@gmail.com>
 2013-11-07:2066221.5IYa7Yq760@darkstar:

+Stephan Mueller <smueller@chronox.de>
+2014-08-22:2008899.25OeoelVVA@myon.chronox.de:
+
 Tomáš Mráz <tm@t8m.info>
 2012-04-16:1334571250.5056.52.camel@vespa.frost.loc:

-----------------------------------------------------------------------

Summary of changes:
 AUTHORS |    6 ++++++
 1 file changed, 6 insertions(+)

And Sch | 29 Aug 17:45 2014

[PATCH 1/1] whirlpool hash amd64 assembly

* cipher/whirlpool.c (whirlpool_transform, sbox, added macros): Added macros to support little endian AMD64 assembly implementation. Added prototype for assembly function and wrapped transform function in macro.
* cipher/whirlpool-amd64.S (_gcry_whirlpool_transform_amd64): Originally generated by gcc with optimization options, I've cleaned it up a bit.
* configure: Added build option for AMD64 assembly implementation.
* configure.ac: Added build option for AMD64 assembly implementation.
--

Benchmark on different systems:

Intel(R) Atom(TM) CPU N570 @ 1.66GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     63.40 ns/B     15.04 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     29.29 ns/B     32.56 MiB/s         - c/B


Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      7.75 ns/B     123.0 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      5.91 ns/B     161.3 MiB/s         - c/B

That is more than twice as fast as the original on the Atom system.

I tried to find a way to use macros to sort out parts of the loop, but any change in the order of the instructions slows it down a lot. There are also only 7 registers available at one time in most parts of the loop, so that makes macros and rearrangements even more difficult.

I used a little endian version of the last patch I posted and gcc -funroll-loops to generate this assembly. I've looked through it and tried to organize it as best I can. Suggestions on how to clean it up further would be helpful.
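
For readers who prefer C to the generated assembly, the lookup pattern in the round loop can be sketched as below. This is only an illustration of the indexing: the tables here are placeholders that drop each byte into its own lane, not the real Whirlpool tables, and the names T, init_placeholder_tables and lookup_round are hypothetical. The assembly addresses its tables at offsets 0, 2048, ..., 14336 from the sbox label, i.e. eight tables of 256 eight-byte entries.

```c
#include <stdint.h>

/* Placeholder tables (hypothetical): T[k][b] puts byte b into byte lane k.
   The patch uses eight real 256-entry tables stored back to back. */
static uint64_t T[8][256];

void init_placeholder_tables(void)
{
    for (int k = 0; k < 8; k++)
        for (int b = 0; b < 256; b++)
            T[k][b] = (uint64_t)b << (8 * k);
}

/* One table-driven pass over eight 64-bit words in little-endian layout:
   output word i is the XOR of eight lookups, one per table. */
void lookup_round(uint64_t out[8], const uint64_t in[8])
{
    for (int i = 0; i < 8; i++) {
        uint64_t w = 0;
        for (int k = 0; k < 8; k++) {
            /* Byte k of word (i - k) mod 8 selects an entry of table k. */
            uint8_t b = (uint8_t)(in[(i - k + 8) % 8] >> (8 * k));
            w ^= T[k][b];
        }
        out[i] = w;
    }
}
```

The same pass is applied once to the state and once to the key in each round, which is why every numbered block in the assembly has two runs of eight lookups before the results are XORed together and stored.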

Signed-off-by: Andrei Scherer <andsch <at> inbox.com>

---

diff -ruNp libgcrypt-1.6.2/cipher/whirlpool-amd64.S libgcrypt-1.6.3/cipher/whirlpool-amd64.S
--- libgcrypt-1.6.2/cipher/whirlpool-amd64.S	1969-12-31 18:00:00.000000000 -0600
+++ libgcrypt-1.6.3/cipher/whirlpool-amd64.S	2014-08-28 20:06:34.691538742 -0500
@@ -0,0 +1,1068 @@
+.p2align 4,,15
+.globl _gcry_whirlpool_transform_amd64
+.type	_gcry_whirlpool_transform_amd64, @function
+_gcry_whirlpool_transform_amd64:
+	.cfi_startproc
+	pushq	%r15
+	.cfi_def_cfa_offset 16
+	.cfi_offset 15, -16
+	pushq	%r14
+	.cfi_def_cfa_offset 24
+	.cfi_offset 14, -24
+	pushq	%r13
+	.cfi_def_cfa_offset 32
+	.cfi_offset 13, -32
+	pushq	%r12
+	.cfi_def_cfa_offset 40
+	.cfi_offset 12, -40
+	pushq	%rbp
+	.cfi_def_cfa_offset 48
+	.cfi_offset 6, -48
+	pushq	%rbx
+	.cfi_def_cfa_offset 56
+	.cfi_offset 3, -56
+	subq	$152, %rsp
+	.cfi_def_cfa_offset 208
+
+	/* load hash (%rdi) */
+	/* store hash (%rdi) into key -(%rsp) */
+	/* xor block (%rsi) with hash (%rdi) */
+	/* store result into state +(%rsp) */
+	movq	(%rdi), %r15
+	movq	%r15, -112(%rsp)
+	xorq	(%rsi), %r15
+	movq	%r15, 16(%rsp)
+
+	movq	8(%rdi), %r13
+	movq	%r13, -104(%rsp)
+	xorq	8(%rsi), %r13
+	movq	%r13, 24(%rsp)
+
+	movq	16(%rdi), %rbp
+	movq	%rbp, -96(%rsp)
+	xorq	16(%rsi), %rbp
+	movq	%rbp, 32(%rsp)
+
+	movq	24(%rdi), %r11
+	movq	%r11, -88(%rsp)
+	xorq	24(%rsi), %r11
+	movq	%r11, 40(%rsp)
+
+	movq	32(%rdi), %r10
+	movq	%r10, -80(%rsp)
+	xorq	32(%rsi), %r10
+	movq	%r10, 48(%rsp)
+
+	movq	40(%rdi), %rcx
+	movq	%rcx, -72(%rsp)
+	xorq	40(%rsi), %rcx
+	movq	%rcx, 56(%rsp)
+
+	movq	48(%rdi), %rdx
+	movq	%rdx, -64(%rsp)
+	xorq	48(%rsi), %rdx
+	movq	%rdx, 64(%rsp)
+
+	movq	56(%rdi), %rax
+	movq	%rax, -56(%rsp)
+	xorq	56(%rsi), %rax
+	movq	%rax, 72(%rsp)
+
+	/* store result into hash (%rdi) */
+	movq	%r15, (%rdi)
+	movq	%r13, 8(%rdi)
+	movq	%rbp, 16(%rdi)
+	movq	%r11, 24(%rdi)
+	movq	%r10, 32(%rdi)
+	movq	%rcx, 40(%rdi)
+	movq	%rdx, 48(%rdi)
+	movq	%rax, 56(%rdi)
+
+	/* load first rc address */
+	leaq	rc(%rip), %r12
+
+	/* load last rc address */
+	leaq	80+rc(%rip), %rbp
+
+	/* load sbox address*/
+	leaq	sbox(%rip), %r8
+
+	/* zero alternator */
+	xorl	%r10d, %r10d
+
+.p2align 4,,10
+.p2align 3
+.Lroundloop:
+	/* save and flip alternator */
+	movl	%r10d, %edx
+	xorl	$1, %r10d
+	salq	$6, %rdx
+
+	/* 0 state, key */
+	/* load state +(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	/* load key -(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	movzbl	31(%rsp,%rdx), %ebx
+	movzbl	16(%rsp,%rdx), %ecx
+	movzbl	73(%rsp,%rdx), %eax
+	movzbl	66(%rsp,%rdx), %r9d
+	movq	14336(%r8,%rbx,8), %r11
+	movzbl	-97(%rsp,%rdx), %ebx
+	movzbl	59(%rsp,%rdx), %esi
+	movzbl	52(%rsp,%rdx), %r15d
+	xorq	(%r8,%rcx,8), %r11
+	movzbl	-112(%rsp,%rdx), %ecx
+	movzbl	45(%rsp,%rdx), %r14d
+	movzbl	38(%rsp,%rdx), %r13d
+	xorq	2048(%r8,%rax,8), %r11
+	movq	14336(%r8,%rbx,8), %rax
+	movzbl	-90(%rsp,%rdx), %ebx
+	xorq	4096(%r8,%r9,8), %r11
+	xorq	(%r8,%rcx,8), %rax
+	movzbl	-55(%rsp,%rdx), %r9d
+	xorq	6144(%r8,%rsi,8), %r11
+	movzbl	-62(%rsp,%rdx), %esi
+	xorq	2048(%r8,%r9,8), %rax
+	xorq	8192(%r8,%r15,8), %r11
+	movzbl	-69(%rsp,%rdx), %r15d
+	xorq	4096(%r8,%rsi,8), %rax
+	xorq	10240(%r8,%r14,8), %r11
+	movzbl	-76(%rsp,%rdx), %r14d
+	xorq	6144(%r8,%r15,8), %rax
+	xorq	12288(%r8,%r13,8), %r11
+	movzbl	-83(%rsp,%rdx), %r13d
+	xorq	8192(%r8,%r14,8), %rax
+	xorq	10240(%r8,%r13,8), %rax
+	xorq	12288(%r8,%rbx,8), %rax
+
+	/* xor rc with key */
+	xorq	(%r12), %rax
+
+	/* xor key with state */
+	xorq	%rax, %r11
+
+	/* this alternator is left unflipped */
+	/* movsxd in intel syntax */
+	movslq	%r10d, %rcx
+	salq	$6, %rcx
+
+	/* store key */
+	movq	%rax, -112(%rsp,%rcx)
+
+	/* store state */
+	movq	%r11, 16(%rsp,%rcx)
+
+	/* 1 state, key */
+	/* load state +(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	/* load key -(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	movzbl	39(%rsp,%rdx), %eax
+	movzbl	24(%rsp,%rdx), %r11d
+	movzbl	17(%rsp,%rdx), %esi
+	movzbl	74(%rsp,%rdx), %r15d
+	movq	14336(%r8,%rax,8), %r9
+	movzbl	67(%rsp,%rdx), %r14d
+	movzbl	60(%rsp,%rdx), %r13d
+	movzbl	53(%rsp,%rdx), %ebx
+	xorq	(%r8,%r11,8), %r9
+	movzbl	-89(%rsp,%rdx), %r11d
+	movzbl	46(%rsp,%rdx), %eax
+	xorq	2048(%r8,%rsi,8), %r9
+	movzbl	-104(%rsp,%rdx), %esi
+	xorq	4096(%r8,%r15,8), %r9
+	movq	14336(%r8,%r11,8), %r15
+	movzbl	-75(%rsp,%rdx), %r11d
+	xorq	6144(%r8,%r14,8), %r9
+	xorq	(%r8,%rsi,8), %r15
+	movzbl	-111(%rsp,%rdx), %r14d
+	movzbl	-82(%rsp,%rdx), %esi
+	xorq	8192(%r8,%r13,8), %r9
+	movzbl	-54(%rsp,%rdx), %r13d
+	xorq	2048(%r8,%r14,8), %r15
+	xorq	10240(%r8,%rbx,8), %r9
+	movzbl	-61(%rsp,%rdx), %ebx
+	xorq	4096(%r8,%r13,8), %r15
+	xorq	12288(%r8,%rax,8), %r9
+	movzbl	-68(%rsp,%rdx), %eax
+	xorq	6144(%r8,%rbx,8), %r15
+	xorq	8192(%r8,%rax,8), %r15
+	xorq	10240(%r8,%r11,8), %r15
+	xorq	12288(%r8,%rsi,8), %r15
+
+	/* xor key with state */
+	xorq	%r15, %r9
+
+	/* store key */
+	movq	%r15, -104(%rsp,%rcx)
+
+	/* store state */
+	movq	%r9, 24(%rsp,%rcx)
+
+	/* 2 state, key */
+	/* load state +(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	/* load key -(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	movzbl	47(%rsp,%rdx), %r9d
+	movzbl	32(%rsp,%rdx), %r15d
+	movzbl	25(%rsp,%rdx), %r13d
+	movzbl	18(%rsp,%rdx), %ebx
+	movq	14336(%r8,%r9,8), %r14
+	movzbl	75(%rsp,%rdx), %eax
+	movzbl	68(%rsp,%rdx), %r11d
+	movzbl	61(%rsp,%rdx), %esi
+	xorq	(%r8,%r15,8), %r14
+	movzbl	-81(%rsp,%rdx), %r15d
+	movzbl	54(%rsp,%rdx), %r9d
+	xorq	2048(%r8,%r13,8), %r14
+	movzbl	-96(%rsp,%rdx), %r13d
+	xorq	4096(%r8,%rbx,8), %r14
+	movzbl	-103(%rsp,%rdx), %ebx
+	xorq	6144(%r8,%rax,8), %r14
+	movq	14336(%r8,%r15,8), %rax
+	movzbl	-67(%rsp,%rdx), %r15d
+	xorq	8192(%r8,%r11,8), %r14
+	xorq	(%r8,%r13,8), %rax
+	movzbl	-110(%rsp,%rdx), %r11d
+	movzbl	-74(%rsp,%rdx), %r13d
+	xorq	10240(%r8,%rsi,8), %r14
+	xorq	2048(%r8,%rbx,8), %rax
+	movzbl	-53(%rsp,%rdx), %esi
+	xorq	12288(%r8,%r9,8), %r14
+	xorq	4096(%r8,%r11,8), %rax
+	movzbl	-60(%rsp,%rdx), %r9d
+	xorq	6144(%r8,%rsi,8), %rax
+	xorq	8192(%r8,%r9,8), %rax
+	xorq	10240(%r8,%r15,8), %rax
+	xorq	12288(%r8,%r13,8), %rax
+
+	/* xor key with state */
+	xorq	%rax, %r14
+
+	/* store key */
+	movq	%rax, -96(%rsp,%rcx)
+
+	/* store state */
+	movq	%r14, 32(%rsp,%rcx)
+
+	/* 3 state, key */
+	/* load state +(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	/* load key -(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	movzbl	55(%rsp,%rdx), %r14d
+	movzbl	40(%rsp,%rdx), %eax
+	movzbl	33(%rsp,%rdx), %ebx
+	movzbl	26(%rsp,%rdx), %r11d
+	movq	14336(%r8,%r14,8), %rsi
+	movzbl	19(%rsp,%rdx), %r9d
+	movzbl	76(%rsp,%rdx), %r15d
+	movzbl	69(%rsp,%rdx), %r13d
+	xorq	(%r8,%rax,8), %rsi
+	movzbl	-73(%rsp,%rdx), %eax
+	movzbl	62(%rsp,%rdx), %r14d
+	xorq	2048(%r8,%rbx,8), %rsi
+	movzbl	-88(%rsp,%rdx), %ebx
+	xorq	4096(%r8,%r11,8), %rsi
+	movq	14336(%r8,%rax,8), %r11
+	movzbl	-59(%rsp,%rdx), %eax
+	xorq	6144(%r8,%r9,8), %rsi
+	xorq	(%r8,%rbx,8), %r11
+	movzbl	-95(%rsp,%rdx), %r9d
+	movzbl	-66(%rsp,%rdx), %ebx
+	xorq	8192(%r8,%r15,8), %rsi
+	movzbl	-102(%rsp,%rdx), %r15d
+	xorq	2048(%r8,%r9,8), %r11
+	xorq	10240(%r8,%r13,8), %rsi
+	movzbl	-109(%rsp,%rdx), %r13d
+	xorq	4096(%r8,%r15,8), %r11
+	xorq	12288(%r8,%r14,8), %rsi
+	movzbl	-52(%rsp,%rdx), %r14d
+	xorq	6144(%r8,%r13,8), %r11
+	xorq	8192(%r8,%r14,8), %r11
+	xorq	10240(%r8,%rax,8), %r11
+	xorq	12288(%r8,%rbx,8), %r11
+
+	/* xor key with state */
+	xorq	%r11, %rsi
+
+	/* store key */
+	movq	%r11, -88(%rsp,%rcx)
+
+	/* store state */
+	movq	%rsi, 40(%rsp,%rcx)
+
+	/* 4 state, key */
+	/* load state +(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	/* load key -(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	movzbl	63(%rsp,%rdx), %esi
+	movzbl	48(%rsp,%rdx), %r11d
+	movzbl	41(%rsp,%rdx), %r15d
+	movzbl	34(%rsp,%rdx), %r13d
+	movq	14336(%r8,%rsi,8), %r9
+	movzbl	27(%rsp,%rdx), %r14d
+	movzbl	20(%rsp,%rdx), %eax
+	movzbl	77(%rsp,%rdx), %ebx
+	xorq	(%r8,%r11,8), %r9
+	movzbl	-65(%rsp,%rdx), %r11d
+	movzbl	70(%rsp,%rdx), %esi
+	xorq	2048(%r8,%r15,8), %r9
+	movzbl	-80(%rsp,%rdx), %r15d
+	xorq	4096(%r8,%r13,8), %r9
+	movq	14336(%r8,%r11,8), %r13
+	movzbl	-51(%rsp,%rdx), %r11d
+	xorq	6144(%r8,%r14,8), %r9
+	xorq	(%r8,%r15,8), %r13
+	movzbl	-87(%rsp,%rdx), %r14d
+	movzbl	-58(%rsp,%rdx), %r15d
+	xorq	8192(%r8,%rax,8), %r9
+	movzbl	-94(%rsp,%rdx), %eax
+	xorq	2048(%r8,%r14,8), %r13
+	xorq	10240(%r8,%rbx,8), %r9
+	movzbl	-101(%rsp,%rdx), %ebx
+	xorq	4096(%r8,%rax,8), %r13
+	xorq	12288(%r8,%rsi,8), %r9
+	movzbl	-108(%rsp,%rdx), %esi
+	xorq	6144(%r8,%rbx,8), %r13
+	xorq	8192(%r8,%rsi,8), %r13
+	xorq	10240(%r8,%r11,8), %r13
+	xorq	12288(%r8,%r15,8), %r13
+
+	/* xor key with state */
+	xorq	%r13, %r9
+
+	/* store key */
+	movq	%r13, -80(%rsp,%rcx)
+
+	/* store state */
+	movq	%r9, 48(%rsp,%rcx)
+
+	/* 5 state, key */
+	/* load state +(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	/* load key -(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	movzbl	71(%rsp,%rdx), %r9d
+	movzbl	56(%rsp,%rdx), %r13d
+	movzbl	49(%rsp,%rdx), %eax
+	movzbl	42(%rsp,%rdx), %ebx
+	movq	14336(%r8,%r9,8), %r14
+	movzbl	35(%rsp,%rdx), %esi
+	movzbl	28(%rsp,%rdx), %r11d
+	movzbl	21(%rsp,%rdx), %r15d
+	xorq	(%r8,%r13,8), %r14
+	movzbl	-57(%rsp,%rdx), %r13d
+	movzbl	78(%rsp,%rdx), %r9d
+	xorq	2048(%r8,%rax,8), %r14
+	movzbl	-72(%rsp,%rdx), %eax
+	xorq	4096(%r8,%rbx,8), %r14
+	movzbl	-79(%rsp,%rdx), %ebx
+	xorq	6144(%r8,%rsi,8), %r14
+	movq	14336(%r8,%r13,8), %rsi
+	movzbl	-107(%rsp,%rdx), %r13d
+	xorq	8192(%r8,%r11,8), %r14
+	xorq	(%r8,%rax,8), %rsi
+	movzbl	-86(%rsp,%rdx), %r11d
+	movzbl	-50(%rsp,%rdx), %eax
+	xorq	10240(%r8,%r15,8), %r14
+	xorq	2048(%r8,%rbx,8), %rsi
+	movzbl	-93(%rsp,%rdx), %r15d
+	xorq	12288(%r8,%r9,8), %r14
+	xorq	4096(%r8,%r11,8), %rsi
+	movzbl	-100(%rsp,%rdx), %r9d
+	xorq	6144(%r8,%r15,8), %rsi
+	xorq	8192(%r8,%r9,8), %rsi
+	xorq	10240(%r8,%r13,8), %rsi
+	xorq	12288(%r8,%rax,8), %rsi
+
+	/* xor key with state */
+	xorq	%rsi, %r14
+
+	/* store key */
+	movq	%rsi, -72(%rsp,%rcx)
+
+	/* store state */
+	movq	%r14, 56(%rsp,%rcx)
+
+	/* 6 state, key */
+	/* load state +(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	/* load key -(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	movzbl	79(%rsp,%rdx), %r14d
+	movzbl	64(%rsp,%rdx), %ebx
+	movzbl	57(%rsp,%rdx), %r11d
+	movzbl	50(%rsp,%rdx), %r15d
+	movq	14336(%r8,%r14,8), %rsi
+	movzbl	43(%rsp,%rdx), %r9d
+	movzbl	36(%rsp,%rdx), %r13d
+	movzbl	29(%rsp,%rdx), %eax
+	xorq	(%r8,%rbx,8), %rsi
+	movzbl	-64(%rsp,%rdx), %ebx
+	movzbl	22(%rsp,%rdx), %r14d
+	xorq	2048(%r8,%r11,8), %rsi
+	movzbl	-49(%rsp,%rdx), %r11d
+	xorq	4096(%r8,%r15,8), %rsi
+	movq	14336(%r8,%r11,8), %r15
+	movzbl	-99(%rsp,%rdx), %r11d
+	xorq	6144(%r8,%r9,8), %rsi
+	movzbl	-71(%rsp,%rdx), %r9d
+	xorq	8192(%r8,%r13,8), %rsi
+	movzbl	-78(%rsp,%rdx), %r13d
+	xorq	10240(%r8,%rax,8), %rsi
+	xorq	(%r8,%rbx,8), %r15
+	movzbl	-85(%rsp,%rdx), %eax
+	movzbl	-106(%rsp,%rdx), %ebx
+	xorq	12288(%r8,%r14,8), %rsi
+	xorq	2048(%r8,%r9,8), %r15
+	movzbl	-92(%rsp,%rdx), %r14d
+	xorq	4096(%r8,%r13,8), %r15
+	xorq	6144(%r8,%rax,8), %r15
+	xorq	8192(%r8,%r14,8), %r15
+	xorq	10240(%r8,%r11,8), %r15
+	xorq	12288(%r8,%rbx,8), %r15
+
+	/* xor key with state */
+	xorq	%r15, %rsi
+
+	/* store key */
+	movq	%r15, -64(%rsp,%rcx)
+
+	/* store state */
+	movq	%rsi, 64(%rsp,%rcx)
+
+	/* 7 state, key */
+	/* load state +(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	/* load key -(%rsp) */
+	/* load and xor sboxes (%r8) together */
+	movzbl	23(%rsp,%rdx), %esi
+	movzbl	72(%rsp,%rdx), %r15d
+	movzbl	65(%rsp,%rdx), %r13d
+	movzbl	58(%rsp,%rdx), %eax
+	movq	14336(%r8,%rsi,8), %r9
+	movzbl	51(%rsp,%rdx), %r14d
+	movzbl	44(%rsp,%rdx), %r11d
+	movzbl	37(%rsp,%rdx), %ebx
+	xorq	(%r8,%r15,8), %r9
+	movzbl	-105(%rsp,%rdx), %r15d
+	movzbl	30(%rsp,%rdx), %esi
+	xorq	2048(%r8,%r13,8), %r9
+	movzbl	-56(%rsp,%rdx), %r13d
+	xorq	4096(%r8,%rax,8), %r9
+	movzbl	-63(%rsp,%rdx), %eax
+	xorq	6144(%r8,%r14,8), %r9
+	movq	14336(%r8,%r15,8), %r14
+	movzbl	-91(%rsp,%rdx), %r15d
+	xorq	8192(%r8,%r11,8), %r9
+	xorq	(%r8,%r13,8), %r14
+	movzbl	-70(%rsp,%rdx), %r11d
+	xorq	10240(%r8,%rbx,8), %r9
+	xorq	2048(%r8,%rax,8), %r14
+	movzbl	-77(%rsp,%rdx), %ebx
+	xorq	12288(%r8,%rsi,8), %r9
+	xorq	4096(%r8,%r11,8), %r14
+	movzbl	-84(%rsp,%rdx), %esi
+	movzbl	-98(%rsp,%rdx), %edx
+	xorq	6144(%r8,%rbx,8), %r14
+	xorq	8192(%r8,%rsi,8), %r14
+	xorq	10240(%r8,%r15,8), %r14
+	xorq	12288(%r8,%rdx,8), %r14
+
+	/* xor key with state */
+	xorq	%r14, %r9
+
+	/* store key */
+	movq	%r14, -56(%rsp,%rcx)
+
+	/* store state */
+	movq	%r9, 72(%rsp,%rcx)
+
+	/* increment and compare rc addresses */
+	addq	$8, %r12
+	cmpq	%rbp, %r12
+	jne	.Lroundloop
+
+	/* load state +(%rsp) */
+	/* xor state +(%rsp) with hash (%rdi) */
+	movq	16(%rsp), %r12
+	xorq	%r12, (%rdi)
+	movq	24(%rsp), %rbp
+	xorq	%rbp, 8(%rdi)
+	movq	64(%rsp), %r13
+	xorq	%r13, 48(%rdi)
+	movq	72(%rsp), %r14
+	xorq	%r14, 56(%rdi)
+	movq	32(%rsp), %r10
+	xorq	%r10, 16(%rdi)
+	movq	40(%rsp), %r8
+	xorq	%r8, 24(%rdi)
+	movq	48(%rsp), %rcx
+	xorq	%rcx, 32(%rdi)
+	movq	56(%rsp), %r9
+	xorq	%r9, 40(%rdi)
+
+	addq	$152, %rsp
+	.cfi_def_cfa_offset 56
+	popq	%rbx
+	.cfi_def_cfa_offset 48
+	popq	%rbp
+	.cfi_def_cfa_offset 40
+	popq	%r12
+	.cfi_def_cfa_offset 32
+	popq	%r13
+	.cfi_def_cfa_offset 24
+	popq	%r14
+	.cfi_def_cfa_offset 16
+	popq	%r15
+	.cfi_def_cfa_offset 8
+
+	/* burn_stack */
+	movl $(4 * 64 + 2 * 4), %eax
+
+	ret
+	.cfi_endproc
+.section	.rodata
+.align 32
+.type	rc, @object
+.size	rc, 80
+/* round constants in little endian */
+rc:
+	.quad 0x4f01b887e8c62318
+	.quad 0x52916f79f5d2a636
+	.quad 0x357b0ca38e9bbc60
+	.quad 0x57fe4b2ec2d7e01d
+	.quad 0xda4af09fe5377715
+	.quad 0x856ba0b10a29c958
+	.quad 0x67053ecbf4105dbd
+	.quad 0xd8957da78b4127e4
+	.quad 0x9e4717dd667ceefb
+	.quad 0x33835aad07bf2dca
+.align 32
+.type	sbox, @object
+.size	sbox, 16384
+/* rotated sboxes in little endian */
+sbox:
+	.quad 0xd83078c018601818, 0x2646af05238c2323, 0xb891f97ec63fc6c6, 0xfbcd6f13e887e8e8
+	.quad 0xcb13a14c87268787, 0x116d62a9b8dab8b8, 0x0902050801040101, 0x0d9e6e424f214f4f
+	.quad 0x9b6ceead36d83636, 0xff510459a6a2a6a6, 0x0cb9bdded26fd2d2, 0x0ef706fbf5f3f5f5
+	.quad 0x96f280ef79f97979, 0x30dece5f6fa16f6f, 0x6d3feffc917e9191, 0xf8a407aa52555252
+	.quad 0x47c0fd27609d6060, 0x35657689bccabcbc, 0x372bcdac9b569b9b, 0x8a018c048e028e8e
+	.quad 0xd25b1571a3b6a3a3, 0x6c183c600c300c0c, 0x84f68aff7bf17b7b, 0x806ae1b535d43535
+	.quad 0xf53a69e81d741d1d, 0xb3dd4753e0a7e0e0, 0x21b3acf6d77bd7d7, 0x9c99ed5ec22fc2c2
+	.quad 0x435c966d2eb82e2e, 0x29967a624b314b4b, 0x5de121a3fedffefe, 0xd5ae168257415757
+	.quad 0xbd2a41a815541515, 0xe8eeb69f77c17777, 0x926eeba537dc3737, 0x9ed7567be5b3e5e5
+	.quad 0x1323d98c9f469f9f, 0x23fd17d3f0e7f0f0, 0x20947f6a4a354a4a, 0x44a9959eda4fdada
+	.quad 0xa2b025fa587d5858, 0xcf8fca06c903c9c9, 0x7c528d5529a42929, 0x5a1422500a280a0a
+	.quad 0x507f4fe1b1feb1b1, 0xc95d1a69a0baa0a0, 0x14d6da7f6bb16b6b, 0xd917ab5c852e8585
+	.quad 0x3c677381bdcebdbd, 0x8fba34d25d695d5d, 0x9020508010401010, 0x07f503f3f4f7f4f4
+	.quad 0xdd8bc016cb0bcbcb, 0xd37cc6ed3ef83e3e, 0x2d0a112805140505, 0x78cee61f67816767
+	.quad 0x97d55373e4b7e4e4, 0x024ebb25279c2727, 0x7382583241194141, 0xa70b9d2c8b168b8b
+	.quad 0xf6530151a7a6a7a7, 0xb2fa94cf7de97d7d, 0x4937fbdc956e9595, 0x56ad9f8ed847d8d8
+	.quad 0x70eb308bfbcbfbfb, 0xcdc17123ee9feeee, 0xbbf891c77ced7c7c, 0x71cce31766856666
+	.quad 0x7ba78ea6dd53dddd, 0xaf2e4bb8175c1717, 0x458e460247014747, 0x1a21dc849e429e9e
+	.quad 0xd489c51eca0fcaca, 0x585a99752db42d2d, 0x2e637991bfc6bfbf, 0x3f0e1b38071c0707
+	.quad 0xac472301ad8eadad, 0xb0b42fea5a755a5a, 0xef1bb56c83368383, 0xb666ff8533cc3333
+	.quad 0x5cc6f23f63916363, 0x12040a1002080202, 0x93493839aa92aaaa, 0xdee2a8af71d97171
+	.quad 0xc68dcf0ec807c8c8, 0xd1327dc819641919, 0x3b92707249394949, 0x5faf9a86d943d9d9
+	.quad 0x31f91dc3f2eff2f2, 0xa8db484be3abe3e3, 0xb9b62ae25b715b5b, 0xbc0d9234881a8888
+	.quad 0x3e29c8a49a529a9a, 0x0b4cbe2d26982626, 0xbf64fa8d32c83232, 0x597d4ae9b0fab0b0
+	.quad 0xf2cf6a1be983e9e9, 0x771e33780f3c0f0f, 0x33b7a6e6d573d5d5, 0xf41dba74803a8080
+	.quad 0x27617c99bec2bebe, 0xeb87de26cd13cdcd, 0x8968e4bd34d03434, 0x3290757a483d4848
+	.quad 0x54e324abffdbffff, 0x8df48ff77af57a7a, 0x643deaf4907a9090, 0x9dbe3ec25f615f5f
+	.quad 0x3d40a01d20802020, 0x0fd0d56768bd6868, 0xca3472d01a681a1a, 0xb7412c19ae82aeae
+	.quad 0x7d755ec9b4eab4b4, 0xcea8199a544d5454, 0x7f3be5ec93769393, 0x2f44aa0d22882222
+	.quad 0x63c8e907648d6464, 0x2aff12dbf1e3f1f1, 0xcce6a2bf73d17373, 0x82245a9012481212
+	.quad 0x7a805d3a401d4040, 0x4810284008200808, 0x959be856c32bc3c3, 0xdfc57b33ec97ecec
+	.quad 0x4dab9096db4bdbdb, 0xc05f1f61a1bea1a1, 0x9107831c8d0e8d8d, 0xc87ac9f53df43d3d
+	.quad 0x5b33f1cc97669797, 0x0000000000000000, 0xf983d436cf1bcfcf, 0x6e5687452bac2b2b
+	.quad 0xe1ecb39776c57676, 0xe619b06482328282, 0x28b1a9fed67fd6d6, 0xc33677d81b6c1b1b
+	.quad 0x74775bc1b5eeb5b5, 0xbe432911af86afaf, 0x1dd4df776ab56a6a, 0xeaa00dba505d5050
+	.quad 0x578a4c1245094545, 0x38fb18cbf3ebf3f3, 0xad60f09d30c03030, 0xc4c3742bef9befef
+	.quad 0xda7ec3e53ffc3f3f, 0xc7aa1c9255495555, 0xdb591079a2b2a2a2, 0xe9c96503ea8feaea
+	.quad 0x6acaec0f65896565, 0x036968b9bad2baba, 0x4a5e93652fbc2f2f, 0x8e9de74ec027c0c0
+	.quad 0x60a181bede5fdede, 0xfc386ce01c701c1c, 0x46e72ebbfdd3fdfd, 0x1f9a64524d294d4d
+	.quad 0x7639e0e492729292, 0xfaeabc8f75c97575, 0x360c1e3006180606, 0xae0998248a128a8a
+	.quad 0x4b7940f9b2f2b2b2, 0x85d15963e6bfe6e6, 0x7e1c36700e380e0e, 0xe73e63f81f7c1f1f
+	.quad 0x55c4f73762956262, 0x3ab5a3eed477d4d4, 0x814d3229a89aa8a8, 0x5231f4c496629696
+	.quad 0x62ef3a9bf9c3f9f9, 0xa397f666c533c5c5, 0x104ab13525942525, 0xabb220f259795959
+	.quad 0xd015ae54842a8484, 0xc5e4a7b772d57272, 0xec72ddd539e43939, 0x1698615a4c2d4c4c
+	.quad 0x94bc3bca5e655e5e, 0x9ff085e778fd7878, 0xe570d8dd38e03838, 0x980586148c0a8c8c
+	.quad 0x17bfb2c6d163d1d1, 0xe4570b41a5aea5a5, 0xa1d94d43e2afe2e2, 0x4ec2f82f61996161
+	.quad 0x427b45f1b3f6b3b3, 0x3442a51521842121, 0x0825d6949c4a9c9c, 0xee3c66f01e781e1e
+	.quad 0x6186522243114343, 0xb193fc76c73bc7c7, 0x4fe52bb3fcd7fcfc, 0x2408142004100404
+	.quad 0xe3a208b251595151, 0x252fc7bc995e9999, 0x22dac44f6da96d6d, 0x651a39680d340d0d
+	.quad 0x79e93583facffafa, 0x69a384b6df5bdfdf, 0xa9fc9bd77ee57e7e, 0x1948b43d24902424
+	.quad 0xfe76d7c53bec3b3b, 0x9a4b3d31ab96abab, 0xf081d13ece1fcece, 0x9922558811441111
+	.quad 0x8303890c8f068f8f, 0x049c6b4a4e254e4e, 0x667351d1b7e6b7b7, 0xe0cb600beb8bebeb
+	.quad 0xc178ccfd3cf03c3c, 0xfd1fbf7c813e8181, 0x4035fed4946a9494, 0x1cf30cebf7fbf7f7
+	.quad 0x186f67a1b9deb9b9, 0x8b265f98134c1313, 0x51589c7d2cb02c2c, 0x05bbb8d6d36bd3d3
+	.quad 0x8cd35c6be7bbe7e7, 0x39dccb576ea56e6e, 0xaa95f36ec437c4c4, 0x1b060f18030c0303
+	.quad 0xdcac138a56455656, 0x5e88491a440d4444, 0xa0fe9edf7fe17f7f, 0x884f3721a99ea9a9
+	.quad 0x6754824d2aa82a2a, 0x0a6b6db1bbd6bbbb, 0x879fe246c123c1c1, 0xf1a602a253515353
+	.quad 0x72a58baedc57dcdc, 0x531627580b2c0b0b, 0x0127d39c9d4e9d9d, 0x2bd8c1476cad6c6c
+	.quad 0xa462f59531c43131, 0xf3e8b98774cd7474, 0x15f109e3f6fff6f6, 0x4c8c430a46054646
+	.quad 0xa5452609ac8aacac, 0xb50f973c891e8989, 0xb42844a014501414, 0xbadf425be1a3e1e1
+	.quad 0xa62c4eb016581616, 0xf774d2cd3ae83a3a, 0x06d2d06f69b96969, 0x41122d4809240909
+	.quad 0xd7e0ada770dd7070, 0x6f7154d9b6e2b6b6, 0x1ebdb7ced067d0d0, 0xd6c77e3bed93eded
+	.quad 0xe285db2ecc17cccc, 0x6884572a42154242, 0x2c2dc2b4985a9898, 0xed550e49a4aaa4a4
+	.quad 0x7550885d28a02828, 0x86b831da5c6d5c5c, 0x6bed3f93f8c7f8f8, 0xc211a44486228686
+
+	.quad 0x3078c018601818d8, 0x46af05238c232326, 0x91f97ec63fc6c6b8, 0xcd6f13e887e8e8fb
+	.quad 0x13a14c87268787cb, 0x6d62a9b8dab8b811, 0x0205080104010109, 0x9e6e424f214f4f0d
+	.quad 0x6ceead36d836369b, 0x510459a6a2a6a6ff, 0xb9bdded26fd2d20c, 0xf706fbf5f3f5f50e
+	.quad 0xf280ef79f9797996, 0xdece5f6fa16f6f30, 0x3feffc917e91916d, 0xa407aa52555252f8
+	.quad 0xc0fd27609d606047, 0x657689bccabcbc35, 0x2bcdac9b569b9b37, 0x018c048e028e8e8a
+	.quad 0x5b1571a3b6a3a3d2, 0x183c600c300c0c6c, 0xf68aff7bf17b7b84, 0x6ae1b535d4353580
+	.quad 0x3a69e81d741d1df5, 0xdd4753e0a7e0e0b3, 0xb3acf6d77bd7d721, 0x99ed5ec22fc2c29c
+	.quad 0x5c966d2eb82e2e43, 0x967a624b314b4b29, 0xe121a3fedffefe5d, 0xae168257415757d5
+	.quad 0x2a41a815541515bd, 0xeeb69f77c17777e8, 0x6eeba537dc373792, 0xd7567be5b3e5e59e
+	.quad 0x23d98c9f469f9f13, 0xfd17d3f0e7f0f023, 0x947f6a4a354a4a20, 0xa9959eda4fdada44
+	.quad 0xb025fa587d5858a2, 0x8fca06c903c9c9cf, 0x528d5529a429297c, 0x1422500a280a0a5a
+	.quad 0x7f4fe1b1feb1b150, 0x5d1a69a0baa0a0c9, 0xd6da7f6bb16b6b14, 0x17ab5c852e8585d9
+	.quad 0x677381bdcebdbd3c, 0xba34d25d695d5d8f, 0x2050801040101090, 0xf503f3f4f7f4f407
+	.quad 0x8bc016cb0bcbcbdd, 0x7cc6ed3ef83e3ed3, 0x0a1128051405052d, 0xcee61f6781676778
+	.quad 0xd55373e4b7e4e497, 0x4ebb25279c272702, 0x8258324119414173, 0x0b9d2c8b168b8ba7
+	.quad 0x530151a7a6a7a7f6, 0xfa94cf7de97d7db2, 0x37fbdc956e959549, 0xad9f8ed847d8d856
+	.quad 0xeb308bfbcbfbfb70, 0xc17123ee9feeeecd, 0xf891c77ced7c7cbb, 0xcce3176685666671
+	.quad 0xa78ea6dd53dddd7b, 0x2e4bb8175c1717af, 0x8e46024701474745, 0x21dc849e429e9e1a
+	.quad 0x89c51eca0fcacad4, 0x5a99752db42d2d58, 0x637991bfc6bfbf2e, 0x0e1b38071c07073f
+	.quad 0x472301ad8eadadac, 0xb42fea5a755a5ab0, 0x1bb56c83368383ef, 0x66ff8533cc3333b6
+	.quad 0xc6f23f639163635c, 0x040a100208020212, 0x493839aa92aaaa93, 0xe2a8af71d97171de
+	.quad 0x8dcf0ec807c8c8c6, 0x327dc819641919d1, 0x927072493949493b, 0xaf9a86d943d9d95f
+	.quad 0xf91dc3f2eff2f231, 0xdb484be3abe3e3a8, 0xb62ae25b715b5bb9, 0x0d9234881a8888bc
+	.quad 0x29c8a49a529a9a3e, 0x4cbe2d269826260b, 0x64fa8d32c83232bf, 0x7d4ae9b0fab0b059
+	.quad 0xcf6a1be983e9e9f2, 0x1e33780f3c0f0f77, 0xb7a6e6d573d5d533, 0x1dba74803a8080f4
+	.quad 0x617c99bec2bebe27, 0x87de26cd13cdcdeb, 0x68e4bd34d0343489, 0x90757a483d484832
+	.quad 0xe324abffdbffff54, 0xf48ff77af57a7a8d, 0x3deaf4907a909064, 0xbe3ec25f615f5f9d
+	.quad 0x40a01d208020203d, 0xd0d56768bd68680f, 0x3472d01a681a1aca, 0x412c19ae82aeaeb7
+	.quad 0x755ec9b4eab4b47d, 0xa8199a544d5454ce, 0x3be5ec937693937f, 0x44aa0d228822222f
+	.quad 0xc8e907648d646463, 0xff12dbf1e3f1f12a, 0xe6a2bf73d17373cc, 0x245a901248121282
+	.quad 0x805d3a401d40407a, 0x1028400820080848, 0x9be856c32bc3c395, 0xc57b33ec97ececdf
+	.quad 0xab9096db4bdbdb4d, 0x5f1f61a1bea1a1c0, 0x07831c8d0e8d8d91, 0x7ac9f53df43d3dc8
+	.quad 0x33f1cc976697975b, 0x0000000000000000, 0x83d436cf1bcfcff9, 0x5687452bac2b2b6e
+	.quad 0xecb39776c57676e1, 0x19b06482328282e6, 0xb1a9fed67fd6d628, 0x3677d81b6c1b1bc3
+	.quad 0x775bc1b5eeb5b574, 0x432911af86afafbe, 0xd4df776ab56a6a1d, 0xa00dba505d5050ea
+	.quad 0x8a4c124509454557, 0xfb18cbf3ebf3f338, 0x60f09d30c03030ad, 0xc3742bef9befefc4
+	.quad 0x7ec3e53ffc3f3fda, 0xaa1c9255495555c7, 0x591079a2b2a2a2db, 0xc96503ea8feaeae9
+	.quad 0xcaec0f658965656a, 0x6968b9bad2baba03, 0x5e93652fbc2f2f4a, 0x9de74ec027c0c08e
+	.quad 0xa181bede5fdede60, 0x386ce01c701c1cfc, 0xe72ebbfdd3fdfd46, 0x9a64524d294d4d1f
+	.quad 0x39e0e49272929276, 0xeabc8f75c97575fa, 0x0c1e300618060636, 0x0998248a128a8aae
+	.quad 0x7940f9b2f2b2b24b, 0xd15963e6bfe6e685, 0x1c36700e380e0e7e, 0x3e63f81f7c1f1fe7
+	.quad 0xc4f7376295626255, 0xb5a3eed477d4d43a, 0x4d3229a89aa8a881, 0x31f4c49662969652
+	.quad 0xef3a9bf9c3f9f962, 0x97f666c533c5c5a3, 0x4ab1352594252510, 0xb220f259795959ab
+	.quad 0x15ae54842a8484d0, 0xe4a7b772d57272c5, 0x72ddd539e43939ec, 0x98615a4c2d4c4c16
+	.quad 0xbc3bca5e655e5e94, 0xf085e778fd78789f, 0x70d8dd38e03838e5, 0x0586148c0a8c8c98
+	.quad 0xbfb2c6d163d1d117, 0x570b41a5aea5a5e4, 0xd94d43e2afe2e2a1, 0xc2f82f619961614e
+	.quad 0x7b45f1b3f6b3b342, 0x42a5152184212134, 0x25d6949c4a9c9c08, 0x3c66f01e781e1eee
+	.quad 0x8652224311434361, 0x93fc76c73bc7c7b1, 0xe52bb3fcd7fcfc4f, 0x0814200410040424
+	.quad 0xa208b251595151e3, 0x2fc7bc995e999925, 0xdac44f6da96d6d22, 0x1a39680d340d0d65
+	.quad 0xe93583facffafa79, 0xa384b6df5bdfdf69, 0xfc9bd77ee57e7ea9, 0x48b43d2490242419
+	.quad 0x76d7c53bec3b3bfe, 0x4b3d31ab96abab9a, 0x81d13ece1fcecef0, 0x2255881144111199
+	.quad 0x03890c8f068f8f83, 0x9c6b4a4e254e4e04, 0x7351d1b7e6b7b766, 0xcb600beb8bebebe0
+	.quad 0x78ccfd3cf03c3cc1, 0x1fbf7c813e8181fd, 0x35fed4946a949440, 0xf30cebf7fbf7f71c
+	.quad 0x6f67a1b9deb9b918, 0x265f98134c13138b, 0x589c7d2cb02c2c51, 0xbbb8d6d36bd3d305
+	.quad 0xd35c6be7bbe7e78c, 0xdccb576ea56e6e39, 0x95f36ec437c4c4aa, 0x060f18030c03031b
+	.quad 0xac138a56455656dc, 0x88491a440d44445e, 0xfe9edf7fe17f7fa0, 0x4f3721a99ea9a988
+	.quad 0x54824d2aa82a2a67, 0x6b6db1bbd6bbbb0a, 0x9fe246c123c1c187, 0xa602a253515353f1
+	.quad 0xa58baedc57dcdc72, 0x1627580b2c0b0b53, 0x27d39c9d4e9d9d01, 0xd8c1476cad6c6c2b
+	.quad 0x62f59531c43131a4, 0xe8b98774cd7474f3, 0xf109e3f6fff6f615, 0x8c430a460546464c
+	.quad 0x452609ac8aacaca5, 0x0f973c891e8989b5, 0x2844a014501414b4, 0xdf425be1a3e1e1ba
+	.quad 0x2c4eb016581616a6, 0x74d2cd3ae83a3af7, 0xd2d06f69b9696906, 0x122d480924090941
+	.quad 0xe0ada770dd7070d7, 0x7154d9b6e2b6b66f, 0xbdb7ced067d0d01e, 0xc77e3bed93ededd6
+	.quad 0x85db2ecc17cccce2, 0x84572a4215424268, 0x2dc2b4985a98982c, 0x550e49a4aaa4a4ed
+	.quad 0x50885d28a0282875, 0xb831da5c6d5c5c86, 0xed3f93f8c7f8f86b, 0x11a44486228686c2
+
+	.quad 0x78c018601818d830, 0xaf05238c23232646, 0xf97ec63fc6c6b891, 0x6f13e887e8e8fbcd
+	.quad 0xa14c87268787cb13, 0x62a9b8dab8b8116d, 0x0508010401010902, 0x6e424f214f4f0d9e
+	.quad 0xeead36d836369b6c, 0x0459a6a2a6a6ff51, 0xbdded26fd2d20cb9, 0x06fbf5f3f5f50ef7
+	.quad 0x80ef79f9797996f2, 0xce5f6fa16f6f30de, 0xeffc917e91916d3f, 0x07aa52555252f8a4
+	.quad 0xfd27609d606047c0, 0x7689bccabcbc3565, 0xcdac9b569b9b372b, 0x8c048e028e8e8a01
+	.quad 0x1571a3b6a3a3d25b, 0x3c600c300c0c6c18, 0x8aff7bf17b7b84f6, 0xe1b535d43535806a
+	.quad 0x69e81d741d1df53a, 0x4753e0a7e0e0b3dd, 0xacf6d77bd7d721b3, 0xed5ec22fc2c29c99
+	.quad 0x966d2eb82e2e435c, 0x7a624b314b4b2996, 0x21a3fedffefe5de1, 0x168257415757d5ae
+	.quad 0x41a815541515bd2a, 0xb69f77c17777e8ee, 0xeba537dc3737926e, 0x567be5b3e5e59ed7
+	.quad 0xd98c9f469f9f1323, 0x17d3f0e7f0f023fd, 0x7f6a4a354a4a2094, 0x959eda4fdada44a9
+	.quad 0x25fa587d5858a2b0, 0xca06c903c9c9cf8f, 0x8d5529a429297c52, 0x22500a280a0a5a14
+	.quad 0x4fe1b1feb1b1507f, 0x1a69a0baa0a0c95d, 0xda7f6bb16b6b14d6, 0xab5c852e8585d917
+	.quad 0x7381bdcebdbd3c67, 0x34d25d695d5d8fba, 0x5080104010109020, 0x03f3f4f7f4f407f5
+	.quad 0xc016cb0bcbcbdd8b, 0xc6ed3ef83e3ed37c, 0x1128051405052d0a, 0xe61f6781676778ce
+	.quad 0x5373e4b7e4e497d5, 0xbb25279c2727024e, 0x5832411941417382, 0x9d2c8b168b8ba70b
+	.quad 0x0151a7a6a7a7f653, 0x94cf7de97d7db2fa, 0xfbdc956e95954937, 0x9f8ed847d8d856ad
+	.quad 0x308bfbcbfbfb70eb, 0x7123ee9feeeecdc1, 0x91c77ced7c7cbbf8, 0xe3176685666671cc
+	.quad 0x8ea6dd53dddd7ba7, 0x4bb8175c1717af2e, 0x460247014747458e, 0xdc849e429e9e1a21
+	.quad 0xc51eca0fcacad489, 0x99752db42d2d585a, 0x7991bfc6bfbf2e63, 0x1b38071c07073f0e
+	.quad 0x2301ad8eadadac47, 0x2fea5a755a5ab0b4, 0xb56c83368383ef1b, 0xff8533cc3333b666
+	.quad 0xf23f639163635cc6, 0x0a10020802021204, 0x3839aa92aaaa9349, 0xa8af71d97171dee2
+	.quad 0xcf0ec807c8c8c68d, 0x7dc819641919d132, 0x7072493949493b92, 0x9a86d943d9d95faf
+	.quad 0x1dc3f2eff2f231f9, 0x484be3abe3e3a8db, 0x2ae25b715b5bb9b6, 0x9234881a8888bc0d
+	.quad 0xc8a49a529a9a3e29, 0xbe2d269826260b4c, 0xfa8d32c83232bf64, 0x4ae9b0fab0b0597d
+	.quad 0x6a1be983e9e9f2cf, 0x33780f3c0f0f771e, 0xa6e6d573d5d533b7, 0xba74803a8080f41d
+	.quad 0x7c99bec2bebe2761, 0xde26cd13cdcdeb87, 0xe4bd34d034348968, 0x757a483d48483290
+	.quad 0x24abffdbffff54e3, 0x8ff77af57a7a8df4, 0xeaf4907a9090643d, 0x3ec25f615f5f9dbe
+	.quad 0xa01d208020203d40, 0xd56768bd68680fd0, 0x72d01a681a1aca34, 0x2c19ae82aeaeb741
+	.quad 0x5ec9b4eab4b47d75, 0x199a544d5454cea8, 0xe5ec937693937f3b, 0xaa0d228822222f44
+	.quad 0xe907648d646463c8, 0x12dbf1e3f1f12aff, 0xa2bf73d17373cce6, 0x5a90124812128224
+	.quad 0x5d3a401d40407a80, 0x2840082008084810, 0xe856c32bc3c3959b, 0x7b33ec97ececdfc5
+	.quad 0x9096db4bdbdb4dab, 0x1f61a1bea1a1c05f, 0x831c8d0e8d8d9107, 0xc9f53df43d3dc87a
+	.quad 0xf1cc976697975b33, 0x0000000000000000, 0xd436cf1bcfcff983, 0x87452bac2b2b6e56
+	.quad 0xb39776c57676e1ec, 0xb06482328282e619, 0xa9fed67fd6d628b1, 0x77d81b6c1b1bc336
+	.quad 0x5bc1b5eeb5b57477, 0x2911af86afafbe43, 0xdf776ab56a6a1dd4, 0x0dba505d5050eaa0
+	.quad 0x4c1245094545578a, 0x18cbf3ebf3f338fb, 0xf09d30c03030ad60, 0x742bef9befefc4c3
+	.quad 0xc3e53ffc3f3fda7e, 0x1c9255495555c7aa, 0x1079a2b2a2a2db59, 0x6503ea8feaeae9c9
+	.quad 0xec0f658965656aca, 0x68b9bad2baba0369, 0x93652fbc2f2f4a5e, 0xe74ec027c0c08e9d
+	.quad 0x81bede5fdede60a1, 0x6ce01c701c1cfc38, 0x2ebbfdd3fdfd46e7, 0x64524d294d4d1f9a
+	.quad 0xe0e4927292927639, 0xbc8f75c97575faea, 0x1e3006180606360c, 0x98248a128a8aae09
+	.quad 0x40f9b2f2b2b24b79, 0x5963e6bfe6e685d1, 0x36700e380e0e7e1c, 0x63f81f7c1f1fe73e
+	.quad 0xf7376295626255c4, 0xa3eed477d4d43ab5, 0x3229a89aa8a8814d, 0xf4c4966296965231
+	.quad 0x3a9bf9c3f9f962ef, 0xf666c533c5c5a397, 0xb13525942525104a, 0x20f259795959abb2
+	.quad 0xae54842a8484d015, 0xa7b772d57272c5e4, 0xddd539e43939ec72, 0x615a4c2d4c4c1698
+	.quad 0x3bca5e655e5e94bc, 0x85e778fd78789ff0, 0xd8dd38e03838e570, 0x86148c0a8c8c9805
+	.quad 0xb2c6d163d1d117bf, 0x0b41a5aea5a5e457, 0x4d43e2afe2e2a1d9, 0xf82f619961614ec2
+	.quad 0x45f1b3f6b3b3427b, 0xa515218421213442, 0xd6949c4a9c9c0825, 0x66f01e781e1eee3c
+	.quad 0x5222431143436186, 0xfc76c73bc7c7b193, 0x2bb3fcd7fcfc4fe5, 0x1420041004042408
+	.quad 0x08b251595151e3a2, 0xc7bc995e9999252f, 0xc44f6da96d6d22da, 0x39680d340d0d651a
+	.quad 0x3583facffafa79e9, 0x84b6df5bdfdf69a3, 0x9bd77ee57e7ea9fc, 0xb43d249024241948
+	.quad 0xd7c53bec3b3bfe76, 0x3d31ab96abab9a4b, 0xd13ece1fcecef081, 0x5588114411119922
+	.quad 0x890c8f068f8f8303, 0x6b4a4e254e4e049c, 0x51d1b7e6b7b76673, 0x600beb8bebebe0cb
+	.quad 0xccfd3cf03c3cc178, 0xbf7c813e8181fd1f, 0xfed4946a94944035, 0x0cebf7fbf7f71cf3
+	.quad 0x67a1b9deb9b9186f, 0x5f98134c13138b26, 0x9c7d2cb02c2c5158, 0xb8d6d36bd3d305bb
+	.quad 0x5c6be7bbe7e78cd3, 0xcb576ea56e6e39dc, 0xf36ec437c4c4aa95, 0x0f18030c03031b06
+	.quad 0x138a56455656dcac, 0x491a440d44445e88, 0x9edf7fe17f7fa0fe, 0x3721a99ea9a9884f
+	.quad 0x824d2aa82a2a6754, 0x6db1bbd6bbbb0a6b, 0xe246c123c1c1879f, 0x02a253515353f1a6
+	.quad 0x8baedc57dcdc72a5, 0x27580b2c0b0b5316, 0xd39c9d4e9d9d0127, 0xc1476cad6c6c2bd8
+	.quad 0xf59531c43131a462, 0xb98774cd7474f3e8, 0x09e3f6fff6f615f1, 0x430a460546464c8c
+	.quad 0x2609ac8aacaca545, 0x973c891e8989b50f, 0x44a014501414b428, 0x425be1a3e1e1badf
+	.quad 0x4eb016581616a62c, 0xd2cd3ae83a3af774, 0xd06f69b9696906d2, 0x2d48092409094112
+	.quad 0xada770dd7070d7e0, 0x54d9b6e2b6b66f71, 0xb7ced067d0d01ebd, 0x7e3bed93ededd6c7
+	.quad 0xdb2ecc17cccce285, 0x572a421542426884, 0xc2b4985a98982c2d, 0x0e49a4aaa4a4ed55
+	.quad 0x885d28a028287550, 0x31da5c6d5c5c86b8, 0x3f93f8c7f8f86bed, 0xa44486228686c211
+
+	.quad 0xc018601818d83078, 0x05238c23232646af, 0x7ec63fc6c6b891f9, 0x13e887e8e8fbcd6f
+	.quad 0x4c87268787cb13a1, 0xa9b8dab8b8116d62, 0x0801040101090205, 0x424f214f4f0d9e6e
+	.quad 0xad36d836369b6cee, 0x59a6a2a6a6ff5104, 0xded26fd2d20cb9bd, 0xfbf5f3f5f50ef706
+	.quad 0xef79f9797996f280, 0x5f6fa16f6f30dece, 0xfc917e91916d3fef, 0xaa52555252f8a407
+	.quad 0x27609d606047c0fd, 0x89bccabcbc356576, 0xac9b569b9b372bcd, 0x048e028e8e8a018c
+	.quad 0x71a3b6a3a3d25b15, 0x600c300c0c6c183c, 0xff7bf17b7b84f68a, 0xb535d43535806ae1
+	.quad 0xe81d741d1df53a69, 0x53e0a7e0e0b3dd47, 0xf6d77bd7d721b3ac, 0x5ec22fc2c29c99ed
+	.quad 0x6d2eb82e2e435c96, 0x624b314b4b29967a, 0xa3fedffefe5de121, 0x8257415757d5ae16
+	.quad 0xa815541515bd2a41, 0x9f77c17777e8eeb6, 0xa537dc3737926eeb, 0x7be5b3e5e59ed756
+	.quad 0x8c9f469f9f1323d9, 0xd3f0e7f0f023fd17, 0x6a4a354a4a20947f, 0x9eda4fdada44a995
+	.quad 0xfa587d5858a2b025, 0x06c903c9c9cf8fca, 0x5529a429297c528d, 0x500a280a0a5a1422
+	.quad 0xe1b1feb1b1507f4f, 0x69a0baa0a0c95d1a, 0x7f6bb16b6b14d6da, 0x5c852e8585d917ab
+	.quad 0x81bdcebdbd3c6773, 0xd25d695d5d8fba34, 0x8010401010902050, 0xf3f4f7f4f407f503
+	.quad 0x16cb0bcbcbdd8bc0, 0xed3ef83e3ed37cc6, 0x28051405052d0a11, 0x1f6781676778cee6
+	.quad 0x73e4b7e4e497d553, 0x25279c2727024ebb, 0x3241194141738258, 0x2c8b168b8ba70b9d
+	.quad 0x51a7a6a7a7f65301, 0xcf7de97d7db2fa94, 0xdc956e95954937fb, 0x8ed847d8d856ad9f
+	.quad 0x8bfbcbfbfb70eb30, 0x23ee9feeeecdc171, 0xc77ced7c7cbbf891, 0x176685666671cce3
+	.quad 0xa6dd53dddd7ba78e, 0xb8175c1717af2e4b, 0x0247014747458e46, 0x849e429e9e1a21dc
+	.quad 0x1eca0fcacad489c5, 0x752db42d2d585a99, 0x91bfc6bfbf2e6379, 0x38071c07073f0e1b
+	.quad 0x01ad8eadadac4723, 0xea5a755a5ab0b42f, 0x6c83368383ef1bb5, 0x8533cc3333b666ff
+	.quad 0x3f639163635cc6f2, 0x100208020212040a, 0x39aa92aaaa934938, 0xaf71d97171dee2a8
+	.quad 0x0ec807c8c8c68dcf, 0xc819641919d1327d, 0x72493949493b9270, 0x86d943d9d95faf9a
+	.quad 0xc3f2eff2f231f91d, 0x4be3abe3e3a8db48, 0xe25b715b5bb9b62a, 0x34881a8888bc0d92
+	.quad 0xa49a529a9a3e29c8, 0x2d269826260b4cbe, 0x8d32c83232bf64fa, 0xe9b0fab0b0597d4a
+	.quad 0x1be983e9e9f2cf6a, 0x780f3c0f0f771e33, 0xe6d573d5d533b7a6, 0x74803a8080f41dba
+	.quad 0x99bec2bebe27617c, 0x26cd13cdcdeb87de, 0xbd34d034348968e4, 0x7a483d4848329075
+	.quad 0xabffdbffff54e324, 0xf77af57a7a8df48f, 0xf4907a9090643dea, 0xc25f615f5f9dbe3e
+	.quad 0x1d208020203d40a0, 0x6768bd68680fd0d5, 0xd01a681a1aca3472, 0x19ae82aeaeb7412c
+	.quad 0xc9b4eab4b47d755e, 0x9a544d5454cea819, 0xec937693937f3be5, 0x0d228822222f44aa
+	.quad 0x07648d646463c8e9, 0xdbf1e3f1f12aff12, 0xbf73d17373cce6a2, 0x901248121282245a
+	.quad 0x3a401d40407a805d, 0x4008200808481028, 0x56c32bc3c3959be8, 0x33ec97ececdfc57b
+	.quad 0x96db4bdbdb4dab90, 0x61a1bea1a1c05f1f, 0x1c8d0e8d8d910783, 0xf53df43d3dc87ac9
+	.quad 0xcc976697975b33f1, 0x0000000000000000, 0x36cf1bcfcff983d4, 0x452bac2b2b6e5687
+	.quad 0x9776c57676e1ecb3, 0x6482328282e619b0, 0xfed67fd6d628b1a9, 0xd81b6c1b1bc33677
+	.quad 0xc1b5eeb5b574775b, 0x11af86afafbe4329, 0x776ab56a6a1dd4df, 0xba505d5050eaa00d
+	.quad 0x1245094545578a4c, 0xcbf3ebf3f338fb18, 0x9d30c03030ad60f0, 0x2bef9befefc4c374
+	.quad 0xe53ffc3f3fda7ec3, 0x9255495555c7aa1c, 0x79a2b2a2a2db5910, 0x03ea8feaeae9c965
+	.quad 0x0f658965656acaec, 0xb9bad2baba036968, 0x652fbc2f2f4a5e93, 0x4ec027c0c08e9de7
+	.quad 0xbede5fdede60a181, 0xe01c701c1cfc386c, 0xbbfdd3fdfd46e72e, 0x524d294d4d1f9a64
+	.quad 0xe4927292927639e0, 0x8f75c97575faeabc, 0x3006180606360c1e, 0x248a128a8aae0998
+	.quad 0xf9b2f2b2b24b7940, 0x63e6bfe6e685d159, 0x700e380e0e7e1c36, 0xf81f7c1f1fe73e63
+	.quad 0x376295626255c4f7, 0xeed477d4d43ab5a3, 0x29a89aa8a8814d32, 0xc4966296965231f4
+	.quad 0x9bf9c3f9f962ef3a, 0x66c533c5c5a397f6, 0x3525942525104ab1, 0xf259795959abb220
+	.quad 0x54842a8484d015ae, 0xb772d57272c5e4a7, 0xd539e43939ec72dd, 0x5a4c2d4c4c169861
+	.quad 0xca5e655e5e94bc3b, 0xe778fd78789ff085, 0xdd38e03838e570d8, 0x148c0a8c8c980586
+	.quad 0xc6d163d1d117bfb2, 0x41a5aea5a5e4570b, 0x43e2afe2e2a1d94d, 0x2f619961614ec2f8
+	.quad 0xf1b3f6b3b3427b45, 0x15218421213442a5, 0x949c4a9c9c0825d6, 0xf01e781e1eee3c66
+	.quad 0x2243114343618652, 0x76c73bc7c7b193fc, 0xb3fcd7fcfc4fe52b, 0x2004100404240814
+	.quad 0xb251595151e3a208, 0xbc995e9999252fc7, 0x4f6da96d6d22dac4, 0x680d340d0d651a39
+	.quad 0x83facffafa79e935, 0xb6df5bdfdf69a384, 0xd77ee57e7ea9fc9b, 0x3d249024241948b4
+	.quad 0xc53bec3b3bfe76d7, 0x31ab96abab9a4b3d, 0x3ece1fcecef081d1, 0x8811441111992255
+	.quad 0x0c8f068f8f830389, 0x4a4e254e4e049c6b, 0xd1b7e6b7b7667351, 0x0beb8bebebe0cb60
+	.quad 0xfd3cf03c3cc178cc, 0x7c813e8181fd1fbf, 0xd4946a94944035fe, 0xebf7fbf7f71cf30c
+	.quad 0xa1b9deb9b9186f67, 0x98134c13138b265f, 0x7d2cb02c2c51589c, 0xd6d36bd3d305bbb8
+	.quad 0x6be7bbe7e78cd35c, 0x576ea56e6e39dccb, 0x6ec437c4c4aa95f3, 0x18030c03031b060f
+	.quad 0x8a56455656dcac13, 0x1a440d44445e8849, 0xdf7fe17f7fa0fe9e, 0x21a99ea9a9884f37
+	.quad 0x4d2aa82a2a675482, 0xb1bbd6bbbb0a6b6d, 0x46c123c1c1879fe2, 0xa253515353f1a602
+	.quad 0xaedc57dcdc72a58b, 0x580b2c0b0b531627, 0x9c9d4e9d9d0127d3, 0x476cad6c6c2bd8c1
+	.quad 0x9531c43131a462f5, 0x8774cd7474f3e8b9, 0xe3f6fff6f615f109, 0x0a460546464c8c43
+	.quad 0x09ac8aacaca54526, 0x3c891e8989b50f97, 0xa014501414b42844, 0x5be1a3e1e1badf42
+	.quad 0xb016581616a62c4e, 0xcd3ae83a3af774d2, 0x6f69b9696906d2d0, 0x480924090941122d
+	.quad 0xa770dd7070d7e0ad, 0xd9b6e2b6b66f7154, 0xced067d0d01ebdb7, 0x3bed93ededd6c77e
+	.quad 0x2ecc17cccce285db, 0x2a42154242688457, 0xb4985a98982c2dc2, 0x49a4aaa4a4ed550e
+	.quad 0x5d28a02828755088, 0xda5c6d5c5c86b831, 0x93f8c7f8f86bed3f, 0x4486228686c211a4
+
+	.quad 0x18601818d83078c0, 0x238c23232646af05, 0xc63fc6c6b891f97e, 0xe887e8e8fbcd6f13
+	.quad 0x87268787cb13a14c, 0xb8dab8b8116d62a9, 0x0104010109020508, 0x4f214f4f0d9e6e42
+	.quad 0x36d836369b6ceead, 0xa6a2a6a6ff510459, 0xd26fd2d20cb9bdde, 0xf5f3f5f50ef706fb
+	.quad 0x79f9797996f280ef, 0x6fa16f6f30dece5f, 0x917e91916d3feffc, 0x52555252f8a407aa
+	.quad 0x609d606047c0fd27, 0xbccabcbc35657689, 0x9b569b9b372bcdac, 0x8e028e8e8a018c04
+	.quad 0xa3b6a3a3d25b1571, 0x0c300c0c6c183c60, 0x7bf17b7b84f68aff, 0x35d43535806ae1b5
+	.quad 0x1d741d1df53a69e8, 0xe0a7e0e0b3dd4753, 0xd77bd7d721b3acf6, 0xc22fc2c29c99ed5e
+	.quad 0x2eb82e2e435c966d, 0x4b314b4b29967a62, 0xfedffefe5de121a3, 0x57415757d5ae1682
+	.quad 0x15541515bd2a41a8, 0x77c17777e8eeb69f, 0x37dc3737926eeba5, 0xe5b3e5e59ed7567b
+	.quad 0x9f469f9f1323d98c, 0xf0e7f0f023fd17d3, 0x4a354a4a20947f6a, 0xda4fdada44a9959e
+	.quad 0x587d5858a2b025fa, 0xc903c9c9cf8fca06, 0x29a429297c528d55, 0x0a280a0a5a142250
+	.quad 0xb1feb1b1507f4fe1, 0xa0baa0a0c95d1a69, 0x6bb16b6b14d6da7f, 0x852e8585d917ab5c
+	.quad 0xbdcebdbd3c677381, 0x5d695d5d8fba34d2, 0x1040101090205080, 0xf4f7f4f407f503f3
+	.quad 0xcb0bcbcbdd8bc016, 0x3ef83e3ed37cc6ed, 0x051405052d0a1128, 0x6781676778cee61f
+	.quad 0xe4b7e4e497d55373, 0x279c2727024ebb25, 0x4119414173825832, 0x8b168b8ba70b9d2c
+	.quad 0xa7a6a7a7f6530151, 0x7de97d7db2fa94cf, 0x956e95954937fbdc, 0xd847d8d856ad9f8e
+	.quad 0xfbcbfbfb70eb308b, 0xee9feeeecdc17123, 0x7ced7c7cbbf891c7, 0x6685666671cce317
+	.quad 0xdd53dddd7ba78ea6, 0x175c1717af2e4bb8, 0x47014747458e4602, 0x9e429e9e1a21dc84
+	.quad 0xca0fcacad489c51e, 0x2db42d2d585a9975, 0xbfc6bfbf2e637991, 0x071c07073f0e1b38
+	.quad 0xad8eadadac472301, 0x5a755a5ab0b42fea, 0x83368383ef1bb56c, 0x33cc3333b666ff85
+	.quad 0x639163635cc6f23f, 0x0208020212040a10, 0xaa92aaaa93493839, 0x71d97171dee2a8af
+	.quad 0xc807c8c8c68dcf0e, 0x19641919d1327dc8, 0x493949493b927072, 0xd943d9d95faf9a86
+	.quad 0xf2eff2f231f91dc3, 0xe3abe3e3a8db484b, 0x5b715b5bb9b62ae2, 0x881a8888bc0d9234
+	.quad 0x9a529a9a3e29c8a4, 0x269826260b4cbe2d, 0x32c83232bf64fa8d, 0xb0fab0b0597d4ae9
+	.quad 0xe983e9e9f2cf6a1b, 0x0f3c0f0f771e3378, 0xd573d5d533b7a6e6, 0x803a8080f41dba74
+	.quad 0xbec2bebe27617c99, 0xcd13cdcdeb87de26, 0x34d034348968e4bd, 0x483d48483290757a
+	.quad 0xffdbffff54e324ab, 0x7af57a7a8df48ff7, 0x907a9090643deaf4, 0x5f615f5f9dbe3ec2
+	.quad 0x208020203d40a01d, 0x68bd68680fd0d567, 0x1a681a1aca3472d0, 0xae82aeaeb7412c19
+	.quad 0xb4eab4b47d755ec9, 0x544d5454cea8199a, 0x937693937f3be5ec, 0x228822222f44aa0d
+	.quad 0x648d646463c8e907, 0xf1e3f1f12aff12db, 0x73d17373cce6a2bf, 0x1248121282245a90
+	.quad 0x401d40407a805d3a, 0x0820080848102840, 0xc32bc3c3959be856, 0xec97ececdfc57b33
+	.quad 0xdb4bdbdb4dab9096, 0xa1bea1a1c05f1f61, 0x8d0e8d8d9107831c, 0x3df43d3dc87ac9f5
+	.quad 0x976697975b33f1cc, 0x0000000000000000, 0xcf1bcfcff983d436, 0x2bac2b2b6e568745
+	.quad 0x76c57676e1ecb397, 0x82328282e619b064, 0xd67fd6d628b1a9fe, 0x1b6c1b1bc33677d8
+	.quad 0xb5eeb5b574775bc1, 0xaf86afafbe432911, 0x6ab56a6a1dd4df77, 0x505d5050eaa00dba
+	.quad 0x45094545578a4c12, 0xf3ebf3f338fb18cb, 0x30c03030ad60f09d, 0xef9befefc4c3742b
+	.quad 0x3ffc3f3fda7ec3e5, 0x55495555c7aa1c92, 0xa2b2a2a2db591079, 0xea8feaeae9c96503
+	.quad 0x658965656acaec0f, 0xbad2baba036968b9, 0x2fbc2f2f4a5e9365, 0xc027c0c08e9de74e
+	.quad 0xde5fdede60a181be, 0x1c701c1cfc386ce0, 0xfdd3fdfd46e72ebb, 0x4d294d4d1f9a6452
+	.quad 0x927292927639e0e4, 0x75c97575faeabc8f, 0x06180606360c1e30, 0x8a128a8aae099824
+	.quad 0xb2f2b2b24b7940f9, 0xe6bfe6e685d15963, 0x0e380e0e7e1c3670, 0x1f7c1f1fe73e63f8
+	.quad 0x6295626255c4f737, 0xd477d4d43ab5a3ee, 0xa89aa8a8814d3229, 0x966296965231f4c4
+	.quad 0xf9c3f9f962ef3a9b, 0xc533c5c5a397f666, 0x25942525104ab135, 0x59795959abb220f2
+	.quad 0x842a8484d015ae54, 0x72d57272c5e4a7b7, 0x39e43939ec72ddd5, 0x4c2d4c4c1698615a
+	.quad 0x5e655e5e94bc3bca, 0x78fd78789ff085e7, 0x38e03838e570d8dd, 0x8c0a8c8c98058614
+	.quad 0xd163d1d117bfb2c6, 0xa5aea5a5e4570b41, 0xe2afe2e2a1d94d43, 0x619961614ec2f82f
+	.quad 0xb3f6b3b3427b45f1, 0x218421213442a515, 0x9c4a9c9c0825d694, 0x1e781e1eee3c66f0
+	.quad 0x4311434361865222, 0xc73bc7c7b193fc76, 0xfcd7fcfc4fe52bb3, 0x0410040424081420
+	.quad 0x51595151e3a208b2, 0x995e9999252fc7bc, 0x6da96d6d22dac44f, 0x0d340d0d651a3968
+	.quad 0xfacffafa79e93583, 0xdf5bdfdf69a384b6, 0x7ee57e7ea9fc9bd7, 0x249024241948b43d
+	.quad 0x3bec3b3bfe76d7c5, 0xab96abab9a4b3d31, 0xce1fcecef081d13e, 0x1144111199225588
+	.quad 0x8f068f8f8303890c, 0x4e254e4e049c6b4a, 0xb7e6b7b7667351d1, 0xeb8bebebe0cb600b
+	.quad 0x3cf03c3cc178ccfd, 0x813e8181fd1fbf7c, 0x946a94944035fed4, 0xf7fbf7f71cf30ceb
+	.quad 0xb9deb9b9186f67a1, 0x134c13138b265f98, 0x2cb02c2c51589c7d, 0xd36bd3d305bbb8d6
+	.quad 0xe7bbe7e78cd35c6b, 0x6ea56e6e39dccb57, 0xc437c4c4aa95f36e, 0x030c03031b060f18
+	.quad 0x56455656dcac138a, 0x440d44445e88491a, 0x7fe17f7fa0fe9edf, 0xa99ea9a9884f3721
+	.quad 0x2aa82a2a6754824d, 0xbbd6bbbb0a6b6db1, 0xc123c1c1879fe246, 0x53515353f1a602a2
+	.quad 0xdc57dcdc72a58bae, 0x0b2c0b0b53162758, 0x9d4e9d9d0127d39c, 0x6cad6c6c2bd8c147
+	.quad 0x31c43131a462f595, 0x74cd7474f3e8b987, 0xf6fff6f615f109e3, 0x460546464c8c430a
+	.quad 0xac8aacaca5452609, 0x891e8989b50f973c, 0x14501414b42844a0, 0xe1a3e1e1badf425b
+	.quad 0x16581616a62c4eb0, 0x3ae83a3af774d2cd, 0x69b9696906d2d06f, 0x0924090941122d48
+	.quad 0x70dd7070d7e0ada7, 0xb6e2b6b66f7154d9, 0xd067d0d01ebdb7ce, 0xed93ededd6c77e3b
+	.quad 0xcc17cccce285db2e, 0x421542426884572a, 0x985a98982c2dc2b4, 0xa4aaa4a4ed550e49
+	.quad 0x28a028287550885d, 0x5c6d5c5c86b831da, 0xf8c7f8f86bed3f93, 0x86228686c211a444
+
+	.quad 0x601818d83078c018, 0x8c23232646af0523, 0x3fc6c6b891f97ec6, 0x87e8e8fbcd6f13e8
+	.quad 0x268787cb13a14c87, 0xdab8b8116d62a9b8, 0x0401010902050801, 0x214f4f0d9e6e424f
+	.quad 0xd836369b6ceead36, 0xa2a6a6ff510459a6, 0x6fd2d20cb9bdded2, 0xf3f5f50ef706fbf5
+	.quad 0xf9797996f280ef79, 0xa16f6f30dece5f6f, 0x7e91916d3feffc91, 0x555252f8a407aa52
+	.quad 0x9d606047c0fd2760, 0xcabcbc35657689bc, 0x569b9b372bcdac9b, 0x028e8e8a018c048e
+	.quad 0xb6a3a3d25b1571a3, 0x300c0c6c183c600c, 0xf17b7b84f68aff7b, 0xd43535806ae1b535
+	.quad 0x741d1df53a69e81d, 0xa7e0e0b3dd4753e0, 0x7bd7d721b3acf6d7, 0x2fc2c29c99ed5ec2
+	.quad 0xb82e2e435c966d2e, 0x314b4b29967a624b, 0xdffefe5de121a3fe, 0x415757d5ae168257
+	.quad 0x541515bd2a41a815, 0xc17777e8eeb69f77, 0xdc3737926eeba537, 0xb3e5e59ed7567be5
+	.quad 0x469f9f1323d98c9f, 0xe7f0f023fd17d3f0, 0x354a4a20947f6a4a, 0x4fdada44a9959eda
+	.quad 0x7d5858a2b025fa58, 0x03c9c9cf8fca06c9, 0xa429297c528d5529, 0x280a0a5a1422500a
+	.quad 0xfeb1b1507f4fe1b1, 0xbaa0a0c95d1a69a0, 0xb16b6b14d6da7f6b, 0x2e8585d917ab5c85
+	.quad 0xcebdbd3c677381bd, 0x695d5d8fba34d25d, 0x4010109020508010, 0xf7f4f407f503f3f4
+	.quad 0x0bcbcbdd8bc016cb, 0xf83e3ed37cc6ed3e, 0x1405052d0a112805, 0x81676778cee61f67
+	.quad 0xb7e4e497d55373e4, 0x9c2727024ebb2527, 0x1941417382583241, 0x168b8ba70b9d2c8b
+	.quad 0xa6a7a7f6530151a7, 0xe97d7db2fa94cf7d, 0x6e95954937fbdc95, 0x47d8d856ad9f8ed8
+	.quad 0xcbfbfb70eb308bfb, 0x9feeeecdc17123ee, 0xed7c7cbbf891c77c, 0x85666671cce31766
+	.quad 0x53dddd7ba78ea6dd, 0x5c1717af2e4bb817, 0x014747458e460247, 0x429e9e1a21dc849e
+	.quad 0x0fcacad489c51eca, 0xb42d2d585a99752d, 0xc6bfbf2e637991bf, 0x1c07073f0e1b3807
+	.quad 0x8eadadac472301ad, 0x755a5ab0b42fea5a, 0x368383ef1bb56c83, 0xcc3333b666ff8533
+	.quad 0x9163635cc6f23f63, 0x08020212040a1002, 0x92aaaa93493839aa, 0xd97171dee2a8af71
+	.quad 0x07c8c8c68dcf0ec8, 0x641919d1327dc819, 0x3949493b92707249, 0x43d9d95faf9a86d9
+	.quad 0xeff2f231f91dc3f2, 0xabe3e3a8db484be3, 0x715b5bb9b62ae25b, 0x1a8888bc0d923488
+	.quad 0x529a9a3e29c8a49a, 0x9826260b4cbe2d26, 0xc83232bf64fa8d32, 0xfab0b0597d4ae9b0
+	.quad 0x83e9e9f2cf6a1be9, 0x3c0f0f771e33780f, 0x73d5d533b7a6e6d5, 0x3a8080f41dba7480
+	.quad 0xc2bebe27617c99be, 0x13cdcdeb87de26cd, 0xd034348968e4bd34, 0x3d48483290757a48
+	.quad 0xdbffff54e324abff, 0xf57a7a8df48ff77a, 0x7a9090643deaf490, 0x615f5f9dbe3ec25f
+	.quad 0x8020203d40a01d20, 0xbd68680fd0d56768, 0x681a1aca3472d01a, 0x82aeaeb7412c19ae
+	.quad 0xeab4b47d755ec9b4, 0x4d5454cea8199a54, 0x7693937f3be5ec93, 0x8822222f44aa0d22
+	.quad 0x8d646463c8e90764, 0xe3f1f12aff12dbf1, 0xd17373cce6a2bf73, 0x48121282245a9012
+	.quad 0x1d40407a805d3a40, 0x2008084810284008, 0x2bc3c3959be856c3, 0x97ececdfc57b33ec
+	.quad 0x4bdbdb4dab9096db, 0xbea1a1c05f1f61a1, 0x0e8d8d9107831c8d, 0xf43d3dc87ac9f53d
+	.quad 0x6697975b33f1cc97, 0x0000000000000000, 0x1bcfcff983d436cf, 0xac2b2b6e5687452b
+	.quad 0xc57676e1ecb39776, 0x328282e619b06482, 0x7fd6d628b1a9fed6, 0x6c1b1bc33677d81b
+	.quad 0xeeb5b574775bc1b5, 0x86afafbe432911af, 0xb56a6a1dd4df776a, 0x5d5050eaa00dba50
+	.quad 0x094545578a4c1245, 0xebf3f338fb18cbf3, 0xc03030ad60f09d30, 0x9befefc4c3742bef
+	.quad 0xfc3f3fda7ec3e53f, 0x495555c7aa1c9255, 0xb2a2a2db591079a2, 0x8feaeae9c96503ea
+	.quad 0x8965656acaec0f65, 0xd2baba036968b9ba, 0xbc2f2f4a5e93652f, 0x27c0c08e9de74ec0
+	.quad 0x5fdede60a181bede, 0x701c1cfc386ce01c, 0xd3fdfd46e72ebbfd, 0x294d4d1f9a64524d
+	.quad 0x7292927639e0e492, 0xc97575faeabc8f75, 0x180606360c1e3006, 0x128a8aae0998248a
+	.quad 0xf2b2b24b7940f9b2, 0xbfe6e685d15963e6, 0x380e0e7e1c36700e, 0x7c1f1fe73e63f81f
+	.quad 0x95626255c4f73762, 0x77d4d43ab5a3eed4, 0x9aa8a8814d3229a8, 0x6296965231f4c496
+	.quad 0xc3f9f962ef3a9bf9, 0x33c5c5a397f666c5, 0x942525104ab13525, 0x795959abb220f259
+	.quad 0x2a8484d015ae5484, 0xd57272c5e4a7b772, 0xe43939ec72ddd539, 0x2d4c4c1698615a4c
+	.quad 0x655e5e94bc3bca5e, 0xfd78789ff085e778, 0xe03838e570d8dd38, 0x0a8c8c980586148c
+	.quad 0x63d1d117bfb2c6d1, 0xaea5a5e4570b41a5, 0xafe2e2a1d94d43e2, 0x9961614ec2f82f61
+	.quad 0xf6b3b3427b45f1b3, 0x8421213442a51521, 0x4a9c9c0825d6949c, 0x781e1eee3c66f01e
+	.quad 0x1143436186522243, 0x3bc7c7b193fc76c7, 0xd7fcfc4fe52bb3fc, 0x1004042408142004
+	.quad 0x595151e3a208b251, 0x5e9999252fc7bc99, 0xa96d6d22dac44f6d, 0x340d0d651a39680d
+	.quad 0xcffafa79e93583fa, 0x5bdfdf69a384b6df, 0xe57e7ea9fc9bd77e, 0x9024241948b43d24
+	.quad 0xec3b3bfe76d7c53b, 0x96abab9a4b3d31ab, 0x1fcecef081d13ece, 0x4411119922558811
+	.quad 0x068f8f8303890c8f, 0x254e4e049c6b4a4e, 0xe6b7b7667351d1b7, 0x8bebebe0cb600beb
+	.quad 0xf03c3cc178ccfd3c, 0x3e8181fd1fbf7c81, 0x6a94944035fed494, 0xfbf7f71cf30cebf7
+	.quad 0xdeb9b9186f67a1b9, 0x4c13138b265f9813, 0xb02c2c51589c7d2c, 0x6bd3d305bbb8d6d3
+	.quad 0xbbe7e78cd35c6be7, 0xa56e6e39dccb576e, 0x37c4c4aa95f36ec4, 0x0c03031b060f1803
+	.quad 0x455656dcac138a56, 0x0d44445e88491a44, 0xe17f7fa0fe9edf7f, 0x9ea9a9884f3721a9
+	.quad 0xa82a2a6754824d2a, 0xd6bbbb0a6b6db1bb, 0x23c1c1879fe246c1, 0x515353f1a602a253
+	.quad 0x57dcdc72a58baedc, 0x2c0b0b531627580b, 0x4e9d9d0127d39c9d, 0xad6c6c2bd8c1476c
+	.quad 0xc43131a462f59531, 0xcd7474f3e8b98774, 0xfff6f615f109e3f6, 0x0546464c8c430a46
+	.quad 0x8aacaca5452609ac, 0x1e8989b50f973c89, 0x501414b42844a014, 0xa3e1e1badf425be1
+	.quad 0x581616a62c4eb016, 0xe83a3af774d2cd3a, 0xb9696906d2d06f69, 0x24090941122d4809
+	.quad 0xdd7070d7e0ada770, 0xe2b6b66f7154d9b6, 0x67d0d01ebdb7ced0, 0x93ededd6c77e3bed
+	.quad 0x17cccce285db2ecc, 0x1542426884572a42, 0x5a98982c2dc2b498, 0xaaa4a4ed550e49a4
+	.quad 0xa028287550885d28, 0x6d5c5c86b831da5c, 0xc7f8f86bed3f93f8, 0x228686c211a44486
+
+	.quad 0x1818d83078c01860, 0x23232646af05238c, 0xc6c6b891f97ec63f, 0xe8e8fbcd6f13e887
+	.quad 0x8787cb13a14c8726, 0xb8b8116d62a9b8da, 0x0101090205080104, 0x4f4f0d9e6e424f21
+	.quad 0x36369b6ceead36d8, 0xa6a6ff510459a6a2, 0xd2d20cb9bdded26f, 0xf5f50ef706fbf5f3
+	.quad 0x797996f280ef79f9, 0x6f6f30dece5f6fa1, 0x91916d3feffc917e, 0x5252f8a407aa5255
+	.quad 0x606047c0fd27609d, 0xbcbc35657689bcca, 0x9b9b372bcdac9b56, 0x8e8e8a018c048e02
+	.quad 0xa3a3d25b1571a3b6, 0x0c0c6c183c600c30, 0x7b7b84f68aff7bf1, 0x3535806ae1b535d4
+	.quad 0x1d1df53a69e81d74, 0xe0e0b3dd4753e0a7, 0xd7d721b3acf6d77b, 0xc2c29c99ed5ec22f
+	.quad 0x2e2e435c966d2eb8, 0x4b4b29967a624b31, 0xfefe5de121a3fedf, 0x5757d5ae16825741
+	.quad 0x1515bd2a41a81554, 0x7777e8eeb69f77c1, 0x3737926eeba537dc, 0xe5e59ed7567be5b3
+	.quad 0x9f9f1323d98c9f46, 0xf0f023fd17d3f0e7, 0x4a4a20947f6a4a35, 0xdada44a9959eda4f
+	.quad 0x5858a2b025fa587d, 0xc9c9cf8fca06c903, 0x29297c528d5529a4, 0x0a0a5a1422500a28
+	.quad 0xb1b1507f4fe1b1fe, 0xa0a0c95d1a69a0ba, 0x6b6b14d6da7f6bb1, 0x8585d917ab5c852e
+	.quad 0xbdbd3c677381bdce, 0x5d5d8fba34d25d69, 0x1010902050801040, 0xf4f407f503f3f4f7
+	.quad 0xcbcbdd8bc016cb0b, 0x3e3ed37cc6ed3ef8, 0x05052d0a11280514, 0x676778cee61f6781
+	.quad 0xe4e497d55373e4b7, 0x2727024ebb25279c, 0x4141738258324119, 0x8b8ba70b9d2c8b16
+	.quad 0xa7a7f6530151a7a6, 0x7d7db2fa94cf7de9, 0x95954937fbdc956e, 0xd8d856ad9f8ed847
+	.quad 0xfbfb70eb308bfbcb, 0xeeeecdc17123ee9f, 0x7c7cbbf891c77ced, 0x666671cce3176685
+	.quad 0xdddd7ba78ea6dd53, 0x1717af2e4bb8175c, 0x4747458e46024701, 0x9e9e1a21dc849e42
+	.quad 0xcacad489c51eca0f, 0x2d2d585a99752db4, 0xbfbf2e637991bfc6, 0x07073f0e1b38071c
+	.quad 0xadadac472301ad8e, 0x5a5ab0b42fea5a75, 0x8383ef1bb56c8336, 0x3333b666ff8533cc
+	.quad 0x63635cc6f23f6391, 0x020212040a100208, 0xaaaa93493839aa92, 0x7171dee2a8af71d9
+	.quad 0xc8c8c68dcf0ec807, 0x1919d1327dc81964, 0x49493b9270724939, 0xd9d95faf9a86d943
+	.quad 0xf2f231f91dc3f2ef, 0xe3e3a8db484be3ab, 0x5b5bb9b62ae25b71, 0x8888bc0d9234881a
+	.quad 0x9a9a3e29c8a49a52, 0x26260b4cbe2d2698, 0x3232bf64fa8d32c8, 0xb0b0597d4ae9b0fa
+	.quad 0xe9e9f2cf6a1be983, 0x0f0f771e33780f3c, 0xd5d533b7a6e6d573, 0x8080f41dba74803a
+	.quad 0xbebe27617c99bec2, 0xcdcdeb87de26cd13, 0x34348968e4bd34d0, 0x48483290757a483d
+	.quad 0xffff54e324abffdb, 0x7a7a8df48ff77af5, 0x9090643deaf4907a, 0x5f5f9dbe3ec25f61
+	.quad 0x20203d40a01d2080, 0x68680fd0d56768bd, 0x1a1aca3472d01a68, 0xaeaeb7412c19ae82
+	.quad 0xb4b47d755ec9b4ea, 0x5454cea8199a544d, 0x93937f3be5ec9376, 0x22222f44aa0d2288
+	.quad 0x646463c8e907648d, 0xf1f12aff12dbf1e3, 0x7373cce6a2bf73d1, 0x121282245a901248
+	.quad 0x40407a805d3a401d, 0x0808481028400820, 0xc3c3959be856c32b, 0xececdfc57b33ec97
+	.quad 0xdbdb4dab9096db4b, 0xa1a1c05f1f61a1be, 0x8d8d9107831c8d0e, 0x3d3dc87ac9f53df4
+	.quad 0x97975b33f1cc9766, 0x0000000000000000, 0xcfcff983d436cf1b, 0x2b2b6e5687452bac
+	.quad 0x7676e1ecb39776c5, 0x8282e619b0648232, 0xd6d628b1a9fed67f, 0x1b1bc33677d81b6c
+	.quad 0xb5b574775bc1b5ee, 0xafafbe432911af86, 0x6a6a1dd4df776ab5, 0x5050eaa00dba505d
+	.quad 0x4545578a4c124509, 0xf3f338fb18cbf3eb, 0x3030ad60f09d30c0, 0xefefc4c3742bef9b
+	.quad 0x3f3fda7ec3e53ffc, 0x5555c7aa1c925549, 0xa2a2db591079a2b2, 0xeaeae9c96503ea8f
+	.quad 0x65656acaec0f6589, 0xbaba036968b9bad2, 0x2f2f4a5e93652fbc, 0xc0c08e9de74ec027
+	.quad 0xdede60a181bede5f, 0x1c1cfc386ce01c70, 0xfdfd46e72ebbfdd3, 0x4d4d1f9a64524d29
+	.quad 0x92927639e0e49272, 0x7575faeabc8f75c9, 0x0606360c1e300618, 0x8a8aae0998248a12
+	.quad 0xb2b24b7940f9b2f2, 0xe6e685d15963e6bf, 0x0e0e7e1c36700e38, 0x1f1fe73e63f81f7c
+	.quad 0x626255c4f7376295, 0xd4d43ab5a3eed477, 0xa8a8814d3229a89a, 0x96965231f4c49662
+	.quad 0xf9f962ef3a9bf9c3, 0xc5c5a397f666c533, 0x2525104ab1352594, 0x5959abb220f25979
+	.quad 0x8484d015ae54842a, 0x7272c5e4a7b772d5, 0x3939ec72ddd539e4, 0x4c4c1698615a4c2d
+	.quad 0x5e5e94bc3bca5e65, 0x78789ff085e778fd, 0x3838e570d8dd38e0, 0x8c8c980586148c0a
+	.quad 0xd1d117bfb2c6d163, 0xa5a5e4570b41a5ae, 0xe2e2a1d94d43e2af, 0x61614ec2f82f6199
+	.quad 0xb3b3427b45f1b3f6, 0x21213442a5152184, 0x9c9c0825d6949c4a, 0x1e1eee3c66f01e78
+	.quad 0x4343618652224311, 0xc7c7b193fc76c73b, 0xfcfc4fe52bb3fcd7, 0x0404240814200410
+	.quad 0x5151e3a208b25159, 0x9999252fc7bc995e, 0x6d6d22dac44f6da9, 0x0d0d651a39680d34
+	.quad 0xfafa79e93583facf, 0xdfdf69a384b6df5b, 0x7e7ea9fc9bd77ee5, 0x24241948b43d2490
+	.quad 0x3b3bfe76d7c53bec, 0xabab9a4b3d31ab96, 0xcecef081d13ece1f, 0x1111992255881144
+	.quad 0x8f8f8303890c8f06, 0x4e4e049c6b4a4e25, 0xb7b7667351d1b7e6, 0xebebe0cb600beb8b
+	.quad 0x3c3cc178ccfd3cf0, 0x8181fd1fbf7c813e, 0x94944035fed4946a, 0xf7f71cf30cebf7fb
+	.quad 0xb9b9186f67a1b9de, 0x13138b265f98134c, 0x2c2c51589c7d2cb0, 0xd3d305bbb8d6d36b
+	.quad 0xe7e78cd35c6be7bb, 0x6e6e39dccb576ea5, 0xc4c4aa95f36ec437, 0x03031b060f18030c
+	.quad 0x5656dcac138a5645, 0x44445e88491a440d, 0x7f7fa0fe9edf7fe1, 0xa9a9884f3721a99e
+	.quad 0x2a2a6754824d2aa8, 0xbbbb0a6b6db1bbd6, 0xc1c1879fe246c123, 0x5353f1a602a25351
+	.quad 0xdcdc72a58baedc57, 0x0b0b531627580b2c, 0x9d9d0127d39c9d4e, 0x6c6c2bd8c1476cad
+	.quad 0x3131a462f59531c4, 0x7474f3e8b98774cd, 0xf6f615f109e3f6ff, 0x46464c8c430a4605
+	.quad 0xacaca5452609ac8a, 0x8989b50f973c891e, 0x1414b42844a01450, 0xe1e1badf425be1a3
+	.quad 0x1616a62c4eb01658, 0x3a3af774d2cd3ae8, 0x696906d2d06f69b9, 0x090941122d480924
+	.quad 0x7070d7e0ada770dd, 0xb6b66f7154d9b6e2, 0xd0d01ebdb7ced067, 0xededd6c77e3bed93
+	.quad 0xcccce285db2ecc17, 0x42426884572a4215, 0x98982c2dc2b4985a, 0xa4a4ed550e49a4aa
+	.quad 0x28287550885d28a0, 0x5c5c86b831da5c6d, 0xf8f86bed3f93f8c7, 0x8686c211a4448622
+
+	.quad 0x18d83078c0186018, 0x232646af05238c23, 0xc6b891f97ec63fc6, 0xe8fbcd6f13e887e8
+	.quad 0x87cb13a14c872687, 0xb8116d62a9b8dab8, 0x0109020508010401, 0x4f0d9e6e424f214f
+	.quad 0x369b6ceead36d836, 0xa6ff510459a6a2a6, 0xd20cb9bdded26fd2, 0xf50ef706fbf5f3f5
+	.quad 0x7996f280ef79f979, 0x6f30dece5f6fa16f, 0x916d3feffc917e91, 0x52f8a407aa525552
+	.quad 0x6047c0fd27609d60, 0xbc35657689bccabc, 0x9b372bcdac9b569b, 0x8e8a018c048e028e
+	.quad 0xa3d25b1571a3b6a3, 0x0c6c183c600c300c, 0x7b84f68aff7bf17b, 0x35806ae1b535d435
+	.quad 0x1df53a69e81d741d, 0xe0b3dd4753e0a7e0, 0xd721b3acf6d77bd7, 0xc29c99ed5ec22fc2
+	.quad 0x2e435c966d2eb82e, 0x4b29967a624b314b, 0xfe5de121a3fedffe, 0x57d5ae1682574157
+	.quad 0x15bd2a41a8155415, 0x77e8eeb69f77c177, 0x37926eeba537dc37, 0xe59ed7567be5b3e5
+	.quad 0x9f1323d98c9f469f, 0xf023fd17d3f0e7f0, 0x4a20947f6a4a354a, 0xda44a9959eda4fda
+	.quad 0x58a2b025fa587d58, 0xc9cf8fca06c903c9, 0x297c528d5529a429, 0x0a5a1422500a280a
+	.quad 0xb1507f4fe1b1feb1, 0xa0c95d1a69a0baa0, 0x6b14d6da7f6bb16b, 0x85d917ab5c852e85
+	.quad 0xbd3c677381bdcebd, 0x5d8fba34d25d695d, 0x1090205080104010, 0xf407f503f3f4f7f4
+	.quad 0xcbdd8bc016cb0bcb, 0x3ed37cc6ed3ef83e, 0x052d0a1128051405, 0x6778cee61f678167
+	.quad 0xe497d55373e4b7e4, 0x27024ebb25279c27, 0x4173825832411941, 0x8ba70b9d2c8b168b
+	.quad 0xa7f6530151a7a6a7, 0x7db2fa94cf7de97d, 0x954937fbdc956e95, 0xd856ad9f8ed847d8
+	.quad 0xfb70eb308bfbcbfb, 0xeecdc17123ee9fee, 0x7cbbf891c77ced7c, 0x6671cce317668566
+	.quad 0xdd7ba78ea6dd53dd, 0x17af2e4bb8175c17, 0x47458e4602470147, 0x9e1a21dc849e429e
+	.quad 0xcad489c51eca0fca, 0x2d585a99752db42d, 0xbf2e637991bfc6bf, 0x073f0e1b38071c07
+	.quad 0xadac472301ad8ead, 0x5ab0b42fea5a755a, 0x83ef1bb56c833683, 0x33b666ff8533cc33
+	.quad 0x635cc6f23f639163, 0x0212040a10020802, 0xaa93493839aa92aa, 0x71dee2a8af71d971
+	.quad 0xc8c68dcf0ec807c8, 0x19d1327dc8196419, 0x493b927072493949, 0xd95faf9a86d943d9
+	.quad 0xf231f91dc3f2eff2, 0xe3a8db484be3abe3, 0x5bb9b62ae25b715b, 0x88bc0d9234881a88
+	.quad 0x9a3e29c8a49a529a, 0x260b4cbe2d269826, 0x32bf64fa8d32c832, 0xb0597d4ae9b0fab0
+	.quad 0xe9f2cf6a1be983e9, 0x0f771e33780f3c0f, 0xd533b7a6e6d573d5, 0x80f41dba74803a80
+	.quad 0xbe27617c99bec2be, 0xcdeb87de26cd13cd, 0x348968e4bd34d034, 0x483290757a483d48
+	.quad 0xff54e324abffdbff, 0x7a8df48ff77af57a, 0x90643deaf4907a90, 0x5f9dbe3ec25f615f
+	.quad 0x203d40a01d208020, 0x680fd0d56768bd68, 0x1aca3472d01a681a, 0xaeb7412c19ae82ae
+	.quad 0xb47d755ec9b4eab4, 0x54cea8199a544d54, 0x937f3be5ec937693, 0x222f44aa0d228822
+	.quad 0x6463c8e907648d64, 0xf12aff12dbf1e3f1, 0x73cce6a2bf73d173, 0x1282245a90124812
+	.quad 0x407a805d3a401d40, 0x0848102840082008, 0xc3959be856c32bc3, 0xecdfc57b33ec97ec
+	.quad 0xdb4dab9096db4bdb, 0xa1c05f1f61a1bea1, 0x8d9107831c8d0e8d, 0x3dc87ac9f53df43d
+	.quad 0x975b33f1cc976697, 0x0000000000000000, 0xcff983d436cf1bcf, 0x2b6e5687452bac2b
+	.quad 0x76e1ecb39776c576, 0x82e619b064823282, 0xd628b1a9fed67fd6, 0x1bc33677d81b6c1b
+	.quad 0xb574775bc1b5eeb5, 0xafbe432911af86af, 0x6a1dd4df776ab56a, 0x50eaa00dba505d50
+	.quad 0x45578a4c12450945, 0xf338fb18cbf3ebf3, 0x30ad60f09d30c030, 0xefc4c3742bef9bef
+	.quad 0x3fda7ec3e53ffc3f, 0x55c7aa1c92554955, 0xa2db591079a2b2a2, 0xeae9c96503ea8fea
+	.quad 0x656acaec0f658965, 0xba036968b9bad2ba, 0x2f4a5e93652fbc2f, 0xc08e9de74ec027c0
+	.quad 0xde60a181bede5fde, 0x1cfc386ce01c701c, 0xfd46e72ebbfdd3fd, 0x4d1f9a64524d294d
+	.quad 0x927639e0e4927292, 0x75faeabc8f75c975, 0x06360c1e30061806, 0x8aae0998248a128a
+	.quad 0xb24b7940f9b2f2b2, 0xe685d15963e6bfe6, 0x0e7e1c36700e380e, 0x1fe73e63f81f7c1f
+	.quad 0x6255c4f737629562, 0xd43ab5a3eed477d4, 0xa8814d3229a89aa8, 0x965231f4c4966296
+	.quad 0xf962ef3a9bf9c3f9, 0xc5a397f666c533c5, 0x25104ab135259425, 0x59abb220f2597959
+	.quad 0x84d015ae54842a84, 0x72c5e4a7b772d572, 0x39ec72ddd539e439, 0x4c1698615a4c2d4c
+	.quad 0x5e94bc3bca5e655e, 0x789ff085e778fd78, 0x38e570d8dd38e038, 0x8c980586148c0a8c
+	.quad 0xd117bfb2c6d163d1, 0xa5e4570b41a5aea5, 0xe2a1d94d43e2afe2, 0x614ec2f82f619961
+	.quad 0xb3427b45f1b3f6b3, 0x213442a515218421, 0x9c0825d6949c4a9c, 0x1eee3c66f01e781e
+	.quad 0x4361865222431143, 0xc7b193fc76c73bc7, 0xfc4fe52bb3fcd7fc, 0x0424081420041004
+	.quad 0x51e3a208b2515951, 0x99252fc7bc995e99, 0x6d22dac44f6da96d, 0x0d651a39680d340d
+	.quad 0xfa79e93583facffa, 0xdf69a384b6df5bdf, 0x7ea9fc9bd77ee57e, 0x241948b43d249024
+	.quad 0x3bfe76d7c53bec3b, 0xab9a4b3d31ab96ab, 0xcef081d13ece1fce, 0x1199225588114411
+	.quad 0x8f8303890c8f068f, 0x4e049c6b4a4e254e, 0xb7667351d1b7e6b7, 0xebe0cb600beb8beb
+	.quad 0x3cc178ccfd3cf03c, 0x81fd1fbf7c813e81, 0x944035fed4946a94, 0xf71cf30cebf7fbf7
+	.quad 0xb9186f67a1b9deb9, 0x138b265f98134c13, 0x2c51589c7d2cb02c, 0xd305bbb8d6d36bd3
+	.quad 0xe78cd35c6be7bbe7, 0x6e39dccb576ea56e, 0xc4aa95f36ec437c4, 0x031b060f18030c03
+	.quad 0x56dcac138a564556, 0x445e88491a440d44, 0x7fa0fe9edf7fe17f, 0xa9884f3721a99ea9
+	.quad 0x2a6754824d2aa82a, 0xbb0a6b6db1bbd6bb, 0xc1879fe246c123c1, 0x53f1a602a2535153
+	.quad 0xdc72a58baedc57dc, 0x0b531627580b2c0b, 0x9d0127d39c9d4e9d, 0x6c2bd8c1476cad6c
+	.quad 0x31a462f59531c431, 0x74f3e8b98774cd74, 0xf615f109e3f6fff6, 0x464c8c430a460546
+	.quad 0xaca5452609ac8aac, 0x89b50f973c891e89, 0x14b42844a0145014, 0xe1badf425be1a3e1
+	.quad 0x16a62c4eb0165816, 0x3af774d2cd3ae83a, 0x6906d2d06f69b969, 0x0941122d48092409
+	.quad 0x70d7e0ada770dd70, 0xb66f7154d9b6e2b6, 0xd01ebdb7ced067d0, 0xedd6c77e3bed93ed
+	.quad 0xcce285db2ecc17cc, 0x426884572a421542, 0x982c2dc2b4985a98, 0xa4ed550e49a4aaa4
+	.quad 0x287550885d28a028, 0x5c86b831da5c6d5c, 0xf86bed3f93f8c7f8, 0x86c211a444862286
diff -ruNp libgcrypt-1.6.2/cipher/whirlpool.c libgcrypt-1.6.3/cipher/whirlpool.c
--- libgcrypt-1.6.2/cipher/whirlpool.c	2014-08-21 07:50:39.000000000 -0500
+++ libgcrypt-1.6.3/cipher/whirlpool.c	2014-08-28 20:40:19.695123491 -0500
@@ -46,7 +46,11 @@
 /* Number of rounds.  */
 #define R 10
 
-
+/* USE_AMD64_ASM indicates whether to use AMD64 assembly code. */
+#undef USE_AMD64_ASM
+#if defined(__x86_64__) && defined(HAVE_COMPATIBLE_GCC_AMD64_PLATFORM_AS)
+# define USE_AMD64_ASM 1
+#endif
 
 /* Types.  */
 typedef u64 whirlpool_block_t[BLOCK_SIZE / 8];
@@ -73,9 +77,15 @@ typedef struct {
 
 /* Convert the block BLOCK into a buffer BUFFER, using I as
    counter.  */
+#ifdef USE_AMD64_ASM
 #define block_to_buffer(buffer, block, i) \
   for (i = 0; i < 8; i++) \
+	buf_put_le64((buffer) + i * 8, (block)[i]);
+#else
+#define block_to_buffer(buffer, block, i) \
+  for (i = 0; i < 8; i++) \
     buf_put_be64((buffer) + i * 8, (block)[i]);
+#endif
 
 /* Copy the block BLOCK_SRC to BLOCK_DST, using I as counter.  */
 #define block_copy(block_dst, block_src, i) \
@@ -1164,7 +1173,19 @@ static const u64 C7[256] =
   };
 
 
-?
+#ifdef USE_AMD64_ASM
+/* Assembly implementation of Whirlpool */
+unsigned int
+_gcry_whirlpool_transform_amd64 (void *state, const unsigned char *data);
+
+static unsigned int
+whirlpool_transform (void *ctx, const unsigned char *data)
+{
+	whirlpool_context_t *context = ctx;
+	return _gcry_whirlpool_transform_amd64 (context->hash_state, data)
+           + 4 * sizeof(void*);
+}
+#else /*!USE_AMD64_ASM*/
 /*
  * Transform block.
  */
@@ -1267,7 +1288,7 @@ whirlpool_transform (void *ctx, const un
   return /*burn_stack*/ 4 * sizeof(whirlpool_block_t) + 2 * sizeof(int) +
                         4 * sizeof(void*);
 }
-
+#endif /*!USE_AMD64_ASM*/
 
 static void
 whirlpool_init (void *ctx, unsigned int flags)
diff -ruNp libgcrypt-1.6.2/configure libgcrypt-1.6.3/configure
--- libgcrypt-1.6.2/configure	2014-08-21 08:14:09.000000000 -0500
+++ libgcrypt-1.6.3/configure	2014-08-28 20:04:11.750250587 -0500
@@ -18033,6 +18033,13 @@ if test "$found" = "1" ; then
 
 $as_echo "#define USE_WHIRLPOOL 1" >>confdefs.h
 
+
+   case "${host}" in
+      x86_64-*-*)
+         # Build with the assembly implementation
+         GCRYPT_DIGESTS="$GCRYPT_DIGESTS whirlpool-amd64.lo"
+      ;;
+   esac
 fi
 
 # rmd160 and sha1 should be included always.
diff -ruNp libgcrypt-1.6.2/configure.ac libgcrypt-1.6.3/configure.ac
--- libgcrypt-1.6.2/configure.ac	2014-08-21 08:03:35.000000000 -0500
+++ libgcrypt-1.6.3/configure.ac	2014-08-28 20:04:11.751250575 -0500
@@ -1866,6 +1866,13 @@ LIST_MEMBER(whirlpool, $enabled_digests)
 if test "$found" = "1" ; then
    GCRYPT_DIGESTS="$GCRYPT_DIGESTS whirlpool.lo"
    AC_DEFINE(USE_WHIRLPOOL, 1, [Defined if this module should be included])
+
+   case "${host}" in
+      x86_64-*-*)
+         # Build with the assembly implementation
+         GCRYPT_DIGESTS="$GCRYPT_DIGESTS whirlpool-amd64.lo"
+      ;;
+   esac
 fi
 
 # rmd160 and sha1 should be included always.
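
A note on the block_to_buffer change in the whirlpool.c hunk above: with
USE_AMD64_ASM defined the state words are written out with buf_put_le64,
otherwise with buf_put_be64. The following is only a minimal sketch of what
that difference means for the output bytes. The names demo_put_be64,
demo_put_le64 and demo_block_to_buffer are hypothetical stand-ins written for
this illustration; they are not libgcrypt functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a big-endian 64-bit store:
   most significant byte goes to the lowest address.  */
static void
demo_put_be64 (unsigned char *out, uint64_t v)
{
  for (int i = 0; i < 8; i++)
    out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Hypothetical stand-in for a little-endian 64-bit store:
   least significant byte goes to the lowest address.  */
static void
demo_put_le64 (unsigned char *out, uint64_t v)
{
  for (int i = 0; i < 8; i++)
    out[i] = (unsigned char)(v >> (8 * i));
}

/* Serialise eight 64-bit words into a 64-byte buffer.  BIG_ENDIAN_ORDER
   selects the byte order, mirroring the two branches of the macro.  */
static void
demo_block_to_buffer (unsigned char *buffer, const uint64_t block[8],
                      int big_endian_order)
{
  for (size_t i = 0; i < 8; i++)
    {
      if (big_endian_order)
        demo_put_be64 (buffer + i * 8, block[i]);
      else
        demo_put_le64 (buffer + i * 8, block[i]);
    }
}
```

For a first word of 0x0102030405060708, the big-endian path stores 0x01 in
buffer[0] and 0x08 in buffer[7]; the little-endian path stores them the other
way round.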

Werner Koch | 29 Aug 14:54 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-109-gdb3c028

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  db3c0286bf159568aa315d15f9708fe2de02b022 (commit)
      from  e606d5f1bada1f2d21faeedd3fa2cf2dca7b274c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit db3c0286bf159568aa315d15f9708fe2de02b022
Author: Werner Koch <wk <at> gnupg.org>
Date:   Fri Aug 29 14:54:11 2014 +0200

    mpi: Re-indent longlong.h.
    
    --
    Indenting the cpp statements should make longlong.h more readable.
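
For readers unfamiliar with the style: C permits white space between the '#'
and the directive name, so the nesting depth of conditionals can be shown by
indentation without changing what the preprocessor does. A minimal sketch of
that convention follows. The DEMO_WORD_BITS macro and demo_word_bits function
are hypothetical names used only for illustration and do not appear in
longlong.h; the sketch also assumes that unsigned long is either 32 or 64
bits wide.

```c
#include <limits.h>

/* One extra space after '#' per nesting level, as in the re-indented
   header.  Hypothetical macro, not part of longlong.h.  */
#ifndef DEMO_WORD_BITS
# if ULONG_MAX == 0xffffffffUL
#  define DEMO_WORD_BITS 32
# else
#  define DEMO_WORD_BITS 64   /* Assumption: no other widths occur.  */
# endif
#endif /* !DEMO_WORD_BITS */

/* Return the width selected by the conditional block above.  */
static int
demo_word_bits (void)
{
  return DEMO_WORD_BITS;
}
```

Closing comments such as "#endif /* !DEMO_WORD_BITS */" serve the same
purpose as the "# endif /* !__ARM_ARCH */" style used in the diff below.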

diff --git a/mpi/longlong.h b/mpi/longlong.h
index 4f33937..db98e47 100644
--- a/mpi/longlong.h
+++ b/mpi/longlong.h
@@ -1,5 +1,6 @@
 /* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
-   Note: I added some stuff for use with gnupg
+   Note: This is the Libgcrypt version
+
 
 Copyright (C) 1991, 1992, 1993, 1994, 1996, 1998,
               2000, 2001, 2002, 2003, 2004, 2011 Free Software Foundation, Inc.
@@ -41,7 +42,7 @@ MA 02111-1307, USA. */
 /* This is used to make sure no undesirable sharing between different libraries
    that use this file takes place.  */
 #ifndef __MPN
-#define __MPN(x) __##x
+# define __MPN(x) __##x
 #endif
 
 /* Define auxiliary asm macros.
@@ -102,19 +103,22 @@ MA 02111-1307, USA. */
 /* We sometimes need to clobber "cc" with gcc2, but that would not be
    understood by gcc1.	Use cpp to avoid major code duplication.  */
 #if __GNUC__ < 2
-#define __CLOBBER_CC
-#define __AND_CLOBBER_CC
+# define __CLOBBER_CC
+# define __AND_CLOBBER_CC
 #else /* __GNUC__ >= 2 */
-#define __CLOBBER_CC : "cc"
-#define __AND_CLOBBER_CC , "cc"
+# define __CLOBBER_CC : "cc"
+# define __AND_CLOBBER_CC , "cc"
 #endif /* __GNUC__ < 2 */
 
+/***************************************
+ ****  Begin CPU Specific Versions  ****
+ ***************************************/
 
 /***************************************
  **************  A29K  *****************
  ***************************************/
 #if (defined (__a29k__) || defined (_AM29K)) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add %1,%4,%5\n"   \
            "addc %0,%2,%3"                                              \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -123,7 +127,7 @@ MA 02111-1307, USA. */
 	     "rI" ((USItype)(bh)),                                      \
 	     "%r" ((USItype)(al)),                                      \
 	     "rI" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub %1,%4,%5\n"                                             \
 	   "subc %0,%2,%3"                                              \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -132,7 +136,7 @@ MA 02111-1307, USA. */
 	     "rI" ((USItype)(bh)),                                      \
 	     "r" ((USItype)(al)),                                       \
 	     "rI" ((USItype)(bl)))
-#define umul_ppmm(xh, xl, m0, m1) \
+# define umul_ppmm(xh, xl, m0, m1) \
   do {									\
     USItype __m0 = (m0), __m1 = (m1);					\
     __asm__ ("multiplu %0,%1,%2"                                        \
@@ -144,23 +148,23 @@ MA 02111-1307, USA. */
 	     : "r" (__m0),                                              \
 	       "r" (__m1));                                             \
   } while (0)
-#define udiv_qrnnd(q, r, n1, n0, d) \
+# define udiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("dividu %0,%3,%4"                                            \
 	   : "=r" ((USItype)(q)),                                       \
 	     "=q" ((USItype)(r))                                        \
 	   : "1" ((USItype)(n1)),                                       \
 	     "r" ((USItype)(n0)),                                       \
 	     "r" ((USItype)(d)))
-#define count_leading_zeros(count, x) \
+# define count_leading_zeros(count, x) \
     __asm__ ("clz %0,%1"                                                \
 	     : "=r" ((USItype)(count))                                  \
 	     : "r" ((USItype)(x)))
-#define COUNT_LEADING_ZEROS_0 32
+# define COUNT_LEADING_ZEROS_0 32
 #endif /* __a29k__ */
 
 
 #if defined (__alpha) && W_TYPE_SIZE == 64
-#define umul_ppmm(ph, pl, m0, m1) \
+# define umul_ppmm(ph, pl, m0, m1) \
   do {									\
     UDItype __m0 = (m0), __m1 = (m1);					\
     __asm__ ("umulh %r1,%2,%0"                                          \
@@ -169,16 +173,16 @@ MA 02111-1307, USA. */
 	       "rI" (__m1));                                            \
     (pl) = __m0 * __m1; 						\
   } while (0)
-#define UMUL_TIME 46
-#ifndef LONGLONG_STANDALONE
-#define udiv_qrnnd(q, r, n1, n0, d) \
+# define UMUL_TIME 46
+# ifndef LONGLONG_STANDALONE
+#  define udiv_qrnnd(q, r, n1, n0, d) \
   do { UDItype __r;							\
     (q) = __udiv_qrnnd (&__r, (n1), (n0), (d)); 			\
     (r) = __r;								\
   } while (0)
 extern UDItype __udiv_qrnnd ();
-#define UDIV_TIME 220
-#endif /* LONGLONG_STANDALONE */
+#  define UDIV_TIME 220
+# endif /* !LONGLONG_STANDALONE */
 #endif /* __alpha */
 
 /***************************************
@@ -187,30 +191,31 @@ extern UDItype __udiv_qrnnd ();
 #if defined (__arm__) && W_TYPE_SIZE == 32 && \
     (!defined (__thumb__) || defined (__thumb2__))
 /* The __ARM_ARCH define is provided by gcc 4.8.  Construct it otherwise.  */
-#ifndef __ARM_ARCH
-# ifdef __ARM_ARCH_2__
-#  define __ARM_ARCH 2
-# elif defined (__ARM_ARCH_3__) || defined (__ARM_ARCH_3M__)
-#  define __ARM_ARCH 3
-# elif defined (__ARM_ARCH_4__) || defined (__ARM_ARCH_4T__)
-#  define __ARM_ARCH 4
-# elif defined (__ARM_ARCH_5__) || defined (__ARM_ARCH_5E__) \
-       || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
-       || defined(__ARM_ARCH_5TEJ__)
-#  define __ARM_ARCH 5
-# elif defined (__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
-       || defined (__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
-       || defined (__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
-#  define __ARM_ARCH 6
-# elif defined (__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
-       || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
-       || defined(__ARM_ARCH_7EM__)
-#  define __ARM_ARCH 7
-# else
+# ifndef __ARM_ARCH
+#  ifdef __ARM_ARCH_2__
+#   define __ARM_ARCH 2
+#  elif defined (__ARM_ARCH_3__) || defined (__ARM_ARCH_3M__)
+#   define __ARM_ARCH 3
+#  elif defined (__ARM_ARCH_4__) || defined (__ARM_ARCH_4T__)
+#   define __ARM_ARCH 4
+#  elif defined (__ARM_ARCH_5__) || defined (__ARM_ARCH_5E__) \
+        || defined(__ARM_ARCH_5T__) || defined(__ARM_ARCH_5TE__) \
+        || defined(__ARM_ARCH_5TEJ__)
+#   define __ARM_ARCH 5
+#  elif defined (__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
+        || defined (__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) \
+        || defined (__ARM_ARCH_6K__) || defined(__ARM_ARCH_6T2__)
+#   define __ARM_ARCH 6
+#  elif defined (__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
+        || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
+        || defined(__ARM_ARCH_7EM__)
+#   define __ARM_ARCH 7
+#  else
    /* could not detect? */
-# endif
-#endif
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+#  endif
+# endif /* !__ARM_ARCH */
+
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("adds %1, %4, %5\n"                                          \
 	   "adc  %0, %2, %3"                                            \
 	   : "=r" ((sh)),                                               \
@@ -219,7 +224,7 @@ extern UDItype __udiv_qrnnd ();
 	     "rI" ((USItype)(bh)),                                      \
 	     "%r" ((USItype)(al)),                                      \
 	     "rI" ((USItype)(bl)) __CLOBBER_CC)
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subs %1, %4, %5\n"                                          \
 	   "sbc  %0, %2, %3"                                            \
 	   : "=r" ((sh)),                                               \
@@ -228,8 +233,8 @@ extern UDItype __udiv_qrnnd ();
 	     "rI" ((USItype)(bh)),                                      \
 	     "r" ((USItype)(al)),                                       \
 	     "rI" ((USItype)(bl)) __CLOBBER_CC)
-#if (defined __ARM_ARCH && __ARM_ARCH <= 3)
-#define umul_ppmm(xh, xl, a, b) \
+# if (defined __ARM_ARCH && __ARM_ARCH <= 3)
+#  define umul_ppmm(xh, xl, a, b) \
   __asm__ ("@ Inlined umul_ppmm\n"                                      \
 	"mov	%|r0, %2, lsr #16		@ AAAA\n"               \
 	"mov	%|r2, %3, lsr #16		@ BBBB\n"               \
@@ -248,30 +253,30 @@ extern UDItype __udiv_qrnnd ();
 	   : "r" ((USItype)(a)),                                        \
 	     "r" ((USItype)(b))                                         \
 	   : "r0", "r1", "r2" __AND_CLOBBER_CC)
-#else /* __ARM_ARCH >= 4 */
-#define umul_ppmm(xh, xl, a, b)                                         \
+# else /* __ARM_ARCH >= 4 */
+#  define umul_ppmm(xh, xl, a, b)                                         \
   __asm__ ("@ Inlined umul_ppmm\n"                                      \
 	   "umull %1, %0, %2, %3"                                       \
 		   : "=&r" ((xh)),                                      \
 		     "=r" ((xl))                                        \
 		   : "r" ((USItype)(a)),                                \
 		     "r" ((USItype)(b)))
-#endif /* __ARM_ARCH >= 4 */
-#define UMUL_TIME 20
-#define UDIV_TIME 100
-#if (defined __ARM_ARCH && __ARM_ARCH >= 5)
-#define count_leading_zeros(count, x) \
+# endif /* __ARM_ARCH >= 4 */
+# define UMUL_TIME 20
+# define UDIV_TIME 100
+# if (defined __ARM_ARCH && __ARM_ARCH >= 5)
+#  define count_leading_zeros(count, x) \
   __asm__ ("clz %0, %1"                                                 \
 		   : "=r" ((count))                                     \
 		   : "r" ((USItype)(x)))
-#endif /* __ARM_ARCH >= 5 */
+# endif /* __ARM_ARCH >= 5 */
 #endif /* __arm__ */
 
 /***************************************
  **********  ARM64 / Aarch64  **********
  ***************************************/
 #if defined(__aarch64__) && W_TYPE_SIZE == 64
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("adds %1, %4, %5\n"                                          \
            "adc  %0, %2, %3\n"                                          \
            : "=r" ((sh)),                                               \
@@ -280,7 +285,7 @@ extern UDItype __udiv_qrnnd ();
              "r" ((UDItype)(bh)),                                       \
              "r" ((UDItype)(al)),                                       \
              "r" ((UDItype)(bl)) __CLOBBER_CC)
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subs %1, %4, %5\n"                                          \
            "sbc  %0, %2, %3\n"                                          \
            : "=r" ((sh)),                                               \
@@ -289,7 +294,7 @@ extern UDItype __udiv_qrnnd ();
              "r" ((UDItype)(bh)),                                       \
              "r" ((UDItype)(al)),                                       \
              "r" ((UDItype)(bl)) __CLOBBER_CC)
-#define umul_ppmm(ph, pl, m0, m1) \
+# define umul_ppmm(ph, pl, m0, m1) \
   do {                                                                  \
     UDItype __m0 = (m0), __m1 = (m1), __ph;                             \
     (pl) = __m0 * __m1;                                                 \
@@ -299,7 +304,7 @@ extern UDItype __udiv_qrnnd ();
                "r" (__m1));                                             \
     (ph) = __ph; \
   } while (0)
-#define count_leading_zeros(count, x) \
+# define count_leading_zeros(count, x) \
   __asm__ ("clz %0, %1\n"                                               \
            : "=r" ((count))                                             \
            : "r" ((UDItype)(x)))
@@ -309,7 +314,7 @@ extern UDItype __udiv_qrnnd ();
  **************  CLIPPER  **************
  ***************************************/
 #if defined (__clipper__) && W_TYPE_SIZE == 32
-#define umul_ppmm(w1, w0, u, v) \
+# define umul_ppmm(w1, w0, u, v) \
   ({union {UDItype __ll;						\
 	   struct {USItype __l, __h;} __i;				\
 	  } __xx;							\
@@ -318,7 +323,7 @@ extern UDItype __udiv_qrnnd ();
 	   : "%0" ((USItype)(u)),                                       \
 	     "r" ((USItype)(v)));                                       \
   (w1) = __xx.__i.__h; (w0) = __xx.__i.__l;})
-#define smul_ppmm(w1, w0, u, v) \
+# define smul_ppmm(w1, w0, u, v) \
   ({union {DItype __ll; 						\
 	   struct {SItype __l, __h;} __i;				\
 	  } __xx;							\
@@ -327,7 +332,7 @@ extern UDItype __udiv_qrnnd ();
 	   : "%0" ((SItype)(u)),                                        \
 	     "r" ((SItype)(v)));                                        \
   (w1) = __xx.__i.__h; (w0) = __xx.__i.__l;})
-#define __umulsidi3(u, v) \
+# define __umulsidi3(u, v) \
   ({UDItype __w;							\
     __asm__ ("mulwux %2,%0"                                             \
 	     : "=r" (__w)                                               \
@@ -341,7 +346,7 @@ extern UDItype __udiv_qrnnd ();
  **************  GMICRO  ***************
  ***************************************/
 #if defined (__gmicro__) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add.w %5,%1\n"                                              \
 	   "addx %3,%0"                                                 \
 	   : "=g" ((USItype)(sh)),                                      \
@@ -350,7 +355,7 @@ extern UDItype __udiv_qrnnd ();
 	     "g" ((USItype)(bh)),                                       \
 	     "%1" ((USItype)(al)),                                      \
 	     "g" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub.w %5,%1\n"                                              \
 	   "subx %3,%0"                                                 \
 	   : "=g" ((USItype)(sh)),                                      \
@@ -359,20 +364,20 @@ extern UDItype __udiv_qrnnd ();
 	     "g" ((USItype)(bh)),                                       \
 	     "1" ((USItype)(al)),                                       \
 	     "g" ((USItype)(bl)))
-#define umul_ppmm(ph, pl, m0, m1) \
+# define umul_ppmm(ph, pl, m0, m1) \
   __asm__ ("mulx %3,%0,%1"                                              \
 	   : "=g" ((USItype)(ph)),                                      \
 	     "=r" ((USItype)(pl))                                       \
 	   : "%0" ((USItype)(m0)),                                      \
 	     "g" ((USItype)(m1)))
-#define udiv_qrnnd(q, r, nh, nl, d) \
+# define udiv_qrnnd(q, r, nh, nl, d) \
   __asm__ ("divx %4,%0,%1"                                              \
 	   : "=g" ((USItype)(q)),                                       \
 	     "=r" ((USItype)(r))                                        \
 	   : "1" ((USItype)(nh)),                                       \
 	     "0" ((USItype)(nl)),                                       \
 	     "g" ((USItype)(d)))
-#define count_leading_zeros(count, x) \
+# define count_leading_zeros(count, x) \
   __asm__ ("bsch/1 %1,%0"                                               \
 	   : "=g" (count)                                               \
 	   : "g" ((USItype)(x)),                                        \
@@ -384,7 +389,7 @@ extern UDItype __udiv_qrnnd ();
  **************  HPPA  *****************
  ***************************************/
 #if defined (__hppa) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("	add %4,%5,%1\n"                                             \
  	   "	addc %2,%3,%0"                                              \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -393,7 +398,7 @@ extern UDItype __udiv_qrnnd ();
 	     "rM" ((USItype)(bh)),                                      \
 	     "%rM" ((USItype)(al)),                                     \
 	     "rM" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("	sub %4,%5,%1\n"                                             \
 	   "	subb %2,%3,%0"                                              \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -402,8 +407,8 @@ extern UDItype __udiv_qrnnd ();
 	     "rM" ((USItype)(bh)),                                      \
 	     "rM" ((USItype)(al)),                                      \
 	     "rM" ((USItype)(bl)))
-#if defined (_PA_RISC1_1)
-#define umul_ppmm(wh, wl, u, v) \
+# if defined (_PA_RISC1_1)
+#  define umul_ppmm(wh, wl, u, v) \
   do {									\
     union {UDItype __ll;						\
 	   struct {USItype __h, __l;} __i;				\
@@ -415,21 +420,21 @@ extern UDItype __udiv_qrnnd ();
     (wh) = __xx.__i.__h;						\
     (wl) = __xx.__i.__l;						\
   } while (0)
-#define UMUL_TIME 8
-#define UDIV_TIME 60
-#else
-#define UMUL_TIME 40
-#define UDIV_TIME 80
-#endif
-#ifndef LONGLONG_STANDALONE
-#define udiv_qrnnd(q, r, n1, n0, d) \
+#  define UMUL_TIME 8
+#  define UDIV_TIME 60
+# else
+#  define UMUL_TIME 40
+#  define UDIV_TIME 80
+# endif
+# ifndef LONGLONG_STANDALONE
+#  define udiv_qrnnd(q, r, n1, n0, d) \
   do { USItype __r;							\
     (q) = __udiv_qrnnd (&__r, (n1), (n0), (d)); 			\
     (r) = __r;								\
   } while (0)
 extern USItype __udiv_qrnnd ();
-#endif /* LONGLONG_STANDALONE */
-#define count_leading_zeros(count, x) \
+# endif /* !LONGLONG_STANDALONE */
+# define count_leading_zeros(count, x) \
   do {								       \
     USItype __tmp;						       \
     __asm__ (				                               \
@@ -457,7 +462,7 @@ extern USItype __udiv_qrnnd ();
  **************  I370  *****************
  ***************************************/
 #if (defined (__i370__) || defined (__mvs__)) && W_TYPE_SIZE == 32
-#define umul_ppmm(xh, xl, m0, m1) \
+# define umul_ppmm(xh, xl, m0, m1) \
   do {									\
     union {UDItype __ll;						\
 	   struct {USItype __h, __l;} __i;				\
@@ -472,7 +477,7 @@ extern USItype __udiv_qrnnd ();
     (xh) += ((((SItype) __m0 >> 31) & __m1)				\
 	     + (((SItype) __m1 >> 31) & __m0)); 			\
   } while (0)
-#define smul_ppmm(xh, xl, m0, m1) \
+# define smul_ppmm(xh, xl, m0, m1) \
   do {									\
     union {DItype __ll; 						\
 	   struct {USItype __h, __l;} __i;				\
@@ -484,7 +489,7 @@ extern USItype __udiv_qrnnd ();
 	       "r" (m1));                                               \
     (xh) = __xx.__i.__h; (xl) = __xx.__i.__l;				\
   } while (0)
-#define sdiv_qrnnd(q, r, n1, n0, d) \
+# define sdiv_qrnnd(q, r, n1, n0, d) \
   do {									\
     union {DItype __ll; 						\
 	   struct {USItype __h, __l;} __i;				\
@@ -502,7 +507,7 @@ extern USItype __udiv_qrnnd ();
  **************  I386  *****************
  ***************************************/
 #if (defined (__i386__) || defined (__i486__)) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addl %5,%1\n"                                               \
 	   "adcl %3,%0"                                                 \
 	   : "=r" ((sh)),                                               \
@@ -512,7 +517,7 @@ extern USItype __udiv_qrnnd ();
 	     "%1" ((USItype)(al)),                                      \
 	     "g" ((USItype)(bl))                                        \
 	   __CLOBBER_CC)
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subl %5,%1\n"                                               \
 	   "sbbl %3,%0"                                                 \
 	   : "=r" ((sh)),                                               \
@@ -522,14 +527,14 @@ extern USItype __udiv_qrnnd ();
 	     "1" ((USItype)(al)),                                       \
 	     "g" ((USItype)(bl))                                        \
 	   __CLOBBER_CC)
-#define umul_ppmm(w1, w0, u, v) \
+# define umul_ppmm(w1, w0, u, v) \
   __asm__ ("mull %3"                                                    \
 	   : "=a" ((w0)),                                               \
 	     "=d" ((w1))                                                \
 	   : "%0" ((USItype)(u)),                                       \
 	     "rm" ((USItype)(v))                                        \
 	   __CLOBBER_CC)
-#define udiv_qrnnd(q, r, n1, n0, d) \
+# define udiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("divl %4"                                                    \
 	   : "=a" ((q)),                                                \
 	     "=d" ((r))                                                 \
@@ -537,7 +542,7 @@ extern USItype __udiv_qrnnd ();
 	     "1" ((USItype)(n1)),                                       \
 	     "rm" ((USItype)(d))                                        \
 	   __CLOBBER_CC)
-#define count_leading_zeros(count, x) \
+# define count_leading_zeros(count, x) \
   do {									\
     USItype __cbtmp;							\
     __asm__ ("bsrl %1,%0"                                               \
@@ -545,21 +550,21 @@ extern USItype __udiv_qrnnd ();
 	     __CLOBBER_CC);						\
     (count) = __cbtmp ^ 31;						\
   } while (0)
-#define count_trailing_zeros(count, x) \
+# define count_trailing_zeros(count, x) \
   __asm__ ("bsfl %1,%0" : "=r" (count) : "rm" ((USItype)(x)) __CLOBBER_CC)
-#ifndef UMUL_TIME
-#define UMUL_TIME 40
-#endif
-#ifndef UDIV_TIME
-#define UDIV_TIME 40
-#endif
+# ifndef UMUL_TIME
+#  define UMUL_TIME 40
+# endif
+# ifndef UDIV_TIME
+#  define UDIV_TIME 40
+# endif
 #endif /* 80x86 */
 
 /***************************************
  *********** AMD64 / x86-64 ************
  ***************************************/
 #if defined(__x86_64) && W_TYPE_SIZE == 64
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addq %5,%1\n"                                               \
 	   "adcq %3,%0"                                                 \
 	   : "=r" ((sh)),                                               \
@@ -569,7 +574,7 @@ extern USItype __udiv_qrnnd ();
 	     "1" ((UDItype)(al)),                                       \
 	     "g"  ((UDItype)(bl))                                       \
 	   __CLOBBER_CC)
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subq %5,%1\n"                                               \
 	   "sbbq %3,%0"                                                 \
 	   : "=r" ((sh)),                                               \
@@ -579,14 +584,14 @@ extern USItype __udiv_qrnnd ();
 	     "1" ((UDItype)(al)),                                       \
 	     "g" ((UDItype)(bl))                                        \
 	   __CLOBBER_CC)
-#define umul_ppmm(w1, w0, u, v) \
+# define umul_ppmm(w1, w0, u, v) \
   __asm__ ("mulq %3"                                                    \
 	   : "=a" ((w0)),                                               \
 	     "=d" ((w1))                                                \
 	   : "0" ((UDItype)(u)),                                        \
 	     "rm" ((UDItype)(v))                                        \
 	   __CLOBBER_CC)
-#define udiv_qrnnd(q, r, n1, n0, d) \
+# define udiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("divq %4"                                                    \
 	   : "=a" ((q)),                                                \
 	     "=d" ((r))                                                 \
@@ -594,7 +599,7 @@ extern USItype __udiv_qrnnd ();
 	     "1" ((UDItype)(n1)),                                       \
 	     "rm" ((UDItype)(d))                                        \
 	   __CLOBBER_CC)
-#define count_leading_zeros(count, x) \
+# define count_leading_zeros(count, x) \
   do {                                                                  \
     UDItype __cbtmp;                                                    \
     __asm__ ("bsrq %1,%0"                                               \
@@ -602,7 +607,7 @@ extern USItype __udiv_qrnnd ();
              __CLOBBER_CC);                                             \
     (count) = __cbtmp ^ 63;                                             \
   } while (0)
-#define count_trailing_zeros(count, x) \
+# define count_trailing_zeros(count, x) \
   do {                                                                  \
     UDItype __cbtmp;                                                    \
     __asm__ ("bsfq %1,%0"                                               \
@@ -610,12 +615,12 @@ extern USItype __udiv_qrnnd ();
              __CLOBBER_CC);                                             \
     (count) = __cbtmp;                                                  \
   } while (0)
-#ifndef UMUL_TIME
-#define UMUL_TIME 40
-#endif
-#ifndef UDIV_TIME
-#define UDIV_TIME 40
-#endif
+# ifndef UMUL_TIME
+#  define UMUL_TIME 40
+# endif
+# ifndef UDIV_TIME
+#  define UDIV_TIME 40
+# endif
 #endif /* __x86_64 */
 
 
@@ -623,7 +628,7 @@ extern USItype __udiv_qrnnd ();
  **************  I860  *****************
  ***************************************/
 #if defined (__i860__) && W_TYPE_SIZE == 32
-#define rshift_rhlc(r,h,l,c) \
+# define rshift_rhlc(r,h,l,c) \
   __asm__ ("shr %3,r0,r0\n"  \
            "shrd %1,%2,%0"   \
 	   "=r" (r) : "r" (h), "r" (l), "rn" (c))
@@ -633,7 +638,7 @@ extern USItype __udiv_qrnnd ();
  **************  I960  *****************
  ***************************************/
 #if defined (__i960__) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("cmpo 1,0\n"      \
            "addc %5,%4,%1\n" \
            "addc %3,%2,%0"   \
@@ -643,7 +648,7 @@ extern USItype __udiv_qrnnd ();
 	     "dI" ((USItype)(bh)),                                      \
 	     "%dI" ((USItype)(al)),                                     \
 	     "dI" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("cmpo 0,0\n"      \
            "subc %5,%4,%1\n" \
            "subc %3,%2,%0"   \
@@ -653,7 +658,7 @@ extern USItype __udiv_qrnnd ();
 	     "dI" ((USItype)(bh)),                                      \
 	     "dI" ((USItype)(al)),                                      \
 	     "dI" ((USItype)(bl)))
-#define umul_ppmm(w1, w0, u, v) \
+# define umul_ppmm(w1, w0, u, v) \
   ({union {UDItype __ll;						\
 	   struct {USItype __l, __h;} __i;				\
 	  } __xx;							\
@@ -662,14 +667,14 @@ extern USItype __udiv_qrnnd ();
 	   : "%dI" ((USItype)(u)),                                      \
 	     "dI" ((USItype)(v)));                                      \
   (w1) = __xx.__i.__h; (w0) = __xx.__i.__l;})
-#define __umulsidi3(u, v) \
+# define __umulsidi3(u, v) \
   ({UDItype __w;							\
     __asm__ ("emul      %2,%1,%0"                                       \
 	     : "=d" (__w)                                               \
 	     : "%dI" ((USItype)(u)),                                    \
 	       "dI" ((USItype)(v)));                                    \
     __w; })
-#define udiv_qrnnd(q, r, nh, nl, d) \
+# define udiv_qrnnd(q, r, nh, nl, d) \
   do {									\
     union {UDItype __ll;						\
 	   struct {USItype __l, __h;} __i;				\
@@ -681,7 +686,7 @@ extern USItype __udiv_qrnnd ();
 	     "dI" ((USItype)(d)));                                      \
     (r) = __rq.__i.__l; (q) = __rq.__i.__h;				\
   } while (0)
-#define count_leading_zeros(count, x) \
+# define count_leading_zeros(count, x) \
   do {									\
     USItype __cbtmp;							\
     __asm__ ("scanbit %1,%0"                                            \
@@ -689,9 +694,9 @@ extern USItype __udiv_qrnnd ();
 	     : "r" ((USItype)(x)));                                     \
     (count) = __cbtmp ^ 31;						\
   } while (0)
-#define COUNT_LEADING_ZEROS_0 (-32) /* sic */
-#if defined (__i960mx)		/* what is the proper symbol to test??? */
-#define rshift_rhlc(r,h,l,c) \
+# define COUNT_LEADING_ZEROS_0 (-32) /* sic */
+# if defined (__i960mx)  /* what is the proper symbol to test??? */
+#  define rshift_rhlc(r,h,l,c) \
   do {									\
     union {UDItype __ll;						\
 	   struct {USItype __l, __h;} __i;				\
@@ -700,15 +705,16 @@ extern USItype __udiv_qrnnd ();
     __asm__ ("shre %2,%1,%0"                                            \
 	     : "=d" (r) : "dI" (__nn.__ll), "dI" (c));                  \
   }
-#endif /* i960mx */
+# endif /* i960mx */
 #endif /* i960 */
 
 
 /***************************************
  **************  68000	****************
  ***************************************/
-#if (defined (__mc68000__) || defined (__mc68020__) || defined (__NeXT__) || defined(mc68020)) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+#if (defined (__mc68000__) || defined (__mc68020__)                     \
+     || defined (__NeXT__) || defined(mc68020)) && W_TYPE_SIZE == 32
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add%.l %5,%1\n"                                             \
 	   "addx%.l %3,%0"                                              \
 	   : "=d" ((USItype)(sh)),                                      \
@@ -717,7 +723,7 @@ extern USItype __udiv_qrnnd ();
 	     "d" ((USItype)(bh)),                                       \
 	     "%1" ((USItype)(al)),                                      \
 	     "g" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub%.l %5,%1\n"                                             \
 	   "subx%.l %3,%0"                                              \
 	   : "=d" ((USItype)(sh)),                                      \
@@ -726,36 +732,36 @@ extern USItype __udiv_qrnnd ();
 	     "d" ((USItype)(bh)),                                       \
 	     "1" ((USItype)(al)),                                       \
 	     "g" ((USItype)(bl)))
-#if (defined (__mc68020__) || defined (__NeXT__) || defined(mc68020))
-#define umul_ppmm(w1, w0, u, v) \
+# if (defined (__mc68020__) || defined (__NeXT__) || defined(mc68020))
+#  define umul_ppmm(w1, w0, u, v) \
   __asm__ ("mulu%.l %3,%1:%0"                                           \
 	   : "=d" ((USItype)(w0)),                                      \
 	     "=d" ((USItype)(w1))                                       \
 	   : "%0" ((USItype)(u)),                                       \
 	     "dmi" ((USItype)(v)))
-#define UMUL_TIME 45
-#define udiv_qrnnd(q, r, n1, n0, d) \
+#  define UMUL_TIME 45
+#  define udiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("divu%.l %4,%1:%0"                                           \
 	   : "=d" ((USItype)(q)),                                       \
 	     "=d" ((USItype)(r))                                        \
 	   : "0" ((USItype)(n0)),                                       \
 	     "1" ((USItype)(n1)),                                       \
 	     "dmi" ((USItype)(d)))
-#define UDIV_TIME 90
-#define sdiv_qrnnd(q, r, n1, n0, d) \
+#  define UDIV_TIME 90
+#  define sdiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("divs%.l %4,%1:%0"                                           \
 	   : "=d" ((USItype)(q)),                                       \
 	     "=d" ((USItype)(r))                                        \
 	   : "0" ((USItype)(n0)),                                       \
 	     "1" ((USItype)(n1)),                                       \
 	     "dmi" ((USItype)(d)))
-#define count_leading_zeros(count, x) \
+#  define count_leading_zeros(count, x) \
   __asm__ ("bfffo %1{%b2:%b2},%0"                                       \
 	   : "=d" ((USItype)(count))                                    \
 	   : "od" ((USItype)(x)), "n" (0))
-#define COUNT_LEADING_ZEROS_0 32
-#else /* not mc68020 */
-#define umul_ppmm(xh, xl, a, b) \
+#  define COUNT_LEADING_ZEROS_0 32
+# else /* not mc68020 */
+#  define umul_ppmm(xh, xl, a, b) \
   do { USItype __umul_tmp1, __umul_tmp2;			  \
 	__asm__ ("| Inlined umul_ppmm                         \n" \
  "        move%.l %5,%3                                       \n" \
@@ -783,9 +789,9 @@ extern USItype __udiv_qrnnd ();
 		"=d" (__umul_tmp1), "=&d" (__umul_tmp2)           \
 	      : "%2" ((USItype)(a)), "d" ((USItype)(b)));         \
   } while (0)
-#define UMUL_TIME 100
-#define UDIV_TIME 400
-#endif /* not mc68020 */
+#  define UMUL_TIME 100
+#  define UDIV_TIME 400
+# endif /* not mc68020 */
 #endif /* mc68000 */
 
 
@@ -793,7 +799,7 @@ extern USItype __udiv_qrnnd ();
  **************  88000	****************
  ***************************************/
 #if defined (__m88000__) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addu.co %1,%r4,%r5\n"                                       \
 	   "addu.ci %0,%r2,%r3"                                         \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -802,7 +808,7 @@ extern USItype __udiv_qrnnd ();
 	     "rJ" ((USItype)(bh)),                                      \
 	     "%rJ" ((USItype)(al)),                                     \
 	     "rJ" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subu.co %1,%r4,%r5\n"                                       \
 	   "subu.ci %0,%r2,%r3"                                         \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -811,7 +817,7 @@ extern USItype __udiv_qrnnd ();
 	     "rJ" ((USItype)(bh)),                                      \
 	     "rJ" ((USItype)(al)),                                      \
 	     "rJ" ((USItype)(bl)))
-#define count_leading_zeros(count, x) \
+# define count_leading_zeros(count, x) \
   do {									\
     USItype __cbtmp;							\
     __asm__ ("ff1 %0,%1"                                                \
@@ -819,9 +825,9 @@ extern USItype __udiv_qrnnd ();
 	     : "r" ((USItype)(x)));                                     \
     (count) = __cbtmp ^ 31;						\
   } while (0)
-#define COUNT_LEADING_ZEROS_0 63 /* sic */
-#if defined (__m88110__)
-#define umul_ppmm(wh, wl, u, v) \
+# define COUNT_LEADING_ZEROS_0 63 /* sic */
+# if defined (__m88110__)
+#  define umul_ppmm(wh, wl, u, v) \
   do {									\
     union {UDItype __ll;						\
 	   struct {USItype __h, __l;} __i;				\
@@ -830,7 +836,7 @@ extern USItype __udiv_qrnnd ();
     (wh) = __x.__i.__h; 						\
     (wl) = __x.__i.__l; 						\
   } while (0)
-#define udiv_qrnnd(q, r, n1, n0, d) \
+#  define udiv_qrnnd(q, r, n1, n0, d) \
   ({union {UDItype __ll;						\
 	   struct {USItype __h, __l;} __i;				\
 	  } __x, __q;							\
@@ -838,36 +844,36 @@ extern USItype __udiv_qrnnd ();
   __asm__ ("divu.d %0,%1,%2"                                            \
 	   : "=r" (__q.__ll) : "r" (__x.__ll), "r" (d));                \
   (r) = (n0) - __q.__l * (d); (q) = __q.__l; })
-#define UMUL_TIME 5
-#define UDIV_TIME 25
-#else
-#define UMUL_TIME 17
-#define UDIV_TIME 150
-#endif /* __m88110__ */
+#  define UMUL_TIME 5
+#  define UDIV_TIME 25
+# else
+#  define UMUL_TIME 17
+#  define UDIV_TIME 150
+# endif /* __m88110__ */
 #endif /* __m88000__ */
 
 /***************************************
  **************  MIPS  *****************
  ***************************************/
 #if defined (__mips__) && W_TYPE_SIZE == 32
-#if defined (__clang__) || (__GNUC__ >= 5) || (__GNUC__ == 4 && \
+# if defined (__clang__) || (__GNUC__ >= 5) || (__GNUC__ == 4 && \
                                                __GNUC_MINOR__ >= 4)
-#define umul_ppmm(w1, w0, u, v) \
+#  define umul_ppmm(w1, w0, u, v) \
   do {                                                                  \
     UDItype _r;                                                         \
     _r = (UDItype) u * v;                                               \
     (w1) = _r >> 32;                                                    \
     (w0) = (USItype) _r;                                                \
   } while (0)
-#elif __GNUC__ > 2 || __GNUC_MINOR__ >= 7
-#define umul_ppmm(w1, w0, u, v) \
+# elif __GNUC__ > 2 || __GNUC_MINOR__ >= 7
+#  define umul_ppmm(w1, w0, u, v) \
   __asm__ ("multu %2,%3"                                                \
 	   : "=l" ((USItype)(w0)),                                      \
 	     "=h" ((USItype)(w1))                                       \
 	   : "d" ((USItype)(u)),                                        \
 	     "d" ((USItype)(v)))
-#else
-#define umul_ppmm(w1, w0, u, v) \
+# else
+#  define umul_ppmm(w1, w0, u, v) \
   __asm__ ("multu %2,%3 \n" \
 	   "mflo %0 \n"     \
 	   "mfhi %1"                                                        \
@@ -875,33 +881,33 @@ extern USItype __udiv_qrnnd ();
 	     "=d" ((USItype)(w1))                                       \
 	   : "d" ((USItype)(u)),                                        \
 	     "d" ((USItype)(v)))
-#endif
-#define UMUL_TIME 10
-#define UDIV_TIME 100
+# endif
+# define UMUL_TIME 10
+# define UDIV_TIME 100
 #endif /* __mips__ */
 
 /***************************************
  **************  MIPS/64  **************
  ***************************************/
 #if (defined (__mips) && __mips >= 3) && W_TYPE_SIZE == 64
-#if (__GNUC__ >= 5) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4)
+# if (__GNUC__ >= 5) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4)
 typedef unsigned int UTItype __attribute__ ((mode (TI)));
-#define umul_ppmm(w1, w0, u, v) \
+#  define umul_ppmm(w1, w0, u, v) \
   do {                                                                 \
     UTItype _r;                                                        \
     _r = (UTItype) u * v;                                              \
     (w1) = _r >> 64;                                                   \
     (w0) = (UDItype) _r;                                               \
   } while (0)
-#elif __GNUC__ > 2 || __GNUC_MINOR__ >= 7
-#define umul_ppmm(w1, w0, u, v) \
+# elif __GNUC__ > 2 || __GNUC_MINOR__ >= 7
+#  define umul_ppmm(w1, w0, u, v) \
   __asm__ ("dmultu %2,%3"                                               \
 	   : "=l" ((UDItype)(w0)),                                      \
 	     "=h" ((UDItype)(w1))                                       \
 	   : "d" ((UDItype)(u)),                                        \
 	     "d" ((UDItype)(v)))
-#else
-#define umul_ppmm(w1, w0, u, v) \
+# else
+#  define umul_ppmm(w1, w0, u, v) \
   __asm__ ("dmultu %2,%3 \n"    \
 	   "mflo %0 \n"         \
 	   "mfhi %1"                                                        \
@@ -909,9 +915,9 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	     "=d" ((UDItype)(w1))                                       \
 	   : "d" ((UDItype)(u)),                                        \
 	     "d" ((UDItype)(v)))
-#endif
-#define UMUL_TIME 20
-#define UDIV_TIME 140
+# endif
+# define UMUL_TIME 20
+# define UDIV_TIME 140
 #endif /* __mips__ */
 
 
@@ -919,7 +925,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
  **************  32000	****************
  ***************************************/
 #if defined (__ns32000__) && W_TYPE_SIZE == 32
-#define umul_ppmm(w1, w0, u, v) \
+# define umul_ppmm(w1, w0, u, v) \
   ({union {UDItype __ll;						\
 	   struct {USItype __l, __h;} __i;				\
 	  } __xx;							\
@@ -928,14 +934,14 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	   : "%0" ((USItype)(u)),                                       \
 	     "g" ((USItype)(v)));                                       \
   (w1) = __xx.__i.__h; (w0) = __xx.__i.__l;})
-#define __umulsidi3(u, v) \
+# define __umulsidi3(u, v) \
   ({UDItype __w;							\
     __asm__ ("meid %2,%0"                                               \
 	     : "=g" (__w)                                               \
 	     : "%0" ((USItype)(u)),                                     \
 	       "g" ((USItype)(v)));                                     \
     __w; })
-#define udiv_qrnnd(q, r, n1, n0, d) \
+# define udiv_qrnnd(q, r, n1, n0, d) \
   ({union {UDItype __ll;						\
 	   struct {USItype __l, __h;} __i;				\
 	  } __xx;							\
@@ -945,7 +951,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	   : "0" (__xx.__ll),                                           \
 	     "g" ((USItype)(d)));                                       \
   (r) = __xx.__i.__l; (q) = __xx.__i.__h; })
-#define count_trailing_zeros(count,x) \
+# define count_trailing_zeros(count,x) \
   do {
     __asm__ ("ffsd      %2,%0"                                          \
 	     : "=r" ((USItype) (count))                                 \
@@ -959,7 +965,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
  **************  PPC  ******************
  ***************************************/
 #if (defined (_ARCH_PPC) || defined (_IBMR2)) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   do {									\
     if (__builtin_constant_p (bh) && (bh) == 0) 			\
       __asm__ ("{a%I4|add%I4c} %1,%3,%4\n\t{aze|addze} %0,%2"           \
@@ -984,7 +990,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	       "%r" ((USItype)(al)),                                    \
 	       "rI" ((USItype)(bl)));                                   \
   } while (0)
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   do {									\
     if (__builtin_constant_p (ah) && (ah) == 0) 			\
       __asm__ ("{sf%I3|subf%I3c} %1,%4,%3\n\t{sfze|subfze} %0,%2"       \
@@ -1023,13 +1029,13 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 		 "rI" ((USItype)(al)),                                  \
 		 "r" ((USItype)(bl)));                                  \
   } while (0)
-#define count_leading_zeros(count, x) \
+# define count_leading_zeros(count, x) \
   __asm__ ("{cntlz|cntlzw} %0,%1"                                       \
 	   : "=r" ((count))                                             \
 	   : "r" ((USItype)(x)))
-#define COUNT_LEADING_ZEROS_0 32
-#if defined (_ARCH_PPC)
-#define umul_ppmm(ph, pl, m0, m1) \
+# define COUNT_LEADING_ZEROS_0 32
+# if defined (_ARCH_PPC)
+#  define umul_ppmm(ph, pl, m0, m1) \
   do {									\
     USItype __m0 = (m0), __m1 = (m1);					\
     __asm__ ("mulhwu %0,%1,%2"                                          \
@@ -1038,8 +1044,8 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	       "r" (__m1));                                             \
     (pl) = __m0 * __m1; 						\
   } while (0)
-#define UMUL_TIME 15
-#define smul_ppmm(ph, pl, m0, m1) \
+#  define UMUL_TIME 15
+#  define smul_ppmm(ph, pl, m0, m1) \
   do {									\
     SItype __m0 = (m0), __m1 = (m1);					\
     __asm__ ("mulhw %0,%1,%2"                                           \
@@ -1048,10 +1054,10 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	       "r" (__m1));                                             \
     (pl) = __m0 * __m1; 						\
   } while (0)
-#define SMUL_TIME 14
-#define UDIV_TIME 120
-#else
-#define umul_ppmm(xh, xl, m0, m1) \
+#  define SMUL_TIME 14
+#  define UDIV_TIME 120
+# else
+#  define umul_ppmm(xh, xl, m0, m1) \
   do {									\
     USItype __m0 = (m0), __m1 = (m1);					\
     __asm__ ("mul %0,%2,%3"                                             \
@@ -1062,20 +1068,20 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
     (xh) += ((((SItype) __m0 >> 31) & __m1)				\
 	     + (((SItype) __m1 >> 31) & __m0)); 			\
   } while (0)
-#define UMUL_TIME 8
-#define smul_ppmm(xh, xl, m0, m1) \
+#  define UMUL_TIME 8
+#  define smul_ppmm(xh, xl, m0, m1) \
   __asm__ ("mul %0,%2,%3"                                               \
 	   : "=r" ((SItype)(xh)),                                       \
 	     "=q" ((SItype)(xl))                                        \
 	   : "r" (m0),                                                  \
 	     "r" (m1))
-#define SMUL_TIME 4
-#define sdiv_qrnnd(q, r, nh, nl, d) \
+#  define SMUL_TIME 4
+#  define sdiv_qrnnd(q, r, nh, nl, d) \
   __asm__ ("div %0,%2,%4"                                               \
 	   : "=r" ((SItype)(q)), "=q" ((SItype)(r))                     \
 	   : "r" ((SItype)(nh)), "1" ((SItype)(nl)), "r" ((SItype)(d)))
-#define UDIV_TIME 100
-#endif
+#  define UDIV_TIME 100
+# endif
 #endif /* Power architecture variants.	*/
 
 /* Powerpc 64 bit support taken from gmp-4.1.2. */
@@ -1140,7 +1146,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
  **************  PYR  ******************
  ***************************************/
 #if defined (__pyr__) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addw        %5,%1 \n" \
 	   "addwc	%3,%0"                                          \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -1149,7 +1155,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	     "g" ((USItype)(bh)),                                       \
 	     "%1" ((USItype)(al)),                                      \
 	     "g" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subw        %5,%1 \n" \
 	   "subwb	%3,%0"                                          \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -1159,7 +1165,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	     "1" ((USItype)(al)),                                       \
 	     "g" ((USItype)(bl)))
 /* This insn works on Pyramids with AP, XP, or MI CPUs, but not with SP.  */
-#define umul_ppmm(w1, w0, u, v) \
+# define umul_ppmm(w1, w0, u, v) \
   ({union {UDItype __ll;						\
 	   struct {USItype __h, __l;} __i;				\
 	  } __xx;							\
@@ -1176,7 +1182,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
  **************  RT/ROMP  **************
  ***************************************/
 #if defined (__ibm032__) /* RT/ROMP */	&& W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("a %1,%5 \n" \
 	   "ae %0,%3"                                                   \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -1185,7 +1191,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	     "r" ((USItype)(bh)),                                       \
 	     "%1" ((USItype)(al)),                                      \
 	     "r" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("s %1,%5\n" \
 	   "se %0,%3"                                                   \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -1194,7 +1200,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	     "r" ((USItype)(bh)),                                       \
 	     "1" ((USItype)(al)),                                       \
 	     "r" ((USItype)(bl)))
-#define umul_ppmm(ph, pl, m0, m1) \
+# define umul_ppmm(ph, pl, m0, m1) \
   do {									\
     USItype __m0 = (m0), __m1 = (m1);					\
     __asm__ (								\
@@ -1226,9 +1232,9 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
     (ph) += ((((SItype) __m0 >> 31) & __m1)				\
 	     + (((SItype) __m1 >> 31) & __m0)); 			\
   } while (0)
-#define UMUL_TIME 20
-#define UDIV_TIME 200
-#define count_leading_zeros(count, x) \
+# define UMUL_TIME 20
+# define UDIV_TIME 200
+# define count_leading_zeros(count, x) \
   do {									\
     if ((x) >= 0x10000) 						\
       __asm__ ("clz     %0,%1"                                          \
@@ -1250,7 +1256,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
  ***************************************/
 #if (defined (__sh2__) || defined(__sh3__) || defined(__SH4__) ) \
     && W_TYPE_SIZE == 32
-#define umul_ppmm(w1, w0, u, v) \
+# define umul_ppmm(w1, w0, u, v) \
   __asm__ (								\
         "dmulu.l %2,%3\n"  \
 	"sts	macl,%1\n" \
@@ -1260,14 +1266,14 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	   : "r" ((USItype)(u)),                                        \
 	     "r" ((USItype)(v))                                         \
 	   : "macl", "mach")
-#define UMUL_TIME 5
+# define UMUL_TIME 5
 #endif
 
 /***************************************
  **************  SPARC	****************
  ***************************************/
 #if defined (__sparc__) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addcc %r4,%5,%1\n" \
 	   "addx %r2,%3,%0"                                             \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -1277,7 +1283,7 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	     "%rJ" ((USItype)(al)),                                     \
 	     "rI" ((USItype)(bl))                                       \
 	   __CLOBBER_CC)
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subcc %r4,%5,%1\n" \
 	   "subx %r2,%3,%0"                                             \
 	   : "=r" ((USItype)(sh)),                                      \
@@ -1287,20 +1293,20 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	     "rJ" ((USItype)(al)),                                      \
 	     "rI" ((USItype)(bl))                                       \
 	   __CLOBBER_CC)
-#if defined (__sparc_v8__)
+# if defined (__sparc_v8__)
 /* Don't match immediate range because, 1) it is not often useful,
    2) the 'I' flag thinks of the range as a 13 bit signed interval,
    while we want to match a 13 bit interval, sign extended to 32 bits,
    but INTERPRETED AS UNSIGNED.  */
-#define umul_ppmm(w1, w0, u, v) \
+#  define umul_ppmm(w1, w0, u, v) \
   __asm__ ("umul %2,%3,%1;rd %%y,%0"                                    \
 	   : "=r" ((USItype)(w1)),                                      \
 	     "=r" ((USItype)(w0))                                       \
 	   : "r" ((USItype)(u)),                                        \
 	     "r" ((USItype)(v)))
-#define UMUL_TIME 5
-#ifndef SUPERSPARC	/* SuperSPARC's udiv only handles 53 bit dividends */
-#define udiv_qrnnd(q, r, n1, n0, d) \
+#  define UMUL_TIME 5
+#  ifndef SUPERSPARC	/* SuperSPARC's udiv only handles 53 bit dividends */
+#   define udiv_qrnnd(q, r, n1, n0, d) \
   do {									\
     USItype __q;							\
     __asm__ ("mov %1,%%y;nop;nop;nop;udiv %2,%3,%0"                     \
@@ -1311,20 +1317,20 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
     (r) = (n0) - __q * (d);						\
     (q) = __q;								\
   } while (0)
-#define UDIV_TIME 25
-#endif /* SUPERSPARC */
-#else /* ! __sparc_v8__ */
-#if defined (__sparclite__)
+#   define UDIV_TIME 25
+#  endif /*!SUPERSPARC */
+# else /* ! __sparc_v8__ */
+#  if defined (__sparclite__)
 /* This has hardware multiply but not divide.  It also has two additional
    instructions scan (ffs from high bit) and divscc.  */
-#define umul_ppmm(w1, w0, u, v) \
+#   define umul_ppmm(w1, w0, u, v) \
   __asm__ ("umul %2,%3,%1;rd %%y,%0"                                    \
 	   : "=r" ((USItype)(w1)),                                      \
 	     "=r" ((USItype)(w0))                                       \
 	   : "r" ((USItype)(u)),                                        \
 	     "r" ((USItype)(v)))
-#define UMUL_TIME 5
-#define udiv_qrnnd(q, r, n1, n0, d) \
+#   define UMUL_TIME 5
+#   define udiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("! Inlined udiv_qrnnd                                     \n" \
  "        wr	%%g0,%2,%%y	! Not a delayed write for sparclite  \n" \
  "        tst	%%g0                                                 \n" \
@@ -1370,19 +1376,19 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	     "r" ((USItype)(n0)),                                       \
 	     "rI" ((USItype)(d))                                        \
 	   : "%g1" __AND_CLOBBER_CC)
-#define UDIV_TIME 37
-#define count_leading_zeros(count, x) \
+#   define UDIV_TIME 37
+#   define count_leading_zeros(count, x) \
   __asm__ ("scan %1,0,%0"                                               \
 	   : "=r" ((USItype)(x))                                        \
 	   : "r" ((USItype)(count)))
 /* Early sparclites return 63 for an argument of 0, but they warn that future
    implementations might change this.  Therefore, leave COUNT_LEADING_ZEROS_0
    undefined.  */
-#endif /* __sparclite__ */
-#endif /* __sparc_v8__ */
+#  endif /* !__sparclite__ */
+# endif /* !__sparc_v8__ */
 /* Default to sparc v7 versions of umul_ppmm and udiv_qrnnd.  */
-#ifndef umul_ppmm
-#define umul_ppmm(w1, w0, u, v) \
+# ifndef umul_ppmm
+#  define umul_ppmm(w1, w0, u, v) \
   __asm__ ("! Inlined umul_ppmm                                        \n" \
  "        wr	%%g0,%2,%%y	! SPARC has 0-3 delay insn after a wr  \n" \
  "        sra	%3,31,%%g2	! Don't move this insn                 \n" \
@@ -1428,19 +1434,19 @@ typedef unsigned int UTItype __attribute__ ((mode (TI)));
 	   : "%rI" ((USItype)(u)),                                      \
 	     "r" ((USItype)(v))                                         \
 	   : "%g1", "%g2" __AND_CLOBBER_CC)
-#define UMUL_TIME 39		/* 39 instructions */
-#endif
-#ifndef udiv_qrnnd
-#ifndef LONGLONG_STANDALONE
-#define udiv_qrnnd(q, r, n1, n0, d) \
+#  define UMUL_TIME 39		/* 39 instructions */
+# endif /* umul_ppmm */
+# ifndef udiv_qrnnd
+#  ifndef LONGLONG_STANDALONE
+#   define udiv_qrnnd(q, r, n1, n0, d) \
   do { USItype __r;							\
     (q) = __udiv_qrnnd (&__r, (n1), (n0), (d)); 			\
     (r) = __r;								\
   } while (0)
 extern USItype __udiv_qrnnd ();
-#define UDIV_TIME 140
-#endif /* LONGLONG_STANDALONE */
-#endif /* udiv_qrnnd */
+#   define UDIV_TIME 140
+#  endif /* LONGLONG_STANDALONE */
+# endif /* udiv_qrnnd */
 #endif /* __sparc__ */
 
 
@@ -1448,7 +1454,7 @@ extern USItype __udiv_qrnnd ();
  **************  VAX  ******************
  ***************************************/
 #if defined (__vax__) && W_TYPE_SIZE == 32
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addl2 %5,%1\n" \
 	   "adwc %3,%0"                                                 \
 	   : "=g" ((USItype)(sh)),                                      \
@@ -1457,7 +1463,7 @@ extern USItype __udiv_qrnnd ();
 	     "g" ((USItype)(bh)),                                       \
 	     "%1" ((USItype)(al)),                                      \
 	     "g" ((USItype)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subl2 %5,%1\n" \
 	   "sbwc %3,%0"                                                 \
 	   : "=g" ((USItype)(sh)),                                      \
@@ -1466,7 +1472,7 @@ extern USItype __udiv_qrnnd ();
 	     "g" ((USItype)(bh)),                                       \
 	     "1" ((USItype)(al)),                                       \
 	     "g" ((USItype)(bl)))
-#define umul_ppmm(xh, xl, m0, m1) \
+# define umul_ppmm(xh, xl, m0, m1) \
   do {									\
     union {UDItype __ll;						\
 	   struct {USItype __l, __h;} __i;				\
@@ -1480,7 +1486,7 @@ extern USItype __udiv_qrnnd ();
     (xh) += ((((SItype) __m0 >> 31) & __m1)				\
 	     + (((SItype) __m1 >> 31) & __m0)); 			\
   } while (0)
-#define sdiv_qrnnd(q, r, n1, n0, d) \
+# define sdiv_qrnnd(q, r, n1, n0, d) \
   do {									\
     union {DItype __ll; 						\
 	   struct {SItype __l, __h;} __i;				\
@@ -1497,7 +1503,7 @@ extern USItype __udiv_qrnnd ();
  **************  Z8000	****************
  ***************************************/
 #if defined (__z8000__) && W_TYPE_SIZE == 16
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+# define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add %H1,%H5\n\tadc  %H0,%H3"                                \
 	   : "=r" ((unsigned int)(sh)),                                 \
 	     "=&r" ((unsigned int)(sl))                                 \
@@ -1505,7 +1511,7 @@ extern USItype __udiv_qrnnd ();
 	     "r" ((unsigned int)(bh)),                                  \
 	     "%1" ((unsigned int)(al)),                                 \
 	     "rQR" ((unsigned int)(bl)))
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+# define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub %H1,%H5\n\tsbc  %H0,%H3"                                \
 	   : "=r" ((unsigned int)(sh)),                                 \
 	     "=&r" ((unsigned int)(sl))                                 \
@@ -1513,7 +1519,7 @@ extern USItype __udiv_qrnnd ();
 	     "r" ((unsigned int)(bh)),                                  \
 	     "1" ((unsigned int)(al)),                                  \
 	     "rQR" ((unsigned int)(bl)))
-#define umul_ppmm(xh, xl, m0, m1) \
+# define umul_ppmm(xh, xl, m0, m1) \
   do {									\
     union {long int __ll;						\
 	   struct {unsigned int __h, __l;} __i; 			\
@@ -1530,6 +1536,11 @@ extern USItype __udiv_qrnnd ();
   } while (0)
 #endif /* __z8000__ */
 
+
+/***************************************
+ *****  End CPU Specific Versions  *****
+ ***************************************/
+
 #endif /* __GNUC__ */
 #endif /* !__riscos__ */
 
@@ -1538,7 +1549,7 @@ extern USItype __udiv_qrnnd ();
  ***********  Generic Versions	********
  ***************************************/
 #if !defined (umul_ppmm) && defined (__umulsidi3)
-#define umul_ppmm(ph, pl, m0, m1) \
+#  define umul_ppmm(ph, pl, m0, m1) \
   {									\
     UDWtype __ll = __umulsidi3 (m0, m1);				\
     ph = (UWtype) (__ll >> W_TYPE_SIZE);				\
@@ -1547,7 +1558,7 @@ extern USItype __udiv_qrnnd ();
 #endif
 
 #if !defined (__umulsidi3)
-#define __umulsidi3(u, v) \
+#  define __umulsidi3(u, v) \
   ({UWtype __hi, __lo;							\
     umul_ppmm (__hi, __lo, u, v);					\
     ((UDWtype) __hi << W_TYPE_SIZE) | __lo; })
@@ -1556,7 +1567,7 @@ extern USItype __udiv_qrnnd ();
 /* If this machine has no inline assembler, use C macros.  */
 
 #if !defined (add_ssaaaa)
-#define add_ssaaaa(sh, sl, ah, al, bh, bl) \
+#  define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   do {									\
     UWtype __x; 							\
     __x = (al) + (bl);							\
@@ -1566,7 +1577,7 @@ extern USItype __udiv_qrnnd ();
 #endif
 
 #if !defined (sub_ddmmss)
-#define sub_ddmmss(sh, sl, ah, al, bh, bl) \
+#  define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   do {									\
     UWtype __x; 							\
     __x = (al) - (bl);							\
@@ -1576,7 +1587,7 @@ extern USItype __udiv_qrnnd ();
 #endif
 
 #if !defined (umul_ppmm)
-#define umul_ppmm(w1, w0, u, v) 					\
+#  define umul_ppmm(w1, w0, u, v) 					\
   do {									\
     UWtype __x0, __x1, __x2, __x3;					\
     UHWtype __ul, __vl, __uh, __vh;					\
@@ -1603,7 +1614,7 @@ extern USItype __udiv_qrnnd ();
 #endif
 
 #if !defined (umul_ppmm)
-#define smul_ppmm(w1, w0, u, v) 					\
+#  define smul_ppmm(w1, w0, u, v) 					\
   do {									\
     UWtype __w1;							\
     UWtype __m0 = (u), __m1 = (v);					\
@@ -1653,7 +1664,7 @@ extern USItype __udiv_qrnnd ();
 /* If the processor has no udiv_qrnnd but sdiv_qrnnd, go through
    __udiv_w_sdiv (defined in libgcc or elsewhere).  */
 #if !defined (udiv_qrnnd) && defined (sdiv_qrnnd)
-#define udiv_qrnnd(q, r, nh, nl, d) \
+#  define udiv_qrnnd(q, r, nh, nl, d) \
   do {									\
     UWtype __r; 							\
     (q) = __MPN(udiv_w_sdiv) (&__r, nh, nl, d); 			\
@@ -1663,18 +1674,18 @@ extern USItype __udiv_qrnnd ();
 
 /* If udiv_qrnnd was not defined for this processor, use __udiv_qrnnd_c.  */
 #if !defined (udiv_qrnnd)
-#define UDIV_NEEDS_NORMALIZATION 1
-#define udiv_qrnnd __udiv_qrnnd_c
+#  define UDIV_NEEDS_NORMALIZATION 1
+#  define udiv_qrnnd __udiv_qrnnd_c
 #endif
 
 #if !defined (count_leading_zeros)
 extern
-#ifdef __STDC__
+#  ifdef __STDC__
 const
-#endif
+#  endif
 unsigned char _gcry_clz_tab[];
-#define MPI_INTERNAL_NEED_CLZ_TAB 1
-#define count_leading_zeros(count, x) \
+#  define MPI_INTERNAL_NEED_CLZ_TAB 1
+#  define count_leading_zeros(count, x) \
   do {									\
     UWtype __xr = (x);							\
     UWtype __a; 							\
@@ -1695,21 +1706,25 @@ unsigned char _gcry_clz_tab[];
     (count) = W_TYPE_SIZE - (_gcry_clz_tab[__xr >> __a] + __a);		\
   } while (0)
 /* This version gives a well-defined value for zero. */
-#define COUNT_LEADING_ZEROS_0 W_TYPE_SIZE
-#endif
+#  define COUNT_LEADING_ZEROS_0 W_TYPE_SIZE
+#endif /* !count_leading_zeros */
 
 #if !defined (count_trailing_zeros)
 /* Define count_trailing_zeros using count_leading_zeros.  The latter might be
    defined in asm, but if it is not, the C version above is good enough.  */
-#define count_trailing_zeros(count, x) \
+#  define count_trailing_zeros(count, x) \
   do {									\
     UWtype __ctz_x = (x);						\
     UWtype __ctz_c;							\
     count_leading_zeros (__ctz_c, __ctz_x & -__ctz_x);			\
     (count) = W_TYPE_SIZE - 1 - __ctz_c;				\
   } while (0)
-#endif
+#endif /* !count_trailing_zeros */
 
 #ifndef UDIV_NEEDS_NORMALIZATION
-#define UDIV_NEEDS_NORMALIZATION 0
+#  define UDIV_NEEDS_NORMALIZATION 0
 #endif
+
+/***************************************
+ ******  longlong.h ends here  *********
+ ***************************************/

-----------------------------------------------------------------------

Summary of changes:
 mpi/longlong.h |  513 +++++++++++++++++++++++++++++---------------------------
 1 file changed, 264 insertions(+), 249 deletions(-)
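
For readers unfamiliar with the generic count_trailing_zeros fallback that appears in the diff above: it isolates the lowest set bit with `x & -x` and then derives its position from count_leading_zeros. The following is only a minimal sketch of that idea for a fixed 32-bit word. The names `clz32`, `ctz32` and `W_BITS` are hypothetical and are not part of longlong.h, and the leading-zero count is a plain loop rather than the table-driven `_gcry_clz_tab` version.

```c
#include <assert.h>
#include <stdint.h>

#define W_BITS 32  /* hypothetical stand-in for W_TYPE_SIZE */

/* Count leading zero bits of a non-zero 32-bit word with a simple loop.
   The real header uses inline asm or a lookup table instead. */
static unsigned clz32(uint32_t x)
{
  unsigned n = 0;

  assert(x != 0);
  while (!(x & 0x80000000u))
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Count trailing zero bits by isolating the lowest set bit first. */
unsigned ctz32(uint32_t x)
{
  uint32_t lowest;

  assert(x != 0);
  lowest = x & (0u - x);             /* only the lowest set bit survives */
  return W_BITS - 1 - clz32(lowest); /* bit index from the top -> from the bottom */
}
```

With these definitions, `ctz32(8)` evaluates to 3 because the isolated bit has 28 leading zeros. The zero argument is excluded here by assertion; the header documents its own convention for that case separately.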


hooks/post-receive
--

-- 
The GNU crypto library
http://git.gnupg.org


_______________________________________________
Gnupg-commits mailing list
Gnupg-commits <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits
And Sch | 28 Aug 20:02 2014

[PATCH 1/1] Improved whirlpool hash performance

* cipher/whirlpool.c (whirlpool_transform, sbox, added macro): Added the WHIRLPOOL_XOR macro and
rearranged the round function so that each round reads from one set of state and key variables and
writes to the other. Two whirlpool_block_t variables were removed and two were replaced; the sizes of
state and key doubled, so the overall burn stack stays the same. buffer_to_block and block_xor were
combined into one operation. The sbox was converted to one large table, because that is faster than
many small tables.
--

Benchmark on different systems:

Intel(R) Atom(TM) CPU N570   @ 1.66GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     63.40 ns/B     15.04 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |     46.21 ns/B     20.64 MiB/s         - c/B

Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz
before:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      7.75 ns/B     123.0 MiB/s         - c/B
after:
Hash:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 WHIRLPOOL      |      6.70 ns/B     142.3 MiB/s         - c/B

The improvement is actually greater on the Atom system (about 37% more throughput) than on the
Core i5 (about 16%).
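
To illustrate the two ideas from the commit message (one flat lookup table instead of eight small ones, and alternating between two buffers instead of copying a temporary block back each round), here is a small standalone sketch. It is not the patch itself: the table contents are toy values rather than the Whirlpool constants, the key schedule is left out, and all names (`flat`, `split`, `rounds_copy`, `rounds_pingpong`, `rounds_agree`) are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORDS  8   /* eight 64-bit words per block */
#define ROUNDS 10

/* Toy tables: NOT the Whirlpool constants, just deterministic filler. */
static uint64_t split[8][256];   /* eight small tables (old layout) */
static uint64_t flat[8 * 256];   /* one large table (new layout)    */

static void init_tables(void)
{
  for (unsigned k = 0; k < 8; k++)
    for (unsigned b = 0; b < 256; b++)
      {
        uint64_t v = (uint64_t)(k * 256 + b + 1) * 0x9e3779b97f4a7c15ULL;
        split[k][b] = flat[k * 256 + b] = v ^ (v >> 29);
      }
}

/* One output word: table k is indexed by byte k (from the top) of
   src[(shift - k) mod 8], following the index pattern in the patch. */
static uint64_t lookup_split(const uint64_t *src, unsigned shift)
{
  uint64_t acc = 0;

  assert(shift < WORDS);
  for (unsigned k = 0; k < 8; k++)
    acc ^= split[k][(src[(shift + 8 - k) & 7] >> (56 - 8 * k)) & 0xff];
  return acc;
}

/* Same computation, but the table number becomes an offset of k * 256. */
static uint64_t lookup_flat(const uint64_t *src, unsigned shift)
{
  uint64_t acc = 0;

  assert(shift < WORDS);
  for (unsigned k = 0; k < 8; k++)
    acc ^= flat[k * 256 + ((src[(shift + 8 - k) & 7] >> (56 - 8 * k)) & 0xff)];
  return acc;
}

/* Old style: compute into a temporary block, then copy it back. */
static void rounds_copy(uint64_t *state)
{
  uint64_t block[WORDS];

  for (unsigned r = 0; r < ROUNDS; r++)
    {
      for (unsigned j = 0; j < WORDS; j++)
        block[j] = lookup_split(state, j);
      memcpy(state, block, sizeof block);
    }
}

/* New style: read buf[i], write buf[!i], then flip i. No per-round copy. */
static void rounds_pingpong(uint64_t *state)
{
  uint64_t buf[2][WORDS];
  unsigned i = 0;

  memcpy(buf[0], state, sizeof buf[0]);
  for (unsigned r = 0; r < ROUNDS; r++, i = !i)
    for (unsigned j = 0; j < WORDS; j++)
      buf[!i][j] = lookup_flat(buf[i], j);
  memcpy(state, buf[i], sizeof buf[i]);  /* buf[i] holds the last result */
}

/* Returns 1 when both variants produce the same final state. */
int rounds_agree(void)
{
  uint64_t a[WORDS], b[WORDS];

  init_tables();
  for (unsigned j = 0; j < WORDS; j++)
    a[j] = b[j] = 0x0123456789abcdefULL * (j + 1);
  rounds_copy(a);
  rounds_pingpong(b);
  return memcmp(a, b, sizeof a) == 0;
}
```

Calling `rounds_agree()` from a test program checks that the two layouts are interchangeable. Whether the flat table is actually faster depends on the compiler and CPU; the sketch only demonstrates equivalence, not the speedup measured above.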

Signed-off-by: Andrei Scherer <andsch <at> inbox.com>

---

diff -ruNp libgcrypt-1.6.2/cipher/whirlpool.c libgcrypt-1.6.3/cipher/whirlpool.c
--- libgcrypt-1.6.2/cipher/whirlpool.c	2014-08-21 07:50:39.000000000 -0500
+++ libgcrypt-1.6.3/cipher/whirlpool.c	2014-08-28 12:47:04.917824140 -0500
@@ -87,6 +87,17 @@ typedef struct {
   for (i = 0; i < 8; i++) \
     block_dst[i] ^= block_src[i];

+/* XOR lookup boxes with index SRC [(SHIFT + n) & 7] >> x. */
+#define WHIRLPOOL_XOR(src, shift) \
+	C[((unsigned int)(src[ (shift)         ] >> 56)       )          ] ^ \
+	C[((unsigned int)(src[((shift) + 7) & 7] >> 48) & 0xff) +  256   ] ^ \
+	C[((unsigned int)(src[((shift) + 6) & 7] >> 40) & 0xff) + (256*2)] ^ \
+	C[((unsigned int)(src[((shift) + 5) & 7] >> 32) & 0xff) + (256*3)] ^ \
+	C[((unsigned int)(src[((shift) + 4) & 7] >> 24) & 0xff) + (256*4)] ^ \
+	C[((unsigned int)(src[((shift) + 3) & 7] >> 16) & 0xff) + (256*5)] ^ \
+	C[((unsigned int)(src[((shift) + 2) & 7] >>  8) & 0xff) + (256*6)] ^ \
+	C[((unsigned int)(src[((shift) + 1) & 7]      ) & 0xff) + (256*7)] \
+
 ?

 /* Round constants.  */
@@ -107,7 +118,7 @@ static const u64 rc[R] =
 ?

 /* Main lookup boxes.  */
-static const u64 C0[256] =
+static const u64 C[8*256] =
   {
     U64_C (0x18186018c07830d8), U64_C (0x23238c2305af4626),
     U64_C (0xc6c63fc67ef991b8), U64_C (0xe8e887e8136fcdfb),
@@ -237,10 +248,7 @@ static const u64 C0[256] =
     U64_C (0x98985a98b4c22d2c), U64_C (0xa4a4aaa4490e55ed),
     U64_C (0x2828a0285d885075), U64_C (0x5c5c6d5cda31b886),
     U64_C (0xf8f8c7f8933fed6b), U64_C (0x8686228644a411c2),
-  };

-static const u64 C1[256] =
-  {
     U64_C (0xd818186018c07830), U64_C (0x2623238c2305af46),
     U64_C (0xb8c6c63fc67ef991), U64_C (0xfbe8e887e8136fcd),
     U64_C (0xcb878726874ca113), U64_C (0x11b8b8dab8a9626d),
@@ -369,10 +377,7 @@ static const u64 C1[256] =
     U64_C (0x2c98985a98b4c22d), U64_C (0xeda4a4aaa4490e55),
     U64_C (0x752828a0285d8850), U64_C (0x865c5c6d5cda31b8),
     U64_C (0x6bf8f8c7f8933fed), U64_C (0xc28686228644a411),
-  };

-static const u64 C2[256] =
-  {
     U64_C (0x30d818186018c078), U64_C (0x462623238c2305af),
     U64_C (0x91b8c6c63fc67ef9), U64_C (0xcdfbe8e887e8136f),
     U64_C (0x13cb878726874ca1), U64_C (0x6d11b8b8dab8a962),
@@ -501,10 +506,7 @@ static const u64 C2[256] =
     U64_C (0x2d2c98985a98b4c2), U64_C (0x55eda4a4aaa4490e),
     U64_C (0x50752828a0285d88), U64_C (0xb8865c5c6d5cda31),
     U64_C (0xed6bf8f8c7f8933f), U64_C (0x11c28686228644a4),
-  };

-static const u64 C3[256] =
-  {
     U64_C (0x7830d818186018c0), U64_C (0xaf462623238c2305),
     U64_C (0xf991b8c6c63fc67e), U64_C (0x6fcdfbe8e887e813),
     U64_C (0xa113cb878726874c), U64_C (0x626d11b8b8dab8a9),
@@ -633,10 +635,7 @@ static const u64 C3[256] =
     U64_C (0xc22d2c98985a98b4), U64_C (0x0e55eda4a4aaa449),
     U64_C (0x8850752828a0285d), U64_C (0x31b8865c5c6d5cda),
     U64_C (0x3fed6bf8f8c7f893), U64_C (0xa411c28686228644),
-  };

-static const u64 C4[256] =
-  {
     U64_C (0xc07830d818186018), U64_C (0x05af462623238c23),
     U64_C (0x7ef991b8c6c63fc6), U64_C (0x136fcdfbe8e887e8),
     U64_C (0x4ca113cb87872687), U64_C (0xa9626d11b8b8dab8),
@@ -765,10 +764,7 @@ static const u64 C4[256] =
     U64_C (0xb4c22d2c98985a98), U64_C (0x490e55eda4a4aaa4),
     U64_C (0x5d8850752828a028), U64_C (0xda31b8865c5c6d5c),
     U64_C (0x933fed6bf8f8c7f8), U64_C (0x44a411c286862286),
-  };

-static const u64 C5[256] =
-  {
     U64_C (0x18c07830d8181860), U64_C (0x2305af462623238c),
     U64_C (0xc67ef991b8c6c63f), U64_C (0xe8136fcdfbe8e887),
     U64_C (0x874ca113cb878726), U64_C (0xb8a9626d11b8b8da),
@@ -897,10 +893,7 @@ static const u64 C5[256] =
     U64_C (0x98b4c22d2c98985a), U64_C (0xa4490e55eda4a4aa),
     U64_C (0x285d8850752828a0), U64_C (0x5cda31b8865c5c6d),
     U64_C (0xf8933fed6bf8f8c7), U64_C (0x8644a411c2868622),
-  };

-static const u64 C6[256] =
-  {
     U64_C (0x6018c07830d81818), U64_C (0x8c2305af46262323),
     U64_C (0x3fc67ef991b8c6c6), U64_C (0x87e8136fcdfbe8e8),
     U64_C (0x26874ca113cb8787), U64_C (0xdab8a9626d11b8b8),
@@ -1029,10 +1022,7 @@ static const u64 C6[256] =
     U64_C (0x5a98b4c22d2c9898), U64_C (0xaaa4490e55eda4a4),
     U64_C (0xa0285d8850752828), U64_C (0x6d5cda31b8865c5c),
     U64_C (0xc7f8933fed6bf8f8), U64_C (0x228644a411c28686),
-  };

-static const u64 C7[256] =
-  {
     U64_C (0x186018c07830d818), U64_C (0x238c2305af462623),
     U64_C (0xc63fc67ef991b8c6), U64_C (0xe887e8136fcdfbe8),
     U64_C (0x8726874ca113cb87), U64_C (0xb8dab8a9626d11b8),
@@ -1163,7 +1153,6 @@ static const u64 C7[256] =
     U64_C (0xf8c7f8933fed6bf8), U64_C (0x86228644a411c286),
   };

-
 ?
 /*
  * Transform block.
@@ -1172,97 +1161,36 @@ static unsigned int
 whirlpool_transform (void *ctx, const unsigned char *data)
 {
   whirlpool_context_t *context = ctx;
-  whirlpool_block_t data_block;
-  whirlpool_block_t key;
-  whirlpool_block_t state;
-  whirlpool_block_t block;
+  u64 key[2][BLOCK_SIZE / 8];
+  u64 state[2][BLOCK_SIZE / 8];
   unsigned int r;
   unsigned int i;

-  buffer_to_block (data, data_block, i);
-  block_copy (key, context->hash_state, i);
-  block_copy (state, context->hash_state, i);
-  block_xor (state, data_block, i);
+  /* buffer_to_block and block_xor at once */
+
+  for (i = 0; i < 8; i++)
+    state[0][i] = buf_get_be64((data) + i * 8) ^ context->hash_state[i];
+
+  block_copy (key[0], context->hash_state, i);
+  block_copy (context->hash_state, state[0], i);

-  for (r = 0; r < R; r++)
+  for (r = 0, i = 0; r < R; r++, i = !i)
     {
-      /* Compute round key K^r.  */
+      /* Compute round key K^r, and apply r-th round transformation, interleaved  */

-      block[0] = (C0[(key[0] >> 56) & 0xFF] ^ C1[(key[7] >> 48) & 0xFF] ^
-		  C2[(key[6] >> 40) & 0xFF] ^ C3[(key[5] >> 32) & 0xFF] ^
-		  C4[(key[4] >> 24) & 0xFF] ^ C5[(key[3] >> 16) & 0xFF] ^
-		  C6[(key[2] >>  8) & 0xFF] ^ C7[(key[1] >>  0) & 0xFF] ^ rc[r]);
-      block[1] = (C0[(key[1] >> 56) & 0xFF] ^ C1[(key[0] >> 48) & 0xFF] ^
-		  C2[(key[7] >> 40) & 0xFF] ^ C3[(key[6] >> 32) & 0xFF] ^
-		  C4[(key[5] >> 24) & 0xFF] ^ C5[(key[4] >> 16) & 0xFF] ^
-		  C6[(key[3] >>  8) & 0xFF] ^ C7[(key[2] >>  0) & 0xFF]);
-      block[2] = (C0[(key[2] >> 56) & 0xFF] ^ C1[(key[1] >> 48) & 0xFF] ^
-		  C2[(key[0] >> 40) & 0xFF] ^ C3[(key[7] >> 32) & 0xFF] ^
-		  C4[(key[6] >> 24) & 0xFF] ^ C5[(key[5] >> 16) & 0xFF] ^
-		  C6[(key[4] >>  8) & 0xFF] ^ C7[(key[3] >>  0) & 0xFF]);
-      block[3] = (C0[(key[3] >> 56) & 0xFF] ^ C1[(key[2] >> 48) & 0xFF] ^
-		  C2[(key[1] >> 40) & 0xFF] ^ C3[(key[0] >> 32) & 0xFF] ^
-		  C4[(key[7] >> 24) & 0xFF] ^ C5[(key[6] >> 16) & 0xFF] ^
-		  C6[(key[5] >>  8) & 0xFF] ^ C7[(key[4] >>  0) & 0xFF]);
-      block[4] = (C0[(key[4] >> 56) & 0xFF] ^ C1[(key[3] >> 48) & 0xFF] ^
-		  C2[(key[2] >> 40) & 0xFF] ^ C3[(key[1] >> 32) & 0xFF] ^
-		  C4[(key[0] >> 24) & 0xFF] ^ C5[(key[7] >> 16) & 0xFF] ^
-		  C6[(key[6] >>  8) & 0xFF] ^ C7[(key[5] >>  0) & 0xFF]);
-      block[5] = (C0[(key[5] >> 56) & 0xFF] ^ C1[(key[4] >> 48) & 0xFF] ^
-		  C2[(key[3] >> 40) & 0xFF] ^ C3[(key[2] >> 32) & 0xFF] ^
-		  C4[(key[1] >> 24) & 0xFF] ^ C5[(key[0] >> 16) & 0xFF] ^
-		  C6[(key[7] >>  8) & 0xFF] ^ C7[(key[6] >>  0) & 0xFF]);
-      block[6] = (C0[(key[6] >> 56) & 0xFF] ^ C1[(key[5] >> 48) & 0xFF] ^
-		  C2[(key[4] >> 40) & 0xFF] ^ C3[(key[3] >> 32) & 0xFF] ^
-		  C4[(key[2] >> 24) & 0xFF] ^ C5[(key[1] >> 16) & 0xFF] ^
-		  C6[(key[0] >>  8) & 0xFF] ^ C7[(key[7] >>  0) & 0xFF]);
-      block[7] = (C0[(key[7] >> 56) & 0xFF] ^ C1[(key[6] >> 48) & 0xFF] ^
-		  C2[(key[5] >> 40) & 0xFF] ^ C3[(key[4] >> 32) & 0xFF] ^
-		  C4[(key[3] >> 24) & 0xFF] ^ C5[(key[2] >> 16) & 0xFF] ^
-		  C6[(key[1] >>  8) & 0xFF] ^ C7[(key[0] >>  0) & 0xFF]);
-      block_copy (key, block, i);
-
-      /* Apply r-th round transformation.  */
-
-      block[0] = (C0[(state[0] >> 56) & 0xFF] ^ C1[(state[7] >> 48) & 0xFF] ^
-		  C2[(state[6] >> 40) & 0xFF] ^ C3[(state[5] >> 32) & 0xFF] ^
-		  C4[(state[4] >> 24) & 0xFF] ^ C5[(state[3] >> 16) & 0xFF] ^
-		  C6[(state[2] >>  8) & 0xFF] ^ C7[(state[1] >>  0) & 0xFF] ^ key[0]);
-      block[1] = (C0[(state[1] >> 56) & 0xFF] ^ C1[(state[0] >> 48) & 0xFF] ^
-		  C2[(state[7] >> 40) & 0xFF] ^ C3[(state[6] >> 32) & 0xFF] ^
-		  C4[(state[5] >> 24) & 0xFF] ^ C5[(state[4] >> 16) & 0xFF] ^
-		  C6[(state[3] >>  8) & 0xFF] ^ C7[(state[2] >>  0) & 0xFF] ^ key[1]);
-      block[2] = (C0[(state[2] >> 56) & 0xFF] ^ C1[(state[1] >> 48) & 0xFF] ^
-		  C2[(state[0] >> 40) & 0xFF] ^ C3[(state[7] >> 32) & 0xFF] ^
-		  C4[(state[6] >> 24) & 0xFF] ^ C5[(state[5] >> 16) & 0xFF] ^
-		  C6[(state[4] >>  8) & 0xFF] ^ C7[(state[3] >>  0) & 0xFF] ^ key[2]);
-      block[3] = (C0[(state[3] >> 56) & 0xFF] ^ C1[(state[2] >> 48) & 0xFF] ^
-		  C2[(state[1] >> 40) & 0xFF] ^ C3[(state[0] >> 32) & 0xFF] ^
-		  C4[(state[7] >> 24) & 0xFF] ^ C5[(state[6] >> 16) & 0xFF] ^
-		  C6[(state[5] >>  8) & 0xFF] ^ C7[(state[4] >>  0) & 0xFF] ^ key[3]);
-      block[4] = (C0[(state[4] >> 56) & 0xFF] ^ C1[(state[3] >> 48) & 0xFF] ^
-		  C2[(state[2] >> 40) & 0xFF] ^ C3[(state[1] >> 32) & 0xFF] ^
-		  C4[(state[0] >> 24) & 0xFF] ^ C5[(state[7] >> 16) & 0xFF] ^
-		  C6[(state[6] >>  8) & 0xFF] ^ C7[(state[5] >>  0) & 0xFF] ^ key[4]);
-      block[5] = (C0[(state[5] >> 56) & 0xFF] ^ C1[(state[4] >> 48) & 0xFF] ^
-		  C2[(state[3] >> 40) & 0xFF] ^ C3[(state[2] >> 32) & 0xFF] ^
-		  C4[(state[1] >> 24) & 0xFF] ^ C5[(state[0] >> 16) & 0xFF] ^
-		  C6[(state[7] >>  8) & 0xFF] ^ C7[(state[6] >>  0) & 0xFF] ^ key[5]);
-      block[6] = (C0[(state[6] >> 56) & 0xFF] ^ C1[(state[5] >> 48) & 0xFF] ^
-		  C2[(state[4] >> 40) & 0xFF] ^ C3[(state[3] >> 32) & 0xFF] ^
-		  C4[(state[2] >> 24) & 0xFF] ^ C5[(state[1] >> 16) & 0xFF] ^
-		  C6[(state[0] >>  8) & 0xFF] ^ C7[(state[7] >>  0) & 0xFF] ^ key[6]);
-      block[7] = (C0[(state[7] >> 56) & 0xFF] ^ C1[(state[6] >> 48) & 0xFF] ^
-		  C2[(state[5] >> 40) & 0xFF] ^ C3[(state[4] >> 32) & 0xFF] ^
-		  C4[(state[3] >> 24) & 0xFF] ^ C5[(state[2] >> 16) & 0xFF] ^
-		  C6[(state[1] >>  8) & 0xFF] ^ C7[(state[0] >>  0) & 0xFF] ^ key[7]);
-      block_copy (state, block, i);
+      state[!i][0] = WHIRLPOOL_XOR(state[i], 0) ^ (key[!i][0] = WHIRLPOOL_XOR(key[i], 0) ^ rc[r]);
+      state[!i][1] = WHIRLPOOL_XOR(state[i], 1) ^ (key[!i][1] = WHIRLPOOL_XOR(key[i], 1));
+      state[!i][2] = WHIRLPOOL_XOR(state[i], 2) ^ (key[!i][2] = WHIRLPOOL_XOR(key[i], 2));
+      state[!i][3] = WHIRLPOOL_XOR(state[i], 3) ^ (key[!i][3] = WHIRLPOOL_XOR(key[i], 3));
+      state[!i][4] = WHIRLPOOL_XOR(state[i], 4) ^ (key[!i][4] = WHIRLPOOL_XOR(key[i], 4));
+      state[!i][5] = WHIRLPOOL_XOR(state[i], 5) ^ (key[!i][5] = WHIRLPOOL_XOR(key[i], 5));
+      state[!i][6] = WHIRLPOOL_XOR(state[i], 6) ^ (key[!i][6] = WHIRLPOOL_XOR(key[i], 6));
+      state[!i][7] = WHIRLPOOL_XOR(state[i], 7) ^ (key[!i][7] = WHIRLPOOL_XOR(key[i], 7));
     }

   /* Compression.  */

-  block_xor (context->hash_state, data_block, i);
-  block_xor (context->hash_state, state, i);
+  block_xor (context->hash_state, state[0], i);

   return /*burn_stack*/ 4 * sizeof(whirlpool_block_t) + 2 * sizeof(int) +
                         4 * sizeof(void*);
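
For readers of the hunk above: the state[i] / state[!i] indexing looks like a ping-pong (double-buffer) scheme, where each round reads one buffer and writes the other, so the per-round block_copy is no longer needed. Below is a minimal standalone sketch of that idea, not the libgcrypt code. mix() is a hypothetical stand-in for the table lookups, and the names rounds_with_copy, rounds_ping_pong and check_equivalence are made up for illustration. It assumes the index is flipped once per round, which the visible part of the patch does not show.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WORDS 8

/* Hypothetical round function: every output word depends on more than one
   input word, so the update cannot be done in place. */
uint64_t mix(const uint64_t in[WORDS], int k)
{
    return (in[k] << 1) ^ in[(k + 7) % WORDS] ^ (uint64_t)k;
}

/* Copy-based variant: compute into a temporary, then copy it back. */
void rounds_with_copy(uint64_t state[WORDS], int nrounds)
{
    uint64_t tmp[WORDS];
    for (int r = 0; r < nrounds; r++) {
        for (int k = 0; k < WORDS; k++)
            tmp[k] = mix(state, k);
        memcpy(state, tmp, sizeof tmp);
    }
}

/* Ping-pong variant: read buf[i], write buf[!i], then flip i. */
void rounds_ping_pong(uint64_t state[WORDS], int nrounds)
{
    uint64_t buf[2][WORDS];
    int i = 0;

    memcpy(buf[0], state, sizeof buf[0]);
    for (int r = 0; r < nrounds; r++) {
        for (int k = 0; k < WORDS; k++)
            buf[!i][k] = mix(buf[i], k);
        i = !i;  /* the freshly written buffer becomes the next input */
    }
    memcpy(state, buf[i], sizeof buf[i]);
}

/* Self-check: both variants must produce the same words. */
void check_equivalence(int nrounds)
{
    uint64_t a[WORDS] = {1, 2, 3, 4, 5, 6, 7, 8};
    uint64_t b[WORDS] = {1, 2, 3, 4, 5, 6, 7, 8};

    rounds_with_copy(a, nrounds);
    rounds_ping_pong(b, nrounds);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

With an even number of rounds the final result lands back in buffer 0. Whirlpool uses 10 rounds, so under the assumption above this would match the patch reading state[0] in the compression step.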
