David Goulet | 30 Jul 18:01 2014

_gcry_ath_mutex_lock: Assertion

Hello!

We are currently working on a test suite for libotr[1] which heavily
uses libgcrypt (version 1.5.3 in Debian). We use a small client that
exchanges OTR messages between two threads using libotr for stress,
regression and fuzzing tests.

When both threads receive messages at the same time, the gcry_md_read()
call can fail with this assertion:

	lt-client: ath.c:193: _gcry_ath_mutex_lock: Assertion `*lock ==
	((ath_mutex_t) 0)' failed.

You can find a full gdb backtrace here --> http://pastebin.com/cqJDe7dR
Parts are optimized out, but let me know and I can provide a -O0 version.

I did a small investigation and it seems that there is contention on
this lock, which for some reason hits the assert() when it is already locked.

	ath_mutex_lock (&digests_registered_lock);

I see that this lock has been removed in libgcrypt 1.6, so is there a way
to avoid the issue for versions < 1.6 without defining NDEBUG?

Thanks!
David

[1] https://bugs.otr.im/projects/libotr
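
The assertion text quoted above, `*lock == ((ath_mutex_t) 0)`, reads like a lock that is only a flag: taking it while it is already held trips the assert instead of blocking. One possible explanation, not confirmed in this thread, is that no thread callbacks were registered; libgcrypt releases before 1.6 document gcry_control (GCRYCTL_SET_THREAD_CBS, ...) for that purpose. The following is a minimal plain-C model of such a flag lock. The names model_mutex_t, model_lock and model_unlock are hypothetical and are not libgcrypt identifiers.

```c
/* Simplified model of a flag-only lock (hypothetical names, not libgcrypt
   code).  A real mutex would block the second caller; this one cannot. */
typedef int model_mutex_t;
#define MODEL_UNLOCKED 0
#define MODEL_LOCKED   1

/* Returns 0 on success, or -1 at the point where an
   assert (*lock == MODEL_UNLOCKED) would abort the process. */
static int model_lock(model_mutex_t *lock)
{
    if (*lock != MODEL_UNLOCKED)
        return -1;          /* already held: the assertion would fire here */
    *lock = MODEL_LOCKED;
    return 0;
}

static void model_unlock(model_mutex_t *lock)
{
    *lock = MODEL_UNLOCKED;
}
```

In this model a second model_lock() call made before model_unlock() reports the failure that the two receiving threads appear to hit.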

Kristian Fiskerstrand | 29 Jul 19:37 2014

[PATCH] Fix a segfault for x32 ABIs caused by erroneous detection of the size for BYTES_PER_MPI_LIMB

Please find enclosed a patch for libgcrypt against current git master
that fixes the gnupg segfault for x32 ABIs reported at
https://bugs.gentoo.org/show_bug.cgi?id=512762

-- 
----------------------------
Kristian Fiskerstrand
Blog: http://blog.sumptuouscapital.com
Twitter:  <at> krifisk
----------------------------
Public PGP key 0xE3EDFAE3 at hkp://pool.sks-keyservers.net
fpr:94CB AFDD 3034 5109 5618 35AA 0B7F 8B60 E3ED FAE3
----------------------------
Ne nuntium necare
Don't kill the messenger
_______________________________________________
Gcrypt-devel mailing list
Gcrypt-devel <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gcrypt-devel
by Werner Koch | 25 Jul 08:21 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-98-g4556f9b

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  4556f9b19c024f16bdf542da7173395c0741b91d (commit)
       via  0e10902ad7584277ac966367efc712b183784532 (commit)
       via  4e0bf1b9190ce08fb23eb3ae0c3be58954ff36ab (commit)
      from  4846e52728970e3117f3a046ef9010be089a3ae4 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 4556f9b19c024f16bdf542da7173395c0741b91d
Author: Werner Koch <wk <at> gnupg.org>
Date:   Thu Jul 24 12:30:32 2014 +0200

    ecc: Support the non-standard 0x40 compression flag for EdDSA.

    * cipher/ecc.c (ecc_generate): Check the "comp" flag for EdDSA.
    * cipher/ecc-eddsa.c (eddsa_encode_x_y): Add arg WITH_PREFIX.
    (_gcry_ecc_eddsa_encodepoint): Ditto.
    (_gcry_ecc_eddsa_ensure_compact): Handle the 0x40 compression prefix.
    (_gcry_ecc_eddsa_decodepoint): Ditto.
    * tests/keygrip.c: Check a compressed Ed25519 key with prefix.
    * tests/t-ed25519.inp: Ditto.

diff --git a/cipher/ecc-common.h b/cipher/ecc-common.h
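
The commit log above mentions a 0x40 compression prefix on encoded EdDSA points. As a rough sketch of what handling such a prefix can look like, the helper below accepts a buffer and returns the bytes after the prefix. It assumes a 32-byte compressed Ed25519 point, so 33 bytes with the prefix; the function name strip_0x40_prefix is hypothetical and this is not the code from the commit, whose diff is not shown here.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from the commit: if buf holds a 33-byte
   value whose first byte is 0x40, return a pointer to the 32 bytes that
   follow the prefix; otherwise return NULL. */
static const unsigned char *
strip_0x40_prefix(const unsigned char *buf, size_t len)
{
    if (len == 33 && buf[0] == 0x40)
        return buf + 1;
    return NULL;
}
```

A caller would fall back to its other decoding paths when the helper returns NULL.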

Markus Teich | 21 Jul 22:50 2014

[PATCH] typo

---
 mpi/ec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mpi/ec.c b/mpi/ec.c
index 4f35de0..2dd1397 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -999,7 +999,7 @@ add_points_edwards (mpi_point_t result,
 #define G (ctx->t.scratch[6])
 #define tmp (ctx->t.scratch[7])

-  /* Compute: (X_3 : Y_3 : Z_3) = (X_1 : Y_1 : Z_1) + (X_2 : Y_2 : Z_3)  */
+  /* Compute: (X_3 : Y_3 : Z_3) = (X_1 : Y_1 : Z_1) + (X_2 : Y_2 : Z_2)  */

   /* A = Z1 · Z2 */
   ec_mulm (A, Z1, Z2, ctx);
-- 
1.8.5.5

by NIIBE Yutaka | 16 Jul 10:14 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-95-g4846e52

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  4846e52728970e3117f3a046ef9010be089a3ae4 (commit)
      from  1b9b00bbe41bbed32563f1102049521e703e72bd (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 4846e52728970e3117f3a046ef9010be089a3ae4
Author: NIIBE Yutaka <gniibe <at> fsij.org>
Date:   Wed Jul 16 17:05:55 2014 +0900

    mpi: Add mpi_swap_cond.

    * mpi/mpiutil.c (_gcry_mpi_swap_cond): New.
    * src/mpi.h (mpi_swap_cond): New.
    --

    This is an internal function for now.

diff --git a/mpi/mpiutil.c b/mpi/mpiutil.c
index fdce578..f74dd91 100644
--- a/mpi/mpiutil.c
+++ b/mpi/mpiutil.c
@@ -542,6 +542,34 @@ _gcry_mpi_swap (gcry_mpi_t a, gcry_mpi_t b)
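
The body of the new function is cut off in this archive. For readers unfamiliar with the idea, a conditional swap is typically written so that the same instructions run whether or not the swap happens. The sketch below shows that general technique on plain arrays of 64-bit limbs; limb_swap_cond is a hypothetical name and the code is an assumption about the approach, not the committed _gcry_mpi_swap_cond.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the n-limb arrays a and b when swap is 1 and leave them alone when
   swap is 0, without branching on the value of swap. */
static void limb_swap_cond(uint64_t *a, uint64_t *b, size_t n, unsigned swap)
{
    /* mask is all ones when swap is 1 and all zeros when swap is 0. */
    uint64_t mask = (uint64_t)0 - (uint64_t)(swap & 1);

    for (size_t i = 0; i < n; i++) {
        uint64_t x = mask & (a[i] ^ b[i]);  /* difference, or 0 if no swap */
        a[i] ^= x;
        b[i] ^= x;
    }
}
```

Avoiding a data-dependent branch is the usual motivation for this form, for example in scalar multiplication loops.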

Christian Grothoff | 10 Jul 19:35 2014

Re: ec subtract

Hohey!

Sounds like a question for libgcrypt hackers.  Extending the API to
make it trivial to invert a point would make sense IMO.

Happy hacking!

Christian

On 07/10/2014 07:14 PM, Markus Teich wrote:
> Heyho,
>
> for the ECBD I need to subtract two points on the curve (Z_{i+1} - Z_{i-1}). I
> found out[0], that I have to invert the y value of the second point to achieve
> this. However in libgcrypt (1.6.1) there seems to be no function, which achieves
> that easily[1]. Do I really have to use gcry_mpi_point_get, gcry_mpi_sub and
> gcry_mpi_point_set to invert the point?
>
> A gcry_mpi_ec_sub should definitely be supplied by the libgcrypt API.
>
> --Markus
>
>
> [0] http://crypto.stackexchange.com/questions/11316/subtracting-a-point-in-elliptic-curve-cryptography
> [1] https://www.gnupg.org/documentation/manuals/gcrypt/EC-functions.html
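
The subtraction described in the quoted message, P - Q = P + (-Q) with -Q obtained by negating the y coordinate, can be illustrated on a toy curve with plain integers. The sketch below assumes a short Weierstrass curve y^2 = x^3 + 2x + 2 over GF(17) in affine coordinates; the parameters and all names (ec_point, ec_add, ec_neg, ec_sub) are illustrative only and unrelated to the libgcrypt API. Inputs are assumed to be reduced points on the curve.

```c
#include <stdint.h>

/* Toy curve y^2 = x^3 + 2x + 2 over GF(17); illustrative parameters only. */
#define EC_P 17
#define EC_A 2

typedef struct { int64_t x, y; int inf; } ec_point;

static int64_t ec_mod(int64_t v)
{
    v %= EC_P;
    return v < 0 ? v + EC_P : v;
}

/* Modular inverse via Fermat's little theorem: v^(p-2) mod p. */
static int64_t ec_inv(int64_t v)
{
    int64_t r = 1;
    v = ec_mod(v);
    for (int e = EC_P - 2; e > 0; e--)
        r = ec_mod(r * v);
    return r;
}

/* Negate a point: (x, y) becomes (x, -y mod p). */
static ec_point ec_neg(ec_point q)
{
    if (!q.inf)
        q.y = ec_mod(-q.y);
    return q;
}

static ec_point ec_add(ec_point p, ec_point q)
{
    ec_point r = {0, 0, 1};             /* point at infinity */
    int64_t l;

    if (p.inf) return q;
    if (q.inf) return p;
    if (p.x == q.x && ec_mod(p.y + q.y) == 0)
        return r;                       /* P + (-P) */
    if (p.x == q.x)                     /* doubling */
        l = ec_mod((3 * p.x * p.x + EC_A) * ec_inv(2 * p.y));
    else
        l = ec_mod((q.y - p.y) * ec_inv(q.x - p.x));
    r.x = ec_mod(l * l - p.x - q.x);
    r.y = ec_mod(l * (p.x - r.x) - p.y);
    r.inf = 0;
    return r;
}

/* Subtraction is addition of the negated point. */
static ec_point ec_sub(ec_point p, ec_point q)
{
    return ec_add(p, ec_neg(q));
}
```

With G = (5, 1) on this curve, 3G = (10, 6) and ec_sub(3G, G) yields 2G = (6, 3). Note that other curve forms negate differently; on a twisted Edwards curve the x coordinate is the one that changes sign.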

Attachment (0x48426C7E.asc): application/pgp-keys, 25 KiB

Erik Nyquist | 8 Jul 12:35 2014

AES-NI support detection: possible bug?

I tried compiling libgcrypt-1.5.0 on a platform with a Quark SoC (an Intel low-power SoC that does not support AES instructions):

root@clanton:/media/mmcblk0p1# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 5
model           : 9
model name      : 05/09
stepping        : 0
cpu MHz         : 399.076
cache size      : 0 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : yes
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 7
wp              : yes
flags           : fpu vme pse tsc msr pae cx8 apic pbe nx smep
bogomips        : 798.15
clflush size    : 32
cache_alignment : 32
address sizes   : 32 bits physical, 32 bits virtual
power management:

During configure, the AES-NI feature detection incorrectly reports that AES instructions are supported:

 Try using AES-NI crypto:   yes


So at run time, I get an 'illegal instruction' error. This was initially noticed while trying to connect to a wifi access point using wpa_supplicant; it can also be seen when running the tests included with libgcrypt:

root@clanton:/media/mmcblk0p1/libgcrypt-1.5.0-beta1# make check

ciphers:arcfour:blowfish:cast5:des:aes:twofish:serpent:rfc2268:seed:camellia:
pubkeys:dsa:elgamal:rsa:ecc:
digests:crc:md4:md5:rmd160:sha1:sha256:sha512:tiger:whirlpool:
rnd-mod:linux:
mpi-asm:i586/mpih-add1.S:i586/mpih-sub1.S:i586/mpih-mul1.S:i586/mpih-mul2.S:i586/mpih-mul3.S:i586/mpih-lshift.S:i586/mpih-rshift.S:
hwflist:
fips-mode:n:n:
PASS: version
PASS: t-mpi-bit
PASS: prime
PASS: register
PASS: ac
PASS: ac-schemes
PASS: ac-data
/bin/sh: line 4:  7998 Illegal instruction     ${dir}$tst
FAIL: basic
PASS: mpitests
PASS: tsexp
PASS: keygen
PASS: pubkey
PASS: hmac
PASS: keygrip
PASS: fips186-dsa
PASS: aeswrap
PASS: curves
PASS: random
MD5             50ms   120ms   750ms    90ms    50ms
SHA1           130ms   190ms   830ms   170ms   130ms
RIPEMD160      140ms   200ms   850ms   190ms   140ms
TIGER192       250ms   360ms  1150ms   320ms   250ms
SHA256         290ms   430ms  1140ms   330ms   290ms
SHA384         500ms   720ms  1330ms   540ms   480ms
SHA512         490ms   730ms  1320ms   540ms   480ms
SHA224         290ms   440ms  1130ms   330ms   290ms
MD4             40ms   100ms   750ms    80ms    40ms
CRC32           30ms    40ms   570ms    80ms    40ms
CRC32RFC1510    30ms    30ms   570ms    80ms    40ms
CRC24RFC2440   260ms   260ms   770ms   300ms   270ms
WHIRLPOOL     1740ms  1950ms  2530ms  1820ms  1740ms
TIGER          260ms   350ms  1150ms   320ms   250ms
TIGER2         260ms   350ms  1150ms   320ms   250ms

                ECB/Stream         CBC             CFB             OFB             CTR
             --------------- --------------- --------------- --------------- ---------------
3DES          1160ms  1160ms  1220ms  1260ms  1200ms  1220ms  1220ms  1210ms  1310ms  1330ms
CAST5          400ms   410ms   460ms   470ms   440ms   460ms   460ms   460ms   550ms   530ms
BLOWFISH       380ms   410ms   430ms   490ms   410ms   430ms   430ms   430ms   530ms   520ms
AES            340ms   350ms/bin/sh: line 4:  8244 Illegal instruction     ${dir}$tst
FAIL: benchmark
========================================
2 of 19 tests failed
========================================
make[2]: *** [check-TESTS] Error 1
make[2]: Leaving directory `/media/mmcblk0p1/libgcrypt-1.5.0-beta1/tests'
make[1]: *** [check-am] Error 2
make[1]: Leaving directory `/media/mmcblk0p1/libgcrypt-1.5.0-beta1/tests'
make: *** [check-recursive] Error 1


Has anyone seen any similar issues with configure failing to detect AES support accurately?

Erik.
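
The flags line in the /proc/cpuinfo output above contains no `aes` entry, which is how Linux reports AES-NI support. A small check like the following can confirm that on a given machine. It is a sketch that takes the flag list as a string; cpu_flags_has is a hypothetical helper and says nothing about how libgcrypt itself detects the feature.

```c
#include <string.h>

/* Return 1 if feature appears as a whole space-separated token in flags,
   0 otherwise.  Substring hits such as "aes" inside "vaes" do not count. */
static int cpu_flags_has(const char *flags, const char *feature)
{
    size_t n = strlen(feature);
    const char *p = flags;

    if (n == 0)
        return 0;
    while ((p = strstr(p, feature)) != NULL) {
        int starts = (p == flags) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return 1;
        p++;
    }
    return 0;
}
```

Called with the flag list from the report ("fpu vme pse tsc msr pae cx8 apic pbe nx smep") and the feature "aes", the helper returns 0.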

Colin Davis | 8 Jul 07:45 2014

Make fig2dev optional?

Installing fig2dev is unfortunately rather difficult on OS X, since it has a lot of prerequisites and is
not bundled with Homebrew.
I didn't see an easy way to bypass creating these figures, other than manually editing the
Makefile every time.

I added a check that looks to see whether fig2dev is installed. If it is not, the images are not added to BUILT_SOURCES.
There may well be a cleaner way to do this, but it lets me build without fig2dev, while still running it if
fig2dev is installed.

-CPD

https://gist.githubusercontent.com/e1ven/01244536540ae08c7dc1/raw/14b91133ddf0f616856afbab49d3251291606b5f/fig2dev.patch
From a038af5b9d747161f79ef0f28dfb848aa50fe81f Mon Sep 17 00:00:00 2001
From: Colin Davis <e1ven <at> e1ven.com>
Date: Tue, 8 Jul 2014 01:23:29 -0400
Subject: [PATCH] Make fig2dev optional

---
 configure.ac    |  2 ++
 doc/Makefile.am | 18 ++++++++++--------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/configure.ac b/configure.ac
index c5952c7..58c276f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1334,6 +1334,8 @@ if test "$gcry_cv_cc_arm_arch_is_v6" = "yes" ; then
      [Defined if ARM architecture is v6 or newer])
 fi

+AC_CHECK_PROG([fig2dev], fig2dev, yes, no)
+AM_CONDITIONAL([FOUND_FIG2DEV], [test "x$fig2dev" = xyes])

 #
 # Check whether GCC inline assembler supports NEON instructions
diff --git a/doc/Makefile.am b/doc/Makefile.am
index 30330bb..782ec6a 100644
--- a/doc/Makefile.am
+++ b/doc/Makefile.am
@@ -17,18 +17,20 @@
 # License along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA

-EXTRA_DIST = README.apichanges HACKING DCO \
-	     libgcrypt-modules.eps fips-fsm.eps \
-	     libgcrypt-modules.png fips-fsm.png \
-             libgcrypt-modules.pdf fips-fsm.pdf \
-	     yat2m.c
-
-DISTCLEANFILES = gcrypt.cps yat2m-stamp.tmp yat2m-stamp $(myman_pages)
-CLEANFILES = yat2m

+if FOUND_FIG2DEV
 BUILT_SOURCES = libgcrypt-modules.eps fips-fsm.eps \
                 libgcrypt-modules.png fips-fsm.png \
                 libgcrypt-modules.pdf fips-fsm.pdf
+else
+BUILT_SOURCES =
+endif
+
+EXTRA_DIST = README.apichanges HACKING DCO \
+	     yat2m.c $(BUILT_SOURCES)
+
+DISTCLEANFILES = gcrypt.cps yat2m-stamp.tmp yat2m-stamp $(myman_pages)
+CLEANFILES = yat2m

 info_TEXINFOS = gcrypt.texi
 gcrypt_TEXINFOS = lgpl.texi gpl.texi libgcrypt-modules.fig fips-fsm.fig
-- 
2.0.1
Dmitry Eremin-Solenikov | 30 Jun 02:04 2014

[PATCH 1/3] Stribog: change endianness of the final result

* cipher/stribog.c: change endianness of the hash result.
* tests/basic.c (check_digests): adapt Stribog tests.
--
The Stribog standard (GOST R 34.11-2012) is a bit vague about the
representation of the final result. The confusion is compounded by GOST
signatures not being clear about the endianness of the hash value.
Fix the Stribog result endianness to fully conform to the standard.
The byte order is confirmed by a (draft) publication of PBKDF2 test
cases by TC26.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov <at> gmail.com>
---
 cipher/stribog.c | 165 +++++++++++++++++++++++++------------------------------
 tests/basic.c    |  24 ++++----
 2 files changed, 86 insertions(+), 103 deletions(-)

diff --git a/cipher/stribog.c b/cipher/stribog.c
index 942bbf4..360cfec 100644
--- a/cipher/stribog.c
+++ b/cipher/stribog.c
@@ -1080,70 +1080,70 @@ static const u64 stribog_table[8][256] =
     U64_C(0x72d14d3493b2e388), U64_C(0xd6a30f258c153427) },
 };

-static const u64 C16[13][16] =
+static const u64 C16[12][8] =
 {
-  { U64_C(0xdd806559f2a64507), U64_C(0x05767436cc744d23),
-    U64_C(0xa2422a08a460d315), U64_C(0x4b7ce09192676901),
-    U64_C(0x714eb88d7585c4fc), U64_C(0x2f6a76432e45d016),
-    U64_C(0xebcb2f81c0657c1f), U64_C(0xb1085bda1ecadae9) },
-  { U64_C(0xe679047021b19bb7), U64_C(0x55dda21bd7cbcd56),
-    U64_C(0x5cb561c2db0aa7ca), U64_C(0x9ab5176b12d69958),
-    U64_C(0x61d55e0f16b50131), U64_C(0xf3feea720a232b98),
-    U64_C(0x4fe39d460f70b5d7), U64_C(0x6fa3b58aa99d2f1a) },
-  { U64_C(0x991e96f50aba0ab2), U64_C(0xc2b6f443867adb31),
-    U64_C(0xc1c93a376062db09), U64_C(0xd3e20fe490359eb1),
-    U64_C(0xf2ea7514b1297b7b), U64_C(0x06f15e5f529c1f8b),
-    U64_C(0x0a39fc286a3d8435), U64_C(0xf574dcac2bce2fc7) },
-  { U64_C(0x220cbebc84e3d12e), U64_C(0x3453eaa193e837f1),
-    U64_C(0xd8b71333935203be), U64_C(0xa9d72c82ed03d675),
-    U64_C(0x9d721cad685e353f), U64_C(0x488e857e335c3c7d),
-    U64_C(0xf948e1a05d71e4dd), U64_C(0xef1fdfb3e81566d2) },
-  { U64_C(0x601758fd7c6cfe57), U64_C(0x7a56a27ea9ea63f5),
-    U64_C(0xdfff00b723271a16), U64_C(0xbfcd1747253af5a3),
-    U64_C(0x359e35d7800fffbd), U64_C(0x7f151c1f1686104a),
-    U64_C(0x9a3f410c6ca92363), U64_C(0x4bea6bacad474799) },
-  { U64_C(0xfa68407a46647d6e), U64_C(0xbf71c57236904f35),
-    U64_C(0x0af21f66c2bec6b6), U64_C(0xcffaa6b71c9ab7b4),
-    U64_C(0x187f9ab49af08ec6), U64_C(0x2d66c4f95142a46c),
-    U64_C(0x6fa4c33b7a3039c0), U64_C(0xae4faeae1d3ad3d9) },
-  { U64_C(0x8886564d3a14d493), U64_C(0x3517454ca23c4af3),
-    U64_C(0x06476983284a0504), U64_C(0x0992abc52d822c37),
-    U64_C(0xd3473e33197a93c9), U64_C(0x399ec6c7e6bf87c9),
-    U64_C(0x51ac86febf240954), U64_C(0xf4c70e16eeaac5ec) },
-  { U64_C(0xa47f0dd4bf02e71e), U64_C(0x36acc2355951a8d9),
-    U64_C(0x69d18d2bd1a5c42f), U64_C(0xf4892bcb929b0690),
-    U64_C(0x89b4443b4ddbc49a), U64_C(0x4eb7f8719c36de1e),
-    U64_C(0x03e7aa020c6e4141), U64_C(0x9b1f5b424d93c9a7) },
-  { U64_C(0x7261445183235adb), U64_C(0x0e38dc92cb1f2a60),
-    U64_C(0x7b2b8a9aa6079c54), U64_C(0x800a440bdbb2ceb1),
-    U64_C(0x3cd955b7e00d0984), U64_C(0x3a7d3a1b25894224),
-    U64_C(0x944c9ad8ec165fde), U64_C(0x378f5a541631229b) },
-  { U64_C(0x74b4c7fb98459ced), U64_C(0x3698fad1153bb6c3),
-    U64_C(0x7a1e6c303b7652f4), U64_C(0x9fe76702af69334b),
-    U64_C(0x1fffe18a1b336103), U64_C(0x8941e71cff8a78db),
-    U64_C(0x382ae548b2e4f3f3), U64_C(0xabbedea680056f52) },
-  { U64_C(0x6bcaa4cd81f32d1b), U64_C(0xdea2594ac06fd85d),
-    U64_C(0xefbacd1d7d476e98), U64_C(0x8a1d71efea48b9ca),
-    U64_C(0x2001802114846679), U64_C(0xd8fa6bbbebab0761),
-    U64_C(0x3002c6cd635afe94), U64_C(0x7bcd9ed0efc889fb) },
-  { U64_C(0x48bc924af11bd720), U64_C(0xfaf417d5d9b21b99),
-    U64_C(0xe71da4aa88e12852), U64_C(0x5d80ef9d1891cc86),
-    U64_C(0xf82012d430219f9b), U64_C(0xcda43c32bcdf1d77),
-    U64_C(0xd21380b00449b17a), U64_C(0x378ee767f11631ba) },
+  { U64_C(0xb1085bda1ecadae9), U64_C(0xebcb2f81c0657c1f),
+    U64_C(0x2f6a76432e45d016), U64_C(0x714eb88d7585c4fc),
+    U64_C(0x4b7ce09192676901), U64_C(0xa2422a08a460d315),
+    U64_C(0x05767436cc744d23), U64_C(0xdd806559f2a64507) },
+  { U64_C(0x6fa3b58aa99d2f1a), U64_C(0x4fe39d460f70b5d7),
+    U64_C(0xf3feea720a232b98), U64_C(0x61d55e0f16b50131),
+    U64_C(0x9ab5176b12d69958), U64_C(0x5cb561c2db0aa7ca),
+    U64_C(0x55dda21bd7cbcd56), U64_C(0xe679047021b19bb7) },
+  { U64_C(0xf574dcac2bce2fc7), U64_C(0x0a39fc286a3d8435),
+    U64_C(0x06f15e5f529c1f8b), U64_C(0xf2ea7514b1297b7b),
+    U64_C(0xd3e20fe490359eb1), U64_C(0xc1c93a376062db09),
+    U64_C(0xc2b6f443867adb31), U64_C(0x991e96f50aba0ab2) },
+  { U64_C(0xef1fdfb3e81566d2), U64_C(0xf948e1a05d71e4dd),
+    U64_C(0x488e857e335c3c7d), U64_C(0x9d721cad685e353f),
+    U64_C(0xa9d72c82ed03d675), U64_C(0xd8b71333935203be),
+    U64_C(0x3453eaa193e837f1), U64_C(0x220cbebc84e3d12e) },
+  { U64_C(0x4bea6bacad474799), U64_C(0x9a3f410c6ca92363),
+    U64_C(0x7f151c1f1686104a), U64_C(0x359e35d7800fffbd),
+    U64_C(0xbfcd1747253af5a3), U64_C(0xdfff00b723271a16),
+    U64_C(0x7a56a27ea9ea63f5), U64_C(0x601758fd7c6cfe57) },
+  { U64_C(0xae4faeae1d3ad3d9), U64_C(0x6fa4c33b7a3039c0),
+    U64_C(0x2d66c4f95142a46c), U64_C(0x187f9ab49af08ec6),
+    U64_C(0xcffaa6b71c9ab7b4), U64_C(0x0af21f66c2bec6b6),
+    U64_C(0xbf71c57236904f35), U64_C(0xfa68407a46647d6e) },
+  { U64_C(0xf4c70e16eeaac5ec), U64_C(0x51ac86febf240954),
+    U64_C(0x399ec6c7e6bf87c9), U64_C(0xd3473e33197a93c9),
+    U64_C(0x0992abc52d822c37), U64_C(0x06476983284a0504),
+    U64_C(0x3517454ca23c4af3), U64_C(0x8886564d3a14d493) },
+  { U64_C(0x9b1f5b424d93c9a7), U64_C(0x03e7aa020c6e4141),
+    U64_C(0x4eb7f8719c36de1e), U64_C(0x89b4443b4ddbc49a),
+    U64_C(0xf4892bcb929b0690), U64_C(0x69d18d2bd1a5c42f),
+    U64_C(0x36acc2355951a8d9), U64_C(0xa47f0dd4bf02e71e) },
+  { U64_C(0x378f5a541631229b), U64_C(0x944c9ad8ec165fde),
+    U64_C(0x3a7d3a1b25894224), U64_C(0x3cd955b7e00d0984),
+    U64_C(0x800a440bdbb2ceb1), U64_C(0x7b2b8a9aa6079c54),
+    U64_C(0x0e38dc92cb1f2a60), U64_C(0x7261445183235adb) },
+  { U64_C(0xabbedea680056f52), U64_C(0x382ae548b2e4f3f3),
+    U64_C(0x8941e71cff8a78db), U64_C(0x1fffe18a1b336103),
+    U64_C(0x9fe76702af69334b), U64_C(0x7a1e6c303b7652f4),
+    U64_C(0x3698fad1153bb6c3), U64_C(0x74b4c7fb98459ced) },
+  { U64_C(0x7bcd9ed0efc889fb), U64_C(0x3002c6cd635afe94),
+    U64_C(0xd8fa6bbbebab0761), U64_C(0x2001802114846679),
+    U64_C(0x8a1d71efea48b9ca), U64_C(0xefbacd1d7d476e98),
+    U64_C(0xdea2594ac06fd85d), U64_C(0x6bcaa4cd81f32d1b) },
+  { U64_C(0x378ee767f11631ba), U64_C(0xd21380b00449b17a),
+    U64_C(0xcda43c32bcdf1d77), U64_C(0xf82012d430219f9b),
+    U64_C(0x5d80ef9d1891cc86), U64_C(0xe71da4aa88e12852),
+    U64_C(0xfaf417d5d9b21b99), U64_C(0x48bc924af11bd720) },
 };

 
 #define strido(out, temp, i) do { \
 	u64 t; \
-	t  = stribog_table[0][(temp[0] >> (i * 8)) & 0xff]; \
-	t ^= stribog_table[1][(temp[1] >> (i * 8)) & 0xff]; \
-	t ^= stribog_table[2][(temp[2] >> (i * 8)) & 0xff]; \
-	t ^= stribog_table[3][(temp[3] >> (i * 8)) & 0xff]; \
-	t ^= stribog_table[4][(temp[4] >> (i * 8)) & 0xff]; \
-	t ^= stribog_table[5][(temp[5] >> (i * 8)) & 0xff]; \
-	t ^= stribog_table[6][(temp[6] >> (i * 8)) & 0xff]; \
-	t ^= stribog_table[7][(temp[7] >> (i * 8)) & 0xff]; \
-	out[i] = t; } while(0)
+	t  = stribog_table[0][(temp[7] >> (i * 8)) & 0xff]; \
+	t ^= stribog_table[1][(temp[6] >> (i * 8)) & 0xff]; \
+	t ^= stribog_table[2][(temp[5] >> (i * 8)) & 0xff]; \
+	t ^= stribog_table[3][(temp[4] >> (i * 8)) & 0xff]; \
+	t ^= stribog_table[4][(temp[3] >> (i * 8)) & 0xff]; \
+	t ^= stribog_table[5][(temp[2] >> (i * 8)) & 0xff]; \
+	t ^= stribog_table[6][(temp[1] >> (i * 8)) & 0xff]; \
+	t ^= stribog_table[7][(temp[0] >> (i * 8)) & 0xff]; \
+	out[7-i] = t; } while(0)

 static void LPSX (u64 *out, const u64 *a, const u64 *b)
 {
@@ -1227,14 +1227,14 @@ transform_bits (STRIBOG_CONTEXT *hd, const unsigned char *data, unsigned count)
   int i;

   for (i = 0; i < 8; i++)
-    M[i] = buf_get_le64(data + i * 8);
+    M[7-i] = buf_get_le64(data + i * 8);

   g (hd->h, M, hd->N);
-  l = hd->N[0];
-  hd->N[0] += count;
-  if (hd->N[0] < l)
+  l = hd->N[7];
+  hd->N[7] += count;
+  if (hd->N[7] < l)
     { /* overflow */
-      for (i = 1; i < 8; i++)
+      for (i = 6; i >= 0; i--)
         {
           hd->N[i]++;
           if (hd->N[i] != 0)
@@ -1242,22 +1242,12 @@ transform_bits (STRIBOG_CONTEXT *hd, const unsigned char *data, unsigned count)
         }
     }

-  hd->Sigma[0] += M[0];
-  for (i = 1; i < 8; i++)
-    if (hd->Sigma[i-1] < M[i-1])
-      hd->Sigma[i] += M[i] + 1;
+  hd->Sigma[7] += M[7];
+  for (i = 7; i >= 1; i--)
+    if (hd->Sigma[i] < M[i])
+      hd->Sigma[i-1] += M[i-1] + 1;
     else
-      hd->Sigma[i] += M[i];
-}
-
-static unsigned int
-transform_blk (void *context, const unsigned char *inbuf_arg)
-{
-  STRIBOG_CONTEXT *hd = context;
-
-  transform_bits (hd, inbuf_arg, 64 * 8);
-
-  return /* burn_stack */ 768;
+      hd->Sigma[i-1] += M[i-1];
 }

 static unsigned int
@@ -1267,7 +1257,8 @@ transform ( void *c, const unsigned char *data, size_t nblks )

   do
     {
-      burn = transform_blk (c, data);
+      transform_bits (c, data, 64 * 8);
+      burn = /* burn_stack */ 768;
       data += 64;
     }
   while (--nblks);
@@ -1300,32 +1291,24 @@ stribog_final (void *context)
   g (hd->h, hd->Sigma, Z);

   for (i = 0; i < 8; i++)
-    hd->h[i] = le_bswap64(hd->h[i]);
+    hd->h[i] = be_bswap64(hd->h[i]);

   _gcry_burn_stack (768);
 }

 static byte *
-stribog_read_512 (void *context)
+stribog_read (void *context)
 {
   STRIBOG_CONTEXT *hd = context;

   return hd->result;
 }

-static byte *
-stribog_read_256 (void *context)
-{
-  STRIBOG_CONTEXT *hd = context;
-
-  return hd->result + 32;
-}
-
 gcry_md_spec_t _gcry_digest_spec_stribog_256 =
   {
     GCRY_MD_STRIBOG256, {0, 0},
     "STRIBOG256", NULL, 0, NULL, 32,
-    stribog_init_256, _gcry_md_block_write, stribog_final, stribog_read_256,
+    stribog_init_256, _gcry_md_block_write, stribog_final, stribog_read,
     sizeof (STRIBOG_CONTEXT)
   };

@@ -1333,6 +1316,6 @@ gcry_md_spec_t _gcry_digest_spec_stribog_512 =
   {
     GCRY_MD_STRIBOG512, {0, 0},
     "STRIBOG512", NULL, 0, NULL, 64,
-    stribog_init_512, _gcry_md_block_write, stribog_final, stribog_read_512,
+    stribog_init_512, _gcry_md_block_write, stribog_final, stribog_read,
     sizeof (STRIBOG_CONTEXT)
   };
diff --git a/tests/basic.c b/tests/basic.c
index 6d70cfd..f312fc0 100644
--- a/tests/basic.c
+++ b/tests/basic.c
@@ -4870,32 +4870,32 @@ check_digests (void)
 	"\x8a\xcc\x14\x53\xb4\x87\xc8\x5c\x95\x9a\x3e\x85\x8c\x7d\x6e\x0c" },
       { GCRY_MD_STRIBOG512,
         "012345678901234567890123456789012345678901234567890123456789012",
-        "\x1b\x54\xd0\x1a\x4a\xf5\xb9\xd5\xcc\x3d\x86\xd6\x8d\x28\x54\x62"
-        "\xb1\x9a\xbc\x24\x75\x22\x2f\x35\xc0\x85\x12\x2b\xe4\xba\x1f\xfa"
-        "\x00\xad\x30\xf8\x76\x7b\x3a\x82\x38\x4c\x65\x74\xf0\x24\xc3\x11"
-        "\xe2\xa4\x81\x33\x2b\x08\xef\x7f\x41\x79\x78\x91\xc1\x64\x6f\x48" },
+        "\x48\x6f\x64\xc1\x91\x78\x79\x41\x7f\xef\x08\x2b\x33\x81\xa4\xe2"
+        "\x11\xc3\x24\xf0\x74\x65\x4c\x38\x82\x3a\x7b\x76\xf8\x30\xad\x00"
+        "\xfa\x1f\xba\xe4\x2b\x12\x85\xc0\x35\x2f\x22\x75\x24\xbc\x9a\xb1"
+        "\x62\x54\x28\x8d\xd6\x86\x3d\xcc\xd5\xb9\xf5\x4a\x1a\xd0\x54\x1b" },
       { GCRY_MD_STRIBOG256,
         "012345678901234567890123456789012345678901234567890123456789012",
-        "\x9d\x15\x1e\xef\xd8\x59\x0b\x89\xda\xa6\xba\x6c\xb7\x4a\xf9\x27"
-        "\x5d\xd0\x51\x02\x6b\xb1\x49\xa4\x52\xfd\x84\xe5\xe5\x7b\x55\x00" },
+        "\x00\x55\x7b\xe5\xe5\x84\xfd\x52\xa4\x49\xb1\x6b\x02\x51\xd0\x5d"
+        "\x27\xf9\x4a\xb7\x6c\xba\xa6\xda\x89\x0b\x59\xd8\xef\x1e\x15\x9d" },
       { GCRY_MD_STRIBOG512,
         "\xd1\xe5\x20\xe2\xe5\xf2\xf0\xe8\x2c\x20\xd1\xf2\xf0\xe8\xe1\xee"
         "\xe6\xe8\x20\xe2\xed\xf3\xf6\xe8\x2c\x20\xe2\xe5\xfe\xf2\xfa\x20"
         "\xf1\x20\xec\xee\xf0\xff\x20\xf1\xf2\xf0\xe5\xeb\xe0\xec\xe8\x20"
         "\xed\xe0\x20\xf5\xf0\xe0\xe1\xf0\xfb\xff\x20\xef\xeb\xfa\xea\xfb"
         "\x20\xc8\xe3\xee\xf0\xe5\xe2\xfb",
-        "\x1e\x88\xe6\x22\x26\xbf\xca\x6f\x99\x94\xf1\xf2\xd5\x15\x69\xe0"
-        "\xda\xf8\x47\x5a\x3b\x0f\xe6\x1a\x53\x00\xee\xe4\x6d\x96\x13\x76"
-        "\x03\x5f\xe8\x35\x49\xad\xa2\xb8\x62\x0f\xcd\x7c\x49\x6c\xe5\xb3"
-        "\x3f\x0c\xb9\xdd\xdc\x2b\x64\x60\x14\x3b\x03\xda\xba\xc9\xfb\x28" },
+        "\x28\xfb\xc9\xba\xda\x03\x3b\x14\x60\x64\x2b\xdc\xdd\xb9\x0c\x3f"
+        "\xb3\xe5\x6c\x49\x7c\xcd\x0f\x62\xb8\xa2\xad\x49\x35\xe8\x5f\x03"
+        "\x76\x13\x96\x6d\xe4\xee\x00\x53\x1a\xe6\x0f\x3b\x5a\x47\xf8\xda"
+        "\xe0\x69\x15\xd5\xf2\xf1\x94\x99\x6f\xca\xbf\x26\x22\xe6\x88\x1e" },
       { GCRY_MD_STRIBOG256,
         "\xd1\xe5\x20\xe2\xe5\xf2\xf0\xe8\x2c\x20\xd1\xf2\xf0\xe8\xe1\xee"
         "\xe6\xe8\x20\xe2\xed\xf3\xf6\xe8\x2c\x20\xe2\xe5\xfe\xf2\xfa\x20"
         "\xf1\x20\xec\xee\xf0\xff\x20\xf1\xf2\xf0\xe5\xeb\xe0\xec\xe8\x20"
         "\xed\xe0\x20\xf5\xf0\xe0\xe1\xf0\xfb\xff\x20\xef\xeb\xfa\xea\xfb"
         "\x20\xc8\xe3\xee\xf0\xe5\xe2\xfb",
-        "\x9d\xd2\xfe\x4e\x90\x40\x9e\x5d\xa8\x7f\x53\x97\x6d\x74\x05\xb0"
-        "\xc0\xca\xc6\x28\xfc\x66\x9a\x74\x1d\x50\x06\x3c\x55\x7e\x8f\x50" },
+        "\x50\x8f\x7e\x55\x3c\x06\x50\x1d\x74\x9a\x66\xfc\x28\xc6\xca\xc0"
+        "\xb0\x05\x74\x6d\x97\x53\x7f\xa8\x5d\x9e\x40\x90\x4e\xfe\xd2\x9d" },
       {	0 }
     };
   gcry_error_t err;
-- 
2.0.0
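
The test-vector changes to tests/basic.c in the patch above amount to a byte reversal of each expected digest. The sketch below checks that for the first STRIBOG256 vector, with both byte strings copied from the removed and added lines of the patch. The helper name is_byte_reversed and the array names are hypothetical.

```c
#include <stddef.h>

/* Expected STRIBOG256 digest of the first test message before the patch. */
static const unsigned char old_stribog256[32] = {
    0x9d, 0x15, 0x1e, 0xef, 0xd8, 0x59, 0x0b, 0x89,
    0xda, 0xa6, 0xba, 0x6c, 0xb7, 0x4a, 0xf9, 0x27,
    0x5d, 0xd0, 0x51, 0x02, 0x6b, 0xb1, 0x49, 0xa4,
    0x52, 0xfd, 0x84, 0xe5, 0xe5, 0x7b, 0x55, 0x00
};

/* Expected digest of the same message after the patch. */
static const unsigned char new_stribog256[32] = {
    0x00, 0x55, 0x7b, 0xe5, 0xe5, 0x84, 0xfd, 0x52,
    0xa4, 0x49, 0xb1, 0x6b, 0x02, 0x51, 0xd0, 0x5d,
    0x27, 0xf9, 0x4a, 0xb7, 0x6c, 0xba, 0xa6, 0xda,
    0x89, 0x0b, 0x59, 0xd8, 0xef, 0x1e, 0x15, 0x9d
};

/* Return 1 if b is a read back to front, 0 otherwise. */
static int is_byte_reversed(const unsigned char *a, const unsigned char *b,
                            size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (a[i] != b[len - 1 - i])
            return 0;
    return 1;
}
```

Callers comparing against digests produced before this change would need to reverse one side in the same way.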
by Jussi Kivilinna | 29 Jun 16:45 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-94-g1b9b00b

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  1b9b00bbe41bbed32563f1102049521e703e72bd (commit)
      from  066f068bd0bc4d8e01f1f18b6153cdc8d2c245d7 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 1b9b00bbe41bbed32563f1102049521e703e72bd
Author: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
Date:   Sun Jun 29 17:36:29 2014 +0300

    Speed-up SHA-1 NEON assembly implementation

    * cipher/sha1-armv7-neon.S: Tweak implementation for speed-up.
    --

    Benchmark on Cortex-A8 1008Mhz:

    New:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
     SHA1           |      7.04 ns/B     135.4 MiB/s      7.10 c/B

    Old:
                    |  nanosecs/byte   mebibytes/sec   cycles/byte
     SHA1           |      7.79 ns/B     122.4 MiB/s      7.85 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
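
The three columns of the benchmark table above are related by simple arithmetic, which makes the figures easy to cross-check. The sketch below assumes 1 MiB = 1048576 bytes and the 1008 MHz clock stated in the log; the function names are hypothetical.

```c
/* Convert nanoseconds per byte into MiB per second. */
static double mib_per_sec(double ns_per_byte)
{
    return 1e9 / ns_per_byte / 1048576.0;
}

/* Convert nanoseconds per byte into cycles per byte at clock_mhz. */
static double cycles_per_byte(double ns_per_byte, double clock_mhz)
{
    return ns_per_byte * clock_mhz / 1000.0;
}
```

Feeding in 7.04 ns/B and 7.79 ns/B reproduces the reported throughput and cycle counts to within rounding, and the ratio 7.79 / 7.04 puts the speed-up at roughly 10 percent.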

diff --git a/cipher/sha1-armv7-neon.S b/cipher/sha1-armv7-neon.S
index 95b677d..f314d8e 100644
--- a/cipher/sha1-armv7-neon.S
+++ b/cipher/sha1-armv7-neon.S
@@ -1,5 +1,5 @@
 /* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
- * Copyright (C) 2013 Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
+ * Copyright (C) 2013-2014 Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
  *
  * Based on sha1.c:
  *  Copyright (C) 1998, 2001, 2002, 2003, 2008 Free Software Foundation, Inc.
@@ -26,12 +26,12 @@
     defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \
     defined(HAVE_GCC_INLINE_ASM_NEON) && defined(USE_SHA1)

-.data
-
 .syntax unified
 .fpu neon
 .arm

+.text
+
 #ifdef __PIC__
 #  define GET_DATA_POINTER(reg, name, rtmp) \
 		ldr reg, 1f; \
@@ -69,16 +69,13 @@ gcry_sha1_armv7_neon_K_VEC:
 .LK4:	.long K4, K4, K4, K4

 
-.text
-
 /* Register macros */

 #define RSTATE r0
 #define RDATA r1
 #define RNBLKS r2
 #define ROLDSTACK r3
-#define RK lr
-#define RWK r12
+#define RWK lr

 #define _a r4
 #define _b r5
@@ -89,6 +86,7 @@ gcry_sha1_armv7_neon_K_VEC:
 #define RT0 r9
 #define RT1 r10
 #define RT2 r11
+#define RT3 r12

 #define W0 q0
 #define W1 q1
@@ -104,7 +102,10 @@ gcry_sha1_armv7_neon_K_VEC:
 #define tmp2 q10
 #define tmp3 q11

-#define curK q12
+#define qK1 q12
+#define qK2 q13
+#define qK3 q14
+#define qK4 q15

 
 /* Round function macros. */
@@ -112,43 +113,43 @@ gcry_sha1_armv7_neon_K_VEC:
 #define WK_offs(i) (((i) & 15) * 4)

 #define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
-	and RT0, c, b; \
+	ldr RT3, [sp, WK_offs(i)]; \
 		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	bic RT0, d, b; \
 	add e, e, a, ror #(32 - 5); \
-	ldr RT2, [sp, WK_offs(i)]; \
-	bic RT1, d, b; \
-	add e, RT2; \
+	and RT1, c, b; \
 		pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add RT0, RT0, RT3; \
+	add e, e, RT1; \
 	ror b, #(32 - 30); \
-	eor RT0, RT1; \
 		pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
-	add e, RT0;
+	add e, e, RT0;

 #define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
-	eor RT0, c, b; \
+	ldr RT3, [sp, WK_offs(i)]; \
 		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	eor RT0, d, b; \
 	add e, e, a, ror #(32 - 5); \
-	ldr RT2, [sp, WK_offs(i)]; \
-	eor RT0, d; \
+	eor RT0, RT0, c; \
 		pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
-	add e, RT2; \
+	add e, e, RT3; \
 	ror b, #(32 - 30); \
 		pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
-	add e, RT0; \
+	add e, e, RT0; \

 #define _R_F3(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
-	eor RT0, c, b; \
+	ldr RT3, [sp, WK_offs(i)]; \
 		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	eor RT0, b, c; \
+	and RT1, b, c; \
 	add e, e, a, ror #(32 - 5); \
-	ldr RT2, [sp, WK_offs(i)]; \
-	and RT1, c, b; \
-	and RT0, d; \
-	add e, RT2; \
 		pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	and RT0, RT0, d; \
+	add RT1, RT1, RT3; \
+	add e, e, RT0; \
 	ror b, #(32 - 30); \
-	add e, RT1; \
 		pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
-	add e, RT0;
+	add e, e, RT1;

 #define _R_F4(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
 	_R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28)
@@ -183,10 +184,10 @@ gcry_sha1_armv7_neon_K_VEC:
 	vst1.32   {tmp2, tmp3}, [RWK];				\

 #define WPRECALC_00_15_0(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	add       RWK, sp, #(WK_offs(0));			\
+	vld1.32   {tmp0, tmp1}, [RDATA]!;			\

 #define WPRECALC_00_15_1(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vld1.32   {tmp0, tmp1}, [RDATA]!;			\
+	add       RWK, sp, #(WK_offs(0));			\

 #define WPRECALC_00_15_2(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
 	vrev32.8  W0, tmp0;		/* big => little */	\
@@ -225,25 +226,25 @@ gcry_sha1_armv7_neon_K_VEC:
 /********* Precalc macros for rounds 16-31 ************************************/

 #define WPRECALC_16_31_0(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	add       RWK, sp, #(WK_offs(i));	\
-
-#define WPRECALC_16_31_1(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
 	veor      tmp0, tmp0;			\
 	vext.8    W, W_m16, W_m12, #8;		\

-#define WPRECALC_16_31_2(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
+#define WPRECALC_16_31_1(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
+	add       RWK, sp, #(WK_offs(i));	\
 	vext.8    tmp0, W_m04, tmp0, #4;	\
+
+#define WPRECALC_16_31_2(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
+	veor      tmp0, tmp0, W_m16;		\
 	veor.32   W, W, W_m08;			\

 #define WPRECALC_16_31_3(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor      tmp0, tmp0, W_m16;		\
 	veor      tmp1, tmp1;			\
+	veor      W, W, tmp0;			\

 #define WPRECALC_16_31_4(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor      W, W, tmp0;			\
+	vshl.u32  tmp0, W, #1;			\

 #define WPRECALC_16_31_5(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vshl.u32  tmp0, W, #1;			\
 	vext.8    tmp1, tmp1, W, #(16-12);	\
 	vshr.u32  W, W, #31;			\
 
@@ -270,28 +271,28 @@ gcry_sha1_armv7_neon_K_VEC:
 /********* Precalc macros for rounds 32-79 ************************************/

 #define WPRECALC_32_79_0(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	add RWK, sp, #(WK_offs(i&~3)); \
+	veor W, W_m28; \

 #define WPRECALC_32_79_1(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor W, W_m28; \
+	vext.8 tmp0, W_m08, W_m04, #8; \

 #define WPRECALC_32_79_2(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vext.8 tmp0, W_m08, W_m04, #8; \
+	veor W, W_m16; \

 #define WPRECALC_32_79_3(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor W, W_m16; \
+	veor W, tmp0; \

 #define WPRECALC_32_79_4(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor W, tmp0; \
+	add RWK, sp, #(WK_offs(i&~3)); \

 #define WPRECALC_32_79_5(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vshr.u32 tmp0, W, #30; \
+	vshl.u32 tmp1, W, #2; \

 #define WPRECALC_32_79_6(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vshl.u32 W, W, #2; \
+	vshr.u32 tmp0, W, #30; \

 #define WPRECALC_32_79_7(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vorr W, tmp0, W; \
+	vorr W, tmp0, tmp1; \

 #define WPRECALC_32_79_8(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
 	vadd.u32 tmp0, W, curK; \
@@ -326,20 +327,26 @@ _gcry_sha1_transform_armv7_neon:
   beq .Ldo_nothing;

   push {r4-r12, lr};
+
+  GET_DATA_POINTER(RT3, .LK_VEC, _a);
   vpush {q4-q7};

   mov ROLDSTACK, sp;
-  GET_DATA_POINTER(RK, .LK_VEC, _a);

   /* Align stack. */
   sub sp, #(16*4);
   and sp, #(~(16-1));

+  vld1.32 {qK1-qK2}, [RT3]!; /* Load K1,K2 */
+
   /* Get the values of the chaining variables. */
   ldm RSTATE, {_a-_e};

+  vld1.32 {qK3-qK4}, [RT3]; /* Load K3,K4 */
+
+#undef curK
+#define curK qK1
   /* Precalc 0-15. */
-  vld1.32 {curK}, [RK]!; /* Load K1. */
   W_PRECALC_00_15();

   b .Loop;
@@ -352,7 +359,8 @@ _gcry_sha1_transform_armv7_neon:
   _R( _d, _e, _a, _b, _c, F1,  2, WPRECALC_16_31_6, WPRECALC_16_31_7, WPRECALC_16_31_8, 16, W4, W5, W6, W7, W0, _, _, _ );
   _R( _c, _d, _e, _a, _b, F1,  3, WPRECALC_16_31_9, WPRECALC_16_31_10,WPRECALC_16_31_11,16, W4, W5, W6, W7, W0, _, _, _ );

-  vld1.32 {curK}, [RK]!; /* Load K2. */
+#undef curK
+#define curK qK2
   _R( _b, _c, _d, _e, _a, F1,  4, WPRECALC_16_31_0, WPRECALC_16_31_1, WPRECALC_16_31_2, 20, W3, W4, W5, W6, W7, _, _, _ );
   _R( _a, _b, _c, _d, _e, F1,  5, WPRECALC_16_31_3, WPRECALC_16_31_4, WPRECALC_16_31_5, 20, W3, W4, W5, W6, W7, _, _, _ );
   _R( _e, _a, _b, _c, _d, F1,  6, WPRECALC_16_31_6, WPRECALC_16_31_7, WPRECALC_16_31_8, 20, W3, W4, W5, W6, W7, _, _, _ );
@@ -371,72 +379,75 @@ _gcry_sha1_transform_armv7_neon:
   /* Transform 16-63 + Precalc 32-79. */
   _R( _e, _a, _b, _c, _d, F1, 16, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 32, W0, W1, W2, W3, W4, W5, W6, W7);
   _R( _d, _e, _a, _b, _c, F1, 17, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 32, W0, W1, W2, W3, W4, W5, W6, W7);
-  _R( _c, _d, _e, _a, _b, F1, 18, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            32, W0, W1, W2, W3, W4, W5, W6, W7);
+  _R( _c, _d, _e, _a, _b, F1, 18, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 32, W0, W1, W2, W3, W4, W5, W6, W7);
   _R( _b, _c, _d, _e, _a, F1, 19, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 32, W0, W1, W2, W3, W4, W5, W6, W7);

   _R( _a, _b, _c, _d, _e, F2, 20, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 36, W7, W0, W1, W2, W3, W4, W5, W6);
   _R( _e, _a, _b, _c, _d, F2, 21, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 36, W7, W0, W1, W2, W3, W4, W5, W6);
-  _R( _d, _e, _a, _b, _c, F2, 22, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            36, W7, W0, W1, W2, W3, W4, W5, W6);
+  _R( _d, _e, _a, _b, _c, F2, 22, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 36, W7, W0, W1, W2, W3, W4, W5, W6);
   _R( _c, _d, _e, _a, _b, F2, 23, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 36, W7, W0, W1, W2, W3, W4, W5, W6);

-  vld1.32 {curK}, [RK]!; /* Load K3. */
+#undef curK
+#define curK qK3
   _R( _b, _c, _d, _e, _a, F2, 24, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 40, W6, W7, W0, W1, W2, W3, W4, W5);
   _R( _a, _b, _c, _d, _e, F2, 25, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 40, W6, W7, W0, W1, W2, W3, W4, W5);
-  _R( _e, _a, _b, _c, _d, F2, 26, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            40, W6, W7, W0, W1, W2, W3, W4, W5);
+  _R( _e, _a, _b, _c, _d, F2, 26, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 40, W6, W7, W0, W1, W2, W3, W4, W5);
   _R( _d, _e, _a, _b, _c, F2, 27, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 40, W6, W7, W0, W1, W2, W3, W4, W5);

   _R( _c, _d, _e, _a, _b, F2, 28, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 44, W5, W6, W7, W0, W1, W2, W3, W4);
   _R( _b, _c, _d, _e, _a, F2, 29, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 44, W5, W6, W7, W0, W1, W2, W3, W4);
-  _R( _a, _b, _c, _d, _e, F2, 30, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            44, W5, W6, W7, W0, W1, W2, W3, W4);
+  _R( _a, _b, _c, _d, _e, F2, 30, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 44, W5, W6, W7, W0, W1, W2, W3, W4);
   _R( _e, _a, _b, _c, _d, F2, 31, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 44, W5, W6, W7, W0, W1, W2, W3, W4);

   _R( _d, _e, _a, _b, _c, F2, 32, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 48, W4, W5, W6, W7, W0, W1, W2, W3);
   _R( _c, _d, _e, _a, _b, F2, 33, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 48, W4, W5, W6, W7, W0, W1, W2, W3);
-  _R( _b, _c, _d, _e, _a, F2, 34, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            48, W4, W5, W6, W7, W0, W1, W2, W3);
+  _R( _b, _c, _d, _e, _a, F2, 34, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 48, W4, W5, W6, W7, W0, W1, W2, W3);
   _R( _a, _b, _c, _d, _e, F2, 35, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 48, W4, W5, W6, W7, W0, W1, W2, W3);

   _R( _e, _a, _b, _c, _d, F2, 36, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 52, W3, W4, W5, W6, W7, W0, W1, W2);
   _R( _d, _e, _a, _b, _c, F2, 37, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 52, W3, W4, W5, W6, W7, W0, W1, W2);
-  _R( _c, _d, _e, _a, _b, F2, 38, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            52, W3, W4, W5, W6, W7, W0, W1, W2);
+  _R( _c, _d, _e, _a, _b, F2, 38, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 52, W3, W4, W5, W6, W7, W0, W1, W2);
   _R( _b, _c, _d, _e, _a, F2, 39, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 52, W3, W4, W5, W6, W7, W0, W1, W2);

   _R( _a, _b, _c, _d, _e, F3, 40, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 56, W2, W3, W4, W5, W6, W7, W0, W1);
   _R( _e, _a, _b, _c, _d, F3, 41, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 56, W2, W3, W4, W5, W6, W7, W0, W1);
-  _R( _d, _e, _a, _b, _c, F3, 42, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            56, W2, W3, W4, W5, W6, W7, W0, W1);
+  _R( _d, _e, _a, _b, _c, F3, 42, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 56, W2, W3, W4, W5, W6, W7, W0, W1);
   _R( _c, _d, _e, _a, _b, F3, 43, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 56, W2, W3, W4, W5, W6, W7, W0, W1);

-  vld1.32 {curK}, [RK]!; /* Load K4. */
+#undef curK
+#define curK qK4
   _R( _b, _c, _d, _e, _a, F3, 44, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 60, W1, W2, W3, W4, W5, W6, W7, W0);
   _R( _a, _b, _c, _d, _e, F3, 45, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 60, W1, W2, W3, W4, W5, W6, W7, W0);
-  _R( _e, _a, _b, _c, _d, F3, 46, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            60, W1, W2, W3, W4, W5, W6, W7, W0);
+  _R( _e, _a, _b, _c, _d, F3, 46, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 60, W1, W2, W3, W4, W5, W6, W7, W0);
   _R( _d, _e, _a, _b, _c, F3, 47, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 60, W1, W2, W3, W4, W5, W6, W7, W0);

   _R( _c, _d, _e, _a, _b, F3, 48, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 64, W0, W1, W2, W3, W4, W5, W6, W7);
   _R( _b, _c, _d, _e, _a, F3, 49, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 64, W0, W1, W2, W3, W4, W5, W6, W7);
-  _R( _a, _b, _c, _d, _e, F3, 50, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            64, W0, W1, W2, W3, W4, W5, W6, W7);
+  _R( _a, _b, _c, _d, _e, F3, 50, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 64, W0, W1, W2, W3, W4, W5, W6, W7);
   _R( _e, _a, _b, _c, _d, F3, 51, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 64, W0, W1, W2, W3, W4, W5, W6, W7);

   _R( _d, _e, _a, _b, _c, F3, 52, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 68, W7, W0, W1, W2, W3, W4, W5, W6);
   _R( _c, _d, _e, _a, _b, F3, 53, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 68, W7, W0, W1, W2, W3, W4, W5, W6);
-  _R( _b, _c, _d, _e, _a, F3, 54, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            68, W7, W0, W1, W2, W3, W4, W5, W6);
+  _R( _b, _c, _d, _e, _a, F3, 54, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 68, W7, W0, W1, W2, W3, W4, W5, W6);
   _R( _a, _b, _c, _d, _e, F3, 55, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 68, W7, W0, W1, W2, W3, W4, W5, W6);

   _R( _e, _a, _b, _c, _d, F3, 56, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 72, W6, W7, W0, W1, W2, W3, W4, W5);
   _R( _d, _e, _a, _b, _c, F3, 57, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 72, W6, W7, W0, W1, W2, W3, W4, W5);
-  _R( _c, _d, _e, _a, _b, F3, 58, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            72, W6, W7, W0, W1, W2, W3, W4, W5);
+  _R( _c, _d, _e, _a, _b, F3, 58, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 72, W6, W7, W0, W1, W2, W3, W4, W5);
   _R( _b, _c, _d, _e, _a, F3, 59, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 72, W6, W7, W0, W1, W2, W3, W4, W5);

-  sub RK, #64;
+  subs RNBLKS, #1;
+
   _R( _a, _b, _c, _d, _e, F4, 60, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 76, W5, W6, W7, W0, W1, W2, W3, W4);
   _R( _e, _a, _b, _c, _d, F4, 61, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 76, W5, W6, W7, W0, W1, W2, W3, W4);
-  _R( _d, _e, _a, _b, _c, F4, 62, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            76, W5, W6, W7, W0, W1, W2, W3, W4);
+  _R( _d, _e, _a, _b, _c, F4, 62, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 76, W5, W6, W7, W0, W1, W2, W3, W4);
   _R( _c, _d, _e, _a, _b, F4, 63, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 76, W5, W6, W7, W0, W1, W2, W3, W4);

-  subs RNBLKS, #1;
   beq .Lend;

   /* Transform 64-79 + Precalc 0-15 of next block. */
-  vld1.32 {curK}, [RK]!; /* Load K1. */
+#undef curK
+#define curK qK1
   _R( _b, _c, _d, _e, _a, F4, 64, WPRECALC_00_15_0, dummy, dummy, _, _, _, _, _, _, _, _, _ );
   _R( _a, _b, _c, _d, _e, F4, 65, WPRECALC_00_15_1, dummy, dummy, _, _, _, _, _, _, _, _, _ );
   _R( _e, _a, _b, _c, _d, F4, 66, WPRECALC_00_15_2, dummy, dummy, _, _, _, _, _, _, _, _, _ );
@@ -458,14 +469,13 @@ _gcry_sha1_transform_armv7_neon:
   _R( _b, _c, _d, _e, _a, F4, 79, WPRECALC_00_15_11, dummy, WPRECALC_00_15_12, _, _, _, _, _, _, _, _, _ );

   /* Update the chaining variables. */
-  ldm RSTATE, {RT0-RT2};
+  ldm RSTATE, {RT0-RT3};
   add _a, RT0;
-  ldr RT0, [RSTATE, #state_h3];
+  ldr RT0, [RSTATE, #state_h4];
   add _b, RT1;
-  ldr RT1, [RSTATE, #state_h4];
   add _c, RT2;
-  add _d, RT0;
-  add _e, RT1;
+  add _d, RT3;
+  add _e, RT0;
   stm RSTATE, {_a-_e};

   b .Loop;
@@ -493,15 +503,14 @@ _gcry_sha1_transform_armv7_neon:
   mov sp, ROLDSTACK;

   /* Update the chaining variables. */
-  ldm RSTATE, {RT0-RT2};
+  ldm RSTATE, {RT0-RT3};
   add _a, RT0;
-  ldr RT0, [RSTATE, #state_h3];
+  ldr RT0, [RSTATE, #state_h4];
   add _b, RT1;
-  ldr RT1, [RSTATE, #state_h4];
   add _c, RT2;
-  add _d, RT0;
+  add _d, RT3;
   vpop {q4-q7};
-  add _e, RT1;
+  add _e, RT0;
   stm RSTATE, {_a-_e};

   /* burn_stack */

-----------------------------------------------------------------------

Summary of changes:
 cipher/sha1-armv7-neon.S |  155 ++++++++++++++++++++++++----------------------
 1 file changed, 82 insertions(+), 73 deletions(-)
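
For readers following the WPRECALC_32_79_* macros in the diff above: their operands (W_m28, W_m16, and the vext.8 of W_m08/W_m04, folded into the old contents of W and rotated left by 2) appear to correspond to the well-known wide form of the SHA-1 message schedule, W[i] = rol(W[i-6] ^ W[i-16] ^ W[i-28] ^ W[i-32], 2), which holds for i >= 32. The following is a minimal C sketch, not taken from libgcrypt, that checks the wide form against the standard recurrence. All function names are made up for illustration.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n must be in 1..31). */
static uint32_t rol32(uint32_t x, unsigned n)
{
  return (x << n) | (x >> (32 - n));
}

/* Standard SHA-1 schedule: W[i] = rol(W[i-3]^W[i-8]^W[i-14]^W[i-16], 1). */
void sha1_schedule_ref(const uint32_t block[16], uint32_t w[80])
{
  int i;

  for (i = 0; i < 16; i++)
    w[i] = block[i];
  for (i = 16; i < 80; i++)
    w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
}

/* Same schedule, but words 32..79 use the wide form with lags 6/16/28/32. */
void sha1_schedule_wide(const uint32_t block[16], uint32_t w[80])
{
  int i;

  for (i = 0; i < 16; i++)
    w[i] = block[i];
  for (i = 16; i < 32; i++)
    w[i] = rol32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
  for (i = 32; i < 80; i++)
    w[i] = rol32(w[i - 6] ^ w[i - 16] ^ w[i - 28] ^ w[i - 32], 2);
}

/* Returns 1 if both schedules agree on all 80 words, 0 otherwise. */
int sha1_schedules_match(const uint32_t block[16])
{
  uint32_t a[80], b[80];
  int i;

  sha1_schedule_ref(block, a);
  sha1_schedule_wide(block, b);
  for (i = 0; i < 80; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Because the smallest lag in the wide form is 6 rather than 3, four consecutive schedule words do not depend on each other and can be computed together in one 128-bit register.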

hooks/post-receive
-- 
The GNU crypto library
http://git.gnupg.org

_______________________________________________
Gnupg-commits mailing list
Gnupg-commits <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits
Jussi Kivilinna | 29 Jun 15:36 2014

[PATCH] Speed-up SHA-1 NEON assembly implementation

* cipher/sha1-armv7-neon.S: Tweak implementation for speed-up.
--

Benchmark on Cortex-A8 at 1008 MHz:

New:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |      7.04 ns/B     135.4 MiB/s      7.10 c/B

Old:
                |  nanosecs/byte   mebibytes/sec   cycles/byte
 SHA1           |      7.79 ns/B     122.4 MiB/s      7.85 c/B
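
As a quick sanity check of the figures above: the ratio of old to new ns/byte works out to a speed-up of roughly 1.11x, and ns/byte multiplied by the clock rate reproduces the cycles/byte column. A small C sketch of that arithmetic follows. The helper names are illustrative only and are not part of libgcrypt's benchmark tool.

```c
/* Speed-up factor: how many times faster the new code is than the old. */
double speedup(double old_ns_per_byte, double new_ns_per_byte)
{
  return old_ns_per_byte / new_ns_per_byte;
}

/* Convert ns/byte to cycles/byte for a clock given in MHz.
 * One MHz is one cycle per microsecond, i.e. 1/1000 cycle per ns. */
double cycles_per_byte(double ns_per_byte, double clock_mhz)
{
  return ns_per_byte * clock_mhz / 1000.0;
}
```

With the numbers from the table, speedup(7.79, 7.04) is about 1.107, and cycles_per_byte(7.04, 1008.0) is about 7.10, which agrees with the reported c/B value.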

Signed-off-by: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
---
 cipher/sha1-armv7-neon.S |  155 ++++++++++++++++++++++++----------------------
 1 file changed, 82 insertions(+), 73 deletions(-)
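
A note on the _R_F1 change in the diff below: the old code combined (c & b) and (d & ~b) with eor before adding the result to e, while the new code adds the two terms to e separately. This looks safe because the two masked terms can never have a set bit in common, so XOR, OR and ADD all give the same value. _R_F3 relies on the same property for the majority function, (b & c) + ((b ^ c) & d). The following is a minimal C sketch of both identities, written for illustration and not taken from libgcrypt. The function names and the xorshift test generator are assumptions of this sketch.

```c
#include <stdint.h>

/* Textbook SHA-1 "choose": bits of c where b is set, bits of d elsewhere. */
static uint32_t f1_ref(uint32_t b, uint32_t c, uint32_t d)
{
  return (b & c) | (~b & d);
}

/* Additive form: the two masked terms are disjoint, so + acts like |. */
static uint32_t f1_add(uint32_t b, uint32_t c, uint32_t d)
{
  return (c & b) + (d & ~b);
}

/* Textbook SHA-1 "majority". */
static uint32_t f3_ref(uint32_t b, uint32_t c, uint32_t d)
{
  return (b & c) | (b & d) | (c & d);
}

/* Additive form: (b & c) and ((b ^ c) & d) are disjoint as well. */
static uint32_t f3_add(uint32_t b, uint32_t c, uint32_t d)
{
  return (b & c) + ((b ^ c) & d);
}

/* One step of a xorshift32 generator, used only to produce test inputs. */
static uint32_t xorshift32(uint32_t x)
{
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  return x;
}

/* Returns 1 if both identities hold for n pseudo-random (b, c, d) triples. */
int check_round_function_identities(uint32_t seed, int n)
{
  uint32_t x = seed ? seed : 1u; /* xorshift state must be non-zero */
  int i;

  for (i = 0; i < n; i++)
    {
      uint32_t b = x = xorshift32(x);
      uint32_t c = x = xorshift32(x);
      uint32_t d = x = xorshift32(x);

      if (f1_ref(b, c, d) != f1_add(b, c, d))
        return 0;
      if (f3_ref(b, c, d) != f3_add(b, c, d))
        return 0;
    }
  return 1;
}
```

Splitting the sum this way lets each term be added to e as soon as it is ready, instead of waiting for a combined value.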

diff --git a/cipher/sha1-armv7-neon.S b/cipher/sha1-armv7-neon.S
index 95b677d..f314d8e 100644
--- a/cipher/sha1-armv7-neon.S
+++ b/cipher/sha1-armv7-neon.S
@@ -1,5 +1,5 @@
 /* sha1-armv7-neon.S - ARM/NEON accelerated SHA-1 transform function
- * Copyright (C) 2013 Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
+ * Copyright (C) 2013-2014 Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
  *
  * Based on sha1.c:
  *  Copyright (C) 1998, 2001, 2002, 2003, 2008 Free Software Foundation, Inc.
@@ -26,12 +26,12 @@
     defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \
     defined(HAVE_GCC_INLINE_ASM_NEON) && defined(USE_SHA1)

-.data
-
 .syntax unified
 .fpu neon
 .arm

+.text
+
 #ifdef __PIC__
 #  define GET_DATA_POINTER(reg, name, rtmp) \
 		ldr reg, 1f; \
@@ -69,16 +69,13 @@ gcry_sha1_armv7_neon_K_VEC:
 .LK4:	.long K4, K4, K4, K4

 
-.text
-
 /* Register macros */

 #define RSTATE r0
 #define RDATA r1
 #define RNBLKS r2
 #define ROLDSTACK r3
-#define RK lr
-#define RWK r12
+#define RWK lr

 #define _a r4
 #define _b r5
@@ -89,6 +86,7 @@ gcry_sha1_armv7_neon_K_VEC:
 #define RT0 r9
 #define RT1 r10
 #define RT2 r11
+#define RT3 r12

 #define W0 q0
 #define W1 q1
@@ -104,7 +102,10 @@ gcry_sha1_armv7_neon_K_VEC:
 #define tmp2 q10
 #define tmp3 q11

-#define curK q12
+#define qK1 q12
+#define qK2 q13
+#define qK3 q14
+#define qK4 q15

 
 /* Round function macros. */
@@ -112,43 +113,43 @@ gcry_sha1_armv7_neon_K_VEC:
 #define WK_offs(i) (((i) & 15) * 4)

 #define _R_F1(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
-	and RT0, c, b; \
+	ldr RT3, [sp, WK_offs(i)]; \
 		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	bic RT0, d, b; \
 	add e, e, a, ror #(32 - 5); \
-	ldr RT2, [sp, WK_offs(i)]; \
-	bic RT1, d, b; \
-	add e, RT2; \
+	and RT1, c, b; \
 		pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	add RT0, RT0, RT3; \
+	add e, e, RT1; \
 	ror b, #(32 - 30); \
-	eor RT0, RT1; \
 		pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
-	add e, RT0;
+	add e, e, RT0;

 #define _R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
-	eor RT0, c, b; \
+	ldr RT3, [sp, WK_offs(i)]; \
 		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	eor RT0, d, b; \
 	add e, e, a, ror #(32 - 5); \
-	ldr RT2, [sp, WK_offs(i)]; \
-	eor RT0, d; \
+	eor RT0, RT0, c; \
 		pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
-	add e, RT2; \
+	add e, e, RT3; \
 	ror b, #(32 - 30); \
 		pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
-	add e, RT0; \
+	add e, e, RT0; \

 #define _R_F3(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
-	eor RT0, c, b; \
+	ldr RT3, [sp, WK_offs(i)]; \
 		pre1(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	eor RT0, b, c; \
+	and RT1, b, c; \
 	add e, e, a, ror #(32 - 5); \
-	ldr RT2, [sp, WK_offs(i)]; \
-	and RT1, c, b; \
-	and RT0, d; \
-	add e, RT2; \
 		pre2(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
+	and RT0, RT0, d; \
+	add RT1, RT1, RT3; \
+	add e, e, RT0; \
 	ror b, #(32 - 30); \
-	add e, RT1; \
 		pre3(i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28); \
-	add e, RT0;
+	add e, e, RT1;

 #define _R_F4(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28) \
 	_R_F2(a,b,c,d,e,i,pre1,pre2,pre3,i16,W,W_m04,W_m08,W_m12,W_m16,W_m20,W_m24,W_m28)
@@ -183,10 +184,10 @@ gcry_sha1_armv7_neon_K_VEC:
 	vst1.32   {tmp2, tmp3}, [RWK];				\

 #define WPRECALC_00_15_0(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	add       RWK, sp, #(WK_offs(0));			\
+	vld1.32   {tmp0, tmp1}, [RDATA]!;			\

 #define WPRECALC_00_15_1(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vld1.32   {tmp0, tmp1}, [RDATA]!;			\
+	add       RWK, sp, #(WK_offs(0));			\

 #define WPRECALC_00_15_2(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
 	vrev32.8  W0, tmp0;		/* big => little */	\
@@ -225,25 +226,25 @@ gcry_sha1_armv7_neon_K_VEC:
 /********* Precalc macros for rounds 16-31 ************************************/

 #define WPRECALC_16_31_0(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	add       RWK, sp, #(WK_offs(i));	\
-
-#define WPRECALC_16_31_1(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
 	veor      tmp0, tmp0;			\
 	vext.8    W, W_m16, W_m12, #8;		\

-#define WPRECALC_16_31_2(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
+#define WPRECALC_16_31_1(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
+	add       RWK, sp, #(WK_offs(i));	\
 	vext.8    tmp0, W_m04, tmp0, #4;	\
+
+#define WPRECALC_16_31_2(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
+	veor      tmp0, tmp0, W_m16;		\
 	veor.32   W, W, W_m08;			\

 #define WPRECALC_16_31_3(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor      tmp0, tmp0, W_m16;		\
 	veor      tmp1, tmp1;			\
+	veor      W, W, tmp0;			\

 #define WPRECALC_16_31_4(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor      W, W, tmp0;			\
+	vshl.u32  tmp0, W, #1;			\

 #define WPRECALC_16_31_5(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vshl.u32  tmp0, W, #1;			\
 	vext.8    tmp1, tmp1, W, #(16-12);	\
 	vshr.u32  W, W, #31;			\
 
@@ -270,28 +271,28 @@ gcry_sha1_armv7_neon_K_VEC:
 /********* Precalc macros for rounds 32-79 ************************************/

 #define WPRECALC_32_79_0(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	add RWK, sp, #(WK_offs(i&~3)); \
+	veor W, W_m28; \

 #define WPRECALC_32_79_1(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor W, W_m28; \
+	vext.8 tmp0, W_m08, W_m04, #8; \

 #define WPRECALC_32_79_2(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vext.8 tmp0, W_m08, W_m04, #8; \
+	veor W, W_m16; \

 #define WPRECALC_32_79_3(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor W, W_m16; \
+	veor W, tmp0; \

 #define WPRECALC_32_79_4(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	veor W, tmp0; \
+	add RWK, sp, #(WK_offs(i&~3)); \

 #define WPRECALC_32_79_5(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vshr.u32 tmp0, W, #30; \
+	vshl.u32 tmp1, W, #2; \

 #define WPRECALC_32_79_6(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vshl.u32 W, W, #2; \
+	vshr.u32 tmp0, W, #30; \

 #define WPRECALC_32_79_7(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
-	vorr W, tmp0, W; \
+	vorr W, tmp0, tmp1; \

 #define WPRECALC_32_79_8(i, W, W_m04, W_m08, W_m12, W_m16, W_m20, W_m24, W_m28) \
 	vadd.u32 tmp0, W, curK; \
@@ -326,20 +327,26 @@ _gcry_sha1_transform_armv7_neon:
   beq .Ldo_nothing;

   push {r4-r12, lr};
+
+  GET_DATA_POINTER(RT3, .LK_VEC, _a);
   vpush {q4-q7};

   mov ROLDSTACK, sp;
-  GET_DATA_POINTER(RK, .LK_VEC, _a);

   /* Align stack. */
   sub sp, #(16*4);
   and sp, #(~(16-1));

+  vld1.32 {qK1-qK2}, [RT3]!; /* Load K1,K2 */
+
   /* Get the values of the chaining variables. */
   ldm RSTATE, {_a-_e};

+  vld1.32 {qK3-qK4}, [RT3]; /* Load K3,K4 */
+
+#undef curK
+#define curK qK1
   /* Precalc 0-15. */
-  vld1.32 {curK}, [RK]!; /* Load K1. */
   W_PRECALC_00_15();

   b .Loop;
@@ -352,7 +359,8 @@ _gcry_sha1_transform_armv7_neon:
   _R( _d, _e, _a, _b, _c, F1,  2, WPRECALC_16_31_6, WPRECALC_16_31_7, WPRECALC_16_31_8, 16, W4, W5, W6, W7, W0, _, _, _ );
   _R( _c, _d, _e, _a, _b, F1,  3, WPRECALC_16_31_9, WPRECALC_16_31_10,WPRECALC_16_31_11,16, W4, W5, W6, W7, W0, _, _, _ );

-  vld1.32 {curK}, [RK]!; /* Load K2. */
+#undef curK
+#define curK qK2
   _R( _b, _c, _d, _e, _a, F1,  4, WPRECALC_16_31_0, WPRECALC_16_31_1, WPRECALC_16_31_2, 20, W3, W4, W5, W6, W7, _, _, _ );
   _R( _a, _b, _c, _d, _e, F1,  5, WPRECALC_16_31_3, WPRECALC_16_31_4, WPRECALC_16_31_5, 20, W3, W4, W5, W6, W7, _, _, _ );
   _R( _e, _a, _b, _c, _d, F1,  6, WPRECALC_16_31_6, WPRECALC_16_31_7, WPRECALC_16_31_8, 20, W3, W4, W5, W6, W7, _, _, _ );
@@ -371,72 +379,75 @@ _gcry_sha1_transform_armv7_neon:
   /* Transform 16-63 + Precalc 32-79. */
   _R( _e, _a, _b, _c, _d, F1, 16, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 32, W0, W1, W2, W3, W4, W5, W6, W7);
   _R( _d, _e, _a, _b, _c, F1, 17, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 32, W0, W1, W2, W3, W4, W5, W6, W7);
-  _R( _c, _d, _e, _a, _b, F1, 18, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            32, W0, W1, W2, W3, W4, W5, W6, W7);
+  _R( _c, _d, _e, _a, _b, F1, 18, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 32, W0, W1, W2, W3, W4, W5, W6, W7);
   _R( _b, _c, _d, _e, _a, F1, 19, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 32, W0, W1, W2, W3, W4, W5, W6, W7);

   _R( _a, _b, _c, _d, _e, F2, 20, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 36, W7, W0, W1, W2, W3, W4, W5, W6);
   _R( _e, _a, _b, _c, _d, F2, 21, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 36, W7, W0, W1, W2, W3, W4, W5, W6);
-  _R( _d, _e, _a, _b, _c, F2, 22, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            36, W7, W0, W1, W2, W3, W4, W5, W6);
+  _R( _d, _e, _a, _b, _c, F2, 22, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 36, W7, W0, W1, W2, W3, W4, W5, W6);
   _R( _c, _d, _e, _a, _b, F2, 23, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 36, W7, W0, W1, W2, W3, W4, W5, W6);

-  vld1.32 {curK}, [RK]!; /* Load K3. */
+#undef curK
+#define curK qK3
   _R( _b, _c, _d, _e, _a, F2, 24, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 40, W6, W7, W0, W1, W2, W3, W4, W5);
   _R( _a, _b, _c, _d, _e, F2, 25, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 40, W6, W7, W0, W1, W2, W3, W4, W5);
-  _R( _e, _a, _b, _c, _d, F2, 26, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            40, W6, W7, W0, W1, W2, W3, W4, W5);
+  _R( _e, _a, _b, _c, _d, F2, 26, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 40, W6, W7, W0, W1, W2, W3, W4, W5);
   _R( _d, _e, _a, _b, _c, F2, 27, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 40, W6, W7, W0, W1, W2, W3, W4, W5);

   _R( _c, _d, _e, _a, _b, F2, 28, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 44, W5, W6, W7, W0, W1, W2, W3, W4);
   _R( _b, _c, _d, _e, _a, F2, 29, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 44, W5, W6, W7, W0, W1, W2, W3, W4);
-  _R( _a, _b, _c, _d, _e, F2, 30, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            44, W5, W6, W7, W0, W1, W2, W3, W4);
+  _R( _a, _b, _c, _d, _e, F2, 30, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 44, W5, W6, W7, W0, W1, W2, W3, W4);
   _R( _e, _a, _b, _c, _d, F2, 31, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 44, W5, W6, W7, W0, W1, W2, W3, W4);

   _R( _d, _e, _a, _b, _c, F2, 32, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 48, W4, W5, W6, W7, W0, W1, W2, W3);
   _R( _c, _d, _e, _a, _b, F2, 33, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 48, W4, W5, W6, W7, W0, W1, W2, W3);
-  _R( _b, _c, _d, _e, _a, F2, 34, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            48, W4, W5, W6, W7, W0, W1, W2, W3);
+  _R( _b, _c, _d, _e, _a, F2, 34, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 48, W4, W5, W6, W7, W0, W1, W2, W3);
   _R( _a, _b, _c, _d, _e, F2, 35, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 48, W4, W5, W6, W7, W0, W1, W2, W3);

   _R( _e, _a, _b, _c, _d, F2, 36, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 52, W3, W4, W5, W6, W7, W0, W1, W2);
   _R( _d, _e, _a, _b, _c, F2, 37, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 52, W3, W4, W5, W6, W7, W0, W1, W2);
-  _R( _c, _d, _e, _a, _b, F2, 38, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            52, W3, W4, W5, W6, W7, W0, W1, W2);
+  _R( _c, _d, _e, _a, _b, F2, 38, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 52, W3, W4, W5, W6, W7, W0, W1, W2);
   _R( _b, _c, _d, _e, _a, F2, 39, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 52, W3, W4, W5, W6, W7, W0, W1, W2);

   _R( _a, _b, _c, _d, _e, F3, 40, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 56, W2, W3, W4, W5, W6, W7, W0, W1);
   _R( _e, _a, _b, _c, _d, F3, 41, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 56, W2, W3, W4, W5, W6, W7, W0, W1);
-  _R( _d, _e, _a, _b, _c, F3, 42, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            56, W2, W3, W4, W5, W6, W7, W0, W1);
+  _R( _d, _e, _a, _b, _c, F3, 42, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 56, W2, W3, W4, W5, W6, W7, W0, W1);
   _R( _c, _d, _e, _a, _b, F3, 43, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 56, W2, W3, W4, W5, W6, W7, W0, W1);

-  vld1.32 {curK}, [RK]!; /* Load K4. */
+#undef curK
+#define curK qK4
   _R( _b, _c, _d, _e, _a, F3, 44, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 60, W1, W2, W3, W4, W5, W6, W7, W0);
   _R( _a, _b, _c, _d, _e, F3, 45, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 60, W1, W2, W3, W4, W5, W6, W7, W0);
-  _R( _e, _a, _b, _c, _d, F3, 46, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            60, W1, W2, W3, W4, W5, W6, W7, W0);
+  _R( _e, _a, _b, _c, _d, F3, 46, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 60, W1, W2, W3, W4, W5, W6, W7, W0);
   _R( _d, _e, _a, _b, _c, F3, 47, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 60, W1, W2, W3, W4, W5, W6, W7, W0);

   _R( _c, _d, _e, _a, _b, F3, 48, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 64, W0, W1, W2, W3, W4, W5, W6, W7);
   _R( _b, _c, _d, _e, _a, F3, 49, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 64, W0, W1, W2, W3, W4, W5, W6, W7);
-  _R( _a, _b, _c, _d, _e, F3, 50, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            64, W0, W1, W2, W3, W4, W5, W6, W7);
+  _R( _a, _b, _c, _d, _e, F3, 50, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 64, W0, W1, W2, W3, W4, W5, W6, W7);
   _R( _e, _a, _b, _c, _d, F3, 51, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 64, W0, W1, W2, W3, W4, W5, W6, W7);

   _R( _d, _e, _a, _b, _c, F3, 52, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 68, W7, W0, W1, W2, W3, W4, W5, W6);
   _R( _c, _d, _e, _a, _b, F3, 53, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 68, W7, W0, W1, W2, W3, W4, W5, W6);
-  _R( _b, _c, _d, _e, _a, F3, 54, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            68, W7, W0, W1, W2, W3, W4, W5, W6);
+  _R( _b, _c, _d, _e, _a, F3, 54, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 68, W7, W0, W1, W2, W3, W4, W5, W6);
   _R( _a, _b, _c, _d, _e, F3, 55, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 68, W7, W0, W1, W2, W3, W4, W5, W6);

   _R( _e, _a, _b, _c, _d, F3, 56, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 72, W6, W7, W0, W1, W2, W3, W4, W5);
   _R( _d, _e, _a, _b, _c, F3, 57, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 72, W6, W7, W0, W1, W2, W3, W4, W5);
-  _R( _c, _d, _e, _a, _b, F3, 58, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            72, W6, W7, W0, W1, W2, W3, W4, W5);
+  _R( _c, _d, _e, _a, _b, F3, 58, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 72, W6, W7, W0, W1, W2, W3, W4, W5);
   _R( _b, _c, _d, _e, _a, F3, 59, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 72, W6, W7, W0, W1, W2, W3, W4, W5);

-  sub RK, #64;
+  subs RNBLKS, #1;
+
   _R( _a, _b, _c, _d, _e, F4, 60, WPRECALC_32_79_0, WPRECALC_32_79_1, WPRECALC_32_79_2, 76, W5, W6, W7, W0, W1, W2, W3, W4);
   _R( _e, _a, _b, _c, _d, F4, 61, WPRECALC_32_79_3, WPRECALC_32_79_4, WPRECALC_32_79_5, 76, W5, W6, W7, W0, W1, W2, W3, W4);
-  _R( _d, _e, _a, _b, _c, F4, 62, WPRECALC_32_79_6, WPRECALC_32_79_7, dummy,            76, W5, W6, W7, W0, W1, W2, W3, W4);
+  _R( _d, _e, _a, _b, _c, F4, 62, WPRECALC_32_79_6, dummy,            WPRECALC_32_79_7, 76, W5, W6, W7, W0, W1, W2, W3, W4);
   _R( _c, _d, _e, _a, _b, F4, 63, WPRECALC_32_79_8, dummy,            WPRECALC_32_79_9, 76, W5, W6, W7, W0, W1, W2, W3, W4);

-  subs RNBLKS, #1;
   beq .Lend;

   /* Transform 64-79 + Precalc 0-15 of next block. */
-  vld1.32 {curK}, [RK]!; /* Load K1. */
+#undef curK
+#define curK qK1
   _R( _b, _c, _d, _e, _a, F4, 64, WPRECALC_00_15_0, dummy, dummy, _, _, _, _, _, _, _, _, _ );
   _R( _a, _b, _c, _d, _e, F4, 65, WPRECALC_00_15_1, dummy, dummy, _, _, _, _, _, _, _, _, _ );
   _R( _e, _a, _b, _c, _d, F4, 66, WPRECALC_00_15_2, dummy, dummy, _, _, _, _, _, _, _, _, _ );
@@ -458,14 +469,13 @@ _gcry_sha1_transform_armv7_neon:
   _R( _b, _c, _d, _e, _a, F4, 79, WPRECALC_00_15_11, dummy, WPRECALC_00_15_12, _, _, _, _, _, _, _, _, _ );

   /* Update the chaining variables. */
-  ldm RSTATE, {RT0-RT2};
+  ldm RSTATE, {RT0-RT3};
   add _a, RT0;
-  ldr RT0, [RSTATE, #state_h3];
+  ldr RT0, [RSTATE, #state_h4];
   add _b, RT1;
-  ldr RT1, [RSTATE, #state_h4];
   add _c, RT2;
-  add _d, RT0;
-  add _e, RT1;
+  add _d, RT3;
+  add _e, RT0;
   stm RSTATE, {_a-_e};

   b .Loop;
@@ -493,15 +503,14 @@ _gcry_sha1_transform_armv7_neon:
   mov sp, ROLDSTACK;

   /* Update the chaining variables. */
-  ldm RSTATE, {RT0-RT2};
+  ldm RSTATE, {RT0-RT3};
   add _a, RT0;
-  ldr RT0, [RSTATE, #state_h3];
+  ldr RT0, [RSTATE, #state_h4];
   add _b, RT1;
-  ldr RT1, [RSTATE, #state_h4];
   add _c, RT2;
-  add _d, RT0;
+  add _d, RT3;
   vpop {q4-q7};
-  add _e, RT1;
+  add _e, RT0;
   stm RSTATE, {_a-_e};

   /* burn_stack */
