Jan Bilek | 28 Nov 02:10 2014

AES192 & AES256 in CBC mode [libgcrypt]

Hello,

I've just run into a potential problem with libgcrypt while trying to use AES192 and AES256 in CBC mode.

Everything works with AES128 in all cipher modes. However, with AES192 and AES256 in GCRY_CIPHER_MODE_CBC, all output appears to be written only into the first 128 bits of the output buffer.

Please see example code attached.

Please let me know whether you can confirm this and, if so, whether I can help fix it.

Thank you & Kind Regards,
Jan
Jan Bilek, CTO, EFTlab Pty Ltd, email: jan.bilek <at> eftlab.co.uk
Attachment (crypto_aes.cpp): text/x-c++src, 6065 bytes
_______________________________________________
Gcrypt-devel mailing list
Gcrypt-devel <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gcrypt-devel
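
For reference when checking the report above: AES-128, AES-192 and AES-256 differ only in key length (16, 24 and 32 bytes). The block size is 16 bytes for all three, so CBC always works on 16-byte blocks and the IV is always 16 bytes, whatever the key size. In libgcrypt these values can be queried with gcry_cipher_get_algo_keylen() and gcry_cipher_get_algo_blklen(), and, as far as I know, CBC mode without the CTS flag rejects input that is not a whole number of blocks, so padding is up to the caller. The self-contained sketch below uses hypothetical helper names and makes no libgcrypt calls; PKCS#7 padding is an assumption chosen for illustration, not something the report specifies.

```c
#include <stddef.h>
#include <string.h>

/* AES block size in bytes: the same for AES-128, AES-192 and AES-256. */
#define AES_BLOCK_LEN 16

/* Key length in bytes for an AES key size in bits; 0 if unsupported. */
static size_t aes_key_len (unsigned int key_bits)
{
  switch (key_bits)
    {
    case 128: return 16;
    case 192: return 24;
    case 256: return 32;
    default:  return 0;
    }
}

/* CBC ciphertext length for LEN plaintext bytes with PKCS#7 padding:
   always a whole number of 16-byte blocks, independent of key size. */
static size_t cbc_padded_len (size_t len)
{
  return (len / AES_BLOCK_LEN + 1) * AES_BLOCK_LEN;
}

/* Append PKCS#7 padding to BUF, which holds LEN bytes and has room for
   cbc_padded_len (LEN) bytes.  Returns the padded length. */
static size_t pkcs7_pad (unsigned char *buf, size_t len)
{
  size_t padded = cbc_padded_len (len);
  unsigned char pad = (unsigned char)(padded - len);

  memset (buf + len, pad, pad);
  return padded;
}
```

The padded buffer and a 16-byte IV are what would then be handed to the CBC encryption call for any of the three key sizes.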
by Werner Koch | 24 Nov 12:32 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-128-gd53ea84

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  d53ea84bed37b973f7ce59262c50b33700cd8311 (commit)
       via  1b4210c204a5ef5e631187509e011b8468a134ef (commit)
      from  e6130034506013d6153465a2bedb6fb08a43f74d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit d53ea84bed37b973f7ce59262c50b33700cd8311
Author: Werner Koch <wk <at> gnupg.org>
Date:   Mon Nov 24 12:28:33 2014 +0100

    Remove duplicated prototypes.

    * src/gcrypt-int.h (_gcry_mpi_ec_new, _gcry_mpi_ec_set_mpi)
    (gcry_mpi_ec_set_point): Remove.
    --

    Those used gpg_error_t instead of gpg_err_code_t, and the picky AIX
    compiler treats this as a severe error.

    Signed-off-by: Werner Koch <wk <at> gnupg.org>

diff --git a/src/gcrypt-int.h b/src/gcrypt-int.h
index 918937b..29d4fd3 100644
--- a/src/gcrypt-int.h
+++ b/src/gcrypt-int.h
@@ -416,15 +416,10 @@ gcry_mpi_point_t _gcry_mpi_point_set (gcry_mpi_point_t point,
 gcry_mpi_point_t _gcry_mpi_point_snatch_set (gcry_mpi_point_t point,
                                             gcry_mpi_t x, gcry_mpi_t y,
                                             gcry_mpi_t z);
-gpg_error_t _gcry_mpi_ec_new (gcry_ctx_t *r_ctx,
-                             gcry_sexp_t keyparam, const char *curvename);
+
 gcry_mpi_t _gcry_mpi_ec_get_mpi (const char *name, gcry_ctx_t ctx, int copy);
 gcry_mpi_point_t _gcry_mpi_ec_get_point (const char *name,
                                         gcry_ctx_t ctx, int copy);
-gpg_error_t _gcry_mpi_ec_set_mpi (const char *name, gcry_mpi_t newvalue,
-                                 gcry_ctx_t ctx);
-gpg_error_t _gcry_mpi_ec_set_point (const char *name, gcry_mpi_point_t newvalue,
-                                   gcry_ctx_t ctx);
 int _gcry_mpi_ec_get_affine (gcry_mpi_t x, gcry_mpi_t y, gcry_mpi_point_t point,
                              mpi_ec_t ctx);
 void _gcry_mpi_ec_dup (gcry_mpi_point_t w, gcry_mpi_point_t u, gcry_ctx_t ctx);
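
Why the mismatch matters: in libgpg-error, gpg_error_t is an unsigned int that packs an error source (bits 24 and up) together with an error code (low 16 bits), built with gpg_err_make and taken apart with gpg_err_code and gpg_err_source, whereas gpg_err_code_t is an enum that holds the code alone. C leaves the integer type an enum is compatible with implementation-defined, so declaring one function with each return type may be accepted by one compiler and rejected as conflicting by another; that is presumably what the AIX compiler hit. The sketch below re-creates the packing with hypothetical toy_* names, assuming libgpg-error's shift of 24 and masks of 127 (source) and 0xffff (code).

```c
#include <stdint.h>

/* Hypothetical stand-ins modeled on libgpg-error's layout.  In the real
   header gpg_err_code_t is an enum; a plain integer is used here. */
typedef uint32_t toy_error_t;     /* like gpg_error_t: source + code */
typedef uint32_t toy_err_code_t;  /* like gpg_err_code_t: code only  */

#define TOY_SOURCE_SHIFT 24
#define TOY_SOURCE_MASK  127u
#define TOY_CODE_MASK    0xffffu

/* Combine SOURCE and CODE; a zero code always means "no error" (0). */
static toy_error_t toy_err_make (unsigned int source, toy_err_code_t code)
{
  if (code == 0)
    return 0;
  return ((source & TOY_SOURCE_MASK) << TOY_SOURCE_SHIFT)
         | (code & TOY_CODE_MASK);
}

/* Extract the bare code: this is what a gpg_err_code_t return carries. */
static toy_err_code_t toy_err_code (toy_error_t err)
{
  return err & TOY_CODE_MASK;
}

/* Extract the error source from the packed value. */
static unsigned int toy_err_source (toy_error_t err)
{
  return (err >> TOY_SOURCE_SHIFT) & TOY_SOURCE_MASK;
}
```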

commit 1b4210c204a5ef5e631187509e011b8468a134ef
Author: Werner Koch <wk <at> gnupg.org>
Date:   Tue Oct 14 21:29:33 2014 +0200

    tests: Add a prime mode to benchmark.

    * tests/benchmark.c (progress_cb): Add a single char mode.
    (prime_bench): New.
    (main): Add a "prime" mode.  Factor with_progress out to file scope.

    Signed-off-by: Werner Koch <wk <at> gnupg.org>

diff --git a/tests/benchmark.c b/tests/benchmark.c
index 2621551..5bf92da 100644
--- a/tests/benchmark.c
+++ b/tests/benchmark.c
@@ -62,6 +62,12 @@ static int in_fips_mode;
 /* Whether we are running as part of the regression test suite.  */
 static int in_regression_test;

+/* Whether --progress is in use.  */
+static int with_progress;
+
+/* Runtime flag to switch to a different progress output.  */
+static int single_char_progress;
+

 static const char sample_private_dsa_key_1024[] =
 "(private-key\n"
@@ -429,9 +435,17 @@ progress_cb (void *cb_data, const char *what, int printchar,
 {
   (void)cb_data;

-  fprintf (stderr, PGM ": progress (%s %c %d %d)\n",
-           what, printchar, current, total);
-  fflush (stderr);
+  if (single_char_progress)
+    {
+      fputc (printchar, stdout);
+      fflush (stderr);
+    }
+  else
+    {
+      fprintf (stderr, PGM ": progress (%s %c %d %d)\n",
+               what, printchar, current, total);
+      fflush (stderr);
+    }
 }

 
@@ -1544,6 +1558,51 @@ mpi_bench (void)
 }

 
+static void
+prime_bench (void)
+{
+  gpg_error_t err;
+  int i;
+  gcry_mpi_t prime;
+  int old_prog = single_char_progress;
+
+  single_char_progress = 1;
+  if (!with_progress)
+    printf ("%-10s", "prime");
+  fflush (stdout);
+  start_timer ();
+  for (i=0; i < 10; i++)
+    {
+      if (with_progress)
+        fputs ("primegen ", stdout);
+      err = gcry_prime_generate (&prime,
+                                 1024, 0,
+                                 NULL,
+                                 NULL, NULL,
+                                 GCRY_WEAK_RANDOM,
+                                 GCRY_PRIME_FLAG_SECRET);
+      if (with_progress)
+        {
+          fputc ('\n', stdout);
+          fflush (stdout);
+        }
+      if (err)
+        {
+          fprintf (stderr, PGM ": error creating prime: %s\n",
+                   gpg_strerror (err));
+          exit (1);
+        }
+      gcry_mpi_release (prime);
+    }
+  stop_timer ();
+  if (with_progress)
+    printf ("%-10s", "prime");
+  printf (" %s\n", elapsed_time ()); fflush (stdout);
+
+  single_char_progress = old_prog;
+}
+
+
 int
 main( int argc, char **argv )
 {
@@ -1551,7 +1610,6 @@ main( int argc, char **argv )
   int no_blinding = 0;
   int use_random_daemon = 0;
   int use_secmem = 0;
-  int with_progress = 0;
   int debug = 0;
   int pk_count = 100;

@@ -1582,7 +1640,7 @@ main( int argc, char **argv )
       else if (!strcmp (*argv, "--help"))
         {
           fputs ("usage: benchmark "
-                 "[md|mac|cipher|random|mpi|rsa|dsa|ecc [algonames]]\n",
+                 "[md|mac|cipher|random|mpi|rsa|dsa|ecc|prime [algonames]]\n",
                  stdout);
           exit (0);
         }
@@ -1833,6 +1891,11 @@ main( int argc, char **argv )
         gcry_control (GCRYCTL_ENABLE_QUICK_RANDOM, 0);
         ecc_bench (pk_count, 1);
     }
+  else if ( !strcmp (*argv, "prime"))
+    {
+        gcry_control (GCRYCTL_ENABLE_QUICK_RANDOM, 0);
+        prime_bench ();
+    }
   else
     {
       fprintf (stderr, PGM ": bad arguments\n");

-----------------------------------------------------------------------

Summary of changes:
 src/gcrypt-int.h  |    7 +----
 tests/benchmark.c |   73 +++++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 69 insertions(+), 11 deletions(-)
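
A note on progress_cb above: in the single_char_progress branch the character is written to stdout, but the code then flushes stderr, so the character may stay in stdout's buffer until something else flushes it (prime_bench does flush stdout after each prime, which hides this). That looks unintended. Below is a minimal sketch of a callback with the parameter list libgcrypt passes to a handler installed with gcry_set_progress_handler, which always flushes the stream it wrote to; struct progress_ctx is a hypothetical context type for this sketch.

```c
#include <stdio.h>

/* Hypothetical context: target stream and output style. */
struct progress_ctx
{
  FILE *out;
  int single_char;
};

/* Progress callback (void *, const char *, int, int, int), as used by
   libgcrypt's progress handler.  Writes and flushes the same stream. */
static void
progress_cb (void *cb_data, const char *what, int printchar,
             int current, int total)
{
  struct progress_ctx *ctx = cb_data;

  if (ctx->single_char)
    fputc (printchar, ctx->out);
  else
    fprintf (ctx->out, "progress (%s %c %d %d)\n",
             what, printchar, current, total);
  fflush (ctx->out);
}
```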

hooks/post-receive
--

-- 
The GNU crypto library
http://git.gnupg.org

_______________________________________________
Gnupg-commits mailing list
Gnupg-commits <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits
by NIIBE Yutaka | 20 Nov 01:46 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-126-ge613003

The branch, master has been updated
       via  e6130034506013d6153465a2bedb6fb08a43f74d (commit)
      from  95eef21583d8e998efc48f22898c1ae31b77cb48 (commit)

- Log -----------------------------------------------------------------
commit e6130034506013d6153465a2bedb6fb08a43f74d
Author: NIIBE Yutaka <gniibe <at> fsij.org>
Date:   Wed Nov 19 15:48:12 2014 +0900

    ecc: Improve Montgomery curve implementation.

    * cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Support
    MPI_EC_MONTGOMERY.
    * cipher/ecc.c (test_ecdh_only_keys): New.
    (nist_generate_key): Call test_ecdh_only_keys for MPI_EC_MONTGOMERY.
    (check_secret_key): Handle Montgomery curve of x-coordinate only.
    * mpi/ec.c (_gcry_mpi_ec_mul_point): Resize points before the loop.
    Simplify, using pointers of Q1, Q2, PRD, and SUM.
    --

diff --git a/cipher/ecc-curves.c b/cipher/ecc-curves.c
index fd47c1d..9975bb4 100644
--- a/cipher/ecc-curves.c
+++ b/cipher/ecc-curves.c
@@ -530,9 +530,8 @@ _gcry_ecc_fill_in_curve (unsigned int nbits, const char *name,
     {
     case MPI_EC_WEIERSTRASS:
     case MPI_EC_EDWARDS:
-      break;
     case MPI_EC_MONTGOMERY:
-      return GPG_ERR_NOT_SUPPORTED;
+      break;
     default:
       return GPG_ERR_BUG;
     }
diff --git a/cipher/ecc.c b/cipher/ecc.c
index 8bdbd56..2f5e401 100644
--- a/cipher/ecc.c
+++ b/cipher/ecc.c
@@ -81,6 +81,7 @@ static void *progress_cb_data;
 
 /* Local prototypes. */
 static void test_keys (ECC_secret_key * sk, unsigned int nbits);
+static void test_ecdh_only_keys (ECC_secret_key * sk, unsigned int nbits);
 static unsigned int ecc_get_nbits (gcry_sexp_t parms);

 
@@ -209,7 +210,10 @@ nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx,

   point_free (&Q);
   /* Now we can test our keys (this should never fail!).  */
-  test_keys (sk, nbits - 64);
+  if (sk->E.model != MPI_EC_MONTGOMERY)
+    test_keys (sk, nbits - 64);
+  else
+    test_ecdh_only_keys (sk, nbits - 64);

   return 0;
 }
@@ -266,6 +270,80 @@ test_keys (ECC_secret_key *sk, unsigned int nbits)
 }

 
+static void
+test_ecdh_only_keys (ECC_secret_key *sk, unsigned int nbits)
+{
+  ECC_public_key pk;
+  gcry_mpi_t test;
+  mpi_point_struct R_;
+  gcry_mpi_t x0, x1;
+  mpi_ec_t ec;
+
+  if (DBG_CIPHER)
+    log_debug ("Testing key.\n");
+
+  point_init (&R_);
+
+  pk.E = _gcry_ecc_curve_copy (sk->E);
+  point_init (&pk.Q);
+  point_set (&pk.Q, &sk->Q);
+
+  if (sk->E.dialect == ECC_DIALECT_ED25519)
+    {
+      char *rndbuf;
+
+      test = mpi_new (256);
+      rndbuf = _gcry_random_bytes (32, GCRY_WEAK_RANDOM);
+      rndbuf[0] &= 0x7f;  /* Clear bit 255. */
+      rndbuf[0] |= 0x40;  /* Set bit 254.   */
+      rndbuf[31] &= 0xf8; /* Clear bits 2..0 so that d mod 8 == 0  */
+      _gcry_mpi_set_buffer (test, rndbuf, 32, 0);
+      xfree (rndbuf);
+    }
+  else
+    {
+      test = mpi_new (nbits);
+      _gcry_mpi_randomize (test, nbits, GCRY_WEAK_RANDOM);
+    }
+
+  ec = _gcry_mpi_ec_p_internal_new (pk.E.model, pk.E.dialect, 0,
+                                    pk.E.p, pk.E.a, pk.E.b);
+  x0 = mpi_new (0);
+  x1 = mpi_new (0);
+
+  /* R_ = hkQ  <=>  R_ = hkdG  */
+  _gcry_mpi_ec_mul_point (&R_, test, &pk.Q, ec);
+  if (sk->E.dialect != ECC_DIALECT_ED25519)
+    _gcry_mpi_ec_mul_point (&R_, ec->h, &R_, ec);
+  if (_gcry_mpi_ec_get_affine (x0, NULL, &R_, ec))
+    log_fatal ("ecdh: Failed to get affine coordinates for hkQ\n");
+
+  _gcry_mpi_ec_mul_point (&R_, test, &pk.E.G, ec);
+  _gcry_mpi_ec_mul_point (&R_, sk->d, &R_, ec);
+  /* R_ = hdkG */
+  if (sk->E.dialect != ECC_DIALECT_ED25519)
+    _gcry_mpi_ec_mul_point (&R_, ec->h, &R_, ec);
+
+  if (_gcry_mpi_ec_get_affine (x1, NULL, &R_, ec))
+    log_fatal ("ecdh: Failed to get affine coordinates for hdkG\n");
+
+  if (mpi_cmp (x0, x1))
+    {
+      log_fatal ("ECDH test failed.\n");
+    }
+
+  mpi_free (x0);
+  mpi_free (x1);
+  _gcry_mpi_ec_free (ec);
+
+  point_free (&pk.Q);
+  _gcry_ecc_curve_free (&pk.E);
+
+  point_free (&R_);
+  mpi_free (test);
+}
+
+
 /*
  * To check the validity of the value, recalculate the correspondence
  * between the public value and the secret one.
@@ -281,7 +359,10 @@ check_secret_key (ECC_secret_key *sk, mpi_ec_t ec, int flags)

   point_init (&Q);
   x1 = mpi_new (0);
-  y1 = mpi_new (0);
+  if (ec->model == MPI_EC_MONTGOMERY)
+    y1 = NULL;
+  else
+    y1 = mpi_new (0);

   /* G in E(F_p) */
   if (!_gcry_mpi_ec_curve_point (&sk->E.G, ec))
@@ -338,7 +419,7 @@ check_secret_key (ECC_secret_key *sk, mpi_ec_t ec, int flags)
   else if (!mpi_cmp_ui (sk->Q.z, 1))
     {
       /* Fast path if Q is already in affine coordinates.  */
-      if (mpi_cmp (x1, sk->Q.x) || mpi_cmp (y1, sk->Q.y))
+      if (mpi_cmp (x1, sk->Q.x) || (!y1 && mpi_cmp (y1, sk->Q.y)))
         {
           if (DBG_CIPHER)
             log_debug
@@ -1581,7 +1662,7 @@ compute_keygrip (gcry_md_hd_t md, gcry_sexp_t keyparms)
       char buf[30];

       if (idx == 5)
-	continue;		/* Skip cofactor. */
+        continue;               /* Skip cofactor. */

       if (mpi_is_opaque (values[idx]))
         {
diff --git a/mpi/ec.c b/mpi/ec.c
index 80f3b22..0b7c7a7 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -1251,7 +1251,9 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
       unsigned int nbits;
       int j;
       mpi_point_struct p1_, p2_;
+      mpi_point_t q1, q2, prd, sum;
       unsigned long sw;
+      size_t nlimbs;

       /* Compute scalar point multiplication with Montgomery Ladder.
          Note that we don't use Y-coordinate in the points at all.
@@ -1267,27 +1269,35 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
       p2.x  = mpi_copy (point->x);
       mpi_set_ui (p2.z, 1);

+      nlimbs = 2*(nbits+BITS_PER_MPI_LIMB-1)/BITS_PER_MPI_LIMB+1;
+      mpi_resize (p1.x, nlimbs);
+      mpi_resize (p1.z, nlimbs);
+      mpi_resize (p2.x, nlimbs);
+      mpi_resize (p2.z, nlimbs);
+      mpi_resize (p1_.x, nlimbs);
+      mpi_resize (p1_.z, nlimbs);
+      mpi_resize (p2_.x, nlimbs);
+      mpi_resize (p2_.z, nlimbs);
+
+      q1 = &p1;
+      q2 = &p2;
+      prd = &p1_;
+      sum = &p2_;
+
       for (j=nbits-1; j >= 0; j--)
         {
-          sw = mpi_test_bit (scalar, j);
-          mpi_swap_cond (p1.x, p2.x, sw);
-          mpi_swap_cond (p1.z, p2.z, sw);
-          montgomery_ladder (&p1_, &p2_, &p1, &p2, point->x, ctx);
-          mpi_swap_cond (p1_.x, p2_.x, sw);
-          mpi_swap_cond (p1_.z, p2_.z, sw);
-
-          if (--j < 0)
-            break;
+          mpi_point_t t;

           sw = mpi_test_bit (scalar, j);
-          mpi_swap_cond (p1_.x, p2_.x, sw);
-          mpi_swap_cond (p1_.z, p2_.z, sw);
-          montgomery_ladder (&p1, &p2, &p1_, &p2_, point->x, ctx);
-          mpi_swap_cond (p1.x, p2.x, sw);
-          mpi_swap_cond (p1.z, p2.z, sw);
+          mpi_swap_cond (q1->x, q2->x, sw);
+          mpi_swap_cond (q1->z, q2->z, sw);
+          montgomery_ladder (prd, sum, q1, q2, point->x, ctx);
+          mpi_swap_cond (prd->x, sum->x, sw);
+          mpi_swap_cond (prd->z, sum->z, sw);
+          t = q1;  q1 = prd;  prd = t;
+          t = q2;  q2 = sum;  sum = t;
         }

-      z1 = mpi_new (0);
       mpi_clear (result->y);
       sw = (nbits & 1);
       mpi_swap_cond (p1.x, p1_.x, sw);
@@ -1300,12 +1310,13 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
         }
       else
         {
+          z1 = mpi_new (0);
           ec_invm (z1, p1.z, ctx);
           ec_mulm (result->x, p1.x, z1, ctx);
           mpi_set_ui (result->z, 1);
+          mpi_free (z1);
         }

-      mpi_free (z1);
       point_free (&p1);
       point_free (&p2);
       point_free (&p1_);

-----------------------------------------------------------------------

Summary of changes:
 cipher/ecc-curves.c |    3 +-
 cipher/ecc.c        |   89 ++++++++++++++++++++++++++++++++++++++++++++++++---
 mpi/ec.c            |   43 ++++++++++++++++---------
 3 files changed, 113 insertions(+), 22 deletions(-)
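
On the check_secret_key hunk: the new fast-path condition `(!y1 && mpi_cmp (y1, sk->Q.y))` calls mpi_cmp only when y1 is NULL (the Montgomery case), which would hand it a NULL MPI, and it never compares the y-coordinates for Weierstrass or Edwards curves. Given the commit message ("Handle Montgomery curve of x-coordinate only"), the guard reads as though it was meant to be `y1 && mpi_cmp (y1, sk->Q.y)`. A minimal sketch of that intended logic, with plain longs standing in for gcry_mpi_t and a hypothetical function name:

```c
#include <stddef.h>

/* Return nonzero if the recomputed affine point (X1, Y1) differs from
   Q = (QX, QY).  Y1 == NULL means the curve is x-only (Montgomery), so
   only the x-coordinates are compared and *Y1 is never read. */
static int
affine_mismatch (long x1, const long *y1, long qx, long qy)
{
  return x1 != qx || (y1 != NULL && *y1 != qy);
}
```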

NIIBE Yutaka | 19 Nov 08:39 2014

[PATCH] ecc: Improve Montgomery curve implementation

Here is the change for the Montgomery curve implementation.  I forgot to
submit this change in August.

Adding test_ecdh_only_keys will be needed once we support encryption
with Curve25519 in the future.
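
The idea behind test_ecdh_only_keys is that both sides of an ECDH exchange must agree: multiplying the public key Q = dG by a random k has to give the same x-coordinate as multiplying kG by d. The sketch below runs that check on a toy Montgomery curve with an x-only ladder written after the RFC 7748 formulas. The parameters (p = 101, A = 6) are hypothetical toy values chosen so that A^2 - 4 is a non-square mod p; real code uses Curve25519 with multi-precision arithmetic and a constant-time swap, and the cofactor multiplication done in the patch is left out.

```c
#include <stdint.h>

/* Toy curve y^2 = x^3 + 6x^2 + x over F_101 (illustration only).
   A^2 - 4 = 32 is a non-square mod 101; a24 = (A - 2) / 4 = 1. */
#define TOY_P   101u
#define TOY_A24 1u

static uint32_t fadd (uint32_t a, uint32_t b) { return (a + b) % TOY_P; }
static uint32_t fsub (uint32_t a, uint32_t b) { return (a + TOY_P - b) % TOY_P; }
static uint32_t fmul (uint32_t a, uint32_t b) { return (a * b) % TOY_P; }

/* a^(p-2): the inverse of a nonzero a, and 0 for a == 0, so the
   point at infinity (z == 0) comes out as x == 0. */
static uint32_t finv (uint32_t a)
{
  uint32_t r = 1, e = TOY_P - 2;

  while (e)
    {
      if (e & 1)
        r = fmul (r, a);
      a = fmul (a, a);
      e >>= 1;
    }
  return r;
}

/* x(k*P) from u = x(P), scanning NBITS (<= 32) bits of K from the top.
   (x2:z2) holds m*P, (x3:z3) holds (m+1)*P.  Swap, double-and-add,
   swap back.  Not constant time: the swap is a plain branch. */
static uint32_t toy_ladder (uint32_t k, unsigned int nbits, uint32_t u)
{
  uint32_t x1 = u % TOY_P, x2 = 1, z2 = 0, x3 = x1, z3 = 1;
  int t;

  for (t = (int)nbits - 1; t >= 0; t--)
    {
      uint32_t kt = (k >> t) & 1, tmp;
      uint32_t a, aa, b, bb, e, c, d, da, cb;

      if (kt)
        { tmp = x2; x2 = x3; x3 = tmp; tmp = z2; z2 = z3; z3 = tmp; }

      a = fadd (x2, z2);  aa = fmul (a, a);
      b = fsub (x2, z2);  bb = fmul (b, b);
      e = fsub (aa, bb);
      c = fadd (x3, z3);  d = fsub (x3, z3);
      da = fmul (d, a);   cb = fmul (c, b);
      x3 = fmul (fadd (da, cb), fadd (da, cb));            /* addition */
      z3 = fmul (x1, fmul (fsub (da, cb), fsub (da, cb)));
      x2 = fmul (aa, bb);                                   /* doubling */
      z2 = fmul (e, fadd (aa, fmul (TOY_A24, e)));

      if (kt)
        { tmp = x2; x2 = x3; x3 = tmp; tmp = z2; z2 = z3; z3 = tmp; }
    }
  return fmul (x2, finv (z2));
}
```

The ECDH-style check is then `toy_ladder (k, 8, toy_ladder (d, 8, u)) == toy_ladder (d, 8, toy_ladder (k, 8, u))`.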

The changes in _gcry_mpi_ec_mul_point make sure that the MPI
representations of the points are resized before the loop, and clean up
the code.
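
In the reworked loop, all eight coordinates are resized to the same limb count up front (presumably so that mpi_swap_cond always sees equally sized operands), and the hand-unrolled two-steps-per-iteration body is replaced by one step per iteration that rotates the pointers q1/q2 (inputs) and prd/sum (outputs) instead of copying values. The sketch below shows the same pointer rotation on a Montgomery ladder for modular exponentiation, a deliberately simpler group operation where squaring plays the role of point doubling and multiplication that of point addition. All names are hypothetical; cswap mirrors what mpi_swap_cond does.

```c
#include <stdint.h>

/* (a * b) mod m; assumes m < 2^32 so the product fits in 64 bits. */
static uint64_t mulmod (uint64_t a, uint64_t b, uint64_t m)
{
  return (a * b) % m;
}

/* Swap *A and *B when SW is 1 (SW must be 0 or 1), without a branch. */
static void cswap (uint64_t *a, uint64_t *b, uint64_t sw)
{
  uint64_t mask = (uint64_t)0 - sw;
  uint64_t t = mask & (*a ^ *b);

  *a ^= t;
  *b ^= t;
}

/* BASE^EXP mod M by a Montgomery ladder over NBITS (<= 64) bits. */
static uint64_t ladder_powmod (uint64_t base, uint64_t exp,
                               unsigned int nbits, uint64_t m)
{
  uint64_t buf[4];  /* four fixed buffers, like p1, p2, p1_, p2_ */
  uint64_t *q1 = &buf[0], *q2 = &buf[1], *prd = &buf[2], *sum = &buf[3];
  int j;

  *q1 = 1 % m;
  *q2 = base % m;
  for (j = (int)nbits - 1; j >= 0; j--)
    {
      uint64_t *t;
      uint64_t sw = (exp >> j) & 1;

      cswap (q1, q2, sw);
      *prd = mulmod (*q1, *q1, m);   /* "doubling" */
      *sum = mulmod (*q1, *q2, m);   /* "addition" */
      cswap (prd, sum, sw);
      /* The outputs become the next step's inputs: rotate pointers. */
      t = q1;  q1 = prd;  prd = t;
      t = q2;  q2 = sum;  sum = t;
    }
  return *q1;
}
```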

OK to commit?

    ecc: Improve Montgomery curve implementation.

    * cipher/ecc-curves.c (_gcry_ecc_fill_in_curve): Support
    MPI_EC_MONTGOMERY.
    * cipher/ecc.c (test_ecdh_only_keys): New.
    (nist_generate_key): Call test_ecdh_only_keys for MPI_EC_MONTGOMERY.
    (check_secret_key): Handle Montgomery curve of x-coordinate only.
    * mpi/ec.c (_gcry_mpi_ec_mul_point): Resize points before the loop.
    Simplify, using pointers of Q1, Q2, PRD, and SUM.
    --

diff --git a/cipher/ecc-curves.c b/cipher/ecc-curves.c
index fd47c1d..9975bb4 100644
--- a/cipher/ecc-curves.c
+++ b/cipher/ecc-curves.c
@@ -530,9 +530,8 @@ _gcry_ecc_fill_in_curve (unsigned int nbits, const char *name,
     {
     case MPI_EC_WEIERSTRASS:
     case MPI_EC_EDWARDS:
-      break;
     case MPI_EC_MONTGOMERY:
-      return GPG_ERR_NOT_SUPPORTED;
+      break;
     default:
       return GPG_ERR_BUG;
     }
diff --git a/cipher/ecc.c b/cipher/ecc.c
index 8bdbd56..2f5e401 100644
--- a/cipher/ecc.c
+++ b/cipher/ecc.c
@@ -81,6 +81,7 @@ static void *progress_cb_data;
 
 /* Local prototypes. */
 static void test_keys (ECC_secret_key * sk, unsigned int nbits);
+static void test_ecdh_only_keys (ECC_secret_key * sk, unsigned int nbits);
 static unsigned int ecc_get_nbits (gcry_sexp_t parms);

@@ -209,7 +210,10 @@ nist_generate_key (ECC_secret_key *sk, elliptic_curve_t *E, mpi_ec_t ctx,

   point_free (&Q);
   /* Now we can test our keys (this should never fail!).  */
-  test_keys (sk, nbits - 64);
+  if (sk->E.model != MPI_EC_MONTGOMERY)
+    test_keys (sk, nbits - 64);
+  else
+    test_ecdh_only_keys (sk, nbits - 64);

   return 0;
 }
@@ -266,6 +270,80 @@ test_keys (ECC_secret_key *sk, unsigned int nbits)
 }

+static void
+test_ecdh_only_keys (ECC_secret_key *sk, unsigned int nbits)
+{
+  ECC_public_key pk;
+  gcry_mpi_t test;
+  mpi_point_struct R_;
+  gcry_mpi_t x0, x1;
+  mpi_ec_t ec;
+
+  if (DBG_CIPHER)
+    log_debug ("Testing key.\n");
+
+  point_init (&R_);
+
+  pk.E = _gcry_ecc_curve_copy (sk->E);
+  point_init (&pk.Q);
+  point_set (&pk.Q, &sk->Q);
+
+  if (sk->E.dialect == ECC_DIALECT_ED25519)
+    {
+      char *rndbuf;
+
+      test = mpi_new (256);
+      rndbuf = _gcry_random_bytes (32, GCRY_WEAK_RANDOM);
+      rndbuf[0] &= 0x7f;  /* Clear bit 255. */
+      rndbuf[0] |= 0x40;  /* Set bit 254.   */
+      rndbuf[31] &= 0xf8; /* Clear bits 2..0 so that d mod 8 == 0  */
+      _gcry_mpi_set_buffer (test, rndbuf, 32, 0);
+      xfree (rndbuf);
+    }
+  else
+    {
+      test = mpi_new (nbits);
+      _gcry_mpi_randomize (test, nbits, GCRY_WEAK_RANDOM);
+    }
+
+  ec = _gcry_mpi_ec_p_internal_new (pk.E.model, pk.E.dialect, 0,
+                                    pk.E.p, pk.E.a, pk.E.b);
+  x0 = mpi_new (0);
+  x1 = mpi_new (0);
+
+  /* R_ = hkQ  <=>  R_ = hkdG  */
+  _gcry_mpi_ec_mul_point (&R_, test, &pk.Q, ec);
+  if (sk->E.dialect != ECC_DIALECT_ED25519)
+    _gcry_mpi_ec_mul_point (&R_, ec->h, &R_, ec);
+  if (_gcry_mpi_ec_get_affine (x0, NULL, &R_, ec))
+    log_fatal ("ecdh: Failed to get affine coordinates for hkQ\n");
+
+  _gcry_mpi_ec_mul_point (&R_, test, &pk.E.G, ec);
+  _gcry_mpi_ec_mul_point (&R_, sk->d, &R_, ec);
+  /* R_ = hdkG */
+  if (sk->E.dialect != ECC_DIALECT_ED25519)
+    _gcry_mpi_ec_mul_point (&R_, ec->h, &R_, ec);
+
+  if (_gcry_mpi_ec_get_affine (x1, NULL, &R_, ec))
+    log_fatal ("ecdh: Failed to get affine coordinates for hdkG\n");
+
+  if (mpi_cmp (x0, x1))
+    {
+      log_fatal ("ECDH test failed.\n");
+    }
+
+  mpi_free (x0);
+  mpi_free (x1);
+  _gcry_mpi_ec_free (ec);
+
+  point_free (&pk.Q);
+  _gcry_ecc_curve_free (&pk.E);
+
+  point_free (&R_);
+  mpi_free (test);
+}
+
+
 /*
  * To check the validity of the value, recalculate the correspondence
  * between the public value and the secret one.
@@ -281,7 +359,10 @@ check_secret_key (ECC_secret_key *sk, mpi_ec_t ec, int flags)

   point_init (&Q);
   x1 = mpi_new (0);
-  y1 = mpi_new (0);
+  if (ec->model == MPI_EC_MONTGOMERY)
+    y1 = NULL;
+  else
+    y1 = mpi_new (0);

   /* G in E(F_p) */
   if (!_gcry_mpi_ec_curve_point (&sk->E.G, ec))
@@ -338,7 +419,7 @@ check_secret_key (ECC_secret_key *sk, mpi_ec_t ec, int flags)
   else if (!mpi_cmp_ui (sk->Q.z, 1))
     {
       /* Fast path if Q is already in affine coordinates.  */
-      if (mpi_cmp (x1, sk->Q.x) || mpi_cmp (y1, sk->Q.y))
+      if (mpi_cmp (x1, sk->Q.x) || (!y1 && mpi_cmp (y1, sk->Q.y)))
         {
           if (DBG_CIPHER)
             log_debug
@@ -1581,7 +1662,7 @@ compute_keygrip (gcry_md_hd_t md, gcry_sexp_t keyparms)
       char buf[30];

       if (idx == 5)
-	continue;		/* Skip cofactor. */
+        continue;               /* Skip cofactor. */

       if (mpi_is_opaque (values[idx]))
         {
diff --git a/mpi/ec.c b/mpi/ec.c
index 80f3b22..0b7c7a7 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -1251,7 +1251,9 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
       unsigned int nbits;
       int j;
       mpi_point_struct p1_, p2_;
+      mpi_point_t q1, q2, prd, sum;
       unsigned long sw;
+      size_t nlimbs;

       /* Compute scalar point multiplication with Montgomery Ladder.
          Note that we don't use Y-coordinate in the points at all.
@@ -1267,27 +1269,35 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
       p2.x  = mpi_copy (point->x);
       mpi_set_ui (p2.z, 1);

+      nlimbs = 2*(nbits+BITS_PER_MPI_LIMB-1)/BITS_PER_MPI_LIMB+1;
+      mpi_resize (p1.x, nlimbs);
+      mpi_resize (p1.z, nlimbs);
+      mpi_resize (p2.x, nlimbs);
+      mpi_resize (p2.z, nlimbs);
+      mpi_resize (p1_.x, nlimbs);
+      mpi_resize (p1_.z, nlimbs);
+      mpi_resize (p2_.x, nlimbs);
+      mpi_resize (p2_.z, nlimbs);
+
+      q1 = &p1;
+      q2 = &p2;
+      prd = &p1_;
+      sum = &p2_;
+
       for (j=nbits-1; j >= 0; j--)
         {
-          sw = mpi_test_bit (scalar, j);
-          mpi_swap_cond (p1.x, p2.x, sw);
-          mpi_swap_cond (p1.z, p2.z, sw);
-          montgomery_ladder (&p1_, &p2_, &p1, &p2, point->x, ctx);
-          mpi_swap_cond (p1_.x, p2_.x, sw);
-          mpi_swap_cond (p1_.z, p2_.z, sw);
-
-          if (--j < 0)
-            break;
+          mpi_point_t t;

           sw = mpi_test_bit (scalar, j);
-          mpi_swap_cond (p1_.x, p2_.x, sw);
-          mpi_swap_cond (p1_.z, p2_.z, sw);
-          montgomery_ladder (&p1, &p2, &p1_, &p2_, point->x, ctx);
-          mpi_swap_cond (p1.x, p2.x, sw);
-          mpi_swap_cond (p1.z, p2.z, sw);
+          mpi_swap_cond (q1->x, q2->x, sw);
+          mpi_swap_cond (q1->z, q2->z, sw);
+          montgomery_ladder (prd, sum, q1, q2, point->x, ctx);
+          mpi_swap_cond (prd->x, sum->x, sw);
+          mpi_swap_cond (prd->z, sum->z, sw);
+          t = q1;  q1 = prd;  prd = t;
+          t = q2;  q2 = sum;  sum = t;
         }

-      z1 = mpi_new (0);
       mpi_clear (result->y);
       sw = (nbits & 1);
       mpi_swap_cond (p1.x, p1_.x, sw);
@@ -1300,12 +1310,13 @@ _gcry_mpi_ec_mul_point (mpi_point_t result,
         }
       else
         {
+          z1 = mpi_new (0);
           ec_invm (z1, p1.z, ctx);
           ec_mulm (result->x, p1.x, z1, ctx);
           mpi_set_ui (result->z, 1);
+          mpi_free (z1);
         }

-      mpi_free (z1);
       point_free (&p1);
       point_free (&p2);
       point_free (&p1_);
--

-- 
by Jussi Kivilinna | 5 Nov 17:13 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-125-g95eef21

The branch, master has been updated
       via  95eef21583d8e998efc48f22898c1ae31b77cb48 (commit)
       via  0b520128551054d83fb0bb2db8873394f38de498 (commit)
       via  c584f44543883346d5a565581ff99a0afce9c5e1 (commit)
      from  669a83ba86c38b271d85ed4bf1cabc7cc8160583 (commit)

- Log -----------------------------------------------------------------
commit 95eef21583d8e998efc48f22898c1ae31b77cb48
Author: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
Date:   Sun Nov 2 17:45:35 2014 +0200

    Disable NEON for CPUs that are known to have broken NEON implementation

    * src/hwf-arm.c (detect_arm_proc_cpuinfo): Add parsing for CPU version
    information and check if CPU is known to have broken NEON
    implementation.
    (_gcry_hwf_detect_arm): Filter out broken HW features.
    --

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>

diff --git a/src/hwf-arm.c b/src/hwf-arm.c
index dbbb607..3dc050e 100644
--- a/src/hwf-arm.c
+++ b/src/hwf-arm.c
@@ -98,17 +98,32 @@ detect_arm_at_hwcap(void)
 #define HAS_PROC_CPUINFO 1

 static unsigned int
-detect_arm_proc_cpuinfo(void)
+detect_arm_proc_cpuinfo(unsigned int *broken_hwfs)
 {
   char buf[1024]; /* large enough */
   char *str_features, *str_neon;
+  int cpu_implementer, cpu_arch, cpu_variant, cpu_part, cpu_revision;
   FILE *f;
   int readlen, i;
   static int cpuinfo_initialized = 0;
   static unsigned int stored_cpuinfo_features;
+  static unsigned int stored_broken_hwfs;
+  struct {
+    const char *name;
+    int *value;
+  } cpu_entries[5] = {
+    { "CPU implementer", &cpu_implementer },
+    { "CPU architecture", &cpu_arch },
+    { "CPU variant", &cpu_variant },
+    { "CPU part", &cpu_part },
+    { "CPU revision", &cpu_revision },
+  };

   if (cpuinfo_initialized)
-    return stored_cpuinfo_features;
+    {
+      *broken_hwfs |= stored_broken_hwfs;
+      return stored_cpuinfo_features;
+    }

   f = fopen("/proc/cpuinfo", "r");
   if (!f)
@@ -124,12 +139,32 @@ detect_arm_proc_cpuinfo(void)

   cpuinfo_initialized = 1;
   stored_cpuinfo_features = 0;
+  stored_broken_hwfs = 0;

   /* Find features line. */
   str_features = strstr(buf, "Features");
   if (!str_features)
     return stored_cpuinfo_features;

+  /* Find CPU version information. */
+  for (i = 0; i < DIM(cpu_entries); i++)
+    {
+      char *str;
+
+      *cpu_entries[i].value = -1;
+
+      str = strstr(buf, cpu_entries[i].name);
+      if (!str)
+        continue;
+
+      str = strstr(str, ": ");
+      if (!str)
+        continue;
+
+      str += 2;
+      *cpu_entries[i].value = strtoul(str, NULL, 0);
+    }
+
   /* Lines to strings. */
   for (i = 0; i < sizeof(buf); i++)
     if (buf[i] == '\n')
@@ -140,6 +175,19 @@ detect_arm_proc_cpuinfo(void)
   if (str_neon && (str_neon[5] == ' ' || str_neon[5] == '\0'))
     stored_cpuinfo_features |= HWF_ARM_NEON;

+  /* Check for CPUs with broken NEON implementation. See
+   * https://code.google.com/p/chromium/issues/detail?id=341598
+   */
+  if (cpu_implementer == 0x51
+      && cpu_arch == 7
+      && cpu_variant == 1
+      && cpu_part == 0x4d
+      && cpu_revision == 0)
+    {
+      stored_broken_hwfs = HWF_ARM_NEON;
+    }
+
+  *broken_hwfs |= stored_broken_hwfs;
   return stored_cpuinfo_features;
 }

@@ -149,18 +197,21 @@ unsigned int
 _gcry_hwf_detect_arm (void)
 {
   unsigned int ret = 0;
+  unsigned int broken_hwfs = 0;

 #if defined (HAS_SYS_AT_HWCAP)
   ret |= detect_arm_at_hwcap ();
 #endif

 #if defined (HAS_PROC_CPUINFO)
-  ret |= detect_arm_proc_cpuinfo ();
+  ret |= detect_arm_proc_cpuinfo (&broken_hwfs);
 #endif

 #if defined(__ARM_NEON__) && defined(ENABLE_NEON_SUPPORT)
   ret |= HWF_ARM_NEON;
 #endif

+  ret &= ~broken_hwfs;
+
   return ret;
 }
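
The broken-NEON check reads five identification fields from /proc/cpuinfo and masks HWF_ARM_NEON out of the detected features when all of them match (implementer 0x51, architecture 7, variant 1, part 0x4d, revision 0). The sketch below reproduces that logic on an in-memory cpuinfo string so it can be tested without an ARM board. TOY_HWF_NEON, cpuinfo_field and filter_broken are hypothetical names, and the sample text only imitates the /proc/cpuinfo layout. Like the commit, it uses strstr on the whole buffer, so only the first occurrence of each field is considered.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical feature bit for this sketch (libgcrypt has HWF_ARM_NEON). */
#define TOY_HWF_NEON 0x1u

/* Find NAME in the cpuinfo text BUF and parse the value after ": ".
   Base 0 accepts both "0x51" and "7".  Returns -1 if not found. */
static long cpuinfo_field (const char *buf, const char *name)
{
  const char *s = strstr (buf, name);

  if (!s)
    return -1;
  s = strstr (s, ": ");
  if (!s)
    return -1;
  return strtol (s + 2, NULL, 0);
}

/* Return DETECTED with features removed that are known to be broken
   on the CPU described by BUF (same rule as the commit above). */
static unsigned int filter_broken (const char *buf, unsigned int detected)
{
  unsigned int broken = 0;

  if (cpuinfo_field (buf, "CPU implementer") == 0x51
      && cpuinfo_field (buf, "CPU architecture") == 7
      && cpuinfo_field (buf, "CPU variant") == 1
      && cpuinfo_field (buf, "CPU part") == 0x4d
      && cpuinfo_field (buf, "CPU revision") == 0)
    broken |= TOY_HWF_NEON;

  return detected & ~broken;
}
```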

commit 0b520128551054d83fb0bb2db8873394f38de498
Author: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
Date:   Sun Nov 2 16:01:11 2014 +0200

    Add ARM/NEON implementation of Poly1305

    * cipher/Makefile.am: Add 'poly1305-armv7-neon.S'.
    * cipher/poly1305-armv7-neon.S: New.
    * cipher/poly1305-internal.h (POLY1305_USE_NEON)
    (POLY1305_NEON_BLOCKSIZE, POLY1305_NEON_STATESIZE)
    (POLY1305_NEON_ALIGNMENT): New.
    * cipher/poly1305.c [POLY1305_USE_NEON]
    (_gcry_poly1305_armv7_neon_init_ext)
    (_gcry_poly1305_armv7_neon_finish_ext)
    (_gcry_poly1305_armv7_neon_blocks, poly1305_armv7_neon_ops): New.
    (_gcry_poly1305_init) [POLY1305_USE_NEON]: Select NEON implementation
    if HWF_ARM_NEON set.
    * configure.ac [neonsupport=yes]: Add 'poly1305-armv7-neon.lo'.
    --

    Add Andrew Moon's public domain NEON implementation of Poly1305. Original
    source is available at: https://github.com/floodyberry/poly1305-opt

    Benchmark on Cortex-A8 (--cpu-mhz 1008):

    Old:
                        |  nanosecs/byte   mebibytes/sec   cycles/byte
     POLY1305           |     12.34 ns/B     77.27 MiB/s     12.44 c/B

    New:
                        |  nanosecs/byte   mebibytes/sec   cycles/byte
     POLY1305           |      2.12 ns/B     450.7 MiB/s      2.13 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>

diff --git a/cipher/Makefile.am b/cipher/Makefile.am
index 09ccaf9..22018b3 100644
--- a/cipher/Makefile.am
+++ b/cipher/Makefile.am
@@ -73,7 +73,7 @@ gost28147.c gost.h \
 gostr3411-94.c \
 md4.c \
 md5.c \
-poly1305-sse2-amd64.S poly1305-avx2-amd64.S \
+poly1305-sse2-amd64.S poly1305-avx2-amd64.S poly1305-armv7-neon.S \
 rijndael.c rijndael-tables.h rijndael-amd64.S rijndael-arm.S \
 rmd160.c \
 rsa.c \
diff --git a/cipher/poly1305-armv7-neon.S b/cipher/poly1305-armv7-neon.S
new file mode 100644
index 0000000..1134e85
--- /dev/null
+++ b/cipher/poly1305-armv7-neon.S
@@ -0,0 +1,705 @@
+/* poly1305-armv7-neon.S  -  ARMv7/NEON implementation of Poly1305
+ *
+ * Copyright (C) 2014 Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
+ *
+ * This file is part of Libgcrypt.
+ *
+ * Libgcrypt is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as
+ * published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * Libgcrypt is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Based on public domain implementation by Andrew Moon at
+ *  https://github.com/floodyberry/poly1305-opt
+ */
+
+#include <config.h>
+
+#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \
+    defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \
+    defined(HAVE_GCC_INLINE_ASM_NEON)
+
+.syntax unified
+.fpu neon
+.arm
+
+.text
+
+.p2align 2
+.Lpoly1305_init_constants_neon:
+.long 0x3ffff03
+.long 0x3ffc0ff
+.long 0x3f03fff
+.long 0x00fffff
+
+.globl _gcry_poly1305_armv7_neon_init_ext
+.type  _gcry_poly1305_armv7_neon_init_ext,%function;
+_gcry_poly1305_armv7_neon_init_ext:
+.Lpoly1305_init_ext_neon_local:
+	stmfd sp!, {r4-r11, lr}
+	sub sp, sp, #32
+	mov r14, r2
+	and r2, r2, r2
+	moveq r14, #-1
+	ldmia r1!, {r2-r5}
+	ldr r7, =.Lpoly1305_init_constants_neon
+	mov r6, r2
+	mov r8, r2, lsr #26
+	mov r9, r3, lsr #20
+	mov r10, r4, lsr #14
+	mov r11, r5, lsr #8
+	orr r8, r8, r3, lsl #6
+	orr r9, r9, r4, lsl #12
+	orr r10, r10, r5, lsl #18
+	ldmia r7, {r2-r5}
+	and r2, r2, r8
+	and r3, r3, r9
+	and r4, r4, r10
+	and r5, r5, r11
+	and r6, r6, 0x3ffffff
+	stmia r0!, {r2-r6}
+	eor r8, r8, r8
+	str r8, [sp, #24]
+.Lpoly1305_init_ext_neon_squareloop:
+	ldr r8, [sp, #24]
+	mov r12, #16
+	cmp r8, #2
+	beq .Lpoly1305_init_ext_neon_donesquaring
+	cmp r8, #1
+	moveq r12, #64
+	cmp r14, r12
+	bls .Lpoly1305_init_ext_neon_donesquaring
+	add r8, #1
+	str r8, [sp, #24]
+	mov r6, r6, lsl #1
+	mov r2, r2, lsl #1
+	umull r7, r8, r3, r3
+	umull r9, r10, r6, r4
+	umlal r7, r8, r6, r5
+	umlal r9, r10, r2, r3
+	add r11, r5, r5, lsl #2
+	umlal r7, r8, r2, r4
+	umlal r9, r10, r5, r11
+	str r7, [sp, #16]
+	str r8, [sp, #20]
+	mov r2, r2, lsr #1
+	mov r5, r5, lsl #1
+	str r9, [sp, #8]
+	str r10, [sp, #12]
+	umull r7, r8, r2, r2
+	umull r9, r10, r6, r2
+	add r11, r3, r3, lsl #2
+	add r12, r4, r4, lsl #2
+	umlal r7, r8, r6, r3
+	umlal r9, r10, r5, r11
+	umlal r7, r8, r5, r12
+	umlal r9, r10, r4, r12
+	mov r6, r6, lsr #1
+	mov r3, r3, lsl #1
+	add r11, r2, r2, lsl #2
+	str r7, [sp, #0]
+	str r8, [sp, #4]
+	umull r7, r8, r6, r6
+	umlal r7, r8, r3, r12
+	umlal r7, r8, r5, r11
+	and r6, r7, 0x3ffffff
+	mov r11, r7, lsr #26
+	orr r11, r11, r8, lsl #6
+	ldr r7, [sp, #0]
+	ldr r8, [sp, #4]
+	adds r9, r9, r11
+	adc r10, r10, #0
+	and r2, r9, 0x3ffffff
+	mov r11, r9, lsr #26
+	orr r11, r11, r10, lsl #6
+	ldr r9, [sp, #8]
+	ldr r10, [sp, #12]
+	adds r7, r7, r11
+	adc r8, r8, #0
+	and r3, r7, 0x3ffffff
+	mov r11, r7, lsr #26
+	orr r11, r11, r8, lsl #6
+	ldr r7, [sp, #16]
+	ldr r8, [sp, #20]
+	adds r9, r9, r11
+	adc r10, r10, #0
+	and r4, r9, 0x3ffffff
+	mov r11, r9, lsr #26
+	orr r11, r11, r10, lsl #6
+	adds r7, r7, r11
+	adc r8, r8, #0
+	and r5, r7, 0x3ffffff
+	mov r11, r7, lsr #26
+	orr r11, r11, r8, lsl #6
+	add r11, r11, r11, lsl #2
+	add r6, r6, r11
+	mov r11, r6, lsr #26
+	and r6, r6, 0x3ffffff
+	add r2, r2, r11
+	stmia r0!, {r2-r6}
+	b .Lpoly1305_init_ext_neon_squareloop
+.Lpoly1305_init_ext_neon_donesquaring:
+	mov r2, #2
+	ldr r14, [sp, #24]
+	sub r14, r2, r14
+	mov r3, r14, lsl #4
+	add r3, r3, r14, lsl #2
+	add r0, r0, r3
+	eor r2, r2, r2
+	eor r3, r3, r3
+	eor r4, r4, r4
+	eor r5, r5, r5
+	eor r6, r6, r6
+	stmia r0!, {r2-r6}
+	stmia r0!, {r2-r6}
+	ldmia r1!, {r2-r5}
+	stmia r0, {r2-r6}
+	add sp, sp, #32
+	ldmfd sp!, {r4-r11, lr}
+	mov r0, #(9*4+32)
+	bx lr
+.ltorg
+.size _gcry_poly1305_armv7_neon_init_ext,.-_gcry_poly1305_armv7_neon_init_ext;
+
+.globl _gcry_poly1305_armv7_neon_blocks
+.type  _gcry_poly1305_armv7_neon_blocks,%function;
+_gcry_poly1305_armv7_neon_blocks:
+.Lpoly1305_blocks_neon_local:
+	vmov.i32 q0, #0xffffffff
+	vmov.i32 d4, #1
+	vsubw.u32 q0, q0, d4
+	vstmdb sp!, {q4,q5,q6,q7}
+	stmfd sp!, {r4-r11, lr}
+	mov r8, sp
+	and sp, sp, #~63
+	sub sp, sp, #192
+	str r0, [sp, #108]
+	str r1, [sp, #112]
+	str r2, [sp, #116]
+	str r8, [sp, #120]
+	mov r3, r0
+	mov r0, r1
+	mov r1, r2
+	mov r2, r3
+	ldr r8, [r2, #116]
+	veor d15, d15, d15
+	vorr.i32 d15, #(1 << 24)
+	tst r8, #2
+	beq .Lpoly1305_blocks_neon_skip_shift8
+	vshr.u64 d15, #32
+.Lpoly1305_blocks_neon_skip_shift8:
+	tst r8, #4
+	beq .Lpoly1305_blocks_neon_skip_shift16
+	veor d15, d15, d15
+.Lpoly1305_blocks_neon_skip_shift16:
+	vst1.64 d15, [sp, :64]
+	tst r8, #1
+	bne .Lpoly1305_blocks_neon_started
+	vld1.64 {q0-q1}, [r0]!
+	vswp d1, d2
+	vmovn.i64 d21, q0
+	vshrn.i64 d22, q0, #26
+	vshrn.u64 d24, q1, #14
+	vext.8 d0, d0, d2, #4
+	vext.8 d1, d1, d3, #4
+	vshr.u64 q1, q1, #32
+	vshrn.i64 d23, q0, #20
+	vshrn.u64 d25, q1, #8
+	vand.i32 d21, #0x03ffffff
+	vand.i32 q11, #0x03ffffff
+	vand.i32 q12, #0x03ffffff
+	orr r8, r8, #1
+	sub r1, r1, #32
+	str r8, [r2, #116]
+	vorr d25, d25, d15
+	b .Lpoly1305_blocks_neon_setupr20
+.Lpoly1305_blocks_neon_started:
+	add r9, r2, #60
+	vldm r9, {d21-d25}
+.Lpoly1305_blocks_neon_setupr20:
+	vmov.i32 d0, #5
+	tst r8, #(8|16)
+	beq .Lpoly1305_blocks_neon_setupr20_simple
+	tst r8, #(8)
+	beq .Lpoly1305_blocks_neon_setupr20_r_1
+	mov r9, r2
+	add r10, r2, #20
+	vld1.64 {q9}, [r9]!
+	vld1.64 {q8}, [r10]!
+	vld1.64 {d2}, [r9]
+	vld1.64 {d20}, [r10]
+	b .Lpoly1305_blocks_neon_setupr20_hard
+.Lpoly1305_blocks_neon_setupr20_r_1:
+	mov r9, r2
+	vmov.i32 d2, #1
+	vld1.64 {q8}, [r9]!
+	veor q9, q9, q9
+	vshr.u64 d2, d2, #32
+	vld1.64 {d20}, [r9]
+.Lpoly1305_blocks_neon_setupr20_hard:
+	vzip.i32 q8, q9
+	vzip.i32 d20, d2
+	b .Lpoly1305_blocks_neon_setups20
+.Lpoly1305_blocks_neon_setupr20_simple:
+	add r9, r2, #20
+	vld1.64 {d2-d4}, [r9]
+	vdup.32 d16, d2[0]
+	vdup.32 d17, d2[1]
+	vdup.32 d18, d3[0]
+	vdup.32 d19, d3[1]
+	vdup.32 d20, d4[0]
+.Lpoly1305_blocks_neon_setups20:
+	vmul.i32 q13, q8, d0[0]
+	vmov.i64 q15, 0x00000000ffffffff
+	vmul.i32 q14, q9, d0[0]
+	vshr.u64 q15, q15, #6
+	cmp r1, #64
+	blo .Lpoly1305_blocks_neon_try32
+	add r9, sp, #16
+	add r10, r2, #40
+	add r11, sp, #64
+	str r1, [sp, #116]
+	vld1.64 {d10-d12}, [r10]
+	vmov d14, d12
+	vmul.i32 q6, q5, d0[0]
+.Lpoly1305_blocks_neon_mainloop:
+	ldmia r0!, {r2-r5}
+	vmull.u32 q0, d25, d12[0]
+	mov r7, r2, lsr #26
+	vmlal.u32 q0, d24, d12[1]
+	mov r8, r3, lsr #20
+	ldr r6, [sp, #0]
+	vmlal.u32 q0, d23, d13[0]
+	mov r9, r4, lsr #14
+	vmlal.u32 q0, d22, d13[1]
+	orr r6, r6, r5, lsr #8
+	vmlal.u32 q0, d21, d14[0]
+	orr r3, r7, r3, lsl #6
+	vmull.u32 q1, d25, d12[1]
+	orr r4, r8, r4, lsl #12
+	orr r5, r9, r5, lsl #18
+	vmlal.u32 q1, d24, d13[0]
+	ldmia r0!, {r7-r10}
+	vmlal.u32 q1, d23, d13[1]
+	mov r1, r7, lsr #26
+	vmlal.u32 q1, d22, d14[0]
+	ldr r11, [sp, #4]
+	mov r12, r8, lsr #20
+	vmlal.u32 q1, d21, d10[0]
+	mov r14, r9, lsr #14
+	vmull.u32 q2, d25, d13[0]
+	orr r11, r11, r10, lsr #8
+	orr r8, r1, r8, lsl #6
+	vmlal.u32 q2, d24, d13[1]
+	orr r9, r12, r9, lsl #12
+	vmlal.u32 q2, d23, d14[0]
+	orr r10, r14, r10, lsl #18
+	vmlal.u32 q2, d22, d10[0]
+	mov r12, r3
+	and r2, r2, #0x3ffffff
+	vmlal.u32 q2, d21, d10[1]
+	mov r14, r5
+	vmull.u32 q3, d25, d13[1]
+	and r3, r7, #0x3ffffff
+	vmlal.u32 q3, d24, d14[0]
+	and r5, r8, #0x3ffffff
+	vmlal.u32 q3, d23, d10[0]
+	and r7, r9, #0x3ffffff
+	vmlal.u32 q3, d22, d10[1]
+	and r8, r14, #0x3ffffff
+	vmlal.u32 q3, d21, d11[0]
+	and r9, r10, #0x3ffffff
+	add r14, sp, #128
+	vmull.u32 q4, d25, d14[0]
+	mov r10, r6
+	vmlal.u32 q4, d24, d10[0]
+	and r6, r4, #0x3ffffff
+	vmlal.u32 q4, d23, d10[1]
+	and r4, r12, #0x3ffffff
+	vmlal.u32 q4, d22, d11[0]
+	stm r14, {r2-r11}
+	vmlal.u32 q4, d21, d11[1]
+	vld1.64 {d21-d24}, [r14, :256]!
+	vld1.64 {d25}, [r14, :64]
+	ldmia r0!, {r2-r5}
+	vmlal.u32 q0, d25, d26
+	mov r7, r2, lsr #26
+	vmlal.u32 q0, d24, d27
+	ldr r6, [sp, #0]
+	mov r8, r3, lsr #20
+	vmlal.u32 q0, d23, d28
+	mov r9, r4, lsr #14
+	vmlal.u32 q0, d22, d29
+	orr r6, r6, r5, lsr #8
+	vmlal.u32 q0, d21, d20
+	orr r3, r7, r3, lsl #6
+	vmlal.u32 q1, d25, d27
+	orr r4, r8, r4, lsl #12
+	orr r5, r9, r5, lsl #18
+	vmlal.u32 q1, d24, d28
+	ldmia r0!, {r7-r10}
+	vmlal.u32 q1, d23, d29
+	mov r1, r7, lsr #26
+	vmlal.u32 q1, d22, d20
+	ldr r11, [sp, #4]
+	mov r12, r8, lsr #20
+	vmlal.u32 q1, d21, d16
+	mov r14, r9, lsr #14
+	vmlal.u32 q2, d25, d28
+	orr r11, r11, r10, lsr #8
+	orr r8, r1, r8, lsl #6
+	orr r9, r12, r9, lsl #12
+	vmlal.u32 q2, d24, d29
+	orr r10, r14, r10, lsl #18
+	and r2, r2, #0x3ffffff
+	mov r12, r3
+	vmlal.u32 q2, d23, d20
+	mov r14, r5
+	vmlal.u32 q2, d22, d16
+	and r3, r7, #0x3ffffff
+	vmlal.u32 q2, d21, d17
+	and r5, r8, #0x3ffffff
+	vmlal.u32 q3, d25, d29
+	and r7, r9, #0x3ffffff
+	vmlal.u32 q3, d24, d20
+	and r8, r14, #0x3ffffff
+	vmlal.u32 q3, d23, d16
+	and r9, r10, #0x3ffffff
+	vmlal.u32 q3, d22, d17
+	add r14, sp, #128
+	vmlal.u32 q3, d21, d18
+	mov r10, r6
+	vmlal.u32 q4, d25, d20
+	vmlal.u32 q4, d24, d16
+	and r6, r4, #0x3ffffff
+	vmlal.u32 q4, d23, d17
+	and r4, r12, #0x3ffffff
+	vmlal.u32 q4, d22, d18
+	stm r14, {r2-r11}
+	vmlal.u32 q4, d21, d19
+	vld1.64 {d21-d24}, [r14, :256]!
+	vld1.64 {d25}, [r14, :64]
+	vaddw.u32 q0, q0, d21
+	vaddw.u32 q1, q1, d22
+	vaddw.u32 q2, q2, d23
+	vaddw.u32 q3, q3, d24
+	vaddw.u32 q4, q4, d25
+	vshr.u64 q11, q0, #26
+	vand q0, q0, q15
+	vadd.i64 q1, q1, q11
+	vshr.u64 q12, q3, #26
+	vand q3, q3, q15
+	vadd.i64 q4, q4, q12
+	vshr.u64 q11, q1, #26
+	vand q1, q1, q15
+	vadd.i64 q2, q2, q11
+	vshr.u64 q12, q4, #26
+	vand q4, q4, q15
+	vadd.i64 q0, q0, q12
+	vshl.i64 q12, q12, #2
+	ldr r1, [sp, #116]
+	vadd.i64 q0, q0, q12
+	vshr.u64 q11, q2, #26
+	vand q2, q2, q15
+	vadd.i64 q3, q3, q11
+	sub r1, #64
+	vshr.u64 q12, q0, #26
+	vand q0, q0, q15
+	vadd.i64 q1, q1, q12
+	cmp r1, #64
+	vshr.u64 q11, q3, #26
+	vand q3, q3, q15
+	vadd.i64 q4, q4, q11
+	vmovn.i64 d21, q0
+	str r1, [sp, #116]
+	vmovn.i64 d22, q1
+	vmovn.i64 d23, q2
+	vmovn.i64 d24, q3
+	vmovn.i64 d25, q4
+	bhs .Lpoly1305_blocks_neon_mainloop
+.Lpoly1305_blocks_neon_try32:
+	cmp r1, #32
+	blo .Lpoly1305_blocks_neon_done
+	tst r0, r0
+	bne .Lpoly1305_blocks_loadm32
+	veor q0, q0, q0
+	veor q1, q1, q1
+	veor q2, q2, q2
+	veor q3, q3, q3
+	veor q4, q4, q4
+	b .Lpoly1305_blocks_continue32
+.Lpoly1305_blocks_loadm32:
+	vld1.64 {q0-q1}, [r0]!
+	veor q4, q4, q4
+	vswp d1, d2
+	veor q3, q3, q3
+	vtrn.32 q0, q4
+	vtrn.32 q1, q3
+	vshl.i64 q2, q1, #12
+	vshl.i64 q3, q3, #18
+	vshl.i64 q1, q4, #6
+	vmovl.u32 q4, d15
+.Lpoly1305_blocks_continue32:
+	vmlal.u32 q0, d25, d26
+	vmlal.u32 q0, d24, d27
+	vmlal.u32 q0, d23, d28
+	vmlal.u32 q0, d22, d29
+	vmlal.u32 q0, d21, d20
+	vmlal.u32 q1, d25, d27
+	vmlal.u32 q1, d24, d28
+	vmlal.u32 q1, d23, d29
+	vmlal.u32 q1, d22, d20
+	vmlal.u32 q1, d21, d16
+	vmlal.u32 q2, d25, d28
+	vmlal.u32 q2, d24, d29
+	vmlal.u32 q2, d23, d20
+	vmlal.u32 q2, d22, d16
+	vmlal.u32 q2, d21, d17
+	vmlal.u32 q3, d25, d29
+	vmlal.u32 q3, d24, d20
+	vmlal.u32 q3, d23, d16
+	vmlal.u32 q3, d22, d17
+	vmlal.u32 q3, d21, d18
+	vmlal.u32 q4, d25, d20
+	vmlal.u32 q4, d24, d16
+	vmlal.u32 q4, d23, d17
+	vmlal.u32 q4, d22, d18
+	vmlal.u32 q4, d21, d19
+	vshr.u64 q11, q0, #26
+	vand q0, q0, q15
+	vadd.i64 q1, q1, q11
+	vshr.u64 q12, q3, #26
+	vand q3, q3, q15
+	vadd.i64 q4, q4, q12
+	vshr.u64 q11, q1, #26
+	vand q1, q1, q15
+	vadd.i64 q2, q2, q11
+	vshr.u64 q12, q4, #26
+	vand q4, q4, q15
+	vadd.i64 q0, q0, q12
+	vshl.i64 q12, q12, #2
+	vadd.i64 q0, q0, q12
+	vshr.u64 q11, q2, #26
+	vand q2, q2, q15
+	vadd.i64 q3, q3, q11
+	vshr.u64 q12, q0, #26
+	vand q0, q0, q15
+	vadd.i64 q1, q1, q12
+	vshr.u64 q11, q3, #26
+	vand q3, q3, q15
+	vadd.i64 q4, q4, q11
+	vmovn.i64 d21, q0
+	vmovn.i64 d22, q1
+	vmovn.i64 d23, q2
+	vmovn.i64 d24, q3
+	vmovn.i64 d25, q4
+.Lpoly1305_blocks_neon_done:
+	tst r0, r0
+	beq .Lpoly1305_blocks_neon_final
+	ldr r2, [sp, #108]
+	add r2, r2, #60
+	vst1.64 {d21}, [r2]!
+	vst1.64 {d22-d25}, [r2]
+	b .Lpoly1305_blocks_neon_leave
+.Lpoly1305_blocks_neon_final:
+	vadd.u32 d10, d0, d1
+	vadd.u32 d13, d2, d3
+	vadd.u32 d11, d4, d5
+	ldr r5, [sp, #108]
+	vadd.u32 d14, d6, d7
+	vadd.u32 d12, d8, d9
+	vtrn.32 d10, d13
+	vtrn.32 d11, d14
+	vst1.64 {d10-d12}, [sp]
+	ldm sp, {r0-r4}
+	mov r12, r0, lsr #26
+	and r0, r0, #0x3ffffff
+	add r1, r1, r12
+	mov r12, r1, lsr #26
+	and r1, r1, #0x3ffffff
+	add r2, r2, r12
+	mov r12, r2, lsr #26
+	and r2, r2, #0x3ffffff
+	add r3, r3, r12
+	mov r12, r3, lsr #26
+	and r3, r3, #0x3ffffff
+	add r4, r4, r12
+	mov r12, r4, lsr #26
+	and r4, r4, #0x3ffffff
+	add r12, r12, r12, lsl #2
+	add r0, r0, r12
+	mov r12, r0, lsr #26
+	and r0, r0, #0x3ffffff
+	add r1, r1, r12
+	mov r12, r1, lsr #26
+	and r1, r1, #0x3ffffff
+	add r2, r2, r12
+	mov r12, r2, lsr #26
+	and r2, r2, #0x3ffffff
+	add r3, r3, r12
+	mov r12, r3, lsr #26
+	and r3, r3, #0x3ffffff
+	add r4, r4, r12
+	mov r12, r4, lsr #26
+	and r4, r4, #0x3ffffff
+	add r12, r12, r12, lsl #2
+	add r0, r0, r12
+	mov r12, r0, lsr #26
+	and r0, r0, #0x3ffffff
+	add r1, r1, r12
+	add r6, r0, #5
+	mov r12, r6, lsr #26
+	and r6, r6, #0x3ffffff
+	add r7, r1, r12
+	mov r12, r7, lsr #26
+	and r7, r7, #0x3ffffff
+	add r10, r2, r12
+	mov r12, r10, lsr #26
+	and r10, r10, #0x3ffffff
+	add r11, r3, r12
+	mov r12, #-(1 << 26)
+	add r12, r12, r11, lsr #26
+	and r11, r11, #0x3ffffff
+	add r14, r4, r12
+	mov r12, r14, lsr #31
+	sub r12, #1
+	and r6, r6, r12
+	and r7, r7, r12
+	and r10, r10, r12
+	and r11, r11, r12
+	and r14, r14, r12
+	mvn r12, r12
+	and r0, r0, r12
+	and r1, r1, r12
+	and r2, r2, r12
+	and r3, r3, r12
+	and r4, r4, r12
+	orr r0, r0, r6
+	orr r1, r1, r7
+	orr r2, r2, r10
+	orr r3, r3, r11
+	orr r4, r4, r14
+	orr r0, r0, r1, lsl #26
+	lsr r1, r1, #6
+	orr r1, r1, r2, lsl #20
+	lsr r2, r2, #12
+	orr r2, r2, r3, lsl #14
+	lsr r3, r3, #18
+	orr r3, r3, r4, lsl #8
+	add r5, r5, #60
+	stm r5, {r0-r3}
+.Lpoly1305_blocks_neon_leave:
+	sub r0, sp, #8
+	ldr sp, [sp, #120]
+	ldmfd sp!, {r4-r11, lr}
+	vldm sp!, {q4-q7}
+	sub r0, sp, r0
+	bx lr
+.size _gcry_poly1305_armv7_neon_blocks,.-_gcry_poly1305_armv7_neon_blocks;
+
+.globl _gcry_poly1305_armv7_neon_finish_ext
+.type  _gcry_poly1305_armv7_neon_finish_ext,%function;
+_gcry_poly1305_armv7_neon_finish_ext:
+.Lpoly1305_finish_ext_neon_local:
+	stmfd sp!, {r4-r11, lr}
+	sub sp, sp, #32
+	mov r5, r0
+	mov r6, r1
+	mov r7, r2
+	mov r8, r3
+	ands r7, r7, r7
+	beq .Lpoly1305_finish_ext_neon_noremaining
+	mov r9, sp
+	veor q0, q0, q0
+	veor q1, q1, q1
+	vst1.64 {q0-q1}, [sp]
+	tst r7, #16
+	beq .Lpoly1305_finish_ext_neon_skip16
+	vld1.u64 {q0}, [r1]!
+	vst1.64 {q0}, [r9]!
+.Lpoly1305_finish_ext_neon_skip16:
+	tst r7, #8
+	beq .Lpoly1305_finish_ext_neon_skip8
+	ldmia r1!, {r10-r11}
+	stmia r9!, {r10-r11}
+.Lpoly1305_finish_ext_neon_skip8:
+	tst r7, #4
+	beq .Lpoly1305_finish_ext_neon_skip4
+	ldr r10, [r1], #4
+	str r10, [r9], #4
+.Lpoly1305_finish_ext_neon_skip4:
+	tst r7, #2
+	beq .Lpoly1305_finish_ext_neon_skip2
+	ldrh r10, [r1], #2
+	strh r10, [r9], #2
+.Lpoly1305_finish_ext_neon_skip2:
+	tst r7, #1
+	beq .Lpoly1305_finish_ext_neon_skip1
+	ldrb r10, [r1], #1
+	strb r10, [r9], #1
+.Lpoly1305_finish_ext_neon_skip1:
+	cmp r7, #16
+	beq .Lpoly1305_finish_ext_neon_skipfinalbit
+	mov r10, #1
+	strb r10, [r9]
+.Lpoly1305_finish_ext_neon_skipfinalbit:
+	ldr r10, [r5, #116]
+	orrhs r10, #2
+	orrlo r10, #4
+	str r10, [r5, #116]
+	mov r0, r5
+	mov r1, sp
+	mov r2, #32
+	bl .Lpoly1305_blocks_neon_local
+.Lpoly1305_finish_ext_neon_noremaining:
+	ldr r10, [r5, #116]
+	tst r10, #1
+	beq .Lpoly1305_finish_ext_neon_notstarted
+	cmp r7, #0
+	beq .Lpoly1305_finish_ext_neon_user2r
+	cmp r7, #16
+	bls .Lpoly1305_finish_ext_neon_user1
+.Lpoly1305_finish_ext_neon_user2r:
+	orr r10, r10, #8
+	b .Lpoly1305_finish_ext_neon_finalblock
+.Lpoly1305_finish_ext_neon_user1:
+	orr r10, r10, #16
+.Lpoly1305_finish_ext_neon_finalblock:
+	str r10, [r5, #116]
+	mov r0, r5
+	eor r1, r1, r1
+	mov r2, #32
+	bl .Lpoly1305_blocks_neon_local
+.Lpoly1305_finish_ext_neon_notstarted:
+	add r0, r5, #60
+	add r9, r5, #100
+	ldm r0, {r0-r3}
+	ldm r9, {r9-r12}
+	adds r0, r0, r9
+	adcs r1, r1, r10
+	adcs r2, r2, r11
+	adcs r3, r3, r12
+	stm r8, {r0-r3}
+	veor q0, q0, q0
+	veor q1, q1, q1
+	veor q2, q2, q2
+	veor q3, q3, q3
+	vstmia r5!, {q0-q3}
+	vstm r5, {q0-q3}
+	add sp, sp, #32
+	ldmfd sp!, {r4-r11, lr}
+	mov r0, #(9*4+32)
+	bx lr
+.size _gcry_poly1305_armv7_neon_finish_ext,.-_gcry_poly1305_armv7_neon_finish_ext;
+
+#endif
diff --git a/cipher/poly1305-internal.h b/cipher/poly1305-internal.h
index 0299c43..dfc0c04 100644
--- a/cipher/poly1305-internal.h
+++ b/cipher/poly1305-internal.h
@@ -65,10 +65,24 @@
 #endif

 
+/* POLY1305_USE_NEON indicates whether to enable ARM NEON assembly code. */
+#undef POLY1305_USE_NEON
+#if defined(ENABLE_NEON_SUPPORT) && defined(HAVE_ARM_ARCH_V6) && \
+    defined(__ARMEL__) && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \
+    defined(HAVE_GCC_INLINE_ASM_NEON)
+# define POLY1305_USE_NEON 1
+# define POLY1305_NEON_BLOCKSIZE 32
+# define POLY1305_NEON_STATESIZE 128
+# define POLY1305_NEON_ALIGNMENT 16
+#endif
+
+
 /* Largest block-size used in any implementation (optimized implementations
  * might use block-size multiple of 16). */
 #ifdef POLY1305_USE_AVX2
 # define POLY1305_LARGEST_BLOCKSIZE POLY1305_AVX2_BLOCKSIZE
+#elif defined(POLY1305_USE_NEON)
+# define POLY1305_LARGEST_BLOCKSIZE POLY1305_NEON_BLOCKSIZE
 #elif defined(POLY1305_USE_SSE2)
 # define POLY1305_LARGEST_BLOCKSIZE POLY1305_SSE2_BLOCKSIZE
 #else
@@ -78,6 +92,8 @@
 /* Largest state-size used in any implementation. */
 #ifdef POLY1305_USE_AVX2
 # define POLY1305_LARGEST_STATESIZE POLY1305_AVX2_STATESIZE
+#elif defined(POLY1305_USE_NEON)
+# define POLY1305_LARGEST_STATESIZE POLY1305_NEON_STATESIZE
 #elif defined(POLY1305_USE_SSE2)
 # define POLY1305_LARGEST_STATESIZE POLY1305_SSE2_STATESIZE
 #else
@@ -87,6 +103,8 @@
 /* Minimum alignment for state pointer passed to implementations. */
 #ifdef POLY1305_USE_AVX2
 # define POLY1305_STATE_ALIGNMENT POLY1305_AVX2_ALIGNMENT
+#elif defined(POLY1305_USE_NEON)
+# define POLY1305_STATE_ALIGNMENT POLY1305_NEON_ALIGNMENT
 #elif defined(POLY1305_USE_SSE2)
 # define POLY1305_STATE_ALIGNMENT POLY1305_SSE2_ALIGNMENT
 #else
diff --git a/cipher/poly1305.c b/cipher/poly1305.c
index fe241c1..28dbbf8 100644
--- a/cipher/poly1305.c
+++ b/cipher/poly1305.c
@@ -76,6 +76,25 @@ static const poly1305_ops_t poly1305_amd64_avx2_ops = {
 #endif

 
+#ifdef POLY1305_USE_NEON
+
+void _gcry_poly1305_armv7_neon_init_ext(void *state, const poly1305_key_t *key);
+unsigned int _gcry_poly1305_armv7_neon_finish_ext(void *state, const byte *m,
+						  size_t remaining,
+						  byte mac[16]);
+unsigned int _gcry_poly1305_armv7_neon_blocks(void *ctx, const byte *m,
+					      size_t bytes);
+
+static const poly1305_ops_t poly1305_armv7_neon_ops = {
+  POLY1305_NEON_BLOCKSIZE,
+  _gcry_poly1305_armv7_neon_init_ext,
+  _gcry_poly1305_armv7_neon_blocks,
+  _gcry_poly1305_armv7_neon_finish_ext
+};
+
+#endif
+
+
 #ifdef HAVE_U64_TYPEDEF

 /* Reference unoptimized poly1305 implementation using 32 bit * 32 bit = 64 bit
@@ -661,6 +680,10 @@ _gcry_poly1305_init (poly1305_context_t * ctx, const byte * key,
   if (features & HWF_INTEL_AVX2)
     ctx->ops = &poly1305_amd64_avx2_ops;
 #endif
+#ifdef POLY1305_USE_NEON
+  if (features & HWF_ARM_NEON)
+    ctx->ops = &poly1305_armv7_neon_ops;
+#endif
   (void)features;

   buf_cpy (keytmp.b, key, POLY1305_KEYLEN);
diff --git a/configure.ac b/configure.ac
index 60ed015..a0d5fc9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1837,6 +1837,11 @@ case "${host}" in
    ;;
 esac

+if test x"$neonsupport" = xyes ; then
+   # Build with the NEON implementation
+   GCRYPT_CIPHERS="$GCRYPT_CIPHERS poly1305-armv7-neon.lo"
+fi
+
 LIST_MEMBER(dsa, $enabled_pubkey_ciphers)
 if test "$found" = "1" ; then
    GCRYPT_PUBKEY_CIPHERS="$GCRYPT_PUBKEY_CIPHERS dsa.lo"

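In the Poly1305 patch above, the tail of `_gcry_poly1305_armv7_neon_blocks` (label `.Lpoly1305_blocks_neon_final`) and `_gcry_poly1305_armv7_neon_finish_ext` together produce the tag, in three steps:

1. h is fully reduced modulo 2^130 - 5. The code computes g = h + 5 - 2^130, then selects g or h with a mask derived from g's sign bit rather than a branch.
2. The five limbs are packed into four 32-bit words.
3. The `s` half of the key is added modulo 2^128 using `adds`/`adcs`.

Here is a C sketch of those steps. It assumes the limbs have already been carried so that each is below 2^26. `poly1305_reduce_final` and `poly1305_add_pad` are hypothetical helper names, not libgcrypt functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fully reduce h (five carried 26-bit limbs) modulo
 * p = 2^130 - 5 and pack the low 128 bits into four little-endian words. */
static void poly1305_reduce_final(const uint32_t h[5], uint32_t out[4])
{
  uint32_t g[5], sel[5], carry = 5, mask;
  int i;

  for (i = 0; i < 5; i++)
    assert(h[i] <= 0x3ffffff);  /* precondition: limbs already carried */

  /* g = h + 5 - 2^130; the 2^130 is subtracted from the top limb. */
  for (i = 0; i < 4; i++) {
    g[i] = h[i] + carry;
    carry = g[i] >> 26;
    g[i] &= 0x3ffffff;
  }
  g[4] = h[4] + carry - (1u << 26);

  /* Top bit set means g < 0, i.e. h < p: keep h, otherwise take g.
   * mask is 0 in the first case and all-ones in the second. */
  mask = (g[4] >> 31) - 1;
  for (i = 0; i < 5; i++)
    sel[i] = (h[i] & ~mask) | (g[i] & mask);

  /* Pack 5 x 26 bits into 4 x 32 bits; bits above 2^128 are dropped. */
  out[0] = sel[0] | (sel[1] << 26);
  out[1] = (sel[1] >> 6) | (sel[2] << 20);
  out[2] = (sel[2] >> 12) | (sel[3] << 14);
  out[3] = (sel[3] >> 18) | (sel[4] << 8);
}

/* Hypothetical helper: tag = (packed h + s) mod 2^128, propagating the
 * carry across the four 32-bit words like the adds/adcs chain. */
static void poly1305_add_pad(uint32_t tag[4], const uint32_t s[4])
{
  uint64_t acc = 0;
  for (int i = 0; i < 4; i++) {
    acc += (uint64_t)tag[i] + s[i];
    tag[i] = (uint32_t)acc;
    acc >>= 32;
  }
}
```

The selection uses a mask rather than a branch, so the running time does not depend on whether h was already below 2^130 - 5. This mirrors the `mvn`/`and`/`orr` sequence in the assembly.
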
commit c584f44543883346d5a565581ff99a0afce9c5e1
Author: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
Date:   Wed Aug 6 20:05:16 2014 +0300

    chacha20: add ARMv7/NEON implementation

    * cipher/Makefile.am: Add 'chacha20-armv7-neon.S'.
    * cipher/chacha20-armv7-neon.S: New.
    * cipher/chacha20.c (USE_NEON): New.
    [USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New.
    (chacha20_do_setkey) [USE_NEON]: Use Neon implementation if
    HWF_ARM_NEON flag set.
    (selftest): Self-test encrypting buffer byte by byte.
    * configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'.
    --

    Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original
    source is available at: https://github.com/floodyberry/chacha-opt

    Benchmark on Cortex-A8 (--cpu-mhz 1008):

    Old:
     CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
         STREAM enc |     13.45 ns/B     70.92 MiB/s     13.56 c/B
         STREAM dec |     13.45 ns/B     70.90 MiB/s     13.56 c/B

    New:
     CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
         STREAM enc |      6.20 ns/B     153.9 MiB/s      6.25 c/B
         STREAM dec |      6.20 ns/B     153.9 MiB/s      6.25 c/B

    Signed-off-by: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>

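In the patch below, the scalar part of `.Lchacha_blocks_neon_rounds1` and all of `.Lchacha_blocks_neon_rounds2` are ChaCha20 quarter rounds.

- **Scalar code.** ARM has no rotate-left instruction, so `ror #16`, `ror #20`, `ror #24` and `ror #25` stand for left rotations by 16, 12, 8 and 7 bits.
- **NEON code.** It uses `vrev32.16` for the 16-bit rotation and `vshl.i32`/`vsri.u32` pairs for the others. It also rotates whole rows with `vext.32` so that diagonals line up as columns.
- **Round count.** The round counter starts at 20 and drops by 2 per pass, which gives ten double rounds.

The C sketch shows a quarter round and one double round on a 16-word state in the standard ChaCha20 layout. The function names are hypothetical, and this is not the libgcrypt C implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n must be in 1..31. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
  assert(n > 0 && n < 32);  /* n == 0 would make x >> 32 undefined */
  return (x << n) | (x >> (32 - n));
}

/* ChaCha20 quarter round (add, xor, rotate by 16, 12, 8, 7). */
static void chacha20_quarter_round(uint32_t *a, uint32_t *b,
                                   uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32(*d, 16);
  *c += *d; *b ^= *c; *b = rotl32(*b, 12);
  *a += *b; *d ^= *a; *d = rotl32(*d, 8);
  *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* One double round: four column rounds, then four diagonal rounds.
 * The NEON code gets the diagonal pattern by rotating rows 1, 2 and 3
 * by 1, 2 and 3 lanes with vext.32 instead of indexing like this. */
static void chacha20_double_round(uint32_t x[16])
{
  chacha20_quarter_round(&x[0], &x[4], &x[8],  &x[12]);
  chacha20_quarter_round(&x[1], &x[5], &x[9],  &x[13]);
  chacha20_quarter_round(&x[2], &x[6], &x[10], &x[14]);
  chacha20_quarter_round(&x[3], &x[7], &x[11], &x[15]);

  chacha20_quarter_round(&x[0], &x[5], &x[10], &x[15]);
  chacha20_quarter_round(&x[1], &x[6], &x[11], &x[12]);
  chacha20_quarter_round(&x[2], &x[7], &x[8],  &x[13]);
  chacha20_quarter_round(&x[3], &x[4], &x[9],  &x[14]);
}
```

As a quick check, the quarter-round test vector in RFC 7539, section 2.1.1 uses the inputs a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43 and d = 0x01234567. It should produce a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e and d = 0x5881c4bb.
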
diff --git a/cipher/Makefile.am b/cipher/Makefile.am
index 7f45cbb..09ccaf9 100644
--- a/cipher/Makefile.am
+++ b/cipher/Makefile.am
@@ -61,6 +61,7 @@ arcfour.c arcfour-amd64.S \
 blowfish.c blowfish-amd64.S blowfish-arm.S \
 cast5.c cast5-amd64.S cast5-arm.S \
 chacha20.c chacha20-sse2-amd64.S chacha20-ssse3-amd64.S chacha20-avx2-amd64.S \
+  chacha20-armv7-neon.S \
 crc.c \
 des.c des-amd64.S \
 dsa.c \
diff --git a/cipher/chacha20-armv7-neon.S b/cipher/chacha20-armv7-neon.S
new file mode 100644
index 0000000..1a395ba
--- /dev/null
+++ b/cipher/chacha20-armv7-neon.S
@@ -0,0 +1,710 @@
+/* chacha20-armv7-neon.S - ARM/NEON accelerated chacha20 blocks function
+ *
+ * Copyright (C) 2014 Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
+ *
+ * This file is part of Libgcrypt.
+ *
+ * Libgcrypt is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as
+ * published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * Libgcrypt is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Based on public domain implementation by Andrew Moon at
+ *  https://github.com/floodyberry/chacha-opt
+ */
+
+#include <config.h>
+
+#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \
+    defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \
+    defined(HAVE_GCC_INLINE_ASM_NEON) && defined(USE_CHACHA20)
+
+.syntax unified
+.fpu neon
+.arm
+
+.text
+
+.globl _gcry_chacha20_armv7_neon_blocks
+.type  _gcry_chacha20_armv7_neon_blocks,%function;
+_gcry_chacha20_armv7_neon_blocks:
+.Lchacha_blocks_neon_local:
+	tst r3, r3
+	beq .Lchacha_blocks_neon_nobytes
+	vstmdb sp!, {q4,q5,q6,q7}
+	stmfd sp!, {r4-r12, r14}
+	mov r8, sp
+	sub sp, sp, #196
+	and sp, sp, #0xffffffe0
+	str r0, [sp, #60]
+	str r1, [sp, #48]
+	str r2, [sp, #40]
+	str r3, [sp, #52]
+	str r8, [sp, #192]
+	add r1, sp, #64
+	ldmia r0!, {r4-r11}
+	stmia r1!, {r4-r11}
+	ldmia r0!, {r4-r11}
+	stmia r1!, {r4-r11}
+	mov r4, #20
+	str r4, [sp, #44]
+	cmp r3, #256
+	blo .Lchacha_blocks_neon_mainloop2
+.Lchacha_blocks_neon_mainloop1:
+	ldr r0, [sp, #44]
+	str r0, [sp, #0]
+	add r1, sp, #(64)
+	mov r2, #1
+	veor q12, q12
+	vld1.32 {q0,q1}, [r1,:128]!
+	vld1.32 {q2,q3}, [r1,:128]
+	vmov.32 d24[0], r2
+	vadd.u64 q3, q3, q12
+	vmov q4, q0
+	vmov q5, q1
+	vmov q6, q2
+	vadd.u64 q7, q3, q12
+	vmov q8, q0
+	vmov q9, q1
+	vmov q10, q2
+	vadd.u64 q11, q7, q12
+	add r0, sp, #64
+	ldm r0, {r0-r12}
+	ldr r14, [sp, #(64 +60)]
+	str r6, [sp, #8]
+	str r11, [sp, #12]
+	str r14, [sp, #28]
+	ldr r11, [sp, #(64 +52)]
+	ldr r14, [sp, #(64 +56)]
+.Lchacha_blocks_neon_rounds1:
+	ldr r6, [sp, #0]
+	vadd.i32 q0, q0, q1
+	add r0, r0, r4
+	vadd.i32 q4, q4, q5
+	add r1, r1, r5
+	vadd.i32 q8, q8, q9
+	eor r12, r12, r0
+	veor q12, q3, q0
+	eor r11, r11, r1
+	veor q13, q7, q4
+	ror r12, r12, #16
+	veor q14, q11, q8
+	ror r11, r11, #16
+	vrev32.16 q3, q12
+	subs r6, r6, #2
+	vrev32.16 q7, q13
+	add r8, r8, r12
+	vrev32.16 q11, q14
+	add r9, r9, r11
+	vadd.i32 q2, q2, q3
+	eor r4, r4, r8
+	vadd.i32 q6, q6, q7
+	eor r5, r5, r9
+	vadd.i32 q10, q10, q11
+	str r6, [sp, #0]
+	veor q12, q1, q2
+	ror r4, r4, #20
+	veor q13, q5, q6
+	ror r5, r5, #20
+	veor q14, q9, q10
+	add r0, r0, r4
+	vshl.i32 q1, q12, #12
+	add r1, r1, r5
+	vshl.i32 q5, q13, #12
+	ldr r6, [sp, #8]
+	vshl.i32 q9, q14, #12
+	eor r12, r12, r0
+	vsri.u32 q1, q12, #20
+	eor r11, r11, r1
+	vsri.u32 q5, q13, #20
+	ror r12, r12, #24
+	vsri.u32 q9, q14, #20
+	ror r11, r11, #24
+	vadd.i32 q0, q0, q1
+	add r8, r8, r12
+	vadd.i32 q4, q4, q5
+	add r9, r9, r11
+	vadd.i32 q8, q8, q9
+	eor r4, r4, r8
+	veor q12, q3, q0
+	eor r5, r5, r9
+	veor q13, q7, q4
+	str r11, [sp, #20]
+	veor q14, q11, q8
+	ror r4, r4, #25
+	vshl.i32 q3, q12, #8
+	ror r5, r5, #25
+	vshl.i32 q7, q13, #8
+	str r4, [sp, #4]
+	vshl.i32 q11, q14, #8
+	ldr r4, [sp, #28]
+	vsri.u32 q3, q12, #24
+	add r2, r2, r6
+	vsri.u32 q7, q13, #24
+	add r3, r3, r7
+	vsri.u32 q11, q14, #24
+	ldr r11, [sp, #12]
+	vadd.i32 q2, q2, q3
+	eor r14, r14, r2
+	vadd.i32 q6, q6, q7
+	eor r4, r4, r3
+	vadd.i32 q10, q10, q11
+	ror r14, r14, #16
+	veor q12, q1, q2
+	ror r4, r4, #16
+	veor q13, q5, q6
+	add r10, r10, r14
+	veor q14, q9, q10
+	add r11, r11, r4
+	vshl.i32 q1, q12, #7
+	eor r6, r6, r10
+	vshl.i32 q5, q13, #7
+	eor r7, r7, r11
+	vshl.i32 q9, q14, #7
+	ror r6, r6, #20
+	vsri.u32 q1, q12, #25
+	ror r7, r7, #20
+	vsri.u32 q5, q13, #25
+	add r2, r2, r6
+	vsri.u32 q9, q14, #25
+	add r3, r3, r7
+	vext.32 q3, q3, q3, #3
+	eor r14, r14, r2
+	vext.32 q7, q7, q7, #3
+	eor r4, r4, r3
+	vext.32 q11, q11, q11, #3
+	ror r14, r14, #24
+	vext.32 q1, q1, q1, #1
+	ror r4, r4, #24
+	vext.32 q5, q5, q5, #1
+	add r10, r10, r14
+	vext.32 q9, q9, q9, #1
+	add r11, r11, r4
+	vext.32 q2, q2, q2, #2
+	eor r6, r6, r10
+	vext.32 q6, q6, q6, #2
+	eor r7, r7, r11
+	vext.32 q10, q10, q10, #2
+	ror r6, r6, #25
+	vadd.i32 q0, q0, q1
+	ror r7, r7, #25
+	vadd.i32 q4, q4, q5
+	add r0, r0, r5
+	vadd.i32 q8, q8, q9
+	add r1, r1, r6
+	veor q12, q3, q0
+	eor r4, r4, r0
+	veor q13, q7, q4
+	eor r12, r12, r1
+	veor q14, q11, q8
+	ror r4, r4, #16
+	vrev32.16 q3, q12
+	ror r12, r12, #16
+	vrev32.16 q7, q13
+	add r10, r10, r4
+	vrev32.16 q11, q14
+	add r11, r11, r12
+	vadd.i32 q2, q2, q3
+	eor r5, r5, r10
+	vadd.i32 q6, q6, q7
+	eor r6, r6, r11
+	vadd.i32 q10, q10, q11
+	ror r5, r5, #20
+	veor q12, q1, q2
+	ror r6, r6, #20
+	veor q13, q5, q6
+	add r0, r0, r5
+	veor q14, q9, q10
+	add r1, r1, r6
+	vshl.i32 q1, q12, #12
+	eor r4, r4, r0
+	vshl.i32 q5, q13, #12
+	eor r12, r12, r1
+	vshl.i32 q9, q14, #12
+	ror r4, r4, #24
+	vsri.u32 q1, q12, #20
+	ror r12, r12, #24
+	vsri.u32 q5, q13, #20
+	add r10, r10, r4
+	vsri.u32 q9, q14, #20
+	add r11, r11, r12
+	vadd.i32 q0, q0, q1
+	eor r5, r5, r10
+	vadd.i32 q4, q4, q5
+	eor r6, r6, r11
+	vadd.i32 q8, q8, q9
+	str r11, [sp, #12]
+	veor q12, q3, q0
+	ror r5, r5, #25
+	veor q13, q7, q4
+	ror r6, r6, #25
+	veor q14, q11, q8
+	str r4, [sp, #28]
+	vshl.i32 q3, q12, #8
+	ldr r4, [sp, #4]
+	vshl.i32 q7, q13, #8
+	add r2, r2, r7
+	vshl.i32 q11, q14, #8
+	add r3, r3, r4
+	vsri.u32 q3, q12, #24
+	ldr r11, [sp, #20]
+	vsri.u32 q7, q13, #24
+	eor r11, r11, r2
+	vsri.u32 q11, q14, #24
+	eor r14, r14, r3
+	vadd.i32 q2, q2, q3
+	ror r11, r11, #16
+	vadd.i32 q6, q6, q7
+	ror r14, r14, #16
+	vadd.i32 q10, q10, q11
+	add r8, r8, r11
+	veor q12, q1, q2
+	add r9, r9, r14
+	veor q13, q5, q6
+	eor r7, r7, r8
+	veor q14, q9, q10
+	eor r4, r4, r9
+	vshl.i32 q1, q12, #7
+	ror r7, r7, #20
+	vshl.i32 q5, q13, #7
+	ror r4, r4, #20
+	vshl.i32 q9, q14, #7
+	str r6, [sp, #8]
+	vsri.u32 q1, q12, #25
+	add r2, r2, r7
+	vsri.u32 q5, q13, #25
+	add r3, r3, r4
+	vsri.u32 q9, q14, #25
+	eor r11, r11, r2
+	vext.32 q3, q3, q3, #1
+	eor r14, r14, r3
+	vext.32 q7, q7, q7, #1
+	ror r11, r11, #24
+	vext.32 q11, q11, q11, #1
+	ror r14, r14, #24
+	vext.32 q1, q1, q1, #3
+	add r8, r8, r11
+	vext.32 q5, q5, q5, #3
+	add r9, r9, r14
+	vext.32 q9, q9, q9, #3
+	eor r7, r7, r8
+	vext.32 q2, q2, q2, #2
+	eor r4, r4, r9
+	vext.32 q6, q6, q6, #2
+	ror r7, r7, #25
+	vext.32 q10, q10, q10, #2
+	ror r4, r4, #25
+	bne .Lchacha_blocks_neon_rounds1
+	str r8, [sp, #0]
+	str r9, [sp, #4]
+	str r10, [sp, #8]
+	str r12, [sp, #16]
+	str r11, [sp, #20]
+	str r14, [sp, #24]
+	add r9, sp, #64
+	vld1.32 {q12,q13}, [r9,:128]!
+	ldr r12, [sp, #48]
+	vld1.32 {q14,q15}, [r9,:128]
+	ldr r14, [sp, #40]
+	vadd.i32 q0, q0, q12
+	ldr r8, [sp, #(64 +0)]
+	vadd.i32 q4, q4, q12
+	ldr r9, [sp, #(64 +4)]
+	vadd.i32 q8, q8, q12
+	ldr r10, [sp, #(64 +8)]
+	vadd.i32 q1, q1, q13
+	ldr r11, [sp, #(64 +12)]
+	vadd.i32 q5, q5, q13
+	add r0, r0, r8
+	vadd.i32 q9, q9, q13
+	add r1, r1, r9
+	vadd.i32 q2, q2, q14
+	add r2, r2, r10
+	vadd.i32 q6, q6, q14
+	ldr r8, [sp, #(64 +16)]
+	vadd.i32 q10, q10, q14
+	add r3, r3, r11
+	veor q14, q14, q14
+	ldr r9, [sp, #(64 +20)]
+	mov r11, #1
+	add r4, r4, r8
+	vmov.32 d28[0], r11
+	ldr r10, [sp, #(64 +24)]
+	vadd.u64 q12, q14, q15
+	add r5, r5, r9
+	vadd.u64 q13, q14, q12
+	ldr r11, [sp, #(64 +28)]
+	vadd.u64 q14, q14, q13
+	add r6, r6, r10
+	vadd.i32 q3, q3, q12
+	tst r12, r12
+	vadd.i32 q7, q7, q13
+	add r7, r7, r11
+	vadd.i32 q11, q11, q14
+	beq .Lchacha_blocks_neon_nomessage11
+	ldmia r12!, {r8-r11}
+	eor r0, r0, r8
+	eor r1, r1, r9
+	eor r2, r2, r10
+	ldr r8, [r12, #0]
+	eor r3, r3, r11
+	ldr r9, [r12, #4]
+	eor r4, r4, r8
+	ldr r10, [r12, #8]
+	eor r5, r5, r9
+	ldr r11, [r12, #12]
+	eor r6, r6, r10
+	add r12, r12, #16
+	eor r7, r7, r11
+.Lchacha_blocks_neon_nomessage11:
+	stmia r14!, {r0-r7}
+	ldm sp, {r0-r7}
+	ldr r8, [sp, #(64 +32)]
+	ldr r9, [sp, #(64 +36)]
+	ldr r10, [sp, #(64 +40)]
+	ldr r11, [sp, #(64 +44)]
+	add r0, r0, r8
+	add r1, r1, r9
+	add r2, r2, r10
+	ldr r8, [sp, #(64 +48)]
+	add r3, r3, r11
+	ldr r9, [sp, #(64 +52)]
+	add r4, r4, r8
+	ldr r10, [sp, #(64 +56)]
+	add r5, r5, r9
+	ldr r11, [sp, #(64 +60)]
+	add r6, r6, r10
+	adds r8, r8, #4
+	add r7, r7, r11
+	adc r9, r9, #0
+	str r8, [sp, #(64 +48)]
+	tst r12, r12
+	str r9, [sp, #(64 +52)]
+	beq .Lchacha_blocks_neon_nomessage12
+	ldmia r12!, {r8-r11}
+	eor r0, r0, r8
+	eor r1, r1, r9
+	eor r2, r2, r10
+	ldr r8, [r12, #0]
+	eor r3, r3, r11
+	ldr r9, [r12, #4]
+	eor r4, r4, r8
+	ldr r10, [r12, #8]
+	eor r5, r5, r9
+	ldr r11, [r12, #12]
+	eor r6, r6, r10
+	add r12, r12, #16
+	eor r7, r7, r11
+.Lchacha_blocks_neon_nomessage12:
+	stmia r14!, {r0-r7}
+	beq .Lchacha_blocks_neon_nomessage13
+	vld1.32 {q12,q13}, [r12]!
+	vld1.32 {q14,q15}, [r12]!
+	veor q0, q0, q12
+	veor q1, q1, q13
+	veor q2, q2, q14
+	veor q3, q3, q15
+.Lchacha_blocks_neon_nomessage13:
+	vst1.32 {q0,q1}, [r14]!
+	vst1.32 {q2,q3}, [r14]!
+	beq .Lchacha_blocks_neon_nomessage14
+	vld1.32 {q12,q13}, [r12]!
+	vld1.32 {q14,q15}, [r12]!
+	veor q4, q4, q12
+	veor q5, q5, q13
+	veor q6, q6, q14
+	veor q7, q7, q15
+.Lchacha_blocks_neon_nomessage14:
+	vst1.32 {q4,q5}, [r14]!
+	vst1.32 {q6,q7}, [r14]!
+	beq .Lchacha_blocks_neon_nomessage15
+	vld1.32 {q12,q13}, [r12]!
+	vld1.32 {q14,q15}, [r12]!
+	veor q8, q8, q12
+	veor q9, q9, q13
+	veor q10, q10, q14
+	veor q11, q11, q15
+.Lchacha_blocks_neon_nomessage15:
+	vst1.32 {q8,q9}, [r14]!
+	vst1.32 {q10,q11}, [r14]!
+	str r12, [sp, #48]
+	str r14, [sp, #40]
+	ldr r3, [sp, #52]
+	sub r3, r3, #256
+	cmp r3, #256
+	str r3, [sp, #52]
+	bhs .Lchacha_blocks_neon_mainloop1
+	tst r3, r3
+	beq .Lchacha_blocks_neon_done
+.Lchacha_blocks_neon_mainloop2:
+	ldr r3, [sp, #52]
+	ldr r1, [sp, #48]
+	cmp r3, #64
+	bhs .Lchacha_blocks_neon_noswap1
+	add r4, sp, #128
+	mov r5, r4
+	tst r1, r1
+	beq .Lchacha_blocks_neon_nocopy1
+.Lchacha_blocks_neon_copyinput1:
+	subs r3, r3, #1
+	ldrb r0, [r1], #1
+	strb r0, [r4], #1
+	bne .Lchacha_blocks_neon_copyinput1
+	str r5, [sp, #48]
+.Lchacha_blocks_neon_nocopy1:
+	ldr r4, [sp, #40]
+	str r5, [sp, #40]
+	str r4, [sp, #56]
+.Lchacha_blocks_neon_noswap1:
+	ldr r0, [sp, #44]
+	str r0, [sp, #0]
+	add r0, sp, #64
+	ldm r0, {r0-r12}
+	ldr r14, [sp, #(64 +60)]
+	str r6, [sp, #8]
+	str r11, [sp, #12]
+	str r14, [sp, #28]
+	ldr r11, [sp, #(64 +52)]
+	ldr r14, [sp, #(64 +56)]
+.Lchacha_blocks_neon_rounds2:
+	ldr r6, [sp, #0]
+	add r0, r0, r4
+	add r1, r1, r5
+	eor r12, r12, r0
+	eor r11, r11, r1
+	ror r12, r12, #16
+	ror r11, r11, #16
+	subs r6, r6, #2
+	add r8, r8, r12
+	add r9, r9, r11
+	eor r4, r4, r8
+	eor r5, r5, r9
+	str r6, [sp, #0]
+	ror r4, r4, #20
+	ror r5, r5, #20
+	add r0, r0, r4
+	add r1, r1, r5
+	ldr r6, [sp, #8]
+	eor r12, r12, r0
+	eor r11, r11, r1
+	ror r12, r12, #24
+	ror r11, r11, #24
+	add r8, r8, r12
+	add r9, r9, r11
+	eor r4, r4, r8
+	eor r5, r5, r9
+	str r11, [sp, #20]
+	ror r4, r4, #25
+	ror r5, r5, #25
+	str r4, [sp, #4]
+	ldr r4, [sp, #28]
+	add r2, r2, r6
+	add r3, r3, r7
+	ldr r11, [sp, #12]
+	eor r14, r14, r2
+	eor r4, r4, r3
+	ror r14, r14, #16
+	ror r4, r4, #16
+	add r10, r10, r14
+	add r11, r11, r4
+	eor r6, r6, r10
+	eor r7, r7, r11
+	ror r6, r6, #20
+	ror r7, r7, #20
+	add r2, r2, r6
+	add r3, r3, r7
+	eor r14, r14, r2
+	eor r4, r4, r3
+	ror r14, r14, #24
+	ror r4, r4, #24
+	add r10, r10, r14
+	add r11, r11, r4
+	eor r6, r6, r10
+	eor r7, r7, r11
+	ror r6, r6, #25
+	ror r7, r7, #25
+	add r0, r0, r5
+	add r1, r1, r6
+	eor r4, r4, r0
+	eor r12, r12, r1
+	ror r4, r4, #16
+	ror r12, r12, #16
+	add r10, r10, r4
+	add r11, r11, r12
+	eor r5, r5, r10
+	eor r6, r6, r11
+	ror r5, r5, #20
+	ror r6, r6, #20
+	add r0, r0, r5
+	add r1, r1, r6
+	eor r4, r4, r0
+	eor r12, r12, r1
+	ror r4, r4, #24
+	ror r12, r12, #24
+	add r10, r10, r4
+	add r11, r11, r12
+	eor r5, r5, r10
+	eor r6, r6, r11
+	str r11, [sp, #12]
+	ror r5, r5, #25
+	ror r6, r6, #25
+	str r4, [sp, #28]
+	ldr r4, [sp, #4]
+	add r2, r2, r7
+	add r3, r3, r4
+	ldr r11, [sp, #20]
+	eor r11, r11, r2
+	eor r14, r14, r3
+	ror r11, r11, #16
+	ror r14, r14, #16
+	add r8, r8, r11
+	add r9, r9, r14
+	eor r7, r7, r8
+	eor r4, r4, r9
+	ror r7, r7, #20
+	ror r4, r4, #20
+	str r6, [sp, #8]
+	add r2, r2, r7
+	add r3, r3, r4
+	eor r11, r11, r2
+	eor r14, r14, r3
+	ror r11, r11, #24
+	ror r14, r14, #24
+	add r8, r8, r11
+	add r9, r9, r14
+	eor r7, r7, r8
+	eor r4, r4, r9
+	ror r7, r7, #25
+	ror r4, r4, #25
+	bne .Lchacha_blocks_neon_rounds2
+	str r8, [sp, #0]
+	str r9, [sp, #4]
+	str r10, [sp, #8]
+	str r12, [sp, #16]
+	str r11, [sp, #20]
+	str r14, [sp, #24]
+	ldr r12, [sp, #48]
+	ldr r14, [sp, #40]
+	ldr r8, [sp, #(64 +0)]
+	ldr r9, [sp, #(64 +4)]
+	ldr r10, [sp, #(64 +8)]
+	ldr r11, [sp, #(64 +12)]
+	add r0, r0, r8
+	add r1, r1, r9
+	add r2, r2, r10
+	ldr r8, [sp, #(64 +16)]
+	add r3, r3, r11
+	ldr r9, [sp, #(64 +20)]
+	add r4, r4, r8
+	ldr r10, [sp, #(64 +24)]
+	add r5, r5, r9
+	ldr r11, [sp, #(64 +28)]
+	add r6, r6, r10
+	tst r12, r12
+	add r7, r7, r11
+	beq .Lchacha_blocks_neon_nomessage21
+	ldmia r12!, {r8-r11}
+	eor r0, r0, r8
+	eor r1, r1, r9
+	eor r2, r2, r10
+	ldr r8, [r12, #0]
+	eor r3, r3, r11
+	ldr r9, [r12, #4]
+	eor r4, r4, r8
+	ldr r10, [r12, #8]
+	eor r5, r5, r9
+	ldr r11, [r12, #12]
+	eor r6, r6, r10
+	add r12, r12, #16
+	eor r7, r7, r11
+.Lchacha_blocks_neon_nomessage21:
+	stmia r14!, {r0-r7}
+	ldm sp, {r0-r7}
+	ldr r8, [sp, #(64 +32)]
+	ldr r9, [sp, #(64 +36)]
+	ldr r10, [sp, #(64 +40)]
+	ldr r11, [sp, #(64 +44)]
+	add r0, r0, r8
+	add r1, r1, r9
+	add r2, r2, r10
+	ldr r8, [sp, #(64 +48)]
+	add r3, r3, r11
+	ldr r9, [sp, #(64 +52)]
+	add r4, r4, r8
+	ldr r10, [sp, #(64 +56)]
+	add r5, r5, r9
+	ldr r11, [sp, #(64 +60)]
+	add r6, r6, r10
+	adds r8, r8, #1
+	add r7, r7, r11
+	adc r9, r9, #0
+	str r8, [sp, #(64 +48)]
+	tst r12, r12
+	str r9, [sp, #(64 +52)]
+	beq .Lchacha_blocks_neon_nomessage22
+	ldmia r12!, {r8-r11}
+	eor r0, r0, r8
+	eor r1, r1, r9
+	eor r2, r2, r10
+	ldr r8, [r12, #0]
+	eor r3, r3, r11
+	ldr r9, [r12, #4]
+	eor r4, r4, r8
+	ldr r10, [r12, #8]
+	eor r5, r5, r9
+	ldr r11, [r12, #12]
+	eor r6, r6, r10
+	add r12, r12, #16
+	eor r7, r7, r11
+.Lchacha_blocks_neon_nomessage22:
+	stmia r14!, {r0-r7}
+	str r12, [sp, #48]
+	str r14, [sp, #40]
+	ldr r3, [sp, #52]
+	cmp r3, #64
+	sub r4, r3, #64
+	str r4, [sp, #52]
+	bhi .Lchacha_blocks_neon_mainloop2
+	cmp r3, #64
+	beq .Lchacha_blocks_neon_nocopy2
+	ldr r1, [sp, #56]
+	sub r14, r14, #64
+.Lchacha_blocks_neon_copyinput2:
+	subs r3, r3, #1
+	ldrb r0, [r14], #1
+	strb r0, [r1], #1
+	bne .Lchacha_blocks_neon_copyinput2
+.Lchacha_blocks_neon_nocopy2:
+.Lchacha_blocks_neon_done:
+	ldr r7, [sp, #60]
+	ldr r8, [sp, #(64 +48)]
+	ldr r9, [sp, #(64 +52)]
+	str r8, [r7, #(48 + 0)]
+	str r9, [r7, #(48 + 4)]
+	mov r12, sp
+	stmia r12!, {r0-r7}
+	add r12, r12, #48
+	stmia r12!, {r0-r7}
+	sub r0, sp, #8
+	ldr sp, [sp, #192]
+	ldmfd sp!, {r4-r12, r14}
+	vldm sp!, {q4-q7}
+	sub r0, sp, r0
+	bx lr
+.Lchacha_blocks_neon_nobytes:
+	mov r0, #0;
+	bx lr
+.ltorg
+.size _gcry_chacha20_armv7_neon_blocks,.-_gcry_chacha20_armv7_neon_blocks;
+
+#endif
diff --git a/cipher/chacha20.c b/cipher/chacha20.c
index ebba2fc..c1847aa 100644
--- a/cipher/chacha20.c
+++ b/cipher/chacha20.c
@@ -67,6 +67,16 @@
 # define USE_AVX2 1
 #endif

+/* USE_NEON indicates whether to enable ARM NEON assembly code. */
+#undef USE_NEON
+#ifdef ENABLE_NEON_SUPPORT
+# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \
+     && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \
+     && defined(HAVE_GCC_INLINE_ASM_NEON)
+#  define USE_NEON 1
+# endif
+#endif /*ENABLE_NEON_SUPPORT*/
+

 struct CHACHA20_context_s;

@@ -104,6 +114,13 @@ unsigned int _gcry_chacha20_amd64_avx2_blocks(u32 *state, const byte *in,

 #endif /* USE_AVX2 */

+#ifdef USE_NEON
+
+unsigned int _gcry_chacha20_armv7_neon_blocks(u32 *state, const byte *in,
+                                              byte *out, size_t bytes);
+
+#endif /* USE_NEON */
+

 static void chacha20_setiv (void *context, const byte * iv, size_t ivlen);
 static const char *selftest (void);
@@ -353,6 +370,10 @@ chacha20_do_setkey (CHACHA20_context_t * ctx,
   if (features & HWF_INTEL_AVX2)
     ctx->blocks = _gcry_chacha20_amd64_avx2_blocks;
 #endif
+#ifdef USE_NEON
+  if (features & HWF_ARM_NEON)
+    ctx->blocks = _gcry_chacha20_armv7_neon_blocks;
+#endif

   (void)features;

@@ -541,6 +562,19 @@ selftest (void)
     if (buf[i] != (byte) i)
       return "ChaCha20 encryption test 2 failed.";

+  chacha20_setkey (&ctx, key_1, sizeof key_1);
+  chacha20_setiv (&ctx, nonce_1, sizeof nonce_1);
+  /* encrypt */
+  for (i = 0; i < sizeof buf; i++)
+    chacha20_encrypt_stream (&ctx, &buf[i], &buf[i], 1);
+  /* decrypt */
+  chacha20_setkey (&ctx, key_1, sizeof key_1);
+  chacha20_setiv (&ctx, nonce_1, sizeof nonce_1);
+  chacha20_encrypt_stream (&ctx, buf, buf, sizeof buf);
+  for (i = 0; i < sizeof buf; i++)
+    if (buf[i] != (byte) i)
+      return "ChaCha20 encryption test 3 failed.";
+
   return NULL;
 }

diff --git a/configure.ac b/configure.ac
index d14b7f6..60ed015 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1822,6 +1822,11 @@ if test "$found" = "1" ; then
          GCRYPT_CIPHERS="$GCRYPT_CIPHERS chacha20-avx2-amd64.lo"
       ;;
    esac
+
+   if test x"$neonsupport" = xyes ; then
+     # Build with the NEON implementation
+     GCRYPT_CIPHERS="$GCRYPT_CIPHERS chacha20-armv7-neon.lo"
+   fi
 fi

 case "${host}" in

-----------------------------------------------------------------------

Summary of changes:
 cipher/Makefile.am           |    3 +-
 cipher/chacha20-armv7-neon.S |  710 ++++++++++++++++++++++++++++++++++++++++++
 cipher/chacha20.c            |   34 ++
 cipher/poly1305-armv7-neon.S |  705 +++++++++++++++++++++++++++++++++++++++++
 cipher/poly1305-internal.h   |   18 ++
 cipher/poly1305.c            |   23 ++
 configure.ac                 |   10 +
 src/hwf-arm.c                |   57 +++-
 8 files changed, 1556 insertions(+), 4 deletions(-)
 create mode 100644 cipher/chacha20-armv7-neon.S
 create mode 100644 cipher/poly1305-armv7-neon.S

hooks/post-receive
--

-- 
The GNU crypto library
http://git.gnupg.org

_______________________________________________
Gnupg-commits mailing list
Gnupg-commits <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits
Jussi Kivilinna | 2 Nov 17:52 2014

[PATCH 1/3] chacha20: add ARMv7/NEON implementation

* cipher/Makefile.am: Add 'chacha20-armv7-neon.S'.
* cipher/chacha20-armv7-neon.S: New.
* cipher/chacha20.c (USE_NEON): New.
[USE_NEON] (_gcry_chacha20_armv7_neon_blocks): New.
(chacha20_do_setkey) [USE_NEON]: Use Neon implementation if
HWF_ARM_NEON flag set.
(selftest): Self-test encrypting buffer byte by byte.
* configure.ac [neonsupport=yes]: Add 'chacha20-armv7-neon.lo'.
--

Add Andrew Moon's public domain ARMv7/NEON implementation of ChaCha20. Original
source is available at: https://github.com/floodyberry/chacha-opt

Benchmark on Cortex-A8 (--cpu-mhz 1008):

Old:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |     13.45 ns/B     70.92 MiB/s     13.56 c/B
     STREAM dec |     13.45 ns/B     70.90 MiB/s     13.56 c/B

New:
 CHACHA20       |  nanosecs/byte   mebibytes/sec   cycles/byte
     STREAM enc |      6.20 ns/B     153.9 MiB/s      6.25 c/B
     STREAM dec |      6.20 ns/B     153.9 MiB/s      6.25 c/B
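As a quick consistency check on the table: at the stated 1008 MHz, the columns are related by cycles/byte = ns/byte × clock (GHz) and MiB/s = 10^9 / (ns/byte × 2^20):

$$
\begin{aligned}
\text{Old:}\quad & 13.45 \times 1.008 \approx 13.56~\text{c/B}, && \frac{10^9}{13.45 \times 2^{20}} \approx 70.9~\text{MiB/s} \\
\text{New:}\quad & 6.20 \times 1.008 \approx 6.25~\text{c/B}, && \frac{10^9}{6.20 \times 2^{20}} \approx 153.8~\text{MiB/s} \\
\text{Speedup:}\quad & \frac{13.45}{6.20} \approx 2.17
\end{aligned}
$$

The small difference from the table's 153.9 MiB/s comes from rounding in the ns/B column.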

Signed-off-by: Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
---
 cipher/Makefile.am           |    1 
 cipher/chacha20-armv7-neon.S |  710 ++++++++++++++++++++++++++++++++++++++++++
 cipher/chacha20.c            |   34 ++
 configure.ac                 |    5 
 4 files changed, 750 insertions(+)
 create mode 100644 cipher/chacha20-armv7-neon.S

diff --git a/cipher/Makefile.am b/cipher/Makefile.am
index 7f45cbb..09ccaf9 100644
--- a/cipher/Makefile.am
+++ b/cipher/Makefile.am
@@ -61,6 +61,7 @@ arcfour.c arcfour-amd64.S \
 blowfish.c blowfish-amd64.S blowfish-arm.S \
 cast5.c cast5-amd64.S cast5-arm.S \
 chacha20.c chacha20-sse2-amd64.S chacha20-ssse3-amd64.S chacha20-avx2-amd64.S \
+  chacha20-armv7-neon.S \
 crc.c \
 des.c des-amd64.S \
 dsa.c \
diff --git a/cipher/chacha20-armv7-neon.S b/cipher/chacha20-armv7-neon.S
new file mode 100644
index 0000000..1a395ba
--- /dev/null
+++ b/cipher/chacha20-armv7-neon.S
@@ -0,0 +1,710 @@
+/* chacha20-armv7-neon.S - ARM/NEON accelerated chacha20 blocks function
+ *
+ * Copyright (C) 2014 Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
+ *
+ * This file is part of Libgcrypt.
+ *
+ * Libgcrypt is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as
+ * published by the Free Software Foundation; either version 2.1 of
+ * the License, or (at your option) any later version.
+ *
+ * Libgcrypt is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * Based on public domain implementation by Andrew Moon at
+ *  https://github.com/floodyberry/chacha-opt
+ */
+
+#include <config.h>
+
+#if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) && \
+    defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) && \
+    defined(HAVE_GCC_INLINE_ASM_NEON) && defined(USE_CHACHA20)
+
+.syntax unified
+.fpu neon
+.arm
+
+.text
+
+.globl _gcry_chacha20_armv7_neon_blocks
+.type  _gcry_chacha20_armv7_neon_blocks,%function;
+_gcry_chacha20_armv7_neon_blocks:
+.Lchacha_blocks_neon_local:
+	tst r3, r3
+	beq .Lchacha_blocks_neon_nobytes
+	vstmdb sp!, {q4,q5,q6,q7}
+	stmfd sp!, {r4-r12, r14}
+	mov r8, sp
+	sub sp, sp, #196
+	and sp, sp, #0xffffffe0
+	str r0, [sp, #60]
+	str r1, [sp, #48]
+	str r2, [sp, #40]
+	str r3, [sp, #52]
+	str r8, [sp, #192]
+	add r1, sp, #64
+	ldmia r0!, {r4-r11}
+	stmia r1!, {r4-r11}
+	ldmia r0!, {r4-r11}
+	stmia r1!, {r4-r11}
+	mov r4, #20
+	str r4, [sp, #44]
+	cmp r3, #256
+	blo .Lchacha_blocks_neon_mainloop2
+.Lchacha_blocks_neon_mainloop1:
+	ldr r0, [sp, #44]
+	str r0, [sp, #0]
+	add r1, sp, #(64)
+	mov r2, #1
+	veor q12, q12
+	vld1.32 {q0,q1}, [r1,:128]!
+	vld1.32 {q2,q3}, [r1,:128]
+	vmov.32 d24[0], r2
+	vadd.u64 q3, q3, q12
+	vmov q4, q0
+	vmov q5, q1
+	vmov q6, q2
+	vadd.u64 q7, q3, q12
+	vmov q8, q0
+	vmov q9, q1
+	vmov q10, q2
+	vadd.u64 q11, q7, q12
+	add r0, sp, #64
+	ldm r0, {r0-r12}
+	ldr r14, [sp, #(64 +60)]
+	str r6, [sp, #8]
+	str r11, [sp, #12]
+	str r14, [sp, #28]
+	ldr r11, [sp, #(64 +52)]
+	ldr r14, [sp, #(64 +56)]
+.Lchacha_blocks_neon_rounds1:
+	ldr r6, [sp, #0]
+	vadd.i32 q0, q0, q1
+	add r0, r0, r4
+	vadd.i32 q4, q4, q5
+	add r1, r1, r5
+	vadd.i32 q8, q8, q9
+	eor r12, r12, r0
+	veor q12, q3, q0
+	eor r11, r11, r1
+	veor q13, q7, q4
+	ror r12, r12, #16
+	veor q14, q11, q8
+	ror r11, r11, #16
+	vrev32.16 q3, q12
+	subs r6, r6, #2
+	vrev32.16 q7, q13
+	add r8, r8, r12
+	vrev32.16 q11, q14
+	add r9, r9, r11
+	vadd.i32 q2, q2, q3
+	eor r4, r4, r8
+	vadd.i32 q6, q6, q7
+	eor r5, r5, r9
+	vadd.i32 q10, q10, q11
+	str r6, [sp, #0]
+	veor q12, q1, q2
+	ror r4, r4, #20
+	veor q13, q5, q6
+	ror r5, r5, #20
+	veor q14, q9, q10
+	add r0, r0, r4
+	vshl.i32 q1, q12, #12
+	add r1, r1, r5
+	vshl.i32 q5, q13, #12
+	ldr r6, [sp, #8]
+	vshl.i32 q9, q14, #12
+	eor r12, r12, r0
+	vsri.u32 q1, q12, #20
+	eor r11, r11, r1
+	vsri.u32 q5, q13, #20
+	ror r12, r12, #24
+	vsri.u32 q9, q14, #20
+	ror r11, r11, #24
+	vadd.i32 q0, q0, q1
+	add r8, r8, r12
+	vadd.i32 q4, q4, q5
+	add r9, r9, r11
+	vadd.i32 q8, q8, q9
+	eor r4, r4, r8
+	veor q12, q3, q0
+	eor r5, r5, r9
+	veor q13, q7, q4
+	str r11, [sp, #20]
+	veor q14, q11, q8
+	ror r4, r4, #25
+	vshl.i32 q3, q12, #8
+	ror r5, r5, #25
+	vshl.i32 q7, q13, #8
+	str r4, [sp, #4]
+	vshl.i32 q11, q14, #8
+	ldr r4, [sp, #28]
+	vsri.u32 q3, q12, #24
+	add r2, r2, r6
+	vsri.u32 q7, q13, #24
+	add r3, r3, r7
+	vsri.u32 q11, q14, #24
+	ldr r11, [sp, #12]
+	vadd.i32 q2, q2, q3
+	eor r14, r14, r2
+	vadd.i32 q6, q6, q7
+	eor r4, r4, r3
+	vadd.i32 q10, q10, q11
+	ror r14, r14, #16
+	veor q12, q1, q2
+	ror r4, r4, #16
+	veor q13, q5, q6
+	add r10, r10, r14
+	veor q14, q9, q10
+	add r11, r11, r4
+	vshl.i32 q1, q12, #7
+	eor r6, r6, r10
+	vshl.i32 q5, q13, #7
+	eor r7, r7, r11
+	vshl.i32 q9, q14, #7
+	ror r6, r6, #20
+	vsri.u32 q1, q12, #25
+	ror r7, r7, #20
+	vsri.u32 q5, q13, #25
+	add r2, r2, r6
+	vsri.u32 q9, q14, #25
+	add r3, r3, r7
+	vext.32 q3, q3, q3, #3
+	eor r14, r14, r2
+	vext.32 q7, q7, q7, #3
+	eor r4, r4, r3
+	vext.32 q11, q11, q11, #3
+	ror r14, r14, #24
+	vext.32 q1, q1, q1, #1
+	ror r4, r4, #24
+	vext.32 q5, q5, q5, #1
+	add r10, r10, r14
+	vext.32 q9, q9, q9, #1
+	add r11, r11, r4
+	vext.32 q2, q2, q2, #2
+	eor r6, r6, r10
+	vext.32 q6, q6, q6, #2
+	eor r7, r7, r11
+	vext.32 q10, q10, q10, #2
+	ror r6, r6, #25
+	vadd.i32 q0, q0, q1
+	ror r7, r7, #25
+	vadd.i32 q4, q4, q5
+	add r0, r0, r5
+	vadd.i32 q8, q8, q9
+	add r1, r1, r6
+	veor q12, q3, q0
+	eor r4, r4, r0
+	veor q13, q7, q4
+	eor r12, r12, r1
+	veor q14, q11, q8
+	ror r4, r4, #16
+	vrev32.16 q3, q12
+	ror r12, r12, #16
+	vrev32.16 q7, q13
+	add r10, r10, r4
+	vrev32.16 q11, q14
+	add r11, r11, r12
+	vadd.i32 q2, q2, q3
+	eor r5, r5, r10
+	vadd.i32 q6, q6, q7
+	eor r6, r6, r11
+	vadd.i32 q10, q10, q11
+	ror r5, r5, #20
+	veor q12, q1, q2
+	ror r6, r6, #20
+	veor q13, q5, q6
+	add r0, r0, r5
+	veor q14, q9, q10
+	add r1, r1, r6
+	vshl.i32 q1, q12, #12
+	eor r4, r4, r0
+	vshl.i32 q5, q13, #12
+	eor r12, r12, r1
+	vshl.i32 q9, q14, #12
+	ror r4, r4, #24
+	vsri.u32 q1, q12, #20
+	ror r12, r12, #24
+	vsri.u32 q5, q13, #20
+	add r10, r10, r4
+	vsri.u32 q9, q14, #20
+	add r11, r11, r12
+	vadd.i32 q0, q0, q1
+	eor r5, r5, r10
+	vadd.i32 q4, q4, q5
+	eor r6, r6, r11
+	vadd.i32 q8, q8, q9
+	str r11, [sp, #12]
+	veor q12, q3, q0
+	ror r5, r5, #25
+	veor q13, q7, q4
+	ror r6, r6, #25
+	veor q14, q11, q8
+	str r4, [sp, #28]
+	vshl.i32 q3, q12, #8
+	ldr r4, [sp, #4]
+	vshl.i32 q7, q13, #8
+	add r2, r2, r7
+	vshl.i32 q11, q14, #8
+	add r3, r3, r4
+	vsri.u32 q3, q12, #24
+	ldr r11, [sp, #20]
+	vsri.u32 q7, q13, #24
+	eor r11, r11, r2
+	vsri.u32 q11, q14, #24
+	eor r14, r14, r3
+	vadd.i32 q2, q2, q3
+	ror r11, r11, #16
+	vadd.i32 q6, q6, q7
+	ror r14, r14, #16
+	vadd.i32 q10, q10, q11
+	add r8, r8, r11
+	veor q12, q1, q2
+	add r9, r9, r14
+	veor q13, q5, q6
+	eor r7, r7, r8
+	veor q14, q9, q10
+	eor r4, r4, r9
+	vshl.i32 q1, q12, #7
+	ror r7, r7, #20
+	vshl.i32 q5, q13, #7
+	ror r4, r4, #20
+	vshl.i32 q9, q14, #7
+	str r6, [sp, #8]
+	vsri.u32 q1, q12, #25
+	add r2, r2, r7
+	vsri.u32 q5, q13, #25
+	add r3, r3, r4
+	vsri.u32 q9, q14, #25
+	eor r11, r11, r2
+	vext.32 q3, q3, q3, #1
+	eor r14, r14, r3
+	vext.32 q7, q7, q7, #1
+	ror r11, r11, #24
+	vext.32 q11, q11, q11, #1
+	ror r14, r14, #24
+	vext.32 q1, q1, q1, #3
+	add r8, r8, r11
+	vext.32 q5, q5, q5, #3
+	add r9, r9, r14
+	vext.32 q9, q9, q9, #3
+	eor r7, r7, r8
+	vext.32 q2, q2, q2, #2
+	eor r4, r4, r9
+	vext.32 q6, q6, q6, #2
+	ror r7, r7, #25
+	vext.32 q10, q10, q10, #2
+	ror r4, r4, #25
+	bne .Lchacha_blocks_neon_rounds1
+	str r8, [sp, #0]
+	str r9, [sp, #4]
+	str r10, [sp, #8]
+	str r12, [sp, #16]
+	str r11, [sp, #20]
+	str r14, [sp, #24]
+	add r9, sp, #64
+	vld1.32 {q12,q13}, [r9,:128]!
+	ldr r12, [sp, #48]
+	vld1.32 {q14,q15}, [r9,:128]
+	ldr r14, [sp, #40]
+	vadd.i32 q0, q0, q12
+	ldr r8, [sp, #(64 +0)]
+	vadd.i32 q4, q4, q12
+	ldr r9, [sp, #(64 +4)]
+	vadd.i32 q8, q8, q12
+	ldr r10, [sp, #(64 +8)]
+	vadd.i32 q1, q1, q13
+	ldr r11, [sp, #(64 +12)]
+	vadd.i32 q5, q5, q13
+	add r0, r0, r8
+	vadd.i32 q9, q9, q13
+	add r1, r1, r9
+	vadd.i32 q2, q2, q14
+	add r2, r2, r10
+	vadd.i32 q6, q6, q14
+	ldr r8, [sp, #(64 +16)]
+	vadd.i32 q10, q10, q14
+	add r3, r3, r11
+	veor q14, q14, q14
+	ldr r9, [sp, #(64 +20)]
+	mov r11, #1
+	add r4, r4, r8
+	vmov.32 d28[0], r11
+	ldr r10, [sp, #(64 +24)]
+	vadd.u64 q12, q14, q15
+	add r5, r5, r9
+	vadd.u64 q13, q14, q12
+	ldr r11, [sp, #(64 +28)]
+	vadd.u64 q14, q14, q13
+	add r6, r6, r10
+	vadd.i32 q3, q3, q12
+	tst r12, r12
+	vadd.i32 q7, q7, q13
+	add r7, r7, r11
+	vadd.i32 q11, q11, q14
+	beq .Lchacha_blocks_neon_nomessage11
+	ldmia r12!, {r8-r11}
+	eor r0, r0, r8
+	eor r1, r1, r9
+	eor r2, r2, r10
+	ldr r8, [r12, #0]
+	eor r3, r3, r11
+	ldr r9, [r12, #4]
+	eor r4, r4, r8
+	ldr r10, [r12, #8]
+	eor r5, r5, r9
+	ldr r11, [r12, #12]
+	eor r6, r6, r10
+	add r12, r12, #16
+	eor r7, r7, r11
+.Lchacha_blocks_neon_nomessage11:
+	stmia r14!, {r0-r7}
+	ldm sp, {r0-r7}
+	ldr r8, [sp, #(64 +32)]
+	ldr r9, [sp, #(64 +36)]
+	ldr r10, [sp, #(64 +40)]
+	ldr r11, [sp, #(64 +44)]
+	add r0, r0, r8
+	add r1, r1, r9
+	add r2, r2, r10
+	ldr r8, [sp, #(64 +48)]
+	add r3, r3, r11
+	ldr r9, [sp, #(64 +52)]
+	add r4, r4, r8
+	ldr r10, [sp, #(64 +56)]
+	add r5, r5, r9
+	ldr r11, [sp, #(64 +60)]
+	add r6, r6, r10
+	adds r8, r8, #4
+	add r7, r7, r11
+	adc r9, r9, #0
+	str r8, [sp, #(64 +48)]
+	tst r12, r12
+	str r9, [sp, #(64 +52)]
+	beq .Lchacha_blocks_neon_nomessage12
+	ldmia r12!, {r8-r11}
+	eor r0, r0, r8
+	eor r1, r1, r9
+	eor r2, r2, r10
+	ldr r8, [r12, #0]
+	eor r3, r3, r11
+	ldr r9, [r12, #4]
+	eor r4, r4, r8
+	ldr r10, [r12, #8]
+	eor r5, r5, r9
+	ldr r11, [r12, #12]
+	eor r6, r6, r10
+	add r12, r12, #16
+	eor r7, r7, r11
+.Lchacha_blocks_neon_nomessage12:
+	stmia r14!, {r0-r7}
+	beq .Lchacha_blocks_neon_nomessage13
+	vld1.32 {q12,q13}, [r12]!
+	vld1.32 {q14,q15}, [r12]!
+	veor q0, q0, q12
+	veor q1, q1, q13
+	veor q2, q2, q14
+	veor q3, q3, q15
+.Lchacha_blocks_neon_nomessage13:
+	vst1.32 {q0,q1}, [r14]!
+	vst1.32 {q2,q3}, [r14]!
+	beq .Lchacha_blocks_neon_nomessage14
+	vld1.32 {q12,q13}, [r12]!
+	vld1.32 {q14,q15}, [r12]!
+	veor q4, q4, q12
+	veor q5, q5, q13
+	veor q6, q6, q14
+	veor q7, q7, q15
+.Lchacha_blocks_neon_nomessage14:
+	vst1.32 {q4,q5}, [r14]!
+	vst1.32 {q6,q7}, [r14]!
+	beq .Lchacha_blocks_neon_nomessage15
+	vld1.32 {q12,q13}, [r12]!
+	vld1.32 {q14,q15}, [r12]!
+	veor q8, q8, q12
+	veor q9, q9, q13
+	veor q10, q10, q14
+	veor q11, q11, q15
+.Lchacha_blocks_neon_nomessage15:
+	vst1.32 {q8,q9}, [r14]!
+	vst1.32 {q10,q11}, [r14]!
+	str r12, [sp, #48]
+	str r14, [sp, #40]
+	ldr r3, [sp, #52]
+	sub r3, r3, #256
+	cmp r3, #256
+	str r3, [sp, #52]
+	bhs .Lchacha_blocks_neon_mainloop1
+	tst r3, r3
+	beq .Lchacha_blocks_neon_done
+.Lchacha_blocks_neon_mainloop2:
+	ldr r3, [sp, #52]
+	ldr r1, [sp, #48]
+	cmp r3, #64
+	bhs .Lchacha_blocks_neon_noswap1
+	add r4, sp, #128
+	mov r5, r4
+	tst r1, r1
+	beq .Lchacha_blocks_neon_nocopy1
+.Lchacha_blocks_neon_copyinput1:
+	subs r3, r3, #1
+	ldrb r0, [r1], #1
+	strb r0, [r4], #1
+	bne .Lchacha_blocks_neon_copyinput1
+	str r5, [sp, #48]
+.Lchacha_blocks_neon_nocopy1:
+	ldr r4, [sp, #40]
+	str r5, [sp, #40]
+	str r4, [sp, #56]
+.Lchacha_blocks_neon_noswap1:
+	ldr r0, [sp, #44]
+	str r0, [sp, #0]
+	add r0, sp, #64
+	ldm r0, {r0-r12}
+	ldr r14, [sp, #(64 +60)]
+	str r6, [sp, #8]
+	str r11, [sp, #12]
+	str r14, [sp, #28]
+	ldr r11, [sp, #(64 +52)]
+	ldr r14, [sp, #(64 +56)]
+.Lchacha_blocks_neon_rounds2:
+	ldr r6, [sp, #0]
+	add r0, r0, r4
+	add r1, r1, r5
+	eor r12, r12, r0
+	eor r11, r11, r1
+	ror r12, r12, #16
+	ror r11, r11, #16
+	subs r6, r6, #2
+	add r8, r8, r12
+	add r9, r9, r11
+	eor r4, r4, r8
+	eor r5, r5, r9
+	str r6, [sp, #0]
+	ror r4, r4, #20
+	ror r5, r5, #20
+	add r0, r0, r4
+	add r1, r1, r5
+	ldr r6, [sp, #8]
+	eor r12, r12, r0
+	eor r11, r11, r1
+	ror r12, r12, #24
+	ror r11, r11, #24
+	add r8, r8, r12
+	add r9, r9, r11
+	eor r4, r4, r8
+	eor r5, r5, r9
+	str r11, [sp, #20]
+	ror r4, r4, #25
+	ror r5, r5, #25
+	str r4, [sp, #4]
+	ldr r4, [sp, #28]
+	add r2, r2, r6
+	add r3, r3, r7
+	ldr r11, [sp, #12]
+	eor r14, r14, r2
+	eor r4, r4, r3
+	ror r14, r14, #16
+	ror r4, r4, #16
+	add r10, r10, r14
+	add r11, r11, r4
+	eor r6, r6, r10
+	eor r7, r7, r11
+	ror r6, r6, #20
+	ror r7, r7, #20
+	add r2, r2, r6
+	add r3, r3, r7
+	eor r14, r14, r2
+	eor r4, r4, r3
+	ror r14, r14, #24
+	ror r4, r4, #24
+	add r10, r10, r14
+	add r11, r11, r4
+	eor r6, r6, r10
+	eor r7, r7, r11
+	ror r6, r6, #25
+	ror r7, r7, #25
+	add r0, r0, r5
+	add r1, r1, r6
+	eor r4, r4, r0
+	eor r12, r12, r1
+	ror r4, r4, #16
+	ror r12, r12, #16
+	add r10, r10, r4
+	add r11, r11, r12
+	eor r5, r5, r10
+	eor r6, r6, r11
+	ror r5, r5, #20
+	ror r6, r6, #20
+	add r0, r0, r5
+	add r1, r1, r6
+	eor r4, r4, r0
+	eor r12, r12, r1
+	ror r4, r4, #24
+	ror r12, r12, #24
+	add r10, r10, r4
+	add r11, r11, r12
+	eor r5, r5, r10
+	eor r6, r6, r11
+	str r11, [sp, #12]
+	ror r5, r5, #25
+	ror r6, r6, #25
+	str r4, [sp, #28]
+	ldr r4, [sp, #4]
+	add r2, r2, r7
+	add r3, r3, r4
+	ldr r11, [sp, #20]
+	eor r11, r11, r2
+	eor r14, r14, r3
+	ror r11, r11, #16
+	ror r14, r14, #16
+	add r8, r8, r11
+	add r9, r9, r14
+	eor r7, r7, r8
+	eor r4, r4, r9
+	ror r7, r7, #20
+	ror r4, r4, #20
+	str r6, [sp, #8]
+	add r2, r2, r7
+	add r3, r3, r4
+	eor r11, r11, r2
+	eor r14, r14, r3
+	ror r11, r11, #24
+	ror r14, r14, #24
+	add r8, r8, r11
+	add r9, r9, r14
+	eor r7, r7, r8
+	eor r4, r4, r9
+	ror r7, r7, #25
+	ror r4, r4, #25
+	bne .Lchacha_blocks_neon_rounds2
+	str r8, [sp, #0]
+	str r9, [sp, #4]
+	str r10, [sp, #8]
+	str r12, [sp, #16]
+	str r11, [sp, #20]
+	str r14, [sp, #24]
+	ldr r12, [sp, #48]
+	ldr r14, [sp, #40]
+	ldr r8, [sp, #(64 +0)]
+	ldr r9, [sp, #(64 +4)]
+	ldr r10, [sp, #(64 +8)]
+	ldr r11, [sp, #(64 +12)]
+	add r0, r0, r8
+	add r1, r1, r9
+	add r2, r2, r10
+	ldr r8, [sp, #(64 +16)]
+	add r3, r3, r11
+	ldr r9, [sp, #(64 +20)]
+	add r4, r4, r8
+	ldr r10, [sp, #(64 +24)]
+	add r5, r5, r9
+	ldr r11, [sp, #(64 +28)]
+	add r6, r6, r10
+	tst r12, r12
+	add r7, r7, r11
+	beq .Lchacha_blocks_neon_nomessage21
+	ldmia r12!, {r8-r11}
+	eor r0, r0, r8
+	eor r1, r1, r9
+	eor r2, r2, r10
+	ldr r8, [r12, #0]
+	eor r3, r3, r11
+	ldr r9, [r12, #4]
+	eor r4, r4, r8
+	ldr r10, [r12, #8]
+	eor r5, r5, r9
+	ldr r11, [r12, #12]
+	eor r6, r6, r10
+	add r12, r12, #16
+	eor r7, r7, r11
+.Lchacha_blocks_neon_nomessage21:
+	stmia r14!, {r0-r7}
+	ldm sp, {r0-r7}
+	ldr r8, [sp, #(64 +32)]
+	ldr r9, [sp, #(64 +36)]
+	ldr r10, [sp, #(64 +40)]
+	ldr r11, [sp, #(64 +44)]
+	add r0, r0, r8
+	add r1, r1, r9
+	add r2, r2, r10
+	ldr r8, [sp, #(64 +48)]
+	add r3, r3, r11
+	ldr r9, [sp, #(64 +52)]
+	add r4, r4, r8
+	ldr r10, [sp, #(64 +56)]
+	add r5, r5, r9
+	ldr r11, [sp, #(64 +60)]
+	add r6, r6, r10
+	adds r8, r8, #1
+	add r7, r7, r11
+	adc r9, r9, #0
+	str r8, [sp, #(64 +48)]
+	tst r12, r12
+	str r9, [sp, #(64 +52)]
+	beq .Lchacha_blocks_neon_nomessage22
+	ldmia r12!, {r8-r11}
+	eor r0, r0, r8
+	eor r1, r1, r9
+	eor r2, r2, r10
+	ldr r8, [r12, #0]
+	eor r3, r3, r11
+	ldr r9, [r12, #4]
+	eor r4, r4, r8
+	ldr r10, [r12, #8]
+	eor r5, r5, r9
+	ldr r11, [r12, #12]
+	eor r6, r6, r10
+	add r12, r12, #16
+	eor r7, r7, r11
+.Lchacha_blocks_neon_nomessage22:
+	stmia r14!, {r0-r7}
+	str r12, [sp, #48]
+	str r14, [sp, #40]
+	ldr r3, [sp, #52]
+	cmp r3, #64
+	sub r4, r3, #64
+	str r4, [sp, #52]
+	bhi .Lchacha_blocks_neon_mainloop2
+	cmp r3, #64
+	beq .Lchacha_blocks_neon_nocopy2
+	ldr r1, [sp, #56]
+	sub r14, r14, #64
+.Lchacha_blocks_neon_copyinput2:
+	subs r3, r3, #1
+	ldrb r0, [r14], #1
+	strb r0, [r1], #1
+	bne .Lchacha_blocks_neon_copyinput2
+.Lchacha_blocks_neon_nocopy2:
+.Lchacha_blocks_neon_done:
+	ldr r7, [sp, #60]
+	ldr r8, [sp, #(64 +48)]
+	ldr r9, [sp, #(64 +52)]
+	str r8, [r7, #(48 + 0)]
+	str r9, [r7, #(48 + 4)]
+	mov r12, sp
+	stmia r12!, {r0-r7}
+	add r12, r12, #48
+	stmia r12!, {r0-r7}
+	sub r0, sp, #8
+	ldr sp, [sp, #192]
+	ldmfd sp!, {r4-r12, r14}
+	vldm sp!, {q4-q7}
+	sub r0, sp, r0
+	bx lr
+.Lchacha_blocks_neon_nobytes:
+	mov r0, #0;
+	bx lr
+.ltorg
+.size _gcry_chacha20_armv7_neon_blocks,.-_gcry_chacha20_armv7_neon_blocks;
+
+#endif
diff --git a/cipher/chacha20.c b/cipher/chacha20.c
index ebba2fc..c1847aa 100644
--- a/cipher/chacha20.c
+++ b/cipher/chacha20.c
@@ -67,6 +67,16 @@
 # define USE_AVX2 1
 #endif

+/* USE_NEON indicates whether to enable ARM NEON assembly code. */
+#undef USE_NEON
+#ifdef ENABLE_NEON_SUPPORT
+# if defined(HAVE_ARM_ARCH_V6) && defined(__ARMEL__) \
+     && defined(HAVE_COMPATIBLE_GCC_ARM_PLATFORM_AS) \
+     && defined(HAVE_GCC_INLINE_ASM_NEON)
+#  define USE_NEON 1
+# endif
+#endif /*ENABLE_NEON_SUPPORT*/
+

 struct CHACHA20_context_s;

@@ -104,6 +114,13 @@ unsigned int _gcry_chacha20_amd64_avx2_blocks(u32 *state, const byte *in,

 #endif /* USE_AVX2 */

+#ifdef USE_NEON
+
+unsigned int _gcry_chacha20_armv7_neon_blocks(u32 *state, const byte *in,
+                                              byte *out, size_t bytes);
+
+#endif /* USE_NEON */
+

 static void chacha20_setiv (void *context, const byte * iv, size_t ivlen);
 static const char *selftest (void);
@@ -353,6 +370,10 @@ chacha20_do_setkey (CHACHA20_context_t * ctx,
   if (features & HWF_INTEL_AVX2)
     ctx->blocks = _gcry_chacha20_amd64_avx2_blocks;
 #endif
+#ifdef USE_NEON
+  if (features & HWF_ARM_NEON)
+    ctx->blocks = _gcry_chacha20_armv7_neon_blocks;
+#endif

   (void)features;

@@ -541,6 +562,19 @@ selftest (void)
     if (buf[i] != (byte) i)
       return "ChaCha20 encryption test 2 failed.";

+  chacha20_setkey (&ctx, key_1, sizeof key_1);
+  chacha20_setiv (&ctx, nonce_1, sizeof nonce_1);
+  /* encrypt */
+  for (i = 0; i < sizeof buf; i++)
+    chacha20_encrypt_stream (&ctx, &buf[i], &buf[i], 1);
+  /* decrypt */
+  chacha20_setkey (&ctx, key_1, sizeof key_1);
+  chacha20_setiv (&ctx, nonce_1, sizeof nonce_1);
+  chacha20_encrypt_stream (&ctx, buf, buf, sizeof buf);
+  for (i = 0; i < sizeof buf; i++)
+    if (buf[i] != (byte) i)
+      return "ChaCha20 encryption test 3 failed.";
+
   return NULL;
 }

diff --git a/configure.ac b/configure.ac
index d14b7f6..60ed015 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1822,6 +1822,11 @@ if test "$found" = "1" ; then
          GCRYPT_CIPHERS="$GCRYPT_CIPHERS chacha20-avx2-amd64.lo"
       ;;
    esac
+
+   if test x"$neonsupport" = xyes ; then
+     # Build with the NEON implementation
+     GCRYPT_CIPHERS="$GCRYPT_CIPHERS chacha20-armv7-neon.lo"
+   fi
 fi

 case "${host}" in
And Sch | 13 Oct 16:47 2014

comparison between signed and unsigned integer

I recently added '-Wextra' to my compile flags and I get many of the following
warnings when compiling libgcrypt.

warning: comparison between signed and unsigned integer expressions [-Wsign-compare]

I have looked through them all, and most of them compare a signed counter
with a size_t or unsigned int, which should be benign. However, while
researching the warning I found that certain nasty bugs appear if the signed
int is ever negative:

http://www.jwwalker.com/pages/safe-compare.html

https://www.securecoding.cert.org/confluence/display/cplusplus/INT02-CPP.+Understand+integer+conversion+rules

Werner Koch said in the bug tracker that fixing these may introduce bugs, and I agree. It probably
isn't worthwhile, because there are no obvious bugs at the moment.

So here is my second proposal: why not add a call to assert() before the comparison to make sure the
signed int is not negative? AFAIK this shouldn't introduce any bugs, and it can be turned off globally by defining NDEBUG.

by Werner Koch | 9 Oct 08:31 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-122-g669a83b

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  669a83ba86c38b271d85ed4bf1cabc7cc8160583 (commit)
      from  23ecadf309f8056c35cc092e58df801ac0eab862 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 669a83ba86c38b271d85ed4bf1cabc7cc8160583
Author: Werner Koch <wk <at> gnupg.org>
Date:   Thu Oct 9 08:31:35 2014 +0200

    Register DCO for Markus Teich

    --

diff --git a/AUTHORS b/AUTHORS
index f72a421..e186a48 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -157,6 +157,9 @@ Jussi Kivilinna <jussi.kivilinna <at> mbnet.fi>
 Jussi Kivilinna <jussi.kivilinna <at> iki.fi>
 2013-05-06:5186720A.4090101 <at> iki.fi:

+Markus Teich <markus dot teich at stusta dot mhn dot de>
+2014-10-08:20141008180509.GA2770 <at> trolle:
+
 Milan Broz <gmazyland <at> gmail.com>
 2014-01-13:52D44CC6.4050707 <at> gmail.com:

-----------------------------------------------------------------------

Summary of changes:
 AUTHORS |    3 +++
 1 file changed, 3 insertions(+)

hooks/post-receive
--

-- 
The GNU crypto library
http://git.gnupg.org

_______________________________________________
Gnupg-commits mailing list
Gnupg-commits <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits
Vitezslav Cizek | 8 Oct 14:40 2014

FIPS 186-4 compliance patches for rsa/dsa/ecdsa

Hi,
The libgcrypt code isn't compliant with the latest FIPS 186-4 standard.
Some changes are necessary, especially in the key generation code.

I've created issue 1736.
(https://bugs.g10code.com/gnupg/issue1736)

Patches are attached there.
Can someone please review them?

--

-- 
Vita Cizek
_______________________________________________
Gcrypt-devel mailing list
Gcrypt-devel <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gcrypt-devel
Werner Koch | 8 Oct 15:01 2014

Re: [PATCH revised] Add gcry_mpi_ec_sub.

On Tue,  7 Oct 2014 18:41, teichm <at> in.tum.de said:

> And now revised with the „signed of“ line. Sorry for the delay, but contributing
> to libgcrypt seems to be very time consuming… :(

As is the maintaining ...

Pushed.  Thanks.

Please send a DCO to this list (see doc/HACKING).

Shalom-Salam,

   Werner

--

-- 
Thoughts are free.  Exceptions are governed by federal law.

_______________________________________________
Gcrypt-devel mailing list
Gcrypt-devel <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gcrypt-devel
by Markus Teich | 8 Oct 15:01 2014

[git] GCRYPT - branch, master, updated. libgcrypt-1.6.0-121-g23ecadf

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "The GNU crypto library".

The branch, master has been updated
       via  23ecadf309f8056c35cc092e58df801ac0eab862 (commit)
      from  a078436be5b656e4a2acfaeb5f054b9991f617e5 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 23ecadf309f8056c35cc092e58df801ac0eab862
Author: Markus Teich <markus.teich <at> stusta.mhn.de>
Date:   Tue Oct 7 18:24:27 2014 +0200

    mpi: Add gcry_mpi_ec_sub.

    * NEWS (gcry_mpi_ec_sub): New.
    * doc/gcrypt.texi (gcry_mpi_ec_sub): New.
    * mpi/ec.c (_gcry_mpi_ec_sub, sub_points_edwards): New.
    (sub_points_montgomery, sub_points_weierstrass): New stubs.
    * src/gcrypt-int.h (_gcry_mpi_ec_sub): New.
    * src/gcrypt.h.in (gcry_mpi_ec_sub): New.
    * src/libgcrypt.def (gcry_mpi_ec_sub): New.
    * src/libgcrypt.vers (gcry_mpi_ec_sub): New.
    * src/mpi.h (_gcry_mpi_ec_sub_points): New.
    * src/visibility.c (gcry_mpi_ec_sub): New.
    * src/visibility.h (gcry_mpi_ec_sub): New.
    --

    This function subtracts two points on the curve. Only Twisted Edwards
    curves are supported with this change.

    Signed-off-by: Markus Teich <markus dot teich at stusta dot mhn dot de>

diff --git a/NEWS b/NEWS
index 214c676..0150fdd 100644
--- a/NEWS
+++ b/NEWS
@@ -29,6 +29,7 @@ Noteworthy changes in version 1.7.0 (unreleased)
  GCRYCTL_SET_SBOX                NEW.
  gcry_cipher_set_sbox            NEW macro.
  GCRY_MD_GOSTR3411_CP            NEW.
+ gcry_mpi_ec_sub                 NEW.

 
 Noteworthy changes in version 1.6.0 (2013-12-16)
diff --git a/doc/gcrypt.texi b/doc/gcrypt.texi
index 63edf06..108d53a 100644
--- a/doc/gcrypt.texi
+++ b/doc/gcrypt.texi
@@ -4806,6 +4806,15 @@ Add the points @var{u} and @var{v} of the elliptic curve described by
 @var{ctx} and store the result into @var{w}.
 @end deftypefun

+@deftypefun void gcry_mpi_ec_sub ( @
+ @w{gcry_mpi_point_t @var{w}}, @w{gcry_mpi_point_t @var{u}}, @
+ @w{gcry_mpi_point_t @var{v}}, @w{gcry_ctx_t @var{ctx}})
+
+Subtracts the point @var{v} from the point @var{u} of the elliptic
+curve described by @var{ctx} and store the result into @var{w}. Only
+Twisted Edwards curves are supported for now.
+@end deftypefun
+
 @deftypefun void gcry_mpi_ec_mul ( @
  @w{gcry_mpi_point_t @var{w}}, @w{gcry_mpi_t @var{n}}, @
  @w{gcry_mpi_point_t @var{u}}, @w{gcry_ctx_t @var{ctx}})
diff --git a/mpi/ec.c b/mpi/ec.c
index a55291a..80f3b22 100644
--- a/mpi/ec.c
+++ b/mpi/ec.c
@@ -1131,6 +1131,71 @@ _gcry_mpi_ec_add_points (mpi_point_t result,
 }

 
+/* RESULT = P1 - P2  (Weierstrass version).*/
+static void
+sub_points_weierstrass (mpi_point_t result,
+                        mpi_point_t p1, mpi_point_t p2,
+                        mpi_ec_t ctx)
+{
+  (void)result;
+  (void)p1;
+  (void)p2;
+  (void)ctx;
+  log_fatal ("%s: %s not yet supported\n",
+             "_gcry_mpi_ec_sub_points", "Weierstrass");
+}
+
+
+/* RESULT = P1 - P2  (Montgomery version).*/
+static void
+sub_points_montgomery (mpi_point_t result,
+                       mpi_point_t p1, mpi_point_t p2,
+                       mpi_ec_t ctx)
+{
+  (void)result;
+  (void)p1;
+  (void)p2;
+  (void)ctx;
+  log_fatal ("%s: %s not yet supported\n",
+             "_gcry_mpi_ec_sub_points", "Montgomery");
+}
+
+
+/* RESULT = P1 - P2  (Twisted Edwards version).*/
+static void
+sub_points_edwards (mpi_point_t result,
+                    mpi_point_t p1, mpi_point_t p2,
+                    mpi_ec_t ctx)
+{
+  mpi_point_t p2i = _gcry_mpi_point_new (0);
+  point_set (p2i, p2);
+  _gcry_mpi_neg (p2i->x, p2i->x);
+  add_points_edwards (result, p1, p2i, ctx);
+  _gcry_mpi_point_release (p2i);
+}
+
+
+/* RESULT = P1 - P2 */
+void
+_gcry_mpi_ec_sub_points (mpi_point_t result,
+                         mpi_point_t p1, mpi_point_t p2,
+                         mpi_ec_t ctx)
+{
+  switch (ctx->model)
+    {
+    case MPI_EC_WEIERSTRASS:
+      sub_points_weierstrass (result, p1, p2, ctx);
+      break;
+    case MPI_EC_MONTGOMERY:
+      sub_points_montgomery (result, p1, p2, ctx);
+      break;
+    case MPI_EC_EDWARDS:
+      sub_points_edwards (result, p1, p2, ctx);
+      break;
+    }
+}
+
+
 /* Scalar point multiplication - the main function for ECC.  If takes
    an integer SCALAR and a POINT as well as the usual context CTX.
    RESULT will be set to the resulting point. */
diff --git a/src/gcrypt-int.h b/src/gcrypt-int.h
index 8a6df84..918937b 100644
--- a/src/gcrypt-int.h
+++ b/src/gcrypt-int.h
@@ -430,6 +430,8 @@ int _gcry_mpi_ec_get_affine (gcry_mpi_t x, gcry_mpi_t y, gcry_mpi_point_t point,
 void _gcry_mpi_ec_dup (gcry_mpi_point_t w, gcry_mpi_point_t u, gcry_ctx_t ctx);
 void _gcry_mpi_ec_add (gcry_mpi_point_t w,
                        gcry_mpi_point_t u, gcry_mpi_point_t v, mpi_ec_t ctx);
+void _gcry_mpi_ec_sub (gcry_mpi_point_t w,
+                       gcry_mpi_point_t u, gcry_mpi_point_t v, mpi_ec_t ctx);
 void _gcry_mpi_ec_mul (gcry_mpi_point_t w, gcry_mpi_t n, gcry_mpi_point_t u,
                        mpi_ec_t ctx);
 int _gcry_mpi_ec_curve_point (gcry_mpi_point_t w, mpi_ec_t ctx);
diff --git a/src/gcrypt.h.in b/src/gcrypt.h.in
index 65d9ef6..f3207c9 100644
--- a/src/gcrypt.h.in
+++ b/src/gcrypt.h.in
@@ -704,6 +704,10 @@ void gcry_mpi_ec_dup (gcry_mpi_point_t w, gcry_mpi_point_t u, gcry_ctx_t ctx);
 void gcry_mpi_ec_add (gcry_mpi_point_t w,
                       gcry_mpi_point_t u, gcry_mpi_point_t v, gcry_ctx_t ctx);

+/* W = U - V.  */
+void gcry_mpi_ec_sub (gcry_mpi_point_t w,
+                      gcry_mpi_point_t u, gcry_mpi_point_t v, gcry_ctx_t ctx);
+
 /* W = N * U.  */
 void gcry_mpi_ec_mul (gcry_mpi_point_t w, gcry_mpi_t n, gcry_mpi_point_t u,
                       gcry_ctx_t ctx);
diff --git a/src/libgcrypt.def b/src/libgcrypt.def
index 57ed490..924f17f 100644
--- a/src/libgcrypt.def
+++ b/src/libgcrypt.def
@@ -276,5 +276,7 @@ EXPORTS
      gcry_mac_ctl              @242
      gcry_mac_get_algo         @243

+      gcry_mpi_ec_sub           @244
+

 ;; end of file with public symbols for Windows.
diff --git a/src/libgcrypt.vers b/src/libgcrypt.vers
index 7ee0541..7e8df3f 100644
--- a/src/libgcrypt.vers
+++ b/src/libgcrypt.vers
@@ -105,7 +105,7 @@ GCRYPT_1.6 {
     gcry_mpi_ec_get_mpi; gcry_mpi_ec_get_point;
     gcry_mpi_ec_set_mpi; gcry_mpi_ec_set_point;
     gcry_mpi_ec_get_affine;
-    gcry_mpi_ec_dup; gcry_mpi_ec_add; gcry_mpi_ec_mul;
+    gcry_mpi_ec_dup; gcry_mpi_ec_add; gcry_mpi_ec_sub; gcry_mpi_ec_mul;
     gcry_mpi_ec_curve_point;

     gcry_log_debug;
diff --git a/src/mpi.h b/src/mpi.h
index 7407b7f..13b5117 100644
--- a/src/mpi.h
+++ b/src/mpi.h
@@ -286,6 +286,9 @@ void _gcry_mpi_ec_dup_point (mpi_point_t result,
 void _gcry_mpi_ec_add_points (mpi_point_t result,
                               mpi_point_t p1, mpi_point_t p2,
                               mpi_ec_t ctx);
+void _gcry_mpi_ec_sub_points (mpi_point_t result,
+                              mpi_point_t p1, mpi_point_t p2,
+                              mpi_ec_t ctx);
 void _gcry_mpi_ec_mul_point (mpi_point_t result,
                              gcry_mpi_t scalar, mpi_point_t point,
                              mpi_ec_t ctx);
diff --git a/src/visibility.c b/src/visibility.c
index 6ed57ca..fa23e53 100644
--- a/src/visibility.c
+++ b/src/visibility.c
@@ -567,6 +567,14 @@ gcry_mpi_ec_add (gcry_mpi_point_t w,
 }

 void
+gcry_mpi_ec_sub (gcry_mpi_point_t w,
+                 gcry_mpi_point_t u, gcry_mpi_point_t v, gcry_ctx_t ctx)
+{
+  _gcry_mpi_ec_sub_points (w, u, v,
+                           _gcry_ctx_get_pointer (ctx, CONTEXT_TYPE_EC));
+}
+
+void
 gcry_mpi_ec_mul (gcry_mpi_point_t w, gcry_mpi_t n, gcry_mpi_point_t u,
                  gcry_ctx_t ctx)
 {
diff --git a/src/visibility.h b/src/visibility.h
index 96b5235..fa3c763 100644
--- a/src/visibility.h
+++ b/src/visibility.h
@@ -218,6 +218,7 @@ MARK_VISIBLEX (gcry_mpi_copy)
 MARK_VISIBLEX (gcry_mpi_div)
 MARK_VISIBLEX (gcry_mpi_dump)
 MARK_VISIBLEX (gcry_mpi_ec_add)
+MARK_VISIBLEX (gcry_mpi_ec_sub)
 MARK_VISIBLEX (gcry_mpi_ec_curve_point)
 MARK_VISIBLEX (gcry_mpi_ec_dup)
 MARK_VISIBLEX (gcry_mpi_ec_get_affine)
@@ -486,6 +487,7 @@ MARK_VISIBLEX (_gcry_mpi_get_const)

 #define gcry_mpi_abs                _gcry_USE_THE_UNDERSCORED_FUNCTION
 #define gcry_mpi_ec_add             _gcry_USE_THE_UNDERSCORED_FUNCTION
+#define gcry_mpi_ec_sub             _gcry_USE_THE_UNDERSCORED_FUNCTION
 #define gcry_mpi_ec_curve_point     _gcry_USE_THE_UNDERSCORED_FUNCTION
 #define gcry_mpi_ec_dup             _gcry_USE_THE_UNDERSCORED_FUNCTION
 #define gcry_mpi_ec_get_affine      _gcry_USE_THE_UNDERSCORED_FUNCTION

-----------------------------------------------------------------------

Summary of changes:
 NEWS               |    1 +
 doc/gcrypt.texi    |    9 ++++++++
 mpi/ec.c           |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/gcrypt-int.h   |    2 ++
 src/gcrypt.h.in    |    4 ++++
 src/libgcrypt.def  |    2 ++
 src/libgcrypt.vers |    2 +-
 src/mpi.h          |    3 +++
 src/visibility.c   |    8 +++++++
 src/visibility.h   |    2 ++
 10 files changed, 97 insertions(+), 1 deletion(-)
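
Why sub_points_edwards only has to negate the x coordinate: on a twisted Edwards curve a*x^2 + y^2 = 1 + d*x^2*y^2 the inverse of a point (x, y) is (-x, y), so P1 - P2 can be computed as P1 + (-P2) with the existing addition routine. The C sketch below checks this on a toy curve in affine coordinates over GF(13). The parameters (a = 1, d = 2) and every name in it (point, add, sub, run_checks, the FIELD_P/CURVE_A/CURVE_D macros) are hypothetical illustrations, not libgcrypt code. libgcrypt itself uses multi-precision integers and the projective x, y, z fields of mpi_point_t; negating the x field there negates the affine value x/z in the same way.

```c
#include <assert.h>

/* Toy twisted Edwards curve  CURVE_A*x^2 + y^2 = 1 + CURVE_D*x^2*y^2
   over GF(FIELD_P).  Hypothetical illustration parameters, not a real
   curve: a = 1 is a square and d = 2 is a non-square mod 13, so the
   addition law is complete (no exceptional point pairs). */
#define FIELD_P 13
#define CURVE_A 1
#define CURVE_D 2

typedef struct { long x, y; } point;   /* affine coordinates */

/* Reduce v into the range [0, FIELD_P). */
static long mod(long v)
{
  long r = v % FIELD_P;
  return r < 0 ? r + FIELD_P : r;
}

/* Modular inverse via Fermat's little theorem: v^(FIELD_P-2) mod FIELD_P. */
static long inv(long v)
{
  long r = 1, b = mod(v), e = FIELD_P - 2;
  assert(b != 0);   /* complete addition law: denominators never vanish */
  while (e > 0)
    {
      if (e & 1)
        r = mod(r * b);
      b = mod(b * b);
      e >>= 1;
    }
  return r;
}

static int eq(point a, point b) { return a.x == b.x && a.y == b.y; }

static int on_curve(point p)
{
  long x2 = mod(p.x * p.x), y2 = mod(p.y * p.y);
  return mod(CURVE_A * x2 + y2) == mod(1 + CURVE_D * mod(x2 * y2));
}

/* Affine twisted Edwards addition:
   x3 = (x1*y2 + y1*x2) / (1 + d*x1*x2*y1*y2)
   y3 = (y1*y2 - a*x1*x2) / (1 - d*x1*x2*y1*y2)  */
static point add(point p, point q)
{
  long t = mod(CURVE_D * mod(mod(p.x * q.x) * mod(p.y * q.y)));
  point r;
  r.x = mod(mod(p.x * q.y + p.y * q.x) * inv(1 + t));
  r.y = mod(mod(p.y * q.y - CURVE_A * mod(p.x * q.x)) * inv(1 - t));
  return r;
}

/* Same idea as sub_points_edwards: negate x of the subtrahend, then add. */
static point sub(point p, point q)
{
  point neg = { mod(-q.x), q.y };
  return add(p, neg);
}

/* Enumerate all affine points, then check that subtraction undoes
   addition for every pair.  Returns the number of points found, or -1
   if any check fails. */
int run_checks(void)
{
  point pts[FIELD_P * FIELD_P];
  const point identity = { 0, 1 };   /* neutral element */
  int n = 0;

  for (long x = 0; x < FIELD_P; x++)
    for (long y = 0; y < FIELD_P; y++)
      {
        point p = { x, y };
        if (on_curve(p))
          pts[n++] = p;
      }

  for (int i = 0; i < n; i++)
    {
      if (!eq(sub(pts[i], pts[i]), identity))   /* P - P = (0, 1) */
        return -1;
      for (int j = 0; j < n; j++)
        {
          point d = sub(pts[i], pts[j]);
          if (!on_curve(d)
              || !eq(add(d, pts[j]), pts[i])                      /* (P - Q) + Q */
              || !eq(sub(add(pts[i], pts[j]), pts[j]), pts[i]))   /* (P + Q) - Q */
            return -1;
        }
    }
  return n;
}
```

Because a = 1 is a square and d = 2 is a non-square modulo 13, the affine addition law is complete: the denominators 1 +/- d*x1*x2*y1*y2 never vanish, which is what the assert in inv() relies on. run_checks() finds the 8 points of this toy curve and confirms that (P - Q) + Q = P, (P + Q) - Q = P and P - P = (0, 1) for every pair.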

hooks/post-receive
--

-- 
The GNU crypto library
http://git.gnupg.org

_______________________________________________
Gnupg-commits mailing list
Gnupg-commits <at> gnupg.org
http://lists.gnupg.org/mailman/listinfo/gnupg-commits
