Alex Sicamiotis | 1 Feb 03:06

RE: DES with OpenMP


> Maybe the code should assume that if there are 4 threads or less, that's
> probably just one CPU chip - and use DES_bs_cpt=4 or 8 in that case.
> This assumption will fail if the number of threads is deliberately
> lowered to use only some cores in a multi-socket system, though.  And it
> will fail differently for bigger than quad-core CPU chips.  Not great.
> 

Hmm... yeah, I can see the dilemma here: no single value of the parameter suits all systems.

> I suspect that it's nothing fundamental, but merely icc happening to do
> register allocation or whatever better in one version of code vs. the
> other.  It might be the other way around in a slightly different build.
> 
> These differences of a few percent are hard/unrealistic to turn in our
> favor reliably without explicit assembly code and focus on a specific CPU.
> 
> Alexander

I managed to use icc in order to build various versions... I did a run with 1-4-8-16-32-64-128-256 values
(values of 32+ slow down the cracking significantly so I'm not including 64+ and I only include 32 because
it's the default value). 

I did two compilation series, one with -march=core2 and one without in order to examine how much is down to
cpu-specific code. The core2 variant was about 16kb larger than the generic build and had similar if not
slower performance. I tested with nice --10 and -test=20 on a shell / no desktop running - no apps running -
most services shut down @ 4 GHz.

What happened with gcc (performance declining as the value went up) repeated itself with icc. I reached
9.36m c/s with des_bs_cpt values of 1 and 4... This is a gain of ~400k c/s with many salts. Interestingly,

Solar Designer | 2 Feb 03:19

Re: DES with OpenMP

On Wed, Feb 01, 2012 at 02:06:00AM +0000, Alex Sicamiotis wrote:
> I managed to use icc in order to build various versions... I did a run with 1-4-8-16-32-64-128-256 values
> (values of 32+ slow down the cracking significantly so I'm not including 64+ and I only include 32 because
> it's the default value).
...

Thank you for running these tests and providing the results.  What about
the effect this has on LM hash speed?

BTW, you could similarly experiment with MD5_std_cpt (MD5_std.h) and
BF_cpt (BF_std.h).  These should make a lot less of a difference (and
for their respective hash types only), though.

Alexander

Solar Designer | 2 Feb 03:56

Re: OpenCL support on OSX

On Tue, Jan 31, 2012 at 12:26:27PM +0100, Tevesz András wrote:
> I have some osx specific fixes to the john-1.7.9-jumbo-5-opencl-5.diff 

Thank you!

For others on john-users: this patch was discussed on john-dev and
relevant changes will be made in next -jumbo.  Start of thread:

http://www.openwall.com/lists/john-dev/2012/01/31/1

Alexander

Alex Sicamiotis | 2 Feb 16:21

RE: DES with OpenMP


> 
> Thank you for running these tests and providing the results.  What about
> the effect this has on LM hash speed?
> 

Hmm.. I did not know that LM was affected by the DES parameters, but apparently it is :D

Again, moving down from 32 to lower values brought significant gains - especially in 2 threads. LM seems to
be "settled" at a value of 8. While for plain DES the ideal value is 1, still with a value of 8 there's not much
performance impact for it while the LM benefits enormously. 8 seems to be the perfect balance (for my
hardware and across both GCC and ICC) and you might consider it for the next john release after testing with
other hardware as well. 

I've done 

* a run with GCC 4.6.2 with values of 1-4-8-16-32 at plain DES (1 & 2 threads)
* a run with GCC 4.6.2 with values of 1-4-8-16-32 at LM (1 & 2 threads)
* a run with ICC 12.1 with values of 1-4-8-16-32 at LM (1 & 2 threads)

Results are here: http://imageshack.us/f/710/resultsarein1.png/

I've highlighted below the gains of the fastest non-32 variant over the default value of 32. 32 seems to be the
tipping point where large chunks of performance are lost.

> BTW, you could similarly experiment with MD5_std_cpt (MD5_std.h) and
> BF_cpt (BF_std.h).  These should make a lot less of a difference (and
> for their respective hash types only), though.
> 
> Alexander

Alex Sicamiotis | 2 Feb 17:48

RE: DES with OpenMP


> > BTW, you could similarly experiment with MD5_std_cpt (MD5_std.h) and
> > BF_cpt (BF_std.h).  These should make a lot less of a difference (and
> > for their respective hash types only), though.
> > 

Ok, just did this (GCC 4.6.2) with 2 threads on the 1.7.9 (no-jumbo).

BF default value is 3. 

I used 1-2-3-4-5-6-8-16-32 for bf_cpt. Performance was the same throughout 

2180 c/s for 1, 
2180 c/s for 3, 
2180 c/s for 32. 

So no gains in BF. Apparently it is too slow a hash to be affected by this.

MD5 default value was 128. I used 1-4-8-16-32-64-128-256-512. 

Performance for 1, 4, 8, 16, 32 and 64 ranged from 34.520 to 34.530 c/s.
Performance for 128 (def) was 34.456 (0.2% degradation compared to 64-32-16-8-4-1).
Performance for 256 was 34.365, more degradation.
Performance for 512 was 34.331, more degradation. 

In general not much to improve here unlike DES and LM, but still if anyone wants to find 0.2% with core2
hardware they might try lowering the value to 64. 
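As a rough check of the percentages above, here is a minimal sketch of the arithmetic. It assumes the midpoint of the reported 34.520 to 34.530 range as the baseline (that midpoint is my assumption, not a measured figure), and the function name is made up for illustration.

```python
# Relative slowdown of each MD5_std_cpt setting versus the best observed speed.
# ASSUMPTION: the baseline is the midpoint of the reported 34.520-34.530 range.
BASELINE = (34.520 + 34.530) / 2

# Speeds as reported in the message, keyed by MD5_std_cpt value.
reported = {128: 34.456, 256: 34.365, 512: 34.331}


def relative_loss_percent(baseline: float, speed: float) -> float:
    """Return how much slower `speed` is than `baseline`, in percent."""
    return (baseline - speed) / baseline * 100.0


for cpt, speed in reported.items():
    loss = relative_loss_percent(BASELINE, speed)
    print(f"MD5_std_cpt={cpt}: {loss:.2f}% slower than baseline")
```

The units cancel out in the ratio, so it does not matter how the reported figures are scaled.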
 		 	   		  
madfran | 2 Feb 18:10

Re: Segmentation fault in john-1.7.9-jumbo-5 under some conditions

Hi,

> On 1/21/12, Solar Designer <solar@...> wrote:
> > All - can someone else with a Windows system handy try to reproduce
> > this, please?
>
> Same problem:
>

Is there no solution for the segmentation fault when trying to
restore a session initiated with the external mode
"Keyboard"?

Per Thorsheim | 3 Feb 00:58

Minimum Password Length POO

Sorry if I'm way out of line here, posting this to john-users:

I've played a little game of "Create a normal sentence with minimum 4
words using the lowest number of unique letters - in any existing
language" on Twitter: @thorsheim

I guess nothing new really, but I'm really interested in once again
listening to the expertise of john-users on minimum password length
recommendations, based on crazy ideas off the new "Pile of Poo" Unicode
6.0 U+1F4A9 character.

Using the Dumb16 and Dumb32 modes, I'm curious about keyspace
calculations and configuration examples for john on how to do dictionary
+"common Unicode characters" attacks. Like using the U+2665 "Black Heart
Suit" character (Windows UTF-8 Times New Roman font, NO kb layout) to
separate "I" from "insert name of loved one here".

Blog post can be found here:
http://securitynirvana.blogspot.com/2012/02/minimum-password-length-poo.html

-- 
Best regards,
Per Thorsheim
CISA, CISM, CISSP-ISSAP
securitynirvana.blogspot.com

Solar Designer | 3 Feb 06:05

Re: Minimum Password Length POO

On Fri, Feb 03, 2012 at 12:58:36AM +0100, Per Thorsheim wrote:
> Sorry if I'm way out of line here, posting this to john-users:

Why, you're asking a reasonable JtR-specific question here - so your
posting is appropriate and desirable.  The Subject does not reflect it,
though. ;-)

> Using the Dumb16 and Dumb32 modes, I'm curious about keyspace
> calculations and configuration examples for john on how to do dictionary
> +"common Unicode characters" attacks. Like using the U+2665 "Black Heart
> Suit" character (Windows UTF-8 Times New Roman font, NO kb layout) to
> separate "I" from "insert name of loved one here".

This is tricky right now.  As they are, Dumb16 and Dumb32 will crack
only extremely short passwords.
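To illustrate why an exhaustive search over 16-bit characters stalls so quickly, here is a back-of-the-envelope sketch. Both constants are assumptions for illustration only: an alphabet of 0xFFFF characters (I have not checked what range Dumb16 actually walks) and a rate of 10 million candidates per second, which is not a figure from this thread.

```python
# Keyspace growth for an exhaustive search over a large alphabet.
# ASSUMPTIONS (not from the thread): alphabet size and candidate rate.
ASSUMED_ALPHABET_16 = 0xFFFF   # hypothetical: code points 0x0001..0xFFFF
ASSUMED_RATE = 10_000_000      # hypothetical: candidates tested per second


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of candidate strings of exactly `length` characters."""
    return alphabet_size ** length


def seconds_to_exhaust(alphabet_size: int, length: int, rate: float) -> float:
    """Time to try every candidate of the given length at `rate` per second."""
    return keyspace(alphabet_size, length) / rate


for length in range(1, 5):
    total = keyspace(ASSUMED_ALPHABET_16, length)
    days = seconds_to_exhaust(ASSUMED_ALPHABET_16, length, ASSUMED_RATE) / 86400
    print(f"length {length}: {total:.3e} candidates, about {days:.3g} days")
```

Under these assumed numbers the search goes from minutes at length 2 to hundreds of days at length 3, so a dictionary word plus one or two such characters is a very different problem from brute-forcing the whole string.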

As a test, I took a string of the type you describe from:

http://en.wikipedia.org/wiki/Variable-width_encoding

"For example, the four character string "I NY" is encoded in UTF-8 like
this (shown as hexadecimal byte values): 49 E2 99 A5 4E 59.  Of the six
units in that sequence, 49, 4E, and 59 are singletons (for I, N, and Y),
E2 is a lead unit and 99 and A5 are trail units.  The heart symbol is
represented by the combination of the lead unit and the two trail units."

(I replaced the actual heart character with a space in this e-mail since
I am not sending it in UTF-8.)
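The byte sequence in the quoted passage can be reproduced with a short sketch like the one below. The heart is written as the escape \u2665 so the snippet itself stays plain ASCII; the classify helper is a name made up here, and it only distinguishes the three roles the Wikipedia text mentions.

```python
# Reproduce the UTF-8 encoding of "I<heart>NY" and label each byte's role.
def classify(byte: int) -> str:
    """Label a UTF-8 code unit as singleton, trail unit or lead unit."""
    if byte < 0x80:
        return "singleton"   # plain ASCII, one byte per character
    if byte < 0xC0:
        return "trail"       # 10xxxxxx continuation byte
    return "lead"            # first byte of a multi-byte sequence


text = "I\u2665NY"           # U+2665 is BLACK HEART SUIT
encoded = text.encode("utf-8")
hex_bytes = " ".join(f"{b:02X}" for b in encoded)
roles = [classify(b) for b in encoded]

print(hex_bytes)
for b, role in zip(encoded, roles):
    print(f"{b:02X}  {role}")
```

Four characters become six bytes, which is why byte-oriented length limits and charsets behave differently once such characters are involved.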

So I put this in a file:

Solar Designer | 3 Feb 06:29

Re: Segmentation fault in john-1.7.9-jumbo-5 under some conditions

On Thu, Feb 02, 2012 at 06:10:32PM +0100, madfran@... wrote:
> Is there no solution for the segmentation fault when trying to
> restore a session initiated with the external mode
> "Keyboard"?

I tried to reproduce it on several Linux systems with no luck.  Since
the problem is specific to -jumbo (or so it appears) and to Windows
(which I normally don't use), I am leaving it for the contributors to
-jumbo to debug.  Jim - maybe you? ;-)

Alexander

Solar Designer | 3 Feb 06:54

Re: DES with OpenMP

On Thu, Feb 02, 2012 at 03:21:32PM +0000, Alex Sicamiotis wrote:
> Again, moving down from 32 to lower values brought significant gains - especially in 2 threads. LM seems to
> be "settled" at a value of 8. While for plain DES the ideal value is 1, still with a value of 8 there's not much
> performance impact for it while the LM benefits enormously. 8 seems to be the perfect balance (for my
> hardware and across both GCC and ICC) and you might consider it for the next john release after testing with
> other hardware as well.

Thanks for your testing.  I will likely need to split this into several
settings for different machines and hash types.

> In the meanwhile my curiosity has been piqued as to why the OpenMP version is producing ~250 to 300k c/s over the
> standard non-omp client (4750k c/s vs 4450-4500k c/s). Several things being equal (no-asm for both, icc
> for both, non-hardware optimizations for both, a value of 1 for des_bs_cpt for both, definite use of just 1
> thread for both) there are still 300k in favor of OpenMP, which normally should be slower than the
> non-omp version.
>
> Can you think of *any* other parameters which are tweakable and may lead to the +300k gain for the omp
> version? I want to try various stuff but I don't know what to tweak. My rationale is that if the non-omp
> version is running with at least the same parameters as the omp version, then the non-omp could be slightly
> faster than the omp version (I'm always talking about 1 thread), perhaps exceeding 4.8-4.9m c/s instead
> of being near the 4.5m mark.

No parameters to tweak, I think - it's just different code.

You may try building with -D_OPENMP instead of -fopenmp - that is, don't
actually enable OpenMP, but request that version of John's source code.
This should complain on truly OpenMP-specific constructs such as calls
to omp_get_max_threads(), which you'll need to remove (just put 1 for
the threads count, etc.)  It should also give warnings about the
#pragma's, which you may ignore.