Alex Drahon | 1 Jul 21:33 2006

Benchmarking and file_read_line

I'm still playing with some benchmarks, mostly from the shootout. I  
made a tentative implementation of a file_read_line function in  
neko's std lib, which allowed me to write a File iterator for haXe.

As far as benchmarking is concerned, Neko feels on par with Python,  
which is to say not very fast compared to VM-based languages like  
Java, or even Lua. Lua is certainly the fastest interpreted language there is.

I'm still playing with these not-so-serious benchmarks to get a feel  
for the platform's strengths and weaknesses; right now I'm a little  
surprised not to see any gains over Python...

Alex

Attachment (file.c): application/octet-stream, 9 KiB
Attachment (File.hx): application/octet-stream, 5047 bytes
Attachment (FileIter.hx): application/octet-stream, 1905 bytes

-- 
Neko : One VM to run them all
(http://nekovm.org)
Nicolas Cannasse | 1 Jul 22:12 2006

Re: Benchmarking and file_read_line

> I'm still playing with some benchmarks, mostly from the shootout. I made 
> a tentative implementation of a file_read_line function in neko's std 
> lib, which allowed me to write a File iterator for haXe.

http://shootout.alioth.debian.org/debian/benchmark.php?test=sumcol&lang=gcc

The C version of file_read_line is implemented much more trivially.
It's not "correct", since it limits the line size to 128 bytes, but 
that seems acceptable for the shootout.

Another thing: you're not benchmarking Neko there, but haXe. 
haXe does additional work:
- wrapping the Neko string in a String object
- iterators

You could remove the second overhead by writing, for instance, the following:

var s = 0;
try {
    while( true )
       s += Std.parseInt(f.readLine());
} catch( e : Dynamic ) {
    neko.Lib.print(s);
}

However, this benchmark will show almost nothing since it might be mainly 
IO-bound. Lua and Python have the same results and are "just" 3.5 times 
slower than pure C.

> As far as benchmarking is concerned, neko feels on par with Python, 

Alex Drahon | 1 Jul 22:36 2006

Re: Benchmarking and file_read_line

>> I'm still playing with some benchmarks, mostly from the shootout.  
>> I made a tentative implementation of a file_read_line function in  
>> neko's std lib, which allowed me to write a File iterator for haXe.
>
> http://shootout.alioth.debian.org/debian/benchmark.php?test=sumcol&lang=gcc
>
> The C version of file_read_line is implemented much more trivially.
> It's not "correct", since it limits the line size to 128 bytes, but
> that seems acceptable for the shootout.
>

OK, it's just that not having it in Neko is a limiting factor.

> Another thing: you're not benchmarking Neko there, but haXe.
> haXe does additional work:
> - wrapping the Neko string in a String object
> - iterators
>

I know. I'm not trying to make an 'honest' benchmark, but to get a  
general feeling for the performance I would get; that's why I'm not  
trying to optimize everything to the max.

> var s = 0;
> try {
>    while( true )
>       s += Std.parseInt(f.readLine());
> } catch( e : Dynamic ) {
>    neko.Lib.print(s);
> }

hank williams | 2 Jul 05:10 2006

Re: Benchmarking and file_read_line

Nicolas,

How will your JIT code work? I presume you have to generate opcodes
for both Intel and PowerPC to cover Mac and Windows? Or are you just
supporting Intel? Or do you have some other, more clever plan up your
sleeve that is somehow processor-neutral?

I am curious about this because I am a little worried about the
performance issues that Alex has described here. I am going to be
running Neko in an embedded environment on a 140 MHz ColdFire (68K)
processor. I haven't been able to run any benchmarks yet because I will
not have prototypes for another month. But if it looks like the
performance isn't as good as hoped, it might affect my application.

This thread just sent a shiver down my spine.

Regards
Hank

On 7/1/06, Alex Drahon <adrahon <at> fastmail.fm> wrote:
> >> I'm still playing with some benchmarks, mostly from the shootout.
> >> I made a tentative implementation of a file_read_line function in
> >> neko's std lib, which allowed me to write a File iterator for haXe.
> >
> > http://shootout.alioth.debian.org/debian/benchmark.php?test=sumcol&lang=gcc
> >
> > The C version of file_read_line is implemented much more trivially.
> > It's not "correct", since it limits the line size to 128 bytes, but
> > that seems acceptable for the shootout.

Nicolas Cannasse | 2 Jul 12:01 2006

Re: Benchmarking and file_read_line

>> On which platform are you running your benchmarks? With which
>> compiler did you compile Neko?
>>
> I'm testing on OS X; everything is compiled with GCC 4. I'm comparing
> with Lua because you've been pretty dismissive of its performance on
> many occasions. I also ran some of your Neko programs in the bench
> directory, and most of the time Neko is 3 times slower than Lua (except
> for binary-trees, where Neko is almost as fast as Java).

I don't remember being dismissive of Lua's performance, although it's 
true that on nekovm.org/faq it's listed together with PHP/Python in the 
"pretty slow runtime" category. That might be a bit unfair, and Lua 
might deserve its own category ;)

Intrigued by this 3-times-slower difference, I ran some tests on 
Neko/Win32 CVS and the Lua/Win32 binary (5.0.2). Both were built with 
MSVC, so we're also comparing with the same C compiler:

- fibonacci (recursion with integer arithmetic) ran at pretty much the 
same speed on both Neko and Lua.

- nbodies (floating-point arithmetic) was indeed 3x faster on Lua. I might 
have a look at further optimizing for such usage, although I think it's 
pretty rare to do heavy floating-point computation in a VM (usually one 
would move such tasks to the C side).

- fannkuch is IMHO impossible to benchmark, with a running time under 10 ms.

- binary-trees was 3.5 times faster in Neko than in Lua. This 
benchmark measures integer arithmetic, function-call overhead, and 
allocation.

Nicolas Cannasse | 2 Jul 12:05 2006

Re: Benchmarking and file_read_line

> Nicolas,
> 
> How will your JIT code work? I presume you have to generate opcodes
> for both Intel and PowerPC to cover Mac and Windows? Or are you just
> supporting Intel? Or do you have some other, more clever plan up your
> sleeve that is somehow processor-neutral?

JIT will be X86-only at the start.

With Apple switching to Intel, I'm not sure it's worth investing time in 
a PPC backend, and anyway I don't have the proper hardware to test it 
correctly.

> I am curious about this because I am a little worried about the
> performance issues that Alex has described here. I am going to be
> running Neko in an embedded environment on a 140 MHz ColdFire (68K)
> processor. I haven't been able to run any benchmarks yet because I will
> not have prototypes for another month. But if it looks like the
> performance isn't as good as hoped, it might affect my application.

I hope the reply I gave to Alex was reassuring for you. It might also 
be possible, as on OS X, to reserve some 68K registers in order to 
optimize the inner VM loop.

Nicolas


hank williams | 2 Jul 12:20 2006

Re: Benchmarking and file_read_line

> > I am curious about this because I am a little worried about the
> > performance issues that Alex has described here. I am going to be
> > running Neko in an embedded environment on a 140 MHz ColdFire (68K)
> > processor. I haven't been able to run any benchmarks yet because I will
> > not have prototypes for another month. But if it looks like the
> > performance isn't as good as hoped, it might affect my application.
>
> I hope the reply I gave to Alex was reassuring for you. It might
> also be possible, as on OS X, to reserve some 68K registers in order
> to optimize the inner VM loop.
>

After thinking about it (and a few hours of sleep) I am not as worried
about it because there are a few performance bottlenecks where it
would be easy to implement core calculations in C. But I would be
curious to hear any estimate of how much performance you gain through
register optimization.

Regards
Hank


Nicolas Cannasse | 2 Jul 12:31 2006

Re: Benchmarking and file_read_line

>> > I am curious about this because I am a little worried about the
>> > performance issues that Alex has described here. I am going to be
>> > running Neko in an embedded environment on a 140 MHz ColdFire (68K)
>> > processor. I haven't been able to run any benchmarks yet because I will
>> > not have prototypes for another month. But if it looks like the
>> > performance isn't as good as hoped, it might affect my application.
>>
>> I hope the reply I gave to Alex was reassuring for you. It might
>> also be possible, as on OS X, to reserve some 68K registers in order
>> to optimize the inner VM loop.
>>
> 
> After thinking about it (and a few hours of sleep) I am not as worried
> about it because there are a few performance bottlenecks where it
> would be easy to implement core calculations in C. But I would be
> curious to hear any estimate of how much performance you gain through
> register optimization.
> 
> Regards
> Hank

There is quite a big difference. Most of the CPU time in the VM loop is 
spent manipulating these three variables: SP, PC, and ACC. Having them 
in registers is a lot faster than keeping them on the C stack.

Nicolas


Nicolas Cannasse | 2 Jul 19:09 2006

Re: Benchmarking and file_read_line

> As far as benchmarking is concerned, Neko feels on par with Python,
> which is to say not very fast compared to VM-based languages like Java,
> or even Lua. Lua is certainly the fastest interpreted language there is.

I studied the Lua internals a bit more and set up a comparison document 
here:

http://nekovm.org/lua

I'll maybe post it for review on the Lua list once I'm back from holidays.

Nicolas


Alex Drahon | 2 Jul 19:35 2006

Re: Benchmarking and file_read_line

> I don't remember being dismissive of Lua's performance, although
> it's true that on nekovm.org/faq it's listed together with PHP/Python
> in the "pretty slow runtime" category. That might be a bit unfair,
> and Lua might deserve its own category ;)
>
I'm sure you didn't mean to be dismissive, but I don't understand  
your attitude there. If Python is in the "pretty slow runtime"  
category, then NekoVM belongs there too. Anyway, Neko's performance is  
satisfying, and certainly good enough for web applications. I just  
wanted to know whether switching an app from Python to Neko would give  
me more performance "for free" (of course it's never as simple as that).

> Intrigued by this 3-times-slower difference, I ran some tests on
> Neko/Win32 CVS and the Lua/Win32 binary (5.0.2). Both were built with
> MSVC, so we're also comparing with the same C compiler:
>
> - fibonacci (recursion with integer arithmetic) ran at pretty much
> the same speed on both Neko and Lua.
>
I found it a little faster on Neko. I think there is an error in  
fib.neko: line 2 should be
     if( n <= 1 ) return 1;
instead of
     if( n <= 1 ) return n;

> - nbodies (floating-point arithmetic) was indeed 3x faster on Lua. I
> might have a look at further optimizing for such usage, although I
> think it's pretty rare to do heavy floating-point computation in a VM
> (usually one would move such tasks to the C side).


