Samuel Audet | 6 Aug 2009 12:15

[libdc] possible corruption of memory...

Hello,

I am having a problem in my application: once dc1394_camera_free() 
is called, glibc becomes unstable and I get "double free" errors and 
other weird behavior in the rest of the code. If I _never_ call 
dc1394_camera_free() and let it leak memory, then everything works 
fine. To debug this, I tried to use valgrind, but libdc1394 seems to be 
in very bad shape, unfortunately. I get plenty of warnings about 
uninitialised values. For example, just these five lines

     dc1394_t *d = dc1394_new();
     dc1394camera_list_t *list;
     dc1394_camera_enumerate (d, &list);
     dc1394_camera_free_list(list);
     dc1394_free(d);

generate the following in valgrind (I'm just pasting the summary. The 
whole default output is around 1500 lines):

==22182== ERROR SUMMARY: 347 errors from 47 contexts (suppressed: 4 from 1)
==22182== malloc/free: in use at exit: 0 bytes in 0 blocks.
==22182== malloc/free: 67 allocs, 67 frees, 33,056 bytes allocated.
==22182== For counts of detected errors, rerun with: -v
==22182== All heap blocks were freed -- no leaks are possible.

Ouch... it's hard to see if and where any memory corruption might be 
taking place. :( I think the code could use some cleaning in that area.

Samuel

Rudolf Leitgeb | 6 Aug 2009 13:22

Re: [libdc] possible corruption of memory...

If you use the Electric Fence library, you can trigger a segmentation
fault whenever some call messes up the heap, which would make it much
easier for you to track down the problem. Basically, relink your
program with libefence (should be available as a package for most
distros), turn on core dumps, wait for your program to trigger the
SEGV, then proceed with your debugger. If the behavior is fully
reproducible, you should be able to track down the problem quickly.

Cheers,

Rudi

On 06.08.2009 at 12:15, Samuel Audet wrote:

> Hello,
>
> I am having a problem in my application: once dc1394_camera_free()
> is called, glibc becomes unstable and I get "double free" errors and
> other weird behavior in the rest of the code. If I _never_ call
> dc1394_camera_free() and let it leak memory, then everything works
> fine. To debug this, I tried to use valgrind, but libdc1394 seems to be
> in very bad shape, unfortunately. I get plenty of warnings about
> uninitialised values. For example, just these five lines
>
>     dc1394_t *d = dc1394_new();
>     dc1394camera_list_t *list;
>     dc1394_camera_enumerate (d, &list);

Samuel Audet | 6 Aug 2009 15:02

Re: [libdc] possible corruption of memory...

Rudolf Leitgeb wrote:
> If you use the Electric Fence library, you can trigger a segmentation
> fault whenever some call messes up the heap, which would make it much
> easier for you to track down the problem. Basically, relink your
> program with libefence (should be available as a package for most
> distros), turn on core dumps, wait for your program to trigger the
> SEGV, then proceed with your debugger. If the behavior is fully
> reproducible, you should be able to track down the problem quickly.

Thanks for the tip!

Unfortunately, it's not fully reproducible, and I often get error 
messages from glibc like "double free" or "memory corruption", or the 
program simply aborts, rather than a SEGV.

Samuel

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july

Rudolf Leitgeb | 6 Aug 2009 16:05

Re: [libdc] possible corruption of memory...

> Unfortunately, it's not fully reproducible, and I often get error
> messages from glibc like "double free" or "memory corruption", or the
> program simply aborts, rather than a SEGV.

That's exactly where libefence kicks in. Just like valgrind, it
intercepts all calls to malloc/free/new/delete and makes sure you don't
do illegal stuff with your heap (like accessing memory that was already
freed or overstepping bounds). If it catches you, it throws a SEGV,
which yields a core dump at the location where your code first did
something wrong.
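
To illustrate the mechanism described above, here is a minimal C sketch of the guard-page idea that Electric Fence is built on: the buffer is placed so that it ends right at a page made inaccessible with mprotect(), so the first write past the end faults immediately. This is not Electric Fence's actual code. guarded_alloc() and write_faults() are hypothetical names, the block assumes a POSIX system with mmap(), and it never releases its pages.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* Signal handler: jump back to the probe instead of killing the process. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

/* Hypothetical helper: return `size` bytes that end exactly where an
 * inaccessible guard page begins. The result may be unaligned, and the
 * mapping is never unmapped in this sketch. */
static unsigned char *guarded_alloc(size_t size)
{
    long page_l = sysconf(_SC_PAGESIZE);
    assert(page_l > 0);
    size_t page = (size_t)page_l;
    size_t data_pages = size == 0 ? 1 : (size + page - 1) / page;
    size_t total = (data_pages + 1) * page;   /* data pages + one guard page */

    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    unsigned char *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return guard - size;
}

/* Hypothetical probe: returns 1 if writing buf[index] faults, 0 otherwise.
 * The handler is installed only for the duration of the probe. */
static int write_faults(volatile unsigned char *buf, size_t index)
{
    struct sigaction sa, old_segv, old_bus;
    int faulted;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(fault_jmp, 1) == 0) {
        buf[index] = 0xAB;   /* lands on the guard page if out of bounds */
        faulted = 0;
    } else {
        faulted = 1;
    }

    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return faulted;
}
```

With a 16-byte buffer from guarded_alloc(16), write_faults(p, 15) should succeed and write_faults(p, 16) should report a fault. In a real debugging session no handler would be installed, so the process would die with SIGSEGV and leave a core dump at the offending write.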

The main difference between valgrind and libefence is that valgrind
is an executable which you use for launching your application, whereas
libefence is a library which you simply link your code against. You
do not have to modify your code in any way and could even force load
it by setting the LD_PRELOAD environment variable before launching
your application from the command line.

Check it out here: http://freshmeat.net/projects/efence/
Note: it's somewhat outdated but your distro should have it
	as a package anyway.

Cheers,

Rudi

--

DI. Dr. Rudolf Leitgeb

David Moore | 6 Aug 2009 16:37

Re: [libdc] possible corruption of memory...

On Thu, 2009-08-06 at 19:15 +0900, Samuel Audet wrote:

> generate the following in valgrind (I'm just pasting the summary. The 
> whole default output is around 1500 lines):
> 

It's normal for valgrind to produce a lot of errors because it doesn't
know how to handle ioctls into the kernel.

That said, it's possible there is a memory leak/corruption, it's just
valgrind might not be too helpful.  libefence might work better.

-David

Stefan Richter | 6 Aug 2009 18:58

Re: [libdc] possible corruption of memory...

David Moore wrote:
> On Thu, 2009-08-06 at 19:15 +0900, Samuel Audet wrote:
> 
>> generate the following in valgrind (I'm just pasting the summary. The 
>> whole default output is around 1500 lines):
>>
> 
> It's normal for valgrind to produce a lot of errors because it doesn't
> know how to handle ioctls into the kernel.
> 
> That said, it's possible there is a memory leak/corruption, it's just
> valgrind might not be too helpful.  libefence might work better.

AFAIU it's possible to annotate such code to avoid valgrind's false 
positive noise.  But I have no idea how intrusive that would be to 
libdc1394's source code.  (Less noise in runtime debug output, more 
noise in the code...)
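
One way such an annotation could look, as a rough sketch only: Memcheck's client-request macro VALGRIND_MAKE_MEM_DEFINED (from valgrind/memcheck.h) can mark a buffer as initialised after a successful ioctl. The wrapper name ioctl_out() and the pipe-based demo are hypothetical, nothing here is taken from libdc1394, and the fallback macro is only there so the block still builds where the valgrind headers are not installed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Use the real client-request macro when the valgrind headers exist. */
#ifdef __has_include
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#  endif
#endif
#ifndef VALGRIND_MAKE_MEM_DEFINED
/* Fallback: a no-op with the same shape as the real macro. */
#  define VALGRIND_MAKE_MEM_DEFINED(addr, len) ((void)(addr), (void)(len), 0)
#endif

/* Hypothetical wrapper: issue an ioctl whose argument is an output buffer
 * and, on success, tell Memcheck that the kernel has filled it in. */
static int ioctl_out(int fd, unsigned long request, void *out, size_t out_size)
{
    assert(out != NULL);
    int rc = ioctl(fd, request, out);
    if (rc != -1)
        (void)VALGRIND_MAKE_MEM_DEFINED(out, out_size);
    return rc;
}

/* Demo with an ioctl that needs no special hardware: FIONREAD on a pipe
 * reports how many bytes are waiting to be read. Returns -1 on error. */
static int pending_bytes_demo(void)
{
    int fds[2];
    int pending = -1;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "abc", 3) == 3 &&
        ioctl_out(fds[0], FIONREAD, &pending, sizeof pending) == -1)
        pending = -1;
    close(fds[0]);
    close(fds[1]);
    return pending;
}
```

The cost is the one noted above: every call site that receives data from the driver needs the wrapper. Marking memory as defined also hides genuine uninitialised reads in that range, so the length passed should cover only the bytes the driver really writes.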
-- 
Stefan Richter
-=====-==--= =--- --==-
http://arcgraph.de/sr/

Samuel Audet | 7 Aug 2009 12:44

Re: [libdc] multiple cameras and juju

Stefan Richter wrote:
> Samuel Audet wrote:
>> The strange thing is, everything works just great under Windows XP (32 
>> bit mind you, so < 4 gig of RAM there) with PGR FlyCapture... ?
> 
> On the very same PC?

Yes, dual boot

> You could boot Linux with mem=2G added to the kernel command line (added 
> to the bootloader config, or entered at the boot prompt) and see what 
> happens.

Didn't change anything, didn't fix it

> Besides that, if you know how to compile kernels, you could edit 
> drivers/firewire/fw-ohci.c in function pci_probe:  Replace
> 	ohci->use_dualbuffer = version >= OHCI_VERSION_1_1;
> by
> 	ohci->use_dualbuffer = false;
> rebuild, install, and test.
> http://fedoraproject.org/wiki/Building_a_custom_kernel

(Aaarg, it's such a pain to recompile the kernel just for one module.. 
there should be an easier way to compile _one_ tiny little module)

But I managed it, finally :) And that _does_ fix the problem :) thanks!

What's that dualbuffer all about anyway? It's only used and useful in 
64-bit mode I guess?

Samuel Audet | 7 Aug 2009 14:12

Re: [libdc] possible corruption of memory...

Stefan Richter wrote:
> AFAIU it's possible to annotate such code to avoid valgrind's false 
> positive noise.  But I have no idea how intrusive that would be to 
> libdc1394's source code.  (Less noise in runtime debug output, more 
> noise in the code...)

Ah, there's just the thing for that in valgrind actually:

"--sim-hints=lax-ioctls: Be very lax about ioctl handling; the only 
assumption is that the size is correct. Doesn’t require the full buffer 
to be initialized when writing. Without this, using some device drivers 
with a large number of strange ioctl commands becomes very tiresome."

But I still get lots of "Conditional jump or move depends on 
uninitialised value(s)" and "Use of uninitialised value of size 8" 
(maybe they are initialized from ioctls and valgrind is still having 
problems..?)

In any case, I think I found my problem... _writing_ perfectly valid 
values to a dc1394video_frame_t, even before dc1394_capture_enqueue(), 
seems to be a bad thing. And the problem only shows up _after_ a 
dc1394_camera_free(). If I let stuff leak and I never call 
dc1394_camera_free(), I can write to dequeued dc1394video_frame_t's all 
I want and nothing happens.. strange.

Samuel


Stefan Richter | 7 Aug 2009 14:17

Re: [libdc] multiple cameras and juju

Samuel Audet wrote:
> Stefan Richter wrote:
>> Samuel Audet wrote:
>>> The strange thing is, everything works just great under Windows XP (32 
>>> bit mind you, so < 4 gig of RAM there) with PGR FlyCapture... ?
>> On the very same PC?
> 
> Yes, dual boot
> 
>> You could boot Linux with mem=2G added to the kernel command line (added 
>> to the bootloader config, or entered at the boot prompt) and see what 
>> happens.
> 
> Didn't change anything, didn't fix it
> 
>> Besides that, if you know how to compile kernels, you could edit 
>> drivers/firewire/fw-ohci.c in function pci_probe:  Replace
>> 	ohci->use_dualbuffer = version >= OHCI_VERSION_1_1;
>> by
>> 	ohci->use_dualbuffer = false;
>> rebuild, install, and test.
>> http://fedoraproject.org/wiki/Building_a_custom_kernel
> 
> (Aaarg, it's such a pain to recompile the kernel just for one module.. 
> there should be an easier way to compile _one_ tiny little module)
> 
> But I managed it, finally :) And that _does_ fix the problem :) thanks!

Then we apparently need to add a quirk fix for this controller.

Samuel Audet | 7 Aug 2009 14:31

Re: [libdc] multiple cameras and juju

Stefan Richter wrote:
> Then we apparently need to add a quirk fix for this controller.
> 
> Let's recapitulate:
>    - You don't need a modification in the driver source for FW323.
>    - You do need to force ohci->use_dualbuffer = false for FW643.
> Correct?

Correct

(But the FW323 has no dual buffer mode, it isn't OHCI 1.0, is that right?)

> How are these two controllers shown by "lspci -nn"?

03:00.0 FireWire (IEEE 1394) [0c00]: Agere Systems FW643 PCI Express1394b Controller (PHY/Link) [11c1:5901] (rev 06)
04:00.0 FireWire (IEEE 1394) [0c00]: Agere Systems FW323 [11c1:5811] (rev 61)

Let me know if you need more details!
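
For what it's worth, the kind of quirk being discussed could be sketched as a small table keyed on the PCI IDs from the lspci output above. This is only an illustration in plain user-space C, not the actual fw-ohci.c change: struct ohci_quirk, QUIRK_NO_DUALBUFFER, lookup_quirks() and use_dualbuffer() are hypothetical names, and the numeric value of OHCI_VERSION_1_1 is an assumption about how the driver encodes the version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OHCI_VERSION_1_1    0x010010u  /* assumed encoding of OHCI 1.1 */
#define QUIRK_NO_DUALBUFFER 0x1u       /* hypothetical quirk flag */

/* Hypothetical quirk entry: PCI vendor/device pair plus quirk flags. */
struct ohci_quirk {
    uint16_t vendor;
    uint16_t device;
    unsigned flags;
};

/* Only the FW643 (11c1:5901) is listed; the FW323 (11c1:5811) reportedly
 * works without any driver modification. */
static const struct ohci_quirk ohci_quirks[] = {
    { 0x11c1, 0x5901, QUIRK_NO_DUALBUFFER },
};

/* Return the quirk flags for a controller, or 0 if it is not listed. */
static unsigned lookup_quirks(const struct ohci_quirk *table, size_t count,
                              uint16_t vendor, uint16_t device)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].vendor == vendor && table[i].device == device)
            return table[i].flags;
    }
    return 0;
}

/* Same decision as the quoted one-liner in pci_probe, with the quirk
 * check added in front of the version test. */
static bool use_dualbuffer(uint16_t vendor, uint16_t device, uint32_t version)
{
    size_t count = sizeof ohci_quirks / sizeof ohci_quirks[0];

    if (lookup_quirks(ohci_quirks, count, vendor, device) & QUIRK_NO_DUALBUFFER)
        return false;
    return version >= OHCI_VERSION_1_1;
}
```

A table keeps the version test unchanged for every other controller and makes it cheap to add further entries if other chips turn out to behave the same way.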

> The #if !defined(CONFIG_X86_32)... #endif is only a micro-optimization 
> for one of the platforms which is known to not use physical addresses 
> bigger than 2 GB for kernel-internal buffer allocations.  (There are 
> more of such platforms, notably PPC-32, but I didn't add those to this 
> micro-optimization because there was a vague possibility that the 
> architecture maintainers might change PPC-32 platform behavior in this 
> regard.)  Anyway, in your case it's according to your findings not tied 
> to buffers in the range of >2 GB... <= 4 GB (they can't be beyond 4 GB 
> because plain PCI uses only 32 bit addressing) but a general problem of 