Eric Blake | 2 Oct 2008 04:51

sed on binary files


Is there any portable way to process files that contain NUL bytes?  I'm
working on making m4 1.6 transparently handle NUL, and want to
post-process the output to normalize error messages while still verifying
that NUL bytes appeared where expected on stderr.  But on Solaris, the
native sed strips NUL bytes before processing the line (NUL bytes cannot
appear in text files, and POSIX does not define behavior on non-text
files, so this is not a bug, just a difference from GNU sed).  As a
result, the m4 testsuite either fails (if I only postprocess the captured
stderr and not the expected error) or can have false positives (if both
stderr and expected error are normalized, then regressions involving added
or missing NUL are not detected).  I don't want to require perl for just
this one test; m4 seems fundamental enough to keep the testsuite
restricted to the GNU coding standards set of tools.  The Solaris man
pages mention that /usr/xpg4/bin/tr can handle NUL bytes, but not
/usr/bin/tr; maybe I could search for an adequate tr, and change all NUL
to some other byte that does not otherwise appear in my expected output
(with the added benefit that diff might not give up early with the
complaint that the files are binary), but I don't know if that is portable
either.  Any suggestions?  Is this worth documenting in the autoconf manual?

--
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9 <at> byu.net
Gary V. Vaughan | 2 Oct 2008 06:09

Re: sed on binary files

Hi Eric,

On 2 Oct 2008, at 10:51, Eric Blake wrote:
> Is there any portable way to process files that contain NUL bytes?

None that I'm aware of.  Many GNU utilities are reasonably well  
behaved with respect to '\0', and m4 is unusual to some extent in that  
we don't handle them well ourselves.

> I'm working on making m4 1.6 transparently handle NUL,

Excellent!  I made an attempt to do that myself on the 2.0 branch some  
years ago, but it didn't go well so I never committed...

> and want to
> post-process the output to normalize error messages while still verifying
> that NUL bytes appeared where expected on stderr.  But on Solaris, the
> native sed strips NUL bytes before processing the line (NUL bytes cannot
> appear in text files, and POSIX does not define behavior on non-text
> files, so this is not a bug, just a difference from GNU sed).  As a
> result, the m4 testsuite either fails (if I only postprocess the captured
> stderr and not the expected error) or can have false positives (if both
> stderr and expected error are normalized, then regressions involving added
> or missing NUL are not detected).  I don't want to require perl for just

Ralf Wildenhues | 2 Oct 2008 07:32

Re: sed on binary files

Hi Eric,

* Eric Blake wrote on Thu, Oct 02, 2008 at 04:51:58AM CEST:
> 
> Is there any portable way to process files that contain NUL bytes?

tr?  If you only need to compare for equality, then use cmp.

> The Solaris man
> pages mention that /usr/xpg4/bin/tr can handle NUL bytes, but not
> /usr/bin/tr; maybe I could search for an adequate tr, and change all NUL
> to some other byte that does not otherwise appear in my expected output
> (with the added benefit that diff might not give up early with the
> complaint that the files are binary), but I don't know if that is portable
> either.

That's what I'd try first, too.

> Any suggestions?  Is this worth documenting in the autoconf manual?

I guess the bit about Solaris /usr/bin/tr being deficient is worth
documenting (what about other vendors' tr?).  I wouldn't recommend a way
to treat binary files in the manual until there is actual experience with
it, and then only if that deviates substantially from what Posix says.

Cheers,
Ralf
Eric Blake | 2 Oct 2008 14:16

Re: sed on binary files


According to Gary V. Vaughan on 10/1/2008 10:09 PM:
>> I'm working on making m4 1.6 transparently handle NUL,
> 
> Excellent!  I made an attempt to do that myself on the 2.0 branch some
> years ago, but it didn't go well so I never committed...

The argv_ref branch already does this; it is just a matter of finishing
porting it to branch-1.6 and master (and in the process of that porting, I
discovered that the tests I wrote worked fine under GNU sed but died under
Solaris 10).

>> or can have false positives (if both
>> stderr and expected error are normalized, then regressions involving
>> added
>> or missing NUL are not detected).  I don't want to require perl for just
>> this one test; m4 seems fundamental enough to keep the testsuite
>> restricted to the GNU coding standards set of tools.
> 
> I'd be inclined to do that in C.  A few lines should be sufficient to
> write a minimal filter that writes '\' '0' or '^' '@' to output whenever
> a NUL byte arrives?

Actually, I'm a bit lazy - I guess I'm okay with false positives on
Solaris when using deficient sed, so long as we can also run on Solaris
with GNU sed.  So I'm installing this patch, which lets the user select
the right sed, as well as passing both files through sed (a no-op for GNU
sed, but strips NUL bytes equally for Solaris sed).  (At any rate, it was
easier to code than searching for a tr that handles NUL).
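
For reference, the minimal C filter Gary describes could be sketched roughly as follows.  This is only an illustration under stated assumptions, not code from the m4 tree: the function name escape_nul is made up here, and it uses the '\' '0' spelling rather than '^' '@'.

```c
#include <stdio.h>

/* Hypothetical helper (not part of m4): copy IN to OUT, replacing each
   NUL byte with the two characters '\' and '0'.  Returns 0 on success
   and -1 on a read or write error.  A literal backslash-zero already
   present in the input is passed through unchanged, so the mapping is
   not reversible; it only makes NUL visible to text-oriented tools.  */
int
escape_nul (FILE *in, FILE *out)
{
  int c;

  while ((c = getc (in)) != EOF)
    {
      if (c == '\0')
        {
          if (fputs ("\\0", out) == EOF)
            return -1;
        }
      else if (putc (c, out) == EOF)
        return -1;
    }
  return ferror (in) ? -1 : 0;
}
```

Calling escape_nul (stdin, stdout) from main would turn this into a filter whose output can then be handed to sed and diff, since it no longer contains NUL bytes.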


Eric Blake | 2 Oct 2008 14:46

Re: sed on binary files


According to Ralf Wildenhues on 10/1/2008 11:32 PM:

Hi Ralf, Gary,

>> Is there any portable way to process files that contain NUL bytes?
> 
> tr?  If you only need to compare for equality, then use cmp.

Equality is insufficient the way the test is currently written; I'm
checking that m4's errprint can directly output NUL, while warning
messages quote filenames, so I must normalize VPATH output such as:
  src/m4:../examples/null.m4:98: Warning: indir: undefined macro `\0-\0'
to match the expected error
  m4:examples/null.m4:98: Warning: indir: undefined macro `\0-\0'
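
A NUL-safe version of that normalization has to carry the line length explicitly instead of relying on C string functions, which stop at the first NUL.  The sketch below assumes two simple rules: drop the directory part of the program name, and drop leading "../" components of the file name.  The name normalize_line and both rules are hypothetical, chosen only to reproduce the example above; this is not the sed script the m4 testsuite actually uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: normalize one diagnostic line of LEN bytes in
   LINE, writing the result to OUT (which must hold at least LEN bytes).
   Returns the number of bytes written.  Assumed rules:
   1. strip any directory prefix from the program name, i.e. the text
      before the first ':';
   2. strip leading "../" components from the file name that follows.
   Everything else is copied with memcpy, so embedded NULs survive.  */
size_t
normalize_line (const char *line, size_t len, char *out)
{
  const char *end = line + len;
  const char *colon = memchr (line, ':', len);
  const char *p = line;
  const char *q;
  size_t n;

  if (colon == NULL)
    {
      /* No "program:" prefix; copy the line unchanged.  */
      memcpy (out, line, len);
      return len;
    }

  /* Rule 1: start after the last '/' that precedes the first ':'.  */
  for (q = line; q < colon; q++)
    if (*q == '/')
      p = q + 1;
  n = (size_t) (colon + 1 - p);
  memcpy (out, p, n);

  /* Rule 2: skip leading "../" components of the file name.  */
  p = colon + 1;
  while (end - p >= 3 && memcmp (p, "../", 3) == 0)
    p += 3;

  memcpy (out + n, p, (size_t) (end - p));
  return n + (size_t) (end - p);
}
```

Because only the prefix is inspected, any raw NUL later in the line reaches the output untouched, which is the property the Solaris sed lacks.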

I suppose I could split the test into two - with m4, any output line that
needs normalization should not be using raw NUL (error messages), while
anything that outputs raw NUL to stderr shouldn't need
normalization (errprint, dumpdef, trace output).  But that is not
necessarily trivial to do either (it means the m4 testsuite would have to
conditionally run sed, instead of its current attitude of always running
the sed normalization).

> 
>> The Solaris man
>> pages mention that /usr/xpg4/bin/tr can handle NUL bytes, but not
>> /usr/bin/tr; maybe I could search for an adequate tr, and change all NUL
>> to some other byte that does not otherwise appear in my expected output
>> (with the added benefit that diff might not give up early with the

Ralf Wildenhues | 2 Oct 2008 14:50

Re: sed on binary files

* Eric Blake wrote on Thu, Oct 02, 2008 at 02:46:38PM CEST:
> 
> I'll add a blurb to the autoconf manual on sed mentioning that it cannot
> be used on binary files (although Posix already says that),

Isn't that obvious?  I mean, of all classic unix text-related tools,
only very few operate on non-text files; Posix mentions that with each
one explicitly.

Of course my radar may be seriously skewed without me realizing that.
;-)

Cheers,
Ralf
Jim Meyering | 2 Oct 2008 18:16

Re: sed on binary files

Eric Blake <ebb9 <at> byu.net> wrote:
> According to Gary V. Vaughan on 10/1/2008 10:09 PM:
>>> I'm working on making m4 1.6 transparently handle NUL,
>>
>> Excellent!  I made an attempt to do that myself on the 2.0 branch some
>> years ago, but it didn't go well so I never committed...
>
> The argv_ref branch already does this; it is just a matter of finishing
> porting it to branch-1.6 and master (and in the process of that porting, I
> discovered that the tests I wrote worked fine under GNU sed but died under
> Solaris 10).
>
>>> or can have false positives (if both
>>> stderr and expected error are normalized, then regressions involving
>>> added
>>> or missing NUL are not detected).  I don't want to require perl for just
>>> this one test; m4 seems fundamental enough to keep the testsuite
>>> restricted to the GNU coding standards set of tools.
>>
>> I'd be inclined to do that in C.  A few lines should be sufficient to
>> write a minimal filter that writes '\' '0' or '^' '@' to output whenever
>> a NUL byte arrives?
>
> Actually, I'm a bit lazy - I guess I'm okay with false positives on
> Solaris when using deficient sed, so long as we can also run on Solaris
> with GNU sed.  So I'm installing this patch, which lets the user select
> the right sed, as well as passing both files through sed (a no-op for GNU
> sed, but strips NUL bytes equally for Solaris sed).  (At any rate, it was
> easier to code than searching for a tr that handles NUL).
>

Alexander Martens | 6 Oct 2008 19:47

Unit test failed (RHEL 5.2)

I found the following assertion failures after building on RHEL 5.2.
Don't hesitate to write back for further info / testing!
Glad to help.

Alex

---

PASS: test-strerror
PASS: test-string
PASS: test-strstr
test-strtod.c:667: assertion failed
test-strtod.c:668: assertion failed
test-strtod.c:688: assertion failed
test-strtod.c:717: assertion failed
test-strtod.c:718: assertion failed
FAIL: test-strtod
PASS: test-sys_stat
PASS: test-sys_time
PASS: test-unistd
PASS: test-vasnprintf
PASS: test-vasprintf-posix
PASS: test-vasprintf
PASS: test-wchar
PASS: test-wctype
PASS: test-xvasprintf
===============================
1 of 47 tests failed
Please report to bug-m4@gnu.org
===============================

Eric Blake | 6 Oct 2008 21:20

Re: Unit test failed (RHEL 5.2)


According to Alexander Martens on 10/6/2008 11:47 AM:
> Found the following assert after building on a RHEL 5.2.

> test-strtod.c:667: assertion failed

Known bug in your glibc:
http://lists.gnu.org/archive/html/bug-m4/2008-06/msg00001.html

The latest snapshot (will become 1.4.12 in another week or so):
http://lists.gnu.org/archive/html/bug-m4/2008-09/msg00019.html

--
Don't work too hard, make some time for fun as well!

Eric Blake             ebb9 <at> byu.net
Elbert Pol | 12 Oct 2008 11:33

m4

Hello,

I tried m4 1.4.12 today with OS/2 and gcc 4.32, and it failed at the
make step.

I have included the logs.
Attachment (config.log.lzma): application/octet-stream, 26 KiB
Attachment (make.out.lzma): application/octet-stream, 4147 bytes
