Luke Mewburn | 2 Mar 2003 12:13

Re: RFC: memmem(3)

On Sun, Mar 02, 2003 at 08:26:03PM +1030, wulf <at> ping.net.au wrote:
  | G'day,
  | 
  | after many uses, I'd like to propose memmem(3) for inclusion into libc.
  | memmem(3) is similar to strstr(3), with the difference that it allows
  | searching for arbitrary binary data patterns in a memory block;
  | see the man page and source code included below for more information.
  | 
  | I believe that this is a very versatile function that may come in handy
  | for many applications.
  | 
  | As always, I entertain your constructive comments.
  | 
  | Many thanks in advance
  | 
  | cheerio Berndt

Looks good.

One comment; any reason the signature isn't
    void *memmem(const void *b1, size_t len1, const void *b2, size_t len2);
?

(keep the lengths with the pointers they're for).
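For reference, here is a minimal brute-force sketch using the argument order suggested above, with each length kept next to its pointer. The name naive_memmem is hypothetical and was chosen so that it does not clash with a libc that already ships memmem(). This is not the code from the original proposal, whose source is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Brute-force search for the first occurrence of b2[0..len2) inside
 * b1[0..len1).  Hypothetical name; lengths sit next to their pointers.
 */
void *
naive_memmem(const void *b1, size_t len1, const void *b2, size_t len2)
{
	const unsigned char *hay = b1;
	size_t i;

	if (len2 == 0)
		return (void *)hay;	/* empty needle matches at offset 0 */
	if (len2 > len1)
		return NULL;		/* needle cannot fit */

	/* try every alignment at which the needle still fits */
	for (i = 0; i <= len1 - len2; i++) {
		if (memcmp(hay + i, b2, len2) == 0)
			return (void *)(hay + i);
	}
	return NULL;
}
```

Returning the start of the block for an empty needle mirrors what strstr(3) does with an empty string. That behaviour is an assumption here and is not taken from the proposed man page.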

  | 
  | 
  | ----------------------- memmem.3 (txt) ------------------------
  | MEMMEM(3)                 NetBSD Programmer's Manual                 MEMMEM(3)
  | 

Julio Merino | 2 Mar 2003 12:34

Re: RFC: memmem(3)

On Sun, 2 Mar 2003 20:26:03 +1030 (CST)
wulf <at> ping.net.au wrote:

> SEE ALSO
>      strstr(3),

                ^

> STANDARDS
>      The memmem() function is an extension to ANSI X3.159-1989 (``ANSI C'').

An "extension"? Is this function mentioned in any standard or is it just
a linux thing?

-- 
Julio M. Merino Vidal <jmmv <at> menta.net>
The NetBSD Project - http://www.NetBSD.org/

wulf | 2 Mar 2003 12:46

Re: RFC: memmem(3)

> 
> On Sun, Mar 02, 2003 at 08:26:03PM +1030, wulf <at> ping.net.au wrote:
>   | G'day,
>   | 
>   | after many uses, I'd like to propose memmem(3) for inclusion into libc.
>   | memmem(3) is similar to strstr(3), with the difference that it allows
>   | searching for arbitrary binary data patterns in a memory block;
>   | see the man page and source code included below for more information.
>   | 
>   | I believe that this is a very versatile function that may come in handy
>   | for many applications.
>   | 
>   | As always, I entertain your constructive comments.
>   | 
>   | Many thanks in advance
>   | 
>   | cheerio Berndt
> 
> Looks good.
> 
> One comment; any reason the signature isn't
>     void *memmem(const void *b1, size_t len1, const void *b2, size_t len2);
> ?
> 
> (keep the lengths with the pointers they're for).

Thanks. No, there isn't a particular reason for the suggested signature
layout. I merely implemented a missing function from declarations found
in several applications and followed their lead. However, grouping each
pointer with its associated length would seem to make sense.

wulf | 2 Mar 2003 13:29

Re: RFC: memmem(3)

> 
> On Sun, 2 Mar 2003 20:26:03 +1030 (CST)
> wulf <at> ping.net.au wrote:
> 
> > SEE ALSO
> >      strstr(3),
> 
>                 ^
> 
> 
> > STANDARDS
> >      The memmem() function is an extension to ANSI X3.159-1989 (``ANSI C'').
> 
> An "extension"? Is this function mentioned in any standard or is it just
> a linux thing?

It's not part of the standard and hence I consider it an extension of the
string.h family of functions described by ANSI X3.159-1989 and
ANSI/ISO 9899-1990.

cheerio Berndt

John Hawkinson | 2 Mar 2003 16:04

Re: RFC: memmem(3)

Berndt <wulf <at> ping.net.au> wrote on Sun, 2 Mar 2003 at 22:59:47 +1030
in <200303021229.h22CTx315016 <at> gw.ping.net.au>:

> > > STANDARDS
> > >      The memmem() function is an extension to ANSI X3.159-1989 (``ANSI C'').
> > 
> > An "extension"? Is this function mentioned in any standard or is it just
> > a linux thing?
> 
> It's not part of the standard and hence I consider it an extension of the
> string.h family of functions described by ANSI X3.159-1989 and
> ANSI/ISO 9899-1990.

As I think Julio makes clear, the wording is sufficiently vague to
produce confusion. I don't think it's appropriate to use the word
extension if the object of the preposition is the standard (though if
you said "NetBSD-specific extension", that would be ok I guess).  I
guess I would be most happy with "memmem() is not a standardized
function," with an indication of its glibc origins under HISTORY.

It looks like the linux/glibc version has this to say:

| CONFORMING TO
| 
|       This function is a GNU extension. 
| 
| BUGS
| 
|       This function was broken in Linux libraries up to and including
| libc 5.0.9; there the `needle' and `haystack' arguments were

der Mouse | 2 Mar 2003 18:22

Re: RFC: memmem(3)

>      memmem(const void *b1, const void *b2, size_t len1, size_t len2);

For what it may be worth, for some time now I've had a private library
(called, for historical reasons, libsearchstr) that does basically
this, except

+ searching is usually much faster

+ there is a comparatively expensive setup call that must be done once
  the sought-for string is known

+ there's a maximum length on the sought-for string (approx. 255)

+ there's an equivalence table, permitting it to be used as a
  strcasestr, or for searching treating all digits identically, etc

If anyone's interested, let me know.  The C version is only 68 lines; I
also have .s versions for 68k, sparc, and vax.

The length limit could be raised to about 64k by doubling the size of
the table created by the setup call (shorts instead of chars).

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse <at> rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Martin Husemann | 2 Mar 2003 20:38

Re: RFC: memmem(3)

On Sun, Mar 02, 2003 at 12:22:37PM -0500, der Mouse wrote:

> + searching is usually much faster

That was my first thought too - if something needs this, it would probably
better use some more clever algorithm, like Rabin/Karp or Knuth/Morris/Pratt
(with similar interface changes as you describe).

Is it really a good idea to add a seldom-needed function and force a
brute-force implementation via interface restrictions? Does this really
belong in libc? But then we probably shouldn't have strtok or strspn
either, if they weren't required by ANSI.
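To make the point about cleverer algorithms concrete, here is a sketch of the Rabin/Karp idea behind the same memmem-style signature. It keeps a rolling hash over the current window and uses memcmp(3) to confirm each hash hit. The name rk_memmem and the base 257 are arbitrary illustrative choices. The hash is simply reduced modulo 2^32 by unsigned overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RK_BASE 257u	/* hypothetical choice of hash base */

void *
rk_memmem(const void *b1, size_t len1, const void *b2, size_t len2)
{
	const unsigned char *hay = b1, *ndl = b2;
	uint32_t hn = 0, hh = 0, pow = 1;
	size_t i;

	if (len2 == 0)
		return (void *)hay;
	if (len2 > len1)
		return NULL;

	/* hash needle and first window; pow ends as RK_BASE^(len2-1) */
	for (i = 0; i < len2; i++) {
		hn = hn * RK_BASE + ndl[i];
		hh = hh * RK_BASE + hay[i];
		if (i > 0)
			pow *= RK_BASE;
	}

	for (i = 0; ; i++) {
		/* equal hashes may be a collision, so confirm the bytes */
		if (hh == hn && memcmp(hay + i, ndl, len2) == 0)
			return (void *)(hay + i);
		if (i == len1 - len2)
			break;
		/* roll the window: drop hay[i], append hay[i + len2] */
		hh = (hh - hay[i] * pow) * RK_BASE + hay[i + len2];
	}
	return NULL;
}
```

Unlike Knuth/Morris/Pratt, this approach needs no per-pattern table. A run of hash collisions can still degrade it to brute-force speed, though.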

Martin

der Mouse | 2 Mar 2003 20:45

Re: RFC: memmem(3)

>> + searching is usually much faster
> That was my first thought too - if something needs this, it would
> probably better use some more clever algorithm, like Rabin/Karp or
> Knuth/Morris/Pratt (with similar interface changes as you describe).

I'm not sure which algorithm I'm using in terms of names like those.
The name Boyer/Moore comes to mind, but I'm not sure exactly what that
is so I could be misled there.

Here's the interface I'm using:

        void
        searchstr_maketbl(unsigned char (*table)[256],
           const unsigned char *string, unsigned int len,
           const unsigned char equiv[256]);

        int
        searchstr(const unsigned char *string, unsigned int stringlen,
           unsigned char (*table)[256], unsigned int tablelen);

The first function is the overhead call I referred to; the second
actually does the search.  The interface is not very general (in terms
of the underlying algorithm) because I wanted to avoid making the
library routines allocate memory.

Hm, the table argument to searchstr really ought to be
const unsigned char (*)[256].  And of course 256 should be something
more like UCHAR_MAX....
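The library code itself is not posted, so the following is only a guess at how an interface of this shape can work. A table of type unsigned char (*)[256] with tablelen rows fits a Knuth/Morris/Pratt-style matching automaton. In that design, row s gives the next state after byte c when s pattern bytes have matched. That reading would be consistent with the ~255 length limit (states stored in unsigned char) and with shorts raising it to about 64k. The names dfa_maketbl, dfa_search, and DFA_MAXLEN are hypothetical. The sketch uses UCHAR_MAX + 1 as suggested above.

```c
#include <limits.h>

#define DFA_MAXLEN 255	/* hypothetical: states must fit in unsigned char */

/*
 * Build a KMP-style automaton for string[0..len), 1 <= len <= DFA_MAXLEN:
 * table[s][c] = number of pattern bytes matched after reading byte c in
 * state s.  Bytes match when equiv[] maps them to the same value, so the
 * search loop never needs equiv itself.
 */
void
dfa_maketbl(unsigned char (*table)[UCHAR_MAX + 1],
    const unsigned char *string, unsigned int len,
    const unsigned char equiv[UCHAR_MAX + 1])
{
	unsigned int j, c, x = 0;	/* x: restart (failure) state */

	for (c = 0; c <= UCHAR_MAX; c++)
		table[0][c] = (equiv[c] == equiv[string[0]]) ? 1 : 0;

	for (j = 1; j < len; j++) {
		/* a matching byte advances; anything else behaves as in state x */
		for (c = 0; c <= UCHAR_MAX; c++)
			table[j][c] = (equiv[c] == equiv[string[j]])
			    ? (unsigned char)(j + 1) : table[x][c];
		x = table[x][string[j]];
	}
}

/*
 * Return the offset of the first match in string[0..stringlen), or -1.
 * table is left non-const on purpose: before C23, unsigned char (*)[N]
 * does not convert implicitly to const unsigned char (*)[N].
 */
int
dfa_search(const unsigned char *string, unsigned int stringlen,
    unsigned char (*table)[UCHAR_MAX + 1], unsigned int tablelen)
{
	unsigned int i, s = 0;

	for (i = 0; i < stringlen; i++) {
		s = table[s][string[i]];
		if (s == tablelen)
			return (int)(i + 1 - tablelen);
	}
	return -1;
}
```

For a strcasestr-like search, pass an equiv table that maps 'A'..'Z' to 'a'..'z' and every other byte to itself. The expensive part is the len x 256 table fill, which is done once per pattern.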

/~\ The ASCII				der Mouse

Jaromir Dolecek | 2 Mar 2003 21:54

Re: RFC: memmem(3)

There is even a Boyer-Moore string search implementation in the tree:
bm(3).
I keep meaning to write an implementation of Aho-Corasick string
search ...
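bm(3) is NetBSD's own Boyer-Moore interface. Rather than reproduce its API, the sketch below shows the simpler Boyer-Moore-Horspool variant (bad-character shift only) behind the proposed memmem argument order. It illustrates how a small per-pattern table lets the scan skip ahead. The name bmh_memmem is hypothetical, and this is not the bm(3) code.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Boyer-Moore-Horspool: for each byte value, precompute how far the
 * window may slide when that byte is the last one in the window.
 */
void *
bmh_memmem(const void *b1, size_t len1, const void *b2, size_t len2)
{
	const unsigned char *hay = b1, *ndl = b2;
	size_t shift[UCHAR_MAX + 1];
	size_t i, pos;

	if (len2 == 0)
		return (void *)hay;
	if (len2 > len1)
		return NULL;

	/* bytes absent from ndl[0..len2-1) allow a full-length skip */
	for (i = 0; i <= UCHAR_MAX; i++)
		shift[i] = len2;
	for (i = 0; i < len2 - 1; i++)
		shift[ndl[i]] = len2 - 1 - i;

	/* every shift is >= 1, so the loop always makes progress */
	for (pos = 0; pos <= len1 - len2; pos += shift[hay[pos + len2 - 1]]) {
		if (memcmp(hay + pos, ndl, len2) == 0)
			return (void *)(hay + pos);
	}
	return NULL;
}
```

The shift table costs one pass over the needle, which is the same trade-off as the setup call in libsearchstr: pay once per pattern, then skip bytes during the scan.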

Jaromir

der Mouse wrote:
> >> + searching is usually much faster
> > That was my first thought too - if something needs this, it would
> > probably better use some more clever algorithm, like Rabin/Karp or
> > Knuth/Morris/Pratt (with similar interface changes as you describe).
> 
> I'm not sure which algorithm I'm using in terms of names like those.
> The name Boyer/Moore comes to mind, but I'm not sure exactly what that
> is so I could be misled there.
> 
> Here's the interface I'm using:
> 
>         void
>         searchstr_maketbl(unsigned char (*table)[256],
>            const unsigned char *string, unsigned int len,
>            const unsigned char equiv[256]);
> 
>         int
>         searchstr(const unsigned char *string, unsigned int stringlen,
>            unsigned char (*table)[256], unsigned int tablelen);
> 
> The first function is the overhead call I referred to; the second
