JDF | 3 Apr 2012 02:15

Maximum size of Pattern space / Hold space

 

Hello all,
I am using GNU sed 4.2.1 on Fedora 14.
Does anyone know what the maximum size of the Pattern/Hold spaces
can be? If I did many, many N or H commands, could I overflow them?
Thanks!

__._,_.___
Recent Activity:
--
.

__,_._,___
Mark Edgar | 4 Apr 2012 15:09

Re: Maximum size of Pattern space / Hold space

 

On Tue, Apr 3, 2012 at 2:15 AM, JDF <usajdfields <at> yahoo.com> wrote:
> I am using GNU sed 4.2.1 on Fedora 14.
> Does anyone know what the maximum size of the Pattern/Hold spaces
> can be?

They seem to be resizable buffers:
http://git.savannah.gnu.org/cgit/sed.git/tree/sed/execute.c?id=4.2

> If I did many, many N or H commands, could I overflow them?

How many is "many, many"? What do you mean by "overflow"? If you
append enough, you will of course eventually run out of memory:

$ yes {0..7}\ {a..p} | head -n 262145 | (ulimit -v 100000; sed -e :1 -e H -e n) | wc -l
sed: couldn't re-allocate memory
262144

-Mark
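
To see the growth on a small scale, here is a minimal sketch (not part of the run above) that appends every input line to the hold space with H and prints the hold space once, on the last line. It assumes `seq` is available; the variable names are illustrative only.

```shell
#!/bin/sh
# Append every line to the hold space (H); on the last line, swap the
# hold space into the pattern space (x) and print it (p).
input_bytes=$(seq 1 1000 | wc -c)
hold_bytes=$(seq 1 1000 | sed -n 'H;${x;p;}' | wc -c)

# H puts a newline in front of each appended line, so the hold space ends
# up holding as many bytes as the input; p then adds one trailing newline.
echo "input: $((input_bytes)) bytes, printed hold space: $((hold_bytes)) bytes"
```

The buffer simply keeps growing with the input, which is why the only practical limit in the run above is available memory.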

Peter Dominey | 17 Apr 2012 20:45

CISPA (H.R. 3523): a new bill in Congress that threatens your privacy.

 

Please take action now to voice your opposition to H.R. 3523 (CISPA).
The bill seeks to trample on all cyber privacy and to give total
immunity to businesses that intercept your communications and hand them
to the government. This would be done without any judicial oversight
and with no requirement to have probable cause. So please take action
now.

There is less than a week before the bill comes up, so please don't
wait. I would never ordinarily reach out like this, except that there
is a real urgency for us to act now.

Please follow this link to visit the EFF web site and from there to
contact your representatives.

https://wfc2.wiredforchange.com/o/9042/p/dia/action/public/?action_KEY=8444

--

Thanks
Peter Dominey

----------------------------------------------------------
Independent UNIX Contractor

Phone: 817-488-5957
E-mail: pdominey
Web site: www.dominey.biz
----------------------------------------------------------


Quincey Robertson | 18 Apr 2012 14:00

Sed Replace Script

 

Hi;
I have this script:

#! /bin/sed

for FILE in `find . -type f -name '*.py'`
do
sed 's/python/python2.4/g'${FILE}
done

When I try and run it I get this error:

[root]# ./do.sed
/bin/sed: -e expression #1, char 1: unknown command: `.'

Please advise.
TIA,
Quincey

Mark Edgar | 20 Apr 2012 07:44

Re: Sed Replace Script

 

On Wed, Apr 18, 2012 at 2:00 PM, Quincey Robertson
<quincey.robertson <at> yahoo.com> wrote:
> #! /bin/sed

Why this line? I think you meant to use #!/bin/sh.

This should work, if I'm correctly guessing your intentions (and
assuming GNU sed's -i option is available):

find . -type f -name '*.py' -exec sed -i 's/python/python2.4/g' {} +

FYI: http://www.dwheeler.com/essays/filenames-in-shell.html is a great
article about properly handling filenames.

-Mark
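
The error itself follows from the `#! /bin/sed` line: the kernel ends up running `/bin/sed ./do.sed`, so sed takes the string `./do.sed` as its script and rejects the leading `.`. For a sed without GNU's `-i`, a minimal sketch of the same replacement might look like this. The function name `replace_in_tree` is hypothetical, and the sketch assumes `mktemp` is available.

```shell
#!/bin/sh
# replace_in_tree DIR: rewrite "python" to "python2.4" in every *.py file
# under DIR without relying on sed -i (hypothetical helper name).
replace_in_tree() {
    find "$1" -type f -name '*.py' -exec sh -c '
        for f do
            tmp=$(mktemp) || exit 1
            # Edit into a temporary file, then copy the result back so
            # the original file keeps its permissions.
            sed "s/python/python2.4/g" "$f" > "$tmp" && cat "$tmp" > "$f"
            rm -f "$tmp"
        done
    ' sh {} +
}
```

Calling `replace_in_tree .` covers the current directory. Letting find pass the names via `-exec ... {} +` avoids the word splitting that the backtick loop applies to filenames containing spaces. Note that running it twice would change `python2.4` again, since that string still contains `python`.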

Quincey Robertson | 20 Apr 2012 10:33

Re: Sed Replace Script

 

Thanks. That worked.
Quincey

Luigi Assom | 23 Apr 2012 11:21

Re: Yahoo! Groups: Welcome to sed-users. Visit today!

 

Hi community!
I subscribed to the sed group because I need to parse some text files
that contain HTML.
I am totally new to sed; I only discovered it yesterday!
I tried something (stripping all HTML tags) and succeeded, but I need
something more complex and cannot work it out. Could you please help
with the syntax?

I need to extract the URLs contained in <a> tags, only the part like
/RED/BLUE/, and to strip out every other tag, so that I have clean text
and only THOSE URLs.

I made this attempt:

txt structure:

> </b> (Displaying X Results)<table><tr> <td valign="top"><a
> href="/RED/BLUE/" onClick="(new Image()).src='/url/';"><img src="image.jpg"
> width="30" height="30" border="0"></a>&nbsp;</td><td align="right"
> valign="top"><img src="/images/c.gif" width="1" height="6"><br>1.</td><td
> valign="top"><img src="/images/c.gif" width="1" height="6"><br><a
> href="/RED/BLUE/" onclick="(new Image()).src='url/';">TEXT TO EXTRACT</a>
> TEXT TO EXTRACT <p class="find-alike">alike "OTHER TEXT TO
> EXTRACT" - BLABLA <em>(blabla)</em></p> <p class="find-alike">alike
> "OTHER TEXT TO EXTRACT" - BALABLA</p> </td></tr><tr> <td
> valign="top"><img src="/images/b.gif" alt="" width="23" height="1"></td><td
> align="right" valign="top">2.</td><td valign="top"><a href="/RED/BLUE/"
> onclick="(new Image()).src='url/';">TEXT TO EXTRACT</a> TEXT TO EXTRACT
> </td></tr>
>

command:
sed -e 's/<[^>]*>//g;/</N;//b'

It works, but it strips out all tags.

I tried to customize it, but nothing happened:
sed -e '/RED/!d;'

I also tried some scripts from the web, but I cannot make them work...
I am also new to Unix :)
I am using the sed that ships with my MacBook.

Could you please help out in this task?
Thank you!
Luigi

Joe Gain | 24 Apr 2012 13:05

Re: Yahoo! Groups: Welcome to sed-users. Visit today!

 

Hey Luigi,

one of sed's problems is dealing with newline characters and you have
your html anchor tags split over new lines. There's probably a smart
way to overcome this using sed (and sed's line buffer), but it's not
an easy alternative.

I think the best way for you is to:

1. remove all the new line characters,or at least don't split tags
with new lines (there are many ways to do this, depending on which
version of sed you are using, could be as easy as # sed -n -e
's/\n//g; p' data.html)

2. remove the surrounding tags from the text that you want in your
anchor attributes with something like:
sed -n -e 's/\(<a[^h]*href="\)\([^"]*\)"\([^>]*>\)/\2/g; p'

3. remove all the other tags, using the sed script you have already.
(You can pipe the output of 2 into 3.)

Hope this helps.

Joe

PS. In general, you can look at some other tools like perl or ruby to
make your life easier.

--
joe gain

jacob-burckhardt-str. 16
78464 konstanz
germany

+49 (0)7531 60389

(...otherwise in ???)
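
One caveat on step 1: sed removes the trailing newline before a line reaches the pattern space, so a plain `s/\n//g` finds nothing to replace unless lines have first been joined with N. A minimal sketch of one common way to do the joining inside sed follows. The function name `join_lines` is hypothetical, and the whole input ends up in the pattern space at once.

```shell
#!/bin/sh
# join_lines: read stdin and print it as one line, with each newline
# replaced by a space (hypothetical helper name).
join_lines() {
    # :a        label to loop back to
    # $!{N;ba}  unless this is the last line, append the next line and loop
    # s/\n/ /g  once everything is collected, turn newlines into spaces
    sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/\n/ /g'
}
```

For example, `join_lines < data.html | sed -e '...'` would feed the joined text into steps 2 and 3. The separate `-e` fragments are used because some sed implementations are strict about what may follow a label or a closing brace on the same line.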

Cameron Simpson | 25 Apr 2012 00:14

Re: Yahoo! Groups: Welcome to sed-users. Visit today!

 

Joe has the truth here. I'm just adding a few remarks.

On 24Apr2012 13:05, Joe Gain <joe.gain <at> gmail.com> wrote:
| one of sed's problems is dealing with newline characters

Remark: specifically because it deals with a line at a time.

| and you have
| your html anchor tags split over new lines. There's probably a smart
| way to overcome this using sed (and sed's line buffer), but it's not
| an easy alternative.
|
| I think the best way for you is to:
|
| 1. remove all the new line characters,or at least don't split tags
| with new lines (there are many ways to do this, depending on which
| version of sed you are using, could be as easy as # sed -n -e
| 's/\n//g; p' data.html)

Or, not using sed:

( tr -d '\012' < data.html ; echo ) | sed other-sed-work-now....

This removes all the newlines using tr, then adds one at the end with
echo, thus putting all the text on a single line.

Personally, I'd turn newlines into spaces; otherwise a word break across
a line like this:

<p> some words
here</p>

would become

<p> some wordshere</p>

So:

( tr '\012' ' ' < data.html ; echo ) | sed other-sed-work-now....

Remember, sed is not your only tool.

| 2. remove the surrounding tags from the text that you want in your
| anchor attributes with something like:
| sed -n -e 's/\(<a[^h]*href="\)\([^"]*\)"\([^>]*>\)/\2/g; p'

I have my concerns here, specifically the "[^h]". It will match "<a href"
as intended, but "<a[^h]*" also accepts any run of non-"h" characters
after "<a", so it can start in a different tag (e.g. "<abbr") and run
on until some later "href=".

<a *href=

is more reliable, matching spaces. It does presume no TAB characters.

I would be inclined to use even more tr at the outset, to hammer the
text flatter:

( tr '\011\012' ' ' < data.html ; echo ) | tr -s ' ' | sed ...

So:
- turn newlines and TABs into spaces
- turn multiple spaces into a single space

That way you can write patterns like this:

<a href=

without having to cope with more than one space; it makes the regexps
much easier to write and read.

Cheers,
--
Cameron Simpson <cs <at> zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

My venus fly trap is higher up the food chain than I am.
- simon <at> ohm.york.ac.uk (Simon Klyne)
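
Putting the flattening steps together with the two substitutions discussed in the thread, a minimal sketch of the whole pipeline could look like this. The function name `extract_links` and the exact sed expressions are assumptions for illustration, not something posted in the thread. The sketch also assumes double-quoted href values and no `>` inside attribute values.

```shell
#!/bin/sh
# extract_links: read HTML on stdin, print the text with each opening
# <a ... href="URL" ...> tag replaced by the URL and every other tag
# removed (hypothetical helper name).
extract_links() {
    # tr maps TAB and newline to spaces (two spaces in the second set, so
    # each character has its own replacement); echo restores one final
    # newline; tr -s squeezes runs of spaces into one.
    { tr '\011\012' '  '; echo; } |
        tr -s ' ' |
        sed -e 's/<a [^>]*href="\([^"]*\)"[^>]*>/ \1 /g' \
            -e 's/<[^>]*>//g'
}
```

Usage would be `extract_links < data.html`. Because everything is on one line by the time sed runs, neither expression has to deal with tags split across lines.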

Luigi Assom | 25 Apr 2012 20:06

Re: Yahoo! Groups: Welcome to sed-users. Visit today!

 

Joe, Cameron, thank you for your help!

I did not know the sed and tr commands; they are totally new to me and
the syntax is really complex, and I don't know perl either. I did not
understand exactly what you meant, but I tried copying the syntax you
sent exactly as it is. The only thing that seemed to work a bit is this:
sed -n -e 's/\(<a[^h]*href="\)\([^"]*\)"\([^>]*>\)/\2/g; p'

so I was able to move the URL from inside the <a> tag to outside it,
which is good :)
I could not split on ends of lines, because it is one long text with no
"\n" characters.

I just need to strip out all the other html tags (I don't need them) and
keep only plain text outside the tags.

so I will have a file like
url texttext text text text text url text text text text text
and then I should be ok to handle it.

Is there perhaps a GUI for using sed on the Mac?
thank you for your help!

Luigi

--
Luigi Assom

Skype contact: oggigigi
