awk-xml
ricardo gamboa <ric_tico <at> yahoo.com>
2008-04-09 15:16:30 GMT
Good morning, everyone!
I wanted to see if someone could help me with awk.
The XML file has the syntax shown below.
Each sentence is entirely on a single line in the file.
#cat newfile
<instance id="bass.v.bnc.001" docsrc="BNC">
<context>
I went fishing for some sea <head>bass</head> .
</context>
</instance>
<instance id="bass.v.bnc.002" docsrc="BNC">
<context>
The <head>bass</head> part of the song is very moving.
</context>
</instance>
<instance id="program.v.bnc.001" docsrc="BNC">
<context>
he proposed an elaborate <head>program</head> of
public works . This
information was taken
</context>
</instance>
<instance id="program.v.bnc.002" docsrc="BNC">
<context>
the <head>program</head> required several hundred
lines of code .
</context>
</instance>
<instance id="smell.v.bnc.001" docsrc="BNC">
<context>
It 's making me annoyed .I did n't want to stay there
and I did n't
want to go to Combe Court , cos I hate it and it
<head>smells</head>
and the Captain slobbers in his food and Christmas is
horrible with no good prezzies and Annie not there .
Why did n't you visit me ? Why not ?
</context>
</instance>
I am writing a script that does the following: get the 3 words
before and the 3 words after the <head></head> tag, skipping
words of 2 characters or fewer.
It should return this:
#----------------------------------
for some sea bass
The bass part the song
proposed elaborate program public works
the program required several hundred
cos hate and smells and the Captain
The good news is that it already returns something close to
this, but with a bit of "garbage", so I would like to see
whether you can help me improve the code :)
This is what I get:
#cat m
context
for some sea bass
/context
/instance
/instance
context
The bass part the song
/context
/instance
/instance
context
proposed elaborate program public works
/context
/instance
/instance
context
the program required several hundred
/context
/instance
/instance
context
cos hate and smells and the Captain
/context
/instance
#
#cat solucion.awk
/context/ { flag = 1 }
/\/context/ { flag = 0 }
!/context/ {
    if (flag == 1)
        gsub(/[,;:]/, " ", $0);
    gsub(/[.]/, " . ", $0)
}
/<.*>/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /<.*>/) {
            s = substr($i, 2, length($i) - 2)
            c = 0
            for (j = i - 1; j > 0 && c != 3 && $j != "."; j--)
                if (length($j) > 2) { s = $j FS s; c++ }
            c = 0
            for (j = i + 1; j <= NF && c != 3 && $j != "."; j++)
                if (length($j) > 2) { s = s FS $j; c++ }
        }
    print s
}
#
Regards,
-ric
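
One possible way to get rid of the extra lines, offered as a sketch rather
than a definitive fix. In the script as posted, the pattern /<.*>/ matches
every tag line (<context>, </context>, </instance>), and "print s" runs
for each of those lines, even when no field matched and s still holds the
value from the previous line. The sketch below instead collects the text
between <context> and </context> into one buffer and prints only when a
<head>...</head> token is found, so it also works when a context spans
several lines, as in the listing above. The function name extract_context
and the variable names are made up for illustration; the word-selection
rule (up to 3 words longer than 2 characters on each side, stopping at a
period) is taken from the posted script.

```sh
extract_context() {
    awk '
    # Start of a context block: reset the buffer and start collecting.
    /<context>/ { collecting = 1; text = ""; next }

    # End of a context block: process everything collected so far.
    /<\/context>/ {
        collecting = 0
        gsub(/[,;:]/, " ", text)     # drop minor punctuation
        gsub(/[.]/, " . ", text)     # make each period its own token
        n = split(text, w, " ")
        for (i = 1; i <= n; i++) {
            if (w[i] !~ /^<head>.*<\/head>$/) continue
            s = w[i]
            gsub(/<\/?head>/, "", s) # keep only the tagged word
            c = 0
            # Walk backwards: up to 3 words longer than 2 characters.
            for (j = i - 1; j > 0 && c < 3 && w[j] != "."; j--)
                if (length(w[j]) > 2) { s = w[j] " " s; c++ }
            c = 0
            # Walk forwards with the same rule.
            for (j = i + 1; j <= n && c < 3 && w[j] != "."; j++)
                if (length(w[j]) > 2) { s = s " " w[j]; c++ }
            print s
        }
        next
    }

    # Inside a context block: append the line to the buffer.
    collecting { text = text " " $0 }
    ' "$@"
}
```

It can be run as "extract_context newfile", or with the XML piped in on
standard input. Lines outside a context block are ignored, so the
<instance> and </instance> tags never reach the output.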