Re: mapping entries patterns help
Here are two possible approaches.
(1) If the characters you want to allow are those in some range, use a "glob" match.
For instance, $[!-~]% will match any character in the US-ASCII range between ! and ~,
while $[!-~]* will match any number of any such characters.
(2) Use $T% to ensure that at least one space/htab/vtab character is present, with
subsequent $T* to get any additional spaces, and then take advantage of "minimal"
matching, indicated via $_, to have your * wildcards match as little as possible
between such "forced" $T%$T* matches. That is, once a $T% forces each space match
to consume at least one space, "minimal" matching between those spaces captures
everything up to, but not including, the space.
Either one of these approaches, but perhaps especially (2), needs some more work
(probably some additional mapping table entries) if you also need to handle cases
of fewer than five space-separated words. (Also, what do you want to do about
possible leading spaces?) So take these as starting points...
For example, here are two mapping tables:
X-5WORDS-GLOB
$[!-~]%$[!-~]*$@T*$[!-~]%$[!-~]*$@T*$[!-~]%$[!-~]*$@T*$[!-~]%$[!-~]*$@T*$[!-~]%$[!-~]*$@T*$@* \
$0$1_$2$3_$4$5_$6$7_$8$9$Y
X-5WORDS-MINIMAL
%$_*$@T%$@T*%$_*$@T%$@T*%$_*$@T%$@T*%$_*$@T%$@T*%$_*$@T%$@T*$@* \
$0$1_$2$3_$4$5_$6$7_$8$9$Y
each of which expects at least five space-separated (terminated) words in the input
string, and which will output <word1>_<word2>_<word3>_<word4>_<word5>, discarding any
additional text. Note the use of the "@" prefix on the $T matches, to "turn off
saving" of that wildcard -- that's so that the matches that we want to save
to reuse in a substitution on the right hand side (the matches to the first five words,
each split into first character and rest of word) only count up to substitution $9.
# imsimta test -mapping -table=X-5WORDS-GLOB
Input string: one two three four five more text
Output string: one_two_three_four_five
Output flags: [0, 'Y' (89)]
or
# imsimta test -mapping -table=X-5WORDS-MINIMAL
Input string: one two three four five more text
Output string: one_two_three_four_five
Output flags: [0, 'Y' (89)]
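If it helps to reason about the two templates outside of Messaging Server, here
is a rough Python analogy using ordinary regular expressions. This is only a
sketch: Python's re module is not the mapping engine, the names GLOB_RE,
MINIMAL_RE and first_five are made up for illustration, and I am assuming $T
corresponds to space/htab/vtab as described above.

```python
import re

# Assumed stand-in for $T: space, horizontal tab, vertical tab.
SPACE = r"[ \t\v]"

# Approach (1): $[!-~]%$[!-~]* saves first char and rest of word;
# the unsaved $@T* after each word becomes SPACE*, the final $@* becomes .*
GLOB_RE = re.compile((r"([!-~])([!-~]*)" + SPACE + "*") * 5 + r".*", re.DOTALL)

# Approach (2): %$_* is "one char, then as little as possible",
# followed by a forced space ($@T%) and any further spaces ($@T*).
MINIMAL_RE = re.compile((r"(.)(.*?)" + SPACE + SPACE + "*") * 5 + r".*", re.DOTALL)


def first_five(pattern, subject):
    """Return the first five words joined by '_', or None if no match."""
    m = pattern.fullmatch(subject)  # a mapping pattern must match the whole string
    if m is None:
        return None
    g = m.groups()  # ten saved pieces, like $0 through $9
    return "_".join(g[i] + g[i + 1] for i in range(0, 10, 2))


if __name__ == "__main__":
    subject = "one two three four five more text"
    print(first_five(GLOB_RE, subject))
    print(first_five(MINIMAL_RE, subject))
```

As with the mapping tables, both expressions simply fail to match when there
are fewer than five words, so that case would still need separate handling.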
One more note: When coming up with these sorts of more complex matching
templates, keep in mind that imsimta test -match can be your friend! I find
it really helpful for refining such templates. E.g.,
# imsimta test -match
Pattern: $[!-~]%$[!-~]*$@T*$[!-~]%$[!-~]*$@T*$[!-~]%$[!-~]*$@T*$[!-~]%$[!-~]*$@T*$[!-~]%$[!-~]*$@T*$@*
[ 1S] cchar [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 2S] cglob [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 3] cglob []
[ 4S] cchar [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 5S] cglob [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 6] cglob []
[ 7S] cchar [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 8S] cglob [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 9] cglob []
[ 10S] cchar [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 11S] cglob [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 12] cglob []
[ 13S] cchar [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 14S] cglob [!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~]
[ 15] cglob []
[ 16] glob, req -1, reps 0
Target: one two three four five more text
Match.
0 - o
1 - ne
2 - t
3 - wo
4 - t
5 - hree
6 - f
7 - our
8 - f
9 - ive
Regards,
Kristin
On Mar 7, 2012, at 3:27 PM, Victor Shum wrote:
>
> We are on MS 7u4.
>
> We are trying to craft a filter mapping (called by sieve filter) to get the first 5 words
> of a subject. We created something like this in the mappings file:
>
> FILTER_parsesubject
>
> first5words|$S*$T*$S*$T*$S*$T*$S*$T*$S** $0$ $2$ $4$ $6$ $8$Y$E
>
> We thought this worked, until we came across characters like ":" in the subject line that
> apparently won't be matched by $S*. Is there any metacharacter that represents more
> character sets than $S*, something like NOT $T*, to cover everything that is not a space or tab?
>
> Victor