Amit Aronovitch | 16 May 2010 13:08

Re: [pyweb-il:995] ipython and unicode



On Sun, May 16, 2010 at 10:29 AM, benny daon <bennydaon <at> gmail.com> wrote:
Hi all,
I've finally started using the powerful IPython and found out that it doesn't handle Unicode characters very well. I found this ticket - http://projects.scipy.org/ipython/ipython/ticket/239 - which is just what I'm seeing, except that I'm using Hebrew and not Chinese...
I'm using version 0.1 (installed using easy_install) on OSX. Any ideas?

Thanks,

Benny

IPython bugs were moved to GitHub. This one (or an equivalent) has some follow-up there:
http://github.com/ipython/ipython/issuesearch?state=open&q=encodings#issue/25

It seems to be caused by a line that was explicitly inserted by someone to resolve some other issue, so the solution is currently in some testing phase.

A simple hack to resolve this without interfering with your current installation: start ipython using the attached script instead of the packaged ipython command:

  Amit

Attachment (hackipy2.py): text/x-python, 2461 bytes
_______________________________________________
Python-il mailing list
Python-il <at> hamakor.org.il
http://hamakor.org.il/cgi-bin/mailman/listinfo/python-il
Yitzhak Wiener | 24 May 2010 16:55

[python-il]location in file

Hi dear people,

I am searching for data in a text file. I used a regular expression to find the location in the file that I am interested in. I found the string that begins the section, and now I want to read information from that section according to its address (offset from the beginning of the file).

For better understanding here is part of my file:

41c0 06cb 41c0 06cb 41c0 06cb 41c0 06cb
41c0 06cb 41c0 06cb 41c0 06cb 41c0 06cb
41c0 06cb 41c0 06cb 41c0 06cb 41c0 06cb
41c0 06cb 41c0 06cb 41c0 06cb 41c0 06cb
41c0 06cb 41c0 06cb 41c0 06cb 41c0 06cb
41c0 06cb

48. MultiProgPage_Data <at> c2 - SECTION HEADER:
s_paddr   = 0x52a
s_vaddr   = 0x0
s_size    = 420
s_scnptr  = 244630
s_relptr  = 0
s_lnnoptr = 0
s_nreloc  = 0
s_nlnno   = 0
s_flags   = 0x8        ( Data section, Mapped in code memory )
RAW DATA:
0002 bec3 0002 d3f2 0002 c331 0002 d3d6
0000 0540 0001 10f4 0002 64a7 0002 ec5d
0002 ed11 0002 c3e4 0002 f908 0002 a139
0002 9aee 0002 b325 0002 ca22 0001 39ac
0000 0596 0002 d62a 0002 bb4e 0001 4ec6
0001 3d1a 0001 3cae 0001 4a85 0001 47ee
0001 19c8 0002 ff2b 0002 e535 0001 2ea7
0001 4772 0001 3db9 0000 c452 0001 4aa5
0002 85bb 0001 4c14 0001 4a25 0001 49d0
0002 f1d5 0001 3c77 0001 4afd 0001 4c09
0001 4a35 0001 48fa 0001 3ac0 0001 3771
0001 3d84 0001 4631 0001 30f4 0002 b9bf
0001 3224 0000 c43f 0002 eb9e 0002 400c
0002 f6d6 0001 44eb 0002 ef9d 0001 4a55
0001 4ab4 0001 416b 0002 4b50 0002 b72c
0002 9f7d 0001 4d94 0001 3695 0001 49be
0002 d60f 0002 5cff 0001 20da 0002 ad34
0000 f818 0001 342f 0002 c9a0 0001 4df7
0002 df25 0002 c235 0001 4879 0001 3f0e
0001 3891 0001 3b82 0001 3c3b 0001 43c1
0001 4ba8 0000 f837 0001 278c 0002 3c92
0002 ff0a 0001 4615 0001 46f0 0002 d819
0001 4d64 0001 4e32 0001 4193 0001 42a2
0002 7a74 0001 4547 0002 c76e 0002 d444
0002 7975 0002 7fef 0002 2fa0 0001 1f90
0002 fc3e 0001 4e50 0001 1ce7 0001 1b79
0001 2f9d

49. beeper_FW_bg_Task <at> c2 - SECTION HEADER:

The data that I need is colored in red (the words under RAW DATA above). Given an input offset, I would like to read the hexadecimal word located at that offset from the beginning of the file or of the red section. For example, offset 3 = d3f2.

What is the correct way to refer to this data?

FYI, to get to the beginning of the section I used re.match as follows:

import re

s = open(r"project_release.dump").read().splitlines()
bInsideSection = False
for line in s:
   if re.match(".*MultiProgPage_Code <at> c0 - SECTION HEADER.*", line, re.S):
      bInsideSection = True
   if re.match(r".*RAW DATA.*", line):
      pass  # from here I want to read the words of the section

Thanks for your help,

Yitzhak

 


______________________________________________________________________
DSP Group, Inc. automatically scans all emails and attachments using MessageLabs Email Security System.
_____________________________________________________________________
Shai Berger | 24 May 2010 17:34

Re: [python-il]location in file

Hi Yitzhak,

You said,
> 
> I am searching for data in a text file. I used a regular expression to
> find the location in the file that I am interested in.

but in the code, you wrote,

> for line in s:
[...]
>    if re.match(r".*RAW DATA.*", line):

That is, instead of finding the location in the FILE, you found the location 
in the LINE.

What you should do instead is get a string that contains your whole section; 
you can do this with regular expressions (applied to the whole file, s, with 
re.M), or you can do this by collecting the relevant lines after having split 
the file into lines. Then, just use section.split() to get a list of the 
"words" (as separated by whitespace) in the section.
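
As a minimal sketch of that last step: once the section is a single string, splitting on whitespace turns the requested offset into a plain list index. The `section` string below is a shortened, hand-copied stand-in for the real RAW DATA block, and `word_at` is a hypothetical helper name, not something from this thread.

```python
# Shortened stand-in for the text of one RAW DATA block (assumption:
# the real section would be extracted from the dump file first).
section = """
0002 bec3 0002 d3f2 0002 c331 0002 d3d6
0000 0540 0001 10f4 0002 64a7 0002 ec5d
"""


def word_at(section_text, offset):
    """Return the hex word at a zero-based offset within the section."""
    words = section_text.split()  # splits on spaces and newlines alike
    return words[offset]


word = word_at(section, 3)
value = int(word, 16)  # numeric value of the word, if it is needed
print(word, value)
```

With the offsets counted from zero, offset 3 gives the word d3f2 from the example in the question.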

Have fun,
	Shai.
Yitzhak Wiener | 24 May 2010 17:58

Re: [python-il]location in file

Hi Shai,

Thanks for the prompt reply.
As a real beginner, I think I partly understand your idea, but I don't
know how to do it. Can you help with this? Assuming I prefer the first
option, searching the entire file, I would start as follows:
txt = open(r"project_release.dump").read()
# Now I should find the next x hexadecimal words (x is known) that
# start after the string "RAW DATA:" in the section that starts with
# "MultiProgPage_Data <at> c2 - SECTION HEADER".
How do I do that?

 
Thanks,
Yitzhak

Shai Berger | 24 May 2010 18:13

Re: [python-il]location in file

I would ignore the number of words, and focus on headers. With the headers, we 
specify the part of the text we want; we use a capturing group to pick out 
only the interesting part.

section_header = "MultiProgPage_Data <at> c2 - SECTION HEADER"
next_section_header = "beeper_FW_bg_Task <at> c2 - SECTION HEADER"
part_header = "RAW DATA:"

pattern = "%s.*%s(.*)%s" % (section_header, part_header, next_section_header)

Then, just extract your section

match = re.search(pattern, txt, re.M)
if match:
	section = match.group(1)

	words = section.split()
	del words[-1] # This is the '49.'  of the next header
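
For reference, here is a self-contained sketch of the same idea that runs as written. It departs from the snippet above in ways that are assumptions of this sketch rather than part of the message: it passes re.S (re.DOTALL) so that "." can cross line breaks (re.M only changes how ^ and $ behave), it uses the non-greedy ".*?", it escapes the header strings with re.escape, it initialises `words` so the name exists even without a match, and it uses a tiny made-up sample in place of the real dump.

```python
import re

# Tiny made-up sample in the same layout as the dump (assumption).
txt = """41c0 06cb
48. MultiProgPage_Data <at> c2 - SECTION HEADER:
s_size    = 420
RAW DATA:
0002 bec3 0002 d3f2
0001 2f9d

49. beeper_FW_bg_Task <at> c2 - SECTION HEADER:
"""

section_header = "MultiProgPage_Data <at> c2 - SECTION HEADER"
next_section_header = "beeper_FW_bg_Task <at> c2 - SECTION HEADER"
part_header = "RAW DATA:"

# re.escape guards against regex metacharacters in the header strings.
pattern = "%s.*?%s(.*?)%s" % (
    re.escape(section_header),
    re.escape(part_header),
    re.escape(next_section_header),
)

words = []  # stays empty if the section is not found
match = re.search(pattern, txt, re.S)  # re.S: '.' also matches newlines
if match:
    words = match.group(1).split()
    del words[-1]  # drop the '49.' that precedes the next header

print(words)
```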

You might find a reading of http://docs.python.org/library/re.html, top to 
bottom, worthwhile.

Have fun,
	Shai.

Tal Einat | 24 May 2010 18:57

Re: [python-il]location in file

Perhaps regular expressions are overkill in this case? Regexps are versatile but can be confusing and hard to read. I prefer to focus on the logic and try to write it out in readable code instead of complex regexps.

lines = open("...").read().splitlines()

# get the data in a specific section
# (in the dump the header line reads "48. ... SECTION HEADER:", so look for
# a line that contains the header text instead of an exact match)
header = "MultiProgPage_Data <at> c2 - SECTION HEADER"
section_header_index = next(i for i, line in enumerate(lines) if header in line)
section_raw_data_index = section_header_index + lines[section_header_index:].index("RAW DATA:")
words = []
for line in lines[section_raw_data_index+1:]:
    line = line.strip() # ignore whitespace at beginning and end of lines
    if not line: # assume an empty line means the end of the raw data
        break
    words.extend(line.split())
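
Wrapped as a function, the same line-based approach might look like the sketch below. `raw_data_words` is a hypothetical name, and the choice to stop at the next line containing "SECTION HEADER" (rather than at the first empty line) is an assumption, made so that a dump with blank lines between data rows is still read in full.

```python
def raw_data_words(lines, section_name):
    """Collect the whitespace-separated words of a section's RAW DATA block."""
    words = []
    in_section = False   # True once the wanted section header was seen
    in_raw_data = False  # True once "RAW DATA:" was seen inside that section
    for line in lines:
        line = line.strip()
        if "SECTION HEADER" in line:
            if in_section:
                break  # the next section starts here, so we are done
            in_section = section_name in line
        elif in_section and line == "RAW DATA:":
            in_raw_data = True
        elif in_raw_data:
            words.extend(line.split())  # empty lines contribute nothing
    return words


# Made-up sample with blank lines between the data rows.
sample = """48. MultiProgPage_Data <at> c2 - SECTION HEADER:
s_size    = 420
RAW DATA:
0002 bec3 0002 d3f2

0001 2f9d

49. beeper_FW_bg_Task <at> c2 - SECTION HEADER:
""".splitlines()

words = raw_data_words(sample, "MultiProgPage_Data <at> c2")
print(words[3])
```

Because the function takes a list of lines, it can be fed the result of read().splitlines() on the real file without further changes.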

Good Luck,
- Tal


Shai Berger | 25 May 2010 11:47

Re: [python-il]location in file

Hi Yitzhak,

First of all, please keep the discussion public.

On Tuesday 25 May 2010 12:04:47 you wrote:
> 
> Thanks a lot. It works, but it takes a vvvvvveeeeeeeeerrrrrryyyyyyyy
> llllllllllllllllooooooooooonnnnnnnnnnnnnnggggggggggggggggg time to
> complete, something like 10 minutes. Why? Is there a way to make it
> significantly faster?
> 

That is quite odd; regular expression search of this kind is usually quite 
fast, even when the files are large (how large is your file?).

Can you try the different parts separately to find out where the problem is?

(I'm leaving the old messages in as they were not sent to the list)

Shai.

> -----Original Message-----
> From: Shai Berger [mailto:shai <at> platonix.com]
> Sent: Monday, May 24, 2010 11:14 PM
> To: Yitzhak Wiener
> Subject: Re: [Python-il] [python-il]location in file
> 
> On Monday 24 May 2010, you wrote:
> > Shai, thanks.
> >
> > What type is 'words'? I wanted to print it but " name 'words' is not
> > defined "!
> 
> Serves me right for posting untested code...
> 
> words is a list of strings, but it is only assigned if the search is
> successful; and that only happens when using re.S instead of re.M (you
> had re.S in your original, line-separating code, where it did no good;
> it only matters when you search multiline texts. I had confused re.M
> for re.S).
> 
> Sorry,
> 	Shai.
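
The difference between the two flags can be checked on a three-line string. This is a small illustrative sketch, not code from the thread: re.M (re.MULTILINE) only changes where ^ and $ may match, while re.S (re.DOTALL) is what lets "." match a newline.

```python
import re

text = "HEADER\ndata\nFOOTER"

# With re.M, '.' still stops at every newline, so the search fails.
with_m = re.search("HEADER(.*)FOOTER", text, re.M)

# With re.S, '.' matches newlines too, so the pattern can span lines.
with_s = re.search("HEADER(.*)FOOTER", text, re.S)

print(with_m)                 # no match object is returned here
print(repr(with_s.group(1)))  # the text between the two markers
```

This is also why a name assigned only inside "if match:" is undefined afterwards when the search fails.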
Yitzhak Wiener | 25 May 2010 12:20

Re: [python-il]location in file

Hi Shai,

The file is indeed large, ~6MB.
I added print statements before and after each line and found that the
only line that took more than 1 second was
"match = re.search(pattern, txt, re.S)"; it took ~5 minutes!

 
Best Regards,
Yitzhak

Shai Berger | 25 May 2010 12:41

Re: [python-il]location in file

Hi again Yitzhak,

On Tuesday 25 May 2010 13:20:38 Yitzhak Wiener wrote:
> The file is indeed large, ~6MB.
> I added print statements before and after each line and found that the
> only line that took more than 1 second was
> "match = re.search(pattern, txt, re.S)"; it took ~5 minutes!
> 

There are a few things to try.

First is the easiest: In the pattern, change the two occurrences of ".*" to 
".*?". In this case, this should not change the result, but -- especially if 
most of the file is after the text you're looking for -- it should improve 
timings (".*" means "the longest possible string of characters", and ".*?" 
means "the shortest possible"; I'm assuming there is only one possible string, 
being both longest and shortest; why this should change the timing is left as 
an exercise).
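
A small sketch of the difference, using a made-up string rather than the real dump. When several end markers are present the two forms even return different text. When there is only one they agree, but the greedy form first consumes everything up to the end of the text and then backs up one character at a time until the rest of the pattern fits. With two such groups in one pattern that backtracking can repeat many times, which is a plausible explanation (an assumption, not something measured here) for the long run time on a large file.

```python
import re

text = "START a END b END"

# Greedy: '.*' grabs as much as possible, so it stops at the LAST "END".
greedy = re.search("START(.*)END", text, re.S).group(1)

# Non-greedy: '.*?' grabs as little as possible, stopping at the FIRST "END".
lazy = re.search("START(.*?)END", text, re.S).group(1)

print(repr(greedy))
print(repr(lazy))
```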

If this doesn't help, perhaps you can do some smart cutting of the file to 
pieces before the search.
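
One way to read "cutting the file to pieces" is to locate the headers with plain string searches and slice the text before any regular expression is involved. This is only a sketch under that interpretation; `cut_section` is a hypothetical helper and the sample text is made up.

```python
def cut_section(txt, section_header, part_header, next_section_header):
    """Return the text between part_header and next_section_header, or None."""
    start = txt.find(section_header)
    if start == -1:
        return None
    data_start = txt.find(part_header, start)
    if data_start == -1:
        return None
    data_start += len(part_header)
    end = txt.find(next_section_header, data_start)
    if end == -1:
        return None
    return txt[data_start:end]


# Made-up sample in the same layout as the dump (assumption).
txt = (
    "48. MultiProgPage_Data <at> c2 - SECTION HEADER:\n"
    "RAW DATA:\n"
    "0002 bec3 0002 d3f2\n"
    "\n"
    "49. beeper_FW_bg_Task <at> c2 - SECTION HEADER:\n"
)

piece = cut_section(
    txt,
    "MultiProgPage_Data <at> c2 - SECTION HEADER",
    "RAW DATA:",
    "beeper_FW_bg_Task <at> c2 - SECTION HEADER",
)
words = piece.split()[:-1]  # the last item is the '49.' of the next header
print(words)
```

Each str.find call scans forward from the previous position, so the text is walked at most once per header.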

Have fun,
	Shai.
Yitzhak Wiener | 26 May 2010 18:47

Re: [python-il]location in file

Hi Shai,

It worked. Thanks.
Adding the '?' after the '*' solved the time problem. I found it in the
Python documentation but didn't really understand the logic of it. Why
does it have an effect? The '*' comes before it, so logically it should
still be greedy, shouldn't it?

 
Best Regards,
Yitzhak
