Dave Kuhlman | 3 Feb 2007 00:39

Support tools for analyzing pages on the Web

I'd like to implement and explore tools for analyzing Web pages.  I
have in mind things like:

- Tracing links from a Web page.  Building a tree structure of
  links to a specified depth.

- Tracing links to a Web page.  Showing incoming links to a
  specified depth.

- Word count, word frequency analysis, words in context, etc.

- Etc.
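
A minimal sketch of the first item, tracing outgoing links into a tree of limited depth. It uses only the Python 3 standard library rather than BeautifulSoup, and the names `LinkExtractor`, `extract_links`, `build_link_tree`, and the `fetch` callable are illustrative assumptions, not part of any existing tool:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs for the links in html, in page order."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def build_link_tree(url, fetch, depth, seen=None):
    """Return a nested dict {child_url: subtree} down to `depth` levels.

    `fetch` is any callable mapping a URL to its HTML text (hypothetical;
    it could wrap urllib.request.urlopen). Each URL appears at most once.
    """
    seen = {url} if seen is None else seen
    tree = {}
    if depth <= 0:
        return tree
    for link in extract_links(url, fetch(url)):
        if link in seen:  # already placed elsewhere in the tree
            continue
        seen.add(link)
        tree[link] = build_link_tree(link, fetch, depth - 1, seen)
    return tree
```

Passing `fetch` in as a parameter keeps the traversal testable without network access. Because the walk is depth-first with a shared `seen` set, a page may be filed under the first parent that reaches it rather than the shallowest one. Incoming links cannot be read off a single page; they would need an index built from a crawl like this one.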

Basically, I'm interested in looking at the structure of the Web
and trying to help make it useful.

So, my question: Are there existing tools (in Python, of course) for
this kind of thing?  I'd like (1) not to reinvent what is already
there and (2) to make use of what already exists.

I've done a few Web searches, but have not found that much of
interest.

I plan to start with BeautifulSoup.py at a minimum.

Thanks for help.

And, I'd be interested in any ideas and suggestions.

Dave

Christian Wyglendowski | 3 Feb 2007 04:41

Re: Support tools for analyzing pages on the Web

On 2/2/07, Dave Kuhlman <dkuhlman@...> wrote:
> I'd like to implement and explore tools for analyzing Web pages.  I
> have in mind things like:
>
> - Tracing links from a Web page.  Building a tree structure of
>   links to a specified depth.
>
> - Tracing links to a Web page.  Showing incoming links to a
>   specified depth.
>
> - Word count, word frequency analysis, words in context, etc.
>
> - Etc.
>
> Basically, I'm interested in looking at the structure of the Web
> and trying to help make it useful.

Sounds like an interesting project.

> So, my question: Are there existing tools (in Python, of course) for
> this kind of thing?  I'd like (1) not to reinvent what is already
> there and (2) to make use of what already exists.

Well, for your analysis phase, I would look at the Natural Language
Tool Kit (NLTK) [1].  I haven't used it personally, but I have always
wanted to try it out.  The documentation is great.
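
For the simpler analysis items (word count, word frequency, words in context), a rough sketch with the standard library alone might look like the following. It is not NLTK's API; the function names and the tokenizing regular expression are assumptions chosen for illustration, and the input is assumed to be plain text with the markup already stripped:

```python
import re
from collections import Counter

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return WORD_RE.findall(text.lower())


def word_frequencies(text):
    """Map each word to the number of times it occurs."""
    return Counter(tokenize(text))


def words_in_context(text, keyword, width=2):
    """Return each occurrence of keyword with `width` words on either side."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - width)
            hits.append(" ".join(tokens[start:i + width + 1]))
    return hits
```

The total word count is then `sum(word_frequencies(text).values())`, and `most_common(n)` on the returned `Counter` gives the n most frequent words.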

> I've done a few Web searches, but have not found that much of
> interest.
>

Titus Brown | 9 Feb 2007 08:54

wsgiref and wsgi.multithread/wsgi.multiprocess

Hi folks,

I just ran into an interesting sanity check problem, and I was hoping
you could all cross-check *my* sanity.

Should the WSGI environ variables 'wsgi.multithread' and
'wsgi.multiprocess' be set to 'True' in
wsgiref.simple_server.WSGIServer?

They are, currently, but I see no indication in WSGIServer
(inheriting from BaseHTTPServer.HTTPServer) of multithreadedness
or multiprocessedness.

thanks,
--titus
_______________________________________________
Web-SIG mailing list
Web-SIG@...
Web SIG: http://www.python.org/sigs/web-sig
Unsubscribe: http://mail.python.org/mailman/options/web-sig/gcpw-web-sig%40m.gmane.org

Phillip J. Eby | 9 Feb 2007 18:10

Re: wsgiref and wsgi.multithread/wsgi.multiprocess

At 11:54 PM 2/8/2007 -0800, Titus Brown wrote:
>Hi folks,
>
>I just ran into an interesting sanity check problem, and I was hoping
>you could all cross-check *my* sanity.
>
>Should the WSGI environ variables 'wsgi.multithread' and
>'wsgi.multiprocess' be set to 'True' in
>wsgiref.simple_server.WSGIServer?
>
>They are, currently, but I see no indication in WSGIServer
>(inheriting from BaseHTTPServer.HTTPServer) of multithreadedness
>or multiprocessedness.

Yeah, multiprocess should probably be set false there, and 
multithreadedness should depend on whether the ThreadingTCPServer or 
whatever it's called is mixed in.  (HTTPServer does in fact support this, 
but it's not tested in a WSGI context as far as I know.)
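
A sketch of what that could look like, written against Python 3, where `SocketServer` is named `socketserver` and the threading mix-in is `ThreadingMixIn`. The helper names `concurrency_flags` and `with_concurrency_flags` are hypothetical, and the middleware approach is one possible workaround rather than how wsgiref itself sets the flags:

```python
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGIServer that handles each request in its own thread."""

    daemon_threads = True  # do not block interpreter exit on open requests


def concurrency_flags(server_class):
    """Derive the two WSGI concurrency flags from a server class."""
    return {
        "wsgi.multithread": issubclass(server_class, ThreadingMixIn),
        # Assumption: no forking mix-in is in use, so only one process
        # ever runs the application.
        "wsgi.multiprocess": False,
    }


def with_concurrency_flags(app, server_class):
    """Wrap app so its environ carries flags matching server_class."""
    flags = concurrency_flags(server_class)

    def wrapped(environ, start_response):
        environ.update(flags)
        return app(environ, start_response)

    return wrapped
```

The wrapped application can be served with `wsgiref.simple_server.make_server(host, port, wrapped_app, server_class=ThreadingWSGIServer)`.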


Titus Brown | 9 Feb 2007 18:56

Re: wsgiref and wsgi.multithread/wsgi.multiprocess

On Fri, Feb 09, 2007 at 12:10:00PM -0500, Phillip J. Eby wrote:
-> At 11:54 PM 2/8/2007 -0800, Titus Brown wrote:
-> >Hi folks,
-> >
-> >I just ran into an interesting sanity check problem, and I was hoping
-> >you could all cross-check *my* sanity.
-> >
-> >Should the WSGI environ variables 'wsgi.multithread' and
-> >'wsgi.multiprocess' be set to 'True' in
-> >wsgiref.simple_server.WSGIServer?
-> >
-> >They are, currently, but I see no indication in WSGIServer
-> >(inheriting from BaseHTTPServer.HTTPServer) of multithreadedness
-> >or multiprocessedness.
-> 
-> Yeah, multiprocess should probably be set false there, and 
-> multithreadedness should depend on whether the ThreadingTCPServer or 
-> whatever it's called is mixed in.  (HTTPServer does in fact support this, 
-> but it's not tested in a WSGI context as far as I know.)

OK.  Err, do you want a patch? ;)

The problem I'm running into is that our (Mike Orr & I) WSGI interface
for Quixote does a check to make sure that the Quixote application is
explicitly marked as threadsafe before allowing a multithreaded WSGI
server to run it.  I can't bring myself to remove this sanity check,
because it does seem like a good idea, but it makes the example code a
bit more complicated...
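
The kind of check described here can be sketched as a small WSGI wrapper. This is not the actual Quixote interface code; the `is_thread_safe` attribute, `NotThreadSafeError`, and `require_thread_safety` are hypothetical names used only to illustrate the idea:

```python
class NotThreadSafeError(RuntimeError):
    """Raised when a multithreaded server runs an app not marked threadsafe."""


def require_thread_safety(app):
    """Refuse requests from multithreaded servers unless app opts in.

    The app opts in by carrying a true `is_thread_safe` attribute
    (a hypothetical marker, not a WSGI or Quixote convention).
    """

    def guarded(environ, start_response):
        multithreaded = environ.get("wsgi.multithread", False)
        if multithreaded and not getattr(app, "is_thread_safe", False):
            raise NotThreadSafeError(
                "server reports wsgi.multithread but the application "
                "is not marked as threadsafe"
            )
        return app(environ, start_response)

    return guarded
```

A guard like this is only as reliable as the flag it reads, which is why a server that reports 'wsgi.multithread' as true while running single-threaded forces the example code to work around the check.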

--titus

Tassos Koutsovassilis | 14 Feb 2007 23:30

ANN: Porcupine Web Application Server v0.0.9 released

The inno:script team announces the new release of Porcupine server. This 
release introduces remarkable new features on the server side including 
a configurable in-memory object cache and a new post-processing filter 
for easy output i18n. Due to the method decorators used, Porcupine is no 
longer compatible with Python 2.3. We also recommend sub-classing the 
new type of QuiX servlet (XULSimpleTemplateServlet) instead of the 
primitive XULServlet class. The new type takes advantage of the new 
Python "string.Template" class, resulting in simpler and more readable 
QuiX templates.
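
For readers who have not used it, here is a short illustration of "string.Template" substitution in general. The markup below is made up for the example and is not actual QuiX template syntax; `widget` and `render` are hypothetical names:

```python
from string import Template

# $name and ${name} are both placeholders; the braces separate the
# placeholder from the text that follows it.
widget = Template('<label caption="$caption" width="${width}px"/>')


def render(caption, width):
    """Fill in every placeholder; a missing one raises KeyError."""
    return widget.substitute(caption=caption, width=width)
```

`safe_substitute` is the lenient variant: it leaves any placeholder without a supplied value in the output unchanged instead of raising.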

By default, the object cache is configured to keep up to 500 
objects. You can change this setting by editing the main Porcupine 
configuration file. Also keep in mind that each post-processing filter 
is now declared as a child node of its registration node. See the store 
registrations file "store.xml" as a usage guideline.

On the browser side, QuiX adds minor improvements to better support 
Internet Explorer 7 and also includes many minor bug fixes. Last but not 
least, the rendering performance is greatly improved by minimizing the 
number of redraws required when drawing new interfaces from XML. As a 
side effect of this optimization, you might need an extra call to the 
"redraw" method of some of your dynamically added widgets in order to 
have them displayed correctly.

Enjoy.

Resources
=========
What is Porcupine?
http://www.innoscript.org/content/view/30/42/

Porcupine Downloads:

CLIFFORD ILKAY | 19 Feb 2007 15:50

Django Presentation at PyGTA Meeting on Feb. 20

Hello,

I will be presenting an overview of the Django web framework 
<http://djangoproject.com> at the monthly PyGTA (Greater Toronto Area 
Python user group) meeting on Feb. 20. Django bills itself as 
"The Web framework for perfectionists with deadlines." In my 
experience, that is an apt description. I have found it to be 
coherent, powerful, well-documented, and very approachable. The 
support one can get on the Django IRC channel (irc.freenode.net, 
#django) and the Google Group 
<http://groups.google.com/group/django-users/> is very good. There is 
an on-line book at <http://djangobook.com>, which fleshes out the 
documentation on the main site 
<http://www.djangoproject.com/documentation/> and the Wiki 
<http://code.djangoproject.com/>.

When
----
Feb. 20, 2007 - 6:30 p.m. - informal part of the meeting where we can 
get (non-alcoholic) drinks and socialize
7:00 p.m. - formal part of the meeting starts (formal wear not 
required)
Between 8:30 p.m. and 9:00 p.m. - wrap up and go to a nearby restaurant 
for beer, ice cream, hot chocolate, nibbles, sparkling conversation, 
etc.

Where
-----
LinuxCaffe (yes, that is how it is spelled)
<http://linuxcaffe.ca/location>

Robert Brewer | 23 Feb 2007 04:07

The web dudes pad is open for business

Chad Whitacre (of Aspen fame) and I got a nice suite across the street (from the PyCon hotel) at the Residence Inn, room 121. All web dudes welcome.

We've got a kitchen, fireplace, sofa, and 3 TVs. I just stocked the fridge with Heineken and Diet Coke, plus mudslide and blue margarita fixin's. The jacuzzi's open, too, if you brought trunks. Feel free to drop by anytime, but call me first: 619 846-5585 (they lock everything around here). I'm here 'til Monday morning.


Robert Brewer
CherryPy Team

Chad Whitacre | 23 Feb 2007 14:06

Re: The web dudes pad is open for business

> The jacuzzi's open, too, if you brought trunks.

We have a jacuzzi!? Please tell me it's not heart-shaped ...

You guys have fun today, I don't get in until tonight. :^(

chad

Titus Brown | 23 Feb 2007 16:30

Re: The web dudes pad is open for business

now doesn't everyone wish they were at PyCon, too? ;)

On Fri, Feb 23, 2007 at 08:06:48AM -0500, Chad Whitacre wrote:
-> > The jacuzzi's open, too, if you brought trunks.
-> 
-> We have a jacuzzi!? Please tell me it's not heart-shaped ...
-> 
-> You guys have fun today, I don't get in until tonight. :^(
-> 
-> 
-> 
-> chad

