Todd A. Jacobs | 1 Jun 2007 01:10

Re: Checking environment?

On Thu, May 31, 2007 at 01:18:07PM -0600, Ashley M. Kirchner wrote:

>     Is there some way to check whether a (bash shell) script is being
>     called from the command line or via a web process?  I need a way

Not as such, but you can use some tricks to determine if you're being
called from an interactive process on a relatively standard system. For
example:

    - shopt login_shell     # Is shell a login shell?
    - test -n "$PS1"        # Is shell interactive?

Keep in mind that this is based on standard behavior, and you can do all
sorts of weird things that make this sort of result unreliable. For
example, starting shells with "bash -i" or unsetting PS1 in your
~/.bashrc file would seriously confuse the issue. So don't do those
things. :)

Most CGI environment variables are server- and request-specific, but the
CGI specification requires every server to set the server name:

    test -n "$SERVER_NAME"

You can also try such variables as DOCUMENT_ROOT (would this ever not be
set?) or some other relevant piece of info you could rely on as being
only set by your web server.
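
As a rough sketch, the checks above could be combined into one helper.
The function name is made up for illustration, and two pieces are not
from the suggestions above: GATEWAY_INTERFACE (another standard CGI
variable) and the "[ -t 0 ]" test for a terminal on stdin. Treat the
result as a heuristic, since any caller can set or unset these.

```bash
#!/bin/bash
# Sketch: guess how this script was invoked. Heuristic only.
invocation_context() {
    # SERVER_NAME is the variable suggested above; GATEWAY_INTERFACE is an
    # extra CGI variable, added here as an assumption about the web server.
    if [ -n "${SERVER_NAME:-}" ] || [ -n "${GATEWAY_INTERFACE:-}" ]; then
        echo "cgi"
    # A terminal on stdin suggests a person at a command line.
    elif [ -t 0 ]; then
        echo "terminal"
    else
        echo "other"    # cron, a pipe, or some other non-interactive caller
    fi
}

echo "Invoked from: $(invocation_context)"
```

Checking the CGI variables first matters because a web server normally
gives the script no terminal, so the stdin test alone cannot tell a CGI
call from a cron job.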

-- 
"Oh, look: rocks!"
	-- Doctor Who, "Destiny of the Daleks"

Robert P. J. Day | 1 Jun 2007 12:14

testing -- is this list still alive?


rday

-- 
========================================================================
Robert P. J. Day
Linux Consulting, Training and Annoying Kernel Pedantry
Waterloo, Ontario, CANADA

http://fsdev.net/wiki/index.php?title=Main_Page
========================================================================
_______________________________________________
Shell.scripting mailing list
Shell.scripting@...
http://moongroup.com/mailman/listinfo/shell.scripting
Charles Galpin | 1 Jun 2007 14:18

Re: testing -- is this list still alive?

The moongroup one is deprecated. We are using shell.scripting@... now.

charles

On Jun 1, 2007, at 6:14 AM, Robert P. J. Day wrote:

>
> rday
>
> _______________________________________________
> Shell.Scripting mailing list
> Shell.Scripting@...
> http://www2.codegnome.org:59321/cgi-bin/mailman/listinfo/shell.scripting

Todd A. Jacobs | 2 Jun 2007 21:43

Screen-scraping how-to?

So, I'm doing some job searches, and am finding more and more sites
(dice.com, anyone?) that no longer have an email address for resume
submissions. I like to keep track of who I've sent resumes to, and
prefer to send them in my own ASCII formats, which is why I wrote
tkresume all those years ago.

Anyway, dice is now doing something like this:

    1. You look up a job. The URL looks something like http://seeker.dice.com/jobsearch/servlet/JobSearch?op=302&dockey=xml/a/7/a7c6465573eb5ca49ad684aabe69c85f@endecaindex&source=19&FREE_TEXT=Linux+InfoSec+Unix+Firewalls%2FIDS+Windows+NT+Network+Design%2FMgmt+Novell+IDM+%26+LDAP+Firewall+IDS&rating=99

    2. You click on the apply button. The URL looks something like http://seeker.dice.com/jobsearch/servlet/JobSearch?op=304&dockey=xml/a/7/a7c6465573eb5ca49ad684aabe69c85f@endecaindex&source=19&ral=JobSearch?op=302%26FREE_TEXT=Linux%20InfoSec%20Unix%20Firewalls/IDS%20Windows%20NT%20Network%20Design/Mgmt%20Novell%20IDM%20&amp;%20LDAP%20Firewall%20IDS%26rating=99%26source=19%26dockey=xml/a/7/a7c6465573eb5ca49ad684aabe69c85f@endecaindex

    3. You end up at a web form with entry fields for email address and
       document uploads. Lots of hidden form fields and a POST action.

What I really want is some sort of screen scraping based on the
top-level URL, where I can pass the "apply" link to a script that
handles the rest.

Can anyone recommend a place to start, or some tool that will get me
part of the way there? I can't even find any good books on
screen-scraping techniques, and dread the idea of coding something from
scratch if it isn't necessary.

Ideas?


Todd A. Jacobs | 2 Jun 2007 22:11

Re: Screen-scraping how-to?

On Sat, Jun 02, 2007 at 12:43:58PM -0700, Todd A. Jacobs wrote:

> Can anyone recommend a place to start, or some tool that will get me
> part of the way there? I can't even find any good books on

By the way, I just ordered _Perl & LWP_ (0-596-00178-9) from Amazon, and
am reading up on WWW::Mechanize, but it's pretty confusing stuff. Blech.
:)

Todd A. Jacobs | 3 Jun 2007 00:44

Re: Screen-scraping how-to?

Here's a bash-only solution that seems like it should work, but doesn't.
Cutting and pasting the constructed URL into an open browser window
doesn't do what I expect.

Using LiveHTTPHeaders in Firefox/Iceweasel shows that the data in a
working session gets submitted with "Content-Type: multipart/form-data"
so there must be more to it than just creating the right URL. Argh!
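
For what it's worth, a multipart/form-data POST can be sent from the
shell with curl's -F option. The sketch below is only an illustration:
the function name, the field names "replyaddr" and "resume", and the
example URL are all assumptions, and the real form's hidden fields
would each need their own -F option.

```bash
#!/bin/bash
# Sketch: send a form as multipart/form-data with curl.
# The field names "replyaddr" and "resume" are assumptions; read the real
# ones from the form's HTML.
submit_application() {
    local apply_url=$1 email=$2 resume=$3
    # CURL can be overridden (e.g. CURL=echo) for a dry run.
    # -F sends multipart/form-data; "@" uploads the named file.
    # --fail makes curl exit non-zero on HTTP errors (status 400 and up).
    "${CURL:-curl}" --silent --show-error --fail \
        -F "replyaddr=${email}" \
        -F "resume=@${resume}" \
        "$apply_url"
}

# Dry run: print the curl arguments instead of contacting any server.
CURL=echo submit_application "http://example.invalid/apply" \
    "me@example.invalid" "resume.txt"
```

Running it with CURL=echo first is a cheap way to check the constructed
request before pointing it at a live site.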

Anyway, here's my unsuccessful first try, just for edification:

#!/bin/bash

## 
## Name:
##	dice_scrape.sh
##
## Version:
##	$Revision: 1.2 $
##
## Purpose:
##	Shell script for responding to job postings on dice.com without
##	having to manually navigate the web forms.
##
## Usage:
##	dice_scrape.sh [ -u | -h ]
##	dice_scrape.sh [ -r <resume> | -e <email> ] <URL>
##
## Options:
##	-h = show documentation
##	-u = show usage

Russell Evans | 3 Jun 2007 05:37

Re: Screen-scraping how-to?

On Sat, 2 Jun 2007 12:43:58 -0700
"Todd A. Jacobs" <nospam@...> wrote:

> Can anyone recommend a place to start, or some tool that will get me
> part of the way there? 

Get wireshark, http://www.wireshark.org/ , on your system. Go to the
apply web page and fill in the form, but don't hit submit yet. Start
wireshark and start a capture on your network card; it helps if there
isn't any other traffic. Hit the submit button in your browser. Go back
to wireshark and stop the capture. Click on one of the captured HTTP
packets, then under the Analyze menu click Follow TCP Stream.

In the window that pops up you'll be looking for the red text for what is posted to the web site.

I used the following. The pspi file is a shell script, so don't get confused by the #!/bin/sh below.
email: fgdf@...
Resume: /home/revans/pspi
Text: jty fdhehe threhr

What I got via wireshark
POST /jobsearch/servlet/JobSearch HTTP/1.1
Host: seeker.dice.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.4) Gecko/20070529 SUSE/2.0.0.4-6.1 Firefox/2.0.0.4
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://seeker.dice.com/jobsearch/servlet/JobSearch?op=304&dockey=xml/a/7/a7c6465573eb5ca49ad684aabe69c85f@endecaindex&source=19&ral=JobSearch?op=302%26FREE_TEXT=Linux%20InfoSec%20Unix%20Firewalls/IDS%20Windows%20NT%20Network%20Design/Mgmt%20Novell%20IDM%20&amp;%20LDAP%20Firewall%20IDS%26rating=99%26source=19%26dockey=xml/a/7/a7c6465573eb5ca49ad684aabe69c85f@endecaindex

Todd A. Jacobs | 3 Jun 2007 06:23

Re: Screen-scraping how-to?

Here's a ruby script I cobbled together that actually works (I've
managed to submit three resumes, verified by the CC option), but it's
pretty bare-bones at the moment. There's pretty much no sanity checking
at all, so caveat emptor.

My biggest need is to figure out how to determine whether the script has
succeeded; is there the equivalent of bash's $? in ruby, so that I can
determine whether the form submission worked as expected?

#!/usr/bin/ruby

require 'mechanize'

my_email = 'foo@...'
my_resume = '/home/foo/resume.wri'

agent = WWW::Mechanize.new

# FREE_TEXT seems to cause problems with URL encoding, so ditch it!
# Rating seems to cause some problems too, so overboard with that as
# well.
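# [^&]* stops at the next parameter, so each sub removes only its own
# field, even when that field is the last one in the URL.
url = ARGV[0].sub(/FREE_TEXT=[^&]*&?/, '')
url = url.sub(/rating=[^&]*&?/, '')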

page = agent.get(url)
page = agent.click page.links.text('Click Here to Apply')

reply_form = page.forms.with.name('APPLICATION_FORM').first
reply_form.replyaddr = my_email
reply_form.checkboxes.name('SEEKER_CC').check

Todd A. Jacobs | 3 Jun 2007 06:30

Re: Screen-scraping how-to?

On Sat, Jun 02, 2007 at 11:37:47PM -0400, Russell Evans wrote:

> What I got via wireshark POST /jobsearch/servlet/JobSearch HTTP/1.1

Thanks, Russell. Yeah, I get very similar output with LiveHTTPHeaders in
Firefox--it's just a little less painful to use when trying to track web
transactions.

I'm not always comfortable using libraries like WWW::Mechanize because I
really don't understand what they're doing under the hood. On top of
that, Ruby isn't as self-evident (nor as well-documented) as Perl, but
the syntax from some of the examples around the web gave rise to the
code I munged together this evening.

I'm pretty sure I can whack together some code for submission tracking
and duplicate prevention, but I really have no idea at this point how to
ensure that submissions are error-free without viewing the actual page
itself.
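
One possible approach, sketched in shell since that's the more familiar
ground here, is to save the response page and look for the text the
site shows on a successful submission. The function name and the marker
string are hypothetical; the real confirmation wording would have to be
copied from an actual response.

```bash
#!/bin/bash
# Sketch: decide success from the saved response page.
# The marker text is hypothetical; use whatever the real confirmation
# page actually says.
submission_ok() {
    local response_file=$1 marker=$2
    # -q: quiet, exit status only; -F: fixed string, not a regex.
    grep -q -F -- "$marker" "$response_file"
}

# Demonstration with a made-up response page.
page=$(mktemp)
printf '<html><body>Your application has been sent.</body></html>\n' > "$page"
if submission_ok "$page" "application has been sent"; then
    echo "submission confirmed"
else
    echo "no confirmation found" >&2
fi
rm -f "$page"
```

Matching on a confirmation string is brittle if the site changes its
wording, so logging the saved page for failed matches would make
problems easier to diagnose.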

Weirdly enough, I'm not as sure how to bullet-proof a Ruby app as I am a
bash script. You'd think Ruby would be more robust. Well, it probably
is; I'm just too inexperienced with it to know how to make it so.

But I'm having fun learning. :)

Todd A. Jacobs | 4 Jun 2007 06:23

Ruby problem with conditional ranges

I'm running into a problem doing a conditional range in ruby. The code
is:

    #!/usr/bin/ruby

    ## one
    ## two
    ## three
    ## four
    ## five

    File.open($0, 'r') { |f|
        while f.gets
            print $_.sub(/^##/, '') if /^## two/ .. /^## four/
        end
    }

I'm expecting:

    two
    three
    four

but I get the whole file with the leading pound signs stripped off.
What's wrong here?
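
For comparison, the same "from this line through that line" selection
can be written in shell as a sed address range. This is a different
tool swapped in for illustration, not a fix for the Ruby code, and the
function name is made up.

```bash
#!/bin/bash
# Sketch: print the lines from "## two" through "## four", minus the "## ".
print_range() {
    # -n suppresses default output; the range /start/,/end/ selects lines,
    # and s///p prints only lines where the prefix was actually removed.
    sed -n '/^## two/,/^## four/s/^## //p' "$@"
}

printf '## one\n## two\n## three\n## four\n## five\n' | print_range
```

With the sample input above this prints two, three and four, one per
line.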
