Author Topic: Looking for web crawler tool  (Read 2761 times)

PJYelton

  • Power of Voodoo
  • Jackass In Charge
  • Posts: 1597
  • Karma: +56/-12
    • TheRecursiveFractal
Looking for web crawler tool
« on: March 21, 2012, 01:02:23 PM »
So we are creating a tool that compares data in an Excel sheet to data in a database, which is not too terribly difficult.  The problem we are having, however, is downloading these Excel sheets.  To do it manually, one needs to open an Oracle web application called Siebel, log in, click on a tab, enter an order number, click on a link, then a tab, then a file name.  The user is then asked whether they want to save the file.

We want to do this automatically, processing a hundred or so of these Excel sheets a day.  My question is, does anyone know of a tool I can download that can do something like this automatically, kind of like a web crawler?  There would need to be some flexibility, because the location and names of the links will change and the order number will need to be passed in as a parameter.  I am sure I could write my own program to do it, but the languages I know (Java, Perl, C++) aren't well suited to this type of task, so it would take a while.

ober

  • Ashton Shagger
  • Ass Wipe
  • Posts: 14310
  • Karma: +73/-790
  • mini-ober is taking over
    • Windy Hill Web Solutions
Re: Looking for web crawler tool
« Reply #1 on: March 21, 2012, 01:09:34 PM »
One of my clients asked me to do this with one of his sites, but it would only copy the file to his webserver.  It's lower on his priority list, but I looked into it and it can be done with PHP.  I think I even saw a class on phpclasses.org that did it, but I don't remember what it was called and I don't have a bookmark to it.

PJYelton

Re: Looking for web crawler tool
« Reply #2 on: March 21, 2012, 01:17:02 PM »
Yeah, unfortunately I don't know a lick of PHP.

micah

  • A real person, on the Internet.
  • Ass Wipe
  • Posts: 6915
  • Karma: +58/-55
  • Truth cannot contradict truth.
    • micahj.com
Re: Looking for web crawler tool
« Reply #3 on: March 21, 2012, 01:19:50 PM »
I know you said PHP is out, but I'll just leave this link here: http://code.google.com/p/php-excel-reader/

I've had great success with it and the original project it's forked from... well, except it seems to break if you use a newer version of Excel.

PJYelton

Re: Looking for web crawler tool
« Reply #4 on: March 21, 2012, 01:21:04 PM »
Reading the Excel sheet isn't a problem, we've already got that part figured out.  It's just downloading the file from this Oracle web tool.

Mike

  • Jackass In Charge
  • Posts: 11257
  • Karma: +168/-32
  • Ex Asshole - a better and more caring person.
Re: Looking for web crawler tool
« Reply #5 on: March 21, 2012, 01:27:31 PM »
Is there any pattern to the file location and names?  After logging in could you just ask for the file directly?

PJYelton

Re: Looking for web crawler tool
« Reply #6 on: March 21, 2012, 01:35:38 PM »
No, nowhere does it give a file location or direct link.  We've tried to find that from 10 different directions but keep coming up empty.

Mike

Re: Looking for web crawler tool
« Reply #7 on: March 21, 2012, 01:48:29 PM »
Use Fiddler to sniff the traffic.  If you can find the request the file is actually coming from, it gets a whole lot easier.
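
As a rough sketch of what to do with a capture afterwards: assuming the sniffed sessions are exported in HAR (HTTPArchive) format, a few lines of Python can list the requests whose responses look like a file download.  The function name and the list of MIME types are my own placeholders, and the real Siebel response may well use a different content type.

```python
import json

# MIME types commonly used for Excel downloads (an assumed, non-exhaustive list).
EXCEL_TYPES = (
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
)


def load_har(path):
    """Read an exported HAR capture; utf-8-sig tolerates a leading byte-order mark."""
    with open(path, encoding="utf-8-sig") as f:
        return json.load(f)


def find_download_requests(har):
    """Return (method, url) for every captured response that looks like a file download."""
    hits = []
    for entry in har["log"]["entries"]:
        response = entry["response"]
        mime = response.get("content", {}).get("mimeType", "").lower()
        headers = {h["name"].lower(): h["value"] for h in response.get("headers", [])}
        is_attachment = "attachment" in headers.get("content-disposition", "").lower()
        if is_attachment or mime.startswith(EXCEL_TYPES):
            request = entry["request"]
            hits.append((request["method"], request["url"]))
    return hits
```

Calling find_download_requests(load_har("capture.har")) on a capture of one manual download should narrow things down to the one or two requests worth replaying ("capture.har" is just an example file name).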

Mike

Re: Looking for web crawler tool
« Reply #8 on: March 21, 2012, 01:52:23 PM »
Some other stuff that'll help:

See if there are ids on the page elements you interact with manually.  That will make parsing easier.
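
For example, here is a minimal sketch using only Python's standard html.parser that lists every element carrying an id, along with its href or form action.  The class and function names are made up for illustration, and it assumes the pages are plain HTML rather than something built up by JavaScript.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every element that has an id attribute, with its tag and link/form target."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        element_id = attributes.get("id")
        if element_id:
            self.elements[element_id] = {
                "tag": tag,
                # Links carry href, forms carry action; other elements get None.
                "target": attributes.get("href") or attributes.get("action"),
            }


def elements_by_id(html):
    """Parse a page and return a dict mapping each id to its tag and target."""
    collector = IdCollector()
    collector.feed(html)
    return collector.elements
```

If the tabs and links do have stable ids, looking them up this way holds up better than matching on link text or position, which you said will change.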

To break down the bot stuff:
Basically you'll just be making HTTP calls (it helps if you know the protocol): parsing the headers and the body, then saving the cookies and passing them along to the next action until you finally get the file.
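
A minimal sketch of that loop in Python's standard library, just to show the shape of it.  Every URL path and form field name below ("/login", "/attachment", "user", "password", "order") is a hypothetical placeholder, not the real Siebel interface; the real values would have to come from the sniffed traffic.

```python
import os
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Build an opener that stores cookies from each response and sends them on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def filename_from_headers(headers, default="download.xls"):
    """Pull the suggested file name out of a Content-Disposition header, if there is one."""
    disposition = headers.get("Content-Disposition", "")
    match = re.search(r'filename="?([^";]+)"?', disposition)
    # basename() drops any directory part the server might include in the name.
    return os.path.basename(match.group(1).strip()) if match else default


def download_order_sheet(opener, base_url, user, password, order_number):
    """Log in, then request the file. All paths and field names are placeholders."""
    login_form = urllib.parse.urlencode({"user": user, "password": password}).encode()
    # Passing data= makes this a POST; the session cookie ends up in the jar.
    opener.open(base_url + "/login", data=login_form).close()

    query = urllib.parse.urlencode({"order": order_number})
    # The same opener sends the stored cookie back automatically.
    with opener.open(base_url + "/attachment?" + query) as response:
        name = filename_from_headers(response.headers)
        with open(name, "wb") as out:
            out.write(response.read())
    return name
```

The order number goes in as a plain parameter, so running a hundred a day is just a loop over download_order_sheet with one shared opener.  A real run would likely need more intermediate requests (one per click in the manual process) and possibly hidden form fields copied from each page.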

webwhy

  • Jackass IV
  • Posts: 608
  • Karma: +15/-10
Re: Looking for web crawler tool
« Reply #9 on: March 21, 2012, 02:16:46 PM »
There's an older Perl library on CPAN called WWW::Mechanize that might save you some time.

PJYelton

Re: Looking for web crawler tool
« Reply #10 on: March 21, 2012, 03:08:34 PM »
Ok, thanks for the help guys.  I was really hoping I wouldn't have to resort to building my own tool but I guess I'll do a little more research and keep that as the backup plan.  I'm not very familiar with other tech forums, anyone know of another great place to pose a question such as this?

ober

Re: Looking for web crawler tool
« Reply #11 on: March 21, 2012, 03:34:56 PM »
www.phpfreaks.com/forums

There are boards there that are not for PHP, so it might be worth throwing it out to the misc board just to see what you get.