Author Topic: Web crawler  (Read 3000 times)

Mike

  • Jackass In Charge
  • Posts: 11257
  • Karma: +168/-32
  • Ex Asshole - a better and more caring person.
Web crawler
« on: June 09, 2008, 12:54:27 PM »
So I got asked to download the pages and files from a website that will be closed fairly soon.  Since FTP access isn't provided I'm gonna have to go the route of crawling the site and downloading the shit.

I'm certain I can write a script to do it, but really I'd rather not take the time if you all know of a free program that can do it.  It would need to support HTTP auth; HTTPS isn't needed.
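
For reference, the "write a script" route might look roughly like the sketch below. It only covers fetching a single URL behind HTTP Basic auth and saving it to a mirrored path on disk. The names make_opener, local_path_for and download are made up for illustration, it assumes a current Python 3 standard library, and it ignores query strings when choosing a file name.

```python
import os
import urllib.request
from urllib.parse import urlsplit


def make_opener(site_root, user, password):
    """Build an opener that answers HTTP Basic auth challenges for site_root."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, site_root, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(passwords)
    return urllib.request.build_opener(handler)


def local_path_for(url, out_dir):
    """Map a URL to a file path under out_dir, mirroring the URL path."""
    path = urlsplit(url).path
    if path == "" or path.endswith("/"):
        path += "index.html"  # directory-style URLs need a file name
    # Drop empty and dot segments so the path cannot escape out_dir
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return os.path.join(out_dir, *parts)


def download(opener, url, out_dir):
    """Fetch one URL and write the response body to its mirrored path."""
    target = local_path_for(url, out_dir)
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    with opener.open(url, timeout=30) as response, open(target, "wb") as out:
        out.write(response.read())
    return target
```

Something like download(make_opener(root, user, password), url, "mirror") would then be called once per URL found by whatever does the crawling.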

Dumah

  • Jackass IV
  • Posts: 960
  • Karma: +21/-6
Re: Web crawler
« Reply #1 on: June 09, 2008, 01:04:22 PM »
I've heard wget can do this on Linux, but I've never tried it.

Perspective

  • badfish
  • Jackass In Charge
  • Posts: 4635
  • Karma: +64/-22
    • http://jeff.bagu.org
Re: Web crawler
« Reply #2 on: June 09, 2008, 01:06:29 PM »
I wrote a Python script that does this (it just validates that every link returns OK; it'd be simple to change it to actually download the page though). It's on my home computer. I can post it when I get home if you haven't found an alternative by then.
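
The script itself was never posted in the thread, so the following is not it. It is only a rough sketch of what a link validator of that kind could look like in current Python 3. The names LinkParser, extract_links, http_fetch and check_site are invented here, and it assumes the crawl should stay on the starting host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
import urllib.error
import urllib.request


class LinkParser(HTMLParser):
    """Collect href/src attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for every link on the page, without #fragments."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, link)).url for link in parser.links]


def http_fetch(url):
    """Return (status, body_text); the body is empty on failure."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except urllib.error.URLError:
        return None, ""


def check_site(start_url, fetch=http_fetch):
    """Breadth-first crawl of one host; returns {url: status}."""
    host = urlsplit(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    results = {}
    while queue:
        url = queue.popleft()
        status, body = fetch(url)
        results[url] = status
        if status != 200:
            continue  # nothing to parse on a failed page
        for link in extract_links(body, url):
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return results
```

Turning this into a downloader would mostly mean writing each fetched body to disk instead of only recording the status. The fetch parameter is there so the crawl logic can be tried against canned pages without touching the network.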

micah

  • A real person, on the Internet.
  • Ass Wipe
  • Posts: 6915
  • Karma: +58/-55
  • Truth cannot contradict truth.
    • micahj.com
Re: Web crawler
« Reply #3 on: June 09, 2008, 01:23:03 PM »
I wrote a Python script that does this (it just validates that every link returns OK; it'd be simple to change it to actually download the page though). It's on my home computer. I can post it when I get home if you haven't found an alternative by then.

pft, python.

Mike

  • Jackass In Charge
  • Posts: 11257
  • Karma: +168/-32
  • Ex Asshole - a better and more caring person.
Re: Web crawler
« Reply #4 on: June 09, 2008, 06:23:42 PM »
I don't think my work computer has Python.  That did remind me, though, that I already have a link validator script that can crawl.  I should be able to modify it to download the pages as well.

hans

  • Guitar Addict
  • Jackass In Charge
  • Posts: 3523
  • Karma: +46/-18
Re: Web crawler
« Reply #5 on: June 09, 2008, 07:10:51 PM »
I used to have a program that did that, but I can't remember what it was called. I used it to download some online books.