Spiders and creepy Crawlers
r.3volved
Posted: Wed Jun 14, 2006 7:04 pm Post subject: Spiders and creepy Crawlers
Does anyone here have experience with Java-based web crawlers?
I've tried out some open source crawlers, but they don't offer what I want. Most seem to be built for spidering and mirroring full sites.
What I'm looking to do is crawl a single domain, parse each page for specific data, and then export that data to a .CSV file for later use.
Has anyone come across any open source apps that will do this, or that will let me, say, search for the word "Description" and grab all the text after it until I hit the end of a table row?
I am fairly proficient in Java, so modifying code is not a problem. I'm simply looking for a lightweight app that will support what I'm trying to do without all the crap I don't need.
Any help is much appreciated.
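
For illustration only, the "find Description, grab everything up to the end of the table row, write it out as CSV" step could be sketched in plain Java with `java.util.regex`. The class name `DescriptionExtractor`, its method names, and the sample HTML and URL are all made up for this sketch. It assumes the label and its value sit in the same `<tr>`, and it does not decode HTML entities.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DescriptionExtractor {
    // Matches the label, then lazily captures everything up to the next closing </tr>.
    private static final Pattern AFTER_LABEL =
        Pattern.compile("Description(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the plain text between each "Description" label and the end of its table row. */
    public static List<String> extract(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = AFTER_LABEL.matcher(html);
        while (m.find()) {
            String text = m.group(1)
                .replaceAll("<[^>]*>", " ")  // replace each tag with a space
                .replaceAll("\\s+", " ")     // collapse runs of whitespace
                .trim();
            values.add(text);
        }
        return values;
    }

    /** Quotes one CSV field: wrap in double quotes and double any embedded quotes. */
    public static String csvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Builds one CSV line of the form "url","description". */
    public static String toCsvLine(String url, String description) {
        return csvField(url) + "," + csvField(description);
    }

    public static void main(String[] args) {
        // Hypothetical page fragment and URL, for demonstration only.
        String html = "<table><tr><td>Description</td>"
                    + "<td>Blue widget, 5\" wide</td></tr></table>";
        for (String d : extract(html)) {
            System.out.println(toCsvLine("http://example.com/item1", d));
        }
    }
}
```

A regex like this is brittle against nested tables or markup where the row is never closed, so a real HTML parser would probably be the safer choice for messy pages.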
rizzix
Posted: Wed Jun 14, 2006 7:40 pm Post subject: (No subject)
No, I don't know of any, but I do know some cool technologies you could use to build your own web crawler.
Search Engine: Lucene
HTTP & Misc: Jakarta Commons (HttpClient, etc.)
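
As a rough sketch of the build-your-own route, here is a breadth-first crawl that stays on a single host, using only the Java standard library. The class name `SingleDomainCrawler`, the `fetch` parameter, and the sample pages are assumptions made for illustration. The page download is passed in as a function so the loop can be tried without a network; in practice it might be backed by an HTTP library such as HttpClient.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleDomainCrawler {
    // Captures the value of href="..." or href='...', stopping before any #fragment.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /**
     * Breadth-first crawl from start, following only links whose host matches
     * the start URL's host. fetch maps a URL to its HTML, or null on failure.
     * Returns the visited URLs in visit order.
     */
    public static Set<String> crawl(String start, Function<String, String> fetch, int maxPages) {
        String host = URI.create(start).getHost();
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) continue;      // already seen
            String html = fetch.apply(url);
            if (html == null) continue;           // fetch failed, nothing to parse
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                try {
                    // Resolve relative links against the current page's URL.
                    URI link = URI.create(url).resolve(m.group(1).trim());
                    if (host != null && host.equalsIgnoreCase(link.getHost())) {
                        queue.add(link.toString());
                    }
                } catch (IllegalArgumentException e) {
                    // Malformed href: skip it.
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "site" standing in for real HTTP fetches.
        Map<String, String> pages = Map.of(
            "http://example.com/", "<a href=\"/a.html\">A</a> <a href=\"http://other.org/\">off-site</a>",
            "http://example.com/a.html", "<a href=\"/\">home</a>");
        System.out.println(crawl("http://example.com/", pages::get, 100));
    }
}
```

Each fetched page could then be handed to whatever parsing step pulls out the data you care about. This sketch ignores robots.txt, redirects, and URL normalization, which a real crawler would likely need to handle.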
r.3volved
Posted: Thu Jun 15, 2006 10:01 pm Post subject: (No subject)
Does anyone have any relevant information in response to my question?
Some of you leet UofW students must know something about data mining...
Tony
Posted: Thu Jun 15, 2006 10:57 pm Post subject: (No subject)
I wrote my own web crawler in Ruby during a work term. It was for highly specialized data extraction, so I don't know of any pre-packaged programs (let alone Java ones).