Programming C, C++, Java, PHP, Ruby, Turing, VB
Computer Science Canada 
 Spiders and creepy Crawlers
r.3volved




Posted: Wed Jun 14, 2006 7:04 pm   Post subject: Spiders and creepy Crawlers

Does anyone here have experience with Java-based web crawlers?
I've tried out some open source crawlers, but they don't offer what I want... most seem to be built for spidering and mirroring full sites.

What I'm looking to do is crawl a single domain and parse each page for specific data. Then export this data to a .CSV file for later use.
Has anyone come across any open source apps that will do this, or that will let me, say, search for the word "Description" and grab all the text after it until I hit the end of a table row?
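
For illustration, the "Description until end of table row" step and the CSV export could be sketched roughly as below. This is only a minimal sketch using the standard library: the class and method names (`DescriptionExtractor`, `extract`, `toCsvField`, `toCsvLine`) and the example URL are made up for this example, it assumes the row ends with a literal `</tr>`, and a regex is a shortcut rather than a real HTML parser (entities such as `&amp;` are not decoded).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from any existing crawler.
class DescriptionExtractor {
    // Match the label, then lazily capture everything up to the next </tr>.
    private static final Pattern ROW = Pattern.compile(
            "Description(.*?)</tr\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the text found after each "Description" label, with tags stripped.
    static List<String> extract(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = ROW.matcher(html);
        while (m.find()) {
            String text = m.group(1)
                    .replaceAll("<[^>]*>", " ")  // replace tags with a space
                    .replaceAll("\\s+", " ")     // collapse runs of whitespace
                    .trim();
            found.add(text);
        }
        return found;
    }

    // Quote a CSV field and double any embedded quotes.
    static String toCsvField(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // One CSV line per extracted value: page URL, then the description text.
    static String toCsvLine(String url, String description) {
        return toCsvField(url) + "," + toCsvField(description);
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>Description</td>"
                + "<td>Red widget, 5\" wide</td></tr></table>";
        for (String d : extract(html)) {
            System.out.println(toCsvLine("http://example.com/item1", d));
        }
    }
}
```

Quoting every field keeps commas and quotes inside a description from breaking the CSV columns. For messy real-world markup, a proper HTML parser would likely be more robust than the regex.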

I am fairly proficient in Java, so modifying code is not a problem...I'm simply looking for a lightweight app that will support what I'm trying to do without all the crap I don't need.

Any help is much appreciated.
rizzix




Posted: Wed Jun 14, 2006 7:40 pm   Post subject: (No subject)

No, I don't know of any, but I do know some cool technologies you can make good use of to create your own web crawler. :)

Search Engine: Lucene
HTTP & Misc: Jakarta Commons (HttpClient, etc..)
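
As a rough idea of what "create your own" might involve for a single-domain crawl, here is a minimal sketch that stays on one host. It uses only the standard library so it is self-contained. The `fetch` function is a placeholder for whatever HTTP client is actually used (Commons HttpClient would be one option), and the names `SameHostCrawler`, `sameHostLinks`, and `crawl` are invented for this example.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: breadth-first crawl restricted to the start page's host.
class SameHostCrawler {
    // Capture the href value up to the closing quote or a #fragment.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    // Resolve every link on the page and keep only http(s) links on the same host.
    static Set<URI> sameHostLinks(URI page, String html) {
        Set<URI> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            URI target;
            try {
                target = page.resolve(m.group(1).trim());
            } catch (IllegalArgumentException e) {
                continue; // malformed href, skip it
            }
            String scheme = target.getScheme();
            String host = target.getHost();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            if (web && host != null && host.equalsIgnoreCase(page.getHost())) {
                links.add(target);
            }
        }
        return links;
    }

    // fetch returns the page body, or null if the page could not be loaded.
    static Set<URI> crawl(URI start, Function<URI, String> fetch, int maxPages) {
        Set<URI> visited = new LinkedHashSet<>();
        Deque<URI> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            URI page = queue.poll();
            if (!visited.add(page)) continue; // already seen
            String html = fetch.apply(page);
            if (html == null) continue;
            for (URI link : sameHostLinks(page, html)) {
                if (!visited.contains(link)) queue.add(link);
            }
        }
        return visited;
    }
}
```

Passing the fetcher in as a function keeps the crawl logic independent of the HTTP library, so the same loop can be tried against an in-memory map of pages before pointing it at a real site. A real crawler would also want politeness delays and robots.txt handling, which this sketch leaves out.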
r.3volved




Posted: Thu Jun 15, 2006 10:01 pm   Post subject: (No subject)

:?
Does anyone have any relevant information in response to my question?
Some of you leet :roll: UofW students must know something about data mining...
Tony




Posted: Thu Jun 15, 2006 10:57 pm   Post subject: (No subject)

I wrote my own web crawler in Ruby during a work term. It was for highly specialized data extraction, so I don't know of any pre-packaged programs (let alone Java ones).