Nutch and Solr are two solid tools created by the great folks at Apache: you can use Nutch to crawl the web and Solr to index the data you have crawled. There are obviously far more uses for these tools than indexing random websites, and I won't go into those in this post. But since I struggled to find documentation on all of this when I started using them, I thought I'd put together a quick starter's guide to crawling the web with Nutch and using Solr to index and search the data you have crawled.