alvinalexander.com | career | drupal | java | mac | mysql | perl | scala | uml | unix  

Groovy example source code file (Indexer.groovy)

This example Groovy source code file (Indexer.groovy) is included in the DevDaily.com "Java Source Code Warehouse" project. The intent of this project is to help you "Learn Java by Example" TM.

Java - Groovy tags/keywords

add, create, create, file, file, filereader, index, indexing, indexing, indexwriter, ioexception, lucene, lucene, standardanalyzer

The Groovy Indexer.groovy source code

import org.apache.lucene.analysis.standard.StandardAnalyzer
import org.apache.lucene.document.Document
import org.apache.lucene.document.Field
import org.apache.lucene.index.IndexWriter

/**
 * Indexer: traverses a file system and indexes .txt files
 *
 * @author Jeremy Rayner <groovy@ross-rayner.com>
 * based on examples in the wonderful 'Lucene in Action' book
 * by Erik Hatcher and Otis Gospodnetic ( http://www.lucenebook.com )
 *
 * requires a lucene-1.x.x.jar from http://lucene.apache.org
 */

if (args.size() != 2 ) {
    throw new Exception("Usage: groovy -cp lucene-1.4.3.jar Indexer <index dir> ")
}
def indexDir = new File(args[0]) // Create Lucene index in this directory
def dataDir = new File(args[1]) // Index files in this directory

def start = new Date().time
def numIndexed = index(indexDir, dataDir)
def end = new Date().time

println "Indexing $numIndexed files took ${end - start} milliseconds"

def index(indexDir, dataDir) {
    if (!dataDir.exists() || !dataDir.directory) {
        throw new IOException("$dataDir does not exist or is not a directory")
    }
    def writer = new IndexWriter(indexDir, new StandardAnalyzer(), true) // Create Lucene index
    writer.useCompoundFile = false

    dataDir.eachFileRecurse {
        if (it.name =~ /.txt$/) { // Index .txt files only
            indexFile(writer,it)
        }
    }
    def numIndexed = writer.docCount()
    writer.optimize()
    writer.close() // Close index
    return numIndexed
}

void indexFile(writer, f) {
    if (f.hidden || !f.exists() || !f.canRead() || f.directory) { return }

    println "Indexing $f.canonicalPath"
    def doc = new Document()

    // Construct a Field that is tokenized and indexed, but is not stored in the index verbatim.
    doc.add(Field.Text("contents", new FileReader(f)))

    // Construct a Field that is not tokenized, but is indexed and stored.
    doc.add(Field.Keyword("filename",f.canonicalPath))

    writer.addDocument(doc) // Add document to Lucene index
}

Other Groovy examples (source code examples)

Here is a short list of links related to this Groovy Indexer.groovy source code file:

... this post is sponsored by my books ...

#1 New Release!

FP Best Seller

 

new blog posts

 

Copyright 1998-2024 Alvin Alexander, alvinalexander.com
All Rights Reserved.

A percentage of advertising revenue from
pages under the /java/jwarehouse URI on this website is
paid back to open source projects.