Lucene example source code file (Syns2Index.java)
The Lucene Syns2Index.java source code

package org.apache.lucene.wordnet;

/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

/**
 * Convert the prolog file wn_s.pl from the <a href="http://www.cogsci.princeton.edu/2.0/WNprolog-2.0.tar.gz">WordNet prolog download</a>
 * into a Lucene index suitable for looking up synonyms and performing query expansion ({@link SynExpand#expand SynExpand.expand(...)}).
 *
 * This has been tested with WordNet 2.0.
 *
 * The index has fields named "word" ({@link #F_WORD})
 * and "syn" ({@link #F_SYN}).
 * <p>
 * The source word (such as 'big') can be looked up in the
 * "word" field, and if present there will be fields named "syn"
 * for every synonym. What's tricky here is that there could be <b>multiple</b>
 * fields with the same name, in the general case for words that have multiple synonyms.
 * That's not a problem with Lucene, you just use {@link org.apache.lucene.document.Document#getValues}
 * </p>
 * <p>
 * While the WordNet file distinguishes groups of synonyms with
 * related meanings we don't do that here.
 * </p>
 *
 * This can take 4 minutes to execute and build an index on a "fast" system and the index takes up almost 3 MB.
 *
 * @see <a href="http://www.cogsci.princeton.edu/~wn/">WordNet home page</a>
 * @see <a href="http://www.cogsci.princeton.edu/~wn/man/prologdb.5WN.html">prologdb man page</a>
 * @see <a href="http://www.hostmon.com/rfc/advanced.jsp">sample site that uses it</a>
 */
public class Syns2Index
{
    /**
     *
     */
    private static final PrintStream o = System.out;

    /**
     *
     */
    private static final PrintStream err = System.err;

    /**
     *
     */
    public static final String F_SYN = "syn";

    /**
     *
     */
    public static final String F_WORD = "word";

    /**
     *
     */
    private static final Analyzer ana = new StandardAnalyzer(Version.LUCENE_CURRENT);

    /**
     * Takes arg of prolog file name and index directory.
     */
    public static void main(String[] args)
        throws Throwable
    {
        // get command line arguments
        String prologFilename = null; // name of file "wn_s.pl"
        String indexDir = null;
        if (args.length == 2)
        {
            prologFilename = args[0];
            indexDir = args[1];
        }
        else
        {
            usage();
            System.exit(1);
        }

        // ensure that the prolog file is readable
        if (! (new File(prologFilename)).canRead())
        {
            err.println("Error: cannot read Prolog file: " + prologFilename);
            System.exit(1);
        }
        // exit if the target index directory already exists
        if ((new File(indexDir)).isDirectory())
        {
            err.println("Error: index directory already exists: " + indexDir);
            err.println("Please specify a name of a non-existent directory");
            System.exit(1);
        }

        o.println("Opening Prolog file " + prologFilename);
        final FileInputStream fis = new FileInputStream(prologFilename);
        final BufferedReader br = new BufferedReader(new InputStreamReader(fis));
        String line;

        // maps a word to all the "groups" it's in
        final Map<String,List
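The listing breaks off just as it starts mapping each word to the synonym "groups" it belongs to. The following is a minimal sketch of that idea, not the original Lucene code: it assumes each line of wn_s.pl has the form s(synset_id,w_num,'word',ss_type,sense_number,tag_count). as described in the prologdb man page, and it keeps two in-memory maps (word to group ids, group id to words) from which the synonyms of a word can be collected. The class name SynGroupSketch, its methods addLine and synonyms, and the sample lines with their ids are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: groups words by synset id and looks up synonyms. */
public class SynGroupSketch {
    // maps a word to the ids of all the groups it is in
    private final Map<String, List<String>> word2Groups = new TreeMap<>();
    // maps a group id to all the words in that group
    private final Map<String, List<String>> group2Words = new TreeMap<>();

    /**
     * Parses one line of the assumed form s(id,w_num,'word',type,sense,count).
     * Returns false and records nothing if the line does not look like that.
     */
    public boolean addLine(String line) {
        if (!line.startsWith("s(")) {
            return false;
        }
        int comma = line.indexOf(',');
        int q1 = line.indexOf('\'');      // opening quote of the word
        int q2 = line.lastIndexOf('\'');  // closing quote of the word
        if (comma < 0 || q1 < 0 || q2 <= q1) {
            return false;
        }
        String group = line.substring(2, comma);
        // Prolog escapes an apostrophe inside a quoted atom by doubling it
        String word = line.substring(q1 + 1, q2).toLowerCase().replace("''", "'");
        word2Groups.computeIfAbsent(word, k -> new ArrayList<>()).add(group);
        group2Words.computeIfAbsent(group, k -> new ArrayList<>()).add(word);
        return true;
    }

    /** Returns every other word sharing at least one group with the given word, sorted. */
    public Set<String> synonyms(String word) {
        Set<String> result = new TreeSet<>();
        List<String> groups = word2Groups.getOrDefault(word, Collections.<String>emptyList());
        for (String group : groups) {
            result.addAll(group2Words.get(group));
        }
        result.remove(word); // a word is not its own synonym
        return result;
    }

    public static void main(String[] args) {
        SynGroupSketch sketch = new SynGroupSketch();
        // made-up sample lines, not taken from the real wn_s.pl
        sketch.addLine("s(100000001,1,'big',a,1,0).");
        sketch.addLine("s(100000001,2,'large',a,1,0).");
        sketch.addLine("s(100000002,1,'big',a,2,0).");
        sketch.addLine("s(100000002,2,'grown-up',a,1,0).");
        System.out.println("big -> " + sketch.synonyms("big"));
    }
}
```

Merging all groups of a word into one flat set matches the class comment's remark that groups of related meanings are not distinguished. In an index built along these lines, each synonym would become one more "syn" field on the document for the word, which is why Document#getValues is needed to read them back.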
Copyright 1998-2021 Alvin Alexander, alvinalexander.com
All Rights Reserved.
A percentage of advertising revenue from
pages under the /java/jwarehouse
URI on this website is
paid back to open source projects.