Lucene example source code file (data.txt)
# German special characters are replaced:
häufig haufig

# here the stemmer works okay; it maps related words to the same stem:
abschließen abschliess
abschließender abschliess
abschließendes abschliess
abschließenden abschliess
Tisch tisch
Tische tisch
Tischen tisch
Haus hau
Hauses hau
Häuser hau
Häusern hau

# here's a case where overstemming occurs, i.e. a word is
# mapped to the same stem as unrelated words:
hauen hau

# here's a case where understemming occurs, i.e. two related words
# are not mapped to the same stem. This is the case with basically
# all irregular forms:
Drama drama
Dramen dram

# replace "ß" with "ss":
Ausmaß ausmass

# fake words to test if suffixes are cut off:
xxxxxe xxxxx
xxxxxs xxxxx
xxxxxn xxxxx
xxxxxt xxxxx
xxxxxem xxxxx
xxxxxer xxxxx
xxxxxnd xxxxx
# the suffixes are also removed when combined:
xxxxxetende xxxxx

# words that are shorter than four characters are not changed:
xxe xxe
# -em and -er are not removed from words shorter than five characters:
xxem xxem
xxer xxer
# -nd is not removed from words shorter than six characters:
xxxnd xxxnd
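The comments in data.txt describe the stemming rules only through examples. The Python sketch below restates those rules directly so the table can be checked mechanically. It is a hypothetical, simplified illustration, not Lucene's GermanStemmer: `sketch_stem`, `parse_pairs` and `SAMPLE` are names introduced here, and the whitespace separator between a word and its expected stem is assumed from this listing. It lowercases the word, leaves words shorter than four characters alone, replaces ä, ö, ü and ß, and then repeatedly strips -nd (only from words of six or more characters), -em/-er (five or more), and a single trailing e, s, n or t. Stopping once three characters remain is an additional assumption.

```python
# Hypothetical, simplified sketch of the rules described in data.txt.
# This is NOT Lucene's GermanStemmer; all names here are illustrative only.

# Assumed substitutions for German special characters ("häufig" -> "haufig").
SUBSTITUTIONS = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}


def sketch_stem(word: str) -> str:
    term = word.lower()
    # Words shorter than four characters are not changed (apart from case).
    if len(term) < 4:
        return term
    term = "".join(SUBSTITUTIONS.get(ch, ch) for ch in term)
    # Assumption: stripping stops once only three characters remain.
    while len(term) > 3:
        if term.endswith("nd") and len(term) >= 6:
            term = term[:-2]  # -nd only from words of 6+ characters
        elif term.endswith(("em", "er")) and len(term) >= 5:
            term = term[:-2]  # -em / -er only from words of 5+ characters
        elif term[-1] in "esnt":
            term = term[:-1]  # single-letter suffixes
        else:
            break  # no rule applies, so stop
    return term


def parse_pairs(text: str) -> list[tuple[str, str]]:
    """Return (word, expected_stem) pairs, skipping '#' comments and blank lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, expected = line.split()  # assumed whitespace separator
        pairs.append((word, expected))
    return pairs


# Rows copied from data.txt that this sketch is expected to reproduce.
SAMPLE = """\
# German special characters are replaced:
häufig haufig
Tisch tisch
Tische tisch
Tischen tisch
Haus hau
Hauses hau
Häuser hau
Häusern hau
hauen hau
Drama drama
Dramen dram
xxxxxe xxxxx
xxxxxs xxxxx
xxxxxn xxxxx
xxxxxt xxxxx
xxxxxem xxxxx
xxxxxer xxxxx
xxxxxnd xxxxx
xxxxxetende xxxxx
xxe xxe
xxem xxem
xxer xxer
xxxnd xxxnd
"""

if __name__ == "__main__":
    for word, expected in parse_pairs(SAMPLE):
        actual = sketch_stem(word)
        status = "ok" if actual == expected else "MISMATCH"
        print(f"{word:<12} {expected:<8} {actual:<8} {status}")
```

The ß rows (abschließen, Ausmaß) are deliberately left out of `SAMPLE`. Once ß becomes ss, this naive loop keeps removing the trailing s (ausmass becomes ausma), so a real implementation needs extra protection for such letter sequences, which this sketch does not attempt.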
Copyright 1998-2021 Alvin Alexander, alvinalexander.com
All Rights Reserved.