alvinalexander.com | career | drupal | java | mac | mysql | perl | scala | uml | unix  

jforum example source code file (ReindexingMessages.txt)

This example jforum source code file (ReindexingMessages.txt) is included in the DevDaily.com "Java Source Code Warehouse" project. The intent of this project is to help you "Learn Java by Example" TM.

Java - jforum tags/keywords

admin, id, it, it, jforum, jforum, lucene, panel, post, the, the, there, this, this

The jforum ReindexingMessages.txt source code

!!! Reindexing messages
[{TableOfContents}]
JForum uses the [Lucene|http://lucene.apache.org] library, which provides an excellent platform for fast content search and indexing. Versions of JForum previous to 2.1.8 used a database-driven approach, which included several tables for word indexing and search.

Lucene stores its index in the filesystem using a [special|http://lucene.apache.org/java/docs/fileformats.html] document format. On a regular development machine, JForum indexes about 1000 messages per second, value that can be higher on powerful machines. 

!! How to reindex 
There are two possible ways to reindex the messages: using a command-line tool, or using JForum's Admin Panel (web interface). It is up to you to choose which one to use, although there are some considerations. 

First, if you're migrating from JForum 2.1.7 of previous to JForum2.1.8 or newer, you will have to reindex the entire database, as the old search mechanism is no longer supported. This operation is fast and painless, and it is better to use the command-line tool for this task. Secondly, JForum's Admin Panel has a section that provides information about Lucene's index status and other general information, like index size and last modification date, as well a tool to reindex messages using the same parameters available to the command-line tool. You can reindex from a range of Post IDs or message date. 

!! Before you start: setting up index configuration
As a part of JForum configuration, you may want to take a look at the default settings used for indexing - that are OK for most environments. 

Open your ''SystemGlobals.properties'' file and look for the "SEARCH" search. There you will find a set of properties, as described in the following table:

||Property name||Default value||Description
|search.indexing.enabled|true|Enable or disable search indexing. Set it to ''false'' to disable.
|lucene.index.write.path |${resource.dir}/jforumLuceneIndex|The complete path to where the index will be written. The default value will write it to the directory ''WEB-INF/jforumLuceneIndex'' of your application. \\Please note that the directory must be writable by the user who runs the application server instance. 
|lucene.indexer.ram.numdocs|10000|used only for reindexation. It is the number of documents to keep in memory before flushing them to the disk. Keep in mind that a higher number means a higher memory usage. \\This value also affects a little bit how long the entire process will take. 
|lucene.indexer.db.fetch.count|50|Also used only for reindexation. It is the number of records to fetch from the database on each read

[{Note title='Index store directory permissions'

The directory specified in the property ''lucene.index.write.path'' __must__ be writable by the user who runs the web server. Not doing so will cause errors when trying to index any message
}]

!! Using the command-line tool
JForum comes with a tool named "Lucene Indexer", located in the directory ''tools/luceneIndexer''. It is a command-line interface, and can be used to reindex the entire database or just part of it. In the tool's directory there are two files - ''LuceneCommandLineReindexer.sh'' and ''LuceneCommandLineReindexer.bat'' -, being the first destined for Unix like systems and the second for Windows machines. 

Invoking the tool without any arguments will provide you with a list of available options, like shown below: 

[{Highlight

Usage: LuceneCommandLineReindexer \\
    --path __full_path_to_JForum_root_directory__ \\
    --type __{date|message}__ \\
    --firstPostId __a_id__ \\
    --lastPostId __a_id__ \\
    --fromDate __dd/MM/yyyy__ \\
    --toDate __dd/MM/yyyy__ \\
    [[--recreateIndex] \\
    [[--avoidDuplicatedRecords] \\
}]

The following table describes each argument. 

||Argument name||Description||
|--path|The complete path to where JForum is installed. Lucene Indexer will use it as base for reading the configuration files stored in ''WEB-INF/config''.
|--type|Type of indexing. Can be __date__ or __message__. The first enabled the use of ''--fromDate'' and ''--toDate'', while the second enables the use of ''--firstPostId'' and ''--lastPostId''.
|--firstPostId|The ID of the first message to index (the "Post ID"). Used only when ''--type=message''.
|--lastPostId|The ID of the last message to index (The "Post ID"). Used only when ''--type=message''. 
|--fromDate|The start date to index from. Used only when ''--type=date'', and the date format must be in the form ''dd/MM/yyyy'', like '23/07/2007' (July 23, 2007)
|--toDate|The end date to index from. Used only when ''--type=date'', and the date format must be in the form ''dd/MM/yyyy'', like '23/07/2007' (July 23, 2007)
|--recreateIndex|If specified (there is no value to provide, like other options), it will recreate the index from scratch. 
|--avoidDuplicatedRecords|This is useful when you don't want to add documents to an existing index (instead of recreating it) and want to make sure that there will not be any duplicated record. This option makes the indexing process slower.

!! Usage example
Let's see a set of examples. For all cases, considere that JForum is installed at /home/www/jforum

! Reindexing the entire board
We want to reindex all messages into a new brand new index. There are a total of 367.234 messages, so we will round it to 368.000 (there is no real need to round the number, but it also doesn't hurt).

[{Highlight

sh LuceneCommandLineReindexer.sh --path=/home/www/jforum --recreateIndex --type=message --firstPostId=1 --lastPostId=368000
}]

! Reindexing from a range of date
We want to only reindex messages from a given month, and add them to an existing index. We'll not care about duplicated records here. 

[{Highlight

sh LuceneCommandLineReindexer.sh --path=/home/www/jforum --type=date --fromDate=01/03/2006 --toDate=31/03/2006
}]

! Avoiding duplicated records
Now we also want to reindex a determined date range, but this time it is necessary to make sure that there will not be any duplicated record.

[{Highlight

sh LuceneCommandLineReindexer.sh --path=/home/www/jforum -avoidDuplicatedRecords --type=date --fromDate=15/04/2007 --toDate=27/05/2007
}]

!! Using the Web interface - Admin Panel
There is a web based interface for reindexing, which also provides general statistics about the current usage of the index. To access it, go to Admin Panel -> Lucene Statistics. There you will find two boxes, one named "Search statistics" and other called "Re-Index". 

! Search statistics
General information about the current usage of Lucene's index. It shows how many messages are in the database, the index storage directory, its last modification date and version, as well if it is locked or no - if it is locked, then some process is currently writing to the index. The last option, "Is Post Indexed?" allows you to query the index to see if a specific message is indexed. 

! Re-Index
This section works much like the Command Line interface, with the difference that runs on the Tomcat instance. All options are there, and the only difference is that reindexing by date requires that you provide the start and end hour as well, while the command line tool only asks for the date. 

After you click "Start" the process will start in background, and the page will refresh on each 5 seconds automatically. When the process finishes, it will change back to its previous state. 

Other jforum examples (source code examples)

Here is a short list of links related to this jforum ReindexingMessages.txt source code file:

... this post is sponsored by my books ...

#1 New Release!

FP Best Seller

 

new blog posts

 

Copyright 1998-2021 Alvin Alexander, alvinalexander.com
All Rights Reserved.

A percentage of advertising revenue from
pages under the /java/jwarehouse URI on this website is
paid back to open source projects.