A socially tagged source code search engine

This announcement is a little premature, but for at least ten years I've thought about creating a search engine to make it easier to find source code examples. Google and thousands of writers have pretty much eliminated the need for such a search engine, but I decided to create one anyway. The approach I'm working on is a source code search engine driven by "tags", in much the same way that other tag-based sites work.

The way it works right now (Phase 1) is that I download the source code from open source projects, run a whole bunch of preprocessors against each file, and eventually insert each file into a database with machine-generated tags. I've chunked through roughly the first 100,000 files so far, and you can find the result of these efforts at my new website. (I wanted a different domain name, but of course a cybersquatter has their big old butt sitting on that domain name.)
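To make the "machine-generated tags" step a little more concrete, here is a minimal Python sketch of one way a preprocessor could derive tags from a Java file. It is not the site's actual code: the function name `generate_tags`, the weighting of imports and type names, and the stop lists are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Assumption: Java keywords and generic package roots make poor tags.
JAVA_KEYWORDS = set("""
abstract boolean break byte case catch char class const continue default do
double else enum extends final finally float for if implements import
instanceof int interface long native new null package private protected
public return short static super switch synchronized this throw throws
transient try void volatile while true false
""".split())
GENERIC_PACKAGES = {"java", "javax", "org", "com", "net", "util"}

IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)", re.MULTILINE)
TYPE_RE = re.compile(r"\b(?:class|interface|enum)\s+(\w+)")
WORD_RE = re.compile(r"[A-Za-z_]\w*")


def generate_tags(java_source, max_tags=10):
    """Return up to max_tags lowercase tags for one Java source file."""
    counts = Counter()

    # Names in import statements are weighted heavily (hypothetical weight).
    for imported in IMPORT_RE.findall(java_source):
        for part in imported.split("."):
            counts[part.lower()] += 5

    # Declared type names get the same weight.
    for type_name in TYPE_RE.findall(java_source):
        counts[type_name.lower()] += 5

    # Every other identifier-like word counts once per occurrence.
    for word in WORD_RE.findall(java_source):
        counts[word.lower()] += 1

    # Drop keywords, generic package roots, and very short words.
    tags = [
        tag
        for tag, _ in counts.most_common()
        if len(tag) > 2
        and tag not in JAVA_KEYWORDS
        and tag not in GENERIC_PACKAGES
    ]
    return tags[:max_tags]
```

Each resulting tag list would then be stored alongside the file in the database. A real preprocessor would likely also want to ignore comments and string literals, which this sketch does not attempt.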

This first iteration was driven by these simple goals:

  1. Get the basic concept working with machine-generated tags.
  2. Work through all the technical issues of getting a site like this hosted with GoDaddy.

It was also driven by one other goal: My desire to create my first PHP website written entirely from scratch, without the use of CakePHP or the Zend Framework libraries. (I may use the Zend libraries in the next iteration.)

Here's what it looks like at the moment:

[Screenshot: a socially tagged source code search engine]

(Yes, I know some of those colors have issues ... time, time, time.)

Phase 2

Due to time constraints I'm not sure when I'll develop the second phase of this project, but the goals for Phase 2 are pretty obvious:

  1. Enable the social-tagging capabilities.
  2. Integrate the social-tagging aspect with an anti-spam tool like Mollom.
  3. Possibly allow comments or some sort of markup for each file.

Unfortunately Phase 1 took a little longer than desired, so my social search engine isn't actually social just yet.

Phase 3

The goals for Phase 3 of this project are also pretty obvious to me:

  1. Add more programming languages to the database. Software-generated tags for other languages like PHP, C, C++, Ruby, Python, Perl, etc., can all be handled with the same approach I've used to pre-generate Java source code tags.
  2. Greatly improve the source code search results. An old search engine tool named Webglimpse actually offers the exact search engine results UI I'd like, and I plan to model the search results after it.
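As a rough illustration of how one tagging approach might stretch across several languages, the Python sketch below picks a set of regular expressions by file extension and collects the declared or imported names as tags. The `TAG_PATTERNS` table and the `tags_for_file` function are hypothetical, and the patterns are deliberately simplistic. This is not how the Java tags on the site are actually generated.

```python
import re
from pathlib import Path

# Hypothetical per-language patterns; each one captures a declared
# or imported name. Real preprocessors would need far more care.
TAG_PATTERNS = {
    ".java": [r"\b(?:class|interface)\s+(\w+)", r"^\s*import\s+([\w.]+)"],
    ".py": [r"^\s*(?:class|def)\s+(\w+)", r"^\s*(?:import|from)\s+([\w.]+)"],
    ".rb": [r"^\s*(?:class|module|def)\s+(\w+)"],
    ".php": [r"\b(?:class|function)\s+(\w+)"],
}


def tags_for_file(filename, source):
    """Return sorted lowercase tags for a file, chosen by its extension."""
    patterns = TAG_PATTERNS.get(Path(filename).suffix.lower(), [])
    tags = set()
    for pattern in patterns:
        for name in re.findall(pattern, source, re.MULTILINE):
            # Split dotted names such as "java.util.List" into parts.
            for part in name.split("."):
                if part:
                    tags.add(part.lower())
    return sorted(tags)
```

Supporting another language then comes down to adding one more entry to the table, which is the sense in which the approach carries over.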

The tagged source code concept

I don't know if this will seem like a new concept or not, but the funny thing is that I've been tagging source code in my Java Source Code Warehouse project since 1998. Adding the "social tagging" aspect to the warehouse has seemed obvious ever since I first heard about tagging, but I never had the time to explore the idea until recently.

If you're interested in taking this "socially tagged source code engine" for a spin, just visit the site and type in whatever tags you think might be interesting. (Just remember, they're all Java files right now.) Start with something like "stringbuffer", "swt", or "ant", and go from there.
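For the curious, a tag lookup like this can be sketched as a simple inverted index. The Python below is only an illustration under assumed names (`build_tag_index`, `search`) and made-up file names. The real site is a PHP application backed by a database, and its schema isn't shown here.

```python
from collections import defaultdict


def build_tag_index(file_tags):
    """Map each lowercase tag to the set of files carrying that tag."""
    index = defaultdict(set)
    for path, tags in file_tags.items():
        for tag in tags:
            index[tag.lower()].add(path)
    return index


def search(index, query):
    """Return the files that carry every tag in a space-separated query."""
    wanted = query.lower().split()
    if not wanted:
        return []
    matches = set(index.get(wanted[0], set()))
    for tag in wanted[1:]:
        matches &= index.get(tag, set())
    return sorted(matches)


# Hypothetical sample data: file name -> machine-generated tags.
files = {
    "Editor.java": ["swt", "stringbuffer"],
    "Build.java": ["ant"],
    "Log.java": ["stringbuffer"],
}
index = build_tag_index(files)
print(search(index, "stringbuffer"))      # ['Editor.java', 'Log.java']
print(search(index, "swt stringbuffer"))  # ['Editor.java']
```

Requiring every tag to match (an AND search) is just one design choice; ranking files by how many of the query tags they carry would be an equally reasonable alternative.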