PHP Classes

Search 2.0, a site search better than Google could provide


This post presents the recent developments made to the internal site search engine.

It explains which features were implemented to provide a better search experience than using Google to search the site.

Some features are explained in detail, such as search keyword auto-completion, AJAX-based search results pagination, and DHTML-based page animation effects.




Contents

* Search 2.0

* Internal site search versus Google

* Making a better site search than Google could provide

* AJAX based keyword auto-completion

* Google search syntax

* Search the site by section

* AJAX based results pagination

* Special visual effects

* Try it now here


- Open Source CMS Award

Before proceeding to the actual topic of this post, I would like to let you know that Packt Publishing has just started a new Open Source Award. It is meant to promote the most appreciated Open Source content management systems (CMS).

The winners are nominated by the community users. If you use one or more PHP Open Source CMS, please collaborate by going to the Packt site and submitting your nominations:

packtpub.com/award


* Search 2.0

Currently there are about 2,800 packages published on the site. That is a lot of content.

Once in a while I receive complaints from users who have difficulty finding the packages they are looking for.

Therefore, the latest developments of the site have been concentrated on improving the internal site search engine.


* Internal site search versus Google

Currently, the default site search engine is Google. When the users submit a search form, they are redirected to Google.

The Google search pages that the users see are co-branded. You see the same logo and colors used in the PHPClasses site, but in reality you are accessing Google search pages.

The PHPClasses site also has its own internal search engine. However, the default site search engine is Google for two reasons.

First, Google co-branded search provides an additional revenue source by means of the AdSense for search program.

AdSense is a Google program used by many site publishers to generate revenue that may help keep their sites financially viable. I have already mentioned the AdSense program in the past. If you are a site publisher interested in generating revenue, you may find more details here:

phpclasses.org/tips.html?tip=site-r ...

The other reason why Google is the default site search is that it helps take load off the site server.

When used by many concurrent users, the internal search engine causes significant server load. It can make the site slower for everybody who is accessing it.


* Making a better site search than Google could provide

Please do not get me wrong. There is no competition between Google and the internal search engine that was implemented in the PHPClasses site.

Google is a great search engine, especially if you do not know which site has what you are looking for. We all love Google for that.

However, if PHPClasses users complain that they are not able to find what they are looking for using the default search engine, which is Google, it is reasonable to conclude that there may be ways to improve the search experience beyond using Google.

I studied the problem for a while and realized that there are some circumstances that make it possible to provide a site search that is better than what Google can provide, for instance, using knowledge about the context of the site being searched.

The new site search engine has been enhanced to take advantage of this contextual knowledge and offer the PHPClasses site users a better search experience.


* AJAX based keyword auto-completion

Since the first version of the internal search, implemented at the beginning of the year 2000, all the searches made by the site users have been recorded in a database table. Currently, that table has nearly 2 million records.

When the search recording functionality was implemented, I had no idea of what that information could be good for. I just thought that some day I could use it for statistical purposes or some other kind of data warehouse based application.

While studying the problem of making a better site search using contextual information, I finally found an interesting application for the recorded search data.

It would be great if the site search could guess what the user wants to search for after typing just the first few letters. This feature is also known as auto-completion. It can be found in other search sites like Google.

But which keywords could be suggested to complete the first letters typed by the user?

I realized that different users tend to search the site for the same things using the same keywords. It is reasonable to assume that each new search very likely uses the same keywords as one of the most popular searches. Still, the most popular searches may include hundreds or thousands of keywords.

To solve the problem in a useful way, the search page makes an AJAX call to the server to request the ten most searched keywords that begin with the first letters typed by the user.

To implement this feature I have developed a plug-in class for my forms generation class to perform auto-completion using AJAX.

The new plug-in class can execute an arbitrary database query to find the words that match the first typed letters. It returns the top results and displays them in a pull-down menu.
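The query behind such a suggestion list can be sketched roughly as follows. This is only an illustration in Python with SQLite, not the code of the plug-in itself: the table name `searches`, the column `keywords`, and the function `suggest_keywords` are all hypothetical names chosen for the example.

```python
import sqlite3


def suggest_keywords(conn, prefix, limit=10):
    """Return the most frequently recorded searches that start with prefix.

    Assumes a hypothetical table searches(keywords TEXT) holding one row
    per search made by a user.
    """
    # Escape LIKE wildcards so the typed letters are matched literally.
    escaped = (prefix.replace("\\", "\\\\")
                     .replace("%", "\\%")
                     .replace("_", "\\_"))
    rows = conn.execute(
        "SELECT keywords, COUNT(*) AS hits FROM searches "
        "WHERE keywords LIKE ? ESCAPE '\\' "
        "GROUP BY keywords ORDER BY hits DESC, keywords ASC LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return [keyword for keyword, _hits in rows]


# Small in-memory demonstration with made-up search records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE searches (keywords TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO searches (keywords) VALUES (?)",
    [("mysql",), ("mail",), ("mysql",), ("mysql backup",),
     ("mail",), ("mail",), ("xml",)],
)
print(suggest_keywords(conn, "m"))  # ['mail', 'mysql', 'mysql backup']
```

On a table with millions of rows, a query like this would probably need an index on the keyword column, or a precomputed table of popular searches, to answer quickly enough for every keystroke.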

There are different variants of this plug-in that can query MySQL or any other database supported by the Metabase or PEAR::MDB2 database abstraction layers.

All these plug-in variants are available as part of the forms generation package:

phpclasses.org/formsgeneration


* Google search syntax

What if the user wants to search for some words but exclude pages with other words?

What if the user wants to search for expressions with multiple words in the exact sequence?

Most people are so used to Google that they know the exact syntax that it supports for expressing search requests that meet these and other requirements.

Preceding a search word with the character - excludes search result pages that have that word. Using quotes around two or more words makes the search results include only pages that have those words in the exact sequence.

The PHPClasses internal search engine uses ht://Dig to index and search the site. It is an old but reliable Open Source search engine.

htdig.org/

ht://Dig does not support Google search syntax directly. However, it supports boolean searches, which makes it possible to build complex boolean search expressions.

I have developed a new parser that converts search expressions with Google syntax into ht://Dig boolean search expressions. Only a subset of the Google syntax is supported for now.
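To give an idea of what such a conversion involves, here is a minimal sketch in Python. It is not the parser from the class mentioned in this post. It handles only plain words, quoted phrases, and the `-` exclusion prefix, and the `and` / `not` operator spellings in the output are an assumption that should be checked against the search engine's documentation. The function name `google_to_boolean` is hypothetical.

```python
import re

# A token is an optional leading "-", followed by either a quoted phrase
# or a run of characters containing no whitespace or quotes.
TOKEN = re.compile(r'(-?)(?:"([^"]+)"|([^\s"]+))')


def google_to_boolean(query):
    """Convert a small subset of Google syntax into a boolean expression."""
    required, excluded = [], []
    for minus, phrase, word in TOKEN.findall(query):
        # Keep the quotes around phrases so the word sequence is preserved.
        term = f'"{phrase}"' if phrase else word
        (excluded if minus else required).append(term)
    if not required:
        raise ValueError("at least one non-excluded term is required")
    expression = " and ".join(required)
    for term in excluded:
        # Assumed operator: "not" used in the sense of "and not".
        expression += f" not {term}"
    return expression


print(google_to_boolean('php "form validation" -ajax'))
# php and "form validation" not ajax
```

A hyphen inside a word, as in `e-mail`, is left alone because only a hyphen at the start of a token is treated as an exclusion.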

The Google search syntax converter for ht://Dig is part of a new version of a class that I developed a long time ago to interface with ht://Dig from PHP. The new version of the class will be available in the next few days from here:

phpclasses.org/htdiginterface


* Search the site by section

The PHPClasses site has several distinct sections. If you are searching only for packages using certain keywords, you are probably not interested in pages from the reviews or the blog section. You may or may not want to include the package support forum pages in the search results.

The new site search lets you specify exactly which sections you want to search, using a set of check boxes and radio buttons.

Furthermore, the search results are grouped according to the section that each page belongs to. The results are presented with a tab-based user interface that can be used to navigate between the different search result sections.

Only the site sections with results are displayed in the tabs. Each tab contains the name of the section and the number of occurrences found in that section.
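The grouping described above can be illustrated with a short Python sketch. The section names, the result dictionaries, and the function `build_tabs` are hypothetical and only serve the example. They are not taken from the site's code.

```python
# Hypothetical display order of the site sections.
SECTION_ORDER = ["Packages", "Forums", "Reviews", "Blog"]


def build_tabs(results, selected_sections):
    """Group results by section and return one tab per non-empty section.

    Each tab is a (label, results) pair, where the label shows the section
    name and the number of results found in it.
    """
    grouped = {}
    for result in results:
        section = result["section"]
        if section in selected_sections:
            grouped.setdefault(section, []).append(result)
    # Sections without results are skipped, so they get no tab.
    return [
        (f"{section} ({len(grouped[section])})", grouped[section])
        for section in SECTION_ORDER
        if section in grouped
    ]


results = [
    {"section": "Packages", "title": "Forms generation class"},
    {"section": "Blog", "title": "Search 2.0"},
    {"section": "Packages", "title": "ht://Dig interface"},
    {"section": "Reviews", "title": "A book review"},
]
tabs = build_tabs(results, selected_sections={"Packages", "Blog", "Forums"})
print([label for label, _items in tabs])  # ['Packages (2)', 'Blog (1)']
```

In this run the Forums section was selected but has no matches, and the Reviews section has a match but was not selected, so neither gets a tab.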


* AJAX based results pagination

Each section tab only displays up to 10 results. Sections that return more than 10 results appear with an additional tab row, so you can switch to other result pages in the same section.

The traditional implementations of tab based user interfaces require reloading the pages. That is a slow approach that causes unwanted delays and page flicker. Fortunately we can use AJAX techniques to avoid that problem.

The site is using another plug-in of the forms class, mentioned above, to make an AJAX call to the server and only replace the result page sections that correspond to the tab clicked by the user.
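On the server side, answering such an AJAX request comes down to returning only the slice of results for the requested page of a section. The following Python sketch shows that slicing with 10 results per page. The function `paginate` and the shape of its return value are assumptions made for this example.

```python
import math

PAGE_SIZE = 10  # each section tab displays up to 10 results


def paginate(results, page, page_size=PAGE_SIZE):
    """Return the results for one page plus the total number of pages.

    Page numbers start at 1. Out-of-range page numbers are clamped to the
    nearest valid page.
    """
    total_pages = max(1, math.ceil(len(results) / page_size))
    page = min(max(page, 1), total_pages)
    start = (page - 1) * page_size
    return {
        "page": page,
        "total_pages": total_pages,
        "items": results[start:start + page_size],
    }


section_results = [f"result {n}" for n in range(1, 24)]  # 23 made-up results
last = paginate(section_results, page=3)
print(last["total_pages"], last["items"])
# 3 ['result 21', 'result 22', 'result 23']
```

Only the returned items would need to be rendered and sent back to the browser, which then replaces the contents of the tab that was clicked.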


* Special visual effects

All these search page user interface improvements are great, but I thought it would be nice to go further. So I decided to give it an extra touch of style and employ a few special visual effects.

As you may be aware, newer browser versions support a page element style property named opacity. It is a property that can be used to make page elements transparent, translucent or opaque. You can change the value of this property gradually from transparent to opaque, or vice-versa, to achieve nice fade-in or fade-out effects.
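The arithmetic behind a fade is a simple linear interpolation of the opacity value over a number of animation frames. Here is a small Python sketch of that calculation. In the browser the values would be applied to the element style by a script on a timer. The function `fade_steps` is a hypothetical name, and linear interpolation is an assumption, since other easing curves are possible.

```python
def fade_steps(start, end, frames):
    """Return the opacity value for each frame of a linear fade.

    Opacity ranges from 0.0 (fully transparent) to 1.0 (fully opaque).
    The list has frames + 1 values, including both end points.
    """
    if frames < 1:
        raise ValueError("frames must be at least 1")
    return [round(start + (end - start) * i / frames, 3)
            for i in range(frames + 1)]


print(fade_steps(0.0, 1.0, 4))  # fade-in:  [0.0, 0.25, 0.5, 0.75, 1.0]
print(fade_steps(1.0, 0.0, 4))  # fade-out: [1.0, 0.75, 0.5, 0.25, 0.0]
```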

This is not really an important detail, but the site search pages are using these effects to show and hide the progress feedback text and switch the search result pages when you click on the pagination tabs.

Right now, these effects are being tested and may be changed or removed later. There are still a few minor glitches to be fixed, but I hope you agree with me that it gives the search pages a nice touch.

These effects are also achieved with a new plug-in of the forms class mentioned above. Besides the fade effects, it implements other visual effects that are not yet in use on the site. The effects are achieved with a new JavaScript class that I developed for animating HTML pages.

Since this is a recent development, the new plug-in will only be published in a few days when I finish its documentation.


* Try it now here

As you may imagine, the new site search engine enhancements required a great development effort.

As I mentioned in past posts, access to the internal search engine is one of the services that will be offered exclusively to users with paid subscriptions.

phpclasses.org/blog/post/47-Planned ...

These services will finally be available in a couple of months. Until then you can try the new internal site search, free of charge and without any restrictions, here:

phpclasses.org/search.html


As usual it would be nice if you could provide your comments about these new search engine features or other features that you would like to suggest. Just follow the comment posting links.




