Made of Everything You're Not

Because there's too much info for my brain.
  • Home
  • Projects
  • Portfolio
  • Resume
« Living in Two Worlds
WP-Click-Track 0.6: Going Native »

FULL TEXT Searching In MySQL

I’ve been wanting to add a search feature to wp-click-track for a while now. After almost a year of writing 150 posts I have a lot of links; so many in fact that it’s getting a little tough finding the links I want when I want them. I’ve written a few search engines in my time (some actually pretty good and some that just plain sucked) and thought this would be a simple addition to the program.

Boy, was I right.

php & MySQL Full Text SearchingText

php & MySQL Full Text SearchingText

Real quick; I’m going to be referencing a MySQL table below. Here’s the SQL to create that table:

CREATE TABLE tracking_links (
`link_id` int(10) NOT NULL auto_increment,
`link_title` varchar(255) NOT NULL default '',
`link_desc` text NOT NULL,
`link_destination` varchar(255) NOT NULL default '',
`link_total_clicks` int(10) NOT NULL default '0',
`link_unique_clicks` int(10) NOT NULL default '0',
`creation_date` datetime NOT NULL default '0000-00-00 00:00:00',
`last_modified` datetime NOT NULL default '0000-00-00 00:00:00',
PRIMARY KEY  (`link_id`)
)

The traditional, no frills, way easy path, would be to build a basic “LIKE” query and attach it to a single text field. The queries would then look like this:

SELECT * FROM tracking_links WHERE link_title LIKE '%$_search%'

If you don’t already know; the above query would match all the links with a title that has the string $_search anywhere within the column.The percent character (%) is a wild card; sandwiching $_search like it is basically says “Grab me everything that has this string anywhere within the column”.

The above is the lowest of the low when it comes to elegance of design and sheer awesomeness though. It’s pretty much the amateur version of a search mechanism.

For one thing performance is going to be an issue when dealing with a lot of rows in the table. The speed on a query like the above, even with an index on the “link_title” column, is going to crumble onceĀ  you get more than a couple hundred thousand rows because the MySQL has to do a full table scan.

Another issue is that relevancy is completely absent. There’s no way to remove elements or, really, do any boolean matching. You’re going to get results that just seem… off.

If I want to search against multiple columns the SQL just gets horrifying:

SELECT * FROM tracking_links WHERE link_title LIKE '%$_search1%' OR link_destination LIKE '%$_search2% OR...'

And then there’s the accompanying form for a query like the above. Traditionally, it would be x form fields for each column; in the case of wp-click-track that would be 3 individual fields. This would pretty much destroy the plugin interface or, worse, require a separate, dedicated, search page.

The answer to those problems is full text search. According to Wikipedia :

In a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user.

This is a difficult concept to grasp, especially for someone coming from the LIKE camp; I know it took me a little bit to get it fully. Here are a couple notes that should help:

  1. String to search must be longer than 3 characters long.
  2. DB Type must be MyISAM
  3. Don’t try for individual search fields
    ex: SELECT * FROM table WHERE col1 = ‘$col1′ AND col2 = ‘$col2′
    instead
    Have all fields setup as a single Fulltext field.
    ex: SELECT table.* , MATCH (col1, col2, col3) AGAINST (‘+( $text1) -($text2)’) AS Score FROM table WHERE MATCH (col1, col2, col3) AGAINST (‘+( $text1) -($text2)’ IN BOOLEAN MODE)
  4. IN BOOLEAN MODE is required for queries with the ‘-’ attribute.
  5. Algorithm for relevance is:
    w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)

Using fulltext searching allows your site users to do Google style searching like so:

+php -mysql +sqlite +temp*

The above search query breaks down to “Grab everything with “php”, without “mysql” but with “sqlite” and any word that begins with “temp”. Pretty snazzy and useful if your users are savvy in how to search effectively. Hell, even without knowing how to perform boolean searches the above is still useful and makes your code way cleaner.

In order to implement fulltext searching the table has to be setup with the MyISAM engine and a fulltext index has to be defined. To change a table to use the MyISAM engine just run the below query:

ALTER TABLE tracking_links ENGINE = MYISAM

To add a fulltext index you create it like so:

ALTER TABLE tracking_links ADD FULLTEXT search_index (link_title,link_desc,link_destination)

The complete fulltext search query for the above query (+php -mysql +sqlite +temp*) example would look like:

SELECT *
FROM tracking_links
MATCH(link_title,link_desc,link_destination) AGAINST('+php -mysql +sqlite +temp*' IN BOOLEAN MODE);

There’s one other feature worth mentioning; the Score option. The Score

SELECT *,
         MATCH(link_title,link_desc,link_destination) AGAINST('+php -mysql +sqlite +temp*' IN BOOLEAN MODE) AS SCORE
FROM tracking_links
         MATCH(link_title,link_desc,link_destination) AGAINST('+php -mysql +sqlite +temp*' IN BOOLEAN MODE);

The above query will return all the rows as well as the relevance of the column to the dataset. On initial inspection the number won’t make much sense; it’s usually something like 0.9843949302 and 2.9408509384 which is, admittedly, pretty useless by itself. But, using a simple formula you can turn it into something pretty relevant:

Using the above code against a SCORE of .9823475 results in a $score of 98.2% which is a simple way to let your users know the result is relevant.

As the above, hopefully, demonstrates FULL TEXT searching with MySQL is easy and adds an extra layer of awesome and fun to an application.

Bookmark and Share

Related Posts

WP-Click-Track 0.7.2 Released
ExpressionEngine White Screen Fix
Portability Is A Good Goal
Create Expression Engine Plugin
hResume hKit Profile

Tags: fulltext, mysql, php

This entry was written by Eric Lamb and posted on Monday, August 17th, 2009 at 5:00 am and is filed under Code, Programming. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

5 Comments

  1. 9neo says:
    November 6, 2009 at 4:49 am

    Nice thank for sharing.

    Reply
  2. Brandon Burkett says:
    January 7, 2010 at 7:15 pm

    I typically choose referential integrity over the ability to full text search by opting for InnoDB. Would be nice if InnoDB supported some method of full text searching like MyISAM tables.

    Reply
  3. uberVU - social comments says:
    January 8, 2010 at 6:20 am

    Social comments and analytics for this post…

    This post was mentioned on Reddit by lukemcr: I will use this posthaste. Awesome, thank you….

    Reply
  4. _Z says:
    January 21, 2010 at 7:01 am

    Very nice to see this clearly explained. I have been using amateur search and am glad to learn another way!

    Reply
  5. AL says:
    July 4, 2010 at 11:13 pm

    Is it possible to have full text search perform a search exactly like using like “% %”.

    Reply

Leave a Reply

Click here to cancel reply.

  • Subscribe: Entries | Comments
  • About Me

    Email Email
    Twitter Twitter
    310.739.3322
  • Categories

    • Brain Dump
    • Business
    • Code
    • IT
    • Programming
    • Rant
    • Servers
  • Archives

    • August 2010
    • July 2010
    • June 2010
    • May 2010
    • April 2010
    • March 2010
    • February 2010
    • January 2010
    • December 2009
    • November 2009
    • October 2009
    • September 2009
    • August 2009
    • July 2009
    • June 2009
    • May 2009
    • April 2009
    • March 2009
    • February 2009
    • January 2009
    • December 2008
    • November 2008
    • October 2008

Copyright © 2008 - 2010 Eric Lamb - All rights reserved