Add blog search to Ghost with PHP

By Michael Argentini
Managing Partner, Technology and Design

We recently replaced our blog publishing technology with the new Ghost blog platform. They've recently acquired Roon, so it's going to be very interesting in the months ahead. But one of the drawbacks of using an edge technology is that it's usually lacking a feature you need. In our case, it was lacking a few, but the biggest was search.

As we searched around to see if anyone had addressed this, we discovered one tool on Github (also mentioned on the Ghost forums) named GhostHunter. The author, Jamal Neufeld, took a creative approach to solving the search gap by using Javascript to hit the Ghost RSS feeds. Very clever. But as we implemented it on the development server, we saw the limitations of this approach.

  1. RSS feeds in Ghost are finite; you have to modify Ghost to show more entries than the default set.
  2. The user experience was poor; the search had to be modified to maintain state across pages, and browser history was mostly unusable.
  3. Styling the results was a bit of a chore.
  4. The results were not based on, or prioritized by, relevance. It was a simple text match (keyword) search.

For a super simple implementation (and for non-developers) it may be a good choice. But it wasn't for us. So for our purposes, we decided to write a search.

Our primary site is built with PHP. Ghost is running on its own subdomain, and uses a SQLite database. Two sites on the same server. It turns out that the version of SQLite Ghost uses also has the FTS4 extension compiled into the binary.

Score.

So the answer was clear: build a search page in PHP which taps into the SQLite database and uses FTS4.

We use a Windows Server with IIS 7, but the process should be similar in a LAMP configuration.

Step 1: Configure PHP

I had to modify the php.ini file to make sure that the SQLite library was installed. To do this, I had to make sure that the following was uncommented.

[PHP_SQLITE3]
extension=php_sqlite3.dll

Then I had to make sure that the binary existed in the PHP root path, under the "ext" folder. We run PHP 5.5, so the path to our binary is:

[PHP root]/v5.5/ext/php_sqlite3.dll

When I echo phpinfo(); the sqlite3 library version we're using is 3.8.3.1. FTS4 has been available since version 3.5.0, so I was good to go.

Step 2: Make Sure the Site Can Access the Database File

Your PHP code will have to be able to access the database file. We run our site and blog in distinct directories under a root, like this:

/web root/www/
/web root/blog/

In PHP, $_SERVER['DOCUMENT_ROOT'] returns the local drive path to the site root. So by performing a simple string replace I was able to get the local drive root of the blog like this:

str_replace("www", "blog", $_SERVER['DOCUMENT_ROOT'])

So as long as our site is running under a user context that has access to the blog folder, we can get the path to the database file itself like this:

str_replace("www", "blog", $_SERVER['DOCUMENT_ROOT']) . '\content\data\ghost.db'

Step 3: Building the Search Assets

SQLite with FTS4 requires a virtual table for searching. And with a CMS, that means rebulding it whenever there's a change to the post data. When you create a new post, the search should know this and update its search data. So the first step is to create a strategy to handle this process.

My strategy was to add a new table to the Ghost database to store a single "last update date" value. This table is created if it doesn't exist. But essentially this is checked on each page request. If the last update date is older than the last update date of any post, we know that we need to update the virtual search table. No scheduled task is needed to perform this check, and it only executes when it is necessary. I like simple. This approach should work through version updates to Ghost, provided the key columns and tables don't change significantly in structure.

I began by getting a search query from the URL, sanitizing it, and creating a query string that can be used for FTS4 search.

The code below looks for a URL parameter "s" and processes it for full-text search. The search query is converted into a string array, special characters trigger wrapping in quotation marks, and then the array is imploded and assigned to a SQL query clause.

$search = "";
$query = "1 = 0"; // Force no results for empty search
$itemnumber = 1;

if (isset($_GET["s"]))
{
    $query = "";
    $first = true;
    $search = urldecode($_GET["s"]);
    $search = str_replace("'", "", $search);
    $search = str_replace("\n", "", $search);
    $terms = explode(" ", $search);

    for ($x = 0; $x < count($terms); $x++)
    {
        if (strpos($terms[$x], '#') !== FALSE || strpos($terms[$x], '-') !== FALSE || strpos($terms[$x], '&') !== FALSE)
        {
            $terms[$x] = "\"" . $terms[$x] . "\"";
        }
    }

    $search_processed = implode(" ", $terms);
    $query = $query . "(ftc MATCH '" . $search_processed . "')";
}

Next, I needed to handle the search data update process. Below is a code block that essentially verifies the existence of the last update date table, and creates one if necessary.

// Default to rebuilding the search data
$rebuildsearchdata = "1";

$db = new SQLite3(str_replace("www", "blog", $_SERVER['DOCUMENT_ROOT']) . '\content\data\ghost.db');
$xresult = $db->query("PRAGMA auto_vacuum = NONE;");

///// Create the update check table if it doesn't exist.
$xresult = $db->query("SELECT name FROM sqlite_master WHERE type='table' AND name='searchdata_update_check'");
$xrows = array();
$xrows = $xresult->fetchArray();

if ($xrows[0] != "searchdata_update_check")
{
    $xresult = $db->query("CREATE TABLE searchdata_update_check (last_update TEXT);");
    $db->exec("INSERT INTO searchdata_update_check (last_update) VALUES ('0');");
}

With that completed, I needed to query the last update date to see if I need to rebuild the search data (the virtual table). I also check for the existence of the search data table and create it if necessary. When rebuilding the search data, it's done with a simple INSERT statement. Cake.

///// Read last update value from update check table
$xresult = $db->query("SELECT last_update FROM searchdata_update_check;");
$xrows = array();
$xrows = $xresult->fetchArray();

///// Look for posts that have been updated after last update check date
$yresult = $db->query("SELECT updated_at FROM posts ORDER BY updated_at DESC LIMIT 1;");
$yrows = array();
$yrows = $yresult->fetchArray();

if (isset($xrows) && isset($yrows))
{
    if ($xrows[0] == $yrows[0])
    {
        // If newer updates do not exist, no need to update
        $rebuildsearchdata = "0";
    }
}

///// Make sure searchdata virtual table exists...
$zresult = $db->query("SELECT name FROM sqlite_master WHERE type='table' AND name='searchdata'");
$zrows = array();
$zrows = $zresult->fetchArray();

if ($zrows[0] != "searchdata")
{
    // If searchdata table doesn't exist, force a rebuild
    $rebuildsearchdata = "1";
}

///// Rebuild search data if necessary
if ($rebuildsearchdata == "1")
{
    $db->exec("UPDATE searchdata_update_check SET last_update = '" . $yrows[0] . "';");

    $xresult = $db->query("SELECT name FROM sqlite_master WHERE type='table' AND name='searchdata'");
    $xrows = array();
    $xrows = $xresult->fetchArray();

    if ($xrows[0] != "searchdata")
    {
        // Essentially create a copy of the important post columns, with a new "ftc" column. We'll fill that with an aggregate of title and post content for searching against because FTS can only MATCH against one column. We use the other columns for rendering out the results list.
        $xresult = $db->query("CREATE VIRTUAL TABLE searchdata USING fts4(title, meta_description, slug, author_id, published_at, status, markdown, ftc);");
    }

    $xresult = $db->query("DELETE FROM searchdata;");

    // Create the search data with aggregate search column: CAKE.
    $xresult = $db->query("INSERT INTO searchdata (title, meta_description, slug, author_id, published_at, status, markdown, ftc) SELECT title, meta_description, slug, author_id, published_at, status, markdown, title || ' ' || markdown AS ftc FROM posts;");
}

Step 4: Perform the Search

At this stage, I need to perform the actual search. Two things remain:

  1. I need to create a PHP function that will determine the relevance based on a score (a weight), and tie this into SQLite. This will give me a score column in the results, so I can order by relevance.
  2. I need to perform the actual search query and output the results.

There are all kinds of relevance algorithms out there (like the Okapi BM25 ranking algorithm) that you can build into a custom SQLite binary (egads!), and to write one from scratch is a full-time job (just ask the people at Google). I was mostly interested in a frequency weighting, so I found this Github Gist that takes the count of the various keywords and phrases into account when it ranks the results. It's fast, and makes the search results really helpful, as opposed to a simple keyword match and ordering based on date or something else irrelevant. I'm sure we'll look into more advanced relevance algorithms over time, but this is a great start.

Below is the PHP function to place in the page, which I will reference later when I assign it to the SQLite instance.

function sql_rank($aMatchInfo)
{
    $iSize = 4;
    $iPhrase = (int) 0;   // Current phrase
    $score = (double)0.0; // Value to return

    /* Check that the number of arguments passed to this function is correct.
    ** If not, jump to wrong_number_args. Set aMatchinfo to point to the array
    ** of unsigned integer values returned by FTS function matchinfo. Set
    ** nPhrase to contain the number of reportable phrases in the users full-text
    ** query, and nCol to the number of columns in the table.
    */
    $aMatchInfo = (string) func_get_arg(0);
    $nPhrase = ord(substr($aMatchInfo, 0, $iSize));
    $nCol = ord(substr($aMatchInfo, $iSize, $iSize));

    if (func_num_args() > (1 + $nCol))
    {
        throw new Exception("Invalid number of arguments : ".$nCol);
    }

    // Iterate through each phrase in the users query.
    for ($iPhrase = 0; $iPhrase < $nPhrase; $iPhrase++)
    {
        $iCol = (int) 0; // Current column

        /* Now iterate through each column in the users query. For each column,
        ** increment the relevancy score by:
        **
        **   (<hit count> / <global hit count>) * <column weight>
        **
        ** aPhraseinfo[] points to the start of the data for phrase iPhrase. So
        ** the hit count and global hit counts for each column are found in
        ** aPhraseinfo[iCol*3] and aPhraseinfo[iCol*3+1], respectively.
        */
        $aPhraseinfo = substr($aMatchInfo, (2 + $iPhrase * $nCol * 3) * $iSize);

        for ($iCol = 0; $iCol < $nCol; $iCol++)
        {
            $nHitCount = ord(substr($aPhraseinfo, 3 * $iCol * $iSize, $iSize));
            $nGlobalHitCount = ord(substr($aPhraseinfo, (3 * $iCol + 1) * $iSize, $iSize));
            $weight = ($iCol < func_num_args() - 1) ? (double) func_get_arg($iCol + 1) : 0;

            if ($nHitCount > 0 && $nGlobalHitCount > 0)
            {
                $score += ((double)$nHitCount / (double)$nGlobalHitCount) * $weight;
            }
        }
    }

    return $score;
}

The actual search is really simple. I first added the ranking function to SQLite, then I ran a query and looped through the results. I constrain the search results to 99 to keep things manageable, since our search does not use pagination.

///// Search the virtual searchdata table...

$db->createFunction('rank', 'sql_rank');

$result = $db->query("SELECT title, meta_description, slug, author_id, published_at, rank(matchinfo(searchdata, 'als'), 0, 1.0, 0.5) AS score FROM searchdata WHERE status='published' AND " . $query . " ORDER BY score DESC LIMIT 99");

$rows = array();
$foundrows = false;

while ($rows = $result->fetchArray())
{
    $foundrows = true;
    $author_result = $db->query("SELECT name, slug FROM users WHERE id=" . $rows[3] . " LIMIT 1");
    $author_rows = array();
    $author_rows = $author_result->fetchArray();

At this point I needed to fill in the HTML code with the row data. So I implemented something like this:

<div class="search-slug">
    <div class="postheader_fynydd_blog">
        <h1 class="posttitle_fynydd_blog" itemprop="headline"><a href="//blog.fynydd.com/<?php echo $rows[2] ?>/" rel="bookmark"><?php echo $rows[0] ?></a></h1>
        <div class="article_details">
            By <span itemprop="author"><a href="//blog.fynydd.com/author/<?php echo $author_rows[1]; ?>/"><?php echo $author_rows[0]; ?></a></span> &bull; <time class="date_fynydd_blog" datetime="<?php echo date("Y-m-d", floatval($rows[4])/1000); ?>" itemprop="datePublished"><?php echo date("F j, Y", floatval($rows[4])/1000); ?></time>
        </div>
    </div>
    <div class="postcontent_fynydd_blog" itemprop="articleBody">
        <?php echo $rows[1] ?> <span class="hide">(Score: <?php echo round($rows[5], 2) ?>)</span> <a href="//blog.fynydd.com/<?php echo $rows[2] ?>/" rel="bookmark"><nobr>Keep reading &rarr;</nobr></a>
    </div>
</div>

And to close out the loop, I closed the database object and checked for no returned results.

}

$db->close();

if ($foundrows == false)
{
    echo "<h1>No results found.</h1>";
}   

Wrapping it Up

The last remaining piece was to create a search box and button in the Ghost theme. It's wrapped in a form tag that performs a GET to the search page I created. And likewise, I copied the search form and placed it on the search page itself as well. Here's an example.

<div class="blog_aside_search">
Blog Search
    <form action="/search/" method="get">
        <input id="search-field" name="s" placeholder="words to find..." value="<?php
            echo $search;
        ?>" type="text"/>
        <input type="submit" value="search">
    </form>
</div>

This works really well. It's fast, efficient, and is tolerant of Ghost platform upgrades. So give it a go, and use the contact form at the bottom of the page to let me know what you think.

Article last updated on 4/21/2018

Article tags

Software development web