Welcome, I'm Eric Hiller

Topical news and Information collected across the years from the network, telecom, software, and general science and technology fields.

Knowledge Base and Blog for Eric Hiller
 

Setting up Search on a Static Website

Creating a static website has a number of advantages, not least of which is speed. However, it can be difficult to provide ease-of-use features for visitors. One feature I wanted for this site was on-site search, as my intention is to gradually convert my local documentation and upload it. Because the server was to be entirely static, I had to find a client-side solution: enter lunr.js. lunr.js is written in JavaScript. In this setup, an indexer runs on Node on your local machine and walks your site files to generate a JSON index file; the lunr.js search library, called from your web frontend (and thus running in the visitor's browser), then reads and searches that index. This allows your site to stay static and still offer the dynamic feature of search. Thanks to these clever folks, implementing search was a snap for my static, Hugo-powered website.

Here I have put together a gist on how to get up and running:

Gist Description

How to implement a custom search for Hugo using Gruntjs and Lunrjs.
Updated from the original to allow for changed page URLs and to give more detailed install instructions.
See More at www.hiller.pro
Thanks to sebz for the initial writeup!

Requirements

Requires Nodejs

Setup

Project organization

Here is my Hugo based website project structure

	MySite/
		|- src/ <= Hugo source (private) root folder
			|- content <= .md content folders & files
			|- layout/
			|- static/
				|- js/
					|- pindex.json <= Where we generate the lunr json index file
					|- lunrjs.min.js <= lunrjs library
					|- ...
			|- ...
			|- config.toml
		|- indexer/ <= lunr and associated node files
			|- node_modules/ <= node dependencies
			|- Gruntfile.js <= Where the magic happens (see below)
			|- package.json <= Dependencies declaration required to build the index
		|- ...

Install the Nodejs dependencies

  1. Navigate to the folder you wish to install the indexer into (MySite/indexer in the above example) and issue the commands:

    npm init -y
    npm --global install grunt-cli
    npm install --save-dev grunt string toml

  2. Create the Gruntfile.js (see below)

Note: I modified the Gruntfile.js to allow for changed URLs.

Time to work

The principle

We will work at both build time and run time. With Gruntjs (build time), we'll generate a JSON index file, and with a small JavaScript snippet (run time), we'll initialize and use lunrjs.

Build the Lunr index file

Lunrjs allows you to define fields to describe your pages (documents in lunrjs terms) that will be used to search and hopefully find stuff. The index file is basically a JSON file corresponding to an array of all the documents (pages) composing the website.

Here are the fields I chose to describe my pages:

  • title => Frontmatter title or file name
  • tags => Frontmatter tags or nothing
  • content => File content
  • href => Reworked file path used as absolute URL (also used as the lunr ref)

The href can be drawn either from the file's position within the content directory or from the url field in the frontmatter.
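
As a rough illustration of those two sources for the page link, here is a minimal sketch in plain JavaScript, without the string library. The helper name deriveHref and its parameters are hypothetical, and it assumes paths use forward slashes and are already relative to the content folder.

```javascript
// Hypothetical helper: derive a page link from its front matter or its path.
// relPath is assumed to be relative to the content folder, e.g. "/section/page1.md".
function deriveHref(relPath, frontMatter) {
    // An explicit url in the front matter takes precedence
    if (frontMatter && frontMatter.url) {
        return frontMatter.url;
    }
    // index.md pages resolve to their folder
    if (/(^|\/)index\.md$/.test(relPath)) {
        return relPath.slice(0, -"index.md".length);
    }
    // Otherwise just drop the .md extension
    return relPath.replace(/\.md$/, "");
}

console.log(deriveHref("/section/page1.md", {}));                  // "/section/page1"
console.log(deriveHref("/section/index.md", {}));                  // "/section/"
console.log(deriveHref("/section/page2.md", { url: "/custom/" })); // "/custom/"
```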

Workflow

  1. Recursively walk through all files of the content folder
  2. Two possibilities
    1. Markdown file
      1. Parse the Frontmatter to extract the title and the tags
      2. Parse and clean the content
    2. HTML file
      1. Parse and clean the content
      2. Use the file name as title
  3. Use the file path as the href (link to the page)
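
The Markdown branch of this workflow can be sketched roughly as follows, in plain JavaScript and without Grunt or the string and toml packages. The names splitFrontMatter and cleanContent are hypothetical, the sketch assumes the front matter is delimited by lines containing only +++, and the cleaning is only an ASCII-oriented approximation of stripping tags and punctuation.

```javascript
// Hypothetical helpers illustrating the Markdown branch of the workflow.

// Split a file into its raw TOML front matter and its body.
// Assumes the front matter is delimited by lines containing only "+++".
function splitFrontMatter(text) {
    var match = /^\+\+\+\r?\n([\s\S]*?)\r?\n\+\+\+\r?\n?([\s\S]*)$/.exec(text);
    if (!match) {
        // No front matter found: treat the whole file as body
        return { frontMatter: "", body: text };
    }
    return { frontMatter: match[1], body: match[2] };
}

// Rough stand-in for the cleaning step: drop HTML tags and punctuation,
// then collapse runs of whitespace. \w is ASCII-only in JavaScript.
function cleanContent(body) {
    return body
        .replace(/<[^>]*>/g, " ")
        .replace(/[^\w\s]|_/g, "")
        .replace(/\s+/g, " ")
        .trim();
}

var sample = '+++\ntitle = "Page1"\n+++\n# Hello, <b>world</b>!\n';
var parts = splitFrontMatter(sample);
console.log(parts.frontMatter);        // title = "Page1"
console.log(cleanContent(parts.body)); // Hello world
```

The raw front matter string would still need to go through a TOML parser to obtain the title and tags.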

Show me the code!

Here is the Gruntfile.js file:


var toml = require("toml");
var S = require("string");

var CONTENT_PATH_PREFIX = "../src/content";
var SITE_IDX_DEST = "../src/static/js/pindex.json";

module.exports = function(grunt) {

    grunt.registerTask("lunr-index", function() {

        grunt.log.writeln("Build pages index");

        var indexPages = function() {
            var pagesIndex = [];
            grunt.file.recurse(CONTENT_PATH_PREFIX, function(abspath, rootdir, subdir, filename) {
                grunt.verbose.writeln("Parse file:",abspath);
                pagesIndex.push(processFile(abspath, filename));
            });

            return pagesIndex;
        };

        var processFile = function(abspath, filename) {
            var pageIndex;

            if (S(filename).endsWith(".html")) {
                pageIndex = processHTMLFile(abspath, filename);
            } else {
                pageIndex = processMDFile(abspath, filename);
            }

            return pageIndex;
        };

        var processHTMLFile = function(abspath, filename) {
            var content = grunt.file.read(abspath);
            var pageName = S(filename).chompRight(".html").s;
            var href = S(abspath)
                .chompLeft(CONTENT_PATH_PREFIX).s;
            return {
                title: pageName,
                href: href,
                content: S(content).trim().stripTags().stripPunctuation().s
            };
        };

        var processMDFile = function(abspath, filename) {
            var content = grunt.file.read(abspath);
            grunt.log.ok("READING FILE:" + abspath);
            var pageIndex;
            // First separate the Front Matter from the content and parse it
            content = content.split("+++");
            // Default to an empty object so that a missing or invalid
            // Front Matter does not crash the task further down
            var frontMatter = {};
            try {
                frontMatter = toml.parse(content[1].trim());
            } catch (e) {
                grunt.log.error("ERROR WHILST PROCESSING: " + abspath + ": " + e.message);
            }

            var href;
            if (frontMatter.url) {
                // An explicit url in the Front Matter takes precedence
                href = frontMatter.url;
            } else if (filename === "index.md") {
                // href for index.md files stops at the folder name
                href = S(abspath).chompLeft(CONTENT_PATH_PREFIX).chompRight(filename).s;
            } else {
                href = S(abspath).chompLeft(CONTENT_PATH_PREFIX).chompRight(".md").s;
            }

            // Build Lunr index for this page
            pageIndex = {
                // Fall back to the file name when the Front Matter has no title
                title: frontMatter.title || S(filename).chompRight(".md").s,
                tags: frontMatter.tags,
                href: href,
                // Fall back to the whole file when there is no Front Matter
                content: S(content[2] || content[0]).trim().stripTags().stripPunctuation().s
            };

            return pageIndex;
        };

        grunt.file.write(SITE_IDX_DEST, JSON.stringify(indexPages()));
        grunt.log.ok("Index built");
    });
};

An example index file looks like this:

[{
    "title": "Page1",
    "href": "/section/page1",
    "content": " This is the cleaned content of 'site/content/section/page1.md' "
}, {
    "title": "Page2",
    "tags": ["tag1", "tag2", "tag3"],
    "href": "/section/page2",
    "content": " This is the cleaned content of 'site/content/section/page2.md' "
}, {
    "title": "Page3",
    "href": "/section/page3",
    "content": " This is the cleaned content of 'site/content/section/page3.md' "
}]
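
Since the href doubles as the identifier for each search result, it can be worth checking the generated index before publishing it. The following is only a sketch of such a check; checkIndex is a hypothetical helper, not part of lunr or Grunt, and the rules it applies (a non-empty, unique href and a title for every entry) are my own assumptions about what a healthy index looks like.

```javascript
// Hypothetical sanity check for the generated index (not part of lunr or Grunt).
// Returns a list of human-readable problems; an empty list means none were found.
function checkIndex(pages) {
    var problems = [];
    var seen = Object.create(null);
    pages.forEach(function(page, i) {
        if (!page || typeof page.href !== "string" || page.href === "") {
            problems.push("entry " + i + " has no href");
            return;
        }
        if (seen[page.href]) {
            problems.push("duplicate href: " + page.href);
        }
        seen[page.href] = true;
        if (!page.title) {
            problems.push("no title for " + page.href);
        }
    });
    return problems;
}

var samplePages = [
    { title: "Page1", href: "/section/page1", content: "..." },
    { title: "Page2", href: "/section/page1", content: "..." },
    { href: "/section/page3", content: "..." }
];
// Logs two problems: the duplicate href and the missing title
console.log(checkIndex(samplePages));
```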

Launch the task with grunt lunr-index from the indexer folder, or run it from any directory with grunt --gruntfile the/remote/directory/indexer/Gruntfile.js lunr-index

Use the index

On the client side, here is a small usage example:

<!DOCTYPE html>
<html>

<head>
    <title>Hugo + Lunrjs = &lt;3 search </title>
</head>

<body>
    Search:
    <input id="search" type="text">
    <br> Results:
    <ul id="results">
    </ul>
    <script type="text/javascript" src="https://code.jquery.com/jquery-2.1.3.min.js"></script>
    <script type="text/javascript" src="js/lunrjs.min.js"></script>
    <script type="text/javascript">
    var lunrIndex,
        $results,
        pagesIndex;

    // Initialize lunrjs using our generated index file
    function initLunr() {
        // First retrieve the index file
        $.getJSON("js/pindex.json")
            .done(function(index) {
                pagesIndex = index;
                console.log("index:", pagesIndex);

                // Set up lunrjs by declaring the fields we use
                // Also provide their boost level for the ranking
                lunrIndex = lunr(function() {
                    this.field("title", {
                        boost: 10
                    });
                    this.field("tags", {
                        boost: 5
                    });
                    this.field("content");

                    // ref is the result item identifier (I chose the page URL)
                    this.ref("href");
                });

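                // Feed lunr with each file and let lunr actually index them
                // (this is the lunr 0.x/1.x API; lunr 2.x expects documents to
                // be added inside the lunr() callback instead)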
                pagesIndex.forEach(function(page) {
                    lunrIndex.add(page);
                });
            })
            .fail(function(jqxhr, textStatus, error) {
                var err = textStatus + ", " + error;
                console.error("Error getting Hugo index file:", err);
            });
    }

    // Nothing crazy here, just hook up a listener on the input field
    function initUI() {
        $results = $("#results");
        $("#search").keyup(function() {
            $results.empty();

            // Only trigger a search when 2 chars. at least have been provided
            var query = $(this).val();
            if (query.length < 2) {
                return;
            }

            var results = search(query);

            renderResults(results);
        });
    }

    /**
     * Trigger a search in lunr and transform the result
     *
     * @param  {String} query
     * @return {Array}  results
     */
    function search(query) {
        // The index is loaded asynchronously, so it may not be ready yet
        if (!lunrIndex) {
            return [];
        }
        // Find the item in our index corresponding to the lunr one to have more info
        // Lunr result: 
        //  {ref: "/section/page1", score: 0.2725657778206127}
        // Our result:
        //  {title:"Page1", href:"/section/page1", ...}
        return lunrIndex.search(query).map(function(result) {
            return pagesIndex.filter(function(page) {
                return page.href === result.ref;
            })[0];
        });
    }

    /**
     * Display the 10 first results
     *
     * @param  {Array} results to display
     */
    function renderResults(results) {
        if (!results.length) {
            return;
        }

        // Only show the ten first results
        results.slice(0, 10).forEach(function(result) {
            var $result = $("<li>");
            $result.append($("<a>", {
                href: result.href,
                text: "» " + result.title
            }));
            $results.append($result);
        });
    }

    // Let's get started
    initLunr();

    $(document).ready(function() {
        initUI();
    });
    </script>
</body>

</html>
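
The example above filters the whole pages array once per search result, which is fine for a small site. For a larger index, one possible variation is to build a lookup table keyed by href once and resolve results through it. This is only a sketch: buildPageLookup and resolveResults are hypothetical names, and the result objects are assumed to have the {ref, score} shape shown in the comments above.

```javascript
// Hypothetical alternative to filtering the pages array for every result:
// build a lookup table keyed by href once, after the index file is loaded.
function buildPageLookup(pages) {
    var byHref = Object.create(null);
    pages.forEach(function(page) {
        byHref[page.href] = page;
    });
    return byHref;
}

// Map lunr-style results ({ref, score}) back to full page entries,
// skipping any ref that has no matching page.
function resolveResults(lunrResults, byHref) {
    return lunrResults
        .map(function(result) { return byHref[result.ref]; })
        .filter(Boolean);
}

var lookup = buildPageLookup([
    { title: "Page1", href: "/section/page1" },
    { title: "Page2", href: "/section/page2" }
]);
var resolved = resolveResults(
    [{ ref: "/section/page2", score: 0.27 }, { ref: "/missing", score: 0.1 }],
    lookup
);
console.log(resolved.map(function(p) { return p.title; })); // [ 'Page2' ]
```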