MongoDB Full Text Search


When you perform a full text search, you’re searching the text stored in a database for the words in your query. The search then returns the documents or rows that match the search criteria. You often perform a full text search without realizing it, such as when you look for a certain file on your computer, or when you use Google to find out more about something.

This kind of functionality is incredibly useful on both websites and apps. For instance, if you have an online music catalog on your website and you want to let users search songs by title, artist or album name, you can use full text search and create a single text box for all three items, rather than three separate search boxes. A single box that searches all three categories is easier to program, and it’s also easier for your users to understand.

Understanding Full Text Search Terminology

Before you dive into building full text search, it helps to have a basic understanding of the terminology you’ll come across when using MongoDB’s full text search capabilities.
1. Stop Words:

  • These are words that are generally irrelevant and are left out of the search.
    For example: a, an, the, is, at. MongoDB has stop-word lists for the languages it supports. If you don’t specify a language in the search query, MongoDB uses the default language of the text index (English unless you change it) and ignores that language’s stop words. It is also possible to tell MongoDB not to treat ANY words as stop words by specifying the language as “none”. We’ll learn how to do this later.

2. Stemming:

  • Stemming means reducing a word to its stem or “root” form.
    For example: sleeping and sleeps both stem to the word sleep. (Irregular forms such as slept are generally not reduced by rule-based stemmers.)

3. Score:

  • In full text search, each result is assigned a score that indicates how relevant it is to the search criteria. The higher the score, the better the match.
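
To make stop words and stemming more concrete, here is a toy sketch in plain JavaScript. It is not how MongoDB works internally: the stop-word list is a tiny hand-picked sample, and naiveStem is a hypothetical suffix stripper invented for this illustration. It only shows how a title might be reduced to the terms that end up in a text index.

```javascript
// Toy illustration only. MongoDB's real stop-word lists and stemmer are
// language-specific and far more complete than this.
const STOP_WORDS = new Set(["a", "an", "the", "is", "at", "to", "i"]);

// Hypothetical, deliberately naive stemmer: strips "ing" or a trailing "s"
// from words that are long enough. It cannot handle irregular forms.
function naiveStem(word) {
  for (const suffix of ["ing", "s"]) {
    if (word.length > suffix.length + 2 && word.endsWith(suffix)) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Lower-case the text, split on non-alphanumerics, drop stop words, stem the rest.
function toIndexTerms(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0 && !STOP_WORDS.has(w))
    .map(naiveStem);
}

console.log(toIndexTerms("I Gotta Right To Sing The Blues"));
// → [ 'gotta', 'right', 'sing', 'blue' ]
```

In this toy model, a query for “blues” is reduced to “blue” in the same way, which is why it can match a title that only contains “Blue”.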

Conducting a Full Text Search in MongoDB

To use the full text feature in MongoDB, we first need to create a text index on the field (or fields) we want to search. Let’s take a closer look at how to do this, along with some examples:

Let’s use our song catalog website as an example. We’ll start by inserting a few songs, each with a song name, artist and album.

db.songs.insert({ song_name: "I Gotta Right To Sing The Blues", artist_name: "Billie Holiday", album_name: "The Complete Commodore/Decca Masters" })

db.songs.insert({ song_name: "Blue Sky", artist_name: "The Allman Brothers Band", album_name: "Eat a Peach" })

db.songs.insert({ song_name: "Holiday", artist_name: "Madonna", album_name: "Madonna" })

There are three ways to create a text index in MongoDB:

 1. Single Index (This will index only a single field)

 db.songs.createIndex({"song_name":"text"}) 

Now that we have a text index on the song_name field, let’s search the collection with the $text operator. We will also use the $meta operator to include the textScore of each result.

db.songs.find({$text: {$search: "blues"}}, {score: {$meta: "textScore"}})

The above query will return the following results: 

{ "_id" : ObjectId("590f6c21e3a010bd76608863"), "song_name" : "I Gotta Right To Sing The Blues", "artist_name" : "Billie Holiday", "album_name" : "The Complete Commodore/Decca Masters", "score" : 0.625 }

{ "_id" : ObjectId("590f6c58e3a010bd76608864"), "song_name" : "Blue Sky", "artist_name" : "The Allman Brothers Band", "album_name" : "Eat a Peach", "score" : 0.75 }

Note that we searched for the word “blues” but also got a result containing “Blue” (the second result). This happens because of stemming: both words are reduced to the same stem. Note also that the results are not returned in score order unless we ask for it.
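
The query above has two parts: a filter that uses $text and a projection that uses $meta. As a small sketch, the hypothetical helper below (buildTextSearch is a name made up for this example, not a MongoDB function) assembles both parts, plus the sort document used later in this article, as plain JavaScript objects.

```javascript
// Hypothetical helper: returns the documents that find() and sort() expect.
function buildTextSearch(searchString) {
  return {
    // Matches documents whose text-indexed fields contain the search terms.
    filter: { $text: { $search: searchString } },
    // Adds a "score" field holding the relevance score of each match.
    projection: { score: { $meta: "textScore" } },
    // Orders results by relevance, most relevant first.
    sort: { score: { $meta: "textScore" } },
  };
}

const q = buildTextSearch("blues");
console.log(JSON.stringify(q.filter));
// → {"$text":{"$search":"blues"}}

// In the mongo shell this could be used as:
//   db.songs.find(q.filter, q.projection).sort(q.sort)
```

Keeping the three documents together makes it harder to forget the projection or the sort when the same search is issued from several places.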


2. Compound Index (This will index multiple fields)

In MongoDB, a collection can have at most one text index. In most cases you’ll want to search across several fields, so we’ll use a compound text index that covers all of them.

Note: Since MongoDB allows at most one text index per collection, we first need to drop the previously created index; otherwise MongoDB will not let us create the compound index.

db.songs.dropIndex("song_name_text")

db.songs.createIndex({"song_name":"text","artist_name":"text"}) 
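
The name passed to dropIndex above is the default name MongoDB generates when an index is created without an explicit name: each indexed field joined to its index type with an underscore. The sketch below reproduces that convention so you can work out which name to drop later on. defaultIndexName is a hypothetical helper written for this article, and it assumes no custom name option was given at creation time.

```javascript
// Assumption: the index was created without a `name` option, so MongoDB
// derived the name from the key specification.
function defaultIndexName(keySpec) {
  return Object.entries(keySpec)
    .map(([field, type]) => `${field}_${type}`)
    .join("_");
}

console.log(defaultIndexName({ song_name: "text" }));
// → song_name_text
console.log(defaultIndexName({ song_name: "text", artist_name: "text" }));
// → song_name_text_artist_name_text
console.log(defaultIndexName({ "$**": "text" }));
// → $**_text
```

If in doubt, db.songs.getIndexes() lists the indexes that actually exist on the collection, including their names.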

Now that the text index covers two fields, let’s search for the word “holiday” and see what we get back. We will also sort the results by their textScore.

db.songs.find({$text: {$search: "holiday"}}, {score: {$meta: "textScore"}}).sort({score:{$meta:"textScore"}})

The above query will return the following results:

{ "_id" : ObjectId("59103ca0e3a010bd76608865"), "song_name" : "Holiday", "artist_name" : "Madonna", "album_name" : "Madonna", "score" : 1.1 }

{ "_id" : ObjectId("590f6c21e3a010bd76608863"), "song_name" : "I Gotta Right To Sing The Blues", "artist_name" : "Billie Holiday", "album_name" : "The Complete Commodore/Decca Masters", "score" : 0.75 }
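
Why does “Holiday” score 1.1 while “Billie Holiday” scores 0.75? MongoDB does not publish its scoring formula as a stable contract, so the following is only a rough model that happens to reproduce the four scores shown in this article. It assumes the default field weight of 1, a search term that occurs exactly once in a single field, and a term count taken after stop words are removed. approxTextScore is a hypothetical function, and real scores may differ between MongoDB versions.

```javascript
// Rough model, NOT an official formula. Assumptions: field weight 1, the
// term occurs once, and only one indexed field contains it.
function approxTextScore(termsInField, termIsWholeField) {
  // The fewer terms the field contains, the more one matching term counts.
  const coefficient = 0.5 / termsInField + 0.5;
  // Small boost when the field consists of exactly the matched term.
  const exactBonus = termIsWholeField ? 1.1 : 1;
  return coefficient * exactBonus;
}

console.log(approxTextScore(1, true));  // "Holiday"        → 1.1
console.log(approxTextScore(2, false)); // "Billie Holiday" → 0.75
console.log(approxTextScore(2, false)); // "Blue Sky"       → 0.75
console.log(approxTextScore(4, false)); // "I Gotta Right To Sing The Blues" → 0.625
```

The practical takeaway is that a match in a short field tends to rank above the same match in a long one.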

3. Wildcard Index (This will index all the text fields present in the document) 

A wildcard text index is useful when you have a lot of text fields and want to index them all. It also indexes text fields that are added after the index was defined. Let’s see how it works (remember to drop the previously created compound index first):

 db.songs.createIndex({"$**":"text"})

This creates a text index on all of the text fields (to check that it works, repeat the queries above and compare the results).

To demonstrate that fields added afterwards are indexed as well, let’s insert another record into our songs collection with an additional text field, genre.

 db.songs.insert({ song_name: "Stairway To Heaven", artist_name: "Monsters Of Rock", album_name: "Led Zeppelin Tribute EP - Monsters Of Rock", genre: "Hard rock" })

Now if we search for “hard” (the genre of the record above is “Hard rock”), we get that record back, which shows that the genre field was indexed.

Control the Stop words and Stemming: 

If we want MongoDB to apply the stop words and stemming rules of a specific language, we can do so in either of the following two ways:

1. Specify the language in search query 

db.songs.find({$text: {$search: "holiday", $language: 'en'}}, {score: {$meta: "textScore"}})

 2. Specify the language at the time of creating the index

db.songs.createIndex({"$**":"text"}, { default_language: "en" }) 

Which Languages are Supported by MongoDB? 

You can visit (https://docs.mongodb.com/manual/reference/text-search-languages/#text-search-languages) to see which languages MongoDB supports. You can also set the language to “none” if you want MongoDB to skip stop-word removal and stemming entirely.
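
To see what setting the language to “none” changes in practice, here is a toy model in plain JavaScript. It is not MongoDB code: the stop-word list is a small sample and toyStem is a hypothetical stand-in for English stemming that only strips a trailing “s”. With “none”, words are indexed as they are, so stemmed matches disappear and stop words become searchable.

```javascript
// Toy model, not MongoDB code.
const ENGLISH_STOP_WORDS = new Set(["a", "an", "the", "is", "at", "to"]);

// Hypothetical stand-in for English stemming: strips a trailing "s" only.
const toyStem = (w) => (w.length > 3 && w.endsWith("s") ? w.slice(0, -1) : w);

// Turns text into searchable terms for the given language setting.
function terms(text, language) {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  if (language === "none") return words; // no stop words, no stemming
  return words.filter((w) => !ENGLISH_STOP_WORDS.has(w)).map(toyStem);
}

// True when at least one query term is among the terms of the field value.
function matches(query, fieldValue, language) {
  const indexed = new Set(terms(fieldValue, language));
  return terms(query, language).some((t) => indexed.has(t));
}

console.log(matches("blues", "Blue Sky", "en"));   // → true
console.log(matches("blues", "Blue Sky", "none")); // → false
console.log(matches("the", "The Allman Brothers Band", "en"));   // → false
console.log(matches("the", "The Allman Brothers Band", "none")); // → true
```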

Things to remember while querying 

  • By default MongoDB performs a logical OR on the keywords provided in the search criteria.
    For example: {$search: "holiday to heaven"} is treated as “holiday” OR “heaven” (the stop word “to” is ignored).
  • To search for an exact phrase instead, wrap it in escaped double quotes: {$search: "\"holiday to heaven\""}
  • To exclude a term from the results, prefix it with “-” (minus sign): {$search: "holiday -heaven"}
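
Escaping quotes by hand gets error-prone quickly, so it can help to assemble the $search string in code. The sketch below does this with a hypothetical helper, buildSearchString, whose name and options (anyOf, phrases, exclude) are made up for this example. It assumes the inputs contain no double quotes of their own and does not escape them.

```javascript
// Hypothetical helper that assembles a $search string.
// Assumption: terms and phrases do not themselves contain double quotes.
function buildSearchString({ anyOf = [], phrases = [], exclude = [] }) {
  const parts = [
    ...anyOf,                        // plain terms are OR-ed together
    ...phrases.map((p) => `"${p}"`), // double quotes mark an exact phrase
    ...exclude.map((t) => `-${t}`),  // a leading minus excludes a term
  ];
  return parts.join(" ");
}

console.log(buildSearchString({ anyOf: ["holiday", "heaven"] }));
// → holiday heaven
console.log(buildSearchString({ phrases: ["holiday to heaven"] }));
// → "holiday to heaven"
console.log(buildSearchString({ anyOf: ["holiday"], exclude: ["heaven"] }));
// → holiday -heaven
```

The resulting string can be passed as the $search value, for example {$text: {$search: buildSearchString({ anyOf: ["holiday"], exclude: ["heaven"] })}}.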

Pros of MongoDB Full Text Search 

  • Lets you implement full text search in your existing database architecture without an external search engine such as Elasticsearch (https://www.elastic.co/) or Solr (http://lucene.apache.org/solr/).
  • MongoDB text indexes support multiple languages, so you or your users can search in different languages.
  • It is supported in both find and aggregate queries.

Cons of MongoDB Full Text Search

  • You’ll notice a slight overhead when records are inserted, because the text index has to be updated.
  • You can’t search for similar words (synonyms) or for a substring within a word. For instance, to find the artist “Michael Jackson” you need to search for a whole word such as “Michael” or “Jackson”; a fragment like “Mich” won’t match.
  • When used in the aggregation pipeline, the $match stage that includes a $text must be the first stage in the pipeline and the $text operator can only occur once in the stage.
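
As an illustration of the aggregation restriction, the sketch below builds a pipeline with the $text match as its first stage, followed by a sort and a projection on the text score. The pipeline itself uses standard stages; textMatchIsFirstStage is a hypothetical sanity check written for this article, and it only looks for $text at the top level of a $match stage.

```javascript
// A pipeline that could be passed to db.songs.aggregate(pipeline).
const pipeline = [
  { $match: { $text: { $search: "holiday" } } },  // must come first
  { $sort: { score: { $meta: "textScore" } } },   // most relevant first
  { $project: { song_name: 1, artist_name: 1, score: { $meta: "textScore" } } },
];

// Hypothetical check: returns false if a $match stage with a top-level
// $text appears anywhere other than position 0.
function textMatchIsFirstStage(stages) {
  const usesText = (stage) => Boolean(stage.$match && "$text" in stage.$match);
  return stages.every((stage, i) => !usesText(stage) || i === 0);
}

console.log(textMatchIsFirstStage(pipeline)); // → true
console.log(textMatchIsFirstStage([pipeline[2], pipeline[0]])); // → false
```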

As you can see, MongoDB’s full text search can cover a variety of use cases in a wide range of applications. However, there are still complex scenarios where an external full text search engine may be the better choice. Try it yourself and see whether full text search in MongoDB meets your search requirements.
