A Simple Tagging Implementation with Elasticsearch

1. Overview

Tagging is a common design pattern that allows us to categorize and filter items in our data model.

In this article, we’ll implement tagging using Spring and Elasticsearch. We’ll be using both Spring Data and the Elasticsearch API.

We aren’t going to cover the basics of getting started with Elasticsearch and Spring Data – you can explore these here.

2. Adding Tags

The simplest implementation of tagging is an array of strings.  We can implement this by adding a new field to our data model like this:

@Document(indexName = "blog", type = "article")
public class Article {

    // ...

    @Field(type = String, index = not_analyzed)
    private String[] tags;

    // ...
}

Notice the use of the not_analyzed flag on the index. We only want exact matches of our tags to filter a result. This allows us to use similar but separate tags like elasticsearchIsAwesome and elasticsearchIsTerrible.

Analyzed fields would return partial hits, which is the wrong behavior in this case.
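
To see why exact matching matters, here is a minimal plain-Java sketch. It does not use Elasticsearch at all: the class TagMatching and both methods are hypothetical, and the "analyzed" variant only approximates analysis by lowercasing and splitting on whitespace.

```java
import java.util.List;
import java.util.Locale;

// Plain-Java illustration only; this is not how Elasticsearch implements analysis.
class TagMatching {

    // Exact matching: the stored tag must equal the search term,
    // which is the behavior we want from a not_analyzed field.
    static boolean exactMatch(List<String> tags, String term) {
        return tags.contains(term);
    }

    // Rough stand-in for an analyzed field: each tag is lowercased and split
    // on whitespace, so a single word can hit a multi-word tag.
    static boolean analyzedMatch(List<String> tags, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String tag : tags) {
            for (String token : tag.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.equals(needle)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With a tag list containing search engines, exactMatch rejects the term search while analyzedMatch accepts it, which is the kind of partial hit we want to avoid for tags.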

3. Building Queries

Tags allow us to manipulate our queries in interesting ways. We can search across them like any other field, or we can use them to filter our results on match_all queries. We can also use them with other queries to tighten our results.

3.1. Searching Tags

The new tag field we created on our model is just like every other field in our index. We can search for any entity that has a specific tag like this:

@Query("{\"bool\": {\"must\": [{\"match\": {\"tags\": \"?0\"}}]}}")
Page<Article> findByTagUsingDeclaredQuery(String tag, Pageable pageable);

This example uses a Spring Data Repository to construct our query, but we can just as easily use a RestTemplate to query the Elasticsearch cluster manually.
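
As a rough sketch of the manual route, the following sends the same query over HTTP. It swaps in the JDK's built-in HttpClient (Java 11+) for a RestTemplate to stay dependency-free. The class name, the URL http://localhost:9200/blog/_search and the helper methods are assumptions, not part of the original example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class TagSearchOverRest {

    // Assumed endpoint: a local node and the "blog" index from our mapping.
    static final String SEARCH_URL = "http://localhost:9200/blog/_search";

    // Escapes backslashes and double quotes so the tag can sit inside a JSON
    // string literal. Control characters are not handled in this sketch.
    static String escapeJson(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Wraps the bool/must/match query from the declared query in a search request body.
    static String searchBody(String tag) {
        return "{\"query\": {\"bool\": {\"must\": [{\"match\": {\"tags\": \""
            + escapeJson(tag) + "\"}}]}}}";
    }

    // Posts the body and returns the raw JSON response; needs a running cluster.
    static String search(String tag) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(SEARCH_URL))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(searchBody(tag)))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling TagSearchOverRest.search("elasticsearch") returns the unparsed response, so mapping the hits back to Article objects is left to the caller.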

Similarly, we can use the Elasticsearch API:

boolQuery().must(termQuery("tags", "elasticsearch"));

Assume we use the following documents in our index:

[
    {
        "id": 1,
        "title": "Spring Data Elasticsearch",
        "authors": [ { "name": "John Doe" }, { "name": "John Smith" } ],
        "tags": [ "elasticsearch", "spring data" ]
    },
    {
        "id": 2,
        "title": "Search engines",
        "authors": [ { "name": "John Doe" } ],
        "tags": [ "search engines", "tutorial" ]
    },
    {
        "id": 3,
        "title": "Second Article About Elasticsearch",
        "authors": [ { "name": "John Smith" } ],
        "tags": [ "elasticsearch", "spring data" ]
    },
    {
        "id": 4,
        "title": "Elasticsearch Tutorial",
        "authors": [ { "name": "John Doe" } ],
        "tags": [ "elasticsearch" ]
    }
]

Now we can use this query:

Page<Article> articleByTags =
  articleService.findByTagUsingDeclaredQuery("elasticsearch", new PageRequest(0, 10));

// articleByTags will contain 3 articles [ 1, 3, 4 ]
assertThat(articleByTags, containsInAnyOrder(
  hasProperty("id", is(1)),
  hasProperty("id", is(3)),
  hasProperty("id", is(4)))
);

3.2. Filtering All Documents

A common design pattern is to create a Filtered List View in the UI that shows all entities, but also allows the user to filter based on different criteria.

Let’s say we want to return all articles filtered by whatever tag the user selects:

@Query("{\"bool\": {\"must\": " +
  "{\"match_all\": {}}, \"filter\": {\"term\": {\"tags\": \"?0\" }}}}")
Page<Article> findByFilteredTagQuery(String tag, Pageable pageable);

Once again, we’re using Spring Data to construct our declared query.

The query we’re using is split into two pieces. The scoring query comes first, in this case match_all. The filter comes next and tells Elasticsearch which results to discard.

Here is how we use this query:

Page<Article> articleByTags =
  articleService.findByFilteredTagQuery("elasticsearch", new PageRequest(0, 10));

// articleByTags will contain 3 articles [ 1, 3, 4]
assertThat(articleByTags, containsInAnyOrder(
  hasProperty("id", is(1)),
  hasProperty("id", is(3)),
  hasProperty("id", is(4)))
);

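It is important to realize that although this returns the same results as our example above, the tag clause now runs in the Filter Context. Elasticsearch skips relevance scoring for it and can cache the result, so this query will generally perform better.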

3.3. Filtering Queries

Sometimes a search returns too many results to be usable. In that case, it’s nice to expose a filtering mechanism that can rerun the same search, just with the results narrowed down.

Here’s an example where we narrow down the articles an author has written, to just the ones with a specific tag:

@Query("{\"bool\": {\"must\": " + 
  "{\"match\": {\"authors.name\": \"?0\"}}, " +
  "\"filter\": {\"term\": {\"tags\": \"?1\" }}}}")
Page<Article> findByAuthorsNameAndFilteredTagQuery(
  String name, String tag, Pageable pageable);

Again, Spring Data is doing all the work for us.

Let’s also look at how to construct this query ourselves:

QueryBuilder builder = boolQuery().must(
  nestedQuery("authors", boolQuery().must(termQuery("authors.name", "doe"))))
  .filter(termQuery("tags", "elasticsearch"));

We can, of course, use this same technique to filter on any other field in the document. But tags lend themselves particularly well to this use case.

Here is how to use the above query:

SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(builder)
  .build();
List<Article> articles = 
  elasticsearchTemplate.queryForList(searchQuery, Article.class);

// articles contains [ 1, 4 ]
assertThat(articles, containsInAnyOrder(
  hasProperty("id", is(1)),
  hasProperty("id", is(4)))
);
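
To make the expected ids in these assertions concrete, here is a small in-memory model of the four sample documents. It is a plain-Java stand-in rather than an Elasticsearch query: the Doc class, the word-level author match and the convention that a null author means match_all are all assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class InMemoryTagFilter {

    // Hypothetical stand-in for the Article document; only the fields used here.
    static final class Doc {
        final int id;
        final List<String> authors;
        final List<String> tags;

        Doc(int id, List<String> authors, List<String> tags) {
            this.id = id;
            this.authors = authors;
            this.tags = tags;
        }
    }

    // The four sample documents from the index.
    static List<Doc> sampleDocs() {
        return Arrays.asList(
            new Doc(1, Arrays.asList("John Doe", "John Smith"),
                Arrays.asList("elasticsearch", "spring data")),
            new Doc(2, Arrays.asList("John Doe"),
                Arrays.asList("search engines", "tutorial")),
            new Doc(3, Arrays.asList("John Smith"),
                Arrays.asList("elasticsearch", "spring data")),
            new Doc(4, Arrays.asList("John Doe"),
                Arrays.asList("elasticsearch")));
    }

    // Query part: true when any author name contains the given word (case-insensitive).
    static boolean authorMatches(Doc doc, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        for (String name : doc.authors) {
            if (Arrays.asList(name.toLowerCase(Locale.ROOT).split("\\s+")).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    // A null authorWord plays the role of match_all; the tag check is the exact-match filter.
    static List<Integer> search(List<Doc> docs, String authorWord, String tag) {
        List<Integer> ids = new ArrayList<>();
        for (Doc doc : docs) {
            boolean queryHit = authorWord == null || authorMatches(doc, authorWord);
            boolean filterHit = doc.tags.contains(tag);
            if (queryHit && filterHit) {
                ids.add(doc.id);
            }
        }
        return ids;
    }
}
```

Filtering on the tag elasticsearch alone keeps documents 1, 3 and 4, and adding the author word doe narrows that down to 1 and 4.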

4. Filter Context

When we build a query, we need to differentiate between the Query Context and the Filter Context. Every query in Elasticsearch has a Query Context so we should be used to seeing them.

Not every query type supports the Filter Context. Therefore if we want to filter on tags, we need to know which query types we can use.

The bool query has two ways to access the Filter Context. The first parameter, filter, is the one we use above. We can also use a must_not parameter to activate the context.

The next query type we can filter is constant_score. This is useful when we want to replace the Query Context with the results of the Filter and assign each result the same score.

The final query type that we can filter based on tags is the filter aggregation. This allows us to create aggregation groups based on the results of our filter. In other words, we can collect all articles carrying a given tag into one bucket of our aggregation result.
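
For reference, here is a sketch of the request bodies for these options, built as plain strings so that the JSON structure stays visible. The class and method names are hypothetical, as is the aggregation name tagged. The tag is assumed to need no JSON escaping.

```java
// Builds raw request bodies; nothing here talks to a cluster.
class FilterContextBodies {

    // The shared term filter on the tags field (tag assumed to contain no quotes or backslashes).
    static String termFilter(String tag) {
        return "{\"term\": {\"tags\": \"" + tag + "\"}}";
    }

    // bool query, filter parameter: keep only documents carrying the tag.
    static String boolFilter(String tag) {
        return "{\"query\": {\"bool\": {\"must\": {\"match_all\": {}}, \"filter\": "
            + termFilter(tag) + "}}}";
    }

    // bool query, must_not parameter: drop documents carrying the tag.
    static String boolMustNot(String tag) {
        return "{\"query\": {\"bool\": {\"must_not\": " + termFilter(tag) + "}}}";
    }

    // constant_score: every document passing the filter gets the same score.
    static String constantScore(String tag) {
        return "{\"query\": {\"constant_score\": {\"filter\": " + termFilter(tag) + "}}}";
    }

    // filter aggregation: one bucket, named "tagged" here, of documents carrying the tag.
    static String filterAggregation(String tag) {
        return "{\"size\": 0, \"aggs\": {\"tagged\": {\"filter\": " + termFilter(tag) + "}}}";
    }
}
```

Each body can be posted to the index's _search endpoint; setting size to 0 in the aggregation request just suppresses the hits so that only the bucket comes back.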

5. Advanced Tagging

So far, we have only talked about tagging using the most basic implementation. The next logical step is to create tags that are themselves key-value pairs. This would allow us to get even fancier with our queries and filters.

For example, we could change our tag field into this:

@Field(type = Nested, index = not_analyzed)
private List<Tag> tags;

Then we’d just change our filters to use nestedQuery types.

Once we understand how to use key-value pairs, it is a small step to using complex objects as our tag. Not many implementations will need a full object as a tag, but it’s good to know we have this option should we require it.
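
The Tag type isn’t shown in this article, so here is a minimal sketch of what a key-value tag could look like. The field names key and value, the KeyValueTags wrapper and the sample values are all assumptions. The hasTag helper mimics in plain Java what a nested query gives us: both conditions must hold on the same tag object.

```java
import java.util.List;
import java.util.Objects;

class KeyValueTags {

    // Hypothetical key-value tag; the field names are assumptions.
    static final class Tag {
        private final String key;
        private final String value;

        Tag(String key, String value) {
            this.key = Objects.requireNonNull(key);
            this.value = Objects.requireNonNull(value);
        }

        String getKey() {
            return key;
        }

        String getValue() {
            return value;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            if (!(other instanceof Tag)) {
                return false;
            }
            Tag that = (Tag) other;
            return key.equals(that.key) && value.equals(that.value);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, value);
        }
    }

    // True only when a single tag carries both the key and the value.
    static boolean hasTag(List<Tag> tags, String key, String value) {
        return tags.contains(new Tag(key, value));
    }
}
```

For an article tagged with language=java and level=beginner, hasTag(tags, "language", "java") matches, while hasTag(tags, "language", "beginner") does not, because the key and the value belong to different tags.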

6. Conclusion

In this article, we’ve covered the basics of implementing tagging using Elasticsearch.

As always, examples can be found over on GitHub.

