Elasticsearch with Spring

Elasticsearch is quite a familiar term. The first questions that come to mind are:

1) What is Elasticsearch?
2) Why should we use it?



To answer the first question, I would describe Elasticsearch as a wrapper over Apache Lucene (a high-performance text search engine library written in Java). At its core, Elasticsearch is a distributed full-text search engine that is easy to scale (because of its well-designed anatomy) and returns results very quickly. The speed comes from its analyzers, which tokenize the text and build an inverted index, much like the index at the back of a book; a variety of analyzers are available. It is also schema-free in nature and supports JSON. The best part, in my experience, is that all of its functionality is exposed as REST services.
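To make the book-index analogy concrete, here is a minimal sketch in plain Java of what tokenizing text and building an index over the tokens could look like. The class `InvertedIndexSketch`, its `analyze` method and the sample sentences are made up for illustration; Lucene's real analyzers and index structures are far more sophisticated than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy index: maps each token to the ids of the documents that contain it.
// Illustration only; this is not how Lucene stores its data internally.
final class InvertedIndexSketch {

    // token -> sorted set of document ids
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    // Stand-in for an analyzer: lowercase the text, then split on
    // anything that is not a letter or a digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Index a document: record its id under every token it contains.
    void add(int docId, String text) {
        for (String token : analyze(text)) {
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
        }
    }

    // Look up a single term; the query is analyzed the same way as the documents.
    Set<Integer> search(String term) {
        List<String> tokens = analyze(term);
        if (tokens.isEmpty()) {
            return new TreeSet<>();
        }
        return postings.getOrDefault(tokens.get(0), new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Spring Data Elasticsearch");
        index.add(2, "Elasticsearch wraps Apache Lucene");
        System.out.println(index.search("elasticsearch")); // prints [1, 2]
        System.out.println(index.search("Lucene"));        // prints [2]
    }
}
```

A lookup never scans the documents themselves; it only reads the entry for the term, which is why this kind of index answers text queries so quickly.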

Elasticsearch is also a distributed NoSQL database: every input goes in as JSON, and it has its own way of storing data in data files, which by default live in the data directory inside the Elasticsearch installation folder.

Now for the second question:

In a traditional SQL database, implementing full-text search (such as searching blog posts) increases not only the size of the database but also its complexity. Because a traditional SQL database is schema dependent, storing different kinds of documents means an individual schema for each document type, which adds to the complexity.

Here Elasticsearch is a good fit: being schemaless and NoSQL, it lets us store a variety of documents with ease as JSON, and with the inverted index in action it returns results very quickly.
When we store a document in Elasticsearch, analyzers (the defaults, plus any custom ones we specify) analyze the text, tokenize it and build an index from the tokens, which helps return results faster. By default Elasticsearch also has an automatic type-guessing mechanism that detects the types of the fields we are indexing and applies the standard analyzers accordingly.

Everything from configuration to data storage and querying (using the query DSL) is described well on the official Elasticsearch website, so I will not repeat it here.

My primary focus will be using Elasticsearch with Spring, i.e. Spring Data Elasticsearch.

My first encounter with Elasticsearch happened while I was exploring the ELK stack (Elasticsearch, Logstash and Kibana). Before diving deep, let's get some idea of the Elasticsearch anatomy, though it has been described very well here and in its subsequent posts.

At the top level lies the cluster, which by default is named "elasticsearch". Within the cluster lie nodes. When we start the Elasticsearch engine at the command prompt, we can see the names of various Marvel Comics characters; those designate the nodes. Within nodes lie shards (primary as well as replica). Replicas are copies of the primary shards. Saving and retrieving data in primary and replica shards, as well as data consistency, are all explained well on the official website. Within shards lie segments, where the data resides. While working with Elasticsearch we will also come across terms like index, type, etc.
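As a rough illustration of how a document ends up in one particular primary shard, the sketch below hashes a routing key (by default the document id) and takes the remainder modulo the number of primary shards. This is only a sketch under stated assumptions: Elasticsearch uses its own hash function internally, so `String.hashCode()` here is a stand-in, and the class name, the ids and the shard count of 5 are chosen purely for the example.

```java
// Sketch of mapping a document's routing key to a primary shard.
final class ShardRoutingSketch {

    // NOTE: String.hashCode() is a stand-in for the hash function that
    // Elasticsearch really uses, so the shard numbers computed here will
    // not match what a real cluster would choose.
    static int primaryShardFor(String routingKey, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("need at least one primary shard");
        }
        // floorMod keeps the result in [0, numberOfPrimaryShards) even when
        // the hash code is negative.
        return Math.floorMod(routingKey.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 5; // assumed number of primary shards for this example
        for (String id : new String[] {"book-1", "book-2", "book-3"}) {
            System.out.println(id + " -> shard " + primaryShardFor(id, shards));
        }
    }
}
```

Because the result depends on the number of primary shards, the same id would land somewhere else if that number changed, which is why the primary shard count is fixed when an index is created.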

Now if we describe them as:

Elasticsearch Index = Database
Types = Tables
Mapping = Schema

then they are easier to follow for those of us with a SQL background. Although we have said from the beginning that Elasticsearch is schemaless, the analogy above is only for the sake of understanding the concepts. An Elasticsearch index is a logical namespace for organizing data. An index may span multiple nodes.

The data in those tables corresponds to the documents that we store.
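Since everything is exposed over REST, the analogy also shows up in how a single document is addressed: index, then type, then id. The small sketch below just builds such an address. It assumes a local node listening on the default HTTP port 9200 and the type-based addressing of the Elasticsearch versions this post targets; the document id "1" is hypothetical.

```java
import java.net.URI;

// Builds the REST address of one document as /{index}/{type}/{id}.
final class DocumentAddressSketch {

    static URI documentUri(String baseUrl, String index, String type, String id) {
        return URI.create(baseUrl + "/" + index + "/" + type + "/" + id);
    }

    public static void main(String[] args) {
        // "bookindex" and "book" are the index and type used later in this post;
        // the base URL and the id "1" are assumptions for the example.
        URI uri = documentUri("http://localhost:9200", "bookindex", "book", "1");
        System.out.println(uri); // prints http://localhost:9200/bookindex/book/1
    }
}
```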

Let's see how the basic functionality of Elasticsearch can be implemented with Spring.

We will:

1) Create an index.
2) Create a type for the index (rather than relying on auto-generation when a document is saved).
3) Save a document.
4) Execute queries on the document.

The documents that we will be storing have both nested structures and a parent-child relationship. We will see both of them in a while.

The documents that we will be storing here have the following structure.

The codebase can be found here.


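import groovy.transform.ToString
import org.springframework.data.annotation.Id
import org.springframework.data.elasticsearch.annotations.Document
import org.springframework.data.elasticsearch.annotations.Field
import org.springframework.data.elasticsearch.annotations.FieldType

// Nested data model: Author is embedded inside Book
@ToString
class Author {

    String name
}

@ToString
@Document(indexName = "bookindex", type = "book")
class Book {

    @Id
    String id

    String name

    // Stored as a nested object within the book document
    @Field(type = FieldType.Nested)
    Author author

}


// Parent-child data model: Doctor and Hospital both live in the "parchild" index
@ToString
@Document(indexName = "parchild")
class Doctor {
    String id
    String parentId
    String docname
    String trade
}


@ToString
@Document(indexName = "parchild")
class Hospital {
    String id
    String hospname
    String location

}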

Author is nested within the type Book which is denoted by
@Field(type = FieldType.Nested)


For the nested data model we will be dealing with documents of type Author and Book, and for the parent-child data model we will be working with documents of type Doctor and Hospital.
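The difference between the two models is easiest to see in the shape of the stored documents. The sketch below uses plain Java maps in place of JSON, with the field names taken from the classes above. The sample values are made up, and treating Hospital as the parent that a Doctor's `parentId` points to is my assumption based on the field name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain maps standing in for the JSON documents of the two data models.
final class DataModelSketch {

    // Nested model: the author lives inside the book document itself.
    static Map<String, Object> nestedBook(String id, String name, String authorName) {
        Map<String, Object> author = new LinkedHashMap<>();
        author.put("name", authorName);

        Map<String, Object> book = new LinkedHashMap<>();
        book.put("id", id);
        book.put("name", name);
        book.put("author", author); // embedded, stored together with the book
        return book;
    }

    // Parent side of the parent-child model (assumed to be the hospital).
    static Map<String, Object> hospital(String id, String hospname, String location) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("hospname", hospname);
        doc.put("location", location);
        return doc;
    }

    // Child side: a separate document that only carries the parent's id.
    static Map<String, Object> doctor(String id, String parentId, String docname, String trade) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("parentId", parentId); // reference to the hospital, not a copy of it
        doc.put("docname", docname);
        doc.put("trade", trade);
        return doc;
    }

    public static void main(String[] args) {
        // All values below are hypothetical sample data.
        System.out.println(nestedBook("1", "Some Book", "Some Author"));
        System.out.println(hospital("h1", "City Hospital", "Pune"));
        System.out.println(doctor("d1", "h1", "Dr. A", "Cardiology"));
    }
}
```

With the nested model, changing the author means re-saving the whole book document; with the parent-child model, a doctor can be added or changed without touching the hospital document.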

For all the operations mentioned above we will be using ElasticsearchTemplate. The dependencies required to autowire it are:


'org.springframework.boot:spring-boot-starter-data-elasticsearch:1.4.0.RELEASE'
'com.vividsolutions:jts:1.13'

In DBConfig.groovy we can see how ElasticsearchTemplate is exposed as a bean:


    // Exposes ElasticsearchTemplate as a bean, backed by the client defined below
    @Bean
    public ElasticsearchTemplate elasticsearchTemplate() {
        new ElasticsearchTemplate(client())
    }

    // Transport client that talks to the local node on the default transport port
    @Bean
    public Client client() {
        Settings settings = Settings.settingsBuilder().put("cluster.name", "elasticsearch").put("client.transport.sniff", true).build()
        TransportClient transportClient = TransportClient.builder().settings(settings).build()

        InetAddress inetAddress = InetAddress.getByName("127.0.0.1")
        transportClient.addTransportAddress(new InetSocketTransportAddress(inetAddress, 9300))
        transportClient
    }

Here we can see that the client being used is a TransportClient. There is another client available in Elasticsearch, the NodeClient; the differences between the two and where each applies are explained well on the official website. Since we are running the Elasticsearch server on our local machine, we have used the localhost address along with the default transport port (9300) for the client to communicate with the cluster for all operations.

We will explore the basic operations with ElasticsearchTemplate in the next post.

Feel free to provide your comments and feedback
