Getting ELK Running with Docker and Learning the Basics of Elasticsearch

ELK at a glance

ELK stands for Elasticsearch, Logstash, and Kibana. In this stack, Elasticsearch handles full-text search, Logstash is responsible for collecting and processing real-time data, and Kibana provides the interface for analysis, statistics, and visualization.

Environment used

Physical machine: CentOS 7
Docker version: 19.03.5

Deploying ELK with Docker

For this setup, Docker makes deployment straightforward. The image used here is sebp/elk.

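Before starting the container, one kernel parameter needs to be raised on the host. Otherwise Elasticsearch may fail with an error similar to: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]. The command below applies the change until the next reboot; to make it permanent, also add the line vm.max_map_count=262144 to /etc/sysctl.conf.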

sysctl -w vm.max_map_count=262144
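To see whether the host needs this change at all, the current value can be read from /proc. The following is a minimal Python sketch, assuming a Linux host; the helper names are illustrative and not part of any Elasticsearch tooling.

```python
from pathlib import Path

# Minimum value named in the Elasticsearch error message.
REQUIRED_MAX_MAP_COUNT = 262144


def parse_max_map_count(text: str) -> int:
    """Parse the contents of /proc/sys/vm/max_map_count."""
    return int(text.strip())


def needs_increase(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """Return True when the current limit is below the required minimum."""
    return current < required


if __name__ == "__main__":
    proc_file = Path("/proc/sys/vm/max_map_count")  # exists on Linux only
    if proc_file.exists():
        current = parse_max_map_count(proc_file.read_text())
        print(current, "too low" if needs_increase(current) else "ok")
```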

Then start the container:

docker run --ulimit nofile=65536:65536 -p 5601:5601 -p 9200:9200 -p 5044:5044 -p 5045:5045 -p 5046:5046 -d --restart=always -v /etc/logstash:/etc/logstash -v /etc/localtime:/etc/localtime --name elk sebp/elk

After waiting a short while, open http://<server-ip>:9200 in a browser, replacing <server-ip> with the address of the Docker host. If the service is up, Elasticsearch returns version and cluster information similar to the following:

{
  "name" : "elk",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "CtPcMe_ERQKzvt8Sul3MNQ",
  "version" : {
    "number" : "7.4.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "22e1767283e61a198cb4db791ea66e3f11ab9910",
    "build_date" : "2019-09-27T08:36:48.569419Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
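Instead of refreshing the browser, the wait can be scripted. This is a rough sketch that polls the root endpoint until it returns the JSON shown above; the function names, retry count, and delay are assumptions chosen for illustration.

```python
import json
import time
import urllib.request


def fetch_json(url: str, timeout: float = 3.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_elasticsearch(url, fetch=fetch_json, attempts=30, delay=2.0):
    """Poll until the endpoint answers; return the version number, or None."""
    for _ in range(attempts):
        try:
            info = fetch(url)
            return info["version"]["number"]
        except (OSError, ValueError, KeyError):
            # Not listening yet, or not the expected JSON: wait and retry.
            time.sleep(delay)
    return None
```

Calling wait_for_elasticsearch("http://127.0.0.1:9200") on the Docker host should return the version string once the container has finished starting.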

A quick look at Elasticsearch REST APIs

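Elasticsearch exposes a REST-style interface. In typical usage:

GET retrieves data
PUT creates or replaces a resource at a known path, such as an index or a document with a given ID
POST creates a document (Elasticsearch generates the ID when none is given) and calls action endpoints such as _update
DELETE removes data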

curl is a convenient way to test these APIs. The most common options are:

<table>
  <thead>
    <tr> <th>Option</th> <th>Meaning</th> <th>Example</th> </tr>
  </thead>
  <tbody>
    <tr> <td>-H</td> <td>Set a request header</td> <td>-H 'Content-Type: application/json' declares the body as JSON</td> </tr>
    <tr> <td>-d</td> <td>Set the request body</td> <td>-d '{"json": "data"}' sends JSON data</td> </tr>
    <tr> <td>-X</td> <td>Set the HTTP method</td> <td>-XGET sends a GET request</td> </tr>
  </tbody>
</table>
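The same three options map directly onto an HTTP request object in most languages. As a hedged illustration, the sketch below builds (but does not send) an equivalent request with Python's standard library; build_request is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional


def build_request(method: str, url: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request equivalent to curl's -X, -H and -d options."""
    data = None
    headers = {}
    if body is not None:
        # -d: the request body; ensure_ascii=False keeps Chinese text readable
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        # -H: declare the body as JSON
        headers["Content-Type"] = "application/json"
    # -X: the HTTP method
    return urllib.request.Request(url, data=data, headers=headers, method=method)


req = build_request("PUT", "http://127.0.0.1:9200/books_idx?pretty")
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen would actually send it; that step is left out so the example runs without a server.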

Creating an index

An index is needed before storing documents. To create one:

curl -XPUT http://127.0.0.1:9200/books_idx

If you want the output to be easier to read, append the pretty query parameter. The URL is quoted so that the shell does not treat ? as a wildcard:

curl -XPUT 'http://127.0.0.1:9200/books_idx?pretty'

Deleting an index

curl -XDELETE http://127.0.0.1:9200/books_idx

Inserting documents

The sample data below is in raw JSON format:

{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}
{"name": "活着", "author": "余华", "publish": "作家出版社", "isbn": "9787506365437", "pubdate": "2017"}
{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}
{"name": "LaTeX入门", "author": "刘海洋", "publish": "电子工业出版社", "isbn": "9787121202087", "pubdate": "2013"}
{"name": "Rust编程之道", "author": "张汉东", "publish": "电子工业出版社", "isbn": "9787121354854", "pubdate": "2019"}

Converted to curl, the inserts look like this:

curl -XPOST 'http://127.0.0.1:9200/books_idx/_doc/9787111606420?pretty' -H 'Content-Type: application/json' -d '{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}'

curl -XPOST 'http://127.0.0.1:9200/books_idx/_doc/9787506365437?pretty' -H 'Content-Type: application/json' -d '{"name": "活着", "author": "余华", "publish": "作家出版社", "isbn": "9787506365437", "pubdate": "2017"}'

curl -XPOST 'http://127.0.0.1:9200/books_idx/_doc/9787115373557?pretty' -H 'Content-Type: application/json' -d '{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}'

curl -XPOST 'http://127.0.0.1:9200/books_idx/_doc/9787121202087?pretty' -H 'Content-Type: application/json' -d '{"name": "LaTeX入门", "author": "刘海洋", "publish": "电子工业出版社", "isbn": "9787121202087", "pubdate": "2013"}'

curl -XPOST 'http://127.0.0.1:9200/books_idx/_doc/9787121354854?pretty' -H 'Content-Type: application/json' -d '{"name": "Rust编程之道", "author": "张汉东", "publish": "电子工业出版社", "isbn": "9787121354854", "pubdate": "2019"}'

In the URL http://127.0.0.1:9200/books_idx/_doc/9787111606420?pretty, the value 9787111606420 is a manually assigned document ID.
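Writing one command per book by hand gets tedious quickly. As a sketch of how the conversion could be automated, the snippet below turns one-JSON-object-per-line text into (URL, document) pairs, using the isbn field as the document ID as the article does. Only two of the sample books are included, and the constant and function names are illustrative.

```python
import json

BASE_URL = "http://127.0.0.1:9200"  # same address as the curl examples
INDEX = "books_idx"

RAW_LINES = """\
{"name": "活着", "author": "余华", "publish": "作家出版社", "isbn": "9787506365437", "pubdate": "2017"}
{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}
"""


def to_index_requests(raw: str, id_field: str = "isbn"):
    """Turn one-JSON-object-per-line text into (url, document) pairs."""
    requests = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)
        url = f"{BASE_URL}/{INDEX}/_doc/{doc[id_field]}"
        requests.append((url, doc))
    return requests


for url, doc in to_index_requests(RAW_LINES):
    print("POST", url, doc["name"])
```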

If no ID is supplied, Elasticsearch will generate one automatically:

curl -XPOST 'http://127.0.0.1:9200/books_idx/_doc?pretty' -H 'Content-Type: application/json' -d '{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}'

A successful response will include an auto-generated _id:

{
  "_index" : "books_idx",
  "_type" : "_doc",
  "_id" : "plCP9G8BXUFnJZ4gtu4W",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}
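A script that indexes documents usually needs the generated ID and the outcome from this reply. The sketch below parses the response shown above; summarize_index_response is a hypothetical helper, and the keys it returns are chosen for illustration. The _shards block reports 2 total and 1 successful most likely because the index keeps one replica by default and a single node has nowhere to place it.

```python
import json

RESPONSE = """
{
  "_index": "books_idx",
  "_type": "_doc",
  "_id": "plCP9G8BXUFnJZ4gtu4W",
  "_version": 1,
  "result": "created",
  "_shards": {"total": 2, "successful": 1, "failed": 0},
  "_seq_no": 8,
  "_primary_term": 1
}
"""


def summarize_index_response(text: str) -> dict:
    """Extract the fields a caller typically checks after indexing."""
    reply = json.loads(text)
    return {
        "id": reply["_id"],
        "created": reply["result"] == "created",
        "version": reply["_version"],
        "ok": reply["_shards"]["failed"] == 0,
    }


print(summarize_index_response(RESPONSE))
```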

Updating data

Both PUT and POST can overwrite an existing document with the same ID:

curl -XPOST 'http://127.0.0.1:9200/books_idx/_doc/9787111606420?pretty' -H 'Content-Type: application/json' -d '{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}'

Each submission increments the version in the returned JSON, for example "_version" : 7.

In effect, this kind of full update removes the old document and writes a new one under the same ID.

When only part of a document needs to change, a partial update is more suitable:

curl -XPOST 'http://127.0.0.1:9200/books_idx/_update/9787111606420?pretty' -H 'Content-Type: application/json' -d '{"doc": {"pubdate": "2000"}}'
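The difference between the two kinds of update is easy to see with plain dictionaries. The following is a local model only: it does not call Elasticsearch, it assumes flat documents with no nested objects, and the function names are made up for the example.

```python
def full_replace(stored: dict, new_doc: dict) -> dict:
    """Model of indexing under an existing ID: the new body replaces everything."""
    return dict(new_doc)


def partial_update(stored: dict, changes: dict) -> dict:
    """Model of a partial update: changed fields are merged into the stored document."""
    merged = dict(stored)
    merged.update(changes)
    return merged


book = {"name": "深入浅出Rust", "author": "范长春", "pubdate": "2018"}

# Sending only {"pubdate": "2000"} as a full document drops the other fields.
print(full_replace(book, {"pubdate": "2000"}))
# Sending it as a partial update keeps name and author.
print(partial_update(book, {"pubdate": "2000"}))
```

This is why a full update must always carry the complete document, while a partial update only needs the fields that change.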

Querying data

A simple full-text search across all fields uses the q parameter:

http://192.168.0.101:9200/books_idx/_search?q=Rust

{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":3,"relation":"eq"},"max_score":0.62883455,"hits":[{"_index":"books_idx","_type":"_doc","_id":"9787121354854","_score":0.62883455,"_source":{"name": "Rust编程之道", "author": "张汉东", "publish": "电子工业出版社", "isbn": "9787121354854", "pubdate": "2019"}},{"_index":"books_idx","_type":"_doc","_id":"9787111606421","_score":0.62883455,"_source":{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}},{"_index":"books_idx","_type":"_doc","_id":"9787111606420","_score":0.62883455,"_source":{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}}]}}
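The compact reply is hard to read by eye. As an illustration, the sketch below pulls the total and a short row per hit out of a search reply. The sample is trimmed from the response above (two of the hits, a subset of fields), and list_hits is a hypothetical helper.

```python
import json

SEARCH_REPLY = """
{"took": 1, "timed_out": false,
 "hits": {"total": {"value": 2, "relation": "eq"}, "max_score": 0.62883455,
  "hits": [
   {"_index": "books_idx", "_id": "9787121354854", "_score": 0.62883455,
    "_source": {"name": "Rust编程之道", "author": "张汉东"}},
   {"_index": "books_idx", "_id": "9787111606420", "_score": 0.62883455,
    "_source": {"name": "深入浅出Rust", "author": "范长春"}}
  ]}}
"""


def list_hits(text: str):
    """Return the hit count and one (id, score, name) row per hit."""
    hits = json.loads(text)["hits"]
    total = hits["total"]["value"]
    rows = [(h["_id"], h["_score"], h["_source"]["name"]) for h in hits["hits"]]
    return total, rows


total, rows = list_hits(SEARCH_REPLY)
print(total)
for row in rows:
    print(row)
```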

To search within the name field only:

http://192.168.0.101:9200/books_idx/_search?q=name:%E6%95%B0%E5%AD%A6

{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":3.0808902,"hits":[{"_index":"books_idx","_type":"_doc","_id":"9787115373557","_score":3.0808902,"_source":{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}}]}}
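The %E6%95%B0%E5%AD%A6 in the URL is simply the percent-encoded UTF-8 form of 数学. A small sketch of building such a URL in Python follows; search_url is an illustrative name, and the colon is left unencoded only to keep the URL readable.

```python
from urllib.parse import quote


def search_url(base: str, index: str, query: str) -> str:
    """Build a _search URL with a percent-encoded q parameter."""
    # Keep ':' as-is; other reserved and non-ASCII characters are encoded.
    return f"{base}/{index}/_search?q={quote(query, safe=':')}"


url = search_url("http://192.168.0.101:9200", "books_idx", "name:数学")
print(url)
```

The printed URL is the same one used in the field search above.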

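To require matches in two fields, join the conditions with AND inside a single q parameter (spaces are encoded as %20). Repeating the q parameter does not combine the conditions:

http://192.168.0.101:9200/books_idx/_search?q=name:Rust%20AND%20author:张汉东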

{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":4.3965135,"hits":[{"_index":"books_idx","_type":"_doc","_id":"9787121354854","_score":4.3965135,"_source":{"name": "Rust编程之道", "author": "张汉东", "publish": "电子工业出版社", "isbn": "9787121354854", "pubdate": "2019"}}]}}
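In the query string syntax, conditions on several fields can be joined with AND inside one q value. The sketch below assembles such a query from a dictionary; the helper names are hypothetical, and values containing spaces or special characters would need extra quoting that is not handled here.

```python
from urllib.parse import quote, unquote


def and_query(conditions: dict) -> str:
    """Join field:value pairs with AND."""
    return " AND ".join(f"{field}:{value}" for field, value in conditions.items())


def and_search_url(base: str, index: str, conditions: dict) -> str:
    """Build a _search URL whose q parameter requires every condition."""
    return f"{base}/{index}/_search?q={quote(and_query(conditions), safe=':')}"


url = and_search_url(
    "http://192.168.0.101:9200",
    "books_idx",
    {"name": "Rust", "author": "张汉东"},
)
print(url)
print(unquote(url))  # decoded form, for reading
```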

To fetch the full source of a single document:

http://192.168.0.101:9200/books_idx/_source/9787115373557

{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}

To fetch only selected fields:

http://192.168.0.101:9200/books_idx/_source/9787115373557?_source=author,name

{"author":"吴军","name":"数学之美"}
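To make the effect of the field filter concrete, here is a hedged sketch with two small helpers: one builds the URL, the other models locally what the filter returns for a flat document. It assumes the typeless {index}/_source/{id} path available from 7.0; both function names are invented for the example.

```python
def source_url(base: str, index: str, doc_id: str, fields=()) -> str:
    """Build a URL that fetches a document's source, optionally filtered."""
    url = f"{base}/{index}/_source/{doc_id}"
    if fields:
        url += "?_source=" + ",".join(fields)
    return url


def filter_source(doc: dict, fields) -> dict:
    """Local model of source filtering for a flat document."""
    return {key: value for key, value in doc.items() if key in fields}


book = {"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社",
        "isbn": "9787115373557", "pubdate": "2014"}

print(source_url("http://192.168.0.101:9200", "books_idx", book["isbn"], ["author", "name"]))
print(filter_source(book, ["author", "name"]))
```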

One important note about types

Elasticsearch deprecated mapping types in 7.0 and removed them in 8.0. Many older tutorials still use paths like:

PUT {index}/{type}/{id}

From version 7.0 onward, the recommended form is:

PUT {index}/_doc/{id}
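When adapting commands from an older tutorial, the rewrite is mechanical. The sketch below converts a typed path into the typeless form; typeless_path is a hypothetical helper, and the type name book in the usage line is made up.

```python
def typeless_path(path: str) -> str:
    """Rewrite '{index}/{type}/{id}' as '{index}/_doc/{id}'."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected index/type/id, got {path!r}")
    index, _old_type, doc_id = parts
    return f"{index}/_doc/{doc_id}"


print(typeless_path("books_idx/book/9787115373557"))
```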

Final thoughts

After spending some time with Elasticsearch, one thing becomes clear: its REST API is simple and practical. CRUD operations are only the starting point. This setup is still a single-node environment, so cluster deployment is another topic to learn. Large-scale data ingestion also needs testing. And ELK includes two other major components beyond Elasticsearch. To make the stack genuinely useful in real projects, there is still plenty left to study.
