ELK at a glance
ELK stands for Elasticsearch, Logstash, and Kibana. In this stack, Elasticsearch handles full-text search, Logstash is responsible for collecting and processing real-time data, and Kibana provides the interface for analysis, statistics, and visualization.
Environment used
- Physical machine: CentOS 7
- Docker version: 19.03.5
Deploying ELK with Docker
For this setup, Docker makes deployment straightforward. The image used here is sebp/elk.
Before starting the container, one system parameter needs to be adjusted on the host. Otherwise Elasticsearch may fail with an error similar to: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
sysctl -w vm.max_map_count=262144
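Elasticsearch inside the container fails late and noisily when this limit is too low, so a quick check before docker run can help. The Bash sketch below compares the kernel's current value against the 262144 minimum from the error message. It assumes a Linux host that exposes the setting under /proc/sys, and the function names are hypothetical helpers, not part of Docker or the sebp/elk image.

```bash
#!/usr/bin/env bash
# Minimal pre-flight check for Elasticsearch's vm.max_map_count requirement.
# Assumption: a Linux host that exposes the setting under /proc/sys.
# The function names are hypothetical helpers for this sketch.

ES_MIN_MAP_COUNT=262144   # minimum value quoted in the Elasticsearch error

# map_count_ok VALUE: succeed (status 0) if VALUE meets the minimum
map_count_ok() {
  [ "$1" -ge "$ES_MIN_MAP_COUNT" ]
}

# check_map_count: compare the kernel's current value against the minimum
check_map_count() {
  local current
  current=$(cat /proc/sys/vm/max_map_count) || return 1
  if map_count_ok "$current"; then
    echo "vm.max_map_count=$current is sufficient"
  else
    echo "vm.max_map_count=$current is too low; need at least $ES_MIN_MAP_COUNT" >&2
    return 1
  fi
}
```

Note that sysctl -w only lasts until the next reboot. One common way to persist it is to write the line vm.max_map_count=262144 into a file under /etc/sysctl.d/ (for example a file named 99-elasticsearch.conf; the name is arbitrary) and then run sysctl --system.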
Then start the container:
docker run --ulimit nofile=65536:65536 -p 5601:5601 -p 9200:9200 -p 5044:5044 -p 5045:5045 -p 5046:5046 -d --restart=always -v /etc/logstash:/etc/logstash -v /etc/localtime:/etc/localtime --name elk sebp/elk
After waiting a short while, open http://<server-IP>:9200 in a browser. If the service is up, Elasticsearch returns the node name, cluster name, and version information, similar to the following:
{
  "name" : "elk",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "CtPcMe_ERQKzvt8Sul3MNQ",
  "version" : {
    "number" : "7.4.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "22e1767283e61a198cb4db791ea66e3f11ab9910",
    "build_date" : "2019-09-27T08:36:48.569419Z",
    "build_snapshot" : false,
    "lucene_version" : "8.2.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
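Elasticsearch can take a while to start inside the container, so instead of refreshing the browser, a small polling loop can wait until port 9200 answers. This is a sketch assuming curl is installed; the function name, default URL, retry count, and delay are arbitrary choices rather than values from the sebp/elk image.

```bash
#!/usr/bin/env bash
# wait_for_es [URL] [ATTEMPTS] [DELAY_SECONDS]
# Poll an Elasticsearch URL until it answers with a 2xx status, or give up.
# Hypothetical helper; defaults are illustrative only.
wait_for_es() {
  local url="${1:-http://127.0.0.1:9200}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: treat HTTP errors (e.g. 503 while starting) as failures
    # -s: no progress output; --max-time caps each individual try
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "Elasticsearch is up at $url (attempt $i)"
      return 0
    fi
    sleep "$delay"
  done
  echo "Elasticsearch did not respond at $url after $attempts attempts" >&2
  return 1
}
```

For example, running wait_for_es http://127.0.0.1:9200 60 2 right after docker run waits up to about two minutes before reporting failure.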
A quick look at Elasticsearch REST APIs
Elasticsearch exposes a REST-style interface. In typical usage:
- GET is used to retrieve data
- PUT/POST can be used to update data
- POST is commonly used to create new data
- DELETE removes data
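curl is a convenient way to test these APIs. The options used throughout this article are:
- -X sets the HTTP method, e.g. -XPUT or -XDELETE
- -H adds a request header, e.g. -H 'Content-Type: application/json'
- -d sends the request body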
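Because every request repeats the same host, method flag, and JSON header, a tiny wrapper function can save typing. The sketch below is a hypothetical helper named es (not a tool shipped with Elasticsearch). It assumes the node listens on 127.0.0.1:9200 unless ES_URL is set, and it prints the curl command instead of running it when ES_DRY_RUN is non-empty.

```bash
#!/usr/bin/env bash
# Hypothetical convenience wrapper around curl for the Elasticsearch REST API.
# Assumptions: curl is installed; ES_URL defaults to a local single node.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# es METHOD PATH [JSON_BODY]
#   es GET    /books_idx/_doc/9787115373557
#   es DELETE /books_idx
es() {
  local method="$1" path="$2" body="${3:-}"
  local -a cmd=(curl -sS -X "$method" "${ES_URL}${path}")
  if [ -n "$body" ]; then
    # Elasticsearch 6.0+ requires an explicit Content-Type for request bodies
    cmd+=(-H 'Content-Type: application/json' -d "$body")
  fi
  if [ -n "${ES_DRY_RUN:-}" ]; then
    # Print a shell-quoted version of the command instead of executing it
    printf '%q ' "${cmd[@]}"
    printf '\n'
  else
    "${cmd[@]}"
  fi
}
```

Building the command as an array keeps arguments with spaces (such as the header) intact without extra quoting tricks.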
Creating an index
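Documents are stored in an index. By default Elasticsearch creates a missing index automatically on the first write, but creating it explicitly keeps its settings under your control. To create one: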
curl -XPUT http://127.0.0.1:9200/books_idx
To make the JSON response easier to read, append the pretty query parameter:
curl -XPUT http://127.0.0.1:9200/books_idx?pretty
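The create request can also carry a settings body. On a single-node setup like this one, a replica shard can never be allocated (Elasticsearch will not place a replica on the same node as its primary), so the index stays yellow; this is also why write responses here report "total" : 2 but "successful" : 1. Setting number_of_replicas to 0 avoids that. A minimal sketch that prints such a body; the function name is hypothetical and the counts are example values.

```bash
#!/usr/bin/env bash
# index_settings_json SHARDS REPLICAS: print a create-index request body.
# Hypothetical helper; both arguments must be non-negative integers.
index_settings_json() {
  printf '{"settings": {"number_of_shards": %d, "number_of_replicas": %d}}\n' "$1" "$2"
}

# Example (not executed here): create the index with no replicas.
#   curl -XPUT 'http://127.0.0.1:9200/books_idx?pretty' \
#     -H 'Content-Type: application/json' \
#     -d "$(index_settings_json 1 0)"
```

For an index that already exists, the replica count can still be changed through PUT /books_idx/_settings with a body such as {"index": {"number_of_replicas": 0}}; the shard count cannot be changed that way.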
Deleting an index
curl -XDELETE http://127.0.0.1:9200/books_idx
Inserting documents
The sample data below is in raw JSON format:
{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}
{"name": "活着", "author": "余华", "publish": "作家出版社", "isbn": "9787506365437", "pubdate": "2017"}
{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}
{"name": "LaTeX入门", "author": "刘海洋", "publish": "电子工业出版社", "isbn": "9787121202087", "pubdate": "2013"}
{"name": "Rust编程之道", "author": "张汉东", "publish": "电子工业出版社", "isbn": "9787121354854", "pubdate": "2019"}
Converted to curl, the inserts look like this:
curl -XPOST http://127.0.0.1:9200/books_idx/_doc/9787111606420?pretty -H 'Content-Type: application/json' -d '{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}'
curl -XPOST http://127.0.0.1:9200/books_idx/_doc/9787506365437?pretty -H 'Content-Type: application/json' -d '{"name": "活着", "author": "余华", "publish": "作家出版社", "isbn": "9787506365437", "pubdate": "2017"}'
curl -XPOST http://127.0.0.1:9200/books_idx/_doc/9787115373557?pretty -H 'Content-Type: application/json' -d '{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}'
curl -XPOST http://127.0.0.1:9200/books_idx/_doc/9787121202087?pretty -H 'Content-Type: application/json' -d '{"name": "LaTeX入门", "author": "刘海洋", "publish": "电子工业出版社", "isbn": "9787121202087", "pubdate": "2013"}'
curl -XPOST http://127.0.0.1:9200/books_idx/_doc/9787121354854?pretty -H 'Content-Type: application/json' -d '{"name": "Rust编程之道", "author": "张汉东", "publish": "电子工业出版社", "isbn": "9787121354854", "pubdate": "2019"}'
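One request per document is fine for five books but slow for larger data sets. The _bulk endpoint accepts newline-delimited JSON (NDJSON) in which each document is preceded by an action line. The sketch below turns the one-document-per-line sample data into such a body, reusing the isbn field as the _id just as the commands above do. It assumes every line looks like the sample data (a field written as "isbn": "<digits>"); real input is better handled by a proper JSON tool such as jq. The function name to_bulk is hypothetical.

```bash
#!/usr/bin/env bash
# to_bulk: read one JSON document per line on stdin and write a _bulk body
# (NDJSON) on stdout. Hypothetical helper; it finds the ID with a regex that
# assumes the sample data's "isbn": "<digits>" layout.
to_bulk() {
  local line re='"isbn": *"([0-9]+)"'
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue              # skip blank lines
    if [[ $line =~ $re ]]; then
      # Action line using the ISBN as _id, like the per-document commands
      printf '{"index": {"_id": "%s"}}\n' "${BASH_REMATCH[1]}"
    else
      # No ISBN found: let Elasticsearch generate the _id
      printf '{"index": {}}\n'
    fi
    printf '%s\n' "$line"                   # the document itself
  done
}
```

With the sample data saved as books.json (a hypothetical file name), the body can be generated with to_bulk < books.json > books.bulk and sent with curl -XPOST 'http://127.0.0.1:9200/books_idx/_bulk?pretty' -H 'Content-Type: application/x-ndjson' --data-binary @books.bulk. Use --data-binary rather than -d here: when -d reads from a file it strips newlines, which breaks the NDJSON format.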
In the URL http://127.0.0.1:9200/books_idx/_doc/9787111606420?pretty, the value 9787111606420 is a manually assigned document ID.
If no ID is supplied, Elasticsearch generates one automatically. This requires POST; a PUT to /books_idx/_doc without an ID is rejected:
curl -XPOST http://127.0.0.1:9200/books_idx/_doc?pretty -H 'Content-Type: application/json' -d '{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}'
A successful response will include an auto-generated _id:
{
  "_index" : "books_idx",
  "_type" : "_doc",
  "_id" : "plCP9G8BXUFnJZ4gtu4W",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 8,
  "_primary_term" : 1
}
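When IDs are generated automatically, a script usually needs to capture the returned _id for later updates. A minimal sketch using sed, assuming the response holds a single "_id" field as in the example above; for anything more complex a JSON parser such as jq is the safer choice. extract_id is a hypothetical name.

```bash
#!/usr/bin/env bash
# extract_id: read an Elasticsearch write response on stdin and print its _id.
# Assumption: one "_id" field per response, pretty-printed or compact.
extract_id() {
  sed -n 's/.*"_id" *: *"\([^"]*\)".*/\1/p' | head -n 1
}

# Example (not executed here):
#   id=$(curl -s -XPOST 'http://127.0.0.1:9200/books_idx/_doc' \
#          -H 'Content-Type: application/json' \
#          -d '{"name": "活着", "author": "余华"}' | extract_id)
```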
Updating data
Both PUT and POST can overwrite an existing document with the same ID:
curl -XPOST http://127.0.0.1:9200/books_idx/_doc/9787111606420?pretty -H 'Content-Type: application/json' -d '{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}'
Each write to the same ID increments the document's _version by one, so after repeated submissions the response shows a higher value such as "_version" : 7.
In effect, this kind of full update removes the old document and writes a new one under the same ID.
When only part of a document needs to change, a partial update is more suitable:
curl -XPOST http://127.0.0.1:9200/books_idx/_doc/9787111606420/_update?pretty -H 'Content-Type: application/json' -d '{"doc": {"pubdate": "2000"}}'
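The body of a partial update wraps the changed fields in a doc object. A small helper can build it from field/value pairs; this is a hypothetical sketch that does no JSON escaping, so values must not contain quotes or backslashes. Also note that in Elasticsearch 7.x the typeless endpoint POST /{index}/_update/{id} is the recommended form, while the /_doc/{id}/_update path above is the older typed form and is deprecated in 7.x.

```bash
#!/usr/bin/env bash
# partial_doc FIELD VALUE [FIELD VALUE ...]: print a {"doc": {...}} body
# for the _update API. Hypothetical helper; values are not JSON-escaped.
partial_doc() {
  local out="" sep=""
  while [ "$#" -ge 2 ]; do
    out+="${sep}\"$1\": \"$2\""
    sep=", "
    shift 2
  done
  printf '{"doc": {%s}}\n' "$out"
}

# Example (not executed here), using the 7.x typeless update endpoint:
#   curl -XPOST 'http://127.0.0.1:9200/books_idx/_update/9787111606420?pretty' \
#     -H 'Content-Type: application/json' \
#     -d "$(partial_doc pubdate 2000)"
```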
Querying data
A simple full-text search across all fields passes the search term in the q parameter (Lucene query-string syntax):
http://192.168.0.101:9200/books_idx/_search?q=Rust
{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":3,"relation":"eq"},"max_score":0.62883455,"hits":[{"_index":"books_idx","_type":"_doc","_id":"9787121354854","_score":0.62883455,"_source":{"name": "Rust编程之道", "author": "张汉东", "publish": "电子工业出版社", "isbn": "9787121354854", "pubdate": "2019"}},{"_index":"books_idx","_type":"_doc","_id":"9787111606421","_score":0.62883455,"_source":{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}},{"_index":"books_idx","_type":"_doc","_id":"9787111606420","_score":0.62883455,"_source":{"name": "深入浅出Rust", "author": "范长春", "publish": "机械工业出版社", "isbn": "9787111606420", "pubdate": "2018"}}]}}
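To search only the name field, prefix the term with the field name. In the URL below, %E6%95%B0%E5%AD%A6 is the percent-encoded UTF-8 form of 数学: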
http://192.168.0.101:9200/books_idx/_search?q=name:%E6%95%B0%E5%AD%A6
{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":3.0808902,"hits":[{"_index":"books_idx","_type":"_doc","_id":"9787115373557","_score":3.0808902,"_source":{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}}]}}
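Non-ASCII search terms such as 数学 must be percent-encoded as UTF-8 bytes before they go into a URL. Browsers do this automatically, but scripts have to do it themselves. A minimal Bash sketch, assuming od is available (it ships with coreutils); urlencode is a hypothetical name.

```bash
#!/usr/bin/env bash
# urlencode STRING: percent-encode every byte except RFC 3986 unreserved
# characters (A-Z a-z 0-9 - . _ ~). Multibyte UTF-8 characters become
# one %XX per byte. Requires od and bash 4+ (for ${var^^}).
urlencode() {
  local byte out=""
  # od prints each byte of the input as two lowercase hex digits
  for byte in $(printf '%s' "$1" | od -An -v -tx1); do
    case "$byte" in
      3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a]|2d|2e|5f|7e)
        out+=$(printf "\\x$byte") ;;   # unreserved: keep the character
      *)
        out+="%${byte^^}" ;;           # everything else: %XX, uppercase hex
    esac
  done
  printf '%s\n' "$out"
}
```

curl can also handle the encoding itself: curl -G 'http://127.0.0.1:9200/books_idx/_search' --data-urlencode 'q=name:数学' encodes the value and appends it to the URL as a query parameter.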
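Repeating the q parameter does not combine two conditions. Elasticsearch keeps only one value per query-string parameter (the last one), so the request below effectively searches author:张汉东 alone; it just happens to return the one book that also matches name:Rust:
http://192.168.0.101:9200/books_idx/_search?q=name:Rust&q=author:张汉东
{"took":1,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":4.3965135,"hits":[{"_index":"books_idx","_type":"_doc","_id":"9787121354854","_score":4.3965135,"_source":{"name": "Rust编程之道", "author": "张汉东", "publish": "电子工业出版社", "isbn": "9787121354854", "pubdate": "2019"}}]}}
To genuinely require matches in both fields, put both conditions in a single q joined by AND (the spaces are URL-encoded as %20):
http://192.168.0.101:9200/books_idx/_search?q=name:Rust%20AND%20author:张汉东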
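For anything beyond simple lookups, the request-body Query DSL is easier to read and extend than the q parameter. A bool query with a must array requires every clause to match. The helper below is a hypothetical sketch that prints such a body from field/text pairs; like the other helpers here, it does no JSON escaping.

```bash
#!/usr/bin/env bash
# bool_must FIELD TEXT [FIELD TEXT ...]: print a Query DSL body in which
# every (FIELD, TEXT) pair becomes a match clause inside bool.must.
# Hypothetical helper; texts are not JSON-escaped.
bool_must() {
  local clauses="" sep=""
  while [ "$#" -ge 2 ]; do
    clauses+="${sep}{\"match\": {\"$1\": \"$2\"}}"
    sep=", "
    shift 2
  done
  printf '{"query": {"bool": {"must": [%s]}}}\n' "$clauses"
}

# Example (not executed here):
#   curl -XPOST 'http://127.0.0.1:9200/books_idx/_search?pretty' \
#     -H 'Content-Type: application/json' \
#     -d "$(bool_must name Rust author 张汉东)"
```

One caveat: with the default standard analyzer, Chinese text is indexed as single characters, so a match on 张汉东 also matches any author containing one of those characters. match_phrase may be the better fit when an exact name is intended.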
To fetch the full source of a single document:
http://192.168.0.101:9200/books_idx/_doc/9787115373557/_source
{"name": "数学之美", "author": "吴军", "publish": "人民邮电出版社", "isbn": "9787115373557", "pubdate": "2014"}
To fetch only selected fields:
http://192.168.0.101:9200/books_idx/_doc/9787115373557/_source?_source=author,name
{"author":"吴军","name":"数学之美"}
One important note about types
Elasticsearch has been phasing out mapping types: indices created in 6.x can hold only one type, 7.x deprecates types, and 8.0 removes them. Many older tutorials still use paths like:
PUT {index}/{type}/{id}
From version 7.0 onward, the recommended form is:
PUT {index}/_doc/{id}
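When porting scripts written against older tutorials, typed document paths can be rewritten mechanically. A minimal sketch with a hypothetical helper name; it only handles plain {index}/{type}/{id} paths with no leading slash, query string, or extra endpoint such as _update, and rejects anything else so it can be reviewed by hand.

```bash
#!/usr/bin/env bash
# to_typeless PATH: rewrite "{index}/{type}/{id}" as "{index}/_doc/{id}".
# Hypothetical helper; returns status 1 for paths not in that exact shape.
to_typeless() {
  local index doc_type id
  IFS=/ read -r index doc_type id <<< "$1"
  if [ -z "$index" ] || [ -z "$doc_type" ] || [ -z "$id" ] || [[ $id == */* ]]; then
    echo "expected {index}/{type}/{id}, got: $1" >&2
    return 1
  fi
  printf '%s/_doc/%s\n' "$index" "$id"
}
```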
Final thoughts
After spending some time with Elasticsearch, one thing is clear: its REST API is simple and practical. CRUD operations are only the starting point, though. This setup is a single-node environment, so cluster deployment is a separate topic to learn, and large-scale data ingestion still needs testing. Logstash and Kibana, the other two major components of ELK, have not been covered here at all. Making the stack genuinely useful in real projects will take plenty more study.