# Exercise 3: Initializing Elasticsearch
Now that both Elasticsearch and Kibana are operational, let us create the necessary indices for us to work with in the following exercises.
## Index a document via PowerShell
First, we are going to use Elasticsearch's REST API through PowerShell.
1. To index a document in Elasticsearch, issue the following command.

    ```powershell
    (Invoke-WebRequest 'http://localhost:9200/test/_doc/1?pretty' -Method Put -ContentType 'application/json' -Body '{ "name": "John Doe" }' -UseBasicParsing).Content
    ```

    This way, we inserted a document of type `_doc` into the index called `test` with id `1`. The response JSON should state `"result": "created"`.

2. Query the document with the following command.

    ```powershell
    (Invoke-WebRequest 'http://localhost:9200/test/_doc/1?pretty' -Method Get -UseBasicParsing).Content
    ```

    The result JSON tells us the name of the index, the document's id, and the entire document we inserted in the `_source` field.

    ```json
    {
      "_index": "test",
      "_type": "_doc",
      "_id": "1",
      "_version": 1,
      "_seq_no": 0,
      "_primary_term": 1,
      "found": true,
      "_source": {
        "name": "John Doe"
      }
    }
    ```
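The same `Invoke-WebRequest` pattern works for any other endpoint of the REST API. As an optional extra (not needed for the exercise), the following request returns the overall cluster health the same way:

```powershell
# Optional: query the cluster health through the same REST API
(Invoke-WebRequest 'http://localhost:9200/_cluster/health?pretty' -Method Get -UseBasicParsing).Content
```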
## Create an index and index a document using Kibana
In this part of the exercise, we will create an index for documents containing information about people working in the fast-food industry. Here is a sample document.
### Sample document
When using this sample document, make sure to replace the Neptun code with yours, all uppercase, in the `gender` and `company` fields. The final values should look like `ABC123 female` and `ABC123 Subway`, respectively.
```json
{
  "gender": "NEPTUN female",
  "firstName": "Evelyn",
  "lastName": "Petersen",
  "age": 17,
  "phone": "+1 (900) 503-3892",
  "address": {
    "zipCode": 63775,
    "state": "NY",
    "city": "Lynn",
    "street": "Clarkson Avenue",
    "houseNumber": 503
  },
  "salary": 87217,
  "company": "NEPTUN Subway",
  "email": "evelyn.petersen@subway.com",
  "hired": "09/29/2009"
}
```
We are going to use Kibana's Dev Tools for this part of the exercise. It calls the same REST API we used through PowerShell, but provides a more convenient editor for composing and running queries.
1. A query in Kibana's Dev Tools consists of an HTTP verb and a URL matching Elasticsearch's REST API on the first line, followed by a JSON body. Copy the text below, then press the Play button in the top right corner of the editor.

    ```
    PUT salaries
    {
      "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      },
      "mappings": {
        "properties": {
          "gender": { "type": "keyword" },
          "address.state": { "type": "keyword" },
          "company": { "type": "keyword" },
          "hired": { "type": "date", "format": "MM/dd/yyyy" }
        }
      }
    }
    ```

    The settings we use here are the following.

    - `settings`: We set the number of shards and replicas here. While setting the number of shards is not that important here, we must set the number of replicas to zero to get an index with green health. Elasticsearch refuses to put a shard and its replica on the same node, and we only have a single node.
    - `mappings`: The mapping is the "schema" of the data. It is not mandatory to set it, but unless we specify the mapping, Elasticsearch decides on its own how to interpret data when it is ambiguous. (An optional query to check the stored mapping is sketched after this list.)
        - `gender`, `address.state`, `company`: We know these fields will only ever contain a few select values (e.g., "male" and "female" for gender), therefore we do not want to allow free-text search on them. We can help the system by specifying this.
        - `hired`: Although this is a date field, its representation is not standard, so Elasticsearch would not recognize it by itself. Therefore we have to specify the date format explicitly.
2. We can check the indices with the `GET _cat/indices?v` query. (Use the Dev Tools to execute this query too.)

    Note how the `test` index's health is yellow, while the health of the `salaries` index is green. That is because the default number of replicas is 1, and our single node cannot host both a shard of the `test` index and its replica. (An optional way to turn the `test` index green is also sketched after this list.)
3. Insert the sample document into the created index.

    Before executing this query, do not forget to edit the Neptun code in the `gender` and `company` fields.

    ```
    POST salaries/_doc
    {
      "gender": "NEPTUN female",
      "firstName": "Evelyn",
      "lastName": "Petersen",
      "age": 17,
      "phone": "+1 (900) 503-3892",
      "address": {
        "zipCode": 63775,
        "state": "NY",
        "city": "Lynn",
        "street": "Clarkson Avenue",
        "houseNumber": 503
      },
      "salary": 87217,
      "company": "NEPTUN Subway",
      "email": "evelyn.petersen@subway.com",
      "hired": "09/29/2009"
    }
    ```

    Executing the query will yield a similar result (on the right side of the window). This is the response of the POST query with the id of the inserted document.

    We can use the `_id` value from the response to query the document (your generated `_id` will be different):

    ```
    GET salaries/_doc/eZSmaGkBig5GeeBFsFG6
    ```
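Optionally (this is not part of the required steps), you can paste the following two requests into Dev Tools as well. The first echoes back the mapping Elasticsearch stored for `salaries`; the second, if you would like the `test` index to report green health too, lowers its replica count to zero, mirroring what we did for `salaries`.

```
# Optional: show the mapping stored for the salaries index
GET salaries/_mapping

# Optional: the test index is yellow only because its replica shard cannot be
# allocated on a single node; dropping the replica turns it green
PUT test/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}
```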
## Modify the input data
Before importing the rest of the sample data, add your Neptun code as a prefix to some of the values in the `salaries.json` file too:
- Each `gender` value shall be prefixed, e.g. `"gender":"NEPTUN female"`
- Each `company` value shall be prefixed, e.g. `"company":"NEPTUN McDonalds"`
1. Find the `salaries.json` file in the root of the repository. Open a PowerShell console here.

2. Edit the following command by adding your Neptun code in all uppercase, then execute it in PowerShell (do NOT change the quotation marks, only edit the 6 characters of the Neptun code!):

    ```powershell
    (Get-Content .\salaries.json) -replace '"gender":"', '"gender":"NEPTUN ' -replace '"company":"', '"company":"NEPTUN ' | Set-Content .\salaries.json
    ```

3. Verify the results; it should look similar (with your own Neptun code).

    The file must remain valid JSON! Please double-check the quotation marks around the values. If the result is not correct, you can revert the change made to this file using git (`git checkout HEAD -- salaries.json`) and then retry.

    The modified file shall be uploaded as part of the submission.
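If you would like a scripted spot-check in addition to eyeballing the file, something along these lines should work. This is only a sketch: it assumes you are still in the repository root and that `salaries.json` is the newline-delimited bulk file used in the next part.

```powershell
# Print the first modified source line to confirm the NEPTUN prefix is in place
Get-Content .\salaries.json | Select-String '"gender":"' | Select-Object -First 1

# Parse every non-empty line as JSON; any error here means the file is no longer valid
Get-Content .\salaries.json | Where-Object { $_.Trim() } | ForEach-Object { $null = ($_ | ConvertFrom-Json) }
```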
**IMPORTANT:** Adding your Neptun code is a mandatory step. It will be displayed on the visualizations created in the following exercises.
## Index many documents using the bulk API
And now, let us index these documents.
1. We can add multiple documents to the index using the bulk API. Issue the following command from the PowerShell window in the folder of the starter solution. (A PowerShell variant that checks the response programmatically is sketched after this list.)

    ```powershell
    Invoke-WebRequest 'http://localhost:9200/_bulk' -Method Post -ContentType 'application/json' -InFile .\salaries.json -UseBasicParsing
    ```
2. Check the response for errors. You will see a similar message if everything is OK (note the `errors` field in the response; it should be `false`).

    If you instead see errors in the response, it means the changes made to the source file resulted in an invalid JSON file. If this happens, you need to start over:

    1. Delete the `salaries` index by executing a `DELETE salaries` request in Kibana.
    2. Go back to the index creation step, then repeat the index creation and the indexing of the single document.
    3. Reset the changes made to the `salaries.json` file, and retry the replacement with special care regarding the quotation marks.
    4. Now, retry the bulk index request.
3. Execute a search with the query `GET salaries/_search` (using Kibana). This returns a few documents and tells us how many documents there are in total (since this search has no filter, the number of matching documents equals the total number of documents). There should be 1101 documents.

    If you see fewer documents, you might try the Refresh API to ensure Elasticsearch has finished all indexing operations: execute a `POST salaries/_refresh` request, then check the count again. If it is still not correct, you need to start over.
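For reference, here is a sketch of the bulk request from the first step rewritten so that the response can be inspected directly in PowerShell. It uses the same endpoint and file as above; run it instead of, not in addition to, the original command, otherwise the documents get indexed twice.

```powershell
# Send the bulk request and parse the JSON response
$response = Invoke-WebRequest 'http://localhost:9200/_bulk' -Method Post -ContentType 'application/json' -InFile .\salaries.json -UseBasicParsing
$result = $response.Content | ConvertFrom-Json

# "errors" is False when every operation in the bulk request succeeded
$result.errors

# Number of operations sent in this request
$result.items.Count
```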