Exercise 4: Elasticsearch queries¶
The purpose of this exercise is to use the search and aggregation capabilities of Elasticsearch to answer questions of the data imported previously.
Query JSON
You will be requested to save the query JSON in the following exercises. The image below shows you what that means: the VALID JSON sent to Elasticsearch, WITHOUT the first line, and definitely NOT the result JSON that is on the right side in Kibana.
Search syntax
Please use the search syntax of Elasticsearch that defines the criteria in JSON body. Do not use the URL query method filtering.
All of the following queries are sent to the salaries/_search
endpoint with a GET
method. This means that the first line in Kibana Dev Tools will be GET salaries/_search
for all of them.
"SELECT *" in Elasticsearch¶
Issue the following query.
{
"query": { "match_all": {} },
"from": 0,
"size": 10,
"sort": ["_doc"]
}
-
query
: This provides the filter for the query. Think of theWHERE
clause in SQL. -
from
andsize
: These can be used for paging results. It is important to note that there is no way to query all documents using Elasticsearch. If you omit thesize
value, it defaults to 10. -
sort
: This can be used to sort the results.
You do not have to save this query.
a) Who are the top 5 workers with the best salaries?¶
Change the previous query:
- we need the first 5, so the size will be 5;
- the sort has to be performed on field salary, for which use the following syntax:
{
...
"sort": [{ "salary": { "order": "desc" } }]
}
Execute this query and verify the results.
SUBMISSION
Save the final query JSON as ex4-a.json
.
b) Who are the top 5 workers at McDonalds aged between 18 and 30 with the best salaries?¶
Building on the previous query, replace the query
with a boolean query as follows:
{
"query": {
"bool": {
"must": { "match_all": {} },
"filter": [
{ "range": { "age": { "gte": 18, "lte": 30 } } },
{ "term": { "company": "NEPTUN McDonalds" } }
]
}
},
...
}
The difference between this query and the previous one is that we have to apply some filters to the results.
- The
age
must be between 18 and 30. - The
company
must be NEPTUN McDonalds prefixed with your Neptun code.
Execute this query and verify the results.
SUBMISSION
Save the final query JSON as ex4-b.json
.
c) Are there more men or women working for these companies? Is there a difference between the average salaries?¶
We have to use term aggregations to answer these questions. We can use the following query.
{
"size": 0,
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender"
},
"aggs": {
"average_salary": {
"avg": {
"field": "salary"
}
}
}
}
}
}
Specifying "size": 0
means that we do not want any matching documents as we only need the aggregation results. (There are no search criteria specified here, but there could be.)
This query first groups the documents by the gender
value and then calculates the average of the salary
values within the groups.
SUBMISSION
Based on the results, what is the average salary of women? Type your answer into ex4-c.txt
. Include only the number and nothing else in this text file (e.g., 123.45
). Don't worry about decimal digits, the value will be rounded for comparison. (You do not need to save the query JSON here.)