Kusto Query Language 101
This article aims to provide the fundamentals of the Kusto query language for searching complex data patterns and generating analytics in the Azure cloud.
As a cybersecurity analyst, I rely on Splunk as one of the most important tools for my daily tasks. Naturally, over the years, I have become comfortable with Splunk SPL (Search Processing Language). However, I have always been open to alternatives to SPL or a better query language. With the rise of the cloud, I recently participated in Azure training on Azure Sentinel and the Kusto query language, which offers at least (if not more than) the capabilities available in Splunk SPL. Hence, this article aims to share my Kusto learning, covering fundamentals and common commands, with fellow cybersecurity analysts.
The article contains three main parts:
- Kusto Fundamentals: provides some key information about this query language
- Searching and Presentation: provides queries to search for data and to present data in a structured way so that an analyst can derive information easily.
- Analytics and Charting: introduces the power of creating analytics, summarizing events, and visualizing data.
The Kusto query language is organized in a SQL-like hierarchy of databases, tables, and columns, which also makes its syntax somewhat SQL-like.
Kusto is a powerful query language to not only search for complex patterns, but also create complicated analytics. The following sections will demonstrate the power of Kusto.
A Kusto query contains one or more statements that return tabular results. In general, a statement starts with a table name, followed by pipes (|) for subsequent processing. Each statement is separated by a semicolon, as in the example below:
```kusto
StormEvents                                        // statement 1 starts from table StormEvents
| where StartTime > datetime(2007-01-14 00:11:00)  // subsequent processing after the pipe
;                                                  // semicolon ends statement 1
StormEvents                                        // statement 2
| where EndTime < datetime(2007-01-14 08:30:00)
```
As seen in the example above, the query contains two statements; hence, two tables are returned, one per statement.
Another important concept in Kusto is the “term”. Since a security analyst has to work a lot with strings, understanding terms is essential. Each string value is broken into maximal sequences of ASCII alphanumeric characters that are at least three characters long, and each of those sequences is called a term. For example, if column “name” contains the string value “this is a test”, Kusto will index two terms in the string, namely “this” and “test”. Hence, when searching, it is more efficient to search on indexed terms (e.g. name has “test”) rather than to scan over the whole string value (e.g. name contains “test”).
Basic logical operators can be found in the figure below. For further details about string operators, I would highly recommend checking this page.
For learning purposes, Azure provides the free Data Explorer service (https://dataexplorer.azure.com) with some data samples to practice Kusto queries. In the following sections, I use the StormEvents sample data in Data Explorer to demonstrate Kusto syntax.
The full Kusto query language documentation can be found on the official Microsoft page — https://docs.microsoft.com/en-us/azure/data-explorer/kusto/concepts/
Search and Presentation
Where operator (Splunk=where): filters results based on logical comparisons. It has an alias called filter. In the example below, where is used to filter events within the StartTime/EndTime range and an EventType containing the term “flood”.
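A sketch of such a filter (the datetime bounds here are illustrative, not from the original figure):

```kusto
StormEvents
| where StartTime > datetime(2007-01-01) and EndTime < datetime(2007-02-01)
| where EventType has "flood"
```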
Search operator (Splunk=search): offers a similar capability to where. However, its syntax is clearer and more intuitive when looking for terms and free text based on regex. Personally, I would use where for field value comparisons and search for string-related lookups.
Note: this is quite different from Splunk, in which you are advised to use search, and to only use where if you know what you are doing and understand their differences, which are subtle but important.
The side-by-side usage of search and where is presented in the figure below:
The example below searches for the term “flood” in all columns, for values in column “State” that start with “ken”, and for EventType values that match the defined regex.
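Those three lookups could be sketched as follows (the regex is a made-up illustration, since the original one is not shown here):

```kusto
StormEvents
| search "flood"                        // term search across all columns
| where State startswith "ken"          // e.g. KENTUCKY
| where EventType matches regex @"Flood.*"
```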
Project operator (Splunk=table): selects the columns to be presented.
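For instance, to keep only a few columns (this particular column choice is mine, not from the original figure):

```kusto
StormEvents
| project StartTime, EndTime, State, EventType
```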
Order operator (Splunk=sort): sorts the returned results based on the provided columns. The example below sorts the returned results by the values of two columns (State and EventType) in ascending order.
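That sort could be written as:

```kusto
StormEvents
| order by State asc, EventType asc
```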
Ago operator (Splunk=earliest, latest): it is common for security analysts to investigate events within relative time windows, such as the last 24 hours or the last 30 days. ago(timespan) serves that purpose by subtracting the provided timespan (e.g. 10s, 30m, 24h, 30d) from the current UTC time.
```kusto
StormEvents
| where StartTime > ago(24h)
```
Distinct operator (Splunk=dedup): deduplicates data based on the provided column names.
```kusto
StormEvents
| distinct EventType, State   // deduplicate returned values on EventType and State
```
Extend operator (Splunk=eval): creates a new calculated column during processing. The example below shows a new column, called duration, being created.
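Such a computed column could be created like this:

```kusto
StormEvents
| extend duration = EndTime - StartTime   // timespan between start and end of a storm
```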
Let operator: assigns a name to a statement, which can then be referred to by name in other statements (think of it like a function in programming). This is quite an exciting capability for which I couldn’t think of an equivalent command/feature in Splunk (partly a Splunk macro). The example below demonstrates the power of let, which binds the name “texas” to a whole expression and allows its usage in the next statement.
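A sketch of that binding (the filter and the follow-up aggregation are my guesses at what the original figure showed):

```kusto
let texas = StormEvents | where State == "TEXAS";
texas
| summarize count() by EventType
```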
Top operator (Splunk=top): returns the top N results sorted by the specified columns. The example below returns the 3 rows with the highest duration values.
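Since duration is a computed column (created with extend, as described earlier), a sketch could look like:

```kusto
StormEvents
| extend duration = EndTime - StartTime
| top 3 by duration desc
```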
Join operator (Splunk=join): is similar to join in Splunk and SQL; it returns a new table by combining two tables based on common column(s). There are quite a few join kinds, as shown in the figure below, which determine the merging behavior and the returned results.
The example below matches rows from the “StormEvents” and “covid19” tables over the common column “State”; the results contain the State, EventType (from the “StormEvents” table), and StateCode (from the “covid19” table) columns.
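A sketch of that join (I am assuming a “covid19” table with State and StateCode columns, as described above):

```kusto
StormEvents
| join kind=inner (covid19 | project State, StateCode) on State
| project State, EventType, StateCode
```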
Extract operator (Splunk=rex): is used to extract substrings based on a regular expression.
Syntax: extract(regex, captureGroup, text [, typeLiteral])
The example below tries to extract a number after FM in the EventNarrative string values.
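That extraction might be sketched as follows (the regex is my assumption of a pattern matching “a number after FM”, and the output column name fmValue is made up):

```kusto
StormEvents
| extend fmValue = extract(@"FM(\d+)", 1, EventNarrative)   // capture group 1 holds the digits
```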
Analytics and Charting
Summarize operator (Splunk=stats): creates various analytics over the examined data, such as min, max, average, and count. Professional security analysts use this capability heavily to gain insight into a large number of events rather than going through each event manually.
The example below demonstrates the power of this operator by counting all storms grouped by State and EventType
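That grouping could be written as:

```kusto
StormEvents
| summarize count() by State, EventType
```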
Bin operator (Splunk=bin): is frequently used with summarize if you want to examine events at a more granular level. For example, instead of counting all connections to a server, you may want to check the number of connections on an hourly basis (in Splunk, either the bin or timechart (which runs bin in the background) command achieves a similar goal).
Syntax: bin(value, binsize). Given a value of 11 and a binsize of 2, the value would be categorized into the bin for values between 10 and 12.
The example below shows the breakdown counting of storms with a duration in bin size of 1 hour.
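A sketch of those hourly duration bins:

```kusto
StormEvents
| extend duration = EndTime - StartTime
| summarize count() by bin(duration, 1h)   // one bucket per hour of storm duration
```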
Render operator: is frequently used with summarize to visualize aggregated information.
Syntax: render charttype (e.g. timechart, barchart etc.). The full list can be found here.
I reuse the above example to visualize the count of each bin as a line chart.
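Putting bin and render together (linechart is one of the supported chart types):

```kusto
StormEvents
| extend duration = EndTime - StartTime
| summarize count() by bin(duration, 1h)
| render linechart
```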
This article has presented some common operators that a security analyst can apply to enhance their investigations. The sections above are not meant to be an exhaustive list; the main goal is to lay a foundation to start from. It is highly recommended to check the official documentation page if you need to understand an operator or look for a new operator that can process data as you need.