As technology has grown into every aspect of our lives, the amount of data being reported back has grown with it. The sheer volume means that manually sifting through it within a reasonable time frame is impossible. Splunk is software that aims to sort through this data and highlight any interesting trends or patterns within it. A number of different versions of Splunk are available, tailored to the type of data you’re looking to analyse, and it can be expanded with plug-ins posted on Splunkbase. Luckily, Splunk offers a free version, allowing you to download and learn Splunk without paying the significant fee for one of the paid variants.
Downloading, Installing, and Launching
Splunk is available for all major operating systems (Linux, Windows, and macOS), but I’ll focus on macOS. To download Splunk you’ll need to sign up for an account on the Splunk website; once that’s complete, you’ll have the choice of downloading the installer for your preferred platform.
I chose to install Splunk from the command line on macOS by downloading the tar file and running this command to extract it into the Applications directory:
$ tar xvzf splunk-version.tgz -C /Applications
With Splunk installed, you’ll then need to start the Splunk process. If you installed Splunk to the Applications directory as I did, you can start it by running:
$ /Applications/splunk/bin/splunk start
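The same binary accepts a few other lifecycle commands that are worth knowing; a quick sketch (run from the same bin directory):

$ /Applications/splunk/bin/splunk status   # check whether the Splunk process is running
$ /Applications/splunk/bin/splunk restart  # restart, e.g. after a configuration change
$ /Applications/splunk/bin/splunk stop     # shut the process down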
During the startup process you’ll be asked to assign the installation an admin password of your choice. For ease, you can add an alias to the .bash_profile file within your home directory that will allow you to start and stop the process with a simple ‘splunk’ command. Use nano to open your .bash_profile:
$ nano ~/.bash_profile
and add the line

alias splunk="/Applications/splunk/bin/splunk"

then reload your profile with “source ~/.bash_profile”. You’ll then be able to run “splunk start” and “splunk stop” from anywhere.
With everything installed and started, navigate to 127.0.0.1:8000 in your browser and you’ll be able to log in with the username admin and the password you entered during startup.
With Splunk set up, you can now import data into it to be analysed. Splunk provides sample data on its site to be imported (the instructions on the Splunk site explain the upload process concisely).
Searching the Data
By default Splunk will have the Search & Reporting app installed, which will allow you to get started in finding something useful in the data. Clicking the Data Summary button on the right-hand side presents you with three headings: Hosts, Sources, and Sourcetypes. A host is the host name, IP address, or fully qualified domain name of the machine from which the event originated. The source is the path, port, or script where the event originated. Finally, the sourcetype describes what type of data it is.
I’d recommend trying a few of the entries within each heading to find out what is produced. Remember that if you aren’t shown any data after clicking on an entry, you’ll likely need to expand the time frame in the top right from “Last 24 hours” to a greater period; I chose All time for ease, as the data set isn’t massive and it doesn’t make my Mac struggle.
An example of what can be found in this search view alone is the “access_combined_wcookie” sourcetype. Once this is selected, you’ll be able to see all the fields this data contains. Two fields that provide particularly interesting data are “action” and “useragent”: “action” allows you to see some of the most common user behaviours on a website, while “useragent” allows you to prioritise development for certain browsers, or for desktop or mobile.
Returning to the front page of the Search & Reporting app allows you to query the data. For example, I used the query “categoryid=accessories” “action=addtocart” to find events where the addtocart action was triggered while in the accessories section.
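Search terms like these can also be combined with SPL commands after a pipe. As a sketch, this counts those add-to-cart events per product and sorts the busiest products first (productId is a field in the tutorial data; exact field names and casing depend on your own data):

categoryid=accessories action=addtocart | stats count BY productId | sort -count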
Next we’ll create a chart using the data, but before this can be done we’ll need to import a lookup file. A lookup file allows you to pair the anonymous-looking product ID data with a more descriptive product name and description, along with other fields. The Splunk tutorial explains how to do this clearly and in detail.
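Once the lookup is configured, it can also be applied explicitly in a search with the lookup command. A sketch, assuming the tutorial’s lookup is named prices_lookup and is keyed on productId (adjust the lookup and field names to match your setup):

sourcetype=access_* action=purchase | lookup prices_lookup productId OUTPUT productName price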
Creating a chart
To create a chart with the Search & Reporting tool, a query needs to be entered in the search field. For this data I used the example query listed in the Splunk tutorial, as it has at least a basic level of complexity, but with a time frame of All time, as otherwise nothing was shown for me:
sourcetype=access_* status=200 | chart count AS views count(eval(action="addtocart")) AS addtocart count(eval(action="purchase")) AS purchases by productName | rename productName AS "Product Name", views AS "Views", addtocart AS "Adds to Cart", purchases AS "Purchases"
Switch to the Statistics tab to see more easily that this command counts, for each product, the number of “addtocart” and “purchase” instances in the data, then displays them alongside the number of views. Moving to the Visualization tab allows you to present this data in a more visual form; I chose the column chart format, but try different charts if you think the data may be better presented another way. Splunk also has more in-depth data visualisation options, but I won’t go into them at this time.
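If you’d rather see how those events are distributed over time instead of per product, chart can be swapped for timechart; a minimal sketch using the same data (field names as above are assumptions from the tutorial data):

sourcetype=access_* status=200 action=purchase | timechart count BY categoryId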
Create a Dashboard
You can create dashboards within Splunk so that you have quick, visual access to your data. For this I’ve again used the example query on the Splunk website, but with the time frame set to All time:
sourcetype=access_* status=200 action=purchase | top categoryId
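By default, top returns the ten most common values; if you want a simpler pie chart for the dashboard, you can cap the number of slices. A sketch:

sourcetype=access_* status=200 action=purchase | top limit=5 categoryId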
With the query run, move over to the Visualization tab and change the data so it’s shown as a pie chart, then click Save As in the top right and choose “Dashboard”. You can fill in the fields as you choose, but I’ve included a screenshot of how I filled out the data. With this saved, you have the option of returning to your Splunk homepage and adding this freshly saved dashboard so it’s viewable straight away after logging into Splunk.
My blog post only goes over what Splunk can do at an incredibly basic level; what’s possible with Splunk is far greater than what I’ve outlined here, and it is used every day by a large number of companies to analyse the massive volumes of data they collect. For someone in a network security role, sorting through the large amount of log data collected to find what could be a significant threat to an organisation is a constant task, and tools such as Splunk are an excellent way to respond to potential threats faster while reducing the chance of a benign event being blown up into something significant.