We had a need in our environment for Splunk to be able to do lookups against some open source threat feeds. Basically, we take a data source like firewall logs and then check whether any of the computers in the environment have tried to talk to a known ‘bad’ IP or URL listed on these threat feeds.
First off, let’s start with the feeds. What is a threat feed? A threat feed is a service, provided by security software companies or volunteers, that publishes a list of URLs or IP addresses known to host malware, act as Command & Control servers, or serve some other nefarious purpose. There are many feeds available, some free, some commercial.
As explained in my previous post, the threat feeds live on the internet, and our Splunk environment is on a separate network that doesn’t have internet access. Hence the reason for sneakernetting files across.
These threat feeds were mainly to be used for enriching our client data, potentially identifying large botnet infections in certain network segments.
So here’s the problem: how do I get these threat feeds downloaded from the internet and then updated on our Splunk environment?
Of course, Google is your friend.
I ended up finding this blog post, which describes exactly what I need! This turned out to be an excellent starting point for me and I suggest you go read it.
There was just one problem: the author keeps his threat feed updates stored in a separate Splunk index and then looks up data using subsearches.
This would mean indexing new data every day, and indexing means your Splunk license usage increases. Which could be a potential problem.
(For those who don’t know, Splunk charges a license fee based on the amount of data you index on a daily basis. Also, in South Africa, we are charged in US dollars for the license… kind of expensive when you look at the exchange rates.)
So I needed to store my threat feeds somewhere else, as well as have Splunk lookup some data against the data stored in the threat feeds.
Splunk has an excellent app called Splunk DB Connect which basically gives Splunk the functionality to import data from a database as well as do lookups in database fields. Perfect.
So once I have sneakernetted the threat feeds across, a bash script on the other end processes the files and prepares CSV files for importing into one of two MySQL tables in the same database. I’ve differentiated between IP-based lookups and URL-based lookups: some of our data sources that feed into Splunk report on IPs, others on URLs. I’ll be showing you the IP feeds example.
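The processing script itself will differ per feed, but to give you an idea, here’s a minimal sketch of the kind of thing mine does. The raw feed format assumed here (one "ip|category" entry per line, with '#' comment lines) is hypothetical — every feed publishes its own layout, so the parsing needs adjusting per feed:

```shell
#!/bin/bash
# Sketch of the feed-processing step. The "ip|category" input format and
# the feed_to_csv helper name are assumptions for illustration only.

# Usage: feed_to_csv RAWFILE "Feed Name" >> ipfeeds.csv
feed_to_csv() {
  local feedfile="$1" feedname="$2"
  # Skip comment and blank lines, then emit: ip,feedname,extrainfo
  grep -Ev '^(#|$)' "$feedfile" |
    awk -F'|' -v feed="$feedname" '{ print $1 "," feed "," $2 }'
}
```

You’d call it once per downloaded feed, appending everything into one CSV ready for import.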
Here’s the table structure I’m using:
CREATE TABLE ipfeeds (
    ip VARCHAR(18),
    feedname VARCHAR(25),
    extrainfo VARCHAR(25),
    INDEX USING BTREE (ip)
);
I’m not a MySQL optimisation expert, but I did see significant lookup speed performance gains when I indexed the ip field using BTREE. Also note that this is not a unique index, as there is some overlap of IP addresses being reported by the different feeds. Some feeds also provide some extra information (possibly the category of an IP, or the severity of the malware hosted), which I store in the extrainfo field. This just enriches my data from Splunk a bit.
A snippet of a CSV file that I generated from processing the feed downloads, looks like this:
18.104.22.168,AlienVault Reputation Snort,Scanning Host
22.214.171.124,AlienVault Reputation Snort,Scanning Host
126.96.36.199,AlienVault Reputation Snort,C&C
188.8.131.52,Zeus C&C Tracker Blocklist
The CSV file contains no field headers, and the order of the field data corresponds to the order of the fields in the database table.
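Because the import relies purely on field order, it can be worth sanity-checking the CSV before loading it. This check is my own extra step, not required by the workflow — it flags rows that don’t have two or three comma-separated fields (some feeds, like the Zeus entry above, legitimately omit the extrainfo field):

```shell
# Optional sanity check: print any CSV row that doesn't have 2 or 3
# comma-separated fields, along with its line number.
awk -F',' 'NF < 2 || NF > 3 { print "bad row " NR ": " $0 }' ipfeeds.csv
```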
I first truncate the table to clear it out and then run mysqlimport to pull the data into the table. Note that mysqlimport takes the database name as its argument and derives the target table name from the CSV file’s base name, so the file must be called ipfeeds.csv to load into the ipfeeds table:
mysql --user=User --password -h localhost -e "TRUNCATE TABLE ipfeeds;" dbname
mysqlimport --local --fields-terminated-by=, --user=User --password -h localhost dbname ipfeeds.csv
Right, so that updates my ipfeeds table. Now for the Splunk part.
I’m not going to explain how to get Splunk DB Connect up and running. For more information on that, head on over here.
Once you’ve got it up and running though, you need to create a database connection. Under the Splunk DB Connect app, navigate to Settings and then Manage Database Connections. Here’s what my setup looks like:
Once your connection has been setup, you need to setup a lookup. Navigate to Settings, then Manage Database lookups. My lookup looks like this:
You need to set the correct Splunk permissions for the connection and the lookups table to restrict it to the users that need to be able to do the lookups.
How do we now use this lookup with a Splunk search? Here’s a simple Splunk search:
sourcetype="firewall" | rename dest_ip AS ip | lookup local=1 ipfeeds ip OUTPUT feedname, extrainfo
This adds the fields feedname and extrainfo to every event whose IP address has a match in the database. You need to rename the field you want to look up so that it matches the field name in the database, in this case ip.
Let’s pull some statistics on those fields. Maybe we want to know which host on the network is connecting to these known bad IPs the most:
sourcetype="firewall" | rename dest_ip AS ip | lookup local=1 ipfeeds ip OUTPUT feedname, extrainfo | stats count by src_ip, feedname, extrainfo | sort -count
That’s pretty much it. Whenever there is a hit on an IP in the database, Splunk will show you.
This can be a pretty powerful tool when trying to identify what the machines on your network might be doing. You can also adjust firewall rules based on the results of these searches. Maybe your hosts are connecting to a known bad IP on a commonly allowed port like port 80 (HTTP), and you actually need to drop traffic going to these known bad hosts based on the IP address, not the destination port.
Hopefully this post will help someone else achieve what I did.