The first step in anomaly detection is to determine what is normal. To detect abnormal web traffic hitting the web server – whether strange connection patterns or unexplained spikes in traffic load – there must first be an understanding, and a common consensus, of what normal looks like. This is where the network administrator can put the vast amounts of data in the IIS logs to work. By analyzing the IIS logs and creating a baseline, administrators gain benchmark figures for comparison when assessing a potential security event. For example, suppose the IIS logs show 23,456 Syncs at 4%. Is that a good figure?
It is impossible to say without context and a baseline for comparison, so the first step an administrator must take when applying security is to establish what normal behavior is by creating a baseline.
Fortunately, the sheer volume of data in the IIS log provides the basis for creating a benchmark. To parse and analyze the log, the administrator needs tools that can mine the data and reduce it to summary totals. The tool used in this article is Microsoft’s own log-parsing product, Log Parser.
Log Parser uses a universal, SQL-like query language to access text-based data in log files. An understanding of SQL is helpful but not required, as there are many sample scripts that an administrator can tweak to fit his own environment. The scripts in this article are standard Log Parser scripts that an administrator can run against any IIS log file.
The Log Parser syntax takes the following general form, passing the query in a script file and substituting the log file name for the %logfile% parameter used in the scripts below (UriHits.sql is a placeholder script name):

LogParser.exe file:UriHits.sql?logfile=ex1401*.log -i:IISW3C
The command line above will analyze the log file ex1401*.log, the W3C extended log file for January 2014. This provides initial data for the baseline; however, the larger the sample, the more accurate the baseline. When creating a baseline for a web application, certain characteristics and criteria are of interest at the web application layer: unique client IPs, top client IPs, user agent characteristics, IP-to-user-agent characteristics, and the total number of requests. Therefore, the first parse of the IIS log will extract that data by focusing on the number of hits per page.
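Outside Log Parser, the same W3C extended format can be read with a few lines of code. The sketch below, using a synthetic sample log, shows how the #Fields directive names the columns that the scripts in this article refer to (c-ip, cs-uri-stem, and so on); it is an illustration, not a replacement for Log Parser.

```python
# Minimal sketch of reading a W3C extended IIS log: field names come from
# the '#Fields:' directive, and every data line maps onto them.
# The sample log below is synthetic, for illustration only.
from io import StringIO

SAMPLE_LOG = """\
#Software: Microsoft Internet Information Services 8.5
#Fields: date time c-ip cs-method cs-uri-stem cs-host sc-status
2014-01-05 09:13:02 10.1.1.5 GET /default.htm www.example.com 200
2014-01-05 09:13:04 10.1.1.9 GET /login.aspx www.example.com 200
2014-01-05 09:13:07 10.1.1.5 GET /default.htm www.example.com 200
"""

def parse_w3c(lines):
    """Yield one dict per request, keyed by the most recent #Fields header."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]      # field names follow the directive
        elif not line or line.startswith("#"):
            continue                        # other directives and blank lines
        else:
            yield dict(zip(fields, line.split()))

records = list(parse_w3c(StringIO(SAMPLE_LOG)))
print(len(records), records[0]["c-ip"], records[0]["cs-uri-stem"])
```

Each record is then a dictionary such as `{"c-ip": "10.1.1.5", "cs-uri-stem": "/default.htm", ...}`, which the later summaries can group and count.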
The query to get the number of URI (Uniform Resource Identifier) hits is shown below. This is a standard Log Parser script that the administrator can run as is, without changing any parameters.
URI HITS SCRIPT
SELECT cs-uri-stem AS URI-Stem, cs-host AS HostName, COUNT(*) AS Hits INTO DATAGRID FROM %logfile% GROUP BY cs-uri-stem, cs-host ORDER BY Hits DESC
This will return a list of web pages, hit counts, and host names, similar to this example.
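For readers without Log Parser to hand, the grouping the URI Hits script performs can be approximated in a few lines of Python; the (cs-uri-stem, cs-host) pairs below are synthetic sample data, not real log output.

```python
# Rough Python analogue of the URI Hits script: count requests per
# (cs-uri-stem, cs-host) pair and sort descending by hits.
from collections import Counter

# (cs-uri-stem, cs-host) pairs as they might be read from an IIS log
requests = [
    ("/default.htm", "www.example.com"),
    ("/login.aspx",  "www.example.com"),
    ("/default.htm", "www.example.com"),
    ("/default.htm", "www.example.com"),
    ("/login.aspx",  "www.example.com"),
]

hits = Counter(requests)                       # GROUP BY cs-uri-stem, cs-host
for (uri, host), count in hits.most_common():  # ORDER BY Hits DESC
    print(uri, host, count)
```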
In order to understand the normal request patterns directed towards a site an administrator should consider the distribution of client to requests. The following query will extract that information.
CLIENT_IP to REQUESTS SCRIPT
SELECT c-ip AS ClientIP, cs-host AS HostName, cs-uri-stem AS URIStem, sc-status AS Status, cs(User-Agent) AS UserAgent, COUNT(*) AS Requests INTO DATAGRID FROM %logfile% GROUP BY c-ip, cs-uri-stem, cs-host, cs(User-Agent), sc-status ORDER BY Requests DESC
This query extracts the client IP, host name, URI, status, and user agent, counts the requests for each combination, and outputs the result to the display grid.
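The GROUP BY / ORDER BY shape of the script above can be sketched the same way in Python; the rows below are synthetic sample records, not real log data.

```python
# Hedged Python analogue of the CLIENT_IP to REQUESTS script: group records
# by (client IP, host, URI, status, user agent) and count each combination.
from collections import Counter

rows = [  # (c-ip, cs-host, cs-uri-stem, sc-status, cs(User-Agent))
    ("10.1.1.5", "www.example.com", "/default.htm", "200", "Mozilla/5.0"),
    ("10.1.1.5", "www.example.com", "/default.htm", "200", "Mozilla/5.0"),
    ("10.1.1.9", "www.example.com", "/login.aspx",  "404", "curl/7.30"),
]

per_client = Counter(rows)               # GROUP BY all five fields
for key, n in per_client.most_common():  # ORDER BY Requests DESC
    print(*key, n)
```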
A typical result set from the query above would look something like this.
The figures returned are useful for establishing what normal traffic looks like; a significant deviation can indicate abnormal request patterns from individual clients. One way to get a bird's-eye view of the ratio of unique client IPs to total requests is to run this query.
UNIQUE/TOTAL REQUEST SCRIPT
SELECT COUNT(DISTINCT c-ip) AS UniqueIPs, COUNT(ALL c-ip) AS TotalRequests INTO DATAGRID FROM %logfile%
This script counts only unique IPs, then counts all requests, which provides a benchmark figure of unique clients to total requests. A significant change in this ratio may indicate unusual request patterns from one or more clients. If the administrator does detect suspicious request patterns from one host or a group of hosts, the CLIENT_IP to REQUESTS script can be amended to include a reverse DNS lookup to identify the domain behind an IP address. Reverse DNS lookups are slow, so it is not advisable to run them against full logs.
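As a rough cross-check of the script above, the unique/total ratio can be computed in plain Python, and a reverse DNS helper added via socket.gethostbyaddr; the IPs below are synthetic, and the lookup is wrapped in try/except because many addresses have no PTR record.

```python
# Sketch of the unique-IP / total-request ratio from a list of client IPs,
# plus the optional (and slow) reverse DNS lookup mentioned above.
import socket

client_ips = ["10.1.1.5", "10.1.1.5", "10.1.1.9", "10.1.1.5", "172.16.0.7"]

unique_ips = len(set(client_ips))          # COUNT(DISTINCT c-ip)
total_requests = len(client_ips)           # COUNT(ALL c-ip)
ratio = unique_ips / total_requests
print(unique_ips, total_requests, round(ratio, 2))

def reverse_dns(ip):
    """Slow: one DNS round-trip per address; run on suspects, not full logs."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None                         # no PTR record for this address
```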
By collecting and analyzing the IIS logs, an administrator can build up a baseline and a good understanding of what normal, well-behaved traffic looks like. However, to get a picture of typical levels of poorly behaved requests, the administrator must also create a baseline for the HTTP.sys error log. This is important because IIS does not log rejected requests: if an attacker made 456 successful requests from IP 10.1.1.5, these would appear in the IIS log; but if the attacker also made 209874 requests that were rejected, those would not appear in the IIS log – they would be recorded only in the HTTP.sys error log. Similarly, the URLScan logs should be analyzed and made part of the baseline.
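As a sketch of baselining the HTTP.sys error log, the snippet below counts rejected requests per client IP. The field layout and sample lines are a simplified, illustrative subset, not a verbatim HTTPERR log; the real file carries a #Fields directive just like the IIS logs.

```python
# Hedged sketch: summarise rejected requests per client IP from an
# HTTP.sys error log. Sample data is synthetic and simplified.
from collections import Counter

SAMPLE_HTTPERR = """\
#Fields: date time c-ip s-reason
2014-01-05 09:14:01 10.1.1.5 Timer_ConnectionIdle
2014-01-05 09:14:02 10.1.1.5 Connection_Dropped
2014-01-05 09:14:03 10.9.9.9 Timer_ConnectionIdle
"""

rejects = Counter()
for line in SAMPLE_HTTPERR.splitlines():
    if line.startswith("#") or not line:
        continue                    # skip directives and blank lines
    parts = line.split()
    c_ip = parts[2]                 # client IP column in this sample layout
    rejects[c_ip] += 1

print(rejects.most_common())        # rejected requests per client IP
```

A client that appears modest in the IIS log but dominates this count is exactly the 456-successful / 209874-rejected pattern described above.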
When building a baseline, the administrator must collate a sufficient body of data over time to leverage the power of large numbers in leveling out discrepancies. The larger the sample, collected over many months, the more accurate the baseline will be. IIS logs provide a wealth of information by default (and can be configured to collect even more), and by using Microsoft’s Log Parser an administrator can build an accurate baseline of what is normal.