
Joint analysis of Squid and Netflow log files using HTTP client port information

Alexandr S. Volkov (Petrozavodsk State University, Russia),
Yury A. Bogoyavlensky (Petrozavodsk State University, Russia)

Recently, several authors have turned their attention to the joint analysis of traffic data (log files) collected at different levels of the Internet protocol stack. This kind of analysis yields new information about the network and raises the accuracy of traffic analysis algorithms, thanks to the additional information contained in the combined traffic data.

This report presents the key features of an algorithm for the joint analysis of Netflow data and the access.log file of the Squid proxy server. The proposed approach provides the following new features:

Using the access.log file and the Netflow data, and knowing the divergence between the system clocks at any moment, we can unambiguously find the basic Netflows (the Netflows corresponding to data transfer between a client and the proxy server) for any access.log record. Based on the access.log data, we can then find all collateral Netflows, that is, those corresponding to the proxy server's lookups and retrievals of the requested object. The number of generated Netflows lies between $2$ and $2m+2n+6$ and depends on the settings of the router, the local proxy server and the DNS server, on the network topology, and on the state of the caches of the local and sibling proxy servers. Here $m$ and $n$ are the numbers of ICP and DNS requests, respectively, that pass through the router: ICP requests from the local proxy server to sibling proxies, and DNS requests from the local DNS server to authoritative DNS servers.
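The search for basic Netflows described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the record types `Flow` and `LogRecord`, the function `basic_flows`, the `tolerance` parameter, and the assumption that the client port is available in each access.log record are all hypothetical and introduced only for illustration. The clock divergence is modelled as a single constant offset (router clock minus proxy clock).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:               # hypothetical, simplified Netflow record
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    start: float          # seconds, router clock
    end: float

@dataclass(frozen=True)
class LogRecord:          # hypothetical, simplified access.log record
    client_ip: str
    client_port: int
    start: float          # seconds, proxy clock
    end: float

def basic_flows(record, flows, proxy_ip, proxy_port,
                clock_offset=0.0, tolerance=1.0):
    """Return flows between the client endpoint and the proxy endpoint
    that overlap the record's time interval after clock correction."""
    client = (record.client_ip, record.client_port)
    proxy = (proxy_ip, proxy_port)
    lo = record.start + clock_offset - tolerance
    hi = record.end + clock_offset + tolerance
    matched = []
    for f in flows:
        endpoints = {(f.src_ip, f.src_port), (f.dst_ip, f.dst_port)}
        if endpoints == {client, proxy} and f.start <= hi and f.end >= lo:
            matched.append(f)
    return matched

# Illustrative data only: the router clock runs 2 s ahead of the proxy clock.
flows = [
    Flow("10.0.0.5", 40001, "10.0.0.1", 3128, 102.0, 103.0),  # client -> proxy
    Flow("10.0.0.1", 3128, "10.0.0.5", 40001, 102.1, 103.0),  # proxy -> client
    Flow("10.0.0.5", 40002, "10.0.0.1", 3128, 102.0, 103.0),  # other client port
    Flow("10.0.0.5", 40001, "10.0.0.1", 3128, 500.0, 501.0),  # port reused later
]
record = LogRecord("10.0.0.5", 40001, 100.0, 101.0)
matched = basic_flows(record, flows, "10.0.0.1", 3128, clock_offset=2.0)
```

In this sketch the client port separates connections from the same host, and the time window separates later reuse of the same port; both conditions are needed for the match to be unambiguous.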

When the data are matched, there are cases where one Netflow corresponds to several access.log records and, conversely, where several Netflows correspond to one record. The first case is possible because the HTTP/1.1 protocol allows several objects to be requested and loaded through one TCP connection, while that TCP connection is still represented by only two Netflows. The second case, in which several Netflows (for example, those carrying data from the client to the proxy, i.e. the primary HTTP request) correspond to one access.log record, is possible because of the rules for expiring NetFlow cache entries.
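One way to handle the second case is to merge Netflow fragments that share the same key before matching them to access.log records. The sketch below is only an illustration under stated assumptions: flows are given as hypothetical `(key, start, end, nbytes)` tuples, where `key` stands for the address and port tuple, and the `max_gap` threshold is an arbitrary value that would have to be chosen to suit the router's expiry settings.

```python
def merge_fragments(flows, max_gap=60.0):
    """Merge fragments of the same flow key whose start lies within
    max_gap seconds of the end of the previous fragment."""
    merged = []
    last = {}  # key -> index of the most recent merged entry for that key
    for key, start, end, nbytes in sorted(flows, key=lambda f: f[1]):
        i = last.get(key)
        if i is not None and start - merged[i][2] <= max_gap:
            k, s, e, b = merged[i]
            merged[i] = (k, s, max(e, end), b + nbytes)
        else:
            last[key] = len(merged)
            merged.append((key, start, end, nbytes))
    return merged

# Illustrative data only: key "A" is split into two adjacent fragments
# and then reappears much later as a separate connection.
fragments = [
    ("A", 0.0, 30.0, 100),
    ("A", 30.0, 45.0, 50),
    ("B", 5.0, 10.0, 20),
    ("A", 500.0, 510.0, 10),
]
merged = merge_fragments(fragments)
```

After merging, each entry can be treated as one logical flow, so that a single access.log record is compared against one interval rather than several fragments.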

Because of persistent HTTP/1.1 connections, some data flows are represented by only a couple of Netflows, so it is logical to join the corresponding access.log records into larger units of traffic, which we call web seance flows. In general, a web seance flow is the set of access.log records and Netflows corresponding to one complete HTML page, including the requests and loads of all its embedded objects. Such traffic units offer a different view of web user activity. In future work we plan to use an even larger unit of traffic, the web session flow. A web session flow is a set of web seance flows (for a fixed client) that are very close to each other in time. Consecutive groups of web seance flows should be separated by long pauses, which is what allows us to divide the sequence of web seance flows into web session flows. Logically, one web session flow represents the user's activity of searching for and processing some piece of information, while the long pauses between two web session flows correspond to the user pondering over the found (loaded) information, for example reading text.
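The division of a client's web seance flows into web session flows by long pauses can be sketched as follows. This is a minimal illustration, not the authors' method: each web seance flow is reduced to a hypothetical `(start, end)` pair in seconds, and the `pause` threshold of 300 seconds is an arbitrary assumption, since the text does not say how long a "long pause" is.

```python
def split_sessions(seances, pause=300.0):
    """Group (start, end) seance intervals of one client into sessions.
    A new session starts when the gap since the end of the current
    session exceeds `pause` seconds."""
    sessions = []
    session_end = None
    for start, end in sorted(seances):
        if session_end is not None and start - session_end <= pause:
            sessions[-1].append((start, end))
            session_end = max(session_end, end)
        else:
            sessions.append([(start, end)])
            session_end = end
    return sessions

# Illustrative data only: three pages loaded in quick succession,
# a long reading pause, then two more pages.
seances = [(0, 10), (15, 40), (50, 55), (1000, 1020), (1030, 1031)]
sessions = split_sessions(seances)
```

The running maximum `session_end` is used instead of the end of the last seance so that overlapping seance flows (for example, pages loaded in parallel) do not produce a spurious pause.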

The final aim of the work is to transform raw Netflow and access.log data into the units of web traffic described above, and then to perform a logical analysis of the resulting data.