Web Classification Based on Internet Directory Categories

Ekaterina Nerman, Alexandr Borodin (Petrozavodsk State University, Russia)

The main purpose of web classification is to assign web pages to one or more predefined categories according to their content. This task arises in many areas, such as contextual advertising, web spam detection, and automatic document annotation. Classification is usually based on the clustering hypothesis, which states that documents with similar content are relevant to the same category. Nevertheless, to perform classification in an unsupervised manner, the system must have an initial classification set: a set of keywords assigned to categories.

We propose an architecture for an unsupervised web classification system that builds the initial classification set by analysing a web directory. We also present a prototype of the system based on analysis of the Yandex catalogue.
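
The idea of deriving a classification set from a directory can be illustrated with a minimal sketch. It is not the authors' prototype: the directory excerpt, the stopword list, the relative-frequency weighting, and the names `DIRECTORY`, `build_classification_set`, and `classify` are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical directory excerpt: category -> descriptions of listed sites.
# A real system would obtain these from a web directory.
DIRECTORY = {
    "Sports": [
        "football league results and match schedule",
        "tennis tournament news and player rankings",
    ],
    "Computers": [
        "software development tools and programming tutorials",
        "hardware reviews and programming news",
    ],
}

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"and", "the", "of", "for", "a", "in"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_classification_set(directory):
    """Map each category to keyword weights (relative term frequency)."""
    classification_set = {}
    for category, descriptions in directory.items():
        counts = Counter(w for d in descriptions for w in tokenize(d))
        total = sum(counts.values())
        classification_set[category] = {w: c / total for w, c in counts.items()}
    return classification_set


def classify(page_text, classification_set):
    """Return categories with a non-zero score, best match first."""
    words = Counter(tokenize(page_text))
    scores = {
        category: sum(weights.get(w, 0.0) * n for w, n in words.items())
        for category, weights in classification_set.items()
    }
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    return [category for category, score in ranked if score > 0]


if __name__ == "__main__":
    keywords = build_classification_set(DIRECTORY)
    print(classify("Programming tutorials for new software", keywords))
```

Because a page may match keywords from several categories, `classify` returns a ranked list rather than a single label, in line with assigning pages to one or more categories.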

Department of Computer Science
