© 1994

Department of Computer Science


Web Classification Based on Internet Directory Categories

Ekaterina Nerman, Alexandr Borodin (Petrozavodsk State University, Russia)

The main purpose of web classification is to assign web pages to one or more predefined categories according to their content. This task arises in many areas, such as contextual advertising, web spam detection and automatic document annotation. Most classification methods rely on the cluster hypothesis, which states that documents with similar content tend to be relevant to the same category. However, to classify pages in an unsupervised manner, the system needs an initial classification set: a set of keywords, each assigned to a category.
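
To illustrate how an initial classification set can drive classification, here is a minimal sketch. It is not the authors' method: the keyword set, the category names and the scoring rule (counting keyword hits per category) are hypothetical assumptions chosen for brevity. The tokenizer keeps only letters, so it also works for Cyrillic text such as Russian catalogue pages.

```python
import re
from collections import Counter

# Hypothetical initial classification set: each keyword is assigned to one category.
CLASSIFICATION_SET = {
    "football": "Sports",
    "hockey": "Sports",
    "match": "Sports",
    "mortgage": "Finance",
    "bank": "Finance",
    "loan": "Finance",
    "python": "Programming",
    "compiler": "Programming",
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and extract words made only of letters (Latin, Cyrillic, etc.)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def classify(text: str, keyword_set: dict[str, str], top_n: int = 1) -> list[str]:
    """Score each category by the number of page tokens that are its keywords.

    Returns up to top_n categories, best first; an empty list if no keyword matched.
    """
    scores = Counter()
    for token in tokenize(text):
        category = keyword_set.get(token)
        if category is not None:
            scores[category] += 1
    return [category for category, _ in scores.most_common(top_n)]


page = "The bank offers a low-rate mortgage and a flexible loan for students."
print(classify(page, CLASSIFICATION_SET))  # ['Finance']
```

Raising `top_n` lets a page be assigned to several categories, matching the "one or more" wording; a real system would also weight keywords rather than count them equally.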

We propose an architecture for an unsupervised web classification system that builds the initial classification set by analyzing a web directory. We also present a system prototype based on analysis of the Yandex catalogue.
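
One way a web directory could yield an initial classification set is sketched below, under stated assumptions. The abstract does not describe the authors' selection procedure, so the heuristic here is hypothetical: the directory is modelled as a mapping from category names to the descriptions of sites listed under them, and a term becomes a keyword of a category if it occurs at least `min_count` times overall and at least `min_share` of its occurrences fall into that one category. The names `build_classification_set`, `min_count`, `min_share` and the sample directory are illustrative only.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and extract words made only of letters (Latin, Cyrillic, etc.)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def build_classification_set(
    directory: dict[str, list[str]],
    min_count: int = 2,
    min_share: float = 0.8,
    stopwords: frozenset[str] = frozenset(),
) -> dict[str, str]:
    """Hypothetical heuristic: map each frequent, category-specific term to its category."""
    # Count term occurrences separately for every directory category.
    per_category: dict[str, Counter] = defaultdict(Counter)
    for category, descriptions in directory.items():
        for description in descriptions:
            per_category[category].update(
                t for t in tokenize(description) if t not in stopwords
            )

    # Total occurrences of each term across the whole directory.
    totals = Counter()
    for counts in per_category.values():
        totals.update(counts)

    keyword_set = {}
    for term, total in totals.items():
        if total < min_count:
            continue  # too rare to be a reliable keyword
        best_category, best_count = max(
            ((c, counts[term]) for c, counts in per_category.items()),
            key=lambda pair: pair[1],
        )
        if best_count / total >= min_share:
            keyword_set[term] = best_category  # term is concentrated in one category
    return keyword_set


# Hypothetical directory excerpt: category -> site descriptions.
directory = {
    "Sports": [
        "Football club news and match results",
        "Hockey league match schedule and results",
    ],
    "Finance": [
        "Bank loans and mortgage calculator",
        "Mortgage rates and bank deposits",
    ],
}
stop = frozenset({"and"})
print(build_classification_set(directory, stopwords=stop))
# {'match': 'Sports', 'results': 'Sports', 'bank': 'Finance', 'mortgage': 'Finance'}
```

The `min_share` threshold filters out terms spread evenly across categories, which carry little information about any single one; on a real catalogue, stemming or lemmatization would be needed so that forms such as "loan" and "loans" are counted together.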