Finding Representatives in a Heterogeneous Network

Laura A. Langohr (University of Helsinki, Finland)

Large networks can be found in different areas, e.g. in social sciences, information science, or biology, where vertices represent objects and edges connections between them. We want to find few representative vertices within a given set of vertices. Therefore, k-medoids, a clustering method, is applied on a weighted graph. K-medoids uses information given by a network in order to cluster a given set of vertices. For each cluster a vertex is then suggested as a representative. In citation networks vertices represent articles and edges citations between them. Finding representatives could be useful, e.g. for a researcher starting to work in a new area. He could get an overview of this area by reading few representative articles.

The Biomine system ( http://biomine.cs.helsinki.fi/search/) provides a directed, labeled, and weighted graph, where vertices represent biological entities and edges connections between them. The weight of an edge states how likely it is that the edge exist. The probability of a path is then the product of the probabilities of the edges along the path. The best path probability again is the probability of a path between two objects, such that no path between these two objects has a higher probability.

We applied k-medoids to the Biomine system to automatically find representative genes, thereby using the best path probability as pairwise similarity measure. An automatic selection of representatives would be advantageous, e.g. if a biologist obtained a large set of biological entities from first studies, and has to select few biological entities for further research. An exemplary application and its result are shown.

Department of Computer Science

Finding Representatives in a Heterogeneous Network