HITS - Hypertext Induced Topic Selection


Definition

The HITS algorithm measures the “hubness” and the “authority” of each node in the graph. “Hubness” of a node is a defined in terms of the authorities of the outgoing nodes and “authority” is defined in terms of the hubness of the incoming nodes. Therefore, a node has a high hubness value if it has many outgoing edges leading to nodes with high authority values and a node has a high authority value if the hubness values of its incoming nodes are high. Hubness of a node depends not only on the number of nodes that it points, but also on the authority of those nodes. A high authority value shows that the node is an important because several other highly connected nodes are pointing to it. A high hubness value shows that the node is important because it has connections to many other high authority nodes in the graph. The Hub score and Authority score for a node is calculated using the following algorithm:
  1. Start with each node having a hub score and authority score of 1.
  2. Run the Authority Update Rule - Update each node's Authority score to be equal to the sum of the Hub Scores of each node that points to it.
  3. Run the Hub Update Rule - Update each node's Hub Score to be equal to the sum of the Authority Scores of each node that it points to.
  4. Normalize values by dividing each Hub score by the sum of squares of all Hub scores, and dividing each Authority score by the sum of the squares of all Authority scores.
  5. Repeat from the second step as necessary
HITS is somewhat different from random walk/eigenvector-based algorithms such as PageRank in that:
  1. there are two mutually recursive scores being calculated, rather than a single value
  2. the edge weights are effectively all 1, i.e., they can't be interpreted as transition probabilities. This means that the more inlinks and outlinks that a vertex has, the better, since adding an inlink (or outlink) does not dilute the influence of the other inlinks (or outlinks) as in random walk-based algorithms.
  3. the scores cannot be interpreted as posterior probabilities (due to the different normalization)


See: HITS Based Page Rank

Requirements

Require connected and loop free network.

References

  • HITS algorithm. (2014, July 17). In Wikipedia, The Free Encyclopedia. Retrieved 13:30, August 11, 2014, from http://en.wikipedia.org/w/index.php?title=HITS_algorithm&oldid=617343038
  • KLEINBERG, J. M. 1999. Authoritative sources in a hyperlinked environment. Journal of the ACM (JACM), 46, 604-632.