# Content Centrality

#### Definition

Content Centrality extracts nodes whose neighbors have similar features. It uses a feature vector for each node, built from the node's posting activity on social media, its own properties, and so forth. It was developed as a novel centrality measure under the assumption that content is distributed over a network.

Consider a simple undirected network $G=(V,E)$, where $V$ and $E$ are the node set and link set, respectively, and each node $u\in V$ has a $J$-dimensional content vector $X_u$. Let $d(u,v)$ be the shortest path length between nodes $u$ and $v$, so that $d(u,v)=d(v,u)$ and $d(u,u)=0$. We define the set of nodes at distance $d$ from $u$ as $\Gamma_d(u)=\{v:d(u,v)=d\}\subset V$.
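The distance sets $\Gamma_d(u)$ can be illustrated with a breadth-first search. The following is a minimal sketch, assuming the graph is stored as an adjacency dictionary; the names `shortest_path_lengths`, `distance_shells`, and the example graph are hypothetical and not part of the original definition.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Hop distances d(source, v) for every v reachable from source (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def distance_shells(adj, u):
    """Group nodes by distance from u, so shells[d] plays the role of Gamma_d(u)."""
    shells = {}
    for v, d in shortest_path_lengths(adj, u).items():
        if d > 0:  # u itself (distance 0) is not one of its own neighbors
            shells.setdefault(d, set()).add(v)
    return shells


# Hypothetical path graph: a - b - c - d
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
shells = distance_shells(adj, "a")
```

In this sketch, nodes unreachable from `u` simply do not appear in any shell; the definition above does not say how disconnected networks should be treated.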

When content is distributed over a network, very distant nodes naturally exert almost no influence. To express this effect, we introduce two types of decay functions. The first is an exponential decay function defined by

$$\rho(d;\lambda)=\exp(-\lambda d),$$

where $\lambda$ is a parameter that controls the strength of the decay. The second is a power-law decay function defined by

$$\rho(d;\lambda)=\exp(-\lambda \log d)=d^{-\lambda}.$$
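To see how the two decay functions differ, the sketch below evaluates both at a few distances. The function names and the choice $\lambda = 1$ are assumptions made for illustration only.

```python
import math


def exponential_decay(d, lam):
    """rho(d; lambda) = exp(-lambda * d)."""
    return math.exp(-lam * d)


def power_law_decay(d, lam):
    """rho(d; lambda) = exp(-lambda * log d), i.e. d ** (-lambda); needs d >= 1."""
    return math.exp(-lam * math.log(d))


# Compare the two weights at increasing distances (lambda = 1 is an arbitrary choice).
for d in (1, 2, 4, 8):
    print(d, exponential_decay(d, 1.0), power_law_decay(d, 1.0))
```

For the same $\lambda$, the power-law weight falls off more slowly with distance, so distant nodes keep more influence than under exponential decay.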

Now, for each node, we calculate the resultant vector with the distance-based decay weight as follows:

$$y_u=\sum_{d=1}^{D_u}\rho(d;\lambda)\sum_{v\in\Gamma_d(u)}X_v=\sum_{v\in V\setminus\{u\}}\rho(d(u,v);\lambda)X_v,$$

where $D_u=\max_{v\in V} d(u,v)$. We call $y_u$ the Resultant Vector with a Decay weight (RVwD) of node $u$. The RVwD of node $u$ sums the content vectors of the other nodes, giving strong weights to nearby nodes, including the directly connected ones, and weak weights to distant nodes. The resulting vector is therefore a somewhat smoothed summary of the content around $u$.
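As a small worked illustration of the RVwD, the sketch below sums decay-weighted content vectors given precomputed shortest path lengths from $u$. The function `rvwd`, the input layout, and the three-node example are hypothetical; it also assumes every other node is reachable from $u$.

```python
def rvwd(dist_from_u, content, lam, decay):
    """Sum decay(d(u, v), lam) * X_v over all nodes v other than u.

    dist_from_u maps each node v to d(u, v); content maps each node to X_v.
    """
    dim = len(next(iter(content.values())))
    y = [0.0] * dim
    for v, d in dist_from_u.items():
        if d == 0:
            continue  # skip u itself, as in the sum over V \ {u}
        weight = decay(d, lam)
        for j, x in enumerate(content[v]):
            y[j] += weight * x
    return y


def power_law(d, lam):
    """Power-law decay d ** (-lambda)."""
    return d ** (-lam)


# Hypothetical path a - b - c, seen from node a, with 2-dimensional content vectors.
dist_from_a = {"a": 0, "b": 1, "c": 2}
content = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
y_a = rvwd(dist_from_a, content, 1.0, power_law)
```

Here node `b` at distance 1 contributes with weight 1 and node `c` at distance 2 with weight 1/2, so `y_a` is `[1.0, 0.5]`.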

To quantify how many similar nodes exist near each node, we calculate the cosine similarity between each node's content vector and its RVwD, and define this value as the content centrality score of the node:

$$CDC(u)=\langle X_u,Y_u\rangle=\left\langle X_u,\frac{\sum_{d=1}^{D_u}\rho(d;\lambda)\sum_{v\in\Gamma_d(u)}X_v}{\left\|\sum_{d=1}^{D_u}\rho(d;\lambda)\sum_{v\in\Gamma_d(u)}X_v\right\|}\right\rangle,$$

where $Y_u=y_u/\|y_u\|$ is the normalized RVwD and the original content vector $X_u$ is normalized so that its $L_2$ norm is $1$. A node $u$ whose $CDC(u)$ is larger than that of the other nodes ranks highly in content centrality, which means that similar content is concentrated around it.
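The score itself reduces to a cosine similarity between two vectors. The following is a minimal sketch, assuming the content vector and the resultant vector of a node are already available as plain lists; the function names are hypothetical, and raising an error for a zero vector is a choice made here, not something the definition specifies.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        # Assumption: the score is treated as undefined for an all-zero vector.
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def content_centrality(x_u, y_u):
    """CDC(u): inner product of the unit-normalized X_u and the unit-normalized y_u."""
    x_hat = l2_normalize(x_u)
    y_hat = l2_normalize(y_u)
    return sum(a * b for a, b in zip(x_hat, y_hat))


# Hypothetical node with content [1, 0] whose surroundings sum to [1, 0.5].
score = content_centrality([1.0, 0.0], [1.0, 0.5])
```

A score near 1 indicates that the decay-weighted content around the node points in nearly the same direction as the node's own content, while a score of 0 indicates no overlap.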
