# Neighborhood Correlation Coefficient

#### Definition

Neighborhood_Correlation_Coefficient considers the number of neighbors shared by a node and its neighbors accounts for determining its influence. This method provieds, an improved cluster rank approach is presented that takes into account common hierarchy of nodes and their neighborhood set.

In a social network as a graph which represented by $G = (V, E)$, where users are $V=\{v_1,v_2 ,..., v_{|V|}\}$ and the relationships between them are $E=\{e_1, e_2,...,e_{|E|}\}$. Also, $N_i$ is used to represent the set of neighbors of node $v_i$. The number of neighbors of node $v_i$ determines its degree, and is shown by $d_i$.

Cluster rank method regards relationships between the neighbors of a node as a factor affecting its influence range. It has been shown that strong interactions among neighbors of a node often have negative impacts on its influentiality. In other words, if the neighbors are tightly interconnected, it might cause messages to be spread in small cycles of the graph, which might reduce the likelihood of reaching out large portion of the network. The extent of relationships between the neighbors of a node can be expressed with the notion of local clustering coefficient. The local clustering coefficient of node $v_i (LC_i)$ is calculated as:

$$LC_i={\underset{v_j\in N_i}{\sum}}{|N_i \cap N_j|\over d_i(d_i-1)}$$

Indeed, the local clustering coefficient examines the number of neighbors shared by a node and each of its neighbors. Neighborhood_Correlation_Coefficient aims to finding effective measures of correlations between a node and its neighbors by examining what the node has in common with its neighbors. In the proposed approach, referred to as Extended Cluster Coefficient Ranking Measure $(ECRM)$, it considers common hierarchy between a node and its neighbors rather than the number of their common neighbors. To this end, the network is first divided into different parts. Nodes that are removed in the same iterations $(IT)$ of the $k$-shell algorithm can be assumed to be in the same hierarchy. Inspired by the $k$-shell decomposition, the iteration $(IT)$ in which the nodes are removed determine. The obtained hierarchy is considered as a commonality between nodes.

After decomposing the network into different hierarchies, a shell vector is calculated for each node $v_i$ based on the hierarchy of its neighbors. The shell vector for node $v_i$ is defined as:

$$SV_i= \left\{ |N_i^{(1)}|,|N_i^{(2)}|,|N_i^{(3)}|,...,|N_i^{(f)}|\right\}$$

were $N_i^{(k)}$ indicates the number of neighbors of node $v_i$ belonging to hierarchy $k$, and $f$ is the maximum hierarchy in the network.

$ECRM$ is based on correlation between nodes, which is calculated using Pearson’s correlation coefficient between the shell vectors. In order to measure commonality between neighbors of two nodes, the proposed algorithm considers Pearson correlation between their shell vector, which is obtained as:

$$C_{i,j} ={{\sum_{k=1}^f (SV_i^{(k)}- \overline {SV}_i) (SV_j^{(k)}- \overline {SV}_j)} \over \sqrt{\sum_{k=1}^f (SV_i^{(k)}- \overline {SV}_i)^2} \sqrt{\sum_{k=1}^f (SV_j^{(k)}- \overline {SV}_j)^2}}$$

where $SV_j^{(k)}=|N_i^{(k)}|$ represents the value of the $k$th cell in vector $SV_i$, and $\overline {SV}_i$ indicates the mean value in the shell vector.

The shell clustering coefficient of node $v_i$ is calculated as:

$$SCC_i={\underset{v_j \in N_i}{\sum}} \left( (2-C_{ij}+(2 {d_j\over max(d)}+1)) \right)$$

The above equation has two parts. The correlation coefficient is included in the first part and the degree in the second. Higher correlation between node vi and each of its neighbors has negative effects on spreading ability of $v_i$. Thus, $(2 − C_{i,j})$ is used for calculation of the shell clustering coefficient of node $v_i$. In addition to the correlation between node vi and its neighbor $v_j$, the degree of the latter is also effective on its shell clustering coefficient.

In the proposed approach, the neighborhood rule is first applied for calculation of the Cluster Coefficient Ranking Measure $(CRM)$ for each node given the shell clustering coefficients of the neighbors. The following eq shows how this measure is calculated for node $v_i$. Along the same lines, $ECRM$ is obtained by summation of the $CRMs$ for the neighbors

$$CRM_i ={\underset{v_j \in N_i}{\sum}} SCC_j$$

$ECRM$ is calculated for node $v_i$ as follows:

$$ECRM_i ={\underset{v_j \in N_i}{\sum}} CRM_j$$

This method is based on local clustering coefficient and uses similarity of connections between neighboring nodes. The method is based on $k$-shell decomposition approach where influentiality of a node depends on how the node has shared connections with its neighbors.

In a social network as a graph which represented by $G = (V, E)$, where users are $V=\{v_1,v_2 ,..., v_{|V|}\}$ and the relationships between them are $E=\{e_1, e_2,...,e_{|E|}\}$. Also, $N_i$ is used to represent the set of neighbors of node $v_i$. The number of neighbors of node $v_i$ determines its degree, and is shown by $d_i$.

Cluster rank method regards relationships between the neighbors of a node as a factor affecting its influence range. It has been shown that strong interactions among neighbors of a node often have negative impacts on its influentiality. In other words, if the neighbors are tightly interconnected, it might cause messages to be spread in small cycles of the graph, which might reduce the likelihood of reaching out large portion of the network. The extent of relationships between the neighbors of a node can be expressed with the notion of local clustering coefficient. The local clustering coefficient of node $v_i (LC_i)$ is calculated as:

$$LC_i={\underset{v_j\in N_i}{\sum}}{|N_i \cap N_j|\over d_i(d_i-1)}$$

Indeed, the local clustering coefficient examines the number of neighbors shared by a node and each of its neighbors. Neighborhood_Correlation_Coefficient aims to finding effective measures of correlations between a node and its neighbors by examining what the node has in common with its neighbors. In the proposed approach, referred to as Extended Cluster Coefficient Ranking Measure $(ECRM)$, it considers common hierarchy between a node and its neighbors rather than the number of their common neighbors. To this end, the network is first divided into different parts. Nodes that are removed in the same iterations $(IT)$ of the $k$-shell algorithm can be assumed to be in the same hierarchy. Inspired by the $k$-shell decomposition, the iteration $(IT)$ in which the nodes are removed determine. The obtained hierarchy is considered as a commonality between nodes.

After decomposing the network into different hierarchies, a shell vector is calculated for each node $v_i$ based on the hierarchy of its neighbors. The shell vector for node $v_i$ is defined as:

$$SV_i= \left\{ |N_i^{(1)}|,|N_i^{(2)}|,|N_i^{(3)}|,...,|N_i^{(f)}|\right\}$$

were $N_i^{(k)}$ indicates the number of neighbors of node $v_i$ belonging to hierarchy $k$, and $f$ is the maximum hierarchy in the network.

$ECRM$ is based on correlation between nodes, which is calculated using Pearson’s correlation coefficient between the shell vectors. In order to measure commonality between neighbors of two nodes, the proposed algorithm considers Pearson correlation between their shell vector, which is obtained as:

$$C_{i,j} ={{\sum_{k=1}^f (SV_i^{(k)}- \overline {SV}_i) (SV_j^{(k)}- \overline {SV}_j)} \over \sqrt{\sum_{k=1}^f (SV_i^{(k)}- \overline {SV}_i)^2} \sqrt{\sum_{k=1}^f (SV_j^{(k)}- \overline {SV}_j)^2}}$$

where $SV_j^{(k)}=|N_i^{(k)}|$ represents the value of the $k$th cell in vector $SV_i$, and $\overline {SV}_i$ indicates the mean value in the shell vector.

The shell clustering coefficient of node $v_i$ is calculated as:

$$SCC_i={\underset{v_j \in N_i}{\sum}} \left( (2-C_{ij}+(2 {d_j\over max(d)}+1)) \right)$$

The above equation has two parts. The correlation coefficient is included in the first part and the degree in the second. Higher correlation between node vi and each of its neighbors has negative effects on spreading ability of $v_i$. Thus, $(2 − C_{i,j})$ is used for calculation of the shell clustering coefficient of node $v_i$. In addition to the correlation between node vi and its neighbor $v_j$, the degree of the latter is also effective on its shell clustering coefficient.

In the proposed approach, the neighborhood rule is first applied for calculation of the Cluster Coefficient Ranking Measure $(CRM)$ for each node given the shell clustering coefficients of the neighbors. The following eq shows how this measure is calculated for node $v_i$. Along the same lines, $ECRM$ is obtained by summation of the $CRMs$ for the neighbors

$$CRM_i ={\underset{v_j \in N_i}{\sum}} SCC_j$$

$ECRM$ is calculated for node $v_i$ as follows:

$$ECRM_i ={\underset{v_j \in N_i}{\sum}} CRM_j$$

This method is based on local clustering coefficient and uses similarity of connections between neighboring nodes. The method is based on $k$-shell decomposition approach where influentiality of a node depends on how the node has shared connections with its neighbors.

#### References

- Zareie A., Sheikhahmadi A., Jalili M., Fasaei M.S.K., 2020. Finding influential nodes in social networks based on neighborhood correlation coefficient. Knowledge-Based Systems, 194. DOI: 10.1016/j.knosys.2020.105580