# UC - United Complex Centrality

#### Definition

United complex centrality (UC) identifies essential proteins by integrating known protein complexes with topological features of protein-protein interaction ($PPI$) networks.

Given a $PPI$ network $G(V, E)$ and a list of known protein complexes, the united complex centrality $UC(i)$ of any protein $i\in V$ is defined as:

$$UC(i)={\sum}_{j\in N_i} \left({f_j+1\over f_M+1} \times ECC_{i,j}\right)$$

where $N_i$ is the set of neighbors of protein $i$, $f_j$ is the frequency with which protein $j$ appears in the known protein complexes, and $f_M$ is the maximum frequency with which any protein appears in the known protein complexes. If a protein does not appear in any protein complex, then $f_j = 0$. $ECC_{i,j}$ is the edge clustering coefficient, calculated as follows:

$$ECC_{i,j}= {z_{i,j}\over \min(k_i -1 , k_j -1)}$$

where $z_{i,j}$ is the number of triangles in the network that actually include the edge $e(i,j)$, $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, respectively, and $\min(k_i - 1, k_j - 1)$ is the maximum number of triangles in which the edge $e(i,j)$ could possibly participate.
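
As a small illustration of the edge clustering coefficient (the numbers are hypothetical, chosen only for the example), suppose $k_i = 4$, $k_j = 3$, and nodes $i$ and $j$ share exactly one common neighbor, so that $z_{i,j} = 1$. Then:

$$ECC_{i,j}= {1\over \min(4-1, 3-1)} = {1\over 2}$$

That is, the edge takes part in one of the at most two triangles it could belong to.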

The rationale for this method comes from findings suggesting that proteins in complexes are more likely to be essential than proteins not included in any complex, and that proteins appearing in multiple complexes are more likely to be essential than those appearing in only a single complex.
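
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the edge-list and complex-list input formats, and the toy data are all assumptions made for the example. It also assumes an undirected network without self-loops, and it sets $ECC_{i,j}$ to 0 when $\min(k_i - 1, k_j - 1) = 0$, a case the formula leaves undefined.

```python
def edge_clustering_coefficient(adj, i, j):
    """ECC of edge (i, j); adj maps each node to the set of its neighbors."""
    # Each common neighbor of i and j closes one triangle through the edge.
    z = len(adj[i] & adj[j])
    denom = min(len(adj[i]) - 1, len(adj[j]) - 1)
    # Assumption: if an endpoint has degree 1 the ratio is undefined; use 0.
    return z / denom if denom > 0 else 0.0


def united_complex_centrality(edges, complexes):
    """Return a dict mapping each protein in the network to its UC score."""
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # f_j: number of known complexes that contain protein j.
    freq = {}
    for members in complexes:
        for protein in set(members):
            freq[protein] = freq.get(protein, 0) + 1
    # f_M: largest frequency over all proteins in the complex list.
    f_max = max(freq.values(), default=0)

    return {
        i: sum(
            (freq.get(j, 0) + 1) / (f_max + 1)
            * edge_clustering_coefficient(adj, i, j)
            for j in adj[i]
        )
        for i in adj
    }


# Hypothetical toy network: a triangle A-B-C with a pendant node D on C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
complexes = [["A", "B", "C"], ["B", "C"]]
scores = united_complex_centrality(edges, complexes)
for protein in sorted(scores, key=scores.get, reverse=True):
    print(protein, round(scores[protein], 3))
```

In this toy network, protein A scores highest because both of its neighbors appear in the maximum number of complexes, while D scores 0 because its only edge lies in no triangle.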

#### References

- Li, M., Lu, Y., Niu, Z. and Wu, F.X., 2017. United Complex Centrality for Identification of Essential Proteins from PPI Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(2), pp.370-380. DOI: 10.1109/TCBB.2015.2394487