# UC-P - United Complex Centrality with Parameter Alpha

#### Definition

This method is a modified version of United Complex Centrality ($UC$) that introduces a parameter $\alpha$. Because some protein complexes generated by computational methods are inaccurate, the $UC$ method was modified to tackle this problem.

$UC$ distinguishes the contributions of the clustering coefficients of the edges connecting a given protein to its neighbors by counting how frequently those neighbors appear in protein complexes. More neighbors, higher neighbor frequencies, and larger edge clustering coefficients all lead to a higher $UC$ value. However, if $f_M$ is very large, the effect of the neighbors' frequencies is diminished, especially for neighbors that appear in only a few protein complexes. If the known protein complexes are strictly correct, the frequencies of proteins in those complexes are also correct, and the value of $UC$ will be accurate. When these frequencies are less reliable, however, the effect of the neighbors' frequencies needs to be evaluated in a fuzzier, less fine-grained way. $UC$ is therefore extended with an optional parameter $\alpha$ that controls the contribution of the edge clustering coefficients. The variant with $\alpha>0$ is called $UC-P$ to distinguish it from $UC$, which does not use $\alpha$: if the input parameter $\alpha=0$, $UC$ is used; otherwise, $UC-P$ is used.

$UC-P$ no longer uses the exact frequency of each protein. Instead, letting $f_j$ denote the number of protein complexes in which protein $j$ appears, $UC-P$ divides the proteins in the given $PPI$ network $G$ into three groups: proteins that appear in a single complex $(f_j=1)$, proteins that appear in multiple complexes $(f_j>1)$, and proteins that do not appear in any complex $(f_j=0)$. For a given protein $i$, neighbors with $f_j>1$ should contribute more to $UC-P(i)$ than neighbors with $f_j=1$ when both have the same edge clustering coefficient with $i$, and neighbors with $f_j=1$ should in turn contribute more than those with $f_j=0$.
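The three-way grouping can be sketched in Python. This is a minimal illustration, assuming protein complexes are given as collections of protein identifiers; the function names `complex_frequencies` and `frequency_group` and the group labels are hypothetical, not part of any published implementation.

```python
from collections import Counter


def complex_frequencies(complexes):
    """Count f_j: the number of complexes in which each protein appears."""
    freq = Counter()
    for members in complexes:
        # set() so a protein listed twice in one complex is counted once
        freq.update(set(members))
    return freq


def frequency_group(protein, freq):
    """Map a protein to one of the three UC-P groups by its frequency f_j."""
    f = freq.get(protein, 0)
    if f > 1:
        return "multiple"  # f_j > 1
    if f == 1:
        return "single"    # f_j = 1
    return "none"          # f_j = 0


complexes = [{"A", "B", "C"}, {"B", "D"}]
freq = complex_frequencies(complexes)
print({p: frequency_group(p, freq) for p in ["A", "B", "D", "E"]})
# {'A': 'single', 'B': 'multiple', 'D': 'single', 'E': 'none'}
```

Only the group of each neighbor matters for $UC-P$, so the exact counts beyond "more than one" are discarded.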

Hence, for a given protein $i\in V$, $UC-P(i)$ is defined as:

$$UC-P(i)=\sum_{j\in N_i,\ f_j>1}ECC_{i,j}+\alpha\sum_{j\in N_i,\ f_j=1}ECC_{i,j}+(1-\alpha)\sum_{j\in N_i,\ f_j=0}ECC_{i,j}$$

where $N_i$ is the set of neighbors of protein $i$ in $G$ and $ECC_{i,j}$ is the edge clustering coefficient of the edge between $i$ and its neighbor $j$. The default value of $\alpha$ is $0.8$, chosen according to the Pareto principle, so the three groups are weighted $1$, $0.8$, and $0.2$ respectively.
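A minimal Python sketch of the formula follows, assuming the network is an adjacency dict of sets and the frequencies are a dict mapping each protein to $f_j$. The ECC definition used here, $ECC_{i,j}=z_{i,j}/\min(d_i-1,d_j-1)$ with $z_{i,j}$ the number of triangles containing edge $(i,j)$, is an assumption (a common choice in PPI centrality work, but not defined in this section), as is returning $0$ when the denominator is zero. The helper names and the toy network are hypothetical.

```python
def edge_clustering_coefficient(adj, i, j):
    """Assumed ECC: z_ij / min(d_i - 1, d_j - 1).

    z_ij is the number of triangles containing edge (i, j), i.e. the number
    of common neighbours of i and j. Returns 0.0 when the denominator is 0
    (one endpoint has degree 1); this convention is an assumption.
    """
    z = len(adj[i] & adj[j])
    denom = min(len(adj[i]) - 1, len(adj[j]) - 1)
    return z / denom if denom > 0 else 0.0


def uc_p(adj, freq, i, alpha=0.8):
    """UC-P(i): ECC of each neighbour j, weighted 1, alpha or (1 - alpha)
    depending on whether f_j > 1, f_j == 1 or f_j == 0."""
    score = 0.0
    for j in adj[i]:
        ecc = edge_clustering_coefficient(adj, i, j)
        f = freq.get(j, 0)  # proteins absent from every complex have f_j = 0
        if f > 1:
            score += ecc
        elif f == 1:
            score += alpha * ecc
        else:
            score += (1 - alpha) * ecc
    return score


# Toy network (hypothetical): triangle A-B-C plus a pendant edge A-D.
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
freq = {"B": 2, "C": 1}  # A and D appear in no complex
print(uc_p(adj, freq, "A"))  # 1.8
```

For protein `A`, neighbors `B` ($f_j=2$) and `C` ($f_j=1$) each close one triangle with `A` and have $ECC=1$, giving $1 + 0.8 \cdot 1 = 1.8$, while the degree-1 neighbor `D` has $ECC=0$ and contributes nothing.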

$UC-P$ was developed to identify essential proteins while accounting for the limitations $UC$ may have when it is given an inaccurate set of predicted complexes.

When an accurate set of known protein complexes is available, the authors of this method recommend $UC$. Otherwise, $UC-P$ is recommended for identifying essential proteins from an inaccurate set of protein complexes.

#### References

- Li, M., Lu, Y., Niu, Z. and Wu, F.X., 2017. United Complex Centrality for Identification of Essential Proteins from PPI Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(2), pp.370-380. DOI: 10.1109/TCBB.2015.2394487