http://www.abbs.info e-mail:[email protected]

ISSN 0582-9879                                    ACTA BIOCHIMICA et BIOPHYSICA SINICA 2003, 35(1): 35-40                                    CN 31-1300/Q

A Soft Docking Algorithm for Predicting the Structures of Protein-protein Complexes

LI Chun-Hua, MA Xiao-Hui, CHEN Wei-Zu, WANG Cun-Xin*

( Center for Biomedical Engineering, Beijing Polytechnic University, Beijing 100022, China )

 

Abstract   An efficient soft docking algorithm is described for predicting the mode of binding between two proteins from their three-dimensional structures. The molecular model is based on the simplified protein model used in Janin's docking algorithm. The side chain flexibility of the amino acid residues Arg, Lys, Asp, Glu and Met at the protein surface was taken into account by softening the molecular surface. A double filtering technique was used to eliminate most of the unlikely binding modes. Energy minimization was performed on the retained structures, which were then evaluated with a scoring function that included electrostatic, desolvation and van der Waals energy terms. Twenty-six complexes were used to test the docking algorithm. Native-like conformations were found for all of the complexes, and for 20 of them a native-like conformation was ranked in the top 10.

 

Key words     protein-protein interactions; molecular recognition; molecular flexibility; binding free energy; soft docking

 

Protein-protein interactions play an important role in many physiological processes such as signal transduction, cell regulation and the immune response. Tremendous experimental and computational efforts[1-5] are devoted to studying protein-protein association, with the goal of scientific and commercial breakthroughs in drug discovery. Because the structures of protein-protein complexes are difficult to determine by X-ray crystallography or NMR spectroscopy, docking methods that predict protein-protein recognition have wide application[6].

In general, a docking algorithm can be divided into three stages: searching, filtering and scoring. A complete conformational search is a hard problem, even when the crucial effect of the solvent is neglected, owing to the large number of atoms and degrees of freedom in the system. Fortunately, during protein-protein association, the large conformational changes are frequently confined to the protein surface, especially to the flexible amino acid side chains[7,8]. Several techniques that use a soft representation of the molecular surface[9-14] have been developed to tolerate a limited degree of molecular flexibility. Jiang et al.[9] used a cube representation of the molecular surface and volume in their soft docking procedure. Ritchie et al.[10] introduced a soft model of electrostatic complementarity into the algorithm. Later, Palma et al.[11] proposed a surface-implicit method to embody the softness of the molecular surface. In this paper, the surfaces of the flexible amino acid residues Arg, Lys, Asp, Glu and Met at the protein surface are softened in the molecular model, which addresses the unbound docking problem to some degree.

A search procedure may produce millions of binding modes, so the number of solutions must be drastically reduced at the filtering stage. So far, most docking algorithms have used the geometric complementarity of the protein surfaces as the filtering criterion to select potential solutions. It is, however, generally recognized that geometric complementarity alone is not sufficient to discriminate between correct and incorrect docked structures, except in a very few cases[15]. In this article, the residue pairing preferences[16] at the protein-protein interface are taken into account at the filtering stage in addition to the geometric complementarity. This double filtering technique retains many more native-like structures than filtering based solely on geometric complementarity.

The development of a scoring function which can reliably distinguish correct docked structures from incorrect ones is a challenging topic of current research. Many scoring functions have been proposed based on geometric complementarity[9,17] or electrostatic interaction[14,15,18]. In this work, the combination of the molecular potential energy and the solvation energy is used as the scoring function to rank the putative docked structures.

1    Materials and Methods

1.1   The selected test systems

The 26 protein-protein complexes used to test the docking algorithm were selected from the Protein Data Bank and are listed in Table 1. They included enzyme-inhibitor, antibody-antigen and other complexes. To test the ability of the program to handle the conformational changes that occur upon complex formation, three kinds of docking patterns were performed. For the first 6 cases, marked with XX, the complexes were reconstructed from the bound structures of both receptors and ligands. This set of docking simulations was designated BOUND in Table 1. In the following 7 cases, marked with FX or XF, the structures of the complexes were predicted from the unbound/bound or bound/unbound conformations of the two interacting proteins (PSEUDO-UNBOUND). For the last 13 cases, marked with FF, the unbound structures of both proteins were used in the docking simulations (UNBOUND). Here, F and X denote the unbound and bound structures, respectively.

 

Table 1   The 26 protein-protein complexes used to test the docking algorithm

| Case^a | Description | Receptor | Number of residues | Ligand | Number of residues |
|---|---|---|---|---|---|
| BOUND | | | | | |
| 1CHOXX | α-Chymotrypsin/Ovomucoid | 1cho^b | 245 | 1cho^b | 53 |
| 2SICXX | Subtilisin/Streptomyces inhibitor | 2sic^b | 275 | 2sic^b | 107 |
| 1ACBXX | α-Chymotrypsin/Eglin C | 1acb^b | 245 | 1acb^b | 63 |
| 2SNIXX | Subtilisin/Chymotrypsin inhibitor | 2sni^b | 275 | 2sni^b | 64 |
| 2PTCXX | β-Trypsin/Pancreatic trypsin inhibitor | 2ptc^b | 223 | 2ptc^b | 58 |
| 1TECXX | Thermitase/Eglin c | 1tec^b | 279 | 1tec^b | 70 |
| PSEUDO-UNBOUND | | | | | |
| 1UDIFX | Virus Uracil-DNA glycosylase/inhibitor | 1udh | 228 | 1udi^b | 83 |
| 1JHLXF | IgG1 Fv Fragment/Lysozyme | 1jhl^b | 224 | 1ghl | 129 |
| 1TABFX | Trypsin/BBI | 3ptn | 223 | 1tab^b | 36 |
| 1BRCFX | Trypsin/APPI | 1bra | 223 | 1brc^b | 56 |
| 1GLAXF | Glycerol kinase/GSF III | 1gla^b | 489 | 1f3g | 150 |
| 1TGSFX | Trypsinogen/pancreatic trypsin inhibitor | 1tgt | 225 | 1tgs^b | 56 |
| 3HFLXF | Fab HyHel-5 (l,h-chains)/lysozyme | 3hfl^b | 427 | 1lza | 129 |
| UNBOUND | | | | | |
| 1BRCFF | Trypsin/APPI | 1bra | 223 | 1aap | 56 |
| 1FSSFF | Acetylcholinesterase/Fasciculin II | 2ace | 523 | 1fsc | 61 |
| 1FQ1FF | CDK2/KAP | 1b39 | 290 | 1fpz | 178 |
| 1CSEFF | Subtilisin Carlsberg/Eglin C | 1scd | 274 | 1acb | 63 |
| 1MAHFF | Mouse Acetylcholinesterase/inhibitor | 1maa | 536 | 1fsc | 61 |
| 1MLCFF | Fab D44.1 (a,b-chains)/lysozyme | 1mlb | 432 | 1lza | 129 |
| 2PTCFF | β-Trypsin/pancreatic trypsin inhibitor | 3ptn | 223 | 4pti | 58 |
| 1AHWFF | Antibody Fab 5G9/Tissue factor | 1fgn | 221 | 1boy | 211 |
| 2KAIFF | Kallikrein A/pancreatic trypsin inhibitor | 2pka | 232 | 1bpi | 57 |
| 1CHOFF | α-Chymotrypsin/Ovomucoid | 5cha | 237 | 2ovo | 53 |
| 1MDAFF | Methylamine dehydrogenase/Amicyanin | 2bbk | 470 | 1aan | 103 |
| 1BRBFF | Trypsin/pancreatic trypsin inhibitor | 1bra | 223 | 1bpi | 51 |
| 1CGIFF | α-Chymotrypsinogen/pancreatic trypsin inhibitor | 1chg | 245 | 1hpt | 56 |

^a The PDB code of the complex. ^b The protein was taken from the bound structure.

 

1.2   Treating molecular flexibility

Since the amino acid residues Arg, Lys, Asp, Glu and Met at the protein surface are much more flexible than the other residues[19], they were treated specially in our molecular model, which was based on the simplified protein model with one sphere per residue[1]. These residues were replaced with spheres centered at the C atoms of their side chains, with radii all equal to 0.15 nm (smaller than those in Janin's original molecular model). The molecular surface was thus softened to some extent.

1.3   Searching

The six rigid-body docking parameters that defined the position and orientation of one molecule relative to the other were five Euler rotation angles (θ1, φ1, θ2, φ2 and χ) and an intermolecular distance ρ[1]. θ1 and φ1 located the center of the ligand relative to the receptor, θ2 and φ2 located the center of the receptor relative to the ligand, and χ was a spin angle about the center line. The space of the five angles was systematically searched in steps of 10.0°. The search ranges of θ1 and φ1 were limited to ±20° around the active site, and the search ranges of the other angles were ±90° for θ2 and ±180° for φ2 and χ. In this way, about 2.3×10⁵ different binding modes were generated for each complex.

1.4   Filtering

Before filtering, conformations with an interface area of less than 5 nm² were eliminated. A sub-population of 1000 binding modes was then obtained from the remaining structures by the double filtering technique.

In this work, we compared the double filter with the surface matching filter, which was based only on the criterion of geometric complementarity scaled with the interface area. In the surface matching filter, the first 1000 solutions were sorted in descending order of interface area to form a list of 1000 solutions. Then, for each of the remaining solutions, the interface area was compared with that of the last solution in the list. If its surface matching was worse than that of the last solution in the list, it was discarded. Otherwise, it was inserted into the list according to its interface area, and the last solution in the list was eliminated.

Geometric surface complementarity alone may not be sufficient to reliably eliminate the unlikely binding modes. In the tested cases, we found that some native-like solutions had poorer intermolecular surface contacts than some incorrect solutions. This can push a native-like solution down the list and, in some cases, exclude it from the list of retained solutions altogether.

Therefore, the residue pairing preferences were introduced into the filtering criteria, and a double filtering technique was implemented in our docking algorithm. In this procedure, the first 1000 solutions were again sorted in descending order of interface area, and each of the other solutions with a lower index of surface matching than the last element in the list was immediately discarded, as above. However, the solutions with a higher index of surface matching were not automatically kept. They were also checked for the residue pairing preferences at the protein-protein interface. Only those structures with more favorable residue pairing preferences as well as a larger interface area were inserted into the list according to their interface area, and the last element in the list was then discarded.
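The double filter described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `Mode`, `pairing` and `double_filter` are hypothetical, the residue pairing preference is assumed to be summarized as a single score for which higher means more favorable, and a candidate is assumed to be compared against the last element of the list for both criteria.

```python
import bisect
from typing import List, NamedTuple


class Mode(NamedTuple):
    """One candidate binding mode (field names are illustrative)."""
    area: float     # interface area in nm^2 (surface matching index)
    pairing: float  # residue pairing score; higher is assumed more favorable


def double_filter(modes: List[Mode], keep: int = 1000) -> List[Mode]:
    """Retain at most `keep` modes, held in descending order of interface area."""
    # Seed the list with the first `keep` modes, sorted by descending area.
    kept = sorted(modes[:keep], key=lambda m: m.area, reverse=True)
    for mode in modes[keep:]:
        last = kept[-1]
        # First filter: discard if surface matching is worse than the last entry.
        if mode.area < last.area:
            continue
        # Second filter (assumed rule): pairing must also beat the last entry.
        if mode.pairing <= last.pairing:
            continue
        # Insert by area, then drop the last entry so the list length is fixed.
        keys = [-m.area for m in kept]  # ascending keys for bisect
        kept.insert(bisect.bisect_right(keys, -mode.area), mode)
        kept.pop()
    return kept


if __name__ == "__main__":
    candidates = [Mode(10.0, 1.0), Mode(8.0, 2.0), Mode(9.0, 5.0),
                  Mode(12.0, 0.0), Mode(7.0, 9.0)]
    for m in double_filter(candidates, keep=2):
        print(m)
```

Dropping the second test (the comparison of `pairing`) reduces the sketch to the surface matching filter alone, which makes the difference between the two filters easy to see on the same candidate list.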

After filtering, binding modes with similar structures among the remaining 1000 solutions were replaced with an average conformation. This cluster analysis was similar to that used in Janin's docking method[8].

1.5   Scoring

After 1000 steps of energy minimization using the GROMACS package[20], all the retained structures were evaluated by the scoring function [Equation (1)]:

\[ \mathrm{Score} = E_{\mathrm{elec}} + G_{\mathrm{des}}(\mathrm{ACE}) + E_{\mathrm{vdw}} \tag{1} \]

where $E_{\mathrm{elec}}$ and $E_{\mathrm{vdw}}$ denote the changes in the electrostatic and van der Waals energies, respectively. They were calculated with the GROMOS force field[21]. $G_{\mathrm{des}}(\mathrm{ACE})$ is the desolvation free energy based upon the atomic contact energy (ACE)[22]. In the ACE model, the local interactions are given by $\sum_{i}\sum_{j} e_{ij}$, where $e_{ij}$ denotes the atomic contact energy between atoms i and j and the sum is taken over all atom pairs that are less than 0.6 nm apart. In this work, in order to avoid the use of a sharp distance cutoff, we defined $n_{ij}$ as a function [Equation (2)] of the distance between the two atoms ($r_{ij}$):

 

                     (2)

 

where $r_{\mathrm{on}}$ = 0.6 nm and $r_{\mathrm{off}}$ = 1 nm. According to this definition, a contact is counted in full if the distance between atoms i and j is less than 0.6 nm. When $r_{ij}$ > 0.6 nm, $n_{ij}$ becomes a fraction that gradually falls to zero as $r_{ij}$ approaches 1 nm. When $r_{ij}$ > 1 nm, the atomic contact energy is zero. Therefore, the atomic contact energy can be written as Equation (3):

 

\[ G_{\mathrm{des}}(\mathrm{ACE}) = \sum_{i}\sum_{j} n_{ij}\, e_{ij} \tag{3} \]
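The soft contact count can be illustrated with a short Python sketch. The values 0.6 nm and 1 nm come from the text, but the shape of the decay between them is an assumption: a linear ramp is used here purely for illustration and is not claimed to be the switching function of Equation (2). The names `soft_contact` and `ace_energy`, the atom-type labels and the contact energy lookup `e` are likewise hypothetical, and the sum is assumed to run over receptor-ligand atom pairs.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[str, Point]  # (atom type label, coordinates in nm)

R_ON = 0.6   # nm, from the text
R_OFF = 1.0  # nm, from the text


def soft_contact(r: float, r_on: float = R_ON, r_off: float = R_OFF) -> float:
    """Contact weight n_ij: 1 within r_on, 0 beyond r_off, linear in between (assumed)."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    return (r_off - r) / (r_off - r_on)


def ace_energy(receptor: Sequence[Atom], ligand: Sequence[Atom],
               e: Callable[[str, str], float]) -> float:
    """Sum n_ij * e_ij over all receptor-ligand atom pairs."""
    total = 0.0
    for type_i, xyz_i in receptor:
        for type_j, xyz_j in ligand:
            total += soft_contact(math.dist(xyz_i, xyz_j)) * e(type_i, type_j)
    return total


if __name__ == "__main__":
    receptor = [("C", (0.0, 0.0, 0.0))]
    ligand = [("N", (0.8, 0.0, 0.0))]
    # Placeholder contact energy of -2.0 for every pair, for illustration only.
    print(ace_energy(receptor, ligand, lambda a, b: -2.0))
```

With any such smooth weight, a small displacement of an atom across the 0.6 nm boundary changes the score gradually instead of switching a whole contact on or off, which is the stated purpose of avoiding a sharp cutoff.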

 

2    Results and Discussion

2.1   Treatment of molecular flexibility and double filtering technique

To examine the effect of the molecular flexibility treatment in our molecular model, we compared the docked structures obtained with our modified molecular model and with Janin's original model against the experimental structure. Fig.1 shows the result of this comparison for the complex 1brc. The docking started from the structures of the enzyme trypsin (1bra) and its inhibitor APPI (1aap) superimposed on the complex 1brc, but moved 20 nm apart. In the association of the two molecules, an obvious conformational change occurs in the Arg15 side chain of the inhibitor APPI, as can be seen by comparing the bound and unbound structures of APPI. Comparing the two docked structures obtained with the modified and original molecular models [Fig.1(B) and (C)] with the experimental structure [Fig.1(A)], we can see that docking with the modified molecular model tolerates the appropriate overlap between the Arg15 side chain of APPI and Trp215 of trypsin, whereas a major clash between them appears when the docking is performed with Janin's original molecular model. This means that our modified molecular model makes reasonable allowance for the side chain flexibility of the surface residues.

 

Fig.1       The structures of the complex 1brc

(A) The experimental structure. (B) The docked structure with the modified molecular model. (C) The docked structure with Janin's original molecular model.

 

Table 2 shows the results obtained from all the docking simulations. The columns A, B and C under Searching and Filtering list the numbers of native-like structures retained after the searching and filtering stages. The numbers in column A were obtained by docking with the original molecular model and filtering according to the criterion of geometric matching. Column B presents the results of the docking simulations using the modified molecular model and the geometric matching filter. Column C gives the corresponding data for the docking simulations using the modified molecular model and the double filter. A docked conformation is taken as a native-like structure if the root mean square deviation (RMSD) of its backbone atoms (N, Cα, C, O) from the experimental structure is less than 0.4 nm.
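The native-like criterion can be expressed as a short Python sketch. It assumes that the backbone atoms (N, Cα, C, O) of the docked and experimental structures are supplied in the same order and already in a common coordinate frame, so no superposition step is performed; how the two structures were brought into a common frame is not specified in the text. The function names `rmsd` and `is_native_like` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

NATIVE_LIKE_CUTOFF = 0.4  # nm, from the text


def rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root mean square deviation between two matched coordinate lists (nm)."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


def is_native_like(model: Sequence[Point], reference: Sequence[Point],
                   cutoff: float = NATIVE_LIKE_CUTOFF) -> bool:
    """A docked conformation is native-like if its backbone RMSD is below the cutoff."""
    return rmsd(model, reference) < cutoff


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(0.3, 0.0, 0.0), (1.3, 0.0, 0.0)]  # rigid shift of 0.3 nm
    print(rmsd(shifted, reference), is_native_like(shifted, reference))
```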

 

Table 2   Docking results

| Case | Searching and Filtering+ (A) | Searching and Filtering+ (B) | Searching and Filtering+ (C) | Scoring* (Rank) | Scoring* (RMSD, nm) |
|---|---|---|---|---|---|
| BOUND | | | | | |
| 1CHOXX | 74 | 62 | 97 | 3 | 0.13 |
| 2SICXX | 11 | 11 | 8 | 4 | 0.18 |
| 1ACBXX | 25 | 18 | 38 | 1 | 0.35 |
| 2SNIXX | 21 | 15 | 27 | 1 | 0.26 |
| 2PTCXX | 14 | 5 | 23 | 1 | 0.27 |
| 1TECXX | 24 | 12 | 35 | 2 | 0.31 |
| PSEUDO-UNBOUND | | | | | |
| 1UDIFX | 29 | 53 | 67 | 6 | 0.20 |
| 1JHLXF | - | 5 | 21 | 22 | 0.38 |
| 1TABFX | 81 | 70 | 83 | 1 | 0.27 |
| 1BRCFX | 7 | 19 | 24 | 1 | 0.36 |
| 1GLAXF | 2 | 9 | 29 | 3 | 0.34 |
| 1TGSFX | 22 | 16 | 52 | 6 | 0.23 |
| 3HFLXF | - | 2 | 9 | 20 | 0.35 |
| UNBOUND | | | | | |
| 1BRCFF | - | 4 | 14 | 1 | 0.39 |
| 1FSSFF | 41 | 66 | 74 | 3 | 0.32 |
| 1FQ1FF | - | 10 | 12 | 41 | 0.32 |
| 1CSEFF | 37 | 40 | 51 | 2 | 0.38 |
| 1MAHFF | 69 | 80 | 79 | 1 | 0.06 |
| 1MLCFF | - | - | 6 | 104 | 0.37 |
| 2PTCFF | 15 | 6 | 34 | 4 | 0.13 |
| 1AHWFF | - | 5 | 16 | 18 | 0.21 |
| 2KAIFF | 9 | 18 | 30 | 1 | 0.23 |
| 1CHOFF | 31 | 40 | 49 | 1 | 0.10 |
| 1MDAFF | - | 7 | 7 | 4 | 0.25 |
| 1BRBFF | 15 | 20 | 29 | 6 | 0.06 |
| 1CGIFF | 37 | 49 | 44 | 15 | 0.38 |

+, The numbers of native-like structures generated and retained according to the different molecular models and filtering methods. *, The highest ranking position of a native-like structure and the corresponding RMSD (in nm) relative to the experimental structure. -, No native-like structures were found.

 

Compared with the results in column A of Table 2, the data in column B indicate that docking with the modified molecular model gives a clear improvement for the pseudo-unbound and unbound docking, but a deterioration for the bound docking. For most cases of the pseudo-unbound and unbound docking, many more native-like solutions appear in column B than in column A. Moreover, native-like solutions are found for 1JHLXF, 3HFLXF, 1BRCFF, 1FQ1FF, 1AHWFF and 1MDAFF with the modified molecular model, whereas none are captured for these cases with the original molecular model. When the modified molecular model and the double filter are used together in the docking simulations, the outcome of the searching and filtering is improved for all three kinds of docking patterns (see column C).

For the bound docking, because the interfaces of the two molecules from which the docking started already fit well, the treatment of the molecular flexibility is unnecessary. This can explain why no improvement is obtained for the bound docking simulations through softening the molecular surface.

Although softening the molecular surface may be important for capturing native-like solutions in the unbound docking simulations, it also reduces the difference in geometric complementarity between correct and incorrect solutions. The geometric matching filter may therefore not be sufficient to reliably eliminate the incorrect solutions, and the filtering criterion of geometric complementarity needs to be supplemented. The residue pairing preferences are statistical results based on a nonredundant database of 621 protein-protein interfaces, and they describe well the physicochemical and structural preferences at protein-protein interfaces. Introducing the residue pairing preferences into the filtering criteria therefore yields many more native-like solutions for the three kinds of docking patterns.

2.2   Scoring putative complexes

Table 2 also lists the ranking positions (under Scoring) of the first native-like structures for all 26 complexes and the corresponding RMSD relative to the experimental structures. The first native-like structures of 20 out of the 26 complexes are ranked within the top 10. For 1BRCFF, although there is a major clash between the Arg15 side chain of APPI and Trp215 of trypsin, the native-like structure is still found and ranked first. Fig.2 compares the experimental structures of the complexes with the best-ranked native-like predictions for 1CSEFF and 2KAIFF reported in Table 2. The binding sites are satisfactorily identified in both cases. This indicates that the scoring function including electrostatic, desolvation and van der Waals energy terms is relatively successful in distinguishing correct binding modes from incorrect ones.
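The count of 20 out of 26 can be checked directly from the Rank column of Table 2. The short Python tally below copies those ranks; the variable names are arbitrary.

```python
# Best rank of a native-like structure for each case, copied from Table 2.
best_rank = {
    # BOUND
    "1CHOXX": 3, "2SICXX": 4, "1ACBXX": 1, "2SNIXX": 1, "2PTCXX": 1, "1TECXX": 2,
    # PSEUDO-UNBOUND
    "1UDIFX": 6, "1JHLXF": 22, "1TABFX": 1, "1BRCFX": 1, "1GLAXF": 3,
    "1TGSFX": 6, "3HFLXF": 20,
    # UNBOUND
    "1BRCFF": 1, "1FSSFF": 3, "1FQ1FF": 41, "1CSEFF": 2, "1MAHFF": 1,
    "1MLCFF": 104, "2PTCFF": 4, "1AHWFF": 18, "2KAIFF": 1, "1CHOFF": 1,
    "1MDAFF": 4, "1BRBFF": 6, "1CGIFF": 15,
}

top10 = [case for case, rank in best_rank.items() if rank <= 10]
outside = sorted(case for case, rank in best_rank.items() if rank > 10)

print(f"{len(top10)} of {len(best_rank)} cases ranked in the top 10")
print("Outside the top 10:", ", ".join(outside))
```

The six cases outside the top 10 are 1JHLXF, 3HFLXF, 1FQ1FF, 1MLCFF, 1AHWFF and 1CGIFF, all of them pseudo-unbound or unbound cases.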

 

Fig.2       Superposition of the experimental structures of two protein-protein complexes and the best-ranked native-like predictions reported in Table 2

(A) 1CSEFF; (B) 2KAIFF. Thick line, Cα trace of the experimental structure; thin line, Cα trace of the predicted structure.

 

3    Conclusions

It should be pointed out that the docking simulations in this paper are based on the assumption that the binding region on one of the two proteins is known. In the spherical polar coordinates used in this work, this information is given as a simple constraint on just one or two of the angular degrees of freedom. Execution time can be reduced to several minutes by applying these constraints before docking. Ritchie and Kemp used the same coordinates in their docking algorithm[10] and successfully predicted the structures of some protein-protein complexes. In their test, when the search ranges of two angular degrees of freedom were limited to ±30° around the active site, the first native-like structures of 7 out of 18 complexes were ranked in the top 10[10]. In this paper, the first native-like conformations of 20 out of the 26 tested complexes are ranked in the top 10. This indicates that our algorithm captures some important factors in protein-protein association and can be useful for the study of molecular recognition.

In summary, our soft docking algorithm has several advantages: (1) the modified molecular model improves the simulation results for unbound protein-protein docking; (2) the double filtering technique retains many more native-like structures and increases the probability of successfully predicting the structures of protein-protein complexes; (3) the scoring function based on the binding free energy can effectively distinguish correct structures from incorrect ones. However, the method also has a few shortcomings. For instance, the partial search of the binding space is an obvious limitation for docking simulations in which no information about the binding site is available. In addition, the desolvation free energy is not calculated accurately. Work on improving our docking algorithm is currently underway.

 

References

1     Cherfils J, Duquerroy S, Janin J. Protein-protein recognition analyzed by docking simulation. Proteins, 1991, 11: 271-280

2     Chothia C, Novotny J, Bruccoleri R, Karplus M. Domain association in immunoglobulin molecules: The packing of variable domains. J Mol Biol, 1985, 186: 651-663

3     Janin J, Chothia C. The structure of protein-protein recognition sites. J Biol Chem, 1990, 265(27): 16027-16030

4     Jones S, Thornton JM. Principles of protein-protein interactions. Proc Natl Acad Sci USA, 1996, 93(1): 13-20

5     Xie ZQ, Ding DF, Xu GJ. Delineation of continuous domain in proteins by differences of free energy. Acta Biochim Biophys Sin, 2001, 33(4): 386-394

6     Halperin I, Ma B, Wolfson H, Nussinov R. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins, 2002, 47: 409-443

7     Lo Conte L, Chothia C, Janin J. The atomic structure of protein-protein recognition sites. J Mol Biol, 1999, 285: 2177-2198

8     Cherfils J, Janin J. Protein docking algorithms: Simulating molecular recognition. Curr Opin Struct Biol, 1993, 3: 265-269

9     Jiang F, Kim SH. Soft docking: Matching of molecular surface cubes. J Mol Biol, 1991, 219: 79-102

10    Ritchie DW, Kemp GJ. Protein docking using spherical polar Fourier correlations. Proteins, 2000, 39: 178-194

11   Palma PN, Krippahl L, Wampler JE, Moura JJ. Bigger: A new (soft) docking algorithm for predicting protein interactions. Proteins, 2000, 39: 372-384

12    Sandak B, Nussinov R, Wolfson HJ. An automated computer vision and robotics-based technique for 3-D flexible biomolecular docking and matching. Comput Appl Biosci, 1995, 11: 87-99

13    Vakser IA. Protein docking for low-resolution structures. Protein Eng, 1995, 8: 371-377

14    Walls PH, Sternberg MJ. New algorithm to model protein-protein recognition based on surface complementarity. Applications to antibody-antigen docking. J Mol Biol, 1992, 228: 277-297

15    Shoichet BK, Kuntz ID. Protein docking and complementarity. J Mol Biol, 1991, 221: 327-346

16    Glaser F, Steinberg DM, Vakser IA, Ben Tal N. Residue frequencies and pairing preferences at protein-protein interfaces. Proteins, 2001, 43: 89-102

17    Lin SL, Nussinov R, Fischer D, Wolfson HJ. Molecular surface representations by sparse critical points. Proteins, 1994, 18: 94-101

18    Gabb HA, Jackson RM, Sternberg MJ. Modelling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol, 1997, 272: 106-120

19    Zhao S, Goodsell DS, Olson AJ. Analysis of a data set of paired uncomplexed protein structures: New metrics for side-chain flexibility and model evaluation. Proteins, 2001, 43: 271-279

20    van der Spoel D, van Buuren AR, Apol E, Meulenhoff PJ, Sijbers ALTM, Hess B, Feenstra KA et al. Biomolecular Simulation: The GROMACS User Manual. Groningen, Netherlands: Biomos, 1991

21    van Gunsteren WF, Billeter SR, Eising AA, Hünenberger PH, Krüger R, Mark AE, Scott WRP et al. Biomolecular Simulation: The GROMOS96 Manual and User Guide. Zürich, Switzerland: Hochschulverlag AG an der ETH, 1996

22    Zhang C, Vasmatzis G, Cornette JL, DeLisi C. Determination of atomic desolvation energies from the structures of crystallized proteins. J Mol Biol, 1997, 267(3): 707-726

 


Received: May 20, 2002 Accepted: September 9, 2002
This work was supported by grants from the Natural Science Foundation of China (No.29992590-2, 30170230 and 10174005) and Beijing Natural Science Foundation(No.5032002)
*Corresponding author: Tel, 86-10-67392724; Fax, 86-10-67391738; e-mail, [email protected]