Bioinformatics is a multidisciplinary field that develops methods and software tools for storing, retrieving, organizing, and interpreting biological data. Analyzing biological data draws on a combination of computer science, statistics, physics, chemistry, mathematics, and engineering. The field is growing rapidly because computational approaches are cheaper and faster than experimental ones. Computational biology tools for protein modeling (e.g., Swiss Model, Easy Modeller, and Modeller), molecular dynamics simulation (e.g., Gromacs and Amber), and docking (e.g., Autodock version 4.2, AutodockVina, Swissdock, and Haddock) have helped in designing substrate-based drugs and in studying the interaction between target proteins (cancer cell proteins) and ligands (phytocomponents).
The aim of this research is to generate three-dimensional (3D) models of lung cancer cell line proteins (EGFR, K-ras oncogene, and TP53) and to estimate their binding affinities with curcumin, ellagic acid, and quercetin using a docking approach.
1.2 Methodology
1.2.1 Sequence of Protein
The complete amino acid sequences of EGFR (GI: 110002567), K-ras oncogene protein (GI: 186764), and TP53 (GI: 1233272225) were obtained from the National Center for Biotechnology Information (NCBI). EGFR consists of 464 amino acids, K-ras oncogene protein contains 188 amino acids, and TP53 consists of 346 amino acids.
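Sequence lengths such as those quoted above can be checked directly from a downloaded FASTA file. The following is a minimal sketch under stated assumptions: the function name read_fasta and the toy record are hypothetical and are not part of the original workflow, and the toy sequence is not one of the proteins studied here.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]          # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Hypothetical toy record for illustration only (NOT a sequence from this study).
example = """>toy_protein illustrative only
MKTAYIAKQR
QISFVKSHFS
"""

for name, seq in read_fasta(example).items():
    print(name, len(seq))
```

Applied to the records retrieved from NCBI, the same loop would report the number of residues in each protein.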
1.2.2 Homology Modeling
Currently, 3D models of EGFR, K-ras oncogene protein, and TP53 are not available in the Protein Data Bank (PDB). The models were therefore generated with Swiss Model [12] and visualized with PyMol [13].
1.2.3 Physicochemical Characterization
The physical and chemical characteristics of the protein structures were analyzed using the ExPASy ProtParam proteomics tool [14]. Hydrophobic and hydrophilic residues were predicted by ColorSeq analysis [15]. The ESBRI program [16] was used to identify salt bridges in the protein structures, while the Cys Rec program was used to count the number of disulfide bonds [17].
1.2.4 Determination of Secondary Models
The secondary structural properties were predicted using the Self-Optimized Prediction Method with Alignment (SOPMA) [18].
1.2.5 Determination of Stability of Protein Structures
PROCHECK was used to validate the protein models [19]. ProQ [20], ERRAT [21], and Verify3D [22] were used for further validation.
1.2.6 Identification of Active Site
The 3D models of EGFR, K-ras oncogene protein, and TP53 were submitted to an active site prediction server [23] to identify their binding sites.
1.2.7 Preparation of Ligand Model
The tertiary structures of quercetin, curcumin, and ellagic acid are not openly accessible. The complete structural records of quercetin, curcumin, and ellagic acid were obtained from PubChem, National Center for Biotechnology Information (2017) [24].
1.2.8 Docking of Target Protein and Phytocompound
The 3D structure of EGFR was docked with quercetin, curcumin, and ellagic acid using the BSP-Slim server [25]. The best docking complex was chosen as the one with the lowest binding score. The same docking procedure was carried out between the other two protein models and the phytocompounds (quercetin, curcumin, and ellagic acid).
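The selection rule used here, choosing the complex with the lowest binding score, can be expressed in a few lines. This is only a sketch: the function name best_complex, the pose labels, and the scores are hypothetical placeholders and do not correspond to any BSP-Slim output or result from this study.

```python
def best_complex(scores):
    """Return the (label, score) pair with the lowest binding score."""
    if not scores:
        raise ValueError("no docking scores supplied")
    label = min(scores, key=scores.get)  # lowest score = most favourable
    return label, scores[label]


# Placeholder values for illustration only; NOT results from this study.
placeholder_scores = {"pose_1": -4.1, "pose_2": -5.3, "pose_3": -3.8}

print(best_complex(placeholder_scores))
```

The same function can be applied separately to the score list of each protein and ligand pair.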
1.3 Results and Discussion
1.3.1 Determination of Physicochemical Characters
The isoelectric point (pI) values of EGFR (7.10) and K-ras (8.18) are above 7, indicating basic character, whereas the pI of TP53 (5.64) is below 7, indicating acidic character (Table 1.1). The molecular weights of EGFR, K-ras oncogene protein, and TP53 are 50,343.70, 21,470.62, and 38,532.60 Da, respectively. The extinction coefficient, which reflects how strongly a protein absorbs light at a specific wavelength, was calculated from the Tyr, Trp, and Cys residues: 38,305 M⁻¹ cm⁻¹ for EGFR, 12,170 M⁻¹ cm⁻¹ for K-ras oncogene protein, and 43,025 M⁻¹ cm⁻¹ for TP53. In addition, –R denotes the number of negatively charged residues (Asp + Glu) and +R the number of positively charged residues (Arg + Lys) in the amino acid sequence; the totals for each protein model are given in Table 1.1. According to the ExPASy ProtParam instability index, EGFR is classified as stable because its instability index (35.56) is below 40, whereas K-ras oncogene protein (43.95) and TP53 (80.17) are classified as unstable because their instability indices exceed 40. The negative grand average of hydropathicity (GRAVY) values of EGFR, K-ras oncogene protein, and TP53 denote their hydrophilic nature (Table 1.1). Finally, EGFR, K-ras, and TP53 contain more polar residues (41.52%, 53.33%, and 45.29%, respectively) than non-polar residues (26.74%, 30.0%, and 27.35%, respectively), as determined using Color Protein Sequence.
Table 1.1 Physicochemical characters of EGFR, K-ras, and TP53 proteins as determined by the ExPASy ProtParam program.
Protein | Length | Molecular weight (Da) | pI | –R | +R | Extinction coefficient (M⁻¹ cm⁻¹) | Instability index | Aliphatic index | GRAVY |
EGFR | 460 | 50,343.70 | 7.10 | 49 | 49 | 38,305 | 35.56 | 72.91 | –0.269 |
KRAS | 180 | 21,470.62 | 8.18 | 29 | 31 | 12,170 | 43.95 | 77.18 | –0.559 |
TP53 | 340 | 38,532.60 | 5.64 | 41 | 33 | 43,025 | 80.17 | 63.99 | –0.592 |
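Several of the quantities in Table 1.1 follow simple closed-form definitions and can be reproduced from a sequence alone. The sketch below is a minimal illustration and not the ProtParam implementation. It assumes the Kyte-Doolittle hydropathy scale for GRAVY and the commonly used 280 nm coefficients (1490 for Tyr, 5500 for Trp, and 125 per cystine, in M⁻¹ cm⁻¹), and it treats all Cys residues as paired. The function names and the toy sequence are hypothetical; only the instability indices passed to stability_class are taken from Table 1.1.

```python
# Kyte-Doolittle hydropathy values per residue (assumed scale for GRAVY).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def charged_counts(seq):
    """Return (-R, +R): counts of Asp + Glu and of Arg + Lys."""
    neg = seq.count("D") + seq.count("E")
    pos = seq.count("R") + seq.count("K")
    return neg, pos


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KD[res] for res in seq) / len(seq)


def extinction_coefficient(seq):
    """Estimated extinction coefficient at 280 nm in M^-1 cm^-1.

    Assumes every pair of Cys residues forms one cystine.
    """
    n_cystine = seq.count("C") // 2
    return 1490 * seq.count("Y") + 5500 * seq.count("W") + 125 * n_cystine


def stability_class(instability_index):
    """Classify a protein as stable (index < 40) or unstable (otherwise)."""
    return "stable" if instability_index < 40 else "unstable"


# Hypothetical toy sequence for illustration only (NOT a protein from this study).
toy = "ACDEKRWY"
print(charged_counts(toy))
print(round(gravy(toy), 4))
print(extinction_coefficient(toy))

# Instability indices as reported in Table 1.1.
for name, index in [("EGFR", 35.56), ("KRAS", 43.95), ("TP53", 80.17)]:
    print(name, stability_class(index))
```

A negative value returned by gravy corresponds to a hydrophilic protein, which is the interpretation applied to all three models above.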
Salt bridges can affect the structure and function of a protein, and their disruption reduces protein stability [26]. They are also associated with regulation, molecular recognition, oligomerization, flexibility, domain motions, and thermostability [27, 28].
A greater number of arginine residues in the protein model enhances the stability of the protein. This happens through the electrostatic interactions between their guanidine