GB Virus C/Hepatitis G Virus Envelope Glycoprotein E2: Computational Molecular Features and Immunoinformatics Study

Introduction: GB virus C (GBV-C), also known as hepatitis G virus (HGV), is an enveloped, positive-stranded RNA, flavivirus-like virus. The E2 envelope protein of GBV-C plays an important role in virus entry into the cytosol, is used for genotyping, and serves as a marker for diagnosing GBV-C infections. A relationship between the E2 protein and the gp41 protein of HIV has also been discussed. The purpose of our study was a multi-aspect molecular evaluation of the GB virus C E2 protein in terms of its characteristics, mutations, structures and antigenicity, which may point to new directions for future research.

Evidence Acquisition: Briefly, the steps were: retrieval of reference sequences of the E2 protein; entropy plot evaluation to find mutational and conserved regions; analysis of potential glycosylation, phosphorylation and palmitoylation sites; prediction of primary, secondary and tertiary structures, amino acid distribution and transmembrane topology; prediction of T- and B-cell epitopes; and finally visualization of epitopes and variable regions in the 3D structure.

Results: Based on the entropy plot, 3 hypervariable regions (HVR) were observed along the E2 protein, located at residues 133-135, 256-260 and 279-281. Analysis of the primary structure revealed the basic nature, instability and low hydrophilicity of this protein. Transmembrane topology prediction showed that residues 257-270 were located outside, while residues 234-256 and 271-293 were transmembrane regions. Only one N-glycosylation site, 5 potential phosphorylated peptides and two palmitoylation sites were found. Secondary structure prediction revealed 6 α-helices, 12 β-strands and 17 coil structures. Prediction of T-cell epitopes for HLA-A*02:01 showed that NH2-LLLDFVFVL-COOH is the best antigenic epitope. Comparative analysis for consensus B-cell epitopes with respect to transmembrane topology, based on physico-chemical and machine learning approaches, revealed that residues 231-296 (NH2-EARLVPLILLLLWWWVNQLAVLGLPAVEAAVAGEVFAGPALSWCLGLPVVSMILGLANLVLYFRWL-COOH) form the most effective and probable B-cell epitope of the E2 protein.

Conclusions: The comprehensive analysis of a protein with important roles has never been easy, and in the case of the E2 envelope glycoprotein of HGV there are few data on its molecular and immunological features, its clinical significance and its pathogenic potential in hepatitis or any other GBV-C related disease. The results of the present study may therefore explain some structural, physiological and immunological functions of this protein in GBV-C, support the design of new diagnostic kits, and contribute to a better understanding of the characteristics of E2 proteins in other members of the Flavivirus family, especially HCV.


Introduction
In 1995 and 1996, different isolates of the same new enveloped, positive-stranded RNA, flavivirus-like virus, with a genome of about 9.3 kb, were isolated by two independent research groups and named GB virus C (GBV-C) and hepatitis G virus (HGV), respectively. The RNA contains an open reading frame (ORF) that encodes a polyprotein about 2900 amino acids long. Viral and host proteases cleave the GB virus C polyprotein into structural proteins (Core, E1 and E2) and nonstructural proteins (NS2, NS3, NS4, NS5a and NS5b) (1,2). To date, 6 genotypes have been reported in different geographical regions of the world (3). The virus can be transmitted parenterally through different routes (1,4) and is common in some parts of the world, such as Iran (5). An overview of HGV infection in different Iranian populations revealed that HGV coinfection is highly prevalent among patients and blood donors infected with HIV or HCV, whereas populations negative for HIV, HCV and HBV are at low risk of HGV infection. The frequency is intermediate among patients on hemodialysis and those with thalassemia, intravenous drug users (IVDUs), and patients with leukemia (5,6). Occupational exposure shows the lowest rates, and monitoring of blood donors before transfusion is not considered necessary (5).
There is evidence that GB virus C (GBV-C) coinfection is associated with reduced HCV-related liver morbidity, an inhibitory effect on HCV/HIV viremia, improved survival, a lower mortality rate and slower disease progression; GBV-C may also serve as a predictor of hospital-acquired infection (7,8). Interferon-alpha treatment caused a marked but usually transient reduction in serum GBV-C/HGV RNA, and ribavirin had, at most, a modest antiviral effect (9).
The E2 envelope protein of GB virus C plays a role in virus entry into the cytosol and in genotyping (10), and serves as a marker for diagnosing GBV-C infections (11). In addition, the interaction between the E2 protein and the gp41 protein of HIV-1 has been discussed, including its effect on protein folding and whether E2 forms an inactive complex with the gp41 fusion peptide (gp41-FP). In primates (the chimpanzee model of HCV), purified recombinant envelope glycoproteins (E1 and E2) have been reported to protect against challenge with homologous virus; these proteins are therefore considered ideal targets for vaccine development (11).
Bioinformatics tools for viral sequence analysis are now powerful approaches for predicting molecular features of proteins encoded in viral genomes, such as similarity, glycosylation, phosphorylation and palmitoylation sites, epitopes, and primary, secondary and tertiary structures (12).
Immunoinformatics, or computational immunology, is a branch of bioinformatics that has recently emerged as an important field for immune function modeling and analysis, prediction of both B- and T-cell epitopes, design of novel vaccines and allergenicity analysis (13,14).
Glycosylation characteristics of glycoproteins are known to be associated with changes in virulence, cellular tropism and survival of viruses (15). Palmitoylation is an important lipid modification (16) that enhances protein surface hydrophobicity, membrane affinity and aggregation, and modulates membrane trafficking, stability and cell signaling of proteins (17,18). Protein phosphorylation plays a role in regulating the physiological functions of viral proteins in replication and assembly (19).
Structure prediction approaches, which differ in reliability, simplify the discovery process in biology and provide a structural framework for new hypotheses; they are continuously being developed and evaluated (20,21). Understanding a protein's structure provides deep insight into its interactions with other proteins and small molecules. These interactions, in turn, define the protein's function and its biological role in an organism. Prediction of protein structures and structural features is therefore a fundamental area of computational biology (22). To date, there are no data on the computational molecular features and immunoinformatics of the GB virus C E2 protein, although there are many reports on analysis of the HCV E2 protein (23-28).
The purpose of our study was a multi-aspect molecular evaluation of the GB virus C E2 protein in terms of its characteristics, mutations, structures and antigenicity. This information may point to new directions for future research, such as the design of diagnostic kits, and may help to clarify the similarities and differences between the biological features of GB virus C and those of other members of the Flavivirus family, especially hepatitis C virus (HCV). The interplay between experimental and computational biology has enormous benefits and provides invaluable information in many areas of science.

Retrieving Reference Sequences of E2 Protein
The complete putative E2 sequence of GB virus C/hepatitis G virus (accession number NP_803203), listed as a reference sequence in the National Center for Biotechnology Information (NCBI) databases (http://ncbi.nlm.nih.gov/), was retrieved. In bioinformatics, analysis of a reference sequence (RefSeq) is usually preferred because it is well annotated and because both the nucleotide sequence (DNA, RNA) and its protein products are available and reliable.

Mutational/Conservative Regions
We retrieved 100 sequences of the E2 protein of GB virus C from NCBI by direct searching. The sequences were aligned, analyzed and trimmed in Bioedit 7.7.9 software. Short sequences and areas with ambiguous alignment were then excluded, and entropy values (Hx) were calculated. This analysis measures the variation at each amino acid position in the set of aligned sequences. Results are shown in Figure 1.
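The per-position entropy described above can be illustrated with a minimal sketch. It assumes the Shannon form H(x) = -Σ p·ln(p) over the residues observed in one alignment column; the natural logarithm, the function names and the toy alignment are assumptions for illustration and are not taken from the study or from the Bioedit implementation.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def entropy_profile(aligned_seqs):
    """Return one entropy value per alignment position."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [column_entropy([s[i] for s in aligned_seqs]) for i in range(length)]

# Toy alignment (hypothetical, not GBV-C data)
toy = ["MKLV", "MKLI", "MRLA", "MKLV"]
profile = entropy_profile(toy)
for pos, h in enumerate(profile, start=1):
    print(pos, round(h, 3))
```

Fully conserved columns give an entropy of zero, and the value grows as more residues share a position more evenly. In this sketch a gap character would be counted like any other symbol, which may or may not match how a given tool treats gaps.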

Analyzing Primary Structure of E2 Protein, Amino Acid Distributions, and Transmembrane Topology
The primary structure parameters of E2 (e.g. length, molecular weight (Mw), isoelectric point (pI) and amino acid distribution) were computed with the Expasy ProtParam tool (http://web.expasy.org/protparam/) and are arranged in Table 1. For evaluation of the amino acid distribution we used the lrrfinder server (http://www.lrrfinder.com/lrrfinder.php). Finally, the transmembrane topology of the E2 protein was checked using the TMHMM server (29).

Analysis of N-glycosylation, Potential Phosphorylation and Palmitoylation Sites
We used the NetNGlyc 1.0 server (http://www.cbs.dtu.dk/services/NetNGlyc) and the NetPhos 2.0 server (http://www.cbs.dtu.dk/services/NetPhos) to predict N-glycosylation and phosphorylation sites in the E2 protein. Both servers rely on artificial neural networks (ANN) for prediction. The NetNGlyc 1.0 server examines the sequence context of Asn-Xaa-Ser/Thr sequons, and the NetPhos 2.0 server predicts serine, threonine and tyrosine phosphorylation sites. Palmitoylation sites were predicted at the medium threshold using the CSS-Palm 2.0 software available at http://csspalm.biocuckoo.org/prediction.php.
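As a rough illustration of the Asn-Xaa-Ser/Thr sequon that NetNGlyc takes as its starting point, the sketch below only locates candidate sequons with a regular expression. It does not reproduce the neural network scoring of the server; the exclusion of proline at the Xaa position is the commonly used form of the rule, and the function name and toy sequence are hypothetical.

```python
import re

# N-X-S/T sequon with X != Pro; the lookahead lets overlapping sequons be found
SEQUON = re.compile(r"N(?=[^P][ST])")

def find_sequons(seq):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]

# Toy sequence (hypothetical, not the E2 sequence)
toy = "MANGTLNPSANASNNTS"
sites = find_sequons(toy)
print(sites)
```

A sequon is necessary but not sufficient for glycosylation, which is why a predictor scores the surrounding sequence context instead of reporting every match.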

Prediction of Secondary Structure of E2 Protein
The secondary structure of the protein was evaluated using the bioinformatics tools available at http://npsa-pbil.ibcp.fr. The GOR4 method (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html) was used to identify alpha helices, beta strands and coil residues.

Prediction of Tertiary Structure of E2 Protein
As we could not find any matches for E2 in SWISS-PROT with which to analyze functional and structural motifs, we used the SCRATCH suite (http://www.igb.uci.edu/), which combines machine learning methods, evolutionary information, fragment libraries and energy functions to predict protein structural features and tertiary structures. The 3D model was visualized with the Swiss-Pdb Viewer software.

Prediction of T-cell Epitopes
The IEDB (Immune Epitope Database) server (http://tools.immuneepitope.org/mhci/) provides predictions of peptide binding to MHC class I molecules.
It estimates IC50 values for peptides binding to specific MHC molecules. A list box allows selection among different MHC class I binding prediction methods: Artificial Neural Networks (ANN), Stabilized Matrix Method (SMM), SMM with a Peptide:MHC Binding Energy Covariance matrix (SMMPMBEC), scoring matrices derived from combinatorial peptide libraries (Comblib_Sidney2008), a Consensus method (combining e.g. ANN, SMM and CombLib), and NetMHCpan.
HLA-A*02:01 is the most frequent allele and also the first human HLA allele for which peptide binding prediction was developed (30). Epitope predictions were therefore carried out for this allele.
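To make the input of an MHC class I prediction concrete, the following sketch enumerates overlapping 9-mers and flags those matching a simple anchor-residue pattern. This is not the IEDB method: the servers above predict IC50 values, whereas the pattern used here (Leu or Met at position 2, Val, Leu or Ile at position 9) is only a simplified description of the HLA-A*02:01 motif, assumed for illustration. The function names and the flanking residues of the toy sequence are hypothetical; the embedded 9-mer LLLDFVFVL is the epitope reported in this study.

```python
def nonamers(seq, k=9):
    """Yield (1-based start, peptide) for every overlapping k-mer."""
    for i in range(len(seq) - k + 1):
        yield i + 1, seq[i:i + k]

def matches_a2_anchors(peptide):
    """Simplified anchor check: L/M at position 2 and V/L/I at the C-terminus."""
    return peptide[1] in "LM" and peptide[-1] in "VLI"

# Toy sequence: the reported epitope with hypothetical flanking residues
toy = "AAGLLLDFVFVLKK"
candidates = [(start, pep) for start, pep in nonamers(toy) if matches_a2_anchors(pep)]
print(candidates)
```

An anchor filter of this kind only narrows the candidate list; ranking by predicted binding affinity still requires one of the trained methods listed above.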

Prediction of Linear B-cell Epitope Based on Physico-Chemical Profiles
Antigenicity of the E2 protein was predicted on the basis of hydrophobicity, solvent accessibility, flexibility, secondary structure (beta-turn prediction), and the Kolaskar and Tongaonkar method (31). The Kolaskar and Tongaonkar method deserves particular attention, as it is a semi-empirical approach built on physico-chemical properties of amino acid residues (i.e. hydrophilicity, accessibility and flexibility). This approach can detect antigenic peptides with about 75% accuracy. For these analyses we used the Bcepred server (32). The prediction accuracy of the models on this server varies from 52.92% to 57.53% depending on the property used. The highest accuracy obtained with this server was 58.70%, at a threshold of 2.38, when four amino acid profiles (hydrophilicity, flexibility, polarity and exposed surface) were combined.

B-cell Epitope Prediction by Machine Learning Approaches
Several methods using machine learning approaches have been introduced. The hybrid approach applied in this study combines a hidden Markov model, feed-forward and recurrent neural networks, a subsequence kernel-based SVM and an SVM, which are used in BepiPred (33), ABCpred (34), BCPred (35) and ABCPred, respectively.

Comparative Analysis of Consensus Epitope for B-cell, Visualization of Epitopes and Variations in 3D Structure
Finally, we compared all of the analyses described above to interpret the molecular and immunoinformatics features unique to this protein. The predicted B-cell epitopes were also checked, using the TMHMM results, for whether they lie in the outer membrane region. Epitopes exposed on the surface of the membrane were selected and subjected to further analysis. In addition, the variations shown in the entropy plot were examined in the 3D model.
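The topology check described above amounts to an interval-overlap test, sketched below under stated assumptions. The outer region 257-270 is the TMHMM result reported in this study; the first two candidate ranges are taken from the Results (231-296 and the approximate consensus region 230-253), while the third range, the function names and the minimum-overlap parameter are hypothetical.

```python
OUTSIDE = (257, 270)  # outer region of E2 reported from TMHMM in this study

def overlap(a, b):
    """Number of residues shared by two inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def surface_exposed(epitopes, region=OUTSIDE, min_overlap=1):
    """Keep epitopes that share at least min_overlap residues with the region."""
    return [e for e in epitopes if overlap(e, region) >= min_overlap]

# (231, 296) and (230, 253) come from the text; (100, 115) is a made-up example
candidates = [(231, 296), (230, 253), (100, 115)]
exposed = surface_exposed(candidates)
print(exposed)
```

Raising `min_overlap` would demand that a larger part of the epitope lies in the outer region, which is a stricter criterion than the presence or absence test used here.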

Homology Models Validation
Quality evaluation of the modeled structure is an essential step in homology modeling. The geometry of the modeled 3D (tertiary) structure was assessed using the Ramachandran plot (http://mordred.bioc.cam.ac.uk/~rapper/rampage.php). A Ramachandran plot is a two-dimensional (2D) scatter plot of the φ and ψ backbone torsion angles; it is used to test whether the model structure is stereochemically sound and to count the number of outliers (36). The plot comprises three regions: favored, allowed and outlier.

Entropy Plot for Finding the Mutational and Conservative Sites
Based on the entropy plot, 3 hypervariable regions (HVR) were observed along the E2 protein, located at residues 133-135, 256-260 and 279-281. HVRs are the regions of a sequence that vary most among different isolates of the virus. The highest conservation was observed at amino acids 152-168 and 183-248. Residues 256-260 lie in the outer membrane region of the E2 protein (see the transmembrane topology analysis), and this variability may help GB virus C to escape the immune response of its host.

Analyzing Primary Structure, Amino Acid Distribution of E2 Protein and Transmembrane Topology
The data obtained from the Expasy ProtParam tool are summarized in Table 1.
The length of the protein sequence and its molecular weight are given in Table 1. The isoelectric point (pI) is the pH at which the protein surface still carries charges but the net charge of the protein is zero. At this pH the protein does not migrate in an electric field, and the pI is therefore useful for estimating solubility and electrophoretic mobility. The calculated pI of this protein was 8.69. A value above 7 indicates that the E2 protein is basic in nature. The instability index provides an estimate of the stability of a protein in vitro.
According to its instability index, this protein is classified as unstable. The high aliphatic index (100.58) suggests that the E2 protein is stable over a wide temperature range. The Grand Average of Hydropathicity (GRAVY) value was positive (0.333), which indicates low hydrophilicity and weak interaction of the protein with surrounding water molecules.
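For readers who want to see how two of these indices are formed, the sketch below computes GRAVY as the mean Kyte-Doolittle hydropathy of all residues and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. These are the standard published definitions and are assumed here to correspond to what ProtParam reports; the function names are hypothetical, and the input is the 9-residue T-cell epitope reported in this study, not the full E2 sequence, so the values are not comparable with Table 1.

```python
# Kyte-Doolittle hydropathy scale
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(seq):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KD[aa] for aa in seq) / len(seq)

def aliphatic_index(seq):
    """Aliphatic index from mole percent of Ala, Val, Ile and Leu."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)
    return (mole_percent("A") + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))

peptide = "LLLDFVFVL"  # T-cell epitope reported in this study
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```

A positive GRAVY value indicates an overall hydrophobic sequence, which is the sense in which the text reads the value of 0.333 for E2. Sequences containing non-standard residue codes would raise a `KeyError` in this sketch.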
Physicochemical analysis revealed that the most abundant amino acid residues were glutamic acid and glycine.
The distribution of amino acid frequencies in the E2 protein (Figure 2) showed that hydrophobic residues are significantly more frequent than hydrophilic residues, and that residues with negatively charged R groups are more frequent than those with positively charged R groups. Most of this protein is therefore hydrophobic and located in the membrane.
Analysis of transmembrane topology with the TMHMM online server showed that residues 257-270 are located outside, residues 234-256 and 271-293 are transmembrane regions, and residues 1-233 and 294-312 are inside, in the core region of the protein (Figure 3). This analysis also helps in selecting efficient and effective B-cell epitopes.

Analyzing Potential Glycosylation, Phosphorylation and Palmitoylation Sites
Only one N-glycosylation site (residue 73) was found in the E2 protein of GB virus C (Figure 4 and Table 2). Analysis of potential phosphorylation sites revealed 5 serine and threonine potential phosphorylated peptides in the E2 protein (Table 2). Details of the phosphorylation analysis are shown in Figure 4. Both the glycosylation and the phosphorylation sites were located in the region of E2 predicted to be on the inside of the membrane. To identify possible palmitoylation sites we applied the CSS-PALM 3.0 software at the medium threshold (Table 3). The results showed two palmitoylation sites in this protein, located near each other. According to the TMHMM online server, the palmitoylation sites are also located in the inside region of the protein.

Protein Secondary Structure Prediction
As shown in Figure 5, six α-helices and 12 β-strands exist in the E2 protein of GB virus C.
Finally, calculation of coils (beta turns) revealed 17 coil regions in the E2 structure. The outer membrane region predicted by the TMHMM online server contains α-helix (the dominant structure), a small β-strand and coil structure. The transmembrane regions are predominantly α-helical.

Physico-Chemical Properties
In Figure 6 we evaluated the existence of linear B-cell epitopes in E2 protein sequence based on physico-chemical properties. Details of these predictions are arranged in Table 4.

Antigenic propensity
Regions with an antigenic propensity score above 1 are considered antigenic. The threshold, average, maximum and minimum antigenicity values were 1.000, 1.058, 1.259 and 0.866, respectively. The window size and center position were 7 and 4, respectively.
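The window parameters above (size 7, score assigned to the residue at position 4) describe a sliding-window average, which can be sketched as follows. The propensity values in `TOY_SCALE` are invented for illustration and are not the Kolaskar-Tongaonkar scale; the function names and the toy sequence are likewise hypothetical. Only the windowing and the threshold of 1.000 follow the text.

```python
def window_scores(seq, scale, window=7):
    """Average a per-residue scale over each window; key is the 1-based center."""
    half = window // 2
    scores = {}
    for i in range(half, len(seq) - half):
        segment = seq[i - half:i + half + 1]
        scores[i + 1] = sum(scale[aa] for aa in segment) / window
    return scores

def regions_above(scores, threshold=1.0):
    """Merge consecutive center positions scoring above the threshold."""
    regions, start, prev = [], None, None
    for pos in sorted(scores):
        if scores[pos] > threshold:
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            regions.append((start, prev))
            start = None
    if start is not None:
        regions.append((start, prev))
    return regions

TOY_SCALE = {"A": 0.9, "L": 1.2}  # hypothetical propensities, not the published scale
toy = "AAAAAAA" + "LLLLLLL" + "AAAAAAA"
regions = regions_above(window_scores(toy, TOY_SCALE))
print(regions)
```

Because each score is attached to the window center, the first and last three residues of a sequence receive no score with a window of 7.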

Prediction Epitopes Based on Machine Learning Approaches
B-cell epitope prediction based on machine learning approaches was performed using the BCPRED server, with criteria set to 75% specificity, and ABCpred, with 65.93% accuracy, using fixed lengths of 20 and 16 amino acids (Table 5). A higher peptide score means a higher probability of being an epitope.

Comparative Analysis for Consensus Epitopes for B-cell and 3D Structure of E2 Protein
Prediction of B-cell epitopes with respect to transmembrane topology (especially the outer membrane region), based on physico-chemical properties and machine learning approaches, showed that this protein has several regions with immunogenic potential. However, the machine learning methods BCPREDS (specificity 80%) and ABCpred (specificity 85%) could not predict epitopes in the range 257-270 (the outer membrane region of the protein). These servers had a consensus epitope in the approximate region 230-253, which is in a transmembrane region according to the TMHMM prediction. Among the physico-chemical approaches, the best performance was seen with the Kolaskar-Tongaonkar algorithm, in which part of the epitope at residues 231-296 (fragment of NH2-EARLVPLILLLLWWWVNQLAVLGLPAVEAAVAGEVFAGPALSWCLGLPVVSMILGLANLVLYFRWL-COOH) was located in the outer and transmembrane regions of the E2 protein (Figure 8). These epitopes are candidates for immunization and diagnostic programs.

Validation of the Modeled Structure by Ramachandran Plot Assessment
The 3D model of the E2 protein, with a total of 310 amino acids, was validated using the Ramachandran plot. Assessment of the plot (Figure 9) revealed that 90.4% of residues (281 amino acids) are in the favored regions, 4.5% of residues (26 amino acids) are in the allowed regions and 4.8% of residues (15 amino acids) are in the outlier region. The overall percentage of residues in the favored and allowed regions was 94.9. The modeled structure was therefore considered suitable.

Discussion
Here we examined the computational molecular features and immunoinformatics characteristics of the E2 protein of GBV-C/HGV using various bioinformatics techniques.
GBV-C and HGV are closely related isolates of the same virus, with more than 95 percent sequence homology (37). GBV-C and HGV are reported to have a mutation rate lower than the 1.4-1.9 × 10⁻³ base substitutions per site per year reported for HCV (38,39). Because their RNA-dependent RNA polymerase lacks proofreading ability, RNA virus genomes mutate at high frequency and, under selective pressure, rapidly generate populations of viral variants. Such variability helps the virus to evade clearance by both T- and B-cell immunity (40).
Three different HVRs (HVR1, residues 133-135; HVR2, residues 256-260; and HVR3, residues 279-281) were observed along the E2 protein. HVR2 (residues 256-260) is located in the outer membrane region of the E2 protein. Several researchers have suggested that HCV hypervariable region 1 (HVR1) is a stretch of 27-31 residues (25-30 in some reports) in the E2 glycoprotein that is the main target of the anti-HCV neutralizing response and hence plays an important role in viral persistence (41,42). Amino acid substitutions in HVR1 during HCV infection give rise to groups of genetically related variants called quasispecies (43), some of which can escape the immune response and persist after seroconversion (42). Much of the variability of HCV is concentrated in the HVR1 region; designing a more successful vaccine therefore requires inducing a broad-spectrum, cross-reactive response against many HVR1 variants simultaneously, a goal to which bioinformatics can contribute (44).
The transmembrane topology of HCV E2 and its importance have been discussed in detail (45). These studies revealed that mutations rarely occur at transmembrane sites, which are highly conserved, whereas the outer membrane region is variable; the conserved residues are crucial for virus-specific functions (45-47). In our study, analysis of the transmembrane topology of the GBV-C E2 envelope protein with the TMHMM online server revealed that residues 257-270 are located outside, while residues 234-256 and 271-293 are transmembrane regions. The sites, patterns and number of modifications of important viral proteins, such as N-glycosylation, palmitoylation and phosphorylation, have an enormous effect on folding, entry functions, viral transport, replication and assembly, infectivity, pathogenicity and immunogenicity, and may explain differences in virulence between isolates of a virus and between viral genera (48).
An N-glycosylation site was found at residue 73 of the E2 protein of GB virus C. In HCV, the ectodomain of the E2 envelope glycoprotein is heavily modified by N-linked glycans, with 11 potential glycosylation sites defined (49,50), and these sites are conserved. Indeed, comprehensive sequence analyses of the potential glycosylation sites in E2 indicate that 9 of the 11 sites are strongly conserved (49,50). In this study, analysis of phosphorylation sites revealed 5 serine/threonine potential phosphorylated peptides. Both the glycosylation and the phosphorylation sites were located in the region of E2 predicted to be on the inside of the membrane.
There are also reports of in silico evaluation of glycosylation, phosphorylation and palmitoylation in other viral proteins, such as the S1 protein of Infectious Bronchitis Virus (IBV). These reports concluded that the number and location of these sites differ between isolates, but that most glycosylation, phosphorylation and palmitoylation sites are conserved within specific genotypes (51). These conserved residues are crucial for virus-specific functions. Our results also showed that positions 38 and 42 of the E2 protein of GB virus C are palmitoylated. Several studies have evaluated palmitoylation sites in influenza virus, HIV-1, Semliki Forest virus and Infectious Bronchitis Virus (51) and revealed the impact of palmitoylation on viral biology and function.
Structure prediction approaches have been continuously developed; they have greatly accelerated and simplified the discovery of biological features of macromolecules and provide a structural framework for novel and innovative hypotheses. It should be noted that different methods differ in reliability, and this has to be taken into account when using their results and comparing a prediction with an experimental result (21). Six α-helices, 12 β-strands and 17 coil structures were present in the E2 protein of GB virus C. The outer membrane region contains α-helix (the dominant structure), a small β-strand and coil structure. The transmembrane regions are predominantly α-helical.
The information extracted from the three-dimensional structure of a protein is essential for understanding the details of its molecular function, and provides valuable knowledge for developing effective, rational experimental strategies such as finding disease-related mutations, site-directed mutagenesis, or structure-based vaccine and drug design (22). In this work, we visualized the positions of variability and of epitopes in the 3D structure (Figure 8).
In the comparative analysis of B-cell epitopes predicted by physico-chemical and machine learning approaches, taking into account the 3D and secondary structure and the outer membrane region, the best performance was seen with the Kolaskar-Tongaonkar algorithm. This epitope comprised residues 231-296 (fragment of NH2-EARLVPLILLLLWWWVNQLAVLGLPAVEAAVAGEVFAGPALSWCLGLPVVSMILGLANLVLYFRWL-COOH) (Figure 8). This epitope is therefore a candidate for immunization and diagnostic methods.
The comprehensive analysis of a protein with important roles has never been easy, especially when statements about the protein are attempted from several different aspects. In the case of the E2 envelope glycoprotein of HGV, there are few data on its molecular and immunological features, its clinical significance and its pathogenic potential in hepatitis or any other GBV-C related disease. The results of the present study may therefore explain some of its structural, physiological and immunological functions in GBV-C, and may contribute to a better understanding of the potential of E2 proteins in other members of the Flavivirus family, especially HCV.

Seyed Moayed Alavian, Hossein Keyvani, Mohammad Hassan Motedayen, and Alireza Sazmand prepared the manuscript and provided assurance regarding the scientific content. Abbas Roayaei checked the article from computational aspects. All authors contributed to the final version of the manuscript.

Financial Disclosure
The authors have no financial disclosures to declare and no conflicts of interest to report.

Funding/Support
There was no support for this research.