Immunoinformatics approach for multi-epitope vaccine design against structural proteins and ORF1a polyprotein of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2)

Background The lack of effective treatment against the highly infectious SARS-CoV-2 has aggravated an already catastrophic global health crisis. Here, in an attempt to design an efficient vaccine, a thorough immunoinformatics approach was followed to predict the most suitable viral protein epitopes for building that vaccine.
Methods The amino acid sequences of four structural proteins (S, M, N, E) along with one potentially antigenic non-structural polyprotein (ORF1a) of SARS-CoV-2 were inspected for the most appropriate epitopes to be used for building the vaccine construct. Several immunoinformatics tools were used to assess the antigenicity (VaxiJen server), immunogenicity (IEDB immunogenicity tool), allergenicity (AlgPred), toxigenicity (ToxinPred server), interferon-gamma inducing capacity (IFNepitope server), and the physicochemical properties of the construct (ProtParam tool).
Results The final candidate vaccine construct consisted of 468 amino acids, encompassing 29 epitopes. The CTL epitopes that passed the antigenicity, allergenicity, toxigenicity and immunogenicity assessment were four epitopes from S protein, one from M protein, two from N protein, 12 from the ORF1a polyprotein and none from E protein. The HTL epitopes that passed the antigenicity, allergenicity, toxigenicity and IFN-γ induction assessment were one from S protein, three from M protein, six from the ORF1a polyprotein and none from N and E proteins. All the vaccine properties and its ability to trigger the humoral and cell-mediated immune response were validated computationally. Molecular modeling, docking to TLR3, simulation, and molecular dynamics were also carried out. Finally, an in silico molecular clone using the pET28::mAID expression plasmid vector was prepared.
Conclusion The overall results of the study suggest that the final multi-epitope chimeric construct is a potential candidate for an efficient protective vaccine against SARS-CoV-2.


Introduction
In early December 2019, an acute respiratory disease of unknown etiology emerged in Wuhan, China, which was subsequently found to be caused by a highly contagious coronavirus. The virus was initially described as 2019-nCoV and later named by the International Committee on Taxonomy of Viruses (ICTV) as Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), while the World Health Organization (WHO) named the disease Coronavirus disease-19 [1][2][3][4][5]. Within the first three months after its discovery, the disease spread to more than 100 countries and caused more than 4,000 deaths worldwide [6]. On the 11th of March, 2020, the WHO categorized the newly discovered disease as a [...] protein of 419 amino acids (Accession No. QIH45060.1) and ORF1a polyprotein of 4405 amino acids (Accession No. QJQ84087.1) were retrieved from the NCBI protein database (https://www.ncbi.nlm.nih.gov/protein) in FASTA format.

Population coverage
The worldwide population coverage of the selected epitopes for MHC-I and MHC-II alleles was predicted using the population coverage tool of IEDB (http://tools.iedb.org/population/) [31]; coverage was calculated for class I and class II separately and combined.
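
As a rough illustration of what a coverage figure of this kind expresses, the sketch below estimates the fraction of individuals carrying at least one allele that binds a selected epitope, assuming Hardy-Weinberg proportions at a single locus, and then combines class I and class II coverage assuming the two are independent. This is a simplification for illustration and not the algorithm of the IEDB tool; the function names and allele frequencies are hypothetical.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one covered allele at a locus.

    Assumes Hardy-Weinberg proportions: an individual is uncovered only if
    both chromosomes carry a non-covered allele.
    """
    covered = sum(allele_freqs)
    if not 0.0 <= covered <= 1.0:
        raise ValueError("covered allele frequencies must sum to a value in [0, 1]")
    return 1.0 - (1.0 - covered) ** 2


def combined_coverage(class_i, class_ii):
    """Coverage by class I or class II, assuming the two are independent."""
    return 1.0 - (1.0 - class_i) * (1.0 - class_ii)


# Hypothetical frequencies of alleles predicted to bind the selected epitopes.
class_i_freqs = [0.25, 0.10, 0.05]   # three illustrative HLA class I alleles
class_ii_freqs = [0.15, 0.05]        # two illustrative HLA class II alleles

cov_i = locus_coverage(class_i_freqs)
cov_ii = locus_coverage(class_ii_freqs)
cov_both = combined_coverage(cov_i, cov_ii)
print(f"class I: {cov_i:.3f}, class II: {cov_ii:.3f}, combined: {cov_both:.3f}")
```

The combined value is always at least as large as either class alone, which is why reporting the separate and combined figures side by side is informative.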

Construction of multiepitope vaccine sequence
To ensure efficient vaccine construction and proper epitope separation, all candidate epitopes were joined together using linkers. The B-cell epitope and CTL epitopes were linked with the AAY linker, and HTL epitopes were linked together and to the CTL epitopes with the GPGPG linker. To facilitate future conjugation of the multi-epitope vaccine construct with a carrier protein, a cysteine residue was added at the N-terminus [35]. Furthermore, a four-amino-acid (EPEA) tag was added at the C-terminus for efficient purification [36]. The vaccine construct was subjected to further analysis to assess its antigenicity with the VaxiJen 2.0 server, its allergenicity with the AlgPred server, and its physicochemical properties with the ProtParam tool (https://web.expasy.org/protparam/) [37].
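
The linker scheme described above can be sketched as a simple string assembly. The example is a minimal illustration only: the function name `build_construct`, the order of the epitope blocks, and the placeholder peptides are assumptions and do not reproduce the epitopes or the exact layout used in the study.

```python
def build_construct(b_cell, ctl, htl, n_term="C", c_term="EPEA",
                    ctl_linker="AAY", htl_linker="GPGPG"):
    """Assemble a multi-epitope sequence following the linker scheme in the text.

    B-cell and CTL epitopes are joined with AAY; HTL epitopes are joined to
    each other and to the CTL block with GPGPG. A cysteine is placed at the
    N-terminus and the EPEA tag at the C-terminus. The block order is an
    assumption made for illustration.
    """
    ctl_block = ctl_linker.join(b_cell + ctl)
    htl_block = htl_linker.join(htl)
    # Skip empty blocks so no dangling linker is produced.
    body = htl_linker.join(part for part in (ctl_block, htl_block) if part)
    return n_term + body + c_term


# Hypothetical placeholder peptides, not the epitopes selected in the study.
b_cell = ["TTTTTTTTTT"]
ctl = ["KKKKKKKKK", "LLLLLLLLL"]
htl = ["MMMMMMMMMMMMMMM", "NNNNNNNNNNNNNNN"]

construct = build_construct(b_cell, ctl, htl)
print(construct)
print(len(construct), "residues")
```

Keeping the linkers and terminal additions as parameters makes it easy to regenerate the sequence if the epitope set changes before the downstream antigenicity and allergenicity checks.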

In silico molecular cloning
The amino acid sequence of the candidate vaccine was then subjected to reverse translation and codon optimization with the Java Codon Adaptation Tool (JCat) (http://www.jcat.de) [45]. The DNA sequence was then used for in silico molecular cloning into the expression plasmid vector pET28::mAID [46] using SnapGene software.

T-cell epitopes prediction
The initial antigenicity screening of the amino acid sequences of all five proteins gave scores greater than the threshold value of 0.4, indicating probable antigens. These sequences were then submitted to the NetCTL server to predict possible CTL epitopes. For S protein, 37 possible epitopes were predicted, of which 14 showed no toxicity and 8 had a positive immunogenicity score; ultimately, the top 4 epitopes were selected for inclusion in the multi-epitope vaccine construct. For M protein, 10 epitopes were predicted, 5 were non-toxic, and one showed a positive immunogenicity score. For E protein, 3 epitopes were predicted, two of which had an antigenicity score higher than the threshold value and were non-toxic, but neither showed a positive immunogenicity score; hence, none was included in the vaccine construct.
For N protein, 9 epitopes were predicted, 6 of which showed an antigenicity score higher than the threshold value. All 6 were non-toxic and five showed a positive immunogenicity score; only the top two were selected for inclusion in the construct. For the non-structural polyprotein (ORF1a), on the other hand, 170 epitopes were predicted, of which 96 showed an antigenicity score higher than the threshold, and the best 12 were selected based on toxigenicity and immunogenicity results (Table 1).
HTL epitope prediction with the MHC-II binding tool of IEDB, based on a percentile rank of less than 10, resulted in 17 epitopes for S protein, of which 12 were non-allergenic, 10 were non-toxic, and a single epitope showed a positive interferon-gamma induction result. For M protein, 55 HTL epitopes were predicted, of which 43 were non-allergenic, antigenic, and non-toxic, and only 3 showed positive interferon-gamma induction results. None of the predicted HTL epitopes of N protein showed interferon-gamma positive results; therefore, none was included in the vaccine construct. Similarly, all HTL epitopes predicted for E protein failed to pass either the antigenicity, allergenicity, or interferon-gamma induction assessment. Out of 96 predicted HTL epitopes for the ORF1a polyprotein, only 6 passed the antigenicity, allergenicity, toxigenicity, and interferon-gamma induction assessment (Table 2).
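
The successive CTL filters reported above (antigenicity above the 0.4 threshold, no predicted toxicity, positive immunogenicity, then the top-ranked candidates) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Epitope` record, the ranking by immunogenicity score, and all values are hypothetical, and the scores would in practice come from the servers named in the text.

```python
from dataclasses import dataclass


@dataclass
class Epitope:
    peptide: str
    antigenicity: float    # antigenicity score (VaxiJen-style)
    toxic: bool            # toxicity call (ToxinPred-style)
    immunogenicity: float  # immunogenicity score (IEDB-style)


ANTIGENICITY_THRESHOLD = 0.4  # threshold value quoted in the text


def select_ctl_epitopes(candidates, top_n):
    """Keep antigenic, non-toxic, immunogenic epitopes and return the top_n.

    Ranking by immunogenicity score is an assumption for illustration.
    """
    passing = [
        e for e in candidates
        if e.antigenicity > ANTIGENICITY_THRESHOLD
        and not e.toxic
        and e.immunogenicity > 0
    ]
    passing.sort(key=lambda e: e.immunogenicity, reverse=True)
    return passing[:top_n]


# Hypothetical candidates; peptides and scores are placeholders.
candidates = [
    Epitope("AAAAAAAAA", 0.9, False, 0.30),
    Epitope("CCCCCCCCC", 0.3, False, 0.50),   # fails antigenicity
    Epitope("DDDDDDDDD", 0.7, True, 0.40),    # predicted toxic
    Epitope("EEEEEEEEE", 0.6, False, -0.10),  # negative immunogenicity
    Epitope("FFFFFFFFF", 0.5, False, 0.45),
]

selected = select_ctl_epitopes(candidates, top_n=2)
print([e.peptide for e in selected])
```

The same pattern would apply to the HTL epitopes by swapping the immunogenicity criterion for the allergenicity and interferon-gamma induction results.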

B-cell epitopes prediction
B-cell epitopes are an important part of the multi-epitope vaccine because their recognition by B lymphocytes elicits antibody production, which is a key process in adaptive immunity. For all five proteins, linear B-cell epitopes were predicted using the Bepipred Linear Epitope Prediction 2.0 method, the Emini surface accessibility prediction method, and the Kolaskar & Tongaonkar antigenicity method. These methods were selected because they assess properties that are important for predicting potential epitopes, such as antigenicity, surface accessibility, and flexibility. The resultant plots were then inspected for overlapping regions identified as epitopes by all three methods. The only protein to show such an overlapping region was N protein, with a sequence of 10 amino acids from 380-390. The results for all amino acid sequences are shown in Fig. 1.
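
The search for regions flagged by all three methods amounts to intersecting per-residue predictions. The sketch below assumes each method's output has already been reduced to a per-residue True/False call (above or below that method's threshold); the function name and the example masks are hypothetical, and the study itself inspected the plots rather than running code of this kind.

```python
def consensus_regions(masks, min_length=1):
    """Return 1-based inclusive (start, end) ranges where every mask is True.

    masks: list of equal-length sequences of per-residue boolean calls,
    one sequence per prediction method.
    """
    length = len(masks[0])
    if any(len(m) != length for m in masks):
        raise ValueError("all masks must have the same length")
    regions, start = [], None
    for i in range(length):
        agree = all(m[i] for m in masks)
        if agree and start is None:
            start = i                      # a consensus run begins
        elif not agree and start is not None:
            if i - start >= min_length:    # run covers indices start..i-1
                regions.append((start + 1, i))
            start = None
    if start is not None and length - start >= min_length:
        regions.append((start + 1, length))
    return regions


# Hypothetical per-residue calls for a 10-residue stretch.
method_a = [0, 1, 1, 1, 1, 0, 0, 1, 1, 1]
method_b = [0, 0, 1, 1, 1, 1, 0, 1, 1, 0]
method_c = [1, 1, 1, 1, 0, 0, 0, 1, 1, 1]

print(consensus_regions([method_a, method_b, method_c], min_length=2))
```

Setting `min_length` to the shortest epitope length of interest discards short stretches where the methods agree only by chance.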

Multi-epitope vaccine construction
For the construction of the final vaccine construct, the most appropriate predicted epitopes were selected. These included one linear B-cell epitope from N protein, 4 CTL epitopes and one HTL epitope from S protein, one CTL and 3 HTL epitopes from M protein, 2 CTL epitopes from N protein, and 12 CTL and 6 HTL epitopes from ORF1a. These epitopes were joined together with two types of linkers, AAY for the linear B-cell and CTL epitopes and GPGPG for the HTL epitopes, with a cysteine residue at the N-terminus and the EPEA tag at the C-terminus. This yielded a peptide chain of 468 amino acids.

Vaccine modeling and structure analysis
Based on the amino acid sequence of the vaccine construct, the PSIpred server predicted the different secondary structure elements. This is considered a primary step towards predicting the three-dimensional structure of the protein (Figure 4). The 3D protein model was then predicted with two modeling approaches: threading with the IntFOLD server and ab initio modeling with the trRosetta server. The resultant models were analyzed with the Ramachandran plot and the ProSA-web z-score, which is compared against the scores of structures determined by X-ray crystallography and NMR. The best predicted model showed 98% of the residues in the favorable region of the Ramachandran plot (Fig. 4A) and a z-score of -6.01, within the range of structures determined by X-ray crystallography (Fig. 4B).
The ERRAT server was used to analyze the statistics of non-bonded interactions between different atom types; the error function value is plotted against the position of a 9-residue sliding window and calculated by comparison with statistics from highly refined structures. The calculated value obtained was 81.928, which falls below 91%, indicating a relatively average overall quality for the selected protein model. This can be justified by the fact that the model was built using the ab initio modeling approach (Figure 5A).

Molecular docking and dynamics
The final vaccine construct was docked with Toll-like receptor 3 (PDB ID: 1ziw) using the FRODOCK server. The docked vaccine-receptor complex was then prepared for simulation using the Protein Preparation Wizard and PyMOL software with the default settings. The molecular dynamics simulation was carried out using the Desmond tool, and the SuperPose 1.0 server (http://superpose.wishartlab.com) was used to calculate the root mean square deviation (RMSD). The RMSD value of 3.78 suggests a relatively poor binding pose at the site of receptor and vaccine binding (Figure 5B).
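
For reference, the RMSD between two sets of N matched atoms is the square root of the mean squared distance between corresponding atoms. The sketch below computes this for coordinates that are assumed to be already superposed; it does not perform the structural fitting that a superposition server carries out first, and the coordinates shown are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z).

    Assumes the two structures are already superposed and that atom i in
    coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: the second set is shifted by 1 unit along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

print(f"RMSD = {rmsd(reference, shifted):.2f}")
```

Lower values indicate that the compared structures, or successive frames of a trajectory, remain close to each other.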

Immune response simulation
Measuring the immune response is a pivotal step in vaccine design. It is contingent on a number of algorithms that use mathematical models to illustrate the fine details of the immunological process. In the present study, the C-ImmSim server was used to simulate the immune response to the candidate vaccine construct. Simulation with this tool focuses on B-cell epitope binding, class I and II HLA epitope binding, and the binding of the T-cell receptor to HLA-peptide complexes; the tool then details the dynamics of immune cell populations and the molecules involved in the immune response [47].
The simulation results showed an increased and sustained level of memory and active B cells and a high level of IgM, which represents the primary response against the antigen; this suggests an effective humoral response (Fig. 6A & B). The T helper cell population showed very promising results, as the levels of memory and active T helper cells remained high for the entire period of simulation, suggesting a prolonged humoral and cell-mediated immune response (Fig. 6C & D). The T cytotoxic cell population showed a steady level of memory cells, while the active cell population increased throughout the simulation period (Fig. 6E & F). The different immunoglobulin isotypes showed high levels in the first two weeks followed by a gradual decline, and a similar result was shown by the interferon-gamma level. This can be viewed as a positive point, since the first two weeks are considered decisive for the course and outcome of the disease [48] (Fig. 6G & H).

In silico molecular cloning
The DNA sequence produced by JCat showed a GC content of 56% and a codon adaptation index of 1.0, which indicate a stable DNA sequence and a high level of protein expression (Figure 7).
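
Both reported quantities have simple definitions: GC content is the percentage of G and C bases, and the codon adaptation index (CAI) is the geometric mean of the relative adaptiveness weights of the codons in the sequence, so a CAI of 1.0 means every codon is the most preferred one for its amino acid in the host. The sketch below illustrates both; the weight table and the DNA sequences are hypothetical, and a real calculation would use the full host-specific table.

```python
import math


def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def codon_adaptation_index(dna, weights):
    """Geometric mean of the relative adaptiveness weights of all codons.

    weights maps each codon to a value in (0, 1], where 1.0 marks the most
    frequently used synonymous codon in the expression host.
    """
    dna = dna.upper()
    if len(dna) == 0 or len(dna) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    log_sum = sum(math.log(weights[codon]) for codon in codons)
    return math.exp(log_sum / len(codons))


# Hypothetical weights for two amino acids (illustrative values only).
weights = {"GCG": 1.0, "GCA": 0.5, "AAA": 1.0, "AAG": 0.25}

optimized = "GCGAAAGCGAAA"   # uses only the preferred codons
suboptimal = "GCAAAG"        # uses only the less preferred codons

print(f"GC content: {gc_content(optimized):.1f}%")
print(f"CAI (optimized): {codon_adaptation_index(optimized, weights):.3f}")
print(f"CAI (suboptimal): {codon_adaptation_index(suboptimal, weights):.3f}")
```

Because the CAI is a geometric mean, a single rare codon lowers it noticeably, which is why a fully optimized sequence reaches the maximum value.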

Discussion
The current Covid-19 pandemic associated with SARS-CoV-2 infection is the third coronavirus outbreak in the last 20 years, after severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS). SARS-CoV-2 shows relatively higher transmissibility compared with other emerging viruses such as H7N9 and MERS-CoV [49,50]. This makes the search for an effective vaccine and treatment imperative, in addition to the protective and social distancing measures used to contain and control the disease. The immunoinformatics approach provides a promising tool for designing and exploring potential vaccines against bacterial, parasitic, and viral diseases [51]. In this study, a multi-epitope vaccine was constructed using all the structural proteins of the virus and the largest non-structural polyprotein [52]. These proteins were selected based on suggestions from previous studies [53,54]. Unlike a single-subunit vaccine, a multi-epitope vaccine is believed to induce a better and more protective immune response [55]. In the present study, antigenic, non-allergenic, and non-toxic epitopes were identified and used for the construction of the final candidate vaccine. All five proteins were studied for potential epitopes; however, none of the peptides from the envelope protein (E) was eligible for selection in the final vaccine construct, due to either lack of antigenicity or the allergenicity and toxicity of these peptides. For the final vaccine construct, CTL, HTL, and linear B-cell epitopes were linked together using AAY and GPGPG linkers, which provide proper proteasomal cleavage sites for different immune cells [56] and ultimately enhance the antigen presentation process by binding TAP transporters [57]. Furthermore, linking CTL epitopes from different proteins together forms epitopes on a string, which is believed to enhance the immunogenicity of CTL epitopes [58].
A cysteine residue was added to the N-terminus of the vaccine construct to facilitate binding of the vaccine to a carrier protein [35], and a small peptide of four amino acids (EPEA) was added to the C-terminus to enable the downstream purification process [36]. The candidate vaccine construct consists of 468 amino acids, which is an ideal vaccine length, since larger proteins are presented by dendritic cells, leading to a stronger T-cell immune response [57], while extremely short peptides may induce tolerance and anergy by directly binding MHC molecules of non-professional antigen-presenting cells [59]. Determination of the secondary structure of the protein is a pivotal step towards the prediction of its three-dimensional structure; therefore, the secondary structure of the candidate vaccine was determined using the PSIpred server, followed by structure refinement and protein modeling. Two approaches were used for modeling the protein, threading and ab initio, and the best resultant model was selected based on the Ramachandran plot and z-score analyses.
The docking of the vaccine and TLR-3 showed a possible hydrophilic interaction [60]; this interaction indicates possible recognition of the vaccine by an APC-specific receptor, which in turn promotes the immune response [61]. The physicochemical properties of the vaccine construct [...]. The results of the immune response simulation were very promising, with a sustained response of the cells involved in humoral and cell-mediated immunity against SARS-CoV-2.
The conventional methods of vaccine development are very costly and time-consuming; alternatively, the immunoinformatics approach has attracted attention as an ideal method for designing less expensive, rapid, efficient, multi-epitope vaccines. However, experimental validation is of utmost importance to ensure the safety and efficacy of the resultant vaccine.

Conclusion
The highly contagious nature of SARS-CoV-2 left the entire world population with no option but to wait for the production of a safe and protective vaccine to break the chain of infection and tackle the spread of this pandemic. It is rather impractical to rely on the conventional methods for producing such a vaccine due to a number of limiting factors. This study is an attempt to design an efficient multi-epitope chimeric subunit vaccine that is capable of mounting a strong immune response by inducing both humoral and cell-mediated immunity, with the help of a large number of immunoinformatics tools. The vaccine construct effectively fulfilled the requirements for characteristics such as antigenicity, allergenicity, immunogenicity, and physicochemical properties, and elicited an immune response in a simulation model. It is concluded that this novel construct represents a promising candidate for an efficient protective vaccine against SARS-CoV-2.