Software tools for Biological discovery.
HIV ToolBox v2.0 User Guide

HIVToolbox v2.0

User Guide (2014-04-14)

1. Introduction

The goal of HIVToolbox is to create an integrated web tool that can be used to investigate how HIV proteins work in a system and also to identify new potential drug targets in HIV. The HIVToolbox database contains integrated data for all HIV proteins. The application has a sequence window that maps domains, functional sites, protein-protein interactions, structures, and potential functional minimotifs to the protein sequence. The sequence is interactively coupled to six synchronized structure viewers. These structure windows show structures colored by domains, predicted minimotifs, functional sites, protein-protein interactions, sequence conservation, drug resistant mutations, drug binding sites, and immune epitopes. Sequence conservation is calculated for thousands of sequences from different HIV isolates. These windows are integrated with a species/group/clade selector and a sequence alignment/PSSM display. HIVToolbox was built by David Sargeant, Sandeep Deverasetty, and Angel Baleta.

2. Citing HIVToolbox

Citing our papers helps us to get funding to continue this project by adding new functionality requested by users and updating data on a regular basis. Please cite our papers!

David P. Sargeant, Sandeep Deverasetty, Christy L. Strong, Izua J. Alaniz, Alexandria Bartlett, Nicholas R. Brandon, Frederick A. Brown, Flaviona Bufi, Roxanne P. David, Karlyn M. Dobritch, Horacio P. Guerra, Kelvy S. Levitt, Kiran Mathews, Ray Matti, Dorothea Q. Maza, Sabyasachy Mistry, Austin Pomerantz, Josue Portillo, Viraj R. Rathnayake, Noura Rezapour, Sarah Songao, Sean L. Tuggle, Helen J. Wing, Sandy Yousif, David I. Dorsky, and Martin R. Schiller (2014) The HIVToolbox 2 web system integrates sequence, structure, function and mutation analysis, PLoS One (pending)

Sargeant D, Deverasetty S, Luo Y, Baleta AV, Zobrist S, Rathnayake V, Russo JC, Vyas J, Muesing MA, and Schiller MR (2011) HIVToolbox, an integrated web application for investigating HIV, PLoS One 6:e20122. PMID: 21647445.

3. Selecting an HIV protein or ARV drug

HIV proteins can be chosen from the HIVToolbox homepage by selecting the protein from a pull down menu or by selecting a text or shape based object from this page (Fig. 1). Alternatively, pull down menus in the middle of the page can be used to select another HIV protein or an antiretroviral drug.

Figure 1

4. Sequence Windows of HIVToolbox
Figure 2

4a. Sequence Window

  1. Font colors - domains are colored with different colors; black = linker or N-terminus or C-terminus. Domains have the same color scheme as in the left panel of the structure viewer (see section 5) and can be hidden from the display menu.

  2. Highlighted fonts - functional elements such as DNA binding sites, phosphorylation sites, etc. These have the same color in the middle panel of the structure window (see section 5).

  3. Lines above sequence - each line is for a different chain of a structure. Selecting any chain loads it into the structure viewers (section 5). The page is initially loaded with thin lines for all known structures for the protein selected (Fig. 2A). If the checkbox is selected, only non-redundant structures are shown as thicker lines (Fig. 2B).

  4. Thin lines below sequence - binding sites for known protein-protein interactions with the HIV proteins. Selecting a line displays the site in the middle panel of the structure window.

  5. Bars below sequence - these are minimotifs predicted from the Minimotif Miner website. Minimotifs are short contiguous peptide elements in proteins that have a known function. Upward facing lines represent a conserved residue in the minimotif consensus sequence and downward facing lines indicate a degenerate position. Spaces every 10 residues in the sequence are also included as upward facing lines.

Hovering over any of the above items reveals a popup with information for the element. Selecting any item prints hyperlinked information about the selection to the log windows.

4b. Sequence Alignment Windows

  1. Selecting Isolates - In the Isolate selection menu, the user chooses the species, which populates a group list; choosing a group then populates the subtype list. Selecting the ClustalW Alignment button in the sequence alignment section loads these sequences into the sequence alignment and structural display sections. Currently most sequences are for HIV-1 / M group.

  2. Sequence Alignments - Once the ClustalW Alignment button is selected, a sequence alignment of the selected sequences is performed. Only the PDB sequence, RefSeq sequence, and 20 sequences are shown in the alignment (Fig. 3A). This alignment includes all sequences for known structures of the specific HIV protein.

  3. Position-Specific Scoring Matrices - The PSSM tab shows a Position-Specific Scoring Matrix for all positions in the protein for all sequences selected (Fig. 3B). The vertical axis shows the single letter amino acid code and the horizontal axis shows the sequences of the PDB structure currently loaded and the RefSeq sequence for the specific HIV protein. The numbering scheme for both sequences is shown to facilitate comparison of the protein sequence with the structure sequence, which sometimes do not agree with each other.

  4. Integration with Structure View - Once the structure is selected and isolates are aligned, the slider below the right structural panel can be used to adjust the conservation level. When residues are colored in the structure as conserved or not conserved, the threshold value is also used to color the conserved residues in the PSSM view. Adjusting the threshold alters the residues that are colored, so that the calculations used to make the plot in the third structural panel can be seen. This also has the advantage that a user can immediately see what types of substitutions are observed. Furthermore, this can be used to readily explore differences between species, groups, and subtypes.
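The idea behind the PSSM tab can be illustrated with a minimal sketch. It assumes the matrix is a simple per-column percent frequency of each amino acid in an alignment; the function name and the toy sequences are hypothetical, and the calculation HIVToolbox actually performs may differ.

```python
from collections import Counter

def column_frequencies(aligned_seqs):
    """Return one dict per alignment column mapping residue -> percent frequency."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    matrix = []
    for i in range(length):
        # Count how often each residue occurs in column i
        counts = Counter(seq[i] for seq in aligned_seqs)
        total = sum(counts.values())
        matrix.append({aa: 100.0 * n / total for aa, n in counts.items()})
    return matrix

# Toy alignment (made-up sequences, not HIV data)
toy_alignment = ["MTSE", "MTSE", "MTGD", "MNSE"]
pssm = column_frequencies(toy_alignment)
```

Each entry of `pssm` corresponds to one alignment column, so the substitutions observed at a position can be read directly from its dictionary.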

Figure 3a
Figure 3b

5. Structure Analysis Windows
Figure 4
Figure 5

5a. Analyzing conserved functional sites and 3D regions
HIVToolbox structural windows can be used to examine the spatial relationship of functional sites as well as the conservation of these sites. Fig. 5 shows the output for Tat protein. Some selected observations from this figure are: 1) the ADP-ribosylation site is juxtaposed to the RNA binding site; 2) the Tat ubiquitination site is not well conserved. An interesting follow up to this observation would be to examine if it is conserved in certain subtypes as in section 6. Protein-protein interaction sites can also be mapped to the left window by selecting the thin lines under the protein sequence from the Sequence Window (not shown in Fig. 5).

The slider under the right window can be adjusted to change the threshold for conservation. In the example shown the slider was set to 78%, so residues that are gold are conserved at 78% or higher, whereas the blue residues are conserved below 78%. The threshold can be adjusted to any value. Residues in the PSSM are colored yellow as the slider is moved so that the PSSM shows the data used to generate the conservation surface plot. The PSSM also allows one to observe what types of substitutions are common when a residue is not conserved.
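The threshold rule described above can be sketched as follows. This is an illustration only: the function name, the input layout (residue position mapped to percent conservation), and the toy values are assumptions, not part of HIVToolbox.

```python
def color_by_conservation(percent_conserved, threshold=78.0):
    """Map residue position -> 'gold' (at or above threshold) or 'blue' (below)."""
    return {
        pos: ("gold" if pct >= threshold else "blue")
        for pos, pct in percent_conserved.items()
    }

# Toy conservation values (hypothetical, not taken from the database)
toy_conservation = {1: 99.0, 2: 78.0, 3: 61.5}
colors = color_by_conservation(toy_conservation)
```

Raising or lowering `threshold` plays the role of moving the slider: the same conservation values are re-colored without being recalculated.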

5b. Analyzing minimotifs.
There are thousands of short, contiguous sequences that encode critical molecular functions in proteins. Minimotif Miner (MnM) scans a protein sequence for the presence of these sequences, or minimotifs, which have been experimentally confirmed in one or more proteins. This is done by comparison against the MnM database, which includes structured annotations of the key functional details of such minimotifs from the literature. MnM defines three distinct classes of such functions: (1) binding to other proteins, nucleic acids, or small molecules, (2) post-translational modification of the minimotif by an enzyme or chemical, and (3) protein trafficking. We use the term “minimotif” to distinguish short, functional, contiguous peptides from other types of motifs (e.g. DNA motifs, structural motifs).

The figures under the sequence in the Sequence Window of HIVToolbox map minimotifs to the sequences of HIV proteins. The protrusions of the figures pointing up represent conserved sites in the consensus sequence, whereas those pointing downward are for variable positions. Any combination of minimotifs can be selected in the Sequence Window, which shows information about the minimotif in the Log Window and plots the motif on the surface of the protein in the left Structure Window.
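How a consensus sequence relates to the upward and downward protrusions can be sketched in a few lines. The bracket syntax (for example `[ST]xx[DE]`, with `x` marking a degenerate position), the function names, and the toy sequence are assumptions made for illustration; this is not how Minimotif Miner itself is implemented.

```python
import re

def parse_consensus(consensus):
    """Split a consensus such as '[ST]xx[DE]' into per-position tokens."""
    return re.findall(r"\[[A-Z]+\]|[A-Zx]", consensus)

def orientation(tokens):
    """'up' for a defined (conserved) position, 'down' for a degenerate 'x'."""
    return ["down" if token == "x" else "up" for token in tokens]

def find_matches(sequence, consensus):
    """Return 1-based start positions of every (possibly overlapping) match."""
    tokens = parse_consensus(consensus)
    pattern = "".join("." if token == "x" else token for token in tokens)
    # A lookahead lets overlapping occurrences be reported too.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", sequence)]

# Toy sequence (made up for the example)
toy_sequence = "AATGGEAASKKDTAAE"
starts = find_matches(toy_sequence, "[ST]xx[DE]")
```

Here `orientation(parse_consensus("[ST]xx[DE]"))` gives one up/down label per motif position, mirroring the way the bars are drawn under the sequence.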

Figure 6

We provide a sample analysis of the HIV capsid protein (Fig. 6). Shown is one of two pairs of capsid protein hexamers that stack on top of each other. Two minimotifs were selected in the Sequence Window. One motif, colored orange, is a potential Calmodulin binding site that is hidden when the second hexamer of the capsid is added. The second motif is for a sequence in Tat that is known to bind p53. Note that these sites can be easily related to known functions in capsid (center Structure Window) and the conservation of these sites is readily observed in the right Structure Window. Chains can easily be shown or added through the checkboxes below the center Structure Window.

Figure 7

5c. Analyzing drug resistant mutations.
The new Drug Resistance Structure window (Fig. 7A) is initially loaded with a default structure for each protein:ARV complex, if one exists in the PDB. The DRMs in the drug resistance display are colored by a new DRM classification scheme (Table 1) where red = primary (a DRM that can cause observable resistance by itself), pink = primary set (a group of mutations that can cause resistance when they occur together), green = beneficial (a mutation that increases drug susceptibility), dark green = beneficial set (a set of mutations that together increase drug susceptibility), and purple = secondary set (one or more mutations that can enhance resistance when combined with a primary or primary set of mutations).

The Drug Resistance Mutation display also has a drop-down selection menu that allows DRMs for a single drug to be displayed (Fig. 7A). The known DRMs are listed in the Drug Resistance Mutation log window with their position, drug, mutation, classification type, and hyperlink(s) to primary reference(s); rows are colored by resistance classification type. The table is interactive: selecting a DRM identifies the location of the mutation in the Drug Resistant Mutation window with a temporary flash. Concurrently, the view is centered and zoomed to show the DRM (Fig. 7A). The DRMs for all ARV drugs are shown upon the initial loading of the protein selected from the menu. A menu selector can be used to select a specific drug, and the Load DRM button at the bottom of the table loads the DRMs for the selected ARV drug.

Table 1 - Definitions for drug resistance mutation classifications
Type of DRM | Definition
Primary | Causes resistance without any other mutations
Primary set | Two or more mutations that cause resistance only in the presence of other primary set mutation(s)
Secondary | Enhances resistance caused by a primary mutation
Resistance precursor | A mutation that has no effect on resistance, but must occur prior to another primary or primary set of mutations
Beneficial | A mutation that prevents or reduces resistance
Beneficial set | Two or more mutations that when occurring simultaneously prevent or reduce resistance
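The color scheme described in section 5c can be written down as a small lookup, together with a per-drug filter like the one the drop-down menu provides. This is a sketch under assumptions: the record layout and function name are hypothetical, and the three example DRMs are the ones quoted as examples in Table 3. Classes with no color stated in the text fall back to "uncolored".

```python
# Colors as listed in the text of section 5c
DRM_COLORS = {
    "primary": "red",
    "primary set": "pink",
    "beneficial": "green",
    "beneficial set": "dark green",
    "secondary set": "purple",
}

def drms_for_drug(records, drug):
    """Return sorted (position, mutation, classification, color) rows for one drug."""
    rows = []
    for rec in records:
        if rec["drug"] == drug:
            # Classes without a documented color are reported as 'uncolored'
            color = DRM_COLORS.get(rec["classification"], "uncolored")
            rows.append((rec["position"], rec["mutation"], rec["classification"], color))
    return sorted(rows)

# Hypothetical record layout; mutations taken from the Table 3 examples
example_drms = [
    {"position": 84, "mutation": "I84V", "drug": "Amprenavir", "classification": "primary"},
    {"position": 32, "mutation": "V32I", "drug": "Amprenavir", "classification": "primary set"},
    {"position": 283, "mutation": "L283I", "drug": "Efavirenz", "classification": "beneficial set"},
]
amprenavir_rows = drms_for_drug(example_drms, "Amprenavir")
```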

Figure 8

5d. Analyzing drug binding sites.
The new Drug Binding Sites structure window shows a surface plot with drug-binding site residues (Fig. 8A). The residues are colored like the DRMs, except that contact residues, for which there are no known drug resistance mutations, are colored orange. The drug is shown as a wireframe figure. A distance threshold can be selected from a pulldown menu below the Drug Binding Site Log window and then loaded (Fig. 8B). This threshold is for residues with an atom that makes contact with an atom of a bound drug within a specific distance. The distance threshold can be varied between 2.75 Å and 4.0 Å in 0.25 Å increments. The Drug Binding Site Log window shows the protein chain and position, distance to the closest atom in the drug, whether it is a known DRM, and the DRM classification type. Each row is colored by the class of DRM. Selection of the residue in the table shows the location of the residue in the structure window with a temporary flash, and also re-centers and zooms the structure to show the binding site residue.
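The distance threshold can be illustrated with a short sketch that keeps every residue having at least one atom within the chosen distance of a drug atom. The input layout, the function name, the toy coordinates, and the choice of an inclusive comparison are assumptions; the code is not taken from HIVToolbox.

```python
import math

# Selectable thresholds in Å: 2.75 to 4.0 in 0.25 steps
ALLOWED_THRESHOLDS = [2.75 + 0.25 * i for i in range(6)]

def contact_residues(protein_atoms, drug_atoms, threshold=4.0):
    """Return {(chain, resnum): closest distance} for residues within the threshold.

    protein_atoms: list of (chain, resnum, (x, y, z)); drug_atoms: list of (x, y, z).
    """
    closest = {}
    for chain, resnum, xyz in protein_atoms:
        # Distance from this protein atom to the nearest drug atom
        d = min(math.dist(xyz, drug_xyz) for drug_xyz in drug_atoms)
        key = (chain, resnum)
        if d < closest.get(key, math.inf):
            closest[key] = d
    return {k: round(d, 2) for k, d in sorted(closest.items()) if d <= threshold}

# Toy coordinates (made up; not from a PDB entry)
toy_protein = [
    ("A", 50, (0.0, 0.0, 0.0)),
    ("A", 50, (1.0, 0.0, 0.0)),
    ("A", 84, (3.0, 3.5, 0.0)),
    ("B", 30, (10.0, 10.0, 10.0)),
]
toy_drug = [(3.0, 0.0, 0.0)]
contacts_wide = contact_residues(toy_protein, toy_drug, threshold=4.0)
contacts_tight = contact_residues(toy_protein, toy_drug, threshold=2.75)
```

Tightening the threshold from 4.0 Å to 2.75 Å drops the residue whose closest atom is 3.5 Å from the drug, which is the effect of reloading the window with a smaller value.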

Figure 9

5e. Analyzing immune epitopes
The Immune Epitope structure window colors positive immune epitopes on the surface of an HIV protein structure (Fig. 9A). Immune epitopes and their identifiers from the HIV Immune Epitope database 2.0 can be selected from a pulldown menu above the window or by selecting the epitope from the Epitopes Log window (Fig. 9B) [25]. If the shift key is held down while selecting epitopes from the log window, multiple epitopes can be shown concurrently. The table also has the epitope ID and a hyperlink to the entry in the Immune Epitope Database.

6. Analyzing sequence variability among groups and subtypes

HIVToolbox can be used to examine variations among HIV groups and subtypes. The selection menu is used to choose species, group and subtype. Note that almost all sequences are of the M group. Once a selection is made, the residue conservation displayed in the right structural window is updated and the number of isolates used for the calculation is shown. The protein sequence alignment window is also updated, as well as the position-specific scoring matrix. An example of how this function can be used is shown for the analysis of four different potential CK2 phosphorylation sites in Integrase. The % conservation for each site is shown as analyzed in each HIV-1 subtype (Table 2).

Table 2 - Sequence conservation of CK2 sites in different strains of HIV

CK2 sites in IN | Site 1: 66-69 | Site 2: 93-96 | Site 3: 195-198 | Site 4: 283-286
(3787 isolates)
[ST] | T-99% | T-99% | S-96%; C-1%; V-1%; T-2% | G-49%; S-49%; V-1%
[DE] | G-1%; E-99% | E-96%; D-2% | K-1%; E-98% | V-1%; T-1%; D-94%; N-3%; E-1%
(20 isolates)
[ST] | T-100% | T-100% | S-100% | G-81%; S-14%; D-5%
[DE] | E-100% | E-100% | E-100% | D-95%; N-5%
(149 isolates)
[ST] | T-99%; N-1% | T-99%; N-1%; L-1% | S-93%; V-1%; T-5% | G-71%; S-26%; M-2%; D-1%
[DE] | S-1%; L-1%; E-98%; Q-1% | G-1%; T-1%; E-97%; Q-2% | E-98%; G-1%; C-1% | D-90%; N-6%; E-1%
(1443 isolates)
[ST] | T-99% | T-99% | S-96%; C-1%; T-1% | G-9%; S-89%; M-1%; R-1%
[DE] | E-99%; G-1% | D-1%; E-98% | G-1%; D-1%; E-98% | D-95%; N-2%; E-1%
(544 isolates)
[ST] | T-99% | T-99% | S-97%; C-1%; V-1%; T-1% | G-87%; S-9%; V-1%; D-2%
[DE] | E-99% | G-1%; D-3%; E-96% | G-1%; K-1%; E-98% | G-1%; M-1%; D-96%; N-2%
(85 isolates)
[ST] | T-100% | T-100% | S-98%; T-1% | G-5%; S-94%
[DE] | G-2%; E-98% | D-1%; E-99% | E-99%; R-1% | D-98%; N-1%
(3 isolates)
[ST] | T-100% | T-100% | S-100% | G-75%; S-25%
[DE] | E-100% | E-100% | E-100% | D-75%; N-25%
(39 isolates)
[ST] | T-98%; K-3% | T-98%; Y-3% | S-98%; I-3% | G-88%; S-10%
[DE] | E-98%; W-3% | D-13%; E-88% | T-3%; E-98% | D-90%; N-8%
(8 isolates)
[ST] | T-100% | T-100% | S-100% | G-78%; S-11%
[DE] | E-100% | E-100% | E-89%; *del-11% | D-100%
(49 isolates)
[ST] | T-100% | T-100% | S-100% | G-88%; S-12%
[DE] | E-100% | E-98%; Q-2% | E-100% | D-96%; E-4%
(5 isolates)
[ST] | T-100% | T-100% | S-100% | G-50%; S-50%
(6 isolates)
[ST] | T-71%; *del-29% | T-71%; *del-29% | A-29%; S-71% | G-57%; S-14%; P-14%; Q-14%
[DE] | D-29%; E-71% | E-71%; *del-29% | G-29%; E-71% | G-14%; D-71%; Q-14%
(2 isolates)
[ST] | T-67%; R-33% | T-100% | S-67%; L-33% | G-33%; S-33%
[DE] | M-33%; E-67% | G-33%; E-67% | E-67%; Q-33% | D-67%
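Entries of the kind shown in Table 2 can be produced from aligned sequences grouped by subtype. The following is a minimal sketch: the function name, the record layout, the subtype labels, and the toy sequences are all hypothetical, and percentages are simply rounded to whole numbers.

```python
from collections import Counter, defaultdict

def residue_summary(records, position):
    """Summarize residue usage at a 1-based alignment position, per subtype.

    records: iterable of (subtype, aligned_sequence).
    Returns {subtype: (number_of_isolates, "X-NN%; Y-NN%")}.
    """
    by_subtype = defaultdict(Counter)
    for subtype, seq in records:
        by_subtype[subtype][seq[position - 1]] += 1
    summary = {}
    for subtype, counts in by_subtype.items():
        total = sum(counts.values())
        # Most frequent residue first; ties keep the order of first appearance
        parts = [f"{aa}-{round(100 * n / total)}%" for aa, n in counts.most_common()]
        summary[subtype] = (total, "; ".join(parts))
    return summary

# Toy records (made-up subtype labels and sequences)
toy_records = [
    ("B", "ATSE"), ("B", "ATSE"), ("B", "ATGE"), ("B", "ATSE"),
    ("C", "ATGD"), ("C", "ATSD"),
]
site_summary = residue_summary(toy_records, position=3)
```

Running the same function for each motif position and each subtype yields one table cell per call, together with the isolate count used for that subtype.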

7. Types of Workflows

7a. New workflows enabled in HIVToolbox2

  • Workflows #1-16: Six integrated structural viewers make it easy to compare different types of data with regard to sequence, structure, function, sequence conservation, drug resistance and immune epitopes. The 16 different types of pairwise comparisons enabled are shown in Table 3. One example from these 16 workflows is shown for an HIV protease:Saquinavir complex in Fig. 4. This example of multiple comparisons shows that the T82 residue (arrows) is in a region that is not conserved (panel C; blue residues are not conserved) and lies outside the active site (panel B), is a beneficial mutation (panels D, G; green), makes contact with the drug (panels E, H), and is in immune epitope #40375 (panel F).

  • Workflow #17: Predicted effectors of HIV protein multimerization. Most HIV proteins form multimers required for their activity (Table 4). We considered that multimerization could potentially be regulated by other functional sites in proteins. Therefore, we looked for functional sites within the multimerization interface in different structures of HIV proteins. We noticed a common pattern where phosphorylation sites were present at sites of subunit interactions in structures of Vif, Rev, Tat, and Matrix multimers [26–29]. We identified some protein-protein interaction sites in Nef, Rev, Vif, and Vpr that overlap with the multimerization interface. Thus, they may be involved in HIV protein oligomerization and activity [26,27,30,31]. The Protein Sequence window can be used to investigate known and predicted minimotifs that overlap with HIV protein oligomerization sites.

  • Workflow #18: Identification of overlapping or non-overlapping functionalities to generate new hypotheses. Consolidation and integration of the functional information in HIVToolbox2 can facilitate experimental design and interpretation. One of the best examples of how coordination of data can be used to generate new hypotheses comes from examination of Tat with HIVToolbox2 (Fig. 5). The HIV Tat transcription factor is a potential drug target [32]. Examination of the Tat sequence shows a functional hotspot between residues 15-57 (Fig. 5C, blue shaded box). In this region, there are binding sites for ~30 different proteins and multiple types and sites of posttranslational modifications (PTMs). These residues are some of the most highly conserved regions in Tat (Fig. 5B). There are several examples in this region of Tat where functional sites are known to compete with each other [33].

    Structure mapping of sites on Tat with HIVToolbox2 (Fig. 5A) allows evaluation of which proteins or PTMs have residues that overlap other sites. These are expected to be competitive functions, in many cases. Several previously unknown examples of such functional overlaps are easily recognized. The Cyclin T1 and CDK9 binding sites overlap with an ADP ribosylation site. Tat also binds p53, whose binding site overlaps with several sites (Karyopherin beta, Proteasome alpha 1, and DNA directed RNA polymerase II binding sites, as well as the RNA binding site, protein methylation sites, and acetylation sites). From a compatibility perspective, the p53 and TBP associated factor 1 binding sites are adjacent to, but don't overlap with, the Tat dimerization site and Cyclin T binding sites. However, the TBP and p53 sites do have overlapping residues. There are far too many combinations to discuss here, but this tool is clearly a source for better understanding the multiple roles of Tat. HIVToolbox2 helps interpret results, as demonstrated by examining the hot spot region of Tat.

  • Workflow #19: Known and predicted minimotifs in HIV proteins. HIV Rev binds the Rev Response Element (RRE) in the HIV RNA genome and facilitates transport of the genomic RNA from the nucleus to the cytosol. Rev has known sequence elements associated with dimerization, phosphorylation, methylation, RNA binding, and ubiquitination. We examined Rev for minimotifs to demonstrate the utility of this type of workflow. The region of Rev between P76-L83 seems to be multifunctional, binding four different proteins. This region is not in the dimerization site or other functional sites. This region of Rev binds ArfGAP, a protein involved in nuclear export [34]. The nuclear export function seems to have redundancy with an overlapping NLP1 binding site, which serves as a bridge protein to bind Exportin 1 for nuclear export [35]. These are consistent with the known roles of Rev in export of the genomic HIV. This region also binds to prothymosin α, a protein involved in transcription, and Sam68, another RNA binding protein that is involved in HIV genomic RNA export, as well as in translational regulation of HIV RNA [36]. Given that there are four different binding proteins for this site, and that Rev forms dimers, it is currently unclear if Rev forms heterotetramers with two of its binding partners, and, if so, with which pairs of proteins. This may be an important facet of Rev function.

  • Workflow #20: Global resistance landscapes. As an example of a global resistance landscape, we examined HIV protease inhibitors using HIVToolbox2 (Fig. 7). This type of analysis demonstrates the utility of both the new DRM classification scheme and the HIVToolbox2 tool. When we examine the distribution of the DRMs on the protease surface plots for all FDA approved drugs that target HIV protease, several resistance patterns become apparent. All known primary mutations are in the drug-binding pockets of the drugs. Primary set mutations contain residues that are either in the binding pocket or immediately juxtaposed, but only on one face of the protease. Beneficial or beneficial set mutations are clustered near the active site but in a region overlapping with the primary set mutations. Secondary-set mutations generally overlap with a region containing primary set mutations. Mutations are observed in the active site and in residues that form a flap covering the active site, but never in the dimerization residues. The active site, flap, and dimerization site residues are highly conserved, whereas many residues in the primary set and beneficial regions have lower conservation levels (as little as 85% in ~50,000 HIV-1 protease sequences).

  • Workflow #21: Examining amino acid frequencies by HIV subtype. A useful feature of HIVToolbox2 is that it makes it possible to view mutations and their frequencies in specific viral subtypes. This can be accomplished for any known amino acid in an HIV protein by using the pulldown menus at the bottom of the Sequence window, selecting the Clustal Alignment in the Sequence Alignment section, and then selecting the PSSM. The frequencies are calculated from the data in the Los Alamos HIV Sequence database, which features data that is not collected in a single standardized epidemiological study, but does provide a rough snapshot of mutation prevalence in each subtype.

Table 3 - Example of use cases 1-16 enabled by HIVToolbox2

Use case | Window Relationships | Example
1* | Motif/Domains vs. Functional sites/Protein-Protein Interactions | The DNA primer binding site is in the RVT connect domain of RT.
2* | Motif/Domains vs. Conservation | The RT domain has the highest conservation when compared to the thumb and connect domains.
3* | Functional sites/Protein-Protein Interactions vs. Conservation | Many functional and protein interaction sites in Tat are conserved in >90% of 2482 sequences.
4 | Motif/Domains vs. DRMs | The only DRM in the thumb domain of RT is the L283I beneficial set mutation for Efavirenz.
5 | Motif/Domains vs. Drug binding sites | The Nevirapine binding site is in the RVT domain of RT.
6 | Motif/Domains vs. Immune epitopes | The entire p24 domain of capsid has immune epitopes except for residues 93-98, 100 and 220. Some are involved in inter-monomer contacts.
7 | Functional sites/Protein-Protein Interactions vs. Conservation | The S16 phosphorylation site and K28 acetylation site are completely conserved in 2482 Tat sequences.
8 | Functional sites/Protein-Protein Interactions vs. DRMs | The S230R secondary set DRM in Integrase is a residue involved in DNA binding.
9 | Functional sites/Protein-Protein Interactions vs. Drug binding sites | Epitopes 1180, 2835, 1292, 13675 and 14143 are in the RNase domain of p66 RT.
10 | Functional sites/Protein-Protein Interactions vs. Immune epitopes | Epitopes 69437, 69439, 59975 are in the dATP binding site of RT.
11 | DRMs vs. Conservation | When compared to ~50,000 virus sequences: beneficial mutations N88S 2% and I50L <1%; primary mutations I47A <1%, I50V <1%, I54L/M <1%, I84V 3%; primary set mutation I54V 88%.
12 | Drug binding sites vs. Conservation | Most APV binding site residues are highly conserved, with the exception of I84 and G48 (~2%); the latter is not a primary mutation.
13 | Immune epitopes vs. Conservation | Epitope 32326 is highly conserved, but some subtypes show modest conservation of I46 and M54.
14 | DRMs vs. Drug binding sites | Most DRMs are in residues within 4 Å of atoms in the Amprenavir drug; however, there are notable exceptions: the beneficial mutation N88S and several secondary mutations. There are also a number of drug binding site residues where DRMs have not been observed.
15 | DRMs vs. Immune epitopes | For Amprenavir and protease, several immune epitopes overlap with important DRMs: I84V (primary) and L76V and V32I (primary set) are contained in epitope 40375; M46I (primary set), the beneficial mutation I50L, and the primary mutations I50V or I54L/M are contained in epitope 32326.
16 | Drug binding sites vs. Immune epitopes | For Amprenavir and protease, immune epitopes 40375 and 32326 contain many binding site residues and also involve residues that contact the drug.

7b. Example analyses of Integrase

Figure 10

To demonstrate different types of analysis supported by HIVToolbox, Integrase (IN) was analyzed as a case study. A partial description of the analysis presented in Sargeant et al., 2011 is provided. IN is a well-studied multidomain and oligomeric viral protein that is essential for integrating viral DNA into the host genome, for viral infectivity, and for which potent inhibitors of its strand transfer function are chemotherapeutically available. One of the advantages of using HIVToolbox is that data from many separate studies can be readily interpreted simultaneously.

More than 10 different studies have identified different sets of IN residues that bind DNA. Mapping all DNA binding residues onto the structures of IN shows a cluster of DNA binding residues near the active site (Fig. 10C). However, there are several other scattered clusters throughout IN. Comparison of the structure of the IN dimer shows that DNA binding residues in this binding groove continue into the juxtaposed catalytic domain of the dimer (Fig. 11A). The continuity of these additional DNA interacting residues (D207, K111, K136, E138, K215) [RefSeq: NP_705928] only becomes apparent in the dimer and because we are analyzing all DNA binding residues at once.

Figure 11

Analysis of IN with HIVToolbox reveals that there is a striking overlap of clusters of DNA binding residues with several nuclear import motifs (Figs. 10B, 10C, 11C). Karyopherin α5 binds three regions on the surface of IN dimers. One of these sites overlaps almost entirely with the LEDGF binding site, whereas the other nuclear import sites overlap with DNA binding sites (Fig. 11C); thus, competition for these sites would be expected but has not yet been explored. Importin 7 binding requires two sites in the CTD; analysis with HIVToolbox reveals that these sites overlap with the cluster of residues that bind the viral LTR DNA. Consistent with the overlapping sites, the levels of viral genome are reduced >50% when the Importin 7 motifs are mutated. However, analysis with HIVToolbox reveals that one of the Importin 7 sites overlaps with DNA binding residues. It is clear that the effect of karyopherins on binding of viral DNA needs to be considered in interpretation of their effects on nuclear import and binding LEDGF. This relationship becomes clear when HIVToolbox is used for interpretation. Nup153 is also implicated in nuclear import of IN, but its binding site within IN is not yet known.

The spatial arrangement of the nuclear import motifs on the surface of a full-length IN dimer is striking. The five known nuclear import motifs are spatially contiguously aligned like a ‘zipper’ along the surface of the dimer, with two Karyopherin α5 sites located on one subunit, in trans with one Karyopherin α5 and two Importin 7 motifs on the opposing subunit (Fig. 11C). Some Karyopherin sites in these subunits are buried in the IN tetramer; however, two of the 5-motif zippers are located along the surface.

While Karyopherin α5 and Importin 7 both serve roles in nuclear import, they likely would compete with binding of IN to the HIV-1 LTR DNA and to LEDGF. Presumably, these karyopherins would block these functional sites in the cytosol, but become activated after import of IN into the nucleus. It is not surprising given so many IN nuclear import motifs, which are likely redundant, that a recent re-evaluation found none to be required for nuclear import.

Regarding the analysis of nuclear import minimotifs, since there is no structure of full length IN, these analyses also involved a number of different IN structural models that were generated by superposition of common regions in experimental IN structures. The models are available on the HIVToolbox website shown as dashed lines above the protein sequence in the Sequence Window.

7c. Example analyses of Protease Resistance Landscape
A new feature in HIVToolbox is the ability to view DRMs mapped onto the surface of protein structures. Fig. 12 shows a comparison of DRMs for various FDA-approved HIV protease inhibitors. This analysis, when combined with an extended DRM classification scheme, reveals an anatomy of resistance in protease. Each type of DRM is localized to a specific region of protease. Furthermore, drug resistance mutations have not yet been observed near the dimerization or nitrosylation sites. The observation of such a global pattern is not easily recognized without the visual mining enabled by HIVToolbox2. We note that the region covered by 4 protease immune epitopes is inclusive of the regions that have primary and primary set mutations. This resistance anatomy may prove useful for pharmaceutical companies in designing future ARVs that are less susceptible to drug resistance.

Figure 12

8. Working with Jmol
  • Zoom: Hover over the figure, press shift, drag vertically up to zoom in and vertically down to zoom out.

  • Rotate: To rotate you must hover over the figure, and drag.

  • Translate: To translate you must hover over the figure, press shift, double click, and drag.

  • Color an Atom: To color an atom right double click, in the pull down menu select “style” and then “atom”, and then select “color,” and then the color.

Jmol has a lot of flexibility and a command line console can be used to control many aspects of the structure presentation. More detailed instructions, videos, and tutorials can be found at:

9. Entity-relationship diagram and data sources
Table 4 - Sources of data in the HIVToolbox MySQL database

Table Name | Data Type | Source
hiv_protein_annotation | Annotations for HIV-1 proteins and effects on host proteins | NCBI/PubMed & NCBI/Protein
hiv_protein_sequence_conservation | List of each residue in HIV-1 proteins and associated conservation in existing data | Self-generated from NCBI/RefSeq data
hiv_protein_sequence_feature | List of interesting sites on HIV-1 proteins (sites for dimerization, binding, cleavage, etc.) | NCBI/PubMed, HIV PPI database, & *RCSB/PDB
ref_group | HIV-1 groups and accession numbers | NCBI/RefSeq
ref_hiv_isolates_pssm | Position-specific scoring matrix data generated by ClustalW from HIV-1 isolate sequences. | Calculated from Los Alamos/HIV and NCBI/PubMed
ref_hiv_protein_alias | HIV-1 proteins and synonym names | NCBI/PubMed
ref_hiv_protein_domain | List of domains in each HIV-1 protein and location of each domain in its protein | NCBI/Conserved Domains
 | List of interaction sites for HIV-1 proteins and residue positions for each interaction site | NCBI/PubMed HIV PPI database
ref_hiv_protein_structure | List of structures in HIV-1 proteins and data about each structure | RCSB/PDB
ref_hiv_protein_structure_chain | Sequence information for structures of HIV-1 proteins | RCSB/PDB
 | Positional information about structures of HIV-1 proteins | NCBI/RefSeq & RCSB/PDB
ref_host_protein | List of HIV-1 host proteins, their sequences, and whether or not the protein is required for HIV-1 replication | NCBI/RefSeq Literature
ref_host_protein_bio_mol_go_term | List of term types used in HIV-1 databases | GeneOntology AmiGO
ref_isolate_hiv_protein_sequence | List of HIV-1 isolates, their sequences, accession numbers, date and country of infection, patient codes, and source database code | NCBI/Protein & Los Alamos/HIV
ref_retroviridae | List of retroviruses, accession numbers, and links to articles | NCBI/Taxonomy
ref_subtype | List of subtypes of HIV-1 and associated group of subtype | NCBI/Taxonomy
ref_swissprot | List of Swissprot IDs and associated gene symbols | UniProt/UniProtKB
ref_swissprot_pdb | List of PDB IDs and corresponding Swissprot IDs | UniProt/UniProtKB & RCSB/PDB
MnM database | Predicted minimotifs | Minimotif Miner
Drug binding sites | List of calculated chain, residue, distance, and DRM is located in the drug binding information log window | Calculated
Drug Resistant Mutations | List of DRM, Drug, Type, and Pubmed links | Internal Database
Immune Epitopes | List of ID, sequence, and source | HIV Immune Epitope Database 2.0

* Sequence features that are multimerization interfaces were calculated in Molmol based on residues that were less than 3.25 Å away from at least one residue in another subunit.
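
The distance criterion in the footnote above can be sketched in Python. This is a minimal illustration and not the Molmol procedure used to build the database: the `Atom` record, the `interface_residues` function, and the toy coordinates are hypothetical, and the sketch assumes that the distance between two residues is the minimum distance between any pair of their atoms.

```python
import math
from collections import namedtuple

# Hypothetical atom record: chain ID, residue number, Cartesian coordinates in Å.
Atom = namedtuple("Atom", "chain resnum x y z")

CUTOFF = 3.25  # Å, the threshold quoted in the footnote


def interface_residues(atoms, cutoff=CUTOFF):
    """Return the set of (chain, resnum) pairs that have at least one atom
    closer than `cutoff` to an atom belonging to a different chain."""
    hits = set()
    for i, a in enumerate(atoms):
        for b in atoms[i + 1:]:
            if a.chain == b.chain:
                continue  # only contacts between different subunits count
            d = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
            if d < cutoff:
                hits.add((a.chain, a.resnum))
                hits.add((b.chain, b.resnum))
    return hits


# Toy coordinates, made up for illustration (not taken from a PDB entry).
atoms = [
    Atom("A", 1, 0.0, 0.0, 0.0),
    Atom("A", 2, 10.0, 0.0, 0.0),
    Atom("B", 1, 3.0, 0.0, 0.0),
    Atom("B", 2, 20.0, 0.0, 0.0),
]
print(sorted(interface_residues(atoms)))
```

In the toy data only residue 1 of chain A and residue 1 of chain B lie within 3.25 Å of each other, so only those two residues are reported. The all-pairs loop is quadratic in the number of atoms; a real structure would normally use a spatial index instead.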

Sequences are from the Los Alamos HIV database and the National Center for Biotechnology Information (NCBI). Protein-protein interactions are from the primary literature of the HIV Protein-Protein Interaction website hosted by NCI. Protein structures are from the Protein Data Bank. Sequence functional features were annotated from the primary literature. Domains are from the NCBI Conserved Domain database. Minimotif predictions are from Minimotif Miner.

Figure 13

10. Database statistics
Table 5 - Statistics for data in the HIVToolbox database

Data type | Number
HIV-1 proteins | 24
HIV-1 residues | 3,137
HIV-1 protein isolate sequences | 203,810
HIV-1 protein-protein interactions | 313
HIV-1 experimental structures | 621
HIV-1 experimental structure chains | 1,356
HIV-1 model structures | 6
HIV-1 model chains | 34
HIV-1 protein domains | 49
HIV-1 putative motifs | 5,312
Experimentally determined motifs | 198
Host proteins | 2,096
Required host proteins | 755
HIV-1 protein functional elements mapped | 560
HIV-1 isolates with homology data | 153,000
HIV-1 position-specific scoring matrices | 104
Major DRMs | 155
Minor DRMs | 33
Primary DRMs | 186
Primary set DRMs | 368
Beneficial DRMs | 12
Beneficial set DRMs | 21
Secondary set DRMs | 83
Resistance Precursor DRMs | 1
Ambiguous DRMs | 274
Total non-ambiguous DRMs | 671
Sequence features | 316
Protein-protein interactions | 1,453
Predicted and known motifs | 6,373
HIV proteins (processed) | 24
FDA-approved drugs | 27
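
The DRM rows in Table 5 can be cross-checked with a little arithmetic. The sketch below assumes, since the guide does not state it, that "Total non-ambiguous DRMs" is the sum of the six extended classes (primary, primary set, beneficial, beneficial set, secondary set, and resistance precursor), and that the major and minor counts belong to a separate scheme. The variable names are illustrative only.

```python
# Counts copied from Table 5 (extended DRM classification).
extended_classes = {
    "Primary": 186,
    "Primary set": 368,
    "Beneficial": 12,
    "Beneficial set": 21,
    "Secondary set": 83,
    "Resistance precursor": 1,
}

# Value reported in the "Total non-ambiguous DRMs" row of Table 5.
reported_total_non_ambiguous = 671

# Assumption: the reported total is the plain sum of the six classes above.
computed_total = sum(extended_classes.values())

print(computed_total)
print(computed_total == reported_total_non_ambiguous)
```

Under that assumption the six class counts add up to the reported total of 671, which suggests the major and minor counts (155 and 33) are not part of that sum.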

11. Glossary
  • HIV: Human immunodeficiency virus. It is a retrovirus that weakens your immune system by destroying important cells that fight disease and infection (More Info)

  • Drug Resistance: mutations in the virus reduce its susceptibility to existing HIV drugs, meaning the drugs are less effective and do not stop the virus from multiplying (More Info)

  • Drug Resistance Mutation (DRM): a mutation or change in the HIV genome that reduces a drug's ability to work; such mutations confer resistance to the drug (More Info)

  • Immune Epitope: the molecular structures recognized by the receptors of the adaptive immune system (More Info)

  • Anti-Retroviral (ARV): drugs that act against the retrovirus HIV and interfere with steps in its replication (More Info)

  • Trans-Activator of Transcription (Tat): a viral regulatory protein that stimulates HIV transcription elongation by recruiting host cellular transcription elongation factors (More Info)

  • Protease: an enzyme that hydrolyzes or cuts proteins and is important in the final steps of HIV maturation. (More Info)

  • Reverse Transcriptase: an enzyme found in HIV that creates double-stranded DNA using viral RNA as a template and host tRNA as a primer. (More Info)

  • Subtype: A subgroup of genetically related HIV-1 viruses. HIV-1 can be classified into four groups: M Group, N Group, O Group, and P Group. Viruses within each group can then be further classified by subtype. For example, the HIV-1 M group includes at least nine subtypes: A, B, C, D, F, G, H, J, and K, with A and F further divided into sub-subtypes such as A1, A2, F1, and F2. (More Info)

  • Protein Structure: The three dimensional coordinates of the atoms within macromolecules made of protein. (More Info)

  • Integrase: an enzyme found in HIV that inserts the viral DNA produced by reverse transcription into the host cell genome (More Info)