As most labs have been shut down due to COVID-19, it has been pretty hard to do research over the past few months. Instead, I have been using this time for reading, writing, and familiarising myself with online bioinformatics software. Normally I get a bit scared when I hear about computational biology or chemistry, largely because the world of coding and modelling is still quite foreign to me, so it has come as a relief to find online bioinformatics software that is easy to use and interpret. I thought it would be a good idea to run through some of the tools I have been using recently and the databases they rely on, and to explain how to use them and interpret their results.
The Protein Data Bank (PDB) is a phenomenal database of over 160,000 experimentally determined structures, most solved by X-ray crystallography but also by NMR and cryo-electron microscopy, and all the bioinformatics software that I have used relies heavily on it. The PDB is easily searchable, and its greatest limitation is that you have to hope someone has already determined the structure of your target protein. Each entry is stored under a four-character code called its PDB ID; for example, the cryo-EM structure of the SARS-CoV-2 spike protein is stored under 6VXX. On the structure’s page you can then find lots of information about your target protein, including the year it was deposited, the paper associated with it, quality measures such as its resolution and validation report, its UniProt ID (more on this later) and, perhaps coolest of all, a 3D view of your protein. The 3D viewer is great for understanding the actual structure of the protein you are interested in: you will be able to see the alpha helices, beta sheets, hydrogen bonds, bound cofactors and so on. You can switch the view from cartoon to molecular surface, which is a great way to visualise active sites and deep binding pockets (especially nice if the structure comes bound to a ligand, because you will see exactly how it fits in). The PDB is a great starting place for protein-based bioinformatics projects.
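Because every entry is addressed by its ID, you can also build an entry’s web page and download links yourself, which is handy once you start feeding structures into other tools. The sketch below assumes the classic four-character ID format (a digit from 1 to 9 followed by three letters or digits) and the RCSB download pattern `https://files.rcsb.org/download/<ID>.pdb`; the function names are my own, chosen for illustration. Very large structures may only be offered as mmCIF, so both file links are returned.

```python
import re

# Assumed classic PDB ID format: a digit 1-9 followed by three letters or digits (e.g. 6VXX).
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_pdb_id(candidate):
    """Return True if `candidate` looks like a classic four-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(candidate) is not None


def pdb_urls(pdb_id):
    """Build the RCSB entry page and coordinate download links for a PDB ID."""
    if not is_pdb_id(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    pid = pdb_id.upper()
    return {
        "entry_page": f"https://www.rcsb.org/structure/{pid}",
        # Legacy PDB format; very large entries may not be offered in it.
        "pdb_file": f"https://files.rcsb.org/download/{pid}.pdb",
        # mmCIF is the format every entry is distributed in.
        "mmcif_file": f"https://files.rcsb.org/download/{pid}.cif",
    }


if __name__ == "__main__":
    for name, url in pdb_urls("6VXX").items():
        print(name, url)
```

IDs are case-insensitive in practice, so the sketch upper-cases them before building the links.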
UniProt is the second bioinformatics tool on the list and, like the PDB, it is absolutely massive. There are so many functions and uses for UniProt and I only know some of them, so hopefully I can do it justice. UniProt will give you a wealth of information about your protein of interest, provided that information exists. When I do enzymatic studies I use bovine chymotrypsin, and you can see its UniProt entry here. On the page you can find information such as its active-site residues, its subcellular location, post-translational modifications, links to its solved structures and its amino acid sequence. This makes it a quick and simple way to see what your enzyme does, what it acts on and where it does this. UniProt also has an Align tool, which lines up the sequences of proteins of your choice and marks which residues are conserved and which are replaced by residues with similar properties (pretty handy for approximating active sites). The Align tool also draws a phylogenetic tree, which is kind of cool, but I haven’t had a use for it yet. All in all, it is great for getting more information about your protein of interest.
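The conserved and similar marks in an alignment come from a simple column-by-column comparison. Below is a minimal sketch of that idea for sequences that are already aligned, with gaps written as `-`. The residue groups are a hand-picked assumption loosely modelled on Clustal-style "strong" groups, not the exact rules UniProt’s aligner applies, and `conservation_line` is a hypothetical helper name.

```python
# Residue groups treated as "similar" here: a hand-picked assumption loosely
# modelled on Clustal-style strong groups, not UniProt's exact rules.
SIMILAR_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def conservation_line(aligned):
    """Mark each alignment column: '*' identical, ':' similar, ' ' otherwise."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must already be aligned to the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        if "-" in residues:
            marks.append(" ")  # any gap: not counted as conserved
        elif len(residues) == 1:
            marks.append("*")  # the same residue in every sequence
        elif any(residues <= set(group) for group in SIMILAR_GROUPS):
            marks.append(":")  # different residues, but all from one group
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    # Made-up fragments, for illustration only.
    aligned = ["GDSGGPLV", "GDSGAPMI"]
    for seq in aligned:
        print(seq)
    print(conservation_line(aligned))
```

Runs of `*` flanked by `:` columns are the stretches worth checking against known active-site residues.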
- PoPMuSiC and HoTMuSiC
PoPMuSiC and HoTMuSiC are two pieces of protein stability prediction software which are incredibly useful as well as brilliantly named. Both are available as web servers at this link. PoPMuSiC predicts the change in the folding free energy (ΔΔG) upon a point mutation, whereas HoTMuSiC predicts the change in the protein’s melting temperature (ΔTm) upon a point mutation. Because both are structure-based predictors, you start by entering the PDB ID of your target protein. You can then either get the result for a single, user-defined mutation, or do a systematic run, which gives you the predicted stability change for every residue replaced by each of the 19 other amino acids. As the original aim of the software was to predict stabilising mutations, it also produces a handy graphical output of the sum of the stabilising mutations at each residue. If you happen to know the melting temperature of your protein, you can enter it into HoTMuSiC, which slightly increases the accuracy of the results. Some of the papers from the makers of the software can be found here, here and here, so that you can see some of the work you can do with it.
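A systematic run covers 19 × N single mutants for a chain of N residues. The sketch below enumerates those mutations and then sums the stabilising predictions at each position, mirroring the per-residue summary described above. It assumes you already have predicted ΔΔG values keyed by mutation names such as `A23V` (a naming scheme chosen for illustration, not PoPMuSiC’s output format) and that a negative ΔΔG means stabilising; check the sign convention in the server’s documentation before relying on it.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def all_point_mutations(sequence):
    """Yield every single substitution as 'WtPosMut', e.g. 'A1C' (1-based positions)."""
    for position, wild_type in enumerate(sequence, start=1):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                yield f"{wild_type}{position}{mutant}"


def stabilising_sum_per_position(ddg_by_mutation):
    """Sum the predicted ddG of stabilising mutations at each position.

    Assumption: negative ddG = stabilising. Flip the comparison if your
    predictor uses the opposite sign convention.
    """
    totals = {}
    for mutation, ddg in ddg_by_mutation.items():
        position = int(mutation[1:-1])  # drop the wild-type and mutant letters
        if ddg < 0:
            totals[position] = totals.get(position, 0.0) + ddg
    return totals


if __name__ == "__main__":
    # Hypothetical predictions, for illustration only.
    predictions = {"A1C": -0.5, "A1D": 0.3, "C2A": -1.0, "A1E": -0.2}
    print(stabilising_sum_per_position(predictions))
```

For a 100-residue chain of standard amino acids this gives 1,900 mutations, which is why the systematic run is so much handier than entering mutations one at a time.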
SCooP is another piece of protein stability software, from the same group that made PoPMuSiC and HoTMuSiC. SCooP is different in that it predicts the whole stability curve of your protein, along with values for the upper melting temperature, the folding free energy at room temperature, and the changes in heat capacity and enthalpy on folding. What I really like about SCooP is that it is very visual and extremely intuitive; I like being able to see something tangible. Here is a little demonstration of how it works, which you can try for yourself using carbonic anhydrase as the example protein. Go to the site and enter these two PDB IDs: 4COQ and 5HPJ. These two proteins are homologous but come from different organisms. From the results, see if you can tell which protein comes from an organism that lives in the cold depths of the ocean, and which comes from one that lives beside boiling hot water. You will see for yourself that SCooP is a great tool; the paper for it can be found here.
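The curve SCooP draws is a standard Gibbs-Helmholtz stability curve, which is fully fixed by the melting temperature T_m, the enthalpy change ΔH_m at T_m and the heat capacity change ΔC_p. Written for unfolding, so that a positive value means the folded state is favoured, it reads ΔG(T) = ΔH_m(1 − T/T_m) + ΔC_p[T − T_m − T ln(T/T_m)]. The sketch below evaluates it, finds the temperature of maximum stability, and finds the lower (cold) denaturation temperature by bisection. The parameter values are made up for illustration and are not SCooP output; SCooP may report ΔG with the opposite sign.

```python
import math


def delta_g(t, tm, dhm, dcp):
    """Unfolding free energy (kJ/mol) at temperature t (K).

    dG(T) = dHm * (1 - T/Tm) + dCp * (T - Tm - T * ln(T/Tm)); positive = folded favoured.
    """
    return dhm * (1 - t / tm) + dcp * (t - tm - t * math.log(t / tm))


def max_stability_temperature(tm, dhm, dcp):
    """Setting d(dG)/dT = -dHm/Tm - dCp * ln(T/Tm) to zero gives Ts."""
    return tm * math.exp(-dhm / (tm * dcp))


def cold_denaturation_temperature(tm, dhm, dcp, t_low=150.0):
    """Lower root of dG(T) = 0, found by bisection between t_low and Ts."""
    lo, hi = t_low, max_stability_temperature(tm, dhm, dcp)
    if delta_g(lo, tm, dhm, dcp) > 0:
        raise ValueError("protein is still stable at t_low; lower the search bound")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if delta_g(mid, tm, dhm, dcp) > 0:
            hi = mid  # still folded: the root lies below mid
        else:
            lo = mid  # unfolded: the root lies above mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Made-up parameters (not SCooP output): Tm in K, dHm in kJ/mol, dCp in kJ/(mol K).
    tm, dhm, dcp = 330.0, 300.0, 8.0
    print("dG at 25 C:", round(delta_g(298.15, tm, dhm, dcp), 1), "kJ/mol")
    print("max stability at", round(max_stability_temperature(tm, dhm, dcp), 1), "K")
    print("cold denaturation at", round(cold_denaturation_temperature(tm, dhm, dcp), 1), "K")
```

With T_m fixed, a larger ΔH_m raises ΔG at every temperature below T_m, and a smaller ΔC_p raises it everywhere except at T_m itself (the ΔC_p term is never positive), so both broaden the range over which the protein stays folded.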
ProteinVolume is a bioinformatics tool that calculates the total volume, van der Waals volume, void volume and packing density of your target protein. It is another piece of software that is incredibly easy to use: all you need to do is upload a PDB file to the server, click go and wait for your results. The makers of the software have published some interesting work using it, which you can find here, here and here.
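To get a feel for what these numbers mean, here is a rough sketch, not ProteinVolume’s own algorithm. It reads atom coordinates from the fixed columns of a PDB file, estimates the van der Waals volume by counting grid cells whose centres fall inside any atomic sphere, and computes packing density as van der Waals volume divided by total volume, assuming total volume = van der Waals volume + void volume. The Bondi radii and the 1.70 Å fallback for unlisted elements are my assumptions, and the void volume, which needs a probe-based calculation, is not attempted here.

```python
import math

# Bondi van der Waals radii in angstroms (assumed; ProteinVolume may use other values).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}
DEFAULT_RADIUS = 1.70  # assumed fallback for elements not listed above


def read_atoms(pdb_text):
    """Return (element, x, y, z) for every ATOM/HETATM record in PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed PDB columns: x 31-38, y 39-46, z 47-54, element 77-78 (1-based).
        x, y, z = float(line[30:38]), float(line[38:46]), float(line[46:54])
        element = line[76:78].strip() or line[12:16].strip()[0]
        atoms.append((element.upper(), x, y, z))
    return atoms


def vdw_volume(atoms, spacing=0.1):
    """Estimate the volume (cubic angstroms) of the union of atomic spheres."""
    occupied = set()
    for element, x, y, z in atoms:
        r = VDW_RADII.get(element, DEFAULT_RADIUS)
        r2 = r * r
        # Grid cells are centred at (index + 0.5) * spacing along each axis.
        for i in range(math.floor((x - r) / spacing), math.ceil((x + r) / spacing) + 1):
            dx2 = ((i + 0.5) * spacing - x) ** 2
            if dx2 > r2:
                continue
            for j in range(math.floor((y - r) / spacing), math.ceil((y + r) / spacing) + 1):
                dxy2 = dx2 + ((j + 0.5) * spacing - y) ** 2
                if dxy2 > r2:
                    continue
                for k in range(math.floor((z - r) / spacing), math.ceil((z + r) / spacing) + 1):
                    if dxy2 + ((k + 0.5) * spacing - z) ** 2 <= r2:
                        occupied.add((i, j, k))  # a set, so overlapping atoms count once
    return len(occupied) * spacing ** 3


def packing_density(vdw_vol, void_vol):
    """Packing density = vdW volume / (vdW volume + void volume)."""
    return vdw_vol / (vdw_vol + void_vol)
```

Usage would look like `atoms = read_atoms(open("my_protein.pdb").read())` followed by `vdw_volume(atoms)`, where the file name is hypothetical. Pure Python is slow for a protein with thousands of atoms, so treat this as a way to understand the quantities rather than a replacement for the server.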
This is just a short list of the bioinformatics tools and software that I have been using during the lockdown. There are plenty more out there, and websites such as OmicX and bio.tools are great resources for finding the right tool for your project.
I don’t think I’ve been totally converted to the ways of bioinformatics, as I can’t wait to get back into the lab. I am, however, excited to keep up to date with the field, because I think the tools that come from it can really supplement my work and will hopefully point towards future experimental work and maybe even collaborations.