<div align="center"> <img src="imgs/intro.png"><br> <h1>How to Create a Protein</h1> </div>

Learning Objectives

  1. Understand what proteins are, what they do, what they are made of, and how their 3D shape dictates function
  2. Gain a general understanding of protein folding and how it is dictated by amino acid properties
  3. Understand the case for protein design, the large design space, and its potential applications

Overview

If you prefer a video instead of text, watch this!

<div align="center"> <a href="https://www.youtube.com/watch?v=Am45c83iLg4"> <img src="imgs/video.png" style="width:500px; height:auto;"></a> </div>

Background

Proteins are the architects of life on Earth. They perform virtually all chemical reactions within a cell, from DNA replication and cell-to-cell signalling to energy production and photosynthesis. They also vary widely in physical consistency: they are found in turtle shells, muscle fibres, and elastic tissue.

Proteins are polymers made of 20 building blocks called amino acids, or residues, each with a unique structure and chemical properties.

<div align="center"> <img src="imgs/protein.png"> </div>

These amino acids interact to fold into a distinct 3D structure, which is described at four levels:

  1. The primary structure of a protein is simply its linear sequence of amino acids.
  2. The secondary structure arises when backbone atoms begin to interact locally to form $\alpha$-helices (helical) and $\beta$-sheets (flat).
  3. The tertiary structure refers to a protein's 3D shape and arises from interactions between the side chains (also called R groups).
  4. Protein subunits interact (non-covalently) to form the quaternary structure.
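
To make the link between primary structure and side-chain properties more concrete, here is a minimal Python sketch that treats a sequence as a string of one-letter codes and counts how many residues fall into each broad chemical class. The grouping below is one common textbook simplification and is an assumption (other sources draw the boundaries differently), and the peptide is a made-up example, not a real protein.

```python
# One-letter amino acid codes grouped by a simplified side-chain classification.
# ASSUMPTION: this grouping is one common textbook convention, not the only one.
SIDE_CHAIN_CLASSES = {
    "hydrophobic": set("AVILMFW"),
    "polar": set("STNQYC"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "special": set("GP"),  # glycine (very flexible) and proline (rigid)
}


def classify(sequence):
    """Count the residues of `sequence` that fall into each side-chain class."""
    counts = {name: 0 for name in SIDE_CHAIN_CLASSES}
    for residue in sequence.upper():
        for name, members in SIDE_CHAIN_CLASSES.items():
            if residue in members:
                counts[name] += 1
                break
        else:
            # No class matched: not one of the 20 standard amino acids.
            raise ValueError(f"Unknown residue: {residue!r}")
    return counts


# Hypothetical peptide used only for illustration.
peptide = "MKTAYIAKQR"
print(classify(peptide))
```

Stretches that are rich in hydrophobic residues tend to end up buried inside a folded protein, while charged and polar residues tend to sit on the surface, which is one reason the sequence influences the final shape.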

Pictured above is Ubiquitin, a protein that contains both $\alpha$ and $\beta$ elements. As the name suggests, it is found virtually everywhere in eukaryotic organisms and performs important functions, such as signalling that a protein is ready for degradation. There are several ubiquitin-binding proteins, for example Vacuolar Protein Sorting-Associated Protein VPS23 (pictured in complex with Ubiquitin), which is involved in transporting proteins inside the cell.

Protein Design

Protein design aims to engineer or redesign proteins to improve their stability, give them new functions, or increase their binding specificity.

For a protein with 200 residues, there are $20^{200}$ (about $10^{260}$) possible sequences. This number is larger than the number of people alive ($8 \times 10^{9}$), the number of people that ever lived ($1 \times 10^{11}$), and the number of atoms in the universe ($1 \times 10^{80}$). Combined. Therefore, a large portion of the protein universe is still waiting to be explored, and exploring it could help improve current protein drugs, design antibodies to help us cure disease, or create new materials.
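
The size of the design space is easy to check for yourself. The short Python sketch below computes the number of possible 200-residue sequences exactly and compares it with the other quantities mentioned above. The population and atom counts are the rough order-of-magnitude estimates quoted in the text, and "combined" is taken here as their product, which is the most generous reading.

```python
import math

AMINO_ACIDS = 20   # number of building blocks
LENGTH = 200       # residues in the protein

# Python integers have arbitrary precision, so this is exact.
sequences = AMINO_ACIDS ** LENGTH

# Order of magnitude: log10(20^200) = 200 * log10(20), roughly 260.2
exponent = LENGTH * math.log10(AMINO_ACIDS)

# Rough estimates quoted in the text.
people_alive = 8 * 10**9
people_ever = 10**11
atoms_in_universe = 10**80

# Even multiplying the three numbers together falls far short.
combined = people_alive * people_ever * atoms_in_universe

print(f"20^{LENGTH} is about 10^{exponent:.1f}")
print("Larger than all three combined:", sequences > combined)
```

Changing `LENGTH` shows how quickly the space grows: every extra residue multiplies the number of possible sequences by 20.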

Typically, designers are interested in:

  1. Generating 3D shapes with specific functions.
  2. Generating the sequence of amino acids that folds into the desired 3D structure.

In this course, you will create new proteins by combining fragments. In a real design setting, you might instead use tools like RFDiffusion to generate 3D shapes, or TIMED to generate sequences.

Timetable

| Activity | Duration (mins) | Description |
| --- | --- | --- |
| Introduction | 5 | Brief introduction to the course |
| Speed Friending Ice Breaker | 20 | Students chat for 3 minutes - 5 times. Each person shares something about themselves and the people they met. |
| How to Create a Protein | 25 | Introduction to Proteins and Protein Design |
| BREAK | 10 | BREAK |
| Design your Protein | 60 | See section "Designing Proteins" |

Designing Proteins

Goals

  1. Have fun (not optional)
  2. Create a cool protein (option 1)
    1. Requirements:
      1. It has to be a combination of more than one designed unit, meaning that using Chroma to create a protein that looks like a letter is not enough.
  3. Create sequences that fold into these structures (option 2).
    1. Requirements:
      1. Use at least 2 different proteins and a loop
      2. Cannot use sequences from the example structure or any structure in the same CATH group
      3. Cannot use Chroma or inverse folding software (e.g. TIMED, ProteinMPNN)

| Picture | Example PDB | CATH Name | Fold Type | Difficulty |
| --- | --- | --- | --- | --- |
| HELIX-LOOP-HELIX | - | - | $\alpha$ | 1 |
| SHEET-LOOP-SHEET | - | - | $\beta$ | 1 |
| <img src="imgs/3u7u.png" style="height:200px;"> | 3U7U | Ribbon | Mainly $\beta$ | 1 |
| (HELIX-LOOP-HELIX)4 | - | - | $\alpha$ | 2 |
| (SHEET-LOOP-SHEET)4 | - | - | $\beta$ | 2 |
| <img src="imgs/3RO3.png" style="height:200px;"> | 3RO3 | Alpha Horseshoe | Mainly $\alpha$ | 2 |
| HELIX-LOOP-SHEET-LOOP-SHEET-LOOP-HELIX <br> <img src="imgs/1bnh_small.png" style="height:200px;"> | 1BNH | - | $\alpha\beta$ | 2 |
| <img src="imgs/9ANT.png" style="height:200px;"> | 9ANT | Orthogonal Bundle | Mainly $\alpha$ | 3 |
| <img src="imgs/1ten.png" style="height:200px;"> | 1TEN | Beta Sandwich | Mainly $\beta$ | 3 |
| <img src="imgs/1bnh_big.png" style="height:200px;"> | 1BNH | alpha/beta horseshoe | $\alpha\beta$ | 4 |

NB: In the pictures in the table above, the chain is always coloured from blue (start of the sequence) to red (end of the sequence).

Approach 1: Frankenstein

  1. Go to https://www.rcsb.org
  2. Click the number of structures available
  3. Scroll to find structures that interest you. You can try keywords like "Enzyme", "Bacteria", "Human"
  4. Click on the structure of interest
  5. Click on "Structure" to open the 3D structure
  6. Identify the 3D area that you are interested in, for example, this helix
  7. To copy that helix we will need to extract the amino acid sequence
    1. Option 1: You can do this manually
    2. Option 2: Click on "Download Files" and download the FASTA Sequence
      1. You can open this with a text editor and copy and paste
  8. Copy the extracted sequence to a text file (text editor, Word, etc.). For example, in this case I have the sequence "RHPGNFGADAQGAMNKALELFRKDIAAKYKELGY"
  9. Repeat to find another structure of interest.
  10. Combine the structures creatively. For example, to create a HELIX-LOOP-HELIX, you could do:
    1. Helix 1: "RHPGNFGADAQGAMNKALELFRKDIAAKYKELGY"
    2. Loop: "GGGGS" (see the Loops section below)
    3. Helix 2: "RHPGNFGADAQGAMNKALELFRKDIAAKYKELGY"
  11. Now you'll have a full sequence: "RHPGNFGADAQGAMNKALELFRKDIAAKYKELGYGGGGSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGY"
  12. Use AlphaFold3 (limited to 20 structures per day) or ESMFold (no limits) to fold the sequence. Did you get what you were expecting? If not, can you think of a reason why?
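
If you would rather not copy and paste by hand, the extraction and combination steps above can be sketched in a few lines of Python. This is only an illustration: `read_fasta` and `join_fragments` are hypothetical helper names, and the FASTA header below is made up. The sequence itself is the example helix from step 8, split over two lines the way a downloaded file might wrap it.

```python
# The 20 standard amino acids, as one-letter codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; the header is everything after ">".
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines can wrap, so keep appending to the current record.
            records[header] += line.upper()
    return records


def join_fragments(*fragments):
    """Concatenate fragments in order, rejecting non-standard residues."""
    full = "".join(fragments)
    unknown = set(full) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"Unknown residues: {sorted(unknown)}")
    return full


# Hypothetical FASTA content (the header is invented for this example).
fasta_text = """>example_helix
RHPGNFGADAQGAMNKALELF
RKDIAAKYKELGY
"""

helix = read_fasta(fasta_text)["example_helix"]
loop = "GGGGS"

# HELIX-LOOP-HELIX, as in step 10.
design = join_fragments(helix, loop, helix)
print(design)
print(len(design), "residues")
```

The printed sequence can then be pasted into AlphaFold3 or ESMFold. Note that a downloaded FASTA file contains the sequence of the whole chain, so you would still need to trim it down to the region you are interested in.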

Approach 2: Chroma

  1. Go to https://colab.research.google.com/github/generatebio/chroma/blob/main/notebooks/ChromaDemo.ipynb
  2. Click on "Get API Key"
  3. Agree to the terms and add your contact information
    1. <img src="imgs/Screenshot%202024-07-15%20at%2010.05.26.png" style="width:600px; height:auto;">
  4. Copy the Token <img src="imgs/Screenshot%202024-07-15%20at%2010.06.02.png" style="width:600px; height:auto;">
  5. Paste it and run the cell by clicking on the "Play" button (or Shift+Enter)
  6. Explore different options available!

Approach 3: Combine!

Use both approaches to come up with something cool!

Loops

Loops are connecting elements between parts of proteins. As you've seen from the presentation, these can have several conformations. You can use sequences such as these as connectors between your $\alpha$-helices and $\beta$-sheets (Source: https://doi.org/10.1007/s00253-015-6985-3)
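
If a single "GGGGS" loop turns out to be too short to let two elements pack against each other, one simple thing to try is repeating the unit. The sketch below assumes that repeating a glycine/serine-rich unit is an acceptable way to lengthen a flexible loop; `flexible_loop` is a hypothetical helper name, and the right length for your design is something to find out by folding the result.

```python
def flexible_loop(repeats=1, unit="GGGGS"):
    """Build a loop sequence by repeating a short flexible unit."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return unit * repeats


# Compare loops of increasing length.
for n in range(1, 4):
    print(n, flexible_loop(n))
```

Fold the same design with one, two, and three repeats and compare the predicted structures to see how loop length changes the result.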

3D Printing

Once you have created your .pdb file, you can export it to .stl using PyMOL (free for academic use). To do this:

  1. Open PyMOL
  2. Open your .pdb file
  3. In the menu on the right, click the "H" under the "All" object
  4. Select "everything" to hide everything
    1. <img src="imgs/Screenshot%202024-07-15%20at%2010.26.23.png" style="width:300px; height:auto;">
  5. Your protein is now hidden as shown below:
  6. In the menu on the right, click on "S" and select "Surface"
    1. <img src="imgs/Screenshot%202024-07-15%20at%2010.26.42.png" style="width:300px; height:auto;">
  7. Your protein will look like this:
  8. Click on File > Export Image As > STL
    1. <img src="imgs/Screenshot%202024-07-15%20at%2010.26.59.png" style="width:300px; height:auto;">
  9. Save the file in a convenient location

STL files can then be opened in slicers (PrusaSlicer, Ultimaker Cura...) for 3D printing. Make sure to use supports. For better results, use PVA (soluble) supports.

Hall of Fame

Have you followed this tutorial and designed your protein? Send us the .pdb file by opening an issue or by emailing Leo at name.lastname@ed.ac.uk (name = leonardo, lastname=castorina).

FAQ

Q: How can I learn more about Protein Design and AI?

A: The video mentioned in the overview section is a good introduction to proteins. Then you could watch my TEDx talk or listen to The Digital Twin podcast episode (also on Spotify).

For a more detailed read, you might want to have a look at the Code Repository of TIMED, read the blog post How to Solve the Protein Folding Problem: AlphaFold2, or read the paper.

Q: What should I study to get into this field?

A: Protein design is a very hard, multidisciplinary problem. Therefore, there is no one "degree" that is right or wrong. I would generally recommend STEM subjects such as Biochemistry, Computer Science, Mathematics, Physics/Engineering, etc.

In my opinion however, it is easier to talk about skills that you need:

Generally, PhDs are very common in research. However, you can get a feel for what research is like during an undergraduate degree (internships or a dissertation) or a postgraduate degree (MSc or MScR, i.e. Master by Research).

Q: Right... Then, what did you do to get into this field?

A: I studied Biochemistry (BSc Hons) at the University of Edinburgh. I then joined the Biomedical AI CDT for my MScR and (now) PhD.

You can have a look at my CV here.

Q: What grades do you need to get into this field?

A: Grades don't matter as much as you think. I myself was not an excellent student at university and have in fact failed courses that were supposed to be super easy. The main point here is that grades are (in theory) an indication of how much time you have spent studying and understanding a subject. However, in the real world, problems come in all shapes and forms and may not look like anything you have seen before.

What is important is that you try your best and find something you're passionate about. The name of the university does not matter much.

Q: What advice would you give to someone starting university?

A: First, don't be afraid to ask for help. It's ok not to be okay and don't be afraid to talk about your feelings. Then, make friends and meet people. Have fun and enjoy the good parts of university.

I wish I had spent less time trying to optimise my performance to get the grade I wanted, and more time enjoying learning. In the grand scheme of things, the most important skills to have are curiosity and motivation.

Credits