This README describes the scripts used for the sequence analysis in:
Major antigenic site B of human influenza H3N2 viruses has an evolving local fitness landscape

ANALYSIS FOR H3N2 ANTIGENIC SITE B DEEP MUTATIONAL SCANNING

This study aims to understand how the local fitness landscape of antigenic site B in human H3N2 HA has evolved over the past 50 years. This repository contains the analysis for the deep mutational scanning experiment, which focuses on HA1 residues 156, 158, 159, 190, 193, and 196 in six different genetic backgrounds, namely A/Hong Kong/1/1968 (HK68), A/Bangkok/1/1979 (Bk79), A/Beijing/353/1989 (Bei89), A/Moscow/10/1999 (Mos99), A/Brisbane/10/2007 (Bris07), and A/North Dakota/26/2016 (NDako16).

REQUIREMENTS

INPUT FILE

ANALYSIS PIPELINE

  1. ./script/EpiB_fastq_to_fitness.py: Convert raw reads to variant counts and fitness measures
  2. ./script/EpiB_clean_mut.py: Filter mutants of interest
    • Input files:
      • result/EpiB_MultiMutLib_*.tsv
    • Output files:
      • result/EpiB_Index_*.tsv
  3. ./script/combine_data.jl: Re-calculate mutant fitness. Written by Jakub Otwinowski
  4. ./script/EpiB_fit_to_pref.py: Preference normalization
  5. ./script/EpiB_PrefEvol.py: Extract the amino acid sequences of HA residues 156, 158, 159, 190, 193, and 196 from naturally occurring strains
  6. ./script/EpiB_AnalyzeParam.py: Classify the parameters for the additive fitness effect and pairwise epistatic effect into "positive" or "negative" based on the 95% confidence interval
  7. ./script/EpiB_seq_comparison.py: Compute the pairwise sequence identities among strains
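
Steps 6 and 7 above can be illustrated with a minimal Python sketch. This is not the code in EpiB_AnalyzeParam.py or EpiB_seq_comparison.py; the function names, the "unclassified" label for intervals that overlap zero, and the toy sequences are assumptions made for illustration only.

```python
def classify_parameter(ci_lower, ci_upper):
    """Label a parameter by the sign of its 95% confidence interval.

    Assumption: an interval that overlaps zero is left "unclassified";
    the repository scripts may treat this case differently.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if ci_lower > 0:
        return "positive"
    if ci_upper < 0:
        return "negative"
    return "unclassified"


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical positions between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


if __name__ == "__main__":
    # Toy values only, not results from the study.
    print(classify_parameter(0.12, 0.48))
    print(classify_parameter(-0.30, 0.05))
    print(pairwise_identity("ACDEFG", "ACDEFY"))
```

The identity function assumes the sequences are already aligned, so it compares positions one to one and does not handle gaps.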

PLOTTING

  1. ./script/Plot_CompareRep.R: Compare mutant fitness (i.e. enrichment ratio) from replicates
  2. ./script/Plot_CompareLib.R: Compare mutant fitness from different genetic backgrounds
  3. ./script/EpiB_SeqLogGen.py: Generate sequence logos based on mutant preference
  4. ./script/EpiB_network.py: Plot fitness landscape (network graph)
  5. ./script/Plot_TrackPref.R: Plot the normalized preference of naturally occurring sequences in different genetic backgrounds
  6. ./script/Plot_Inf_class_summary.R: Plot the distribution of "positive", "negative", and "mixed" parameters as pie charts
  7. ./script/Plot_Inf_heatmap_overall.R: Plot heatmap summarizing the number of "positive" and "negative" parameters among all six genetic backgrounds of interest
  8. ./script/Plot_Inf_heatmap_specific.R: Plot heatmap showing the number of "positive" and "negative" parameters in each of the six genetic backgrounds of interest
  9. ./script/Plot_seq_dist.R: Plot the relationship between pairwise correlation of fitness landscape and pairwise sequence identity
  10. ./script/Plot_TrackFreq.R: Plot the frequency of different haplotypes over time
  11. ./script/Plot_KLdist.R: Plot the relationship between KL distance and preference in different genetic backgrounds
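
For readers unfamiliar with the KL distance mentioned in the last plotting step, the sketch below shows one common way to compute the Kullback-Leibler divergence between two discrete distributions, such as two amino acid preference vectors. It assumes the natural logarithm and renormalizes both inputs to sum to 1; how Plot_KLdist.R or the upstream scripts actually compute this quantity is not stated in this README, and the function name is hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Both inputs are renormalized to sum to 1. Terms with p_i == 0
    contribute nothing; if q_i == 0 where p_i > 0 the result is infinite.
    """
    if len(p) != len(q) or not p:
        raise ValueError("p and q must be non-empty and of equal length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("p and q must each have a positive sum")
    divergence = 0.0
    for p_i, q_i in zip(p, q):
        p_i, q_i = p_i / total_p, q_i / total_q
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        divergence += p_i * math.log(p_i / q_i)
    return divergence


if __name__ == "__main__":
    # Toy distributions only, not preferences measured in the study.
    print(kl_divergence([0.5, 0.5], [0.5, 0.5]))
    print(kl_divergence([1.0, 0.0], [0.5, 0.5]))
```

Note that the divergence is not symmetric, so D(p || q) and D(q || p) generally differ; which direction is appropriate depends on which distribution is treated as the reference.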