Plotting functions in PIASO#

This notebook will demonstrate the different plotting functions and their application to sample data.

[1]:
path = '/home/vas744/Analysis/Python/Packages/PIASO'
import sys
sys.path.append(path)
[2]:
import piaso
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt
import warnings
/home/vas744/.local/lib/python3.9/site-packages/networkx/utils/backends.py:135: RuntimeWarning: networkx backend defined more than once: nx-loopback
  backends.update(_get_backends("networkx.backends"))
[3]:
sc.set_figure_params(dpi=80,dpi_save=300, color_map='viridis',facecolor='white')
from matplotlib import rcParams
rcParams['figure.figsize'] = 4, 4
[4]:
warnings.simplefilter(action='ignore', category=FutureWarning)

Load the data#

Let us load a 20k subsampled version of the Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) project dataset described in detail in Gabitto et. al. (2024). We will be using the scRNA-seq data from the dataset in this tutorial.

Download the subsampled dataset from Google Drive: https://drive.google.com/file/d/1EdRA0ECvPlEnNaOzKqj19GrEtucB7hmE/view?usp=drive_link

The original data is available on https://portal.brain-map.org/explore/seattle-alzheimers-disease.

[5]:
!/home/vas744/Software/gdrive files download --overwrite --destination /n/scratch/users/v/vas744/Data/Public/PIASO 1EdRA0ECvPlEnNaOzKqj19GrEtucB7hmE
Downloading SEA-AD_RNA_MTG_subsample_excludeReference_20k_piaso.h5ad
Successfully downloaded SEA-AD_RNA_MTG_subsample_excludeReference_20k_piaso.h5ad
[6]:
adata=sc.read('/n/scratch/users/v/vas744/Data/Public/PIASO/SEA-AD_RNA_MTG_subsample_excludeReference_20k_piaso.h5ad')
[7]:
adata
[7]:
AnnData object with n_obs × n_vars = 20000 × 36601
    obs: 'sample_id', 'Neurotypical reference', 'Donor ID', 'Organism', 'Brain Region', 'Sex', 'Gender', 'Age at Death', 'Race (choice=White)', 'Race (choice=Black/ African American)', 'Race (choice=Asian)', 'Race (choice=American Indian/ Alaska Native)', 'Race (choice=Native Hawaiian or Pacific Islander)', 'Race (choice=Unknown or unreported)', 'Race (choice=Other)', 'specify other race', 'Hispanic/Latino', 'Highest level of education', 'Years of education', 'PMI', 'Fresh Brain Weight', 'Brain pH', 'Overall AD neuropathological Change', 'Thal', 'Braak', 'CERAD score', 'Overall CAA Score', 'Highest Lewy Body Disease', 'Total Microinfarcts (not observed grossly)', 'Total microinfarcts in screening sections', 'Atherosclerosis', 'Arteriolosclerosis', 'LATE', 'Cognitive Status', 'Last CASI Score', 'Interval from last CASI in months', 'Last MMSE Score', 'Interval from last MMSE in months', 'Last MOCA Score', 'Interval from last MOCA in months', 'APOE Genotype', 'Primary Study Name', 'Secondary Study Name', 'NeuN positive fraction on FANS', 'RIN', 'cell_prep_type', 'facs_population_plan', 'rna_amplification', 'sample_name', 'sample_quantity_count', 'expc_cell_capture', 'method', 'pcr_cycles', 'percent_cdna_longer_than_400bp', 'rna_amplification_pass_fail', 'amplified_quantity_ng', 'load_name', 'library_prep', 'library_input_ng', 'r1_index', 'avg_size_bp', 'quantification_fmol', 'library_prep_pass_fail', 'exp_component_vendor_name', 'batch_vendor_name', 'experiment_component_failed', 'alignment', 'Genome', 'ar_id', 'bc', 'GEX_Estimated_number_of_cells', 'GEX_number_of_reads', 'GEX_sequencing_saturation', 'GEX_Mean_raw_reads_per_cell', 'GEX_Q30_bases_in_barcode', 'GEX_Q30_bases_in_read_2', 'GEX_Q30_bases_in_UMI', 'GEX_Percent_duplicates', 'GEX_Q30_bases_in_sample_index_i1', 'GEX_Q30_bases_in_sample_index_i2', 'GEX_Reads_with_TSO', 'GEX_Sequenced_read_pairs', 'GEX_Valid_UMIs', 'GEX_Valid_barcodes', 'GEX_Reads_mapped_to_genome', 'GEX_Reads_mapped_confidently_to_genome', 'GEX_Reads_mapped_confidently_to_intergenic_regions', 'GEX_Reads_mapped_confidently_to_intronic_regions', 'GEX_Reads_mapped_confidently_to_exonic_regions', 'GEX_Reads_mapped_confidently_to_transcriptome', 'GEX_Reads_mapped_antisense_to_gene', 'GEX_Fraction_of_transcriptomic_reads_in_cells', 'GEX_Total_genes_detected', 'GEX_Median_UMI_counts_per_cell', 'GEX_Median_genes_per_cell', 'Multiome_Feature_linkages_detected', 'Multiome_Linked_genes', 'Multiome_Linked_peaks', 'ATAC_Confidently_mapped_read_pairs', 'ATAC_Fraction_of_genome_in_peaks', 'ATAC_Fraction_of_high_quality_fragments_in_cells', 'ATAC_Fraction_of_high_quality_fragments_overlapping_TSS', 'ATAC_Fraction_of_high_quality_fragments_overlapping_peaks', 'ATAC_Fraction_of_transposition_events_in_peaks_in_cells', 'ATAC_Mean_raw_read_pairs_per_cell', 'ATAC_Median_high_quality_fragments_per_cell', 'ATAC_Non-nuclear_read_pairs', 'ATAC_Number_of_peaks', 'ATAC_Percent_duplicates', 'ATAC_Q30_bases_in_barcode', 'ATAC_Q30_bases_in_read_1', 'ATAC_Q30_bases_in_read_2', 'ATAC_Q30_bases_in_sample_index_i1', 'ATAC_Sequenced_read_pairs', 'ATAC_TSS_enrichment_score', 'ATAC_Unmapped_read_pairs', 'ATAC_Valid_barcodes', 'Number of mapped reads', 'Number of unmapped reads', 'Number of multimapped reads', 'Number of reads', 'Number of UMIs', 'Genes detected', 'Doublet score', 'Fraction mitochondrial UMIs', 'Used in analysis', 'Class confidence', 'Class', 'Subclass confidence', 'Subclass', 'Supertype confidence', 'Supertype (non-expanded)', 'Supertype', 'Continuous Pseudo-progression Score', 'Severely Affected Donor'
    var: 'gene_ids'
    uns: 'APOE4 Status_colors', 'Braak_colors', 'CERAD score_colors', 'Cognitive Status_colors', 'Great Apes Metadata', 'Highest Lewy Body Disease_colors', 'LATE_colors', 'Overall AD neuropathological Change_colors', 'Sex_colors', 'Subclass_colors', 'Supertype_colors', 'Thal_colors', 'UW Clinical Metadata', 'X_normalization', 'batch_condition', 'default_embedding', 'neighbors', 'title', 'umap'
    obsm: 'X_scVI', 'X_umap'
    layers: 'UMIs'
    obsp: 'connectivities', 'distances'

Visualize with a discrete color map#

The discrete color map can be used to visualize the categorical variables such as cell subclass.

[8]:
sc.pl.embedding(adata,
                basis='X_umap',
                color=['Subclass'],
                palette=piaso.pl.color.d_color3,
                legend_fontoutline=2,
                legend_fontweight=5,
                cmap='Spectral_r',
                size=10,
                frameon=False)
../_images/notebooks_01_showThePlotFunction_12_0.png
[9]:
sc.pl.embedding(adata,
                basis='X_umap',
                color=['Subclass'],
                palette=piaso.pl.color.d_color3,
                legend_fontoutline=2,
                legend_fontsize=7,
                legend_fontweight=5,
                legend_loc='on data',
                cmap='Spectral_r',
                size=10,
                frameon=False)
../_images/notebooks_01_showThePlotFunction_13_0.png

Visualize with a continuous color map#

The continuous color map can be used to visualize the continuous variables such as gene expression.

[10]:
sc.pl.umap(adata,
           color=['GAD1', 'PVALB', 'SST', 'PDGFRA'],
           cmap=piaso.pl.color.c_color1,
           legend_fontsize=10,
           legend_fontoutline=3,
           ncols=2,
           size=30,
           frameon=False)
../_images/notebooks_01_showThePlotFunction_15_0.png
[11]:
sc.pl.umap(adata,
           color=['GAD1', 'PVALB', 'SST', 'PDGFRA'],
           cmap=piaso.pl.color.c_color4,
           legend_fontsize=10,
           legend_fontoutline=3,
           ncols=2,
           size=30,
           frameon=False)
../_images/notebooks_01_showThePlotFunction_16_0.png

Split the UMAP by condition#

[12]:
mapping_dict=dict(zip(adata.obs['CERAD score'], adata.obs['CERAD score']))
[13]:
mapping_dict={'Absent': 'Absent',
 'Sparse': 'Disease',
 'Moderate': 'Disease',
 'Frequent': 'Disease'}
[14]:
adata.obs['Condition']=adata.obs['CERAD score'].map(mapping_dict)
[15]:
sc.pl.embedding(adata,
                basis='X_umap',
                color=['Condition'],
                palette=piaso.pl.color.d_color1,
                legend_fontoutline=2,
                legend_fontweight=5,
                cmap='Spectral_r',
                size=10,
                frameon=False)
../_images/notebooks_01_showThePlotFunction_21_0.png

Visualizing the expression of a specific gene across two different experimental conditions#

[16]:
piaso.pl.plot_embeddings_split(adata,
                               color='ADGRG7',
                               layer=None,
                               splitby='Condition',
                               color_map=piaso.pl.color.c_color1,
                               size=100,
                               frameon=False,
                               legend_loc=None,)
../_images/notebooks_01_showThePlotFunction_23_0.png
[17]:
piaso.pl.plot_embeddings_split(adata,
                               color='FEZF2',
                               layer=None,
                               splitby='Condition',
                               color_map=piaso.pl.color.c_color1,
                               size=100,
                               frameon=False,
                               legend_loc=None,)
../_images/notebooks_01_showThePlotFunction_24_0.png

Visulaizing continuous variables (Number of UMIs) across different experimental conditions#

[18]:
piaso.pl.plot_embeddings_split(adata,
                               color='Number of UMIs',
                               layer=None,
                               splitby='Condition',
                               color_map=piaso.pl.color.c_color4,
                               size=100,
                               frameon=False,
                               legend_loc=None)
../_images/notebooks_01_showThePlotFunction_26_0.png

Visualizing categorical variables (Cell subclass) across different experimental conditions#

[19]:
piaso.pl.plot_embeddings_split(adata,
                               color='Subclass',
                               layer=None,
                               splitby='Condition',
                               size=100,
                               frameon=False,)
../_images/notebooks_01_showThePlotFunction_28_0.png

We can visualize these cell subclasses alongside the subclass legend. To enhance clarity, the ncol parameter can be used to adjust the number of subplots per row.

[20]:
piaso.pl.plot_embeddings_split(adata,
                               color='Subclass',
                               layer=None,
                               splitby='Condition',
                               size=100,
                               frameon=False,
                               ncol=1)
../_images/notebooks_01_showThePlotFunction_30_0.png
[21]:
piaso.pl.plot_embeddings_split(adata,
                               color='Subclass',
                               layer=None,
                               splitby='Condition',
                               size=100,
                               frameon=False,
                               legend_fontsize=7,
                               legend_fontoutline=2,
                               legend_fontweight=5,
                               legend_loc='on data')
../_images/notebooks_01_showThePlotFunction_31_0.png

Violin plots of grouped data#

Visulaize how the groups in the dataset vary across a list of different features. Here, you can plot a different subplot for each feature of interest.

[22]:
piaso.pl.plot_features_violin(adata,
                              feature_list=[ 'CALB2', 'GAD2', 'SATB2', 'TBR1', 'Number of UMIs'],
                              width_single=8,
                              height_single=2.3,
                              groupby='Subclass',
                              show_grid=False)
../_images/notebooks_01_showThePlotFunction_33_0.png

Adding the horizontal grid line

[23]:
piaso.pl.plot_features_violin(adata,
                              feature_list=[ 'CALB2', 'GAD2', 'SATB2', 'TBR1', 'Number of UMIs'],
                              width_single=8,
                              height_single=2.3,
                              groupby='Subclass')
../_images/notebooks_01_showThePlotFunction_35_0.png

The height and widhts of these plots can be adjusted according to the number of features and groups you have for better visualization.

[24]:
piaso.pl.plot_features_violin(adata,
                              feature_list=[ 'CALB2', 'GAD2', 'SATB2', 'TBR1', 'Number of UMIs'],
                              width_single=20,
                              height_single=3,
                              groupby='Subclass',
                              show_grid=False)
../_images/notebooks_01_showThePlotFunction_37_0.png

Violin plots of entire data#

We can visualize the entire dataset as a single group, adjusting the dot size to plot all the cells onto the violin plot.

[25]:
piaso.pl.plot_features_violin(adata,
                              feature_list=[ 'Genes detected'],
                              width_single=4,
                              height_single=2.0,
                              size=1)
../_images/notebooks_01_showThePlotFunction_40_0.png

We can visualize the distributions of the cells across various features together.

[26]:
piaso.pl.plot_features_violin(adata,
                              feature_list=[ 'MALAT1', 'GAD2', 'Number of UMIs'],
                              width_single=4,
                              height_single=2.3,
                              size=0.5,
                              show_grid=False)
../_images/notebooks_01_showThePlotFunction_42_0.png

Save output plots#

Use the save parameter to specific the file name and path to save:

[27]:
piaso.pl.plot_embeddings_split(adata,
                               color='ADGRG7',
                               layer=None,
                               splitby='Condition',
                               color_map=piaso.pl.color.c_color1,
                               size=100,
                               frameon=False,
                               legend_loc=None,
                               save='./PIASO_UMAP_split_by_condition.pdf')
../_images/notebooks_01_showThePlotFunction_44_0.png
Figure saved to:  ./PIASO_UMAP_split_by_condition.pdf
[28]:
piaso.pl.plot_features_violin(adata,
                              feature_list=[ 'CALB2', 'GAD2', 'SATB2', 'TBR1', 'Number of UMIs'],
                              width_single=8,
                              height_single=2.3,
                              groupby='Subclass',
                              show_grid=False,
                              save='./violin_plot_piaso.pdf')
../_images/notebooks_01_showThePlotFunction_45_0.png
Figure saved to: ./violin_plot_piaso.pdf

Combine categorical variables#

[29]:
mapping_dict={'Absent': 'Control',
 'Sparse': 'Disease',
 'Moderate': 'Disease',
 'Frequent': 'Disease'}
[30]:
adata.obs['Condition']=adata.obs['CERAD score'].map(mapping_dict)
[31]:
adata.obs['SubclassXCondition'] = piaso.pp.getCrossCategories(adata.obs, 'Subclass', 'Condition', )

By combining multiple categories, while maintaining the original orders of the categories, you could show the gene expression changes at cell type level across different conditions.

[32]:
sc.pl.dotplot(
    adata,
    var_names=['SERPINE1'],
    groupby='SubclassXCondition',
    swap_axes=True,
    cmap=piaso.pl.color.c_color4
)
../_images/notebooks_01_showThePlotFunction_51_0.png

We can also switch the order in which we combine the categorical variables.

[33]:
adata.obs['ConditionXSubclass'] = piaso.pp.getCrossCategories(adata.obs, 'Condition', 'Subclass',)
[34]:
sc.pl.dotplot(
    adata,
    var_names=['SERPINE1'],
    groupby='ConditionXSubclass',
    swap_axes=True,
    cmap=piaso.pl.color.c_color4
)
../_images/notebooks_01_showThePlotFunction_54_0.png

Count the number of cells in each category#

[35]:
piaso.pp.table(adata.obs['Subclass'])
[35]:
{'Microglia-PVM': 638,
 'Lamp5': 653,
 'OPC': 496,
 'L5/6 NP': 299,
 'Oligodendrocyte': 1663,
 'L2/3 IT': 4826,
 'Vip': 1551,
 'Lamp5 Lhx6': 305,
 'L5 IT': 1797,
 'L4 IT': 2388,
 'Pvalb': 1299,
 'Astrocyte': 1080,
 'L6 IT': 605,
 'Sst': 780,
 'L6 IT Car3': 354,
 'VLMC': 86,
 'L6b': 224,
 'Pax6': 140,
 'L6 CT': 231,
 'Sncg': 329,
 'L5 ET': 37,
 'Chandelier': 157,
 'Sst Chodl': 31,
 'Endothelial': 31}

You can use the rank parameter to order the results, and the ascending parameter to determine the order of the ranking.

[36]:
piaso.pp.table(adata.obs['Subclass'], rank = True, ascending = True)
[36]:
{'Sst Chodl': 31,
 'Endothelial': 31,
 'L5 ET': 37,
 'VLMC': 86,
 'Pax6': 140,
 'Chandelier': 157,
 'L6b': 224,
 'L6 CT': 231,
 'L5/6 NP': 299,
 'Lamp5 Lhx6': 305,
 'Sncg': 329,
 'L6 IT Car3': 354,
 'OPC': 496,
 'L6 IT': 605,
 'Microglia-PVM': 638,
 'Lamp5': 653,
 'Sst': 780,
 'Astrocyte': 1080,
 'Pvalb': 1299,
 'Vip': 1551,
 'Oligodendrocyte': 1663,
 'L5 IT': 1797,
 'L4 IT': 2388,
 'L2/3 IT': 4826}
[37]:
piaso.pp.table(adata.obs['Subclass'], rank = True, ascending = False)
[37]:
{'L2/3 IT': 4826,
 'L4 IT': 2388,
 'L5 IT': 1797,
 'Oligodendrocyte': 1663,
 'Vip': 1551,
 'Pvalb': 1299,
 'Astrocyte': 1080,
 'Sst': 780,
 'Lamp5': 653,
 'Microglia-PVM': 638,
 'L6 IT': 605,
 'OPC': 496,
 'L6 IT Car3': 354,
 'Sncg': 329,
 'Lamp5 Lhx6': 305,
 'L5/6 NP': 299,
 'L6 CT': 231,
 'L6b': 224,
 'Chandelier': 157,
 'Pax6': 140,
 'VLMC': 86,
 'L5 ET': 37,
 'Sst Chodl': 31,
 'Endothelial': 31}

In order to store the results as a dataframe, set as_dataframe = True.

[38]:
subclass_df = piaso.pp.table(adata.obs['Subclass'], rank = True, ascending = False, as_dataframe = True)