zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills

Fork 0

Files

Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

24 KiB

Raw Blame History

Seaborn Function Reference

This document provides a comprehensive reference for all major seaborn functions, organized by category.

Relational Plots

scatterplot()

Purpose: Create a scatter plot with points representing individual observations.

Key Parameters:

data - DataFrame, array, or dict of arrays
x, y - Variables for x and y axes
hue - Grouping variable for color encoding
size - Grouping variable for size encoding
style - Grouping variable for marker style
palette - Color palette name or list
hue_order - Order for categorical hue levels
hue_norm - Normalization for numeric hue (tuple or Normalize object)
sizes - Size range for size encoding (tuple or dict)
size_order - Order for categorical size levels
size_norm - Normalization for numeric size
markers - Marker style(s) (string, list, or dict)
style_order - Order for categorical style levels
legend - How to draw legend: "auto", "brief", "full", or False
ax - Matplotlib axes to plot on

Example:

sns.scatterplot(data=df, x='height', y='weight',
                hue='gender', size='age', style='smoker',
                palette='Set2', sizes=(20, 200))

lineplot()

Purpose: Draw a line plot with automatic aggregation and confidence intervals for repeated measures.

Key Parameters:

data - DataFrame, array, or dict of arrays
x, y - Variables for x and y axes
hue - Grouping variable for color encoding
size - Grouping variable for line width
style - Grouping variable for line style (dashes)
units - Grouping variable for sampling units (no aggregation within units)
estimator - Function for aggregating across observations (default: mean)
errorbar - Method for error bars: "sd", "se", "pi", ("ci", level), ("pi", level), or None
n_boot - Number of bootstrap iterations for CI computation
seed - Random seed for reproducible bootstrapping
sort - Sort data before plotting
err_style - "band" or "bars" for error representation
err_kws - Additional parameters for error representation
markers - Marker style(s) for emphasizing data points
dashes - Dash style(s) for lines
legend - How to draw legend
ax - Matplotlib axes to plot on

Example:

sns.lineplot(data=timeseries, x='time', y='signal',
             hue='condition', style='subject',
             errorbar=('ci', 95), markers=True)

relplot()

Purpose: Figure-level interface for drawing relational plots (scatter or line) onto a FacetGrid.

Key Parameters: All parameters from scatterplot() and lineplot(), plus:

kind - "scatter" or "line"
col - Categorical variable for column facets
row - Categorical variable for row facets
col_wrap - Wrap columns after this many columns
col_order - Order for column facet levels
row_order - Order for row facet levels
height - Height of each facet in inches
aspect - Aspect ratio (width = height * aspect)
facet_kws - Additional parameters for FacetGrid

Example:

sns.relplot(data=df, x='time', y='measurement',
            hue='treatment', style='batch',
            col='cell_line', row='timepoint',
            kind='line', height=3, aspect=1.5)

Distribution Plots

histplot()

Purpose: Plot univariate or bivariate histograms with flexible binning.

Key Parameters:

data - DataFrame, array, or dict
x, y - Variables (y optional for bivariate)
hue - Grouping variable
weights - Variable for weighting observations
stat - Aggregate statistic: "count", "frequency", "probability", "percent", "density"
bins - Number of bins, bin edges, or method ("auto", "fd", "doane", "scott", "stone", "rice", "sturges", "sqrt")
binwidth - Width of bins (overrides bins)
binrange - Range for binning (tuple)
discrete - Treat x as discrete (centers bars on values)
cumulative - Compute cumulative distribution
common_bins - Use same bins for all hue levels
common_norm - Normalize across hue levels
multiple - How to handle hue: "layer", "dodge", "stack", "fill"
element - Visual element: "bars", "step", "poly"
fill - Fill bars/elements
shrink - Scale bar width (for multiple="dodge")
kde - Overlay KDE estimate
kde_kws - Parameters for KDE
line_kws - Parameters for step/poly elements
thresh - Minimum count threshold for bins
pthresh - Minimum probability threshold
pmax - Maximum probability for color scaling
log_scale - Log scale for axis (bool or base)
legend - Whether to show legend
ax - Matplotlib axes

Example:

sns.histplot(data=df, x='measurement', hue='condition',
             stat='density', bins=30, kde=True,
             multiple='layer', alpha=0.5)

kdeplot()

Purpose: Plot univariate or bivariate kernel density estimates.

Key Parameters:

data - DataFrame, array, or dict
x, y - Variables (y optional for bivariate)
hue - Grouping variable
weights - Variable for weighting observations
palette - Color palette
hue_order - Order for hue levels
hue_norm - Normalization for numeric hue
multiple - How to handle hue: "layer", "stack", "fill"
common_norm - Normalize across hue levels
common_grid - Use same grid for all hue levels
cumulative - Compute cumulative distribution
bw_method - Method for bandwidth: "scott", "silverman", or scalar
bw_adjust - Bandwidth multiplier (higher = smoother)
log_scale - Log scale for axis
levels - Number or values for contour levels (bivariate)
thresh - Minimum density threshold for contours
gridsize - Grid resolution
cut - Extension beyond data extremes (in bandwidth units)
clip - Data range for curve (tuple)
fill - Fill area under curve/contours
legend - Whether to show legend
ax - Matplotlib axes

Example:

# Univariate
sns.kdeplot(data=df, x='measurement', hue='condition',
            fill=True, common_norm=False, bw_adjust=1.5)

# Bivariate
sns.kdeplot(data=df, x='var1', y='var2',
            fill=True, levels=10, thresh=0.05)

ecdfplot()

Purpose: Plot empirical cumulative distribution functions.

Key Parameters:

data - DataFrame, array, or dict
x, y - Variables (specify one)
hue - Grouping variable
weights - Variable for weighting observations
stat - "proportion" or "count"
complementary - Plot complementary CDF (1 - ECDF)
palette - Color palette
hue_order - Order for hue levels
hue_norm - Normalization for numeric hue
log_scale - Log scale for axis
legend - Whether to show legend
ax - Matplotlib axes

Example:

sns.ecdfplot(data=df, x='response_time', hue='treatment',
             stat='proportion', complementary=False)

rugplot()

Purpose: Plot tick marks showing individual observations along an axis.

Key Parameters:

data - DataFrame, array, or dict
x, y - Variable (specify one)
hue - Grouping variable
height - Height of ticks (proportion of axis)
expand_margins - Add margin space for rug
palette - Color palette
hue_order - Order for hue levels
hue_norm - Normalization for numeric hue
legend - Whether to show legend
ax - Matplotlib axes

Example:

sns.rugplot(data=df, x='value', hue='category', height=0.05)

displot()

Purpose: Figure-level interface for distribution plots onto a FacetGrid.

Key Parameters: All parameters from histplot(), kdeplot(), and ecdfplot(), plus:

kind - "hist", "kde", "ecdf"
rug - Add rug plot on marginal axes
rug_kws - Parameters for rug plot
col - Categorical variable for column facets
row - Categorical variable for row facets
col_wrap - Wrap columns
col_order - Order for column facets
row_order - Order for row facets
height - Height of each facet
aspect - Aspect ratio
facet_kws - Additional parameters for FacetGrid

Example:

sns.displot(data=df, x='measurement', hue='treatment',
            col='timepoint', kind='kde', fill=True,
            height=3, aspect=1.5, rug=True)

jointplot()

Purpose: Draw a bivariate plot with marginal univariate plots.

Key Parameters:

data - DataFrame
x, y - Variables for x and y axes
hue - Grouping variable
kind - "scatter", "kde", "hist", "hex", "reg", "resid"
height - Size of the figure (square)
ratio - Ratio of joint to marginal axes
space - Space between joint and marginal axes
dropna - Drop missing values
xlim, ylim - Axis limits (tuples)
marginal_ticks - Show ticks on marginal axes
joint_kws - Parameters for joint plot
marginal_kws - Parameters for marginal plots
hue_order - Order for hue levels
palette - Color palette

Example:

sns.jointplot(data=df, x='var1', y='var2', hue='group',
              kind='scatter', height=6, ratio=4,
              joint_kws={'alpha': 0.5})

pairplot()

Purpose: Plot pairwise relationships in a dataset.

Key Parameters:

data - DataFrame
hue - Grouping variable for color encoding
hue_order - Order for hue levels
palette - Color palette
vars - Variables to plot (default: all numeric)
x_vars, y_vars - Variables for x and y axes (non-square grid)
kind - "scatter", "kde", "hist", "reg"
diag_kind - "auto", "hist", "kde", None
markers - Marker style(s)
height - Height of each facet
aspect - Aspect ratio
corner - Plot only lower triangle
dropna - Drop missing values
plot_kws - Parameters for non-diagonal plots
diag_kws - Parameters for diagonal plots
grid_kws - Parameters for PairGrid

Example:

sns.pairplot(data=df, hue='species', palette='Set2',
             vars=['sepal_length', 'sepal_width', 'petal_length'],
             corner=True, height=2.5)

Categorical Plots

stripplot()

Purpose: Draw a categorical scatterplot with jittered points.

Key Parameters:

data - DataFrame, array, or dict
x, y - Variables (one categorical, one continuous)
hue - Grouping variable
order - Order for categorical levels
hue_order - Order for hue levels
jitter - Amount of jitter: True, float, or False
dodge - Separate hue levels side-by-side
orient - "v" or "h" (usually inferred)
color - Single color for all elements
palette - Color palette
size - Marker size
edgecolor - Marker edge color
linewidth - Marker edge width
native_scale - Use numeric scale for categorical axis
formatter - Formatter for categorical axis
legend - Whether to show legend
ax - Matplotlib axes

Example:

sns.stripplot(data=df, x='day', y='total_bill',
              hue='sex', dodge=True, jitter=0.2)

swarmplot()

Purpose: Draw a categorical scatterplot with non-overlapping points.

Key Parameters: Same as stripplot(), except:

No jitter parameter
size - Marker size (important for avoiding overlap)
warn_thresh - Threshold for warning about too many points (default: 0.05)

Note: Computationally intensive for large datasets. Use stripplot for >1000 points.

Example:

sns.swarmplot(data=df, x='day', y='total_bill',
              hue='time', dodge=True, size=5)

boxplot()

Purpose: Draw a box plot showing quartiles and outliers.

Key Parameters:

data - DataFrame, array, or dict
x, y - Variables (one categorical, one continuous)
hue - Grouping variable
order - Order for categorical levels
hue_order - Order for hue levels
orient - "v" or "h"
color - Single color for boxes
palette - Color palette
saturation - Color saturation intensity
width - Width of boxes
dodge - Separate hue levels side-by-side
fliersize - Size of outlier markers
linewidth - Box line width
whis - IQR multiplier for whiskers (default: 1.5)
notch - Draw notched boxes
showcaps - Show whisker caps
showmeans - Show mean value
meanprops - Properties for mean marker
boxprops - Properties for boxes
whiskerprops - Properties for whiskers
capprops - Properties for caps
flierprops - Properties for outliers
medianprops - Properties for median line
native_scale - Use numeric scale
formatter - Formatter for categorical axis
legend - Whether to show legend
ax - Matplotlib axes

Example:

sns.boxplot(data=df, x='day', y='total_bill',
            hue='smoker', palette='Set3',
            showmeans=True, notch=True)

violinplot()

Purpose: Draw a violin plot combining boxplot and KDE.

Key Parameters: Same as boxplot(), plus:

bw_method - KDE bandwidth method
bw_adjust - KDE bandwidth multiplier
cut - KDE extension beyond extremes
density_norm - "area", "count", "width"
inner - "box", "quartile", "point", "stick", None
split - Split violins for hue comparison
scale - Scaling method: "area", "count", "width"
scale_hue - Scale across hue levels
gridsize - KDE grid resolution

Example:

sns.violinplot(data=df, x='day', y='total_bill',
               hue='sex', split=True, inner='quartile',
               palette='muted')

boxenplot()

Purpose: Draw enhanced box plot for larger datasets showing more quantiles.

Key Parameters: Same as boxplot(), plus:

k_depth - "tukey", "proportion", "trustworthy", "full", or int
outlier_prop - Proportion of data as outliers
trust_alpha - Alpha for trustworthy depth
showfliers - Show outlier points

Example:

sns.boxenplot(data=df, x='day', y='total_bill',
              hue='time', palette='Set2')

barplot()

Purpose: Draw a bar plot with error bars showing statistical estimates.

Key Parameters:

data - DataFrame, array, or dict
x, y - Variables (one categorical, one continuous)
hue - Grouping variable
order - Order for categorical levels
hue_order - Order for hue levels
estimator - Aggregation function (default: mean)
errorbar - Error representation: "sd", "se", "pi", ("ci", level), ("pi", level), or None
n_boot - Bootstrap iterations
seed - Random seed
units - Identifier for sampling units
weights - Observation weights
orient - "v" or "h"
color - Single bar color
palette - Color palette
saturation - Color saturation
width - Bar width
dodge - Separate hue levels side-by-side
errcolor - Error bar color
errwidth - Error bar line width
capsize - Error bar cap width
native_scale - Use numeric scale
formatter - Formatter for categorical axis
legend - Whether to show legend
ax - Matplotlib axes

Example:

sns.barplot(data=df, x='day', y='total_bill',
            hue='sex', estimator='median',
            errorbar=('ci', 95), capsize=0.1)

countplot()

Purpose: Show counts of observations in each categorical bin.

Key Parameters: Same as barplot(), but:

Only specify one of x or y (the categorical variable)
No estimator or errorbar (shows counts)
stat - "count" or "percent"

Example:

sns.countplot(data=df, x='day', hue='time',
              palette='pastel', dodge=True)

pointplot()

Purpose: Show point estimates and confidence intervals with connecting lines.

Key Parameters: Same as barplot(), plus:

markers - Marker style(s)
linestyles - Line style(s)
scale - Scale for markers
join - Connect points with lines
capsize - Error bar cap width

Example:

sns.pointplot(data=df, x='time', y='total_bill',
              hue='sex', markers=['o', 's'],
              linestyles=['-', '--'], capsize=0.1)

catplot()

Purpose: Figure-level interface for categorical plots onto a FacetGrid.

Key Parameters: All parameters from categorical plots, plus:

kind - "strip", "swarm", "box", "violin", "boxen", "bar", "point", "count"
col - Categorical variable for column facets
row - Categorical variable for row facets
col_wrap - Wrap columns
col_order - Order for column facets
row_order - Order for row facets
height - Height of each facet
aspect - Aspect ratio
sharex, sharey - Share axes across facets
legend - Whether to show legend
legend_out - Place legend outside figure
facet_kws - Additional FacetGrid parameters

Example:

sns.catplot(data=df, x='day', y='total_bill',
            hue='smoker', col='time',
            kind='violin', split=True,
            height=4, aspect=0.8)

Regression Plots

regplot()

Purpose: Plot data and a linear regression fit.

Key Parameters:

data - DataFrame
x, y - Variables or data vectors
x_estimator - Apply estimator to x bins
x_bins - Bin x for estimator
x_ci - CI for binned estimates
scatter - Show scatter points
fit_reg - Plot regression line
ci - CI for regression estimate (int or None)
n_boot - Bootstrap iterations for CI
units - Identifier for sampling units
seed - Random seed
order - Polynomial regression order
logistic - Fit logistic regression
lowess - Fit lowess smoother
robust - Fit robust regression
logx - Log-transform x
x_partial, y_partial - Partial regression (regress out variables)
truncate - Limit regression line to data range
dropna - Drop missing values
x_jitter, y_jitter - Add jitter to data
label - Label for legend
color - Color for all elements
marker - Marker style
scatter_kws - Parameters for scatter
line_kws - Parameters for regression line
ax - Matplotlib axes

Example:

sns.regplot(data=df, x='total_bill', y='tip',
            order=2, robust=True, ci=95,
            scatter_kws={'alpha': 0.5})

lmplot()

Purpose: Figure-level interface for regression plots onto a FacetGrid.

Key Parameters: All parameters from regplot(), plus:

hue - Grouping variable
col - Column facets
row - Row facets
palette - Color palette
col_wrap - Wrap columns
height - Facet height
aspect - Aspect ratio
markers - Marker style(s)
sharex, sharey - Share axes
hue_order - Order for hue levels
col_order - Order for column facets
row_order - Order for row facets
legend - Whether to show legend
legend_out - Place legend outside
facet_kws - FacetGrid parameters

Example:

sns.lmplot(data=df, x='total_bill', y='tip',
           hue='smoker', col='time', row='sex',
           height=3, aspect=1.2, ci=None)

residplot()

Purpose: Plot residuals of a regression.

Key Parameters: Same as regplot(), but:

Always plots residuals (y - predicted) vs x
Adds horizontal line at y=0
lowess - Fit lowess smoother to residuals

Example:

sns.residplot(data=df, x='x', y='y', lowess=True,
              scatter_kws={'alpha': 0.5})

Matrix Plots

heatmap()

Purpose: Plot rectangular data as a color-encoded matrix.

Key Parameters:

data - 2D array-like data
vmin, vmax - Anchor values for colormap
cmap - Colormap name or object
center - Value at colormap center
robust - Use robust quantiles for colormap range
annot - Annotate cells: True, False, or array
fmt - Format string for annotations (e.g., ".2f")
annot_kws - Parameters for annotations
linewidths - Width of cell borders
linecolor - Color of cell borders
cbar - Draw colorbar
cbar_kws - Colorbar parameters
cbar_ax - Axes for colorbar
square - Force square cells
xticklabels, yticklabels - Tick labels (True, False, int, or list)
mask - Boolean array to mask cells
ax - Matplotlib axes

Example:

# Correlation matrix
corr = df.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
            cmap='coolwarm', center=0, square=True,
            linewidths=1, cbar_kws={'shrink': 0.8})

clustermap()

Purpose: Plot a hierarchically-clustered heatmap.

Key Parameters: All parameters from heatmap(), plus:

pivot_kws - Parameters for pivoting (if needed)
method - Linkage method: "single", "complete", "average", "weighted", "centroid", "median", "ward"
metric - Distance metric for clustering
standard_scale - Standardize data: 0 (rows), 1 (columns), or None
z_score - Z-score normalize data: 0 (rows), 1 (columns), or None
row_cluster, col_cluster - Cluster rows/columns
row_linkage, col_linkage - Precomputed linkage matrices
row_colors, col_colors - Additional color annotations
dendrogram_ratio - Ratio of dendrogram to heatmap
colors_ratio - Ratio of color annotations to heatmap
cbar_pos - Colorbar position (tuple: x, y, width, height)
tree_kws - Parameters for dendrogram
figsize - Figure size

Example:

sns.clustermap(data, method='average', metric='euclidean',
               z_score=0, cmap='viridis',
               row_colors=row_colors, col_colors=col_colors,
               figsize=(12, 12), dendrogram_ratio=0.1)

Multi-Plot Grids

FacetGrid

Purpose: Multi-plot grid for plotting conditional relationships.

Initialization:

g = sns.FacetGrid(data, row=None, col=None, hue=None,
                  col_wrap=None, sharex=True, sharey=True,
                  height=3, aspect=1, palette=None,
                  row_order=None, col_order=None, hue_order=None,
                  hue_kws=None, dropna=False, legend_out=True,
                  despine=True, margin_titles=False,
                  xlim=None, ylim=None, subplot_kws=None,
                  gridspec_kws=None)

Methods:

map(func, *args, **kwargs) - Apply function to each facet
map_dataframe(func, *args, **kwargs) - Apply function with full DataFrame
set_axis_labels(x_var, y_var) - Set axis labels
set_titles(template, **kwargs) - Set subplot titles
set(kwargs) - Set attributes on all axes
add_legend(legend_data, title, label_order, **kwargs) - Add legend
savefig(*args, **kwargs) - Save figure

Example:

g = sns.FacetGrid(df, col='time', row='sex', hue='smoker',
                  height=3, aspect=1.5, margin_titles=True)
g.map(sns.scatterplot, 'total_bill', 'tip', alpha=0.7)
g.add_legend()
g.set_axis_labels('Total Bill ($)', 'Tip ($)')
g.set_titles('{col_name} | {row_name}')

PairGrid

Purpose: Grid for plotting pairwise relationships in a dataset.

Initialization:

g = sns.PairGrid(data, hue=None, vars=None,
                 x_vars=None, y_vars=None,
                 hue_order=None, palette=None,
                 hue_kws=None, corner=False,
                 diag_sharey=True, height=2.5,
                 aspect=1, layout_pad=0.5,
                 despine=True, dropna=False)

Methods:

map(func, **kwargs) - Apply function to all subplots
map_diag(func, **kwargs) - Apply to diagonal
map_offdiag(func, **kwargs) - Apply to off-diagonal
map_upper(func, **kwargs) - Apply to upper triangle
map_lower(func, **kwargs) - Apply to lower triangle
add_legend(legend_data, **kwargs) - Add legend
savefig(*args, **kwargs) - Save figure

Example:

g = sns.PairGrid(df, hue='species', vars=['a', 'b', 'c', 'd'],
                 corner=True, height=2.5)
g.map_upper(sns.scatterplot, alpha=0.5)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot, kde=True)
g.add_legend()

JointGrid

Purpose: Grid for bivariate plot with marginal univariate plots.

Initialization:

g = sns.JointGrid(data=None, x=None, y=None, hue=None,
                  height=6, ratio=5, space=0.2,
                  dropna=False, xlim=None, ylim=None,
                  marginal_ticks=False, hue_order=None,
                  palette=None)

Methods:

plot(joint_func, marginal_func, **kwargs) - Plot both joint and marginals
plot_joint(func, **kwargs) - Plot joint distribution
plot_marginals(func, **kwargs) - Plot marginal distributions
refline(x, y, **kwargs) - Add reference line
set_axis_labels(xlabel, ylabel, **kwargs) - Set axis labels
savefig(*args, **kwargs) - Save figure

Example:

g = sns.JointGrid(data=df, x='x', y='y', hue='group',
                  height=6, ratio=5, space=0.2)
g.plot_joint(sns.scatterplot, alpha=0.5)
g.plot_marginals(sns.histplot, kde=True)
g.set_axis_labels('Variable X', 'Variable Y')

24 KiB Raw Blame History

Seaborn Function Reference

Relational Plots

scatterplot()

lineplot()

relplot()

Distribution Plots

histplot()

kdeplot()

ecdfplot()

rugplot()

displot()

jointplot()

pairplot()

Categorical Plots

stripplot()

swarmplot()

boxplot()

violinplot()

boxenplot()

barplot()

countplot()

pointplot()

catplot()

Regression Plots

regplot()

lmplot()

residplot()

Matrix Plots

heatmap()

clustermap()

Multi-Plot Grids

FacetGrid

PairGrid

JointGrid

24 KiB

Raw Blame History