Files
gh-k-dense-ai-claude-scient…/skills/seaborn/references/function_reference.md
2025-11-30 08:30:10 +08:00

24 KiB

Seaborn Function Reference

This document provides a comprehensive reference for all major seaborn functions, organized by category.

Relational Plots

scatterplot()

Purpose: Create a scatter plot with points representing individual observations.

Key Parameters:

  • data - DataFrame, array, or dict of arrays
  • x, y - Variables for x and y axes
  • hue - Grouping variable for color encoding
  • size - Grouping variable for size encoding
  • style - Grouping variable for marker style
  • palette - Color palette name or list
  • hue_order - Order for categorical hue levels
  • hue_norm - Normalization for numeric hue (tuple or Normalize object)
  • sizes - Size range for size encoding (tuple or dict)
  • size_order - Order for categorical size levels
  • size_norm - Normalization for numeric size
  • markers - Marker style(s) (string, list, or dict)
  • style_order - Order for categorical style levels
  • legend - How to draw legend: "auto", "brief", "full", or False
  • ax - Matplotlib axes to plot on

Example:

sns.scatterplot(data=df, x='height', y='weight',
                hue='gender', size='age', style='smoker',
                palette='Set2', sizes=(20, 200))

lineplot()

Purpose: Draw a line plot with automatic aggregation and confidence intervals for repeated measures.

Key Parameters:

  • data - DataFrame, array, or dict of arrays
  • x, y - Variables for x and y axes
  • hue - Grouping variable for color encoding
  • size - Grouping variable for line width
  • style - Grouping variable for line style (dashes)
  • units - Grouping variable for sampling units (no aggregation within units)
  • estimator - Function for aggregating across observations (default: mean)
  • errorbar - Method for error bars: "sd", "se", "pi", ("ci", level), ("pi", level), or None
  • n_boot - Number of bootstrap iterations for CI computation
  • seed - Random seed for reproducible bootstrapping
  • sort - Sort data before plotting
  • err_style - "band" or "bars" for error representation
  • err_kws - Additional parameters for error representation
  • markers - Marker style(s) for emphasizing data points
  • dashes - Dash style(s) for lines
  • legend - How to draw legend
  • ax - Matplotlib axes to plot on

Example:

sns.lineplot(data=timeseries, x='time', y='signal',
             hue='condition', style='subject',
             errorbar=('ci', 95), markers=True)

relplot()

Purpose: Figure-level interface for drawing relational plots (scatter or line) onto a FacetGrid.

Key Parameters: All parameters from scatterplot() and lineplot(), plus:

  • kind - "scatter" or "line"
  • col - Categorical variable for column facets
  • row - Categorical variable for row facets
  • col_wrap - Wrap columns after this many columns
  • col_order - Order for column facet levels
  • row_order - Order for row facet levels
  • height - Height of each facet in inches
  • aspect - Aspect ratio (width = height * aspect)
  • facet_kws - Additional parameters for FacetGrid

Example:

sns.relplot(data=df, x='time', y='measurement',
            hue='treatment', style='batch',
            col='cell_line', row='timepoint',
            kind='line', height=3, aspect=1.5)

Distribution Plots

histplot()

Purpose: Plot univariate or bivariate histograms with flexible binning.

Key Parameters:

  • data - DataFrame, array, or dict
  • x, y - Variables (y optional for bivariate)
  • hue - Grouping variable
  • weights - Variable for weighting observations
  • stat - Aggregate statistic: "count", "frequency", "probability", "percent", "density"
  • bins - Number of bins, bin edges, or method ("auto", "fd", "doane", "scott", "stone", "rice", "sturges", "sqrt")
  • binwidth - Width of bins (overrides bins)
  • binrange - Range for binning (tuple)
  • discrete - Treat x as discrete (centers bars on values)
  • cumulative - Compute cumulative distribution
  • common_bins - Use same bins for all hue levels
  • common_norm - Normalize across hue levels
  • multiple - How to handle hue: "layer", "dodge", "stack", "fill"
  • element - Visual element: "bars", "step", "poly"
  • fill - Fill bars/elements
  • shrink - Scale bar width (for multiple="dodge")
  • kde - Overlay KDE estimate
  • kde_kws - Parameters for KDE
  • line_kws - Parameters for step/poly elements
  • thresh - Minimum count threshold for bins
  • pthresh - Minimum probability threshold
  • pmax - Maximum probability for color scaling
  • log_scale - Log scale for axis (bool or base)
  • legend - Whether to show legend
  • ax - Matplotlib axes

Example:

sns.histplot(data=df, x='measurement', hue='condition',
             stat='density', bins=30, kde=True,
             multiple='layer', alpha=0.5)

kdeplot()

Purpose: Plot univariate or bivariate kernel density estimates.

Key Parameters:

  • data - DataFrame, array, or dict
  • x, y - Variables (y optional for bivariate)
  • hue - Grouping variable
  • weights - Variable for weighting observations
  • palette - Color palette
  • hue_order - Order for hue levels
  • hue_norm - Normalization for numeric hue
  • multiple - How to handle hue: "layer", "stack", "fill"
  • common_norm - Normalize across hue levels
  • common_grid - Use same grid for all hue levels
  • cumulative - Compute cumulative distribution
  • bw_method - Method for bandwidth: "scott", "silverman", or scalar
  • bw_adjust - Bandwidth multiplier (higher = smoother)
  • log_scale - Log scale for axis
  • levels - Number or values for contour levels (bivariate)
  • thresh - Minimum density threshold for contours
  • gridsize - Grid resolution
  • cut - Extension beyond data extremes (in bandwidth units)
  • clip - Data range for curve (tuple)
  • fill - Fill area under curve/contours
  • legend - Whether to show legend
  • ax - Matplotlib axes

Example:

# Univariate
sns.kdeplot(data=df, x='measurement', hue='condition',
            fill=True, common_norm=False, bw_adjust=1.5)

# Bivariate
sns.kdeplot(data=df, x='var1', y='var2',
            fill=True, levels=10, thresh=0.05)

ecdfplot()

Purpose: Plot empirical cumulative distribution functions.

Key Parameters:

  • data - DataFrame, array, or dict
  • x, y - Variables (specify one)
  • hue - Grouping variable
  • weights - Variable for weighting observations
  • stat - "proportion" or "count"
  • complementary - Plot complementary CDF (1 - ECDF)
  • palette - Color palette
  • hue_order - Order for hue levels
  • hue_norm - Normalization for numeric hue
  • log_scale - Log scale for axis
  • legend - Whether to show legend
  • ax - Matplotlib axes

Example:

sns.ecdfplot(data=df, x='response_time', hue='treatment',
             stat='proportion', complementary=False)

rugplot()

Purpose: Plot tick marks showing individual observations along an axis.

Key Parameters:

  • data - DataFrame, array, or dict
  • x, y - Variable (specify one)
  • hue - Grouping variable
  • height - Height of ticks (proportion of axis)
  • expand_margins - Add margin space for rug
  • palette - Color palette
  • hue_order - Order for hue levels
  • hue_norm - Normalization for numeric hue
  • legend - Whether to show legend
  • ax - Matplotlib axes

Example:

sns.rugplot(data=df, x='value', hue='category', height=0.05)

displot()

Purpose: Figure-level interface for distribution plots onto a FacetGrid.

Key Parameters: All parameters from histplot(), kdeplot(), and ecdfplot(), plus:

  • kind - "hist", "kde", "ecdf"
  • rug - Add rug plot on marginal axes
  • rug_kws - Parameters for rug plot
  • col - Categorical variable for column facets
  • row - Categorical variable for row facets
  • col_wrap - Wrap columns
  • col_order - Order for column facets
  • row_order - Order for row facets
  • height - Height of each facet
  • aspect - Aspect ratio
  • facet_kws - Additional parameters for FacetGrid

Example:

sns.displot(data=df, x='measurement', hue='treatment',
            col='timepoint', kind='kde', fill=True,
            height=3, aspect=1.5, rug=True)

jointplot()

Purpose: Draw a bivariate plot with marginal univariate plots.

Key Parameters:

  • data - DataFrame
  • x, y - Variables for x and y axes
  • hue - Grouping variable
  • kind - "scatter", "kde", "hist", "hex", "reg", "resid"
  • height - Size of the figure (square)
  • ratio - Ratio of joint to marginal axes
  • space - Space between joint and marginal axes
  • dropna - Drop missing values
  • xlim, ylim - Axis limits (tuples)
  • marginal_ticks - Show ticks on marginal axes
  • joint_kws - Parameters for joint plot
  • marginal_kws - Parameters for marginal plots
  • hue_order - Order for hue levels
  • palette - Color palette

Example:

sns.jointplot(data=df, x='var1', y='var2', hue='group',
              kind='scatter', height=6, ratio=4,
              joint_kws={'alpha': 0.5})

pairplot()

Purpose: Plot pairwise relationships in a dataset.

Key Parameters:

  • data - DataFrame
  • hue - Grouping variable for color encoding
  • hue_order - Order for hue levels
  • palette - Color palette
  • vars - Variables to plot (default: all numeric)
  • x_vars, y_vars - Variables for x and y axes (non-square grid)
  • kind - "scatter", "kde", "hist", "reg"
  • diag_kind - "auto", "hist", "kde", None
  • markers - Marker style(s)
  • height - Height of each facet
  • aspect - Aspect ratio
  • corner - Plot only lower triangle
  • dropna - Drop missing values
  • plot_kws - Parameters for non-diagonal plots
  • diag_kws - Parameters for diagonal plots
  • grid_kws - Parameters for PairGrid

Example:

sns.pairplot(data=df, hue='species', palette='Set2',
             vars=['sepal_length', 'sepal_width', 'petal_length'],
             corner=True, height=2.5)

Categorical Plots

stripplot()

Purpose: Draw a categorical scatterplot with jittered points.

Key Parameters:

  • data - DataFrame, array, or dict
  • x, y - Variables (one categorical, one continuous)
  • hue - Grouping variable
  • order - Order for categorical levels
  • hue_order - Order for hue levels
  • jitter - Amount of jitter: True, float, or False
  • dodge - Separate hue levels side-by-side
  • orient - "v" or "h" (usually inferred)
  • color - Single color for all elements
  • palette - Color palette
  • size - Marker size
  • edgecolor - Marker edge color
  • linewidth - Marker edge width
  • native_scale - Use numeric scale for categorical axis
  • formatter - Formatter for categorical axis
  • legend - Whether to show legend
  • ax - Matplotlib axes

Example:

sns.stripplot(data=df, x='day', y='total_bill',
              hue='sex', dodge=True, jitter=0.2)

swarmplot()

Purpose: Draw a categorical scatterplot with non-overlapping points.

Key Parameters: Same as stripplot(), except:

  • No jitter parameter
  • size - Marker size (important for avoiding overlap)
  • warn_thresh - Threshold for warning about too many points (default: 0.05)

Note: Computationally intensive for large datasets. Use stripplot for >1000 points.

Example:

sns.swarmplot(data=df, x='day', y='total_bill',
              hue='time', dodge=True, size=5)

boxplot()

Purpose: Draw a box plot showing quartiles and outliers.

Key Parameters:

  • data - DataFrame, array, or dict
  • x, y - Variables (one categorical, one continuous)
  • hue - Grouping variable
  • order - Order for categorical levels
  • hue_order - Order for hue levels
  • orient - "v" or "h"
  • color - Single color for boxes
  • palette - Color palette
  • saturation - Color saturation intensity
  • width - Width of boxes
  • dodge - Separate hue levels side-by-side
  • fliersize - Size of outlier markers
  • linewidth - Box line width
  • whis - IQR multiplier for whiskers (default: 1.5)
  • notch - Draw notched boxes
  • showcaps - Show whisker caps
  • showmeans - Show mean value
  • meanprops - Properties for mean marker
  • boxprops - Properties for boxes
  • whiskerprops - Properties for whiskers
  • capprops - Properties for caps
  • flierprops - Properties for outliers
  • medianprops - Properties for median line
  • native_scale - Use numeric scale
  • formatter - Formatter for categorical axis
  • legend - Whether to show legend
  • ax - Matplotlib axes

Example:

sns.boxplot(data=df, x='day', y='total_bill',
            hue='smoker', palette='Set3',
            showmeans=True, notch=True)

violinplot()

Purpose: Draw a violin plot combining boxplot and KDE.

Key Parameters: Same as boxplot(), plus:

  • bw_method - KDE bandwidth method
  • bw_adjust - KDE bandwidth multiplier
  • cut - KDE extension beyond extremes
  • density_norm - "area", "count", "width"
  • inner - "box", "quartile", "point", "stick", None
  • split - Split violins for hue comparison
  • scale - Scaling method: "area", "count", "width"
  • scale_hue - Scale across hue levels
  • gridsize - KDE grid resolution

Example:

sns.violinplot(data=df, x='day', y='total_bill',
               hue='sex', split=True, inner='quartile',
               palette='muted')

boxenplot()

Purpose: Draw enhanced box plot for larger datasets showing more quantiles.

Key Parameters: Same as boxplot(), plus:

  • k_depth - "tukey", "proportion", "trustworthy", "full", or int
  • outlier_prop - Proportion of data as outliers
  • trust_alpha - Alpha for trustworthy depth
  • showfliers - Show outlier points

Example:

sns.boxenplot(data=df, x='day', y='total_bill',
              hue='time', palette='Set2')

barplot()

Purpose: Draw a bar plot with error bars showing statistical estimates.

Key Parameters:

  • data - DataFrame, array, or dict
  • x, y - Variables (one categorical, one continuous)
  • hue - Grouping variable
  • order - Order for categorical levels
  • hue_order - Order for hue levels
  • estimator - Aggregation function (default: mean)
  • errorbar - Error representation: "sd", "se", "pi", ("ci", level), ("pi", level), or None
  • n_boot - Bootstrap iterations
  • seed - Random seed
  • units - Identifier for sampling units
  • weights - Observation weights
  • orient - "v" or "h"
  • color - Single bar color
  • palette - Color palette
  • saturation - Color saturation
  • width - Bar width
  • dodge - Separate hue levels side-by-side
  • errcolor - Error bar color
  • errwidth - Error bar line width
  • capsize - Error bar cap width
  • native_scale - Use numeric scale
  • formatter - Formatter for categorical axis
  • legend - Whether to show legend
  • ax - Matplotlib axes

Example:

sns.barplot(data=df, x='day', y='total_bill',
            hue='sex', estimator='median',
            errorbar=('ci', 95), capsize=0.1)

countplot()

Purpose: Show counts of observations in each categorical bin.

Key Parameters: Same as barplot(), but:

  • Only specify one of x or y (the categorical variable)
  • No estimator or errorbar (shows counts)
  • stat - "count" or "percent"

Example:

sns.countplot(data=df, x='day', hue='time',
              palette='pastel', dodge=True)

pointplot()

Purpose: Show point estimates and confidence intervals with connecting lines.

Key Parameters: Same as barplot(), plus:

  • markers - Marker style(s)
  • linestyles - Line style(s)
  • scale - Scale for markers
  • join - Connect points with lines
  • capsize - Error bar cap width

Example:

sns.pointplot(data=df, x='time', y='total_bill',
              hue='sex', markers=['o', 's'],
              linestyles=['-', '--'], capsize=0.1)

catplot()

Purpose: Figure-level interface for categorical plots onto a FacetGrid.

Key Parameters: All parameters from categorical plots, plus:

  • kind - "strip", "swarm", "box", "violin", "boxen", "bar", "point", "count"
  • col - Categorical variable for column facets
  • row - Categorical variable for row facets
  • col_wrap - Wrap columns
  • col_order - Order for column facets
  • row_order - Order for row facets
  • height - Height of each facet
  • aspect - Aspect ratio
  • sharex, sharey - Share axes across facets
  • legend - Whether to show legend
  • legend_out - Place legend outside figure
  • facet_kws - Additional FacetGrid parameters

Example:

sns.catplot(data=df, x='day', y='total_bill',
            hue='smoker', col='time',
            kind='violin', split=True,
            height=4, aspect=0.8)

Regression Plots

regplot()

Purpose: Plot data and a linear regression fit.

Key Parameters:

  • data - DataFrame
  • x, y - Variables or data vectors
  • x_estimator - Apply estimator to x bins
  • x_bins - Bin x for estimator
  • x_ci - CI for binned estimates
  • scatter - Show scatter points
  • fit_reg - Plot regression line
  • ci - CI for regression estimate (int or None)
  • n_boot - Bootstrap iterations for CI
  • units - Identifier for sampling units
  • seed - Random seed
  • order - Polynomial regression order
  • logistic - Fit logistic regression
  • lowess - Fit lowess smoother
  • robust - Fit robust regression
  • logx - Log-transform x
  • x_partial, y_partial - Partial regression (regress out variables)
  • truncate - Limit regression line to data range
  • dropna - Drop missing values
  • x_jitter, y_jitter - Add jitter to data
  • label - Label for legend
  • color - Color for all elements
  • marker - Marker style
  • scatter_kws - Parameters for scatter
  • line_kws - Parameters for regression line
  • ax - Matplotlib axes

Example:

sns.regplot(data=df, x='total_bill', y='tip',
            order=2, robust=True, ci=95,
            scatter_kws={'alpha': 0.5})

lmplot()

Purpose: Figure-level interface for regression plots onto a FacetGrid.

Key Parameters: All parameters from regplot(), plus:

  • hue - Grouping variable
  • col - Column facets
  • row - Row facets
  • palette - Color palette
  • col_wrap - Wrap columns
  • height - Facet height
  • aspect - Aspect ratio
  • markers - Marker style(s)
  • sharex, sharey - Share axes
  • hue_order - Order for hue levels
  • col_order - Order for column facets
  • row_order - Order for row facets
  • legend - Whether to show legend
  • legend_out - Place legend outside
  • facet_kws - FacetGrid parameters

Example:

sns.lmplot(data=df, x='total_bill', y='tip',
           hue='smoker', col='time', row='sex',
           height=3, aspect=1.2, ci=None)

residplot()

Purpose: Plot residuals of a regression.

Key Parameters: Same as regplot(), but:

  • Always plots residuals (y - predicted) vs x
  • Adds horizontal line at y=0
  • lowess - Fit lowess smoother to residuals

Example:

sns.residplot(data=df, x='x', y='y', lowess=True,
              scatter_kws={'alpha': 0.5})

Matrix Plots

heatmap()

Purpose: Plot rectangular data as a color-encoded matrix.

Key Parameters:

  • data - 2D array-like data
  • vmin, vmax - Anchor values for colormap
  • cmap - Colormap name or object
  • center - Value at colormap center
  • robust - Use robust quantiles for colormap range
  • annot - Annotate cells: True, False, or array
  • fmt - Format string for annotations (e.g., ".2f")
  • annot_kws - Parameters for annotations
  • linewidths - Width of cell borders
  • linecolor - Color of cell borders
  • cbar - Draw colorbar
  • cbar_kws - Colorbar parameters
  • cbar_ax - Axes for colorbar
  • square - Force square cells
  • xticklabels, yticklabels - Tick labels (True, False, int, or list)
  • mask - Boolean array to mask cells
  • ax - Matplotlib axes

Example:

# Correlation matrix
corr = df.corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
sns.heatmap(corr, mask=mask, annot=True, fmt='.2f',
            cmap='coolwarm', center=0, square=True,
            linewidths=1, cbar_kws={'shrink': 0.8})

clustermap()

Purpose: Plot a hierarchically-clustered heatmap.

Key Parameters: All parameters from heatmap(), plus:

  • pivot_kws - Parameters for pivoting (if needed)
  • method - Linkage method: "single", "complete", "average", "weighted", "centroid", "median", "ward"
  • metric - Distance metric for clustering
  • standard_scale - Standardize data: 0 (rows), 1 (columns), or None
  • z_score - Z-score normalize data: 0 (rows), 1 (columns), or None
  • row_cluster, col_cluster - Cluster rows/columns
  • row_linkage, col_linkage - Precomputed linkage matrices
  • row_colors, col_colors - Additional color annotations
  • dendrogram_ratio - Ratio of dendrogram to heatmap
  • colors_ratio - Ratio of color annotations to heatmap
  • cbar_pos - Colorbar position (tuple: x, y, width, height)
  • tree_kws - Parameters for dendrogram
  • figsize - Figure size

Example:

sns.clustermap(data, method='average', metric='euclidean',
               z_score=0, cmap='viridis',
               row_colors=row_colors, col_colors=col_colors,
               figsize=(12, 12), dendrogram_ratio=0.1)

Multi-Plot Grids

FacetGrid

Purpose: Multi-plot grid for plotting conditional relationships.

Initialization:

g = sns.FacetGrid(data, row=None, col=None, hue=None,
                  col_wrap=None, sharex=True, sharey=True,
                  height=3, aspect=1, palette=None,
                  row_order=None, col_order=None, hue_order=None,
                  hue_kws=None, dropna=False, legend_out=True,
                  despine=True, margin_titles=False,
                  xlim=None, ylim=None, subplot_kws=None,
                  gridspec_kws=None)

Methods:

  • map(func, *args, **kwargs) - Apply function to each facet
  • map_dataframe(func, *args, **kwargs) - Apply function with full DataFrame
  • set_axis_labels(x_var, y_var) - Set axis labels
  • set_titles(template, **kwargs) - Set subplot titles
  • set(kwargs) - Set attributes on all axes
  • add_legend(legend_data, title, label_order, **kwargs) - Add legend
  • savefig(*args, **kwargs) - Save figure

Example:

g = sns.FacetGrid(df, col='time', row='sex', hue='smoker',
                  height=3, aspect=1.5, margin_titles=True)
g.map(sns.scatterplot, 'total_bill', 'tip', alpha=0.7)
g.add_legend()
g.set_axis_labels('Total Bill ($)', 'Tip ($)')
g.set_titles('{col_name} | {row_name}')

PairGrid

Purpose: Grid for plotting pairwise relationships in a dataset.

Initialization:

g = sns.PairGrid(data, hue=None, vars=None,
                 x_vars=None, y_vars=None,
                 hue_order=None, palette=None,
                 hue_kws=None, corner=False,
                 diag_sharey=True, height=2.5,
                 aspect=1, layout_pad=0.5,
                 despine=True, dropna=False)

Methods:

  • map(func, **kwargs) - Apply function to all subplots
  • map_diag(func, **kwargs) - Apply to diagonal
  • map_offdiag(func, **kwargs) - Apply to off-diagonal
  • map_upper(func, **kwargs) - Apply to upper triangle
  • map_lower(func, **kwargs) - Apply to lower triangle
  • add_legend(legend_data, **kwargs) - Add legend
  • savefig(*args, **kwargs) - Save figure

Example:

g = sns.PairGrid(df, hue='species', vars=['a', 'b', 'c', 'd'],
                 corner=True, height=2.5)
g.map_upper(sns.scatterplot, alpha=0.5)
g.map_lower(sns.kdeplot)
g.map_diag(sns.histplot, kde=True)
g.add_legend()

JointGrid

Purpose: Grid for bivariate plot with marginal univariate plots.

Initialization:

g = sns.JointGrid(data=None, x=None, y=None, hue=None,
                  height=6, ratio=5, space=0.2,
                  dropna=False, xlim=None, ylim=None,
                  marginal_ticks=False, hue_order=None,
                  palette=None)

Methods:

  • plot(joint_func, marginal_func, **kwargs) - Plot both joint and marginals
  • plot_joint(func, **kwargs) - Plot joint distribution
  • plot_marginals(func, **kwargs) - Plot marginal distributions
  • refline(x, y, **kwargs) - Add reference line
  • set_axis_labels(xlabel, ylabel, **kwargs) - Set axis labels
  • savefig(*args, **kwargs) - Save figure

Example:

g = sns.JointGrid(data=df, x='x', y='y', hue='group',
                  height=6, ratio=5, space=0.2)
g.plot_joint(sns.scatterplot, alpha=0.5)
g.plot_marginals(sns.histplot, kde=True)
g.set_axis_labels('Variable X', 'Variable Y')