Classification Functions
The gcs_classification module provides geochemical phase classification
using hierarchical rule-based logic with percentile thresholds.
Main Classification Function
- classify_geochemical_phase(data: DataFrame, sites: List[str], flow_highres: DataFrame | None = None, qcol: str = 'Q_mLs', ccol: str = 'PLI', water_avail_col: str | None = 'scPDSI', window: int = 5, use_highres: bool = True, headex: float = 0.4, tailex: float = 0.2) → DataFrame
Classify geochemical phases using percentile-based thresholds and high-resolution Q data.
This function integrates hysteresis analysis, CVc/CVq ratios, C-Q slopes, and high-resolution flow dynamics to classify segments into 6 geochemical phases:
Flushing (F): Rapid mobilization during high flow
Loading (L): Accumulation phase
Chemostatic (C): Stable, low-variability behavior
Dilution (D): Post-flush recovery
Recession (R): Late-cycle decline
Variable (V): Mixed/ambiguous behavior
- Parameters:
data (pd.DataFrame) – Time series data with columns: site_id, date, [qcol], [ccol]
sites (list of str) – Site IDs to analyze
flow_highres (pd.DataFrame, optional) – High-resolution (hourly) flow data with a ‘time’ column and site columns
qcol (str) – Flow/discharge column name
ccol (str) – Concentration column name
water_avail_col (str, optional) – Water availability index column (e.g., scPDSI)
window (int) – Window size for the CVc/CVq calculation
use_highres (bool) – Whether to use high-resolution flow dynamics (requires flow_highres)
headex, tailex (float) – Fraction of the segment length (e.g., 0.4 = 40%) by which the segment is extended before/after for window hysteresis
- Returns:
pd.DataFrame of classification results with columns:
Segment metadata (dates, flows, concentrations)
Behavior classification
C-Q slopes
CVc, CVq, CVc_CVq: Variability metrics
Window-scale hysteresis (window_HI_*)
geochemical_phase: ‘F’, ‘L’, ‘C’, ‘D’, ‘R’, ‘V’
phase_confidence: 0.0-1.0
rules_triggered: Diagnostic information
highres_*: High-resolution flow metrics (if use_highres=True)
Main API function for time series geochemical phase classification.
- Classifies monitoring data into 6 geochemical phases based on:
Window-scale hysteresis indices (HARP, Zuecco, Lloyd)
C-Q slope (power-law exponent)
CVc/CVq variability ratios
Flow dynamics (rising/falling limbs, peaks)
Temporal context (previous phases, trajectories)
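The CVc/CVq ratio in the list above compares the relative variability of concentration with that of flow. The following is a minimal sketch, assuming the coefficient of variation is the sample standard deviation divided by the mean over a rolling window of `window` samples; the module's exact definition is not stated here, and `rolling_cv_ratio` is a hypothetical helper, not part of gcs_classification.

```python
import pandas as pd

def rolling_cv_ratio(df: pd.DataFrame, qcol: str = "Q_mLs",
                     ccol: str = "PLI", window: int = 5) -> pd.Series:
    """Rolling CVc/CVq: (std/mean of concentration) / (std/mean of flow).

    Assumption: CV = sample std / mean over `window` consecutive rows.
    """
    roll_c = df[ccol].rolling(window, min_periods=window)
    roll_q = df[qcol].rolling(window, min_periods=window)
    cv_c = roll_c.std() / roll_c.mean()
    cv_q = roll_q.std() / roll_q.mean()
    return cv_c / cv_q

# Toy data: concentration is nearly constant while flow varies strongly,
# so the ratio should be well below 1 (chemostatic-like behavior).
toy = pd.DataFrame({
    "Q_mLs": [10.0, 20.0, 40.0, 20.0, 10.0, 15.0],
    "PLI": [1.00, 1.02, 0.98, 1.01, 0.99, 1.00],
})
ratio = rolling_cv_ratio(toy)
```

The first `window - 1` values are NaN because the window is not yet full. A ratio well below 1 indicates that concentration varies much less than flow.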
The 6 Geochemical Phases:
- F (Flushing)
Rapid mobilization with steep concentration decline during high flow. Characterized by positive C-Q slope and dilution signature.
- L (Loading)
Accumulation phase with concentration rising to maximum. Characterized by negative C-Q slope and enrichment.
- C (Chemostatic)
Low hysteresis, stable behavior with minimal variability. Flat C-Q slope, low CVc/CVq ratio.
- D (Dilution)
Post-flush recovery with declining flow and concentration.
- R (Recession)
Late cycle, low connectivity, both flow and concentration declining.
- V (Variable)
Ambiguous or mixed patterns that don’t fit other categories.
- Returns:
- DataFrame with columns:
site_id: Site identifier
start_date, end_date: Segment temporal bounds
geochemical_phase: One of F, L, C, D, R, V
phase_confidence: Confidence score, 0.0-1.0
window_HI_zuecco, window_HI_lloyd, window_HI_harp: Hysteresis indices
CVc_CVq: Variability ratio
cq_slope_loglog: Power-law exponent
Q_position, C_position: Percentile positions
highres_flow_phase: Rising/falling/peak/low (if high-res Q provided)
Additional diagnostic columns
Segment Classification
- classify_segment_phase(row: Series, percentiles: Dict) → Tuple[str, float, List[str]]
Classify a single segment into one of 6 geochemical phases.
Classification logic with high-resolution Q dynamics using percentile-based thresholds.
Implements hierarchical rule-based classification for 6 geochemical phases:
Flushing (F): Steep concentration decline during high flow, positive C-Q slope
Loading (L): Concentration rising to maximum, negative C-Q slope
Chemostatic (C): Low hysteresis, stable behavior, flat C-Q slope
Dilution (D): Post-flush recovery, declining flow
Recession (R): Late cycle, low CVc/CVq, both declining
Variable (V): Ambiguous/mixed patterns
- Parameters:
row (pd.Series) – Segment data with hysteresis, flow, temporal context, and C-Q slope
percentiles (dict) – Percentile thresholds for classification
- Returns:
tuple of (phase, confidence, rules_triggered):
phase: str, one of ‘F’, ‘L’, ‘C’, ‘D’, ‘R’, ‘V’
confidence: float, 0.0-1.0
rules_triggered: list of str, diagnostic information
Classify a single segment into a geochemical phase using hierarchical rules.
Called internally by classify_geochemical_phase() for each segment. Can be used directly if you have pre-computed segment features.
- Returns:
- Tuple of (phase, confidence, rules_triggered):
phase: str, one of ‘F’, ‘L’, ‘C’, ‘D’, ‘R’, ‘V’
confidence: float, 0.0-1.0
rules_triggered: list of str with diagnostic rule names
Simple C-Q Classification
- classify_cq_behavior_simple(flow_diff: float, conc_diff: float, flow_range: Tuple[float, float], conc_range: Tuple[float, float], threshold_factor: float = 0.01) → str
Simple C-Q behavior classifier based on segment changes.
Based on Williams (1989) and Evans & Davies (1998). This is a simpler classification than the main GCS phase classifier above.
- Parameters:
flow_diff (float) – Change in flow between points
conc_diff (float) – Change in concentration between points
flow_range (tuple) – (min, max) flow values for significance testing
conc_range (tuple) – (min, max) concentration values for significance testing
threshold_factor (float) – Relative threshold for significant change
- Returns:
Behavior classification:
‘connectivity’: Q↑ C↑ (mobilization)
‘dispersion’: Q↑ C↓ (dilution dominates)
‘accumulation’: Q↓ C↑ (evaporation/point sources)
‘recovery’: Q↓ C↓ (system recovery)
‘quasi-chemostatic’: Q changes, C stable
‘source variation’: C changes, Q stable
‘static’: No significant changes
- Return type:
str
Simple Williams (1989) style classification based on C-Q relationship.
Provides basic dilution/enrichment/chemostatic classification without temporal or hysteresis information.
- Returns:
One of: ‘dilution’, ‘enrichment’, ‘chemostatic’, ‘variable’
Note
This is a simplified classifier. For comprehensive analysis, use classify_geochemical_phase(), which integrates hysteresis and temporal dynamics.
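The seven behavior labels documented for classify_cq_behavior_simple() follow from the signs of the flow and concentration changes. The sketch below illustrates that mapping. It is not the library's implementation: `classify_cq_sketch` is a hypothetical name, and it assumes a change counts as significant when its magnitude exceeds `threshold_factor` times the width of the corresponding range.

```python
from typing import Tuple

def classify_cq_sketch(flow_diff: float, conc_diff: float,
                       flow_range: Tuple[float, float],
                       conc_range: Tuple[float, float],
                       threshold_factor: float = 0.01) -> str:
    """Map the signs of flow and concentration changes to a behavior label."""
    # Assumption: significance threshold = threshold_factor * (max - min).
    q_thr = threshold_factor * (flow_range[1] - flow_range[0])
    c_thr = threshold_factor * (conc_range[1] - conc_range[0])
    q_sig = abs(flow_diff) > q_thr
    c_sig = abs(conc_diff) > c_thr

    if not q_sig and not c_sig:
        return "static"
    if q_sig and not c_sig:
        return "quasi-chemostatic"   # Q changes, C stable
    if c_sig and not q_sig:
        return "source variation"    # C changes, Q stable
    if flow_diff > 0:
        return "connectivity" if conc_diff > 0 else "dispersion"
    return "accumulation" if conc_diff > 0 else "recovery"

# Rising flow with falling concentration: dilution dominates.
label = classify_cq_sketch(5.0, -2.0, (0.0, 100.0), (0.0, 10.0))
```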
Classification Logic
The classification system uses hierarchical rules with percentile-based thresholds:
Percentile Calculation
- Thresholds are computed from the data distribution to be compound-agnostic:
Flow percentiles (33rd, 67th)
Concentration change percentiles (25th, 75th)
C-Q slope thresholds (±0.15, ±0.1)
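As an illustration of the data-driven thresholds listed above, the sketch below computes the flow and concentration-change percentiles with NumPy. The function name and dictionary keys are hypothetical; the structure of the `percentiles` dict actually expected by classify_segment_phase() is not documented here.

```python
import numpy as np

def compute_thresholds_sketch(flow, conc_change) -> dict:
    """Derive relative thresholds from the observed distributions."""
    flow = np.asarray(flow, dtype=float)
    conc_change = np.asarray(conc_change, dtype=float)
    # nanpercentile ignores missing observations.
    q33, q67 = np.nanpercentile(flow, [33, 67])
    dc25, dc75 = np.nanpercentile(conc_change, [25, 75])
    # Key names are illustrative only.
    return {"Q_p33": q33, "Q_p67": q67, "dC_p25": dc25, "dC_p75": dc75}

thresholds = compute_thresholds_sketch(
    flow=np.arange(1, 101),
    conc_change=np.linspace(-1.0, 1.0, 101),
)
```

Because the thresholds come from each dataset's own distribution, the same rules can be applied to compounds with very different concentration ranges.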
Hierarchical Rules
- Rules are checked in priority order:
Strong signatures (Flushing, Loading) checked first
Moderate signatures (Chemostatic, Dilution, Recession) next
Variable assigned if no clear pattern
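The priority ordering can be pictured as a list of rules evaluated top to bottom, where the first match wins and Variable is the default. The sketch below shows only that control flow. The predicates are placeholders: they borrow the ±0.15 and ±0.1 slope values from the threshold list, but their roles here, the 0.5 CVc/CVq cutoff, and the rule names are assumptions, and the real rules also use hysteresis and flow dynamics.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, str, Callable[[Dict[str, float]], bool]]

# Placeholder rules in priority order: strong signatures first.
RULES: List[Rule] = [
    ("F", "strong_positive_slope", lambda s: s["cq_slope_loglog"] > 0.15),
    ("L", "strong_negative_slope", lambda s: s["cq_slope_loglog"] < -0.15),
    ("C", "flat_slope_low_cv_ratio",
     lambda s: abs(s["cq_slope_loglog"]) < 0.1 and s["CVc_CVq"] < 0.5),
]

def first_matching_phase(segment: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return the phase of the first rule that matches, else Variable."""
    for phase, rule_name, predicate in RULES:
        if predicate(segment):
            return phase, [rule_name]
    return "V", ["no_clear_pattern"]

phase, rules_triggered = first_matching_phase(
    {"cq_slope_loglog": 0.05, "CVc_CVq": 0.2}
)
```

Checking strong signatures first means a segment that satisfies both a strong and a moderate rule is assigned the strong phase, which avoids ambiguous double matches.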
Confidence Scoring
- Based on:
Number of rules triggered
Agreement between indicators
Data quality (sufficient points, valid metrics)
Consistency with temporal context
Multi-Method Integration
Zuecco index is primary (most robust), with Lloyd and HARP as fallbacks.
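One way to express this primary-then-fallback ordering on the result table is to take, for each segment, the first non-missing value among the three window_HI_* columns. This is a sketch under that assumption; `select_hysteresis_index` is a hypothetical helper, and the library may combine the indices differently (the three indices are not necessarily on the same scale).

```python
import numpy as np
import pandas as pd

# Priority order: Zuecco first, then Lloyd, then HARP.
HI_PRIORITY = ["window_HI_zuecco", "window_HI_lloyd", "window_HI_harp"]

def select_hysteresis_index(df: pd.DataFrame) -> pd.Series:
    """Per row, return the first non-missing index in priority order."""
    # Back-filling across columns moves the first available value
    # into the leftmost (highest-priority) column.
    return df[HI_PRIORITY].bfill(axis=1).iloc[:, 0]

segments = pd.DataFrame({
    "window_HI_zuecco": [0.3, np.nan, np.nan, np.nan],
    "window_HI_lloyd": [0.1, -0.4, np.nan, np.nan],
    "window_HI_harp": [0.2, 0.2, 0.5, np.nan],
})
selected = select_hysteresis_index(segments)
```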
Scientific Basis
Percentile-Based Thresholds
Compound-agnostic classification using relative positions rather than absolute concentrations. Adapts to different compounds and concentration ranges.
C-Q Slope Integration
- Power-law exponent reveals mechanistic processes:
b > 0: Transport-limited (flushing)
b < 0: Source-limited (loading)
b ≈ 0: Chemostatic buffering
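The exponent b comes from the power-law model C = a·Q^b, which is linear in log-log space: log C = log a + b·log Q. A minimal sketch of estimating b by least squares follows. The function name is hypothetical, and the library's own fitting procedure (for example its handling of zeros or outliers) may differ.

```python
import numpy as np

def fit_cq_slope(q, c) -> float:
    """Estimate b in C = a * Q**b via a linear fit in log-log space."""
    q = np.asarray(q, dtype=float)
    c = np.asarray(c, dtype=float)
    # Logarithms require strictly positive, finite values.
    valid = np.isfinite(q) & np.isfinite(c) & (q > 0) & (c > 0)
    if valid.sum() < 2:
        return float("nan")
    b, _log_a = np.polyfit(np.log10(q[valid]), np.log10(c[valid]), 1)
    return float(b)

q = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
b_pos = fit_cq_slope(q, 3.0 * q ** 0.5)     # concentration rises with flow
b_neg = fit_cq_slope(q, 2.0 * q ** -0.3)    # concentration falls with flow
b_flat = fit_cq_slope(q, np.full_like(q, 1.5))  # chemostatic
```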
Window-Scale Hysteresis
Captures temporal dynamics correctly by computing hysteresis metrics on moving windows around each segment, not on full time series.
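The window around a segment is controlled by the headex and tailex parameters of classify_geochemical_phase(). The sketch below assumes they are fractions of the segment duration added before the start and after the end respectively; `extended_window` is a hypothetical helper used only to illustrate that reading.

```python
from datetime import datetime
from typing import Tuple

def extended_window(start: datetime, end: datetime,
                    headex: float = 0.4,
                    tailex: float = 0.2) -> Tuple[datetime, datetime]:
    """Extend a segment by fractions of its own length on each side."""
    length = end - start  # timedelta
    return start - length * headex, end + length * tailex

# A 10-day segment is extended by 4 days before and 2 days after.
win_start, win_end = extended_window(datetime(2024, 1, 1), datetime(2024, 1, 11))
```

Hysteresis metrics would then be computed on the observations falling between `win_start` and `win_end` rather than on the full record.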
Hierarchical Rules
Prioritizes phase detection to avoid ambiguity and improve classification robustness across different monitoring scenarios.
Usage Examples
Basic Classification:
import hygcs as gcs
# pcd: DataFrame with site_id, date, Q_mLs and PLI columns
classified = gcs.classify_geochemical_phase(
    pcd,
    sites=['Site1', 'Site2'],
    ccol='PLI',
    qcol='Q_mLs',
    use_highres=False
)
print(classified[['site_id', 'geochemical_phase',
                  'phase_confidence']].head())
With High-Resolution Flow:
import pandas as pd
# Hourly Q data improves accuracy
Qx = pd.read_csv('flow_hourly.csv', index_col=0, parse_dates=True)
classified = gcs.classify_geochemical_phase(
    pcd,
    sites=['Site1'],
    flow_highres=Qx,
    ccol='PLI',
    qcol='Q_mLs',
    use_highres=True
)
Filter by Confidence:
# Keep only high-confidence classifications
high_conf = classified[classified['phase_confidence'] > 0.8]
# Investigate low-confidence segments
low_conf = classified[classified['phase_confidence'] < 0.7]
print(low_conf[['site_id', 'geochemical_phase', 'CVc_CVq',
                'cq_slope_loglog']])
See Also
Core Analysis Functions - Core analysis functions
Visualization Functions - Visualization of classification results
Quick Start Guide - Quick start guide with examples
Scientific Background - Detailed methodology