Classification Functions

The gcs_classification module provides geochemical phase classification using hierarchical rule-based logic with percentile thresholds.

Main Classification Function

classify_geochemical_phase(data: DataFrame, sites: List[str], flow_highres: DataFrame | None = None, qcol: str = 'Q_mLs', ccol: str = 'PLI', water_avail_col: str | None = 'scPDSI', window: int = 5, use_highres: bool = True, headex: float = 0.4, tailex: float = 0.2) → DataFrame

Classify geochemical phases using percentile-based thresholds and high-resolution Q data.

This function integrates hysteresis analysis, CVc/CVq ratios, C-Q slopes, and high-resolution flow dynamics to classify segments into 6 geochemical phases:
  • Flushing (F): Rapid mobilization during high flow

  • Loading (L): Accumulation phase

  • Chemostatic (C): Stable, low-variability behavior

  • Dilution (D): Post-flush recovery

  • Recession (R): Late-cycle decline

  • Variable (V): Mixed/ambiguous behavior

Parameters:
  • data (pd.DataFrame) – Time series data with columns site_id, date, [qcol], [ccol]

  • sites (list of str) – Site IDs to analyze

  • flow_highres (pd.DataFrame, optional) – High-resolution (hourly) flow data with a ‘time’ column and site columns

  • qcol (str) – Flow/discharge column name

  • ccol (str) – Concentration column name

  • water_avail_col (str, optional) – Water availability index column (e.g., scPDSI)

  • window (int) – Window size for CVc/CVq calculation

  • use_highres (bool) – Whether to use high-resolution flow dynamics (requires flow_highres)

  • headex, tailex (float) – Fraction of the segment length by which the segment is extended before (headex) and after (tailex) when computing window hysteresis

Returns:
pd.DataFrame of classification results with columns:
  • Segment metadata (dates, flows, concentrations)

  • Behavior classification

  • C-Q slopes

  • CVc, CVq, CVc_CVq: Variability metrics

  • Window-scale hysteresis (window_HI_*)

  • geochemical_phase: ‘F’, ‘L’, ‘C’, ‘D’, ‘R’, ‘V’

  • phase_confidence: 0.0-1.0

  • rules_triggered: Diagnostic information

  • highres_*: High-resolution flow metrics (if use_highres=True)

Main API function for time series geochemical phase classification.

Classifies monitoring data into 6 geochemical phases based on:
  • Window-scale hysteresis indices (HARP, Zuecco, Lloyd)

  • C-Q slope (power-law exponent)

  • CVc/CVq variability ratios

  • Flow dynamics (rising/falling limbs, peaks)

  • Temporal context (previous phases, trajectories)
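Of the inputs listed above, the CVc/CVq ratio is the simplest to reproduce by hand. The following is a minimal sketch, assuming the coefficient of variation is the sample standard deviation divided by the mean over the samples in one window. The helper names `coefficient_of_variation` and `cv_ratio` are hypothetical and are not part of gcs_classification, which may use a different estimator.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    mean = statistics.fmean(values)
    if len(values) < 2 or mean == 0:
        return float("nan")
    return statistics.stdev(values) / mean

def cv_ratio(conc, flow):
    """CVc/CVq: relative variability of concentration versus flow."""
    cvq = coefficient_of_variation(flow)
    if math.isnan(cvq) or cvq == 0:
        return float("nan")  # undefined when flow does not vary
    return coefficient_of_variation(conc) / cvq

# Five samples, matching the documented default window=5
conc = [1.00, 1.02, 0.98, 1.01, 0.99]   # nearly constant concentration
flow = [10.0, 25.0, 40.0, 22.0, 12.0]   # strongly varying flow
ratio = cv_ratio(conc, flow)
print(f"CVc/CVq = {ratio:.3f}")
```

A ratio well below 1, as in this example, means concentration varies much less than flow, which is the low-variability signature the Chemostatic phase is described by.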

The 6 Geochemical Phases:

F (Flushing)

Rapid mobilization with steep concentration decline during high flow. Characterized by positive C-Q slope and dilution signature.

L (Loading)

Accumulation phase with concentration rising to maximum. Characterized by negative C-Q slope and enrichment.

C (Chemostatic)

Low hysteresis, stable behavior with minimal variability. Flat C-Q slope, low CVc/CVq ratio.

D (Dilution)

Post-flush recovery with declining flow and concentration.

R (Recession)

Late cycle, low connectivity, both flow and concentration declining.

V (Variable)

Ambiguous or mixed patterns that don’t fit other categories.

Returns:
DataFrame with columns:
  • site_id: Site identifier

  • start_date, end_date: Segment temporal bounds

  • geochemical_phase: One of F, L, C, D, R, V

  • phase_confidence: 0.0-1.0 confidence score

  • window_HI_zuecco, window_HI_lloyd, window_HI_harp: Hysteresis indices

  • CVc_CVq: Variability ratio

  • cq_slope_loglog: Power-law exponent

  • Q_position, C_position: Percentile positions

  • highres_flow_phase: Rising/falling/peak/low (if high-res Q provided)

  • Plus many other diagnostic columns

Segment Classification

classify_segment_phase(row: Series, percentiles: Dict) → Tuple[str, float, List[str]]

Classify a single segment into one of 6 geochemical phases.

The classification logic combines high-resolution Q dynamics with percentile-based thresholds.

Implements hierarchical rule-based classification for 6 geochemical phases:
  • Flushing (F): Steep concentration decline during high flow, positive C-Q slope

  • Loading (L): Concentration rising to maximum, negative C-Q slope

  • Chemostatic (C): Low hysteresis, stable behavior, flat C-Q slope

  • Dilution (D): Post-flush recovery, declining flow

  • Recession (R): Late cycle, low CVc/CVq, both declining

  • Variable (V): Ambiguous/mixed patterns

Parameters:
  • row (pd.Series) – Segment data with hysteresis, flow, temporal context, and C-Q slope

  • percentiles (dict) – Percentile thresholds for classification

Returns:
Tuple of (phase, confidence, rules_triggered):
  • phase: str, one of ‘F’, ‘L’, ‘C’, ‘D’, ‘R’, ‘V’

  • confidence: float, 0.0-1.0

  • rules_triggered: list of str, diagnostic information

Classify a single segment into a geochemical phase using hierarchical rules.

Called internally by classify_geochemical_phase() for each segment. Can be used directly if you have pre-computed segment features.

Returns:
Tuple of (phase, confidence, rules_triggered):
  • phase: str, one of ‘F’, ‘L’, ‘C’, ‘D’, ‘R’, ‘V’

  • confidence: float, 0.0-1.0

  • rules_triggered: list of str with diagnostic rule names

Simple C-Q Classification

classify_cq_behavior_simple(flow_diff: float, conc_diff: float, flow_range: Tuple[float, float], conc_range: Tuple[float, float], threshold_factor: float = 0.01) → str

Simple C-Q behavior classifier based on segment changes.

Based on Williams (1989) and Evans & Davies (1998). This is a simpler classification than the main GCS phase classifier above.

Parameters:
  • flow_diff (float) – Change in flow between points

  • conc_diff (float) – Change in concentration between points

  • flow_range (tuple) – (min, max) flow values for significance testing

  • conc_range (tuple) – (min, max) concentration values for significance testing

  • threshold_factor (float) – Relative threshold for significant change

Returns:

Behavior classification:
  • ‘connectivity’: Q↑ C↑ (mobilization)

  • ‘dispersion’: Q↑ C↓ (dilution dominates)

  • ‘accumulation’: Q↓ C↑ (evaporation/point sources)

  • ‘recovery’: Q↓ C↓ (system recovery)

  • ‘quasi-chemostatic’: Q changes, C stable

  • ‘source variation’: C changes, Q stable

  • ‘static’: No significant changes

Return type:

str
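The decision table behind the seven behavior labels can be sketched as a standalone function. This is an illustration only, not the library's implementation: the name `classify_cq_sketch` is hypothetical, and it assumes a change counts as significant when it exceeds `threshold_factor` times the (max - min) range, which the documentation does not state explicitly.

```python
def classify_cq_sketch(flow_diff, conc_diff, flow_range, conc_range,
                       threshold_factor=0.01):
    """Map the signs of significant Q and C changes to a behavior label."""
    # Assumption: significance is measured relative to the observed range.
    q_sig = abs(flow_diff) > threshold_factor * (flow_range[1] - flow_range[0])
    c_sig = abs(conc_diff) > threshold_factor * (conc_range[1] - conc_range[0])

    if q_sig and c_sig:
        if flow_diff > 0:
            return "connectivity" if conc_diff > 0 else "dispersion"
        return "accumulation" if conc_diff > 0 else "recovery"
    if q_sig:
        return "quasi-chemostatic"   # Q changes, C stable
    if c_sig:
        return "source variation"    # C changes, Q stable
    return "static"

# Flow rises by 5 (range 0-100) while concentration falls by 2 (range 0-10)
label = classify_cq_sketch(5.0, -2.0, (0.0, 100.0), (0.0, 10.0))
print(label)
```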

Simple Williams (1989) style classification based on C-Q relationship.

Provides basic dilution/enrichment/chemostatic classification without temporal or hysteresis information.

Returns:

One of: ‘dilution’, ‘enrichment’, ‘chemostatic’, ‘variable’

Note

This is a simplified classifier. For comprehensive analysis, use classify_geochemical_phase() which integrates hysteresis and temporal dynamics.

Classification Logic

The classification system uses hierarchical rules with percentile-based thresholds:

  1. Percentile Calculation

    Thresholds are computed from the data distribution to be compound-agnostic:
    • Flow percentiles (33rd, 67th)

    • Concentration change percentiles (25th, 75th)

    • C-Q slope thresholds (±0.15, ±0.1)

  2. Hierarchical Rules

    Rules are checked in priority order:
    • Strong signatures (Flushing, Loading) checked first

    • Moderate signatures (Chemostatic, Dilution, Recession) next

    • Variable assigned if no clear pattern

  3. Confidence Scoring

    Based on:
    • Number of rules triggered

    • Agreement between indicators

    • Data quality (sufficient points, valid metrics)

    • Consistency with temporal context

  4. Multi-Method Integration

    Zuecco index is primary (most robust), with Lloyd and HARP as fallbacks.
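The fallback in step 4 can be pictured as picking the first valid index in a fixed order. The sketch below uses the documented column names but is otherwise an assumption: the helper `pick_hysteresis_index` is hypothetical, and the order Lloyd before HARP is inferred from the wording above rather than confirmed.

```python
import math

def pick_hysteresis_index(segment):
    """Return (column, value) for the first non-missing hysteresis index."""
    # Assumed priority: Zuecco first, then Lloyd, then HARP.
    for key in ("window_HI_zuecco", "window_HI_lloyd", "window_HI_harp"):
        value = segment.get(key)
        if value is not None and not math.isnan(value):
            return key, value
    return None, float("nan")

# Zuecco could not be computed for this segment, so Lloyd is used instead
segment = {
    "window_HI_zuecco": float("nan"),
    "window_HI_lloyd": 0.3,
    "window_HI_harp": -0.1,
}
source, hi = pick_hysteresis_index(segment)
print(source, hi)
```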

Scientific Basis

Percentile-Based Thresholds

Classification uses relative (percentile) positions rather than absolute concentrations, so the same rules adapt to different compounds and concentration ranges.
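A small sketch of why percentile thresholds are scale-independent, using the 33rd and 67th flow percentiles mentioned under Classification Logic. The helpers `flow_thresholds` and `flow_position` and the labels low/mid/high are hypothetical; the library's own Q_position column may be encoded differently.

```python
import statistics

def flow_thresholds(flows):
    """33rd and 67th percentiles of the observed flows."""
    cuts = statistics.quantiles(flows, n=100, method="inclusive")
    return cuts[32], cuts[66]   # cuts[i - 1] is the i-th percentile

def flow_position(q, low, high):
    """Label a flow value relative to the two thresholds."""
    if q <= low:
        return "low"
    if q >= high:
        return "high"
    return "mid"

flows = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
low, high = flow_thresholds(flows)
labels = [flow_position(q, low, high) for q in flows]

# The same series in different units gets different thresholds
# but identical labels.
scaled = [q * 1000 for q in flows]
s_low, s_high = flow_thresholds(scaled)
scaled_labels = [flow_position(q, s_low, s_high) for q in scaled]
print(labels == scaled_labels)
```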

C-Q Slope Integration

Power-law exponent reveals mechanistic processes:
  • b > 0: Transport-limited (flushing)

  • b < 0: Source-limited (loading)

  • b ≈ 0: Chemostatic buffering
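The exponent b comes from the power-law model C = a·Q^b, which becomes a straight line in log-log space. A minimal sketch of the fit by ordinary least squares follows; `fit_cq_slope` is a hypothetical helper, the library's own fitting procedure is not documented here, and all flow and concentration values must be positive for the logarithms to exist.

```python
import math

def fit_cq_slope(flow, conc):
    """Least-squares slope b of log(C) = log(a) + b * log(Q)."""
    x = [math.log(q) for q in flow]
    y = [math.log(c) for c in conc]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

# Synthetic data generated from C = 3 * Q^(-0.5)
flow = [1.0, 2.0, 4.0, 8.0]
conc = [3.0 * q ** -0.5 for q in flow]
b = fit_cq_slope(flow, conc)
print(f"b = {b:.3f}")
```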

Window-Scale Hysteresis

Hysteresis metrics are computed on a moving window around each segment rather than on the full time series, so they capture the temporal dynamics local to that segment.
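One plausible reading of how the window around a segment is sized uses the headex and tailex parameters of classify_geochemical_phase(). The sketch below assumes both are fractions of the segment duration; `extended_window` is a hypothetical helper, not a library function.

```python
from datetime import datetime

def extended_window(start, end, headex=0.4, tailex=0.2):
    """Extend a segment before its start and after its end.

    Assumption: headex and tailex are fractions of the segment duration.
    """
    length = end - start
    return start - headex * length, end + tailex * length

# A 10-day segment grows by 4 days at the head and 2 days at the tail
seg_start = datetime(2020, 1, 1)
seg_end = datetime(2020, 1, 11)
win_start, win_end = extended_window(seg_start, seg_end)
print(win_start.date(), win_end.date())
```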

Hierarchical Rules

Checking rules in a fixed priority order resolves segments that match more than one signature and improves classification robustness across different monitoring scenarios.
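The priority idea can be sketched as an ordered rule list in which the first match wins. Everything specific here is an assumption made for illustration: the function `classify_by_priority`, the segment keys, the rule names, and the `cv_thr` and `hi_thr` values are hypothetical; only the phase letters, the strong-then-moderate ordering, and the ±0.15 slope threshold come from the text above. The actual rules in classify_segment_phase() are more detailed and also return a confidence score.

```python
def classify_by_priority(seg, slope_thr=0.15, cv_thr=0.5, hi_thr=0.1):
    """Return (phase, rules_triggered); the first matching rule wins."""
    rules = [
        # Strong signatures are checked first
        ("F", "high_flow_positive_slope",
         lambda s: s["q_high"] and s["slope"] > slope_thr),
        ("L", "negative_slope",
         lambda s: s["slope"] < -slope_thr),
        # Moderate signatures next
        ("C", "flat_slope_low_hysteresis",
         lambda s: abs(s["slope"]) <= slope_thr
         and abs(s["hi"]) < hi_thr and s["cv_ratio"] < cv_thr),
        ("D", "post_flush_decline",
         lambda s: s["prev_phase"] == "F" and s["q_falling"]),
        ("R", "both_declining",
         lambda s: s["q_falling"] and s["c_falling"]),
    ]
    for phase, name, matches in rules:
        if matches(seg):
            return phase, [name]
    # Variable is assigned only when nothing else matched
    return "V", ["no_clear_pattern"]

demo = {"q_high": False, "slope": 0.0, "hi": 0.02, "cv_ratio": 0.2,
        "prev_phase": "R", "q_falling": False, "c_falling": False}
phase, triggered = classify_by_priority(demo)
print(phase, triggered)
```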

Usage Examples

Basic Classification:

import hygcs as gcs

# pcd: DataFrame with columns site_id, date, Q_mLs, PLI (loading not shown)
classified = gcs.classify_geochemical_phase(
    pcd,
    sites=['Site1', 'Site2'],
    ccol='PLI',
    qcol='Q_mLs',
    use_highres=False
)

print(classified[['site_id', 'geochemical_phase',
                  'phase_confidence']].head())

With High-Resolution Flow:

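import pandas as pd

# Hourly Q data improves accuracy.
# flow_highres is documented as having a 'time' column plus site columns;
# index_col=0 loads the first CSV column as the index instead.
Qx = pd.read_csv('flow_hourly.csv', index_col=0, parse_dates=True)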

classified = gcs.classify_geochemical_phase(
    pcd,
    sites=['Site1'],
    flow_highres=Qx,
    ccol='PLI',
    qcol='Q_mLs',
    use_highres=True
)

Filter by Confidence:

# Keep only high-confidence classifications
high_conf = classified[classified['phase_confidence'] > 0.8]

# Investigate low-confidence segments
low_conf = classified[classified['phase_confidence'] < 0.7]
print(low_conf[['site_id', 'geochemical_phase', 'CVc_CVq',
                'cq_slope_loglog']])

See Also