Quick Start Guide

This guide will help you get started with HyGCS for common analysis tasks.

Import the Package

import hygcs as gcs
import pandas as pd
import numpy as np

Single Event Hysteresis Analysis

For analyzing individual storm events or flushing episodes.

Load Your Data

Your data should contain:

Time column (datetime or numeric)
Discharge/flow column (Q)
Concentration column (C)

Minimum 10-15 points recommended, ideally 20-30 points covering the full event.

# Load event data
data = pd.read_csv('storm_event.csv')

# Ensure time is datetime if needed
data['datetime'] = pd.to_datetime(data['datetime'])

Calculate Hysteresis Metrics

Use the comprehensive wrapper function:

metrics = gcs.calculate_all_hysteresis_metrics(
    data,
    time_col='datetime',
    discharge_col='Q_Ls',
    concentration_col='NO3_mgL'
)

Extract Results

# HARP method results
harp = metrics['harp_metrics']
print(f"HARP Area: {harp['area']:.4f}")
print(f"Peak Q timing: {harp['peaktime_Q']:.2f} days")
print(f"Peak C timing: {harp['peaktime_C']:.2f} days")

# Zuecco method results
zuecco = metrics['zuecco_metrics']
print(f"Zuecco h-index: {zuecco['h_index']:.4f}")
print(f"Zuecco class: {zuecco['hyst_class']}")

# Lloyd/Lawler results
lloyd = metrics['lloyd_metrics']
print(f"Lloyd HInew: {lloyd['mean_HInew']:.4f}")
print(f"Direction: {metrics['classifications']['lloyd_direction']}")

Visualize Individual Methods

# Get full DataFrames for plotting
from hygcs.harp import calculate_harp_metrics, harp_plot
harp_m, harp_df = calculate_harp_metrics(data,
                                          time_col='datetime',
                                          discharge_col='Q_Ls',
                                          concentration_col='NO3_mgL')

fig = harp_plot(harp_df, harp_m)
fig.show()

Time Series Classification

For long-term monitoring data with multiple sampling cycles.

Prepare Your Data

Required columns:

site_id: Site identifier (string)
date: Datetime
Flow column (your choice of name)
Concentration column (your choice of name)

# Load monitoring time series
pcd = pd.read_csv('monitoring_data.csv')
pcd['date'] = pd.to_datetime(pcd['date'])

# Check data structure
print(pcd[['site_id', 'date', 'Q_mLs', 'PLI']].head())

Optional: High-Resolution Flow Data

For improved accuracy, provide hourly flow data:

# Load high-res Q data
Qx = pd.read_csv('flow_hourly.csv', index_col=0, parse_dates=True)

Run Classification

classified = gcs.classify_geochemical_phase(
    pcd,
    sites=['Site1', 'Site2', 'Site3'],
    flow_highres=Qx,      # Optional
    ccol='PLI',           # Your concentration column
    qcol='Q_mLs',         # Your flow column
    use_highres=True      # Set False if no Qx provided
)

Examine Results

# View classification results
print(classified[['site_id', 'start_date', 'end_date',
                  'geochemical_phase', 'phase_confidence']].head(10))

# Check confidence scores
print(f"Mean confidence: {classified['phase_confidence'].mean():.2f}")
print(f"Low confidence (<0.7): {(classified['phase_confidence'] < 0.7).sum()} segments")

Visualize Phase Sequence

# Create phase timeline
fig = gcs.create_phase_sequence_plot(classified, sites=['Site1', 'Site2'])
fig.show()

Create Diagnostic Plots

# CVc/CVq vs C-Q slope space
fig_diag = gcs.create_diagnostic_plot(classified)
fig_diag.show()

Create Hysteresis Plots

# Single site hysteresis loops
fig_hyst = gcs.create_hysteresis_plot(
    classified,
    sites=['Site1'],
    ccol='PLI',
    qcol='Q_mLs',
    compound='PLI',
    conc_unit='µg/L'
)
fig_hyst.show()

CVc/CVq Variability Analysis

Analyze chemostatic vs. chemodynamic behavior.

Compute Rolling Windows

cvc_cvq = gcs.compute_cvc_cvq_windows(
    pcd,
    qcol='Q_mLs',
    ccol='PLI',
    window=5  # Number of samples per window
)

# View results
print(cvc_cvq[['site_id', 'end_date', 'CVc_CVq', 'cq_slope_loglog']].head())

Interpret Results

CVc/CVq > 1: Chemodynamic - concentration varies more than flow
CVc/CVq < 1: Chemostatic - concentration buffered relative to flow
cq_slope_loglog > 0.15: Dilution/flushing signature
cq_slope_loglog < -0.15: Enrichment/loading signature
|cq_slope_loglog| < 0.1: Chemostatic buffering

C-Q Slope Calculation

Calculate power-law exponents for C-Q relationships.

slopes = gcs.compute_cq_slope(
    pcd,
    qcol='Q_mLs',
    ccol='PLI'
)

# View slope results
print(slopes[['site_id', 'end_date', 'cq_slope_loglog',
              'cq_intercept_loglog', 'cq_r2']].head())

Understanding the Output

Hysteresis Metrics

HARP: - area: Loop area magnitude - peaktime_Q, peaktime_C: Peak timings (compare for flushing vs dilution) - residual: End-state deviation

Zuecco: - h_index: Integrated area (positive = clockwise, negative = counter-clockwise) - hyst_class: 0-8 classification (1-4 clockwise variants, 5-8 counter-clockwise)

Lloyd/Lawler: - mean_HInew: Recommended index, range [-1, 1] - mean_HIL: Original Lawler index - Direction: clockwise, counter-clockwise, or weak

Geochemical Phases

F (Flushing): Dilution during high flow, steep C decline
L (Loading): Enrichment, C increases with Q
C (Chemostatic): Buffered, low variability
D (Dilution): Post-flush recovery
R (Recession): Late cycle, declining
V (Variable): Mixed/ambiguous

Confidence Scores

> 0.9: High confidence - robust classification
0.7-0.9: Moderate confidence
< 0.7: Low confidence - investigate further, may need more data

Common Workflows

Compare Multiple Events

events = ['event1.csv', 'event2.csv', 'event3.csv']
results = []

for event_file in events:
    data = pd.read_csv(event_file)
    metrics = gcs.calculate_all_hysteresis_metrics(data, ...)
    results.append({
        'event': event_file,
        'harp_area': metrics['harp_metrics']['area'],
        'zuecco_hi': metrics['zuecco_metrics']['h_index'],
        'lloyd_hi': metrics['lloyd_metrics']['mean_HInew']
    })

summary = pd.DataFrame(results)
print(summary)

Multi-Site Summary Statistics

stats = gcs.create_hysteresis_summary_stats(
    classified,
    sites=['Site1', 'Site2', 'Site3'],
    ccol='PLI',
    hi_method='zuecco'
)

print(stats[['site_id', 'phase_name', 'percentage',
             'mean_duration_days']])

Multi-Compound Comparison

fig = gcs.create_multi_compound_hysteresis_plot(
    pcd, Qx,
    sites=['Site1'],
    ccols=['PLI', 'Ni', 'Zn'],
    compounds=['PLI', 'Ni', 'Zn'],
    conc_units=['µg/L', 'µg/L', 'µg/L'],
    qcol='Q_mLs',
    hi_method='zuecco'
)
fig.show()

Best Practices

Compare Multiple Methods - When all three hysteresis methods agree → high confidence - When they disagree → complex pattern, investigate
Validate with Known Events - Test on reference events with known behavior - Calibrate interpretation to your system
Check Data Quality - Remove obvious outliers - Ensure Q and C are synchronized - Verify no missing values in critical periods
Consider Context - Hysteresis alone doesn’t explain mechanisms - Combine with site knowledge, geology, hydrology - Use C-Q slopes for mechanistic insights
Use Confidence Scores - Low confidence → need more data or ambiguous pattern - High confidence → classification is robust

Troubleshooting

“Not enough data points”

Need at least 5 points for hysteresis, 10+ recommended for robust analysis.

“All NaN results”

Check: - Column names match your data - Data types are numeric - No missing values in the analysis window

“Classification seems wrong”

Verify input data quality
Check time series continuity
Examine diagnostic plots
Compare multiple methods

Next Steps

Explore the Examples notebooks
Read the Core Analysis Functions reference
Check Scientific Background for method details