Quick Start Guide
This guide will help you get started with HyGCS for common analysis tasks.
Import the Package
import hygcs as gcs
import pandas as pd
import numpy as np
Single Event Hysteresis Analysis
For analyzing individual storm events or flushing episodes.
Load Your Data
Your data should contain:
Time column (datetime or numeric)
Discharge/flow column (Q)
Concentration column (C)
Minimum 10-15 points recommended, ideally 20-30 points covering the full event.
# Load event data
data = pd.read_csv('storm_event.csv')
# Ensure time is datetime if needed
data['datetime'] = pd.to_datetime(data['datetime'])
Calculate Hysteresis Metrics
Use the comprehensive wrapper function:
metrics = gcs.calculate_all_hysteresis_metrics(
data,
time_col='datetime',
discharge_col='Q_Ls',
concentration_col='NO3_mgL'
)
Extract Results
# HARP method results
harp = metrics['harp_metrics']
print(f"HARP Area: {harp['area']:.4f}")
print(f"Peak Q timing: {harp['peaktime_Q']:.2f} days")
print(f"Peak C timing: {harp['peaktime_C']:.2f} days")
# Zuecco method results
zuecco = metrics['zuecco_metrics']
print(f"Zuecco h-index: {zuecco['h_index']:.4f}")
print(f"Zuecco class: {zuecco['hyst_class']}")
# Lloyd/Lawler results
lloyd = metrics['lloyd_metrics']
print(f"Lloyd HInew: {lloyd['mean_HInew']:.4f}")
print(f"Direction: {metrics['classifications']['lloyd_direction']}")
Visualize Individual Methods
# Get full DataFrames for plotting
from hygcs.harp import calculate_harp_metrics, harp_plot
harp_m, harp_df = calculate_harp_metrics(data,
time_col='datetime',
discharge_col='Q_Ls',
concentration_col='NO3_mgL')
fig = harp_plot(harp_df, harp_m)
fig.show()
Time Series Classification
For long-term monitoring data with multiple sampling cycles.
Prepare Your Data
Required columns:
site_id: Site identifier (string)date: DatetimeFlow column (your choice of name)
Concentration column (your choice of name)
# Load monitoring time series
pcd = pd.read_csv('monitoring_data.csv')
pcd['date'] = pd.to_datetime(pcd['date'])
# Check data structure
print(pcd[['site_id', 'date', 'Q_mLs', 'PLI']].head())
Optional: High-Resolution Flow Data
For improved accuracy, provide hourly flow data:
# Load high-res Q data
Qx = pd.read_csv('flow_hourly.csv', index_col=0, parse_dates=True)
Run Classification
classified = gcs.classify_geochemical_phase(
pcd,
sites=['Site1', 'Site2', 'Site3'],
flow_highres=Qx, # Optional
ccol='PLI', # Your concentration column
qcol='Q_mLs', # Your flow column
use_highres=True # Set False if no Qx provided
)
Examine Results
# View classification results
print(classified[['site_id', 'start_date', 'end_date',
'geochemical_phase', 'phase_confidence']].head(10))
# Check confidence scores
print(f"Mean confidence: {classified['phase_confidence'].mean():.2f}")
print(f"Low confidence (<0.7): {(classified['phase_confidence'] < 0.7).sum()} segments")
Visualize Phase Sequence
# Create phase timeline
fig = gcs.create_phase_sequence_plot(classified, sites=['Site1', 'Site2'])
fig.show()
Create Diagnostic Plots
# CVc/CVq vs C-Q slope space
fig_diag = gcs.create_diagnostic_plot(classified)
fig_diag.show()
Create Hysteresis Plots
# Single site hysteresis loops
fig_hyst = gcs.create_hysteresis_plot(
classified,
sites=['Site1'],
ccol='PLI',
qcol='Q_mLs',
compound='PLI',
conc_unit='µg/L'
)
fig_hyst.show()
CVc/CVq Variability Analysis
Analyze chemostatic vs. chemodynamic behavior.
Compute Rolling Windows
cvc_cvq = gcs.compute_cvc_cvq_windows(
pcd,
qcol='Q_mLs',
ccol='PLI',
window=5 # Number of samples per window
)
# View results
print(cvc_cvq[['site_id', 'end_date', 'CVc_CVq', 'cq_slope_loglog']].head())
Interpret Results
CVc/CVq > 1: Chemodynamic - concentration varies more than flow
CVc/CVq < 1: Chemostatic - concentration buffered relative to flow
cq_slope_loglog > 0.15: Dilution/flushing signature
cq_slope_loglog < -0.15: Enrichment/loading signature
|cq_slope_loglog| < 0.1: Chemostatic buffering
C-Q Slope Calculation
Calculate power-law exponents for C-Q relationships.
slopes = gcs.compute_cq_slope(
pcd,
qcol='Q_mLs',
ccol='PLI'
)
# View slope results
print(slopes[['site_id', 'end_date', 'cq_slope_loglog',
'cq_intercept_loglog', 'cq_r2']].head())
Understanding the Output
Hysteresis Metrics
HARP:
- area: Loop area magnitude
- peaktime_Q, peaktime_C: Peak timings (compare for flushing vs dilution)
- residual: End-state deviation
Zuecco:
- h_index: Integrated area (positive = clockwise, negative = counter-clockwise)
- hyst_class: 0-8 classification (1-4 clockwise variants, 5-8 counter-clockwise)
Lloyd/Lawler:
- mean_HInew: Recommended index, range [-1, 1]
- mean_HIL: Original Lawler index
- Direction: clockwise, counter-clockwise, or weak
Geochemical Phases
F (Flushing): Dilution during high flow, steep C decline
L (Loading): Enrichment, C increases with Q
C (Chemostatic): Buffered, low variability
D (Dilution): Post-flush recovery
R (Recession): Late cycle, declining
V (Variable): Mixed/ambiguous
Confidence Scores
> 0.9: High confidence - robust classification
0.7-0.9: Moderate confidence
< 0.7: Low confidence - investigate further, may need more data
Common Workflows
Compare Multiple Events
events = ['event1.csv', 'event2.csv', 'event3.csv']
results = []
for event_file in events:
data = pd.read_csv(event_file)
metrics = gcs.calculate_all_hysteresis_metrics(data, ...)
results.append({
'event': event_file,
'harp_area': metrics['harp_metrics']['area'],
'zuecco_hi': metrics['zuecco_metrics']['h_index'],
'lloyd_hi': metrics['lloyd_metrics']['mean_HInew']
})
summary = pd.DataFrame(results)
print(summary)
Multi-Site Summary Statistics
stats = gcs.create_hysteresis_summary_stats(
classified,
sites=['Site1', 'Site2', 'Site3'],
ccol='PLI',
hi_method='zuecco'
)
print(stats[['site_id', 'phase_name', 'percentage',
'mean_duration_days']])
Multi-Compound Comparison
fig = gcs.create_multi_compound_hysteresis_plot(
pcd, Qx,
sites=['Site1'],
ccols=['PLI', 'Ni', 'Zn'],
compounds=['PLI', 'Ni', 'Zn'],
conc_units=['µg/L', 'µg/L', 'µg/L'],
qcol='Q_mLs',
hi_method='zuecco'
)
fig.show()
Best Practices
Compare Multiple Methods - When all three hysteresis methods agree → high confidence - When they disagree → complex pattern, investigate
Validate with Known Events - Test on reference events with known behavior - Calibrate interpretation to your system
Check Data Quality - Remove obvious outliers - Ensure Q and C are synchronized - Verify no missing values in critical periods
Consider Context - Hysteresis alone doesn’t explain mechanisms - Combine with site knowledge, geology, hydrology - Use C-Q slopes for mechanistic insights
Use Confidence Scores - Low confidence → need more data or ambiguous pattern - High confidence → classification is robust
Troubleshooting
“Not enough data points”
Need at least 5 points for hysteresis, 10+ recommended for robust analysis.
“All NaN results”
Check: - Column names match your data - Data types are numeric - No missing values in the analysis window
“Classification seems wrong”
Verify input data quality
Check time series continuity
Examine diagnostic plots
Compare multiple methods
Next Steps
Explore the Examples notebooks
Read the Core Analysis Functions reference
Check Scientific Background for method details