Data Augmentation
Overview
SeisPolarity provides a flexible data augmentation system with multiple techniques to improve model robustness and handle imbalanced datasets.
Basic Usage
Using GenericGenerator
```python
from seispolarity import WaveformDataset, GenericGenerator
from seispolarity import Demean, Normalize, RandomTimeShift

# Load dataset
dataset = WaveformDataset(path="data.hdf5", name="SCSN", preload=False)

# Create generator with augmentations
generator = GenericGenerator(dataset)
generator.add_augmentations([
    Demean(),
    Normalize(amp_norm_type="peak"),
    RandomTimeShift(max_shift=10)
])

# Get dataloader
loader = generator.get_dataloader(batch_size=256, num_workers=4)
```
Using BalancedPolarityGenerator
For imbalanced datasets with polarity labels:
```python
from seispolarity import BalancedPolarityGenerator
from seispolarity import Demean, Normalize

generator = BalancedPolarityGenerator(
    dataset,
    strategy="polarity_inversion"  # or "min_based"
)
generator.add_augmentations([
    Demean(),
    Normalize()
])
```
Available Augmentation Methods
1. Demean
Remove the mean from waveforms.
```python
from seispolarity import Demean

augmentation = Demean()
```
Parameters: None
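Conceptually, mean removal subtracts each channel's average so the trace is centred on zero. The sketch below is a plain NumPy illustration, not SeisPolarity's implementation; it assumes a waveform is an array shaped `(channels, samples)`, and `demean` is a hypothetical helper name.

```python
import numpy as np

def demean(waveform: np.ndarray) -> np.ndarray:
    """Subtract each channel's mean along the time (last) axis."""
    return waveform - waveform.mean(axis=-1, keepdims=True)

x = np.array([[1.0, 2.0, 3.0],
              [10.0, 10.0, 13.0]])
print(demean(x))  # rows become [-1, 0, 1] and [-1, -1, 2]
```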
2. Normalize
Normalize waveforms by amplitude.
```python
from seispolarity import Normalize

# Normalize by peak amplitude
augmentation = Normalize(amp_norm_type="peak")

# Normalize by RMS
augmentation = Normalize(amp_norm_type="rms")

# Normalize by maximum absolute value
augmentation = Normalize(amp_norm_type="max")
```
Parameters:
amp_norm_type: Normalization type (“peak”, “rms”, “max”)
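The `"peak"` and `"rms"` options can be sketched in NumPy as below. This is an illustration under assumptions, not the library's code: it scales each channel independently (SeisPolarity may use one scale for all channels), `normalize` is a hypothetical name, `"max"` is not covered, and the small `eps` guard against all-zero traces is an addition of this sketch.

```python
import numpy as np

def normalize(waveform: np.ndarray, amp_norm_type: str = "peak", eps: float = 1e-10) -> np.ndarray:
    """Divide each channel by its peak absolute amplitude or by its RMS amplitude."""
    if amp_norm_type == "peak":
        scale = np.max(np.abs(waveform), axis=-1, keepdims=True)
    elif amp_norm_type == "rms":
        scale = np.sqrt(np.mean(waveform ** 2, axis=-1, keepdims=True))
    else:
        raise ValueError(f"unsupported amp_norm_type: {amp_norm_type!r}")
    # eps avoids division by zero for silent (all-zero) channels
    return waveform / (scale + eps)

x = np.array([[0.0, -4.0, 2.0]])
print(normalize(x, "peak"))  # approximately [[0, -1, 0.5]]
```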
3. RandomTimeShift
Randomly shift waveforms in time.
```python
from seispolarity import RandomTimeShift

# Shift by up to 10 samples
augmentation = RandomTimeShift(max_shift=10)
```
Parameters:
max_shift: Maximum number of samples to shift (default: 10)
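One way to implement a random shift of at most `max_shift` samples is sketched below. It assumes the lag is drawn uniformly from `-max_shift` to `+max_shift` and that vacated samples are zero-filled; SeisPolarity might instead roll the trace circularly or crop from a longer window. `shift_with_zero_fill` and `random_time_shift` are hypothetical names.

```python
import numpy as np

def shift_with_zero_fill(waveform: np.ndarray, shift: int) -> np.ndarray:
    """Delay (shift > 0) or advance (shift < 0) all channels, zero-filling the gap."""
    out = np.zeros_like(waveform)
    if shift > 0:
        out[..., shift:] = waveform[..., :-shift]
    elif shift < 0:
        out[..., :shift] = waveform[..., -shift:]
    else:
        out[...] = waveform
    return out

def random_time_shift(waveform: np.ndarray, max_shift: int = 10, rng=None):
    """Apply one random integer lag drawn uniformly from [-max_shift, max_shift]."""
    rng = np.random.default_rng() if rng is None else rng
    shift = int(rng.integers(-max_shift, max_shift + 1))  # upper bound is exclusive
    return shift_with_zero_fill(waveform, shift), shift

x = np.array([[1.0, 2.0, 3.0, 4.0]])
print(shift_with_zero_fill(x, 1))  # [[0. 1. 2. 3.]]
```

Zero-filling keeps the original samples in order and avoids wrapping the end of the trace around to its start, which a circular roll would do.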
4. RandomPPickShift
Randomly shift the P-phase pick position.
```python
from seispolarity import RandomPPickShift

# Shift P-pick by up to 5 samples
augmentation = RandomPPickShift(max_shift=5)
```
Parameters:
max_shift: Maximum number of samples to shift (default: 5)
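A plausible reading of this augmentation, assuming each training window is cut from a longer trace around a known P-pick sample index: perturbing the pick before cutting means the true onset no longer sits exactly at the window centre. `crop_around_pick` and its `window` length are hypothetical; this page does not say how SeisPolarity cuts windows.

```python
import numpy as np

def crop_around_pick(trace, p_pick, window=400, max_shift=5, rng=None):
    """Cut `window` samples centred on a randomly perturbed P pick.

    Returns the cropped window and the index of the true pick inside it,
    which lands within max_shift samples of the window centre.
    """
    rng = np.random.default_rng() if rng is None else rng
    offset = int(rng.integers(-max_shift, max_shift + 1))
    start = p_pick + offset - window // 2
    if start < 0 or start + window > trace.shape[-1]:
        raise ValueError("shifted window does not fit inside the trace")
    return trace[..., start:start + window], window // 2 - offset

trace = np.arange(1000.0)
segment, onset = crop_around_pick(trace, p_pick=500, rng=np.random.default_rng(1))
```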
5. BandpassFilter
Apply a bandpass filter to waveforms.
```python
from seispolarity import BandpassFilter

# Apply 1-20 Hz bandpass filter
augmentation = BandpassFilter(freqmin=1.0, freqmax=20.0)
```
Parameters:
freqmin: Minimum frequency (Hz)
freqmax: Maximum frequency (Hz)
corners: Filter corners (default: 4)
zerophase: Whether to use zero-phase filtering (default: True)
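These parameters map naturally onto a Butterworth band-pass, sketched below with SciPy. The filter family and treating `corners` as the filter order (as ObsPy does) are assumptions, not a description of SeisPolarity's internals; the sampling rate `fs` is passed explicitly because this page does not say where the library takes it from.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def bandpass(waveform, fs, freqmin=1.0, freqmax=20.0, corners=4, zerophase=True):
    """Butterworth band-pass along the last (time) axis; fs is the sampling rate in Hz."""
    sos = butter(corners, [freqmin, freqmax], btype="bandpass", fs=fs, output="sos")
    if zerophase:
        # Forward-backward pass: the phase delays cancel, so the onset is not shifted in time.
        return sosfiltfilt(sos, waveform, axis=-1)
    # Single causal pass: introduces a frequency-dependent delay.
    return sosfilt(sos, waveform, axis=-1)

noise = np.random.default_rng(0).standard_normal((1, 1000))
filtered = bandpass(noise, fs=100.0)
```

Zero-phase filtering avoids delaying the onset, but it is acausal and can place small ringing ahead of a sharp arrival, which is worth keeping in mind for first-motion labels.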
6. Detrend
Remove linear trend from waveforms.
```python
from seispolarity import Detrend

augmentation = Detrend()
```
Parameters:
type: Detrend type (“linear” or “constant”)
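Both modes can be sketched in NumPy as follows, assuming `(channels, samples)` arrays. `detrend` is a hypothetical helper, its `kind` argument plays the role of the `type` parameter, and a least-squares fit is assumed for the linear mode; SciPy's `scipy.signal.detrend` offers the same two modes.

```python
import numpy as np

def detrend(waveform: np.ndarray, kind: str = "linear") -> np.ndarray:
    """Remove the mean ("constant") or a least-squares line ("linear") per channel."""
    if kind == "constant":
        return waveform - waveform.mean(axis=-1, keepdims=True)
    if kind != "linear":
        raise ValueError(f"unknown detrend kind: {kind!r}")
    n = waveform.shape[-1]
    t = np.arange(n)
    flat = waveform.reshape(-1, n)  # (channels, samples)
    # np.polyfit fits each column of a 2-D y separately, so pass samples x channels.
    slope, intercept = np.polyfit(t, flat.T, deg=1)
    trend = np.outer(slope, t) + intercept[:, None]
    return (flat - trend).reshape(waveform.shape)

t = np.arange(50.0)
ramps = np.vstack([2.0 * t + 5.0, -t + 1.0])
print(np.abs(detrend(ramps)).max())  # essentially zero: pure ramps are removed entirely
```

Because the fitted line includes an intercept, linear detrending in this sketch already leaves each channel with zero mean, so a `Demean()` placed right after `Detrend(type="linear")` changes the data only by round-off.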
7. PolarityInversion
Randomly invert the polarity of waveforms.
```python
from seispolarity import PolarityInversion

# 50% probability of inversion
augmentation = PolarityInversion(p=0.5)
```
Parameters:
p: Probability of polarity inversion (default: 0.5)
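Whether `PolarityInversion` also relabels the sample is not stated on this page. For a polarity-classification target it has to, since a sign-flipped Up onset is a Down onset. The sketch below shows that pairing; the integer label encoding and the `polarity_inversion` function are hypothetical.

```python
import numpy as np

# Hypothetical integer label encoding, chosen for this sketch only.
UP, DOWN, UNKNOWN = 0, 1, 2

def polarity_inversion(waveform, label, p=0.5, rng=None):
    """With probability p, negate the waveform and swap Up <-> Down; Unknown is unchanged."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < p:
        waveform = -waveform
        label = {UP: DOWN, DOWN: UP}.get(label, label)
    return waveform, label

x = np.array([[0.0, 1.0, -2.0]])
flipped, new_label = polarity_inversion(x, UP, p=1.0)  # negated waveform, label DOWN
```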
8. DifferentialFeatures
Compute differential features from waveforms.
```python
from seispolarity import DifferentialFeatures

augmentation = DifferentialFeatures()
```
Parameters: None
9. ChangeDtype
Change the data type of waveforms.
```python
from seispolarity import ChangeDtype

# Convert to float32
augmentation = ChangeDtype(dtype="float32")
```
Parameters:
dtype: Target data type (“float32”, “float64”, etc.)
10. Stretching
Randomly stretch or compress waveforms.
```python
from seispolarity import Stretching

# Stretch by up to 10%
augmentation = Stretching(max_stretch=0.1)
```
Parameters:
max_stretch: Maximum stretch factor (default: 0.1)
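A time stretch can be sketched by resampling with linear interpolation. The choices here (stretching about the window centre so a centred P onset stays put, keeping the window length fixed, and padding with edge values) are assumptions of this sketch rather than documented behaviour; `random_stretch` is a hypothetical name.

```python
import numpy as np

def random_stretch(waveform, max_stretch=0.1, rng=None):
    """Stretch or compress in time by a random factor in [1 - max_stretch, 1 + max_stretch].

    Resampling is centred on the middle sample and the output keeps the input
    length; positions that fall outside the trace take the edge value.
    """
    rng = np.random.default_rng() if rng is None else rng
    factor = rng.uniform(1.0 - max_stretch, 1.0 + max_stretch)
    n = waveform.shape[-1]
    grid = np.arange(n, dtype=float)
    centre = (n - 1) / 2.0
    # Output sample i reads the input at centre + (i - centre) / factor,
    # so factor > 1 stretches (slows) the signal and factor < 1 compresses it.
    src = centre + (grid - centre) / factor
    channels = waveform.reshape(-1, n)
    out = np.stack([np.interp(src, grid, ch) for ch in channels])
    return out.reshape(waveform.shape), factor

x = np.sin(np.linspace(0.0, 4.0 * np.pi, 101))[None, :]
stretched, factor = random_stretch(x, max_stretch=0.1, rng=np.random.default_rng(0))
```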
11. DitingMotionLoss
Custom loss function for the DiTingMotion model. Unlike the other entries in this list, it is a training loss, not a waveform augmentation.
```python
from seispolarity import DitingMotionLoss

loss_fn = DitingMotionLoss()
```
Balanced Sampling
Polarity Inversion Strategy
This strategy creates a balanced dataset with equal proportions of Up, Down, and Unknown samples.
```python
from seispolarity import BalancedPolarityGenerator

generator = BalancedPolarityGenerator(
    dataset,
    strategy="polarity_inversion"
)
```
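How it works:
Each Up and Down sample is used twice: once as recorded and once polarity-inverted. An inverted copy takes the opposite label, so inverted Up samples become Down samples and vice versa.
After inversion, Up and Down each contain (original Up + original Down) samples.
Unknown samples are added to match that same count, i.e. the original Up + Down total.
Result: equal proportions, with Up = 1/3, Down = 1/3, Unknown = 1/3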
This strategy is recommended for Instance and Txed datasets.
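The arithmetic behind the 1/3 split can be checked with a few lines of plain Python. `polarity_inversion_counts` is a hypothetical helper that only counts samples; it does not reproduce how SeisPolarity selects them, and what the library does when there are fewer Unknown samples than the target is not described here.

```python
from fractions import Fraction

def polarity_inversion_counts(n_up: int, n_down: int) -> dict:
    """Per-class sizes after 'polarity_inversion' balancing (sketch of the rule above)."""
    return {
        # original Up samples + inverted Down samples (which become Up)
        "up": n_up + n_down,
        # original Down samples + inverted Up samples (which become Down)
        "down": n_down + n_up,
        # Unknown is drawn to match the original Up + Down count
        "unknown": n_up + n_down,
    }

counts = polarity_inversion_counts(n_up=6000, n_down=4000)
total = sum(counts.values())
print({k: Fraction(v, total) for k, v in counts.items()})
# {'up': Fraction(1, 3), 'down': Fraction(1, 3), 'unknown': Fraction(1, 3)}
```

The split is even no matter how lopsided Up and Down are to begin with, because every inverted sample moves to the opposite class.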
Min-Based Strategy
This strategy undersamples every class to the size of the smallest (minority) class.
```python
from seispolarity import BalancedPolarityGenerator

generator = BalancedPolarityGenerator(
    dataset,
    strategy="min_based"
)
```
How it works:
Count samples in each class
Determine the minimum count
Keep exactly that many samples from each class; surplus samples in larger classes are not used
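The three steps can be sketched in plain Python as index selection. Drawing the kept samples at random (without replacement) is an assumption of this sketch, and `min_based_indices` is a hypothetical name; the trade-off it makes visible is that most of the majority class is discarded.

```python
import random
from collections import defaultdict

def min_based_indices(labels, seed=0):
    """Return indices that keep n_min samples per class, n_min being the smallest class size."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    rng = random.Random(seed)
    selected = []
    for idxs in by_class.values():
        # Random subset without replacement; the rest of the class is dropped.
        selected.extend(rng.sample(idxs, n_min))
    rng.shuffle(selected)
    return selected

labels = ["up"] * 50 + ["down"] * 30 + ["unknown"] * 120
chosen = min_based_indices(labels)
print(len(chosen))  # 90, i.e. 30 per class
```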
Custom Augmentation
Create a custom augmentation by subclassing the base class:
```python
from seispolarity import GenericGenerator
from seispolarity.generate.augmentation import BaseAugmentation

class CustomAugmentation(BaseAugmentation):
    def __call__(self, waveform, label):
        # Apply your custom transformation
        augmented_waveform = self._apply_transformation(waveform)
        return augmented_waveform, label

    def _apply_transformation(self, waveform):
        # Your transformation logic here
        return waveform

# Use it (dataset is a WaveformDataset, as in Basic Usage)
generator = GenericGenerator(dataset)
generator.add_augmentations([
    CustomAugmentation()
])
```
Augmentation Pipeline
Combine multiple augmentations:
```python
from seispolarity import (
    GenericGenerator,
    Demean,
    Normalize,
    RandomTimeShift,
    BandpassFilter,
    PolarityInversion
)

generator = GenericGenerator(dataset)
generator.add_augmentations([
    Demean(),
    BandpassFilter(freqmin=1.0, freqmax=20.0),
    Normalize(amp_norm_type="peak"),  # after filtering, so the output stays normalized
    RandomTimeShift(max_shift=10),
    PolarityInversion(p=0.5)
])
```
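The snippet below sketches what a list of augmentations amounts to, assuming (as the `CustomAugmentation` example on this page suggests) that each augmentation is a callable taking and returning `(waveform, label)`, and that the generator applies them in list order. `apply_pipeline`, `demean_step`, and `peak_normalize_step` are hypothetical stand-ins, not SeisPolarity classes.

```python
from typing import Callable, Sequence, Tuple

import numpy as np

# Assumed augmentation signature: (waveform, label) -> (waveform, label)
Augmentation = Callable[[np.ndarray, int], Tuple[np.ndarray, int]]

def apply_pipeline(augmentations: Sequence[Augmentation], waveform: np.ndarray, label: int):
    """Run augmentations in list order, feeding each output into the next step."""
    for augment in augmentations:
        waveform, label = augment(waveform, label)
    return waveform, label

def demean_step(waveform, label):
    """Stand-in for Demean()."""
    return waveform - waveform.mean(axis=-1, keepdims=True), label

def peak_normalize_step(waveform, label):
    """Stand-in for Normalize(amp_norm_type="peak")."""
    return waveform / np.max(np.abs(waveform), axis=-1, keepdims=True), label

out, out_label = apply_pipeline([demean_step, peak_normalize_step], np.array([[1.0, 3.0, 8.0]]), 0)
print(out, out_label)  # [[-0.75 -0.25  1.  ]] 0
```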
Data Preprocessing
Standard Preprocessing Pipeline
```python
from seispolarity import (
    GenericGenerator,
    Demean,
    Detrend,
    Normalize,
    BandpassFilter
)

generator = GenericGenerator(dataset)
generator.add_augmentations([
    Detrend(type="linear"),
    Demean(),
    BandpassFilter(freqmin=1.0, freqmax=20.0),
    Normalize(amp_norm_type="peak")
])
```
Training vs Validation
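```python
from seispolarity import GenericGenerator, Demean, Normalize, RandomTimeShift, PolarityInversion

# train_dataset and val_dataset are WaveformDataset instances for the two splits.

# Training: include data augmentation
train_generator = GenericGenerator(train_dataset)
train_generator.add_augmentations([
    Demean(),
    Normalize(),
    RandomTimeShift(max_shift=10),
    PolarityInversion(p=0.5)
])

# Validation: deterministic preprocessing only (no random augmentation)
val_generator = GenericGenerator(val_dataset)
val_generator.add_augmentations([
    Demean(),
    Normalize()
])
```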
Visualization
Visualize Augmented Samples
```python
import matplotlib.pyplot as plt

# Get original and augmented samples (dataset and generator as defined above).
# Random augmentations differ on every access, so re-running gives a new result.
original_waveform, label = dataset[0]
augmented_waveform, _ = generator[0]

# Plot the first channel of each
fig, axes = plt.subplots(2, 1, figsize=(10, 6))
axes[0].plot(original_waveform[0])
axes[0].set_title(f"Original (Label: {label})")
axes[1].plot(augmented_waveform[0])
axes[1].set_title("Augmented")
plt.tight_layout()
plt.show()
```
Performance Tips
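Order matters: Apply normalization after steps that change amplitudes (Detrend, Demean, BandpassFilter); sign flips such as PolarityInversion can safely follow it
Use carefully: Not every augmentation suits every task; for example, flipping a waveform's sign is only valid for polarity labels if Up and Down are swapped with it
Validate: Always evaluate on data without random augmentation (deterministic preprocessing only)
Monitor loss: Watch for signs of over-augmentation, such as training loss that stays well above validation loss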
Dataset size: Use more augmentation for smaller datasets
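The first tip can be seen numerically: filtering after peak normalization pulls the peak below 1, while normalizing last keeps it at exactly 1. A 5-point moving average stands in for `BandpassFilter` here only to keep the sketch NumPy-only; the same effect applies to any filter that changes amplitudes. Both helper functions are hypothetical.

```python
import numpy as np

def peak_normalize(x):
    """Scale so the largest absolute sample is 1."""
    return x / np.max(np.abs(x))

def smooth(x, width=5):
    """Moving-average low-pass filter, standing in for BandpassFilter."""
    return np.convolve(x, np.ones(width) / width, mode="same")

trace = np.random.default_rng(42).standard_normal(500)

filtered_last = smooth(peak_normalize(trace))    # normalize, then filter
normalized_last = peak_normalize(smooth(trace))  # filter, then normalize
print(np.max(np.abs(filtered_last)))    # below 1: the normalization was undone
print(np.max(np.abs(normalized_last)))  # 1.0
```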
Example: Complete Training with Augmentation
```python
from seispolarity import WaveformDataset, GenericGenerator
from seispolarity.models import PPNet
from seispolarity.training import Trainer, TrainingConfig
from seispolarity import (
    Demean,
    Detrend,
    Normalize,
    BandpassFilter,
    RandomTimeShift,
    PolarityInversion
)

# Load dataset
dataset = WaveformDataset(path="data.hdf5", name="SCSN")

# Create generator with augmentations
generator = GenericGenerator(dataset)
generator.add_augmentations([
    Detrend(type="linear"),
    Demean(),
    BandpassFilter(freqmin=1.0, freqmax=20.0),
    Normalize(amp_norm_type="peak"),
    RandomTimeShift(max_shift=10),
    PolarityInversion(p=0.3)
])

# Create model and trainer
model = PPNet(num_fm_classes=3)
config = TrainingConfig(
    batch_size=256,
    epochs=50,
    learning_rate=1e-4,
    device="cuda"
)

trainer = Trainer(model=model, dataset=generator, config=config)
trainer.train()
```
See API Reference for detailed API documentation.