Author: He XingChen
Last Updated: 2026-02-12

Data Augmentation Overview

SeisPolarity provides a flexible data augmentation system with multiple techniques to improve model robustness and handle imbalanced datasets.

Basic Usage

Using GenericGenerator

from seispolarity import WaveformDataset, GenericGenerator
from seispolarity import Demean, Normalize, RandomTimeShift

# Load dataset
dataset = WaveformDataset(path="data.hdf5", name="SCSN", preload=False)

# Create generator with augmentations
generator = GenericGenerator(dataset)
generator.add_augmentations([
    Demean(),
    Normalize(amp_norm_type="peak"),
    RandomTimeShift(max_shift=10)
])

# Get dataloader
loader = generator.get_dataloader(batch_size=256, num_workers=4)

Using BalancedPolarityGenerator

For imbalanced datasets with polarity labels:

from seispolarity import BalancedPolarityGenerator
from seispolarity import Demean, Normalize

generator = BalancedPolarityGenerator(
    dataset,
    strategy="polarity_inversion"  # or "min_based"
)
generator.add_augmentations([
    Demean(),
    Normalize()
])

Available Augmentation Methods

1. Demean

Remove the mean from waveforms.

from seispolarity import Demean

augmentation = Demean()

Parameters: None

2. Normalize

Normalize waveforms by amplitude.

from seispolarity import Normalize

# Normalize by peak amplitude
augmentation = Normalize(amp_norm_type="peak")

# Normalize by RMS
augmentation = Normalize(amp_norm_type="rms")

# Normalize by maximum absolute value
augmentation = Normalize(amp_norm_type="max")

Parameters:

  • amp_norm_type: Normalization type (“peak”, “rms”, “max”)
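As a rough illustration of what mean removal and amplitude normalization do to a trace, here is a minimal NumPy sketch. It is not SeisPolarity's implementation: the function names, the `(channels, samples)` array layout, and the `eps` guard are assumptions made for the example, and only the "max" and "rms" modes are shown because the page does not say how "peak" differs from "max".

```python
import numpy as np

def demean(waveform):
    """Subtract the mean of each channel; waveform has shape (channels, samples)."""
    return waveform - waveform.mean(axis=-1, keepdims=True)

def normalize(waveform, amp_norm_type="max", eps=1e-10):
    """Scale each channel by its maximum absolute value or by its RMS."""
    if amp_norm_type == "max":
        scale = np.abs(waveform).max(axis=-1, keepdims=True)
    elif amp_norm_type == "rms":
        scale = np.sqrt((waveform ** 2).mean(axis=-1, keepdims=True))
    else:
        raise ValueError(f"unsupported amp_norm_type: {amp_norm_type}")
    # eps (an assumption of this sketch) avoids division by zero on flat traces
    return waveform / (scale + eps)

trace = np.array([[1.0, 3.0, -2.0, 6.0]])
centered = demean(trace)                  # mean of 2.0 removed
scaled_max = normalize(centered, "max")   # largest absolute sample becomes ~1
scaled_rms = normalize(centered, "rms")   # root-mean-square becomes ~1
```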

3. RandomTimeShift

Randomly shift waveforms in time.

from seispolarity import RandomTimeShift

# Shift by up to 10 samples
augmentation = RandomTimeShift(max_shift=10)

Parameters:

  • max_shift: Maximum number of samples to shift (default: 10)
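The idea behind a random time shift can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the library's code: the sketch draws an integer shift uniformly from `[-max_shift, max_shift]` and pads the vacated samples with zeros, whereas the library may wrap around or crop instead.

```python
import numpy as np

def random_time_shift(waveform, max_shift=10, rng=None):
    """Shift a (channels, samples) array along time by a random integer offset."""
    if rng is None:
        rng = np.random.default_rng()
    shift = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(waveform)
    if shift > 0:
        # Move the signal later in time; the first `shift` samples stay zero
        shifted[..., shift:] = waveform[..., :-shift]
    elif shift < 0:
        # Move the signal earlier in time; the last `-shift` samples stay zero
        shifted[..., :shift] = waveform[..., -shift:]
    else:
        shifted[...] = waveform
    return shifted, shift

rng = np.random.default_rng(0)
trace = np.arange(1.0, 101.0).reshape(1, 100)
shifted, shift = random_time_shift(trace, max_shift=10, rng=rng)
```

Returning the applied shift alongside the waveform makes it possible to move a P-pick index by the same amount if the pick position is tracked separately.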

4. RandomPPickShift

Randomly shift the P-phase pick position.

from seispolarity import RandomPPickShift

# Shift P-pick by up to 5 samples
augmentation = RandomPPickShift(max_shift=5)

Parameters:

  • max_shift: Maximum number of samples to shift (default: 5)

5. BandpassFilter

Apply a bandpass filter to waveforms.

from seispolarity import BandpassFilter

# Apply 1-20 Hz bandpass filter
augmentation = BandpassFilter(freqmin=1.0, freqmax=20.0)

Parameters:

  • freqmin: Minimum frequency (Hz)

  • freqmax: Maximum frequency (Hz)

  • corners: Filter corners (default: 4)

  • zerophase: Whether to use zero-phase filtering (default: True)
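For readers who want to see what these parameters control, the sketch below builds a Butterworth bandpass with SciPy. It is an assumption that SeisPolarity's filter behaves similarly; the page does not state the filter design or where the sampling rate comes from, so `sampling_rate` is a hypothetical argument added for the example.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def bandpass(waveform, freqmin, freqmax, sampling_rate, corners=4, zerophase=True):
    """Butterworth bandpass along the last axis (illustrative sketch only)."""
    sos = butter(corners, [freqmin, freqmax], btype="bandpass",
                 fs=sampling_rate, output="sos")
    if zerophase:
        # Forward-backward filtering cancels the phase delay
        return sosfiltfilt(sos, waveform, axis=-1)
    # Single forward pass: causal, but introduces a phase delay
    return sosfilt(sos, waveform, axis=-1)

sampling_rate = 100.0  # Hz, assumed for this example
t = np.arange(0, 10, 1 / sampling_rate)
in_band = np.sin(2 * np.pi * 5.0 * t)       # 5 Hz, inside the 1-20 Hz passband
out_of_band = np.sin(2 * np.pi * 40.0 * t)  # 40 Hz, outside the passband
filtered = bandpass(in_band + out_of_band, 1.0, 20.0, sampling_rate)
```

Zero-phase filtering preserves the timing of the first motion, which matters when the polarity is read near the P arrival, but it is non-causal and can smear energy ahead of the onset.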

6. Detrend

Remove a linear or constant trend from waveforms.

from seispolarity import Detrend

augmentation = Detrend()

Parameters:

  • type: Detrend type (“linear” or “constant”)
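The two detrend types can be illustrated with a least-squares fit in NumPy. This is a sketch under assumptions, not the library's code: "constant" is taken to mean subtracting the per-channel mean, "linear" subtracts a fitted straight line, and the argument is named `kind` here only to avoid shadowing Python's built-in `type`.

```python
import numpy as np

def detrend(waveform, kind="linear"):
    """Remove a constant or linear trend from each channel of a (channels, samples) array."""
    if kind == "constant":
        return waveform - waveform.mean(axis=-1, keepdims=True)
    if kind == "linear":
        n = waveform.shape[-1]
        t = np.arange(n)
        # polyfit accepts a 2-D y of shape (samples, channels) and fits each column
        slope, intercept = np.polyfit(t, waveform.T, 1)
        fitted = slope[:, None] * t + intercept[:, None]
        return waveform - fitted
    raise ValueError(f"unsupported detrend kind: {kind}")

t = np.arange(200)
ramps = np.stack([0.5 * t + 3.0, -0.2 * t + 1.0])  # two channels with pure linear drift
flat = detrend(ramps, kind="linear")               # drift removed, values close to zero
```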

7. PolarityInversion

Randomly invert the polarity of waveforms.

from seispolarity import PolarityInversion

# 50% probability of inversion
augmentation = PolarityInversion(p=0.5)

Parameters:

  • p: Probability of polarity inversion (default: 0.5)
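Flipping a waveform changes what the correct polarity label is, so an inversion has to update the label together with the data. The sketch below shows that idea with a hypothetical integer encoding (`UP`, `DOWN`, `UNKNOWN`); the encoding and the function are assumptions for illustration and are not taken from the library.

```python
import numpy as np

UP, DOWN, UNKNOWN = 0, 1, 2  # hypothetical label encoding for this sketch

def polarity_inversion(waveform, label, p=0.5, rng=None):
    """With probability p, negate the waveform and swap Up/Down labels."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < p:
        waveform = -waveform
        if label == UP:
            label = DOWN
        elif label == DOWN:
            label = UP
        # UNKNOWN stays UNKNOWN: flipping does not make the polarity readable
    return waveform, label

trace = np.array([[0.0, 0.2, 1.0, -0.4]])
flipped, new_label = polarity_inversion(trace, UP, p=1.0)  # always inverts
kept, same_label = polarity_inversion(trace, UP, p=0.0)    # never inverts
```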

8. DifferentialFeatures

Compute differential features from waveforms.

from seispolarity import DifferentialFeatures

augmentation = DifferentialFeatures()

Parameters: None

9. ChangeDtype

Change the data type of waveforms.

from seispolarity import ChangeDtype

# Convert to float32
augmentation = ChangeDtype(dtype="float32")

Parameters:

  • dtype: Target data type (“float32”, “float64”, etc.)

10. Stretching

Randomly stretch or compress waveforms.

from seispolarity import Stretching

# Stretch by up to 10%
augmentation = Stretching(max_stretch=0.1)

Parameters:

  • max_stretch: Maximum stretch factor (default: 0.1)
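One plausible way to stretch or compress a trace while keeping its length is to resample it by linear interpolation about the window center. The sketch below assumes that interpretation and draws the factor uniformly from `[1 - max_stretch, 1 + max_stretch]`; the library's actual resampling scheme is not described on this page.

```python
import numpy as np

def stretch(waveform, factor):
    """Resample a (channels, samples) array about its center; factor > 1 stretches."""
    n = waveform.shape[-1]
    sample_index = np.arange(n)
    center = (n - 1) / 2.0
    # Each output sample reads from a position pulled toward (or pushed away from) the center
    source_position = center + (sample_index - center) / factor
    # np.interp holds the edge values for positions outside the original range
    return np.stack([np.interp(source_position, sample_index, channel)
                     for channel in waveform])

def random_stretch(waveform, max_stretch=0.1, rng=None):
    """Apply a stretch factor drawn uniformly from [1 - max_stretch, 1 + max_stretch]."""
    if rng is None:
        rng = np.random.default_rng()
    factor = 1.0 + rng.uniform(-max_stretch, max_stretch)
    return stretch(waveform, factor), factor

trace = np.sin(np.linspace(0, 8 * np.pi, 400)).reshape(1, 400)
stretched, factor = random_stretch(trace, max_stretch=0.1, rng=np.random.default_rng(0))
```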

11. DitingMotionLoss

Custom loss function for the DiTing motion-based model. Unlike the entries above, it is a training loss rather than a waveform augmentation.

from seispolarity import DitingMotionLoss

loss_fn = DitingMotionLoss()

Balanced Sampling

Polarity Inversion Strategy

This strategy creates a balanced dataset with equal proportions of Up, Down, and Unknown samples.

from seispolarity import BalancedPolarityGenerator

generator = BalancedPolarityGenerator(
    dataset,
    strategy="polarity_inversion"
)

How it works:

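  1. Each Up and Down sample generates two samples: the original and a polarity-inverted copy that carries the opposite label, so the Up and Down classes each end up with (original Up + original Down) samples

  2. Unknown samples are added to match the original count of (Up + Down) samples

  3. Result: Equal distribution - Up = 1/3, Down = 1/3, Unknown = 1/3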

This strategy is recommended for Instance and Txed datasets.
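The counting behind the polarity inversion strategy can be checked with a small standard-library sketch. The label strings ("U", "D", "X"), the function name, and the rule for picking Unknown samples at random are assumptions for illustration; how the library behaves when there are too few Unknown samples is not stated here, so the sketch simply takes as many as exist.

```python
import random
from collections import Counter

def balance_by_polarity_inversion(labels, seed=0):
    """Return (index, inverted, label) triples for a balanced sample list."""
    rng = random.Random(seed)
    flip = {"U": "D", "D": "U"}
    balanced = []
    for index, label in enumerate(labels):
        if label in flip:
            balanced.append((index, False, label))       # original sample
            balanced.append((index, True, flip[label]))  # inverted copy, opposite label
    unknown = [index for index, label in enumerate(labels) if label == "X"]
    # Each polarity class now holds (original Up + original Down) samples
    target = len(balanced) // 2
    chosen = rng.sample(unknown, min(target, len(unknown)))
    balanced.extend((index, False, "X") for index in chosen)
    return balanced

labels = ["U"] * 70 + ["D"] * 30 + ["X"] * 500
counts = Counter(label for _, _, label in balance_by_polarity_inversion(labels))
# 70 Up + 30 Down originals give 100 Up, 100 Down, and 100 sampled Unknown
```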

Min-Based Strategy

This strategy downsamples every class to the size of the smallest (minority) class.

from seispolarity import BalancedPolarityGenerator

generator = BalancedPolarityGenerator(
    dataset,
    strategy="min_based"
)

How it works:

  1. Count samples in each class

  2. Determine the minimum count

  3. Sample equally from each class up to the minimum count
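The three steps above amount to random undersampling, which can be sketched with the standard library as follows. The label strings and the function name are hypothetical, and sampling without replacement is an assumption of this example rather than a documented detail.

```python
import random
from collections import Counter, defaultdict

def min_based_sample(labels, seed=0):
    """Pick the same number of indices from every class, limited by the smallest class."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):           # step 1: group and count samples
        indices_by_class[label].append(index)
    minimum = min(len(indices) for indices in indices_by_class.values())  # step 2
    selected = []
    for indices in indices_by_class.values():        # step 3: equal draw per class
        selected.extend(rng.sample(indices, minimum))
    return sorted(selected)

labels = ["U"] * 70 + ["D"] * 30 + ["X"] * 500
selected = min_based_sample(labels)
per_class = Counter(labels[index] for index in selected)
# The smallest class has 30 samples, so 30 are drawn from each class
```

Compared with the polarity inversion strategy, this approach discards data from the larger classes instead of creating new samples.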

Custom Augmentation

Create a custom augmentation by subclassing the base class:

from seispolarity import GenericGenerator
from seispolarity.generate.augmentation import BaseAugmentation

class CustomAugmentation(BaseAugmentation):
    def __call__(self, waveform, label):
        # Apply your custom transformation; the label passes through unchanged
        augmented_waveform = self._apply_transformation(waveform)
        return augmented_waveform, label

    def _apply_transformation(self, waveform):
        # Your transformation logic here
        return waveform

# Use it
generator = GenericGenerator(dataset)
generator.add_augmentations([
    CustomAugmentation()
])
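As a concrete instance of the template above, the following sketch adds Gaussian noise at a target signal-to-noise ratio. It is written as a plain class so that it runs without the library installed; inside SeisPolarity it would subclass `BaseAugmentation` as in the template. The class name, the `snr_db` and `seed` parameters, and the noise model are all hypothetical choices for this example.

```python
import numpy as np

class GaussianNoise:
    """Hypothetical augmentation: add white noise at a given SNR (in dB)."""

    def __init__(self, snr_db=20.0, seed=None):
        self.snr_db = snr_db
        self.rng = np.random.default_rng(seed)

    def __call__(self, waveform, label):
        # Follows the (waveform, label) -> (waveform, label) convention of the template
        signal_power = np.mean(waveform ** 2)
        noise_power = signal_power / (10 ** (self.snr_db / 10))
        noise = self.rng.normal(0.0, np.sqrt(noise_power), size=waveform.shape)
        return waveform + noise, label

trace = np.sin(np.linspace(0, 8 * np.pi, 400)).reshape(1, 400)
noisy, label = GaussianNoise(snr_db=20.0, seed=0)(trace, 1)
```

Keeping the random generator on the instance makes the augmentation reproducible when a seed is supplied.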

Augmentation Pipeline

Combine multiple augmentations:

from seispolarity import (
    GenericGenerator,
    Demean,
    Normalize,
    RandomTimeShift,
    BandpassFilter,
    PolarityInversion
)

generator = GenericGenerator(dataset)
generator.add_augmentations([
    Demean(),
    BandpassFilter(freqmin=1.0, freqmax=20.0),
    # Normalize after filtering, since the filter changes the peak amplitude
    Normalize(amp_norm_type="peak"),
    RandomTimeShift(max_shift=10),
    PolarityInversion(p=0.5)
])

Data Preprocessing

Standard Preprocessing Pipeline

from seispolarity import (
    GenericGenerator,
    Demean,
    Detrend,
    Normalize,
    BandpassFilter
)

generator = GenericGenerator(dataset)
generator.add_augmentations([
    Detrend(type="linear"),
    Demean(),
    BandpassFilter(freqmin=1.0, freqmax=20.0),
    Normalize(amp_norm_type="peak")
])

Training vs Validation

from seispolarity import GenericGenerator
from seispolarity import Demean, Normalize, RandomTimeShift, PolarityInversion

# Training: include data augmentation
train_generator = GenericGenerator(train_dataset)
train_generator.add_augmentations([
    Demean(),
    Normalize(),
    RandomTimeShift(max_shift=10),
    PolarityInversion(p=0.5)
])

# Validation: only basic preprocessing
val_generator = GenericGenerator(val_dataset)
val_generator.add_augmentations([
    Demean(),
    Normalize()
])

Visualization

Visualize Augmented Samples

import matplotlib.pyplot as plt
import numpy as np

# Get original and augmented samples
original_waveform, label = dataset[0]
augmented_waveform, _ = generator[0]

# Plot
fig, axes = plt.subplots(2, 1, figsize=(10, 6))
axes[0].plot(original_waveform[0])
axes[0].set_title(f"Original (Label: {label})")
axes[1].plot(augmented_waveform[0])
axes[1].set_title("Augmented")
plt.tight_layout()
plt.show()

Performance Tips

  1. Order matters: Apply normalization after steps that change amplitude, such as detrending, demeaning, and filtering

  2. Use carefully: Not all augmentations are appropriate for all tasks

  3. Validate: Always validate on unaugmented data

  4. Monitor loss: Watch for signs of over-augmentation

  5. Dataset size: Use more augmentation for smaller datasets
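A small numeric check shows why the ordering tip matters. The two helper functions below are stand-ins written for this example, not the library's `Demean` and `Normalize` classes: normalizing before removing the mean leaves a trace whose peak is no longer 1.

```python
import numpy as np

def demean(x):
    return x - x.mean()

def peak_normalize(x):
    return x / np.abs(x).max()

trace = np.array([1.0, 2.0, 3.0, 6.0])

normalize_last = peak_normalize(demean(trace))   # [-2, -1, 0, 3] / 3
normalize_first = demean(peak_normalize(trace))  # [1/6, 1/3, 1/2, 1] - 1/2

print(np.abs(normalize_last).max())   # 1.0
print(np.abs(normalize_first).max())  # 0.5
```

The same effect appears with filtering: any step that alters amplitudes after normalization undoes the guarantee that the model sees inputs on a fixed scale.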

Example: Complete Training with Augmentation

from seispolarity import WaveformDataset, GenericGenerator
from seispolarity.models import PPNet
from seispolarity.training import Trainer, TrainingConfig
from seispolarity import (
    Demean,
    Detrend,
    Normalize,
    BandpassFilter,
    RandomTimeShift,
    PolarityInversion
)

# Load dataset
dataset = WaveformDataset(path="data.hdf5", name="SCSN")

# Create generator with augmentations
generator = GenericGenerator(dataset)
generator.add_augmentations([
    Detrend(type="linear"),
    Demean(),
    BandpassFilter(freqmin=1.0, freqmax=20.0),
    Normalize(amp_norm_type="peak"),
    RandomTimeShift(max_shift=10),
    PolarityInversion(p=0.3)
])

# Create model and trainer
model = PPNet(num_fm_classes=3)
config = TrainingConfig(
    batch_size=256,
    epochs=50,
    learning_rate=1e-4,
    device="cuda"
)

trainer = Trainer(model=model, dataset=generator, config=config)
trainer.train()

See API Reference for detailed API documentation.