API Reference Summary
This document provides a consolidated view of the unified API that all IDS models implement.
Public API Contract
All 6 IDS models (LOF, OCSVM, VAE, Autoencoder, ICL, KitNET) implement the same public API:
Method Signatures
from typing import Optional, Tuple

import numpy as np
from torch.utils.data import DataLoader


class BaseIDSModel:
    """Base interface for all IDS models"""

    def train_model(self, train_loader: DataLoader) -> None:
        """Train the model on training data.

        Args:
            train_loader: PyTorch DataLoader with training data
        """

    def save(self, model_path: Optional[str] = None) -> None:
        """Save trained model to disk.

        Args:
            model_path: Path to save model. If None, uses default path:
                ./artifacts/{dataset_name}/models/{model_name}.pth
        """

    def load(self, model_path: Optional[str] = None) -> dict:
        """Load trained model from disk.

        Args:
            model_path: Path to load model from. If None, uses default path.

        Returns:
            Checkpoint dictionary with model state
        """

    def infer(self, test_loader: DataLoader) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        """Lightweight inference without metrics computation.

        Args:
            test_loader: PyTorch DataLoader with test data

        Returns:
            Tuple of (y_true, y_pred, reconstruction_errors):
            - y_true: Ground truth labels
            - y_pred: Binary predictions (0=benign, 1=anomaly)
            - reconstruction_errors: Anomaly scores for each sample
        """

    def evaluate(self, test_loader: DataLoader) -> None:
        """Full evaluation with metrics and visualization.

        Computes F1, Precision, Recall, Accuracy, AUC-ROC and generates:
        - Confusion matrix plot
        - ROC curve plot
        - Anomaly score plot
        - Metrics text file

        All artifacts saved to ./artifacts/{dataset_name}/

        Args:
            test_loader: PyTorch DataLoader with test data
        """
Usage Pattern
Every model follows this exact pattern:
from titli.ids import ModelName # Any of: LOF, OCSVM, VAE, Autoencoder, ICL, KitNET
from titli.utils import StreamingCSVDataset
from torch.utils.data import DataLoader
import torch
# Setup data
dataset = StreamingCSVDataset(
    feature_csv_path="features.csv",
    label_csv_path="labels.csv"
)
train_loader = DataLoader(dataset, batch_size=32)
# Train
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ModelName(
    dataset_name="my_dataset",
    input_size=dataset.input_size,
    device=device
)
model.train_model(train_loader)
# Persist
model.save()  # or model.save("custom_path.pth")
model.load()  # or model.load("custom_path.pth")
# Inference (test_loader is a DataLoader over held-out data, built like train_loader)
y_true, y_pred, scores = model.infer(test_loader)
# Evaluation
model.evaluate(test_loader)
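Since every model is expected to expose the same five methods, a quick duck-typing check can catch an incomplete custom implementation before training starts. The sketch below is only an illustration and not part of Titli: `missing_api_methods` and `DummyModel` are hypothetical names, and the check looks only at method names, not at signatures or return types.

```python
from typing import Any, List

# Method names taken from the public API contract above.
REQUIRED_METHODS = ("train_model", "save", "load", "infer", "evaluate")


def missing_api_methods(model: Any) -> List[str]:
    """Return the contract methods that `model` does not expose as callables."""
    return [
        name for name in REQUIRED_METHODS
        if not callable(getattr(model, name, None))
    ]


class DummyModel:
    """Hypothetical stand-in that implements only part of the contract."""

    def train_model(self, train_loader):
        pass

    def infer(self, test_loader):
        return [], [], []


print(missing_api_methods(DummyModel()))  # ['save', 'load', 'evaluate']
```

An empty list means the object at least has the right method names; it says nothing about whether the methods behave as documented.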
Model Descriptions
LOF (Local Outlier Factor)
Type: Traditional Machine Learning
Description: Density-based anomaly detection using local outlier factors.
When to use: Small to medium datasets with clear density-based outliers. Works well when anomalies have significantly different local densities than normal samples.
Initialization:
from titli.ids import LOF
model = LOF(
    dataset_name="my_dataset",
    input_size=100,
    device=device
)
OCSVM (One-Class SVM)
Type: Traditional Machine Learning
Description: Boundary-based anomaly detection using support vector machines.
When to use: Datasets with clear decision boundaries. Effective for high-dimensional data and when you want a well-defined separation between normal and anomalous regions.
Initialization:
from titli.ids import OCSVM
model = OCSVM(
    dataset_name="my_dataset",
    input_size=100,
    device=device
)
Autoencoder
Type: Deep Learning
Description: Reconstruction-based anomaly detection using neural networks.
When to use: Complex patterns in high-dimensional data, GPU available. Learns to reconstruct normal patterns; anomalies produce higher reconstruction errors.
Initialization:
from titli.ids import Autoencoder
model = Autoencoder(
    dataset_name="my_dataset",
    input_size=100,
    device=device
)
VAE (Variational Autoencoder)
Type: Deep Learning
Description: Probabilistic deep learning with latent space modeling.
When to use: When you need probabilistic anomaly scores or want to model the distribution of normal data in a latent space. Better for capturing uncertainty than standard autoencoders.
Initialization:
from titli.ids import VAE
model = VAE(
    dataset_name="my_dataset",
    input_size=100,
    device=device
)
ICL (Instance Contrastive Learning)
Type: Contrastive Learning
Description: Contrastive learning approach for anomaly detection.
When to use: When you want to learn discriminative features through contrastive learning. Effective for scenarios where normal samples should cluster together in feature space.
Initialization:
from titli.ids import ICL
model = ICL(
    dataset_name="my_dataset",
    input_size=100,
    device=device
)
KitNET
Type: Ensemble
Description: Ensemble of autoencoders for online anomaly detection.
When to use: Online/streaming scenarios, ensemble methods needed. KitNET adaptively creates an ensemble of small autoencoders, making it efficient for incremental learning.
Initialization:
from titli.ids import KitNET
model = KitNET(
    dataset_name="my_dataset",
    input_size=100,
    device=device
)
Complete Workflow Example
Here’s a complete example showing feature extraction through evaluation:
from titli.fe import AfterImage
from titli.utils import StreamingCSVDataset
from torch.utils.data import DataLoader
from titli.ids import OCSVM
import torch
# Step 1: Extract features
fe = AfterImage(file_path="traffic.pcap")
fe.extract_features(output_path="features.csv")
# Step 2: Create DataLoader
dataset = StreamingCSVDataset(
    feature_csv_path="features.csv",
    label_csv_path="labels.csv",
    max_samples=100000,
    label_column=0
)
train_loader = DataLoader(dataset, batch_size=32, shuffle=False)
# Step 3: Train model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ids = OCSVM(dataset_name="my_dataset", input_size=dataset.input_size, device=device)
ids.train_model(train_loader)
# Step 4: Save and load
ids.save() # Saves to ./artifacts/my_dataset/models/ocsvm.pth
ids.load() # Loads from same path
# Step 5: Inference
# test_dataset is a held-out StreamingCSVDataset, built the same way as `dataset` above
test_loader = DataLoader(test_dataset, batch_size=32)
y_true, y_pred, scores = ids.infer(test_loader)
# Step 6: Full evaluation
ids.evaluate(test_loader)
Output Artifacts
Default Paths
All models save artifacts to standardized locations:
./artifacts/{dataset_name}/
├── models/
│   └── {model_name}.pth          # Trained model
├── objects/
│   └── metrics/
│       └── {model_name}.txt      # Metrics file
└── plots/
    ├── confusion_matrix/
    │   └── {model_name}.png      # Confusion matrix plot
    ├── roc/
    │   └── {model_name}.png      # ROC curve plot
    └── anomaly/
        └── {model_name}.png      # Anomaly score plot
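When post-processing results outside Titli, it can help to rebuild these locations in code. The helper below is a hypothetical sketch that mirrors the layout above with `pathlib`; `default_artifact_paths` is not a Titli function, and the lowercase model name follows the `ocsvm.pth` example in the workflow, which is an assumption about the naming of the other models.

```python
from pathlib import Path
from typing import Dict


def default_artifact_paths(dataset_name: str, model_name: str,
                           root: str = "./artifacts") -> Dict[str, Path]:
    """Build the artifact locations of the documented layout (hypothetical helper)."""
    base = Path(root) / dataset_name
    return {
        "model": base / "models" / f"{model_name}.pth",
        "metrics": base / "objects" / "metrics" / f"{model_name}.txt",
        "confusion_matrix": base / "plots" / "confusion_matrix" / f"{model_name}.png",
        "roc": base / "plots" / "roc" / f"{model_name}.png",
        "anomaly": base / "plots" / "anomaly" / f"{model_name}.png",
    }


paths = default_artifact_paths("my_dataset", "ocsvm")
print(paths["model"].as_posix())  # artifacts/my_dataset/models/ocsvm.pth
```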
Metrics File Content
Example content of a metrics file ({model_name}.txt):
Accuracy: 0.961
Precision: 0.841
Recall(TPR): 0.881
F1 Score: 0.860
Confusion Matrix:
TP: 1234
TN: 8765
FP: 234
FN: 167
TPR (Recall): 0.881
FNR: 0.119
FPR: 0.026
TNR: 0.974
AUC-ROC: 0.9234
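Every scalar in this file except AUC-ROC follows directly from the four confusion-matrix counts. As a minimal sketch of those relationships (using the standard definitions, not Titli's own code), with hypothetical counts and assuming no denominator is zero:

```python
from typing import Dict


def rates_from_counts(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Derive accuracy, precision, recall, F1 and error rates from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),           # identical to TPR
        "f1": 2 * tp / (2 * tp + fp + fn),
        "fnr": fn / (tp + fn),              # 1 - TPR
        "fpr": fp / (fp + tn),
        "tnr": tn / (fp + tn),              # 1 - FPR
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
metrics = rates_from_counts(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC-ROC cannot be recovered this way because it depends on the ranking of the anomaly scores, not on a single thresholded confusion matrix.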
Key Differences: infer() vs evaluate()
infer() - Lightweight Inference
Purpose: Get predictions without generating artifacts
Use cases:
- Online detection systems
- Streaming scenarios
- Custom workflows
- Integration with external systems
Returns: (y_true, y_pred, reconstruction_errors)
No side effects: Doesn’t save anything to disk
evaluate() - Full Evaluation
Purpose: Comprehensive model evaluation with visualization
Use cases:
- Model benchmarking
- Performance analysis
- Development and experimentation
- Generating reports
Returns: None
Side effects: Creates plots and metrics files
Example Comparison
# Use infer() for production/online detection
y_true, y_pred, scores = model.infer(test_loader)
# Now use predictions in your application
anomaly_rate = y_pred.sum() / len(y_pred)
send_alert_if_threshold_exceeded(anomaly_rate)
# Use evaluate() for development/analysis
model.evaluate(test_loader)
# Generates all plots and metrics automatically
# Review ./artifacts/{dataset_name}/ for results
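The comparison above calls `send_alert_if_threshold_exceeded`, which is an application-side hook rather than a Titli function. One possible shape for it is sketched below; the 5% default threshold and the print-based notification are assumptions made for illustration.

```python
from typing import Sequence


def anomaly_rate(y_pred: Sequence[int]) -> float:
    """Fraction of samples predicted as anomalous (1=anomaly, 0=benign)."""
    return sum(y_pred) / len(y_pred) if len(y_pred) else 0.0


def send_alert_if_threshold_exceeded(rate: float, threshold: float = 0.05) -> bool:
    """Hypothetical alert hook: report when the anomaly rate passes `threshold`."""
    exceeded = rate > threshold
    if exceeded:
        # Replace this print with a real notification channel.
        print(f"ALERT: anomaly rate {rate:.1%} exceeds {threshold:.1%}")
    return exceeded


# Ten hypothetical predictions, two of them flagged as anomalies.
y_pred = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
send_alert_if_threshold_exceeded(anomaly_rate(y_pred))
```

Returning a boolean keeps the hook easy to test and lets the caller decide what else to do when an alert fires.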
Best Practices
Always use DataLoaders: Never pass raw arrays to train/infer/evaluate methods
Consistent batch sizes: Use smaller batches (16-32) for training, larger for inference (64+)
Save frequently: Call save() after training to preserve your work
Use default paths: Let Titli manage paths automatically unless you have specific needs
Choose the right method: Use infer() in production, evaluate() for development
Try multiple models: The unified API makes it trivial to compare different models
GPU when available: Pass device=torch.device("cuda") for deep learning models
Common Patterns
Model Comparison
from titli.ids import LOF, OCSVM, Autoencoder
models = {
    "LOF": LOF(dataset_name="comparison", input_size=100, device=device),
    "OCSVM": OCSVM(dataset_name="comparison", input_size=100, device=device),
    "Autoencoder": Autoencoder(dataset_name="comparison", input_size=100, device=device)
}
for name, model in models.items():
    print(f"Training {name}...")
    model.train_model(train_loader)
    model.save()
    model.evaluate(test_loader)
    print(f"{name} complete!\n")
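Once the loop has finished, the per-model metrics files can be read back and compared side by side. The parser below is a rough sketch that assumes the "Name: value" line format shown under Metrics File Content; `parse_metrics` is a hypothetical helper, and the report strings stand in for real file contents.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse 'Name: value' lines into a dict, skipping lines without a numeric value."""
    metrics: Dict[str, float] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            metrics[name.strip()] = float(value)
        except ValueError:
            # e.g. the "Confusion Matrix:" header, which carries no value
            continue
    return metrics


# Hypothetical file contents for two models.
reports = {
    "LOF": "Accuracy: 0.91\nF1 Score: 0.80\nConfusion Matrix:\nTP: 40\n",
    "OCSVM": "Accuracy: 0.94\nF1 Score: 0.85\nConfusion Matrix:\nTP: 45\n",
}
parsed = {name: parse_metrics(text) for name, text in reports.items()}
best = max(parsed, key=lambda name: parsed[name]["F1 Score"])
print(best)  # OCSVM
```

In a real run the strings would come from something like `Path(...).read_text()` on each `{model_name}.txt` file.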
Cross-Validation
from sklearn.model_selection import KFold
from titli.ids import OCSVM

# `data` (an indexable array of samples) and `create_loader` (wraps a subset
# in a DataLoader) are placeholders for your own objects.
kfold = KFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(kfold.split(data)):
    # Create fold-specific loaders
    train_loader = create_loader(data[train_idx])
    val_loader = create_loader(data[val_idx])
    # Train and evaluate
    model = OCSVM(dataset_name=f"fold_{fold}", input_size=100, device=device)
    model.train_model(train_loader)
    model.evaluate(val_loader)
Hyperparameter Search
learning_rates = [0.001, 0.01, 0.1]
for lr in learning_rates:
    model = Autoencoder(
        dataset_name=f"lr_{lr}",
        input_size=100,
        device=device
    )
    model.learning_rate = lr  # Set hyperparameter
    model.train_model(train_loader)
    model.evaluate(test_loader)
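The search above gives each run its own `dataset_name`, so artifacts from different settings do not overwrite each other. When more than one hyperparameter is varied, the same idea can be extended with a small grid generator. This is a sketch under assumptions: `grid` and `run_name` are hypothetical helpers, and `hidden_size` is an illustrative second hyperparameter that the Autoencoder may or may not expose.

```python
from itertools import product
from typing import Dict, Iterator, List


def grid(search_space: Dict[str, List]) -> Iterator[Dict]:
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(search_space)
    for values in product(*(search_space[n] for n in names)):
        yield dict(zip(names, values))


def run_name(params: Dict) -> str:
    """Build a dataset_name-style label so each run gets its own artifact folder."""
    return "_".join(f"{key}_{value}" for key, value in params.items())


# `hidden_size` is a hypothetical second hyperparameter used for illustration.
space = {"lr": [0.001, 0.01], "hidden_size": [16, 32]}
for params in grid(space):
    print(run_name(params))
```

Each printed label could be passed as `dataset_name` when constructing the model for that combination.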