API Reference Summary
=====================

This document provides a consolidated view of the unified API that all IDS models implement.

Public API Contract
-------------------

All six IDS models (LOF, OCSVM, VAE, Autoencoder, ICL, KitNET) implement the same public API.

Method Signatures
~~~~~~~~~~~~~~~~~

.. code-block:: python

    from typing import Optional, Tuple

    import numpy as np
    from torch.utils.data import DataLoader


    class BaseIDSModel:
        """Base interface for all IDS models"""

        def train_model(self, train_loader: DataLoader) -> None:
            """Train the model on training data.

            Args:
                train_loader: PyTorch DataLoader with training data
            """

        def save(self, model_path: Optional[str] = None) -> None:
            """Save trained model to disk.

            Args:
                model_path: Path to save model. If None, uses default path:
                    ./artifacts/{dataset_name}/models/{model_name}.pth
            """

        def load(self, model_path: Optional[str] = None) -> dict:
            """Load trained model from disk.

            Args:
                model_path: Path to load model from. If None, uses default path.

            Returns:
                Checkpoint dictionary with model state
            """

        def infer(self, test_loader: DataLoader) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
            """Lightweight inference without metrics computation.

            Args:
                test_loader: PyTorch DataLoader with test data

            Returns:
                Tuple of (y_true, y_pred, reconstruction_errors):
                    - y_true: Ground truth labels
                    - y_pred: Binary predictions (0=benign, 1=anomaly)
                    - reconstruction_errors: Anomaly scores for each sample
            """

        def evaluate(self, test_loader: DataLoader) -> None:
            """Full evaluation with metrics and visualization.

            Computes F1, Precision, Recall, Accuracy, AUC-ROC and generates:
                - Confusion matrix plot
                - ROC curve plot
                - Anomaly score plot
                - Metrics text file

            All artifacts saved to ./artifacts/{dataset_name}/

            Args:
                test_loader: PyTorch DataLoader with test data
            """

Usage Pattern
~~~~~~~~~~~~~

Every model follows the same pattern (``ModelName`` is a placeholder for any of the six classes):

.. code-block:: python

    from titli.ids import ModelName  # Any of: LOF, OCSVM, VAE, Autoencoder, ICL, KitNET
    from titli.utils import StreamingCSVDataset
    from torch.utils.data import DataLoader
    import torch

    # Setup data
    dataset = StreamingCSVDataset(
        feature_csv_path="features.csv",
        label_csv_path="labels.csv"
    )
    train_loader = DataLoader(dataset, batch_size=32)

    # Train
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = ModelName(
        dataset_name="my_dataset",
        input_size=dataset.input_size,
        device=device
    )
    model.train_model(train_loader)

    # Persist
    model.save()  # or model.save("custom_path.pth")
    model.load()  # or model.load("custom_path.pth")

    # Inference (test_loader is a DataLoader over held-out data, built like train_loader)
    y_true, y_pred, scores = model.infer(test_loader)

    # Evaluation
    model.evaluate(test_loader)

Model Descriptions
------------------

LOF (Local Outlier Factor)
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Type**: Traditional Machine Learning

**Description**: Density-based anomaly detection using local outlier factors.

**When to use**: Small to medium datasets with clear density-based outliers. Works well when anomalies have significantly different local densities than normal samples.

**Initialization**:

.. code-block:: python

    from titli.ids import LOF

    model = LOF(
        dataset_name="my_dataset",
        input_size=100,
        device=device
    )

OCSVM (One-Class SVM)
~~~~~~~~~~~~~~~~~~~~~~

**Type**: Traditional Machine Learning

**Description**: Boundary-based anomaly detection using support vector machines.

**When to use**: Datasets with clear decision boundaries. Effective for high-dimensional data and when you want a well-defined separation between normal and anomalous regions.

**Initialization**:

.. code-block:: python

    from titli.ids import OCSVM

    model = OCSVM(
        dataset_name="my_dataset",
        input_size=100,
        device=device
    )

Autoencoder
~~~~~~~~~~~

**Type**: Deep Learning

**Description**: Reconstruction-based anomaly detection using neural networks.

**When to use**: Complex patterns in high-dimensional data, ideally with a GPU available. The model learns to reconstruct normal patterns, so anomalies produce higher reconstruction errors.

**Initialization**:

.. code-block:: python

    from titli.ids import Autoencoder

    model = Autoencoder(
        dataset_name="my_dataset",
        input_size=100,
        device=device
    )

VAE (Variational Autoencoder)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Type**: Deep Learning

**Description**: Probabilistic deep learning with latent space modeling.

**When to use**: When you need probabilistic anomaly scores or want to model the distribution of normal data in a latent space. Better suited to capturing uncertainty than a standard autoencoder.

**Initialization**:

.. code-block:: python

    from titli.ids import VAE

    model = VAE(
        dataset_name="my_dataset",
        input_size=100,
        device=device
    )

ICL (Instance Contrastive Learning)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Type**: Contrastive Learning

**Description**: Contrastive learning approach for anomaly detection.

**When to use**: When you want to learn discriminative features through contrastive learning. Effective for scenarios where normal samples should cluster together in feature space.

**Initialization**:

.. code-block:: python

    from titli.ids import ICL

    model = ICL(
        dataset_name="my_dataset",
        input_size=100,
        device=device
    )

KitNET
~~~~~~

**Type**: Ensemble

**Description**: Ensemble of autoencoders for online anomaly detection.

**When to use**: Online or streaming scenarios, or when an ensemble method is needed. KitNET adaptively creates an ensemble of small autoencoders, making it efficient for incremental learning.

**Initialization**:

.. code-block:: python

    from titli.ids import KitNET

    model = KitNET(
        dataset_name="my_dataset",
        input_size=100,
        device=device
    )

Complete Workflow Example
--------------------------

Here is a complete example that runs from feature extraction through evaluation:

.. code-block:: python

    from titli.fe import AfterImage
    from titli.utils import StreamingCSVDataset
    from torch.utils.data import DataLoader
    from titli.ids import OCSVM
    import torch

    # Step 1: Extract features
    fe = AfterImage(file_path="traffic.pcap")
    fe.extract_features(output_path="features.csv")

    # Step 2: Create DataLoader
    dataset = StreamingCSVDataset(
        feature_csv_path="features.csv",
        label_csv_path="labels.csv",
        max_samples=100000,
        label_column=0
    )
    train_loader = DataLoader(dataset, batch_size=32, shuffle=False)

    # Step 3: Train model
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    ids = OCSVM(dataset_name="my_dataset", input_size=dataset.input_size, device=device)
    ids.train_model(train_loader)

    # Step 4: Save and load
    ids.save()  # Saves to ./artifacts/my_dataset/models/ocsvm.pth
    ids.load()  # Loads from same path

    # Step 5: Inference
    # The held-out file names below are hypothetical placeholders.
    test_dataset = StreamingCSVDataset(
        feature_csv_path="test_features.csv",
        label_csv_path="test_labels.csv",
        label_column=0
    )
    test_loader = DataLoader(test_dataset, batch_size=32)
    y_true, y_pred, scores = ids.infer(test_loader)

    # Step 6: Full evaluation
    ids.evaluate(test_loader)

Output Artifacts
----------------

Default Paths
~~~~~~~~~~~~~

All models save artifacts to standardized locations:

.. code-block:: text

    ./artifacts/{dataset_name}/
    ├── models/
    │   └── {model_name}.pth          # Trained model
    ├── objects/
    │   └── metrics/
    │       └── {model_name}.txt      # Metrics file
    └── plots/
        ├── confusion_matrix/
        │   └── {model_name}.png      # Confusion matrix plot
        ├── roc/
        │   └── {model_name}.png      # ROC curve plot
        └── anomaly/
            └── {model_name}.png      # Anomaly score plot

Metrics File Content
~~~~~~~~~~~~~~~~~~~~

Example content of a metrics file (``{model_name}.txt``), with illustrative values:

.. code-block:: text

    Accuracy: 0.956
    Precision: 0.892
    Recall(TPR): 0.847
    F1 Score: 0.869

    Confusion Matrix:
    TP: 1234
    TN: 8765
    FP: 234
    FN: 167

    TPR (Recall): 0.847
    FNR: 0.153
    FPR: 0.026
    TNR: 0.974

    AUC-ROC: 0.9234

Key Differences: infer() vs evaluate()
--------------------------------------

**infer()** - Lightweight Inference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* **Purpose**: Get predictions without generating artifacts
* **Use cases**:

  - Online detection systems
  - Streaming scenarios
  - Custom workflows
  - Integration with external systems

* **Returns**: ``(y_true, y_pred, reconstruction_errors)``
* **No side effects**: Doesn't save anything to disk

**evaluate()** - Full Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* **Purpose**: Comprehensive model evaluation with visualization
* **Use cases**:

  - Model benchmarking
  - Performance analysis
  - Development and experimentation
  - Generating reports

* **Returns**: ``None``
* **Side effects**: Creates plots and metrics files

Example Comparison
~~~~~~~~~~~~~~~~~~

.. code-block:: python

    # Use infer() for production/online detection
    y_true, y_pred, scores = model.infer(test_loader)

    # Now use predictions in your application
    anomaly_rate = y_pred.sum() / len(y_pred)
    send_alert_if_threshold_exceeded(anomaly_rate)  # user-defined hook

    # Use evaluate() for development/analysis
    model.evaluate(test_loader)
    # Generates all plots and metrics automatically
    # Review ./artifacts/{dataset_name}/ for results

Best Practices
--------------

1. **Always use DataLoaders**: Never pass raw arrays to the train, infer, or evaluate methods
2. **Size batches by phase**: Use smaller batches (16-32) for training and larger ones (64+) for inference
3. **Save after training**: Call ``save()`` once training finishes to preserve your work
4. **Use default paths**: Let Titli manage paths automatically unless you have specific needs
5. **Choose the right method**: Use ``infer()`` in production, ``evaluate()`` for development
6. **Try multiple models**: The unified API makes it straightforward to compare different models
7. **Use a GPU when available**: Pass ``device=torch.device("cuda")`` for deep learning models

Common Patterns
---------------

The snippets in this section assume that ``device``, ``train_loader``, and ``test_loader`` are already defined as in the usage pattern above.

Model Comparison
~~~~~~~~~~~~~~~~

.. code-block:: python

    from titli.ids import LOF, OCSVM, Autoencoder

    models = {
        "LOF": LOF(dataset_name="comparison", input_size=100, device=device),
        "OCSVM": OCSVM(dataset_name="comparison", input_size=100, device=device),
        "Autoencoder": Autoencoder(dataset_name="comparison", input_size=100, device=device)
    }

    for name, model in models.items():
        print(f"Training {name}...")
        model.train_model(train_loader)
        model.save()
        model.evaluate(test_loader)
        print(f"{name} complete!\n")

Cross-Validation
~~~~~~~~~~~~~~~~

.. code-block:: python

    from sklearn.model_selection import KFold
    from titli.ids import OCSVM

    # data: an indexable array of samples (user-supplied)
    # create_loader: a user-defined helper that wraps a split in a DataLoader
    kfold = KFold(n_splits=5)

    for fold, (train_idx, val_idx) in enumerate(kfold.split(data)):
        # Create fold-specific loaders
        train_loader = create_loader(data[train_idx])
        val_loader = create_loader(data[val_idx])

        # Train and evaluate
        model = OCSVM(dataset_name=f"fold_{fold}", input_size=100, device=device)
        model.train_model(train_loader)
        model.evaluate(val_loader)

Hyperparameter Search
~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from titli.ids import Autoencoder

    learning_rates = [0.001, 0.01, 0.1]

    for lr in learning_rates:
        # A distinct dataset_name per run keeps each run's artifacts separate
        model = Autoencoder(
            dataset_name=f"lr_{lr}",
            input_size=100,
            device=device
        )
        model.learning_rate = lr  # Set hyperparameter
        model.train_model(train_loader)
        model.evaluate(test_loader)
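
Custom Metrics from infer()
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because ``infer()`` writes nothing to disk, any metrics needed in a custom workflow have to be computed from the arrays it returns. The sketch below shows one possible way to derive the confusion-matrix counts and rates listed in the metrics file using NumPy alone. ``confusion_rates`` is a hypothetical helper, not part of Titli, and the small arrays stand in for the ``y_true`` and ``y_pred`` values that ``infer()`` would return. It assumes the 0 = benign, 1 = anomaly convention from the method signatures and reports a rate as 0.0 when its denominator is zero.

.. code-block:: python

    import numpy as np


    def confusion_rates(y_true, y_pred) -> dict:
        """Confusion-matrix counts and derived rates for binary labels.

        Hypothetical helper (not part of Titli); assumes 0 = benign, 1 = anomaly.
        """
        y_true = np.asarray(y_true).astype(bool)
        y_pred = np.asarray(y_pred).astype(bool)

        tp = int(np.sum(y_true & y_pred))    # anomalies flagged as anomalies
        tn = int(np.sum(~y_true & ~y_pred))  # benign samples left unflagged
        fp = int(np.sum(~y_true & y_pred))   # benign samples flagged
        fn = int(np.sum(y_true & ~y_pred))   # anomalies missed

        def ratio(num, den):
            # Guard against empty classes instead of dividing by zero
            return num / den if den else 0.0

        precision = ratio(tp, tp + fp)
        recall = ratio(tp, tp + fn)
        return {
            "TP": tp, "TN": tn, "FP": fp, "FN": fn,
            "accuracy": ratio(tp + tn, tp + tn + fp + fn),
            "precision": precision,
            "recall": recall,
            "f1": ratio(2 * precision * recall, precision + recall),
            "fpr": ratio(fp, fp + tn),
            "fnr": ratio(fn, fn + tp),
        }


    # Stand-in arrays; in practice these would come from model.infer(test_loader)
    y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
    y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0])

    rates = confusion_rates(y_true, y_pred)
    for name, value in rates.items():
        print(f"{name}: {value}")

The resulting dictionary could be logged once per fold or per hyperparameter setting to compare runs without generating plots.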