Utilities (titli.utils)

The titli.utils module provides utility functions and classes for data handling, loss computation, and dataset management.

Overview

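The module groups three kinds of helpers: StreamingCSVDataset, which streams feature and label CSV files into a PyTorch DataLoader; RMSELoss, a root-mean-square-error loss module; and the titli.utils.data functions dataloader and create_directories. They support the feature extraction (titli.fe) and IDS (titli.ids) modules.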

Datasets

StreamingCSVDataset

class titli.utils.StreamingCSVDataset(*args, **kwargs)[source]

Bases: IterableDataset

__init__(feature_csv_path, label_csv_path, max_samples=None, transform=None, label_column=0, skip_header=True)[source]
__iter__()[source]

Iterator over the dataset that supports multi-worker data loading.
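
An IterableDataset used with num_workers > 0 is copied into every worker process, so each copy has to yield a disjoint share of the rows or the loader returns duplicates. How StreamingCSVDataset handles this is not shown on this page. The following is a minimal sketch of one common approach, round-robin striding, using only the standard library. shard_rows is a hypothetical helper; in PyTorch the worker_id and num_workers values would come from torch.utils.data.get_worker_info().

```python
from itertools import islice


def shard_rows(rows, worker_id, num_workers):
    """Yield every num_workers-th row, starting at offset worker_id."""
    return islice(rows, worker_id, None, num_workers)


rows = list(range(10))  # stand-in for parsed CSV rows
# Simulate two workers, each iterating over its own copy of the row stream
shards = [list(shard_rows(iter(rows), w, 2)) for w in range(2)]
# shards[0] holds the even-indexed rows, shards[1] the odd-indexed ones
```

Striding keeps the shards balanced without knowing the file length in advance, at the cost of every worker still reading every row.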

Loss Functions

RMSELoss

class titli.utils.RMSELoss(*args, **kwargs)[source]

Bases: Module

__init__()[source]
forward(x, z)[source]
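
RMSE is conventionally the square root of the mean squared difference between two inputs. The sketch below assumes forward(x, z) follows that definition with a mean over all elements; the exact reduction and any numerical-stability term used by RMSELoss are not documented here, so treat the rmse function as a hypothetical pure-Python illustration rather than the library's code.

```python
import math


def rmse(x, z):
    """Root-mean-square error between two equal-length sequences."""
    if len(x) != len(z):
        raise ValueError("x and z must have the same length")
    squared_errors = [(a - b) ** 2 for a, b in zip(x, z)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


value = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # sqrt((0 + 0 + 4) / 3)
```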

Data Utilities

Directory Management

titli.utils.data.dataloader(pcap_path, fe)[source]
titli.utils.data.create_directories(dataset_name)[source]

Usage Examples

Using StreamingCSVDataset

Load and iterate over CSV data with separate feature and label files:

from titli.utils import StreamingCSVDataset
from torch.utils.data import DataLoader

# Create dataset with feature and label CSV files
dataset = StreamingCSVDataset(
    feature_csv_path="features.csv",
    label_csv_path="labels.csv",
    max_samples=100000,
    label_column=0  # Column index containing labels
)

# Create data loader
loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=False,
    num_workers=2
)

# Iterate over batches
for features, labels in loader:
    # Process batch
    pass

Standard Pattern with IDS Models

The standard workflow integrating StreamingCSVDataset with IDS models:

from titli.fe import AfterImage
from titli.utils import StreamingCSVDataset
from torch.utils.data import DataLoader
from titli.ids import OCSVM
import torch

# Step 1: Extract features to CSV
fe = AfterImage(file_path="traffic.pcap")
fe.extract_features(output_path="features.csv")

# Step 2: Create DataLoader
dataset = StreamingCSVDataset(
    feature_csv_path="features.csv",
    label_csv_path="labels.csv",
    max_samples=100000,
    label_column=0
)
train_loader = DataLoader(dataset, batch_size=32, shuffle=False, num_workers=2)

# Step 3: Train and evaluate model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
ids = OCSVM(dataset_name="my_dataset", input_size=dataset.input_size, device=device)
ids.train_model(train_loader)
ids.evaluate(train_loader)

Using RMSELoss

Use RMSE as a loss function in PyTorch:

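import torch
from torch import nn
from titli.utils import RMSELoss

# Placeholder model and data for illustration; substitute your own
model = nn.Linear(10, 10)
inputs = torch.randn(32, 10)
targets = torch.randn(32, 10)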

# Initialize loss function
criterion = RMSELoss()

# Compute loss
predictions = model(inputs)
loss = criterion(predictions, targets)

# Backward pass
loss.backward()

Directory Management

Create organized directory structures for experiments:

from titli.utils.data import create_directories

# Create standard directory structure
create_directories("my_experiment")

# This creates:
# - my_experiment/models/
# - my_experiment/results/
# - my_experiment/logs/
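
For readers who want the same layout without the library, the structure listed above can be sketched with pathlib. make_experiment_dirs is a hypothetical helper, not the implementation of create_directories; it assumes only the three subdirectory names shown in the comments, and it takes an explicit root so the example can run in a temporary directory.

```python
import tempfile
from pathlib import Path


def make_experiment_dirs(root, name, subdirs=("models", "results", "logs")):
    """Create root/name/<subdir> for each subdir and return the created paths."""
    created = []
    for sub in subdirs:
        path = Path(root) / name / sub
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    return created


root = tempfile.mkdtemp()
paths = make_experiment_dirs(root, "my_experiment")
```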

Advanced Usage

Custom Dataset Processing

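Stream a large CSV file sample by sample instead of loading it into memory:

from titli.utils import StreamingCSVDataset

# Create streaming dataset (arguments follow the documented __init__ signature)
dataset = StreamingCSVDataset(
    feature_csv_path="large_features.csv",
    label_csv_path="large_labels.csv",
    max_samples=None  # No cap: stream the whole file
)

# Iterate directly over the dataset, one sample at a time
for i, (features, labels) in enumerate(dataset):
    if i % 10000 == 0:
        print(f"Processed {i} samples")
    # Your processing logic here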
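
The idea behind streaming can be sketched with the standard library: read both files row by row and yield one pair at a time, so memory use does not grow with file size. stream_pairs is a hypothetical stand-in that borrows its parameter names from the documented __init__ signature. It is not the library's implementation, and the row alignment of the two files is an assumption of this sketch.

```python
import csv
import os
import tempfile
from itertools import islice


def stream_pairs(feature_csv_path, label_csv_path, max_samples=None,
                 label_column=0, skip_header=True):
    """Lazily yield (features, label) pairs from two row-aligned CSV files."""
    with open(feature_csv_path, newline="") as ff, \
            open(label_csv_path, newline="") as lf:
        feature_rows, label_rows = csv.reader(ff), csv.reader(lf)
        if skip_header:
            next(feature_rows, None)
            next(label_rows, None)
        pairs = zip(feature_rows, label_rows)
        # islice with stop=None yields everything; otherwise it caps the stream
        for f_row, l_row in islice(pairs, max_samples):
            # Features become floats; the label is left as the raw CSV string
            yield [float(v) for v in f_row], l_row[label_column]


# Demo on two tiny temporary files (header row plus three data rows each)
tmp_dir = tempfile.mkdtemp()
feature_path = os.path.join(tmp_dir, "features.csv")
label_path = os.path.join(tmp_dir, "labels.csv")
with open(feature_path, "w", newline="") as f:
    csv.writer(f).writerows([["f0", "f1"], [1, 2], [3, 4], [5, 6]])
with open(label_path, "w", newline="") as f:
    csv.writer(f).writerows([["label"], [0], [1], [0]])

samples = list(stream_pairs(feature_path, label_path, max_samples=2))
```

Because stream_pairs is a generator, no file is opened until iteration starts, and only the current row of each file is held at a time.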

Custom Loss Functions

Combine with other PyTorch loss functions:

import torch
import torch.nn as nn
from titli.utils import RMSELoss

class CombinedLoss(nn.Module):
    def __init__(self, alpha=0.5):
        super().__init__()
        self.rmse = RMSELoss()
        self.mse = nn.MSELoss()
        self.alpha = alpha

    def forward(self, pred, target):
        return (
            self.alpha * self.rmse(pred, target)
            + (1 - self.alpha) * self.mse(pred, target)
        )

criterion = CombinedLoss(alpha=0.7)
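
When choosing alpha, note that RMSE grows linearly with the error while MSE grows quadratically, so alpha does not split the loss in a fixed proportion. The sketch below works this out in plain Python for the simplified case where every element is off by the same amount. The combined function is a hypothetical illustration of the arithmetic, not part of titli.

```python
def combined(error, alpha):
    """Combined loss when every element is off by the same absolute error."""
    rmse = abs(error)  # sqrt(mean(e^2)) reduces to |e| for a uniform error
    mse = error ** 2
    return alpha * rmse + (1 - alpha) * mse


small = combined(0.1, alpha=0.7)   # 0.7 * 0.1 + 0.3 * 0.01
large = combined(10.0, alpha=0.7)  # 0.7 * 10 + 0.3 * 100
mse_share_large = (0.3 * 100) / large  # fraction of the loss from the MSE term
```

With alpha=0.7 the RMSE term dominates for small errors, but at an error of 10 the MSE term contributes most of the loss, so the effective weighting shifts over the course of training.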