Feature Extractors (titli.fe)

The titli.fe module provides feature extraction capabilities for transforming raw network traffic into feature vectors suitable for machine learning models.

Overview

Feature extractors are responsible for converting network packet data (typically from PCAP files) into numerical feature vectors. These features capture various aspects of network traffic such as temporal patterns, statistical properties, and protocol-specific information.

Available Feature Extractors

AfterImage

class titli.fe.AfterImage(file_path, limit=inf, decay_factors=[5, 3, 1, 0.1, 0.01], max_pkt=inf, dataset_name=None, state=None, **kwargs)[source]

Bases: BaseTrafficFeatureExtractor

AfterImage feature extractor using incremental statistics over time windows.

AfterImage is a packet-based feature extractor that maintains incremental statistics across multiple exponentially-decaying time windows. It tracks network flows at different granularities (MAC-IP pairs, channels, sockets, jitter) to capture both normal and anomalous traffic patterns.
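As a rough illustration of incremental statistics under exponential decay, the sketch below keeps a damped weight, sum, and squared sum for one stream and one decay factor. It assumes the decay form 2^(-λ·Δt) from the original Kitsune AfterImage description; the class name DampedStat and its methods are hypothetical and are not part of titli.fe.

```python
import math


class DampedStat:
    """Hypothetical damped incremental statistic for a single stream."""

    def __init__(self, decay_factor):
        self.decay_factor = decay_factor  # lambda; larger values forget faster
        self.weight = 0.0       # damped count of observations
        self.linear_sum = 0.0   # damped sum of values
        self.squared_sum = 0.0  # damped sum of squared values
        self.last_time = None

    def insert(self, value, timestamp):
        # Decay the accumulated sums by the time elapsed since the last update.
        if self.last_time is not None:
            elapsed = max(timestamp - self.last_time, 0.0)
            factor = math.pow(2.0, -self.decay_factor * elapsed)
            self.weight *= factor
            self.linear_sum *= factor
            self.squared_sum *= factor
        self.last_time = timestamp
        self.weight += 1.0
        self.linear_sum += value
        self.squared_sum += value * value

    def mean(self):
        return self.linear_sum / self.weight if self.weight else 0.0

    def std(self):
        if not self.weight:
            return 0.0
        variance = self.squared_sum / self.weight - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))


stat = DampedStat(decay_factor=1.0)
stat.insert(100.0, timestamp=0.0)
stat.insert(200.0, timestamp=1.0)  # the first packet now counts half
```

Because only three running sums are stored per stream and decay factor, memory use does not grow with the number of packets seen.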

limit

Maximum number of records to store in the database

Type:

float

decay_factors

Time windows for exponential decay (in seconds)

Type:

list

max_pkt

Maximum number of packets to process

Type:

float

state

Internal state maintaining network statistics

Type:

NetStat

__init__(file_path, limit=inf, decay_factors=[5, 3, 1, 0.1, 0.01], max_pkt=inf, dataset_name=None, state=None, **kwargs)[source]

Initialize AfterImage feature extractor.

Parameters:
  • file_path (str) – Path to the PCAP file to process

  • limit (float, optional) – Maximum number of records in the statistics database. Defaults to inf (unlimited)

  • decay_factors (list, optional) – Time windows (in seconds) for exponential decay. Defaults to [5, 3, 1, 0.1, 0.01]

  • max_pkt (float, optional) – Maximum number of packets to process. Defaults to inf (all packets)

  • dataset_name (str, optional) – Name of the dataset (deprecated). Defaults to None

  • state (NetStat, optional) – Pre-existing state to continue from. Defaults to None

  • **kwargs – Additional arguments passed to base class

setup(output_path=None)[source]

Set up AfterImage, creating a new NetStat database when no pre-existing state was supplied.

Parameters:

output_path (str or Path, optional) – Custom output path for features. Defaults to None (uses PCAP directory)

peek(traffic_vectors)[source]

Simulate feature extraction without modifying internal state.

Creates a temporary copy of the statistics database, applies traffic vectors to it, and returns features without persisting changes.

Parameters:

traffic_vectors (list) – List of traffic vectors, each containing [IPtype, srcMAC, dstMAC, srcIP, srcproto, dstIP, dstproto, timestamp, size]

Returns:

List of feature arrays corresponding to each traffic vector

Return type:

list
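The copy-then-apply idea behind peek can be sketched with a toy extractor. Everything here (ToyExtractor, its per-source packet counter) is a hypothetical stand-in rather than titli's implementation; only the nine-field traffic vector layout is taken from the description above.

```python
import copy


class ToyExtractor:
    """Hypothetical stand-in that counts packets per source IP."""

    def __init__(self):
        self.counts = {}

    def _apply(self, counts, traffic_vector):
        src_ip = traffic_vector[3]  # srcIP sits at index 3 in the documented layout
        counts[src_ip] = counts.get(src_ip, 0) + 1
        return [counts[src_ip]]

    def update(self, traffic_vector):
        # Persists the change in the real state.
        return self._apply(self.counts, traffic_vector)

    def peek(self, traffic_vectors):
        # Works on a deep copy, so the real state is left untouched.
        scratch = copy.deepcopy(self.counts)
        return [self._apply(scratch, tv) for tv in traffic_vectors]


vector = [0, "aa:aa", "bb:bb", "10.0.0.1", 1234, "10.0.0.2", 80, 0.0, 60]
toy = ToyExtractor()
toy.update(vector)
preview = toy.peek([vector, vector])
```

After the call, preview reflects two further packets while toy.counts still records only the one packet passed to update.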

extract_features(output_path=None)[source]

Extract features from PCAP file packet by packet.

Main processing loop that:
  1. Reads packets from the PCAP file

  2. Extracts a traffic vector from each packet

  3. Computes features using incremental statistics

  4. Writes results to CSV files in batches

  5. Handles timestamp offsets when continuing from existing state
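The batching part of the loop described above might look like the following minimal sketch. The helper write_in_batches, its batch_size parameter, and the in-memory buffer are assumptions made for illustration; titli's actual batching logic is not shown in this reference.

```python
import csv
import io


def write_in_batches(rows, header, out, batch_size=2):
    """Write rows to a CSV stream, flushing every batch_size rows."""
    writer = csv.writer(out)
    writer.writerow(header)
    batch, written = [], 0
    for row in rows:
        if row is None:  # mirrors skipping packets that yield no traffic vector
            continue
        batch.append(row)
        if len(batch) >= batch_size:
            writer.writerows(batch)
            written += len(batch)
            batch.clear()
    if batch:  # flush the final partial batch
        writer.writerows(batch)
        written += len(batch)
    return written


buffer = io.StringIO()
written = write_in_batches([[1, 2], None, [3, 4], [5, 6]], ["f0", "f1"], buffer)
```

Writing in batches keeps the number of write calls low for large captures while bounding how many rows are held in memory at once.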

Parameters:

output_path (str or Path, optional) – Custom output path for features. Defaults to None (uses PCAP directory)

update(traffic_vector)[source]

Update internal state and compute features for a traffic vector.

Parameters:

traffic_vector (list) – Traffic vector containing [IPtype, srcMAC, dstMAC, srcIP, srcproto, dstIP, dstproto, timestamp, size]

Returns:

Extracted features from incremental statistics

Return type:

np.ndarray

get_traffic_vector(packet)[source]

Extract traffic vector from a network packet.

Processes IP, IPv6, TCP, UDP, ARP, and ICMP packets to extract network flow identifiers and metadata.

Parameters:

packet (scapy.packet.Packet) – Input packet from PCAP

Returns:

Traffic vector as [IPtype, srcMAC, dstMAC, srcIP, srcproto, dstIP, dstproto, timestamp, size], or None if packet should be skipped

Return type:

list or None
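Since the traffic vector is a positional list, a named view can make downstream code easier to read. The TrafficVector type below is a hypothetical convenience wrapper, not something titli.fe provides, and the field types are assumptions based on the parameter descriptions in this reference.

```python
from typing import NamedTuple, Union


class TrafficVector(NamedTuple):
    """Hypothetical named view of the nine documented fields."""
    ip_type: int
    src_mac: str
    dst_mac: str
    src_ip: str
    src_proto: Union[int, str]
    dst_ip: str
    dst_proto: Union[int, str]
    timestamp: float
    size: int


raw = [0, "aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02",
       "192.168.0.10", 51514, "192.168.0.1", 443, 1700000000.25, 1500]
tv = TrafficVector(*raw)  # fields can now be read as tv.src_ip, tv.size, ...
```

Converting back with list(tv) yields the original positional layout, so the wrapper can be dropped wherever a plain list is expected.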

get_headers()[source]

Get feature column names for CSV output.

Returns:

Feature names combining stream type, time window, and statistic

Return type:

list[str]
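One way to build names that combine stream type, time window, and statistic is a Cartesian product over the three lists. The stream and statistic names below are illustrative placeholders; the real column names returned by get_headers may differ.

```python
from itertools import product

# All three lists are illustrative placeholders, not titli's real names.
stream_types = ["mac_ip", "channel", "socket", "jitter"]
decay_factors = [5, 3, 1, 0.1, 0.01]
statistics = ["weight", "mean", "std"]

headers = [
    f"{stream}_{decay}_{stat}"
    for stream, decay, stat in product(stream_types, decay_factors, statistics)
]
```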

get_meta_headers()[source]

Get traffic vector column names for metadata CSV.

Returns:

Metadata column names

Return type:

list[str]

NetStat

class titli.fe.NetStat(decay_factors=[5, 3, 1, 0.1, 0.01], limit=100000.0)[source]

Bases: object

__init__(decay_factors=[5, 3, 1, 0.1, 0.01], limit=100000.0)[source]

Data structure for efficient network statistics queries.

Parameters:
  • decay_factors (list, optional) – Time windows for exponential decay. Defaults to [5, 3, 1, 0.1, 0.01].

  • limit (int, optional) – Maximum number of incremental statistics (incstats) records. Defaults to 1e5.

get_records(traffic_vectors)[source]

Get the records associated with all given traffic vectors.

Parameters:

traffic_vectors (list) – List of all traffic vectors

Returns:

Database containing the relevant records

Return type:

IncstatDB

update_get_stats(IPtype, srcMAC, dstMAC, srcIP, srcProtocol, dstIP, dstProtocol, timestamp, datagramSize, db=None)[source]

Update the NetStat state with the fields of a traffic vector and return the extracted features.

Parameters:
  • IPtype (int) – IP type: 0 for IPv4, 1 for IPv6 (effectively unused)

  • srcMAC (str) – Source MAC address

  • dstMAC (str) – Destination MAC address

  • srcIP (str) – Source IP address

  • srcProtocol (int) – Source port

  • dstIP (str) – Destination IP address

  • dstProtocol (int) – Destination port

  • timestamp (float) – Packet arrival time

  • datagramSize (float) – Packet size

  • db (IncStatDB, optional) – Dummy database to update instead of the real one. If None, the real database is updated. Defaults to None.

Returns:

Extracted features

Return type:

array

Base Classes

BaseTrafficFeatureExtractor

class titli.fe.base_feature_extractor.BaseTrafficFeatureExtractor(file_path, dataset_name=None, state=None, **kwargs)[source]

Bases: ABC

Abstract base class for network traffic feature extraction.

This class provides the framework for extracting features from network traffic captured in PCAP files. Subclasses must implement the abstract methods to define specific feature extraction logic.

file_path

Path to the input PCAP file

Type:

str

state

Optional pre-existing state to continue from previous extraction

feature_file

Output file handle for extracted features

meta_file

Output file handle for traffic vector metadata

count

Number of packets successfully processed

Type:

int

skipped

Number of packets skipped during processing

Type:

int

__init__(file_path, dataset_name=None, state=None, **kwargs)[source]

Initialize the feature extractor.

Parameters:
  • file_path (str) – Path to the PCAP file to process

  • dataset_name (str, optional) – Name of the dataset (deprecated, not used)

  • state (NetStat, optional) – Pre-existing state to continue from. If None, starts fresh extraction

  • **kwargs – Additional arguments for subclass customization

abstractmethod update(traffic_vector)[source]

Update the feature extractor with a new traffic vector.

This method processes a traffic vector and updates the internal state of the feature extractor, returning the computed features.

Parameters:

traffic_vector (np.ndarray) – Traffic vector extracted from packet(s)

Returns:

Extracted features corresponding to the traffic vector

Return type:

np.ndarray

abstractmethod peek(traffic_vectors)[source]

Simulate feature extraction without updating internal state.

This method performs a “dry run” of feature extraction without modifying the extractor’s state. Useful for adversarial attacks or what-if analysis.

Parameters:

traffic_vectors (list) – List of traffic vectors to process

Returns:

List of features corresponding to each traffic vector

Return type:

list

abstractmethod get_traffic_vector(packet)[source]

Extract traffic vector from a raw network packet.

Parameters:

packet (scapy.packet.Packet) – Input packet to process

Returns:

Extracted traffic vector, or None if packet should be skipped

Return type:

np.ndarray or None

setup(output_path=None)[source]

Set up the feature extractor for processing.

Opens the input PCAP file, creates output CSV files for features and metadata, and initializes processing counters and state management flags.

Parameters:

output_path (str or Path, optional) – Custom path for the output feature file. If None, creates the feature file in the same directory as the input PCAP with a .csv extension. The metadata file will be created with a ‘_meta.csv’ suffix in the same directory.

Side Effects:
  • Opens input PCAP file for reading

  • Creates and opens feature and metadata CSV files for writing

  • Initializes count, skipped counters to 0

  • Sets state management flags based on whether pre-existing state was provided
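The default output locations described for setup can be approximated with pathlib. The helper default_output_paths is hypothetical, and the exact file names titli produces (in particular the stem used for the metadata file) are an assumption here.

```python
from pathlib import Path


def default_output_paths(pcap_path):
    """Return hypothetical (feature, metadata) CSV paths next to the PCAP."""
    pcap = Path(pcap_path)
    feature_path = pcap.with_suffix(".csv")              # same name, .csv extension
    meta_path = pcap.with_name(pcap.stem + "_meta.csv")  # same directory, _meta.csv suffix
    return feature_path, meta_path


feature_path, meta_path = default_output_paths("captures/traffic.pcap")
```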

abstractmethod get_headers()[source]

Get the column names for the feature CSV file.

Returns:

List of feature column names

Return type:

list[str]

abstractmethod get_meta_headers()[source]

Get the column names for the metadata/traffic vector CSV file.

Returns:

List of metadata column names

Return type:

list[str]

teardown()[source]

Clean up resources and finalize feature extraction.

Closes all open files (PCAP input, feature output, metadata output), prints processing statistics, and saves the extractor state if configured.

Side Effects:
  • Closes all open file handles

  • Prints processing statistics (skipped, processed, written counts)

  • Saves state to ‘state.pkl’ in the PCAP directory if save_state is True
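The .pkl extension suggests the state is serialized with Python's pickle module, although this reference does not confirm it. Under that assumption, a save and reload round trip looks roughly like the sketch below, which uses a plain dictionary as a toy stand-in for the real state object.

```python
import pickle
import tempfile
from pathlib import Path

state = {"packets_seen": 1200, "last_timestamp": 1700000000.5}  # toy stand-in

with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.pkl"
    # Save the state next to the (here temporary) capture directory.
    with state_path.open("wb") as fh:
        pickle.dump(state, fh)
    # Reload it later, for example before processing a follow-up capture.
    with state_path.open("rb") as fh:
        restored = pickle.load(fh)
```

A reloaded object could then be passed through the state parameter documented for __init__. Unpickling can execute arbitrary code, so only load state files from a trusted source.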

abstractmethod extract_features()[source]

Main entry point for feature extraction from PCAP file.

This method should implement the complete feature extraction pipeline: reading packets from the input PCAP, extracting traffic vectors, computing features, and writing results to output files.

Must call setup() before and teardown() after processing.
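One way a subclass might honor the setup and teardown requirement is to wrap processing in try/finally. ToyBase and ToyPcapExtractor below are hypothetical stand-ins that only record the call order; they are a design sketch, not the library's own implementation.

```python
from abc import ABC, abstractmethod


class ToyBase(ABC):
    """Hypothetical stand-in for an extractor base class."""

    def __init__(self):
        self.events = []

    def setup(self):
        self.events.append("setup")

    def teardown(self):
        self.events.append("teardown")

    @abstractmethod
    def extract_features(self):
        ...


class ToyPcapExtractor(ToyBase):
    def extract_features(self):
        self.setup()
        try:
            self.events.append("process")  # packet loop would go here
        finally:
            # Runs even if processing raises, so files are always closed.
            self.teardown()


extractor = ToyPcapExtractor()
extractor.extract_features()
```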

BaseFeatureExtractor

Helper Functions

Usage Examples

Basic Feature Extraction

Extract features from a PCAP file and output to CSV for DataLoader usage:

from titli.fe import AfterImage

fe = AfterImage(file_path="traffic.pcap")
fe.extract_features(output_path="features.csv")

With Custom Parameters

from titli.fe import AfterImage

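fe = AfterImage(
    file_path="traffic.pcap",
    decay_factors=[5, 3, 1, 0.1, 0.01],  # time windows for exponential decay
    max_pkt=100000,  # stop after processing 100,000 packets
    limit=10000,  # cap on records in the statistics database
)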
fe.extract_features(output_path="features.csv")

Integration with DataLoader

Use extracted features with DataLoader for model training:

from titli.fe import AfterImage
from titli.utils import StreamingCSVDataset
from torch.utils.data import DataLoader

# Extract features
fe = AfterImage(file_path="traffic.pcap")
fe.extract_features(output_path="features.csv")

# Load with DataLoader
dataset = StreamingCSVDataset(
    feature_csv_path="features.csv",
    label_csv_path="labels.csv",
    label_column=0
)
loader = DataLoader(dataset, batch_size=32)

Note

Feature extractors output CSV files that should be consumed via StreamingCSVDataset and DataLoader for model training. Direct use of extracted features is discouraged. State management is now handled internally by the feature extractors.