Feature Extractors (titli.fe)
The titli.fe module provides feature extraction capabilities for transforming raw network traffic into feature vectors suitable for machine learning models.
Overview
Feature extractors are responsible for converting network packet data (typically from PCAP files) into numerical feature vectors. These features capture various aspects of network traffic such as temporal patterns, statistical properties, and protocol-specific information.
Available Feature Extractors
AfterImage
- class titli.fe.AfterImage(file_path, limit=inf, decay_factors=[5, 3, 1, 0.1, 0.01], max_pkt=inf, dataset_name=None, state=None, **kwargs)[source]
Bases: BaseTrafficFeatureExtractor
AfterImage feature extractor using incremental statistics over time windows.
AfterImage is a packet-based feature extractor that maintains incremental statistics across multiple exponentially-decaying time windows. It tracks network flows at different granularities (MAC-IP pairs, channels, sockets, jitter) to capture both normal and anomalous traffic patterns.
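To make the idea of incremental statistics over exponentially-decaying windows concrete, here is a minimal sketch of a single damped statistic. It is an illustration only: the class name DecayedStat and the decay form 2^(-lambda * dt) are assumptions, not titli's actual implementation, which may differ.

```python
import math


class DecayedStat:
    """Toy damped incremental statistic (illustrative, not titli's code)."""

    def __init__(self, decay_factor):
        self.decay_factor = decay_factor  # larger values forget old data faster
        self.weight = 0.0        # decayed count of observations
        self.linear_sum = 0.0    # decayed sum of values
        self.squared_sum = 0.0   # decayed sum of squared values
        self.last_time = None

    def update(self, value, timestamp):
        if self.last_time is not None:
            # Assumed decay: shrink the old sums by 2^(-lambda * dt).
            dt = timestamp - self.last_time
            decay = 2.0 ** (-self.decay_factor * dt)
            self.weight *= decay
            self.linear_sum *= decay
            self.squared_sum *= decay
        self.last_time = timestamp
        self.weight += 1.0
        self.linear_sum += value
        self.squared_sum += value * value

    def mean(self):
        return self.linear_sum / self.weight

    def std(self):
        variance = self.squared_sum / self.weight - self.mean() ** 2
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative round-off


# One statistic per decay factor, all fed the same packet sizes.
stats = [DecayedStat(f) for f in [5, 3, 1, 0.1, 0.01]]
for size, ts in [(60, 0.0), (1500, 0.2), (60, 0.9)]:
    for stat in stats:
        stat.update(size, ts)
```

Each statistic needs only three running sums and a timestamp, so memory use stays constant no matter how many packets are processed.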
- __init__(file_path, limit=inf, decay_factors=[5, 3, 1, 0.1, 0.01], max_pkt=inf, dataset_name=None, state=None, **kwargs)[source]
Initialize AfterImage feature extractor.
- Parameters:
file_path (str) – Path to the PCAP file to process
limit (float, optional) – Maximum number of records in the statistics database. Defaults to inf (unlimited)
decay_factors (list, optional) – Time windows (in seconds) for exponential decay. Defaults to [5, 3, 1, 0.1, 0.01]
max_pkt (float, optional) – Maximum number of packets to process. Defaults to inf (all packets)
dataset_name (str, optional) – Name of the dataset (deprecated). Defaults to None
state (NetStat, optional) – Pre-existing state to continue from. Defaults to None
**kwargs – Additional arguments passed to base class
- setup(output_path=None)[source]
Set up AfterImage with NetStat database if starting fresh.
- Parameters:
output_path (str or Path, optional) – Custom output path for features. Defaults to None (uses PCAP directory)
- peek(traffic_vectors)[source]
Simulate feature extraction without modifying internal state.
Creates a temporary copy of the statistics database, applies traffic vectors to it, and returns features without persisting changes.
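The copy-then-apply pattern described for peek can be sketched as follows. RunningCounter is a hypothetical stand-in for the statistics database, and the free function peek below only illustrates the idea; neither reflects titli's internals.

```python
import copy


class RunningCounter:
    """Hypothetical stand-in for an extractor's statistics database."""

    def __init__(self):
        self.totals = {}

    def update(self, key, size):
        # Accumulate bytes per key and return the running total as the "feature".
        self.totals[key] = self.totals.get(key, 0) + size
        return self.totals[key]


def peek(state, traffic_vectors):
    """Compute features for traffic_vectors without modifying state."""
    scratch = copy.deepcopy(state)  # the temporary copy absorbs every update
    return [scratch.update(key, size) for key, size in traffic_vectors]


state = RunningCounter()
state.update("a", 10)
preview = peek(state, [("a", 5), ("a", 5), ("b", 1)])
```

After the call, state still holds only the first update, while preview reflects what the features would have been had the vectors been applied.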
- extract_features(output_path=None)[source]
Extract features from PCAP file packet by packet.
Main processing loop that:
1. Reads packets from the PCAP file
2. Extracts traffic vectors from each packet
3. Computes features using incremental statistics
4. Writes results to CSV files in batches
5. Handles timestamp offsets when continuing from existing state
- Parameters:
output_path (str or Path, optional) – Custom output path for features. Defaults to None (uses PCAP directory)
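The batched-write step of the extract_features loop can be sketched with the standard library alone. The helper write_in_batches and its batch_size parameter are hypothetical; titli's own loop, batch size, and file layout are not shown here and may differ.

```python
import csv
import io


def write_in_batches(rows, out_file, batch_size=1000):
    """Write rows to CSV in batches; return (written, skipped) counts."""
    writer = csv.writer(out_file)
    batch, written, skipped = [], 0, 0
    for row in rows:
        if row is None:  # mirrors packets that yield no traffic vector
            skipped += 1
            continue
        batch.append(row)
        if len(batch) >= batch_size:
            writer.writerows(batch)
            written += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        writer.writerows(batch)
        written += len(batch)
    return written, skipped


buffer = io.StringIO()
counts = write_in_batches([[1, 2], None, [3, 4], [5, 6]], buffer, batch_size=2)
```

Buffering rows before writing keeps the number of write calls low on large captures, at the cost of holding up to batch_size rows in memory.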
- update(traffic_vector)[source]
Update internal state and compute features for a traffic vector.
- Parameters:
traffic_vector (list) – Traffic vector containing [IPtype, srcMAC, dstMAC, srcIP, srcproto, dstIP, dstproto, timestamp, size]
- Returns:
Extracted features from incremental statistics
- Return type:
np.ndarray
- get_traffic_vector(packet)[source]
Extract traffic vector from a network packet.
Processes IP, IPv6, TCP, UDP, ARP, and ICMP packets to extract network flow identifiers and metadata.
- Parameters:
packet (scapy.packet.Packet) – Input packet from PCAP
- Returns:
Traffic vector as [IPtype, srcMAC, dstMAC, srcIP, srcproto, dstIP, dstproto, timestamp, size], or None if the packet should be skipped
- Return type:
list or None
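Because the traffic vector is a positional list of nine elements, a named view can make downstream code easier to read. The TrafficVector type below is hypothetical and not part of titli; the field types are assumptions based on the parameter descriptions on this page.

```python
from typing import NamedTuple


class TrafficVector(NamedTuple):
    """Hypothetical named view of the nine-element traffic vector."""
    ip_type: int
    src_mac: str
    dst_mac: str
    src_ip: str
    src_proto: int
    dst_ip: str
    dst_proto: int
    timestamp: float
    size: int


# Example values are made up for illustration.
raw = [0, "aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02",
       "10.0.0.1", 443, "10.0.0.2", 51234, 1700000000.5, 60]
vector = TrafficVector(*raw)
```

A NamedTuple still behaves like a sequence, so list(vector) recovers the original positional form when an API expects a plain list.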
NetStat
- class titli.fe.NetStat(decay_factors=[5, 3, 1, 0.1, 0.01], limit=100000.0)[source]
Bases: object
- __init__(decay_factors=[5, 3, 1, 0.1, 0.01], limit=100000.0)[source]
Data structure for efficient network statistics queries.
- get_records(traffic_vectors)[source]
Get the records associated with all traffic vectors.
- Parameters:
traffic_vectors (list) – list of all traffic vectors
- Returns:
database containing relevant records
- Return type:
IncstatDB
- update_get_stats(IPtype, srcMAC, dstMAC, srcIP, srcProtocol, dstIP, dstProtocol, timestamp, datagramSize, db=None)[source]
Update the NetStat with the fields of a single traffic vector and return the resulting features.
- Parameters:
IPtype (int) – IP type: 0 for IPv4, 1 for IPv6 (currently unused)
srcMAC (str) – source MAC
dstMAC (str) – destination MAC
srcIP (str) – source IP
srcProtocol (int) – source port
dstIP (str) – destination IP
dstProtocol (int) – destination port
timestamp (float) – arrival time
datagramSize (float) – packet size
db (IncStatDB, optional) – Temporary database to update instead of the real one. If None, the real database is updated. Defaults to None.
- Returns:
Extracted features
- Return type:
array
Base Classes
BaseTrafficFeatureExtractor
- class titli.fe.base_feature_extractor.BaseTrafficFeatureExtractor(file_path, dataset_name=None, state=None, **kwargs)[source]
Bases: ABC
Abstract base class for network traffic feature extraction.
This class provides the framework for extracting features from network traffic captured in PCAP files. Subclasses must implement the abstract methods to define specific feature extraction logic.
- state
Optional pre-existing state to continue from previous extraction
- feature_file
Output file handle for extracted features
- meta_file
Output file handle for traffic vector metadata
- __init__(file_path, dataset_name=None, state=None, **kwargs)[source]
Initialize the feature extractor.
- abstractmethod update(traffic_vector)[source]
Update the feature extractor with a new traffic vector.
This method processes a traffic vector and updates the internal state of the feature extractor, returning the computed features.
- Parameters:
traffic_vector (np.ndarray) – Traffic vector extracted from packet(s)
- Returns:
Extracted features corresponding to the traffic vector
- Return type:
np.ndarray
- abstractmethod peek(traffic_vectors)[source]
Simulate feature extraction without updating internal state.
This method performs a “dry run” of feature extraction without modifying the extractor’s state. Useful for adversarial attacks or what-if analysis.
- abstractmethod get_traffic_vector(packet)[source]
Extract traffic vector from a raw network packet.
- Parameters:
packet (scapy.packet.Packet) – Input packet to process
- Returns:
Extracted traffic vector, or None if packet should be skipped
- Return type:
np.ndarray or None
- setup(output_path=None)[source]
Set up the feature extractor for processing.
Opens the input PCAP file, creates output CSV files for features and metadata, and initializes processing counters and state management flags.
- Parameters:
output_path (str or Path, optional) – Custom path for the output feature file. If None, creates the feature file in the same directory as the input PCAP with a .csv extension. The metadata file will be created with a ‘_meta.csv’ suffix in the same directory.
- Side Effects:
Opens input PCAP file for reading
Creates and opens feature and metadata CSV files for writing
Initializes the count and skipped counters to 0
Sets state management flags based on whether pre-existing state was provided
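The default output naming described for setup() can be sketched with pathlib. The helper default_output_paths is hypothetical, and the exact file names titli produces (in particular how the metadata name is derived when a custom output_path is given) are assumptions.

```python
from pathlib import Path


def default_output_paths(pcap_path, output_path=None):
    """Return (feature_path, meta_path); the naming details are assumptions."""
    if output_path is None:
        # Same directory as the PCAP, with a .csv extension.
        feature_path = Path(pcap_path).with_suffix(".csv")
    else:
        feature_path = Path(output_path)
    # Metadata file sits next to the feature file with a _meta.csv suffix.
    meta_path = feature_path.with_name(feature_path.stem + "_meta.csv")
    return feature_path, meta_path


features, meta = default_output_paths("captures/traffic.pcap")
```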
- abstractmethod get_meta_headers()[source]
Get the column names for the metadata/traffic vector CSV file.
- teardown()[source]
Clean up resources and finalize feature extraction.
Closes all open files (PCAP input, feature output, metadata output), prints processing statistics, and saves the extractor state if configured.
- Side Effects:
Closes all open file handles
Prints processing statistics (skipped, processed, written counts)
Saves state to ‘state.pkl’ in the PCAP directory if save_state is True
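The state file written by teardown() has a .pkl extension, which suggests Python's pickle module, though this page does not confirm the serialization format. Under that assumption, a save and restore round trip looks roughly like this, with a plain dictionary standing in for the real state object:

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for an extractor's state object.
state = {"packets_seen": 3, "last_timestamp": 12.5}

with tempfile.TemporaryDirectory() as tmp:
    state_path = Path(tmp) / "state.pkl"
    with open(state_path, "wb") as handle:
        pickle.dump(state, handle)       # save
    with open(state_path, "rb") as handle:
        restored = pickle.load(handle)   # restore
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.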
- abstractmethod extract_features()[source]
Main entry point for feature extraction from PCAP file.
This method should implement the complete feature extraction pipeline: reading packets from the input PCAP, extracting traffic vectors, computing features, and writing results to output files.
Implementations must call setup() before processing and teardown() after it.
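The setup/process/teardown contract can be sketched with a toy class hierarchy. ToyBase and ToyExtractor are hypothetical and much simpler than BaseTrafficFeatureExtractor; the try/finally wrapper is one reasonable way to guarantee that teardown runs, not necessarily how titli's extractors do it.

```python
from abc import ABC, abstractmethod


class ToyBase(ABC):
    """Hypothetical base class; not titli's BaseTrafficFeatureExtractor."""

    def __init__(self):
        self.events = []

    def setup(self):
        self.events.append("setup")      # e.g. open input and output files

    def teardown(self):
        self.events.append("teardown")   # e.g. close files, save state

    @abstractmethod
    def extract_features(self):
        """Subclasses implement the full extraction pipeline."""


class ToyExtractor(ToyBase):
    def extract_features(self):
        self.setup()
        try:
            self.events.append("process")  # placeholder for the packet loop
        finally:
            self.teardown()  # runs even if processing raises


extractor = ToyExtractor()
extractor.extract_features()
```

Instantiating ToyBase directly raises TypeError because extract_features is abstract, which is how the ABC enforces that subclasses supply the pipeline.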
BaseFeatureExtractor
Helper Functions
Usage Examples
Basic Feature Extraction
Extract features from a PCAP file and output to CSV for DataLoader usage:
from titli.fe import AfterImage
fe = AfterImage(file_path="traffic.pcap")
fe.extract_features(output_path="features.csv")
With Custom Parameters
from titli.fe import AfterImage
fe = AfterImage(
    file_path="traffic.pcap",
    decay_factors=[5, 3, 1, 0.1, 0.01],
    max_pkt=100000,
    limit=10000,
)
fe.extract_features(output_path="features.csv")
Integration with DataLoader
Use extracted features with DataLoader for model training:
from titli.fe import AfterImage
from titli.utils import StreamingCSVDataset
from torch.utils.data import DataLoader
# Extract features
fe = AfterImage(file_path="traffic.pcap")
fe.extract_features(output_path="features.csv")
# Load with DataLoader
dataset = StreamingCSVDataset(
    feature_csv_path="features.csv",
    label_csv_path="labels.csv",
    label_column=0,
)
loader = DataLoader(dataset, batch_size=32)
Note
Feature extractors output CSV files that should be consumed via StreamingCSVDataset
and DataLoader for model training. Direct use of extracted features is discouraged.
State management is now handled internally by the feature extractors.