Technical Architecture

Lunyn's parsing engine represents a complete rethinking of market data processing architecture. Traditional approaches sacrifice either performance or maintainability. We engineered both.

Core Architecture

Zero-Copy Parsing Strategy

Parsing operations occur directly on ingested buffers without intermediate allocations. Message boundaries are identified through pointer arithmetic, and field access happens via calculated offsets.

Performance impact: 3-4x throughput improvement vs. allocation-heavy approaches.
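
As an illustration of offset-based access, here is a minimal Python sketch. It is not Lunyn's implementation or API. The 2-byte big-endian length prefix and the helper names (`iter_messages`, `add_order_shares_price`, `build_add_order`) are assumptions made for the example. The Add Order layout follows the public ITCH 5.0 specification as we read it and should be checked against that document. Python shows the access pattern only, not the performance.

```python
import struct

# Assumed framing: each message is preceded by a 2-byte big-endian length.
LEN_PREFIX = struct.Struct(">H")

# Add Order ('A'), 36 bytes, big-endian (our reading of the public ITCH 5.0 spec):
# type(1) locate(2) tracking(2) timestamp(6) order_ref(8) side(1) shares(4) stock(8) price(4)
ADD_ORDER = struct.Struct(">cHH6sQcI8sI")
OFF_SHARES = 20  # 1 + 2 + 2 + 6 + 8 + 1
OFF_PRICE = 32   # OFF_SHARES + 4 + 8


def iter_messages(buf):
    """Yield (start, length) for each framed message; no payload bytes are copied."""
    view = memoryview(buf)
    pos, end = 0, len(view)
    while pos + LEN_PREFIX.size <= end:
        (length,) = LEN_PREFIX.unpack_from(view, pos)
        start = pos + LEN_PREFIX.size
        if start + length > end:
            break  # incomplete trailing message
        yield start, length
        pos = start + length


def add_order_shares_price(view, start):
    """Read two fields straight from the buffer at calculated offsets."""
    (shares,) = struct.unpack_from(">I", view, start + OFF_SHARES)
    (ticks,) = struct.unpack_from(">I", view, start + OFF_PRICE)
    return shares, ticks / 10_000  # price carries 4 implied decimal places


def build_add_order(order_ref, side, shares, stock, ticks):
    """Hypothetical helper that builds one framed Add Order for the demo."""
    body = ADD_ORDER.pack(b"A", 1, 0, bytes(6), order_ref, side, shares,
                          stock.ljust(8), ticks)
    return LEN_PREFIX.pack(len(body)) + body


buf = (build_add_order(1, b"B", 100, b"AAPL", 1_502_500)
       + build_add_order(2, b"S", 50, b"MSFT", 3_000_000))
view = memoryview(buf)
for start, length in iter_messages(buf):
    if view[start] == ord("A"):
        print(start, add_order_shares_price(view, start))
```

Only the fields a caller asks for are decoded, and the message itself is never sliced into a new object.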

Lock-Free Architecture

Processing is multi-threaded and built on lock-free queues and wait-free data structures. Each parsing thread operates independently with minimal synchronization. Message distribution relies on MPSC (multi-producer, single-consumer) queues for routing, explicit memory-ordering guarantees for safe concurrent access, and per-thread state to eliminate cache contention.

Performance impact: Linear scaling across CPU cores without lock contention.

Vectorized Operations

Critical parsing operations use AVX2 instructions to process multiple fields simultaneously: parallel byte-to-integer conversion, vectorized field extraction, and SIMD-optimized timestamp parsing. Some operations process 8 message fields in parallel rather than one at a time.

Performance impact: 2-3x improvement on parsing-intensive message types.

Production-Grade Reliability

Error handling covers malformed messages (graceful degradation), out-of-sequence data (sequence validation), protocol violations (detailed error reporting), and edge cases (tested against 500GB+ of production data). The system maintains parsing state across errors and provides detailed diagnostics for debugging.
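
One way to picture graceful degradation with diagnostics is the sketch below. It is a hedged illustration, not Lunyn's error model: the 2-byte length framing, the `ParseStats` record, and the `parse_stream` function are hypothetical, and the expected lengths are a small subset taken from our reading of the public ITCH 5.0 specification. Sequence validation is left out because it depends on the transport layer.

```python
import struct
from dataclasses import dataclass, field

# Expected body lengths for a few message types (assumed from the public spec).
EXPECTED_LEN = {b"S": 12, b"A": 36, b"E": 31, b"X": 23, b"D": 19}


@dataclass
class ParseStats:
    parsed: int = 0
    errors: list = field(default_factory=list)  # (offset, reason) diagnostics


def parse_stream(buf, stats=None):
    """Walk a framed stream, recording problems and continuing where possible."""
    if stats is None:
        stats = ParseStats()
    view = memoryview(buf)
    pos, end = 0, len(view)
    while pos + 2 <= end:
        (length,) = struct.unpack_from(">H", view, pos)
        start = pos + 2
        if length == 0 or start + length > end:
            stats.errors.append((pos, f"bad frame length {length}"))
            break  # framing is lost, so stop rather than guess
        msg_type = bytes(view[start:start + 1])
        expected = EXPECTED_LEN.get(msg_type)
        if expected is None:
            stats.errors.append((pos, f"unknown message type {msg_type!r}"))
        elif expected != length:
            stats.errors.append(
                (pos, f"type {msg_type!r}: length {length}, expected {expected}"))
        else:
            stats.parsed += 1
        pos = start + length  # the frame length lets us skip a bad message
    return stats


def frame(body):
    """Hypothetical helper: prepend the assumed 2-byte length prefix."""
    return struct.pack(">H", len(body)) + body


stream = (frame(b"D" + bytes(18)) + frame(b"Z" + bytes(4))
          + frame(b"D" + bytes(9)) + frame(b"S" + bytes(11)))
stats = parse_stream(stream)
print(stats.parsed, stats.errors)
```

The position and counters survive each error, so one malformed message does not discard the rest of the stream.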

Complete ITCH 5.0 Implementation

Message Type              Frequency   Parse Latency   Memory
System Event              Rare        8ns             16B
Stock Directory           Daily       12ns            256B
Add Order                 Very High   8ns             64B
Add Order (MPID)          High        9ns             72B
Order Executed            High        8ns             48B
Order Executed (Price)    Medium      9ns             52B
Order Cancel              High        7ns             40B
Order Delete              Medium      7ns             32B
Order Replace             High        10ns            64B
Trade (Non-Cross)         High        9ns             56B
Cross Trade               Low         11ns            72B
Broken Trade              Rare        8ns             40B
NOII                      High        13ns            128B
RPII                      Medium      9ns             48B

Validated Performance Metrics

Throughput vs. Core Count
1 Core: 27M msg/sec
2 Cores: 54M msg/sec
4 Cores: 107M msg/sec
8 Cores: 210M msg/sec

Near-linear scaling (about 97% of ideal at 8 cores)
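
The scaling claim can be checked directly from the throughput figures above. The short calculation below divides each measured throughput by the ideal value (single-core throughput times core count). The function name `scaling_efficiency` is ours, chosen for illustration.

```python
# Throughput figures from the list above, in millions of messages per second.
MEASURED = {1: 27, 2: 54, 4: 107, 8: 210}


def scaling_efficiency(cores, throughput, single_core=MEASURED[1]):
    """Fraction of ideal linear scaling achieved at a given core count."""
    return throughput / (single_core * cores)


for cores, throughput in MEASURED.items():
    print(f"{cores} cores: {scaling_efficiency(cores, throughput):.1%} of linear")
```

On these figures, efficiency is 100% at 2 cores, roughly 99% at 4 cores, and roughly 97% at 8 cores.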

Latency Distribution
p50 (Median): 8ns
p99: 12ns
p99.9: 18ns
p99.99: 35ns

Measured across all message types

Memory Usage Profile
Per-thread overhead: 2MB
Zero-copy buffers: Configurable
Message state: 64B avg
Total (4 cores): <100MB

Excluding user buffers

CPU Utilization
Parsing threads: 95-98%
I/O threads: 60-70%
Cache efficiency: 98%+
Branch prediction: 99.5%+

Measured under sustained load

All metrics reproducible via our benchmark suite. Contact us for evaluation access.

Deployment Options

Binary Integration

Link directly into your application via the provided API. Supports real-time streaming interfaces, batch file processing, and custom message routing.

Language bindings: C++, Rust, Python

Standalone Service

Deploy as a microservice with a gRPC or REST API, Kafka or Redis output streams, and monitoring and alerting integration.

Language bindings: Any (via API)

File Processing Pipeline

Command-line tool for batch conversion (ITCH → Parquet/CSV), historical data processing, and data quality validation.

Language bindings: CLI
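
To make the batch-conversion idea concrete, the sketch below turns Add Order messages from a framed ITCH buffer into CSV rows using only the Python standard library. It does not represent Lunyn's CLI or its options. The function name `add_orders_to_csv`, the 2-byte length framing, and the column names are assumptions, and the Add Order layout follows our reading of the public ITCH 5.0 specification. Parquet output is omitted because it needs a third-party library.

```python
import csv
import io
import struct

# Add Order ('A'), 36 bytes, big-endian (assumed from the public ITCH 5.0 spec).
ADD_ORDER = struct.Struct(">cHH6sQcI8sI")


def add_orders_to_csv(buf, out):
    """Write one CSV row per Add Order in buf; return the number of rows."""
    writer = csv.writer(out)
    writer.writerow(["timestamp_ns", "order_ref", "side", "shares", "stock", "price"])
    view = memoryview(buf)
    pos, rows = 0, 0
    while pos + 2 <= len(view):
        (length,) = struct.unpack_from(">H", view, pos)
        start = pos + 2
        if start + length > len(view):
            break  # incomplete trailing message
        if length == ADD_ORDER.size and view[start] == ord("A"):
            _, _, _, ts, ref, side, shares, stock, ticks = ADD_ORDER.unpack_from(view, start)
            writer.writerow([
                int.from_bytes(ts, "big"),          # nanoseconds since midnight
                ref,
                side.decode("ascii"),
                shares,
                stock.decode("ascii").rstrip(),     # symbols are space-padded
                f"{ticks / 10_000:.4f}",            # 4 implied decimal places
            ])
            rows += 1
        pos = start + length  # other message types are skipped
    return rows


# Demo input: one Add Order at 09:30:00 followed by an unrelated 3-byte message.
body = ADD_ORDER.pack(b"A", 1, 0, (34_200_000_000_000).to_bytes(6, "big"),
                      1, b"B", 100, b"AAPL    ", 1_502_500)
data = struct.pack(">H", len(body)) + body + struct.pack(">H", 3) + b"Zxx"
out = io.StringIO()
row_count = add_orders_to_csv(data, out)
print(out.getvalue())
```

A real pipeline would stream from a file rather than hold the whole buffer in memory, and would cover every message type.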

Evaluate the Architecture

Request access to detailed technical documentation, benchmark methodologies, and integration guides for your evaluation.