FireMUD System Architecture: Redis
This document outlines FireMUD's usage of Redis as a transient, high-performance, distributed coordination layer. It focuses on Redis's responsibilities, safety guarantees, key patterns, and operational practices.
For full tick execution, retries, and lock behavior, see Tick System and Runtime Design. Out-of-band workflows rely on the gRPC-based Saga approach described in Transaction Strategies.
Redis as a Volatile State Layer
Redis is used exclusively for non-authoritative, transient data, including:
- In-flight command queues
- Tick locks and staged results
- Cooldowns and timer expirations (stored in milliseconds)
- Gameplay session state and real-time coordination data (e.g., command queues, timers, tick participation; see Session Keys)
- Retry metadata and inter-tick conflict tracking
- AI/scripted action injection
All canonical game data (accounts, entities, items, rooms) resides in PostgreSQL, owned by domain-specific services.
Redis acts as a coordinated real-time buffer, not a source of truth, but is still treated as critical for game availability and consistency.
The Game Session Service is responsible for coordinating tick and session behavior using Redis as its execution substrate.
Benefits
- Low-latency access for gameplay-critical state
- Enables stateless, horizontally scalable services
- Supports safe concurrent ticks and session handling
- Facilitates reconnection, failover, and replay
Redis Availability, Consistency, and Safety Guarantees
Redis is not the system of record, but FireMUD treats it as essential for consistent multiplayer behavior. Availability and deterministic recovery are prioritized.
Cluster Deployment
FireMUD runs Redis in a clustered, replicated configuration:
- Multiple shards and replicas for tick region and session partitioning
- Partitioning aligns with tick region boundaries (typically per-room or per-segment)
- Kubernetes-native failover
- Failover behavior is tested under live tick loads
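- Tick lock and retry keys are expected to survive failover: critical writes wait for replica acknowledgment (`WAIT`) and AOF is enabled, so ticks can resume after a leadership handoff. Because Redis replication is asynchronous, this narrows but does not eliminate the window in which an acknowledged write can be lost.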
For operational context on Docker Compose vs Kubernetes, see Deployment Environments.
Replication and Durability
- Writes are asynchronously replicated
- AOF (Append-Only File) enabled for durability and crash recovery
- Critical Lua writes are followed by `WAIT 1 100` (at least one replica, 100 ms timeout) for replica acknowledgment
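As a rough illustration of the replica-acknowledgment step, the sketch below performs a write and then issues the equivalent of `WAIT 1 100`, treating the write as acknowledged only if at least one replica confirms within 100 ms. The names `commit_with_ack`, `ReplicaAckError`, and `StubClient` are hypothetical and exist only for this example; the `wait(num_replicas, timeout)` method is assumed to behave like redis-py's wrapper for `WAIT`. `WAIT` only covers writes sent earlier on the same connection, so a real client would need to issue both calls on one connection.

```python
from typing import Callable, Protocol


class SupportsWait(Protocol):
    """Minimal client surface this sketch needs (assumed to match redis-py)."""

    def wait(self, num_replicas: int, timeout: int) -> int: ...


class ReplicaAckError(RuntimeError):
    """Raised when too few replicas acknowledge a critical write in time."""


def commit_with_ack(
    client: SupportsWait,
    write: Callable[[], object],
    replicas: int = 1,
    timeout_ms: int = 100,
) -> int:
    """Run a write, then block until `replicas` replicas acknowledge it."""
    write()  # e.g. invoking the Lua commit script
    acked = client.wait(replicas, timeout_ms)  # WAIT <replicas> <timeout_ms>
    if acked < replicas:
        # The write is applied on the primary but not confirmed by a replica.
        raise ReplicaAckError(f"{acked}/{replicas} replicas acknowledged")
    return acked


class StubClient:
    """Hypothetical stand-in so the sketch runs without a Redis server."""

    def __init__(self, acked: int) -> None:
        self.acked = acked

    def wait(self, num_replicas: int, timeout: int) -> int:
        return self.acked
```

A caller might treat `ReplicaAckError` as a signal to retry the tick commit rather than as proof that the write was lost.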
Key Naming and Shard Discipline
Redis keys follow strict naming conventions to ensure:
- Shard-aware key locality
- Clean atomic execution across tick regions
- Conflict and retry isolation
- Debuggable and traceable behavior
Key Format Examples
| Redis Key | Description |
|---|---|
| `tick:lock:{entityId}` | Lock for entity during tick execution |
| `tick:pending:{regionId}` | Staged results for a tick region |
| `room:{roomId}:occupants` | Room occupancy snapshot |
| `retry:{regionId}` | Retry queue for failed actions |
| `timer:{entityId}:{effectId}` | Cooldown/effect timer metadata (in ms) |
For session-related keys and structure, see Session Keys and Gameplay Binding.
Tick regions and player sessions are always scoped to a single Redis shard to preserve atomicity. Cross-shard operations are avoided.
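One way to picture shard-aware key locality: in Redis Cluster, a non-empty substring between the first `{` and the next `}` (a hash tag) is the only part of the key that is hashed, so keys sharing a tag land in the same slot. The sketch below computes the slot with CRC16 (XMODEM) modulo 16384, as Redis Cluster does, and builds region keys with the region ID as a literal hash tag. Whether FireMUD uses literal braces this way is an assumption (in the table they may be plain placeholders), and the helper names are hypothetical.

```python
import binascii

HASH_SLOTS = 16384  # fixed number of slots in Redis Cluster


def key_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot for a key, honoring hash tags."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the full key
            data = data[start + 1:end]
    # binascii.crc_hqx with initial value 0 is CRC16/XMODEM.
    return binascii.crc_hqx(data, 0) % HASH_SLOTS


def pending_key(region_id: str) -> str:
    """Hypothetical builder: staged results, tagged by region."""
    return f"tick:pending:{{{region_id}}}"


def retry_key(region_id: str) -> str:
    """Hypothetical builder: retry queue, tagged by region."""
    return f"retry:{{{region_id}}}"
```

With this layout, `pending_key("region-7")` and `retry_key("region-7")` map to the same slot, so a single Lua script can touch both keys.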
Atomicity and Concurrency Control
Redis's single-threaded model is extended using Lua scripts for atomic operations:
- Entity lock acquisition (`tick:lock:*`)
- Tick staging, commit, and rollback (`tick:pending:*`)
- Timer lifecycle management
- Session rebinding and deduplication (`session:*` keys)
- Retry queue updates
- AI/scripted action injection
All Lua scripts are:
- Idempotent
- Shard-local
- Retry-safe
- Designed to avoid cross-tick contamination
For use during tick execution, see Distributed Locking.
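The idempotent and retry-safe properties listed above can be pictured as "set if absent" staging, similar in spirit to Redis's `HSETNX`. The sketch below is an in-memory model only: a plain dict stands in for a hash stored at `tick:pending:{regionId}`, and the `tickId:actionId` field layout and the `stage_action` name are assumptions made for illustration.

```python
from typing import Dict


def stage_action(
    pending: Dict[str, str], tick_id: int, action_id: str, payload: str
) -> bool:
    """Stage one action result; return True only the first time it is staged."""
    field = f"{tick_id}:{action_id}"  # scoping by tick avoids cross-tick mixing
    if field in pending:
        return False  # a retry of the same action changes nothing
    pending[field] = payload
    return True
```

Running the same call twice leaves the staged data unchanged, which is what makes a blind retry safe.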
Example Lock Workflow
- Acquire `tick:lock:{entityId}` using `SET NX PX` with a TTL equal to the tick duration.
- Stage updates under `tick:pending:{regionId}` via Lua script while the lock is held.
- On successful commit, the lock is released and staged data is flushed.
- If the lock expires, the next tick replays `tick:pending:{regionId}` and attempts the workflow again.
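The locking part of this workflow can be sketched as an in-memory model of `SET key value NX PX ttl`. This is not a Redis client: `InMemoryLockStore` and the helper functions are hypothetical, and time is injected so expiry can be shown deterministically. The token-checked release is an assumption beyond what the workflow states; it is a common safeguard so that a worker whose lock has already expired cannot release a lock now held by another worker. Against a real Redis, that check-and-delete would need to run atomically, for example in a Lua script.

```python
import secrets
from typing import Callable, Dict, Optional, Tuple


class InMemoryLockStore:
    """Models only SET NX PX and a token-checked delete."""

    def __init__(self, now_ms: Callable[[], int]) -> None:
        self._now_ms = now_ms
        self._entries: Dict[str, Tuple[str, int]] = {}  # key -> (token, expiry)

    def _live_token(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        token, expires_at_ms = entry
        if self._now_ms() >= expires_at_ms:
            del self._entries[key]  # emulate TTL expiry
            return None
        return token

    def set_nx_px(self, key: str, token: str, ttl_ms: int) -> bool:
        """Set the key only if it is absent, with a TTL in milliseconds."""
        if self._live_token(key) is not None:
            return False
        self._entries[key] = (token, self._now_ms() + ttl_ms)
        return True

    def delete_if_token(self, key: str, token: str) -> bool:
        """Delete the key only if it still holds the caller's token."""
        if self._live_token(key) != token:
            return False
        del self._entries[key]
        return True


def acquire_entity_lock(
    store: InMemoryLockStore, entity_id: str, tick_ms: int
) -> Optional[str]:
    """Try to lock an entity for one tick; return the owner token or None."""
    token = secrets.token_hex(8)
    if store.set_nx_px(f"tick:lock:{entity_id}", token, tick_ms):
        return token
    return None


def release_entity_lock(
    store: InMemoryLockStore, entity_id: str, token: str
) -> bool:
    """Release the lock only if this token still owns it."""
    return store.delete_if_token(f"tick:lock:{entity_id}", token)
```

In this model a second acquisition fails while the lock is live, succeeds once the TTL has elapsed, and the original holder's late release is rejected.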
Tick Integration (Resilience, Locking, Staging)
Redis is essential for coordinating tick execution across distributed worker services.
It provides:
- Per-entity command queues
- Durable tick staging
- Distributed locks and retry tracking
- Conflict metadata for retry prioritization
- Accurate cooldown and timer tracking
Ticks are replayable and deterministic due to Lua-based staging, lock control, and AOF durability.
See Tick Execution Flow.
Crash and Recovery Safety
If a tick is interrupted:
- Redis retains:
  - Locks
  - Staged updates
  - Timers
  - Retry metadata
- Game Session Service can:
  - Retry or roll forward incomplete ticks
  - Prevent double-processing via lock validation
All recovery is deterministic and safe.
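The recovery decision can be sketched as a small pure function. The rules below are an assumption made for illustration, not the documented behavior of the Game Session Service: if nothing is staged there is nothing to do, if another worker still holds a live lock the region is left alone, and otherwise the staged data is rolled forward. All names are hypothetical.

```python
from enum import Enum
from typing import Optional


class RecoveryAction(Enum):
    NOTHING = "nothing"
    WAIT_FOR_OWNER = "wait_for_owner"
    ROLL_FORWARD = "roll_forward"


def plan_recovery(
    has_staged_updates: bool, lock_owner: Optional[str], worker_id: str
) -> RecoveryAction:
    """Decide how a worker should treat a possibly interrupted tick."""
    if not has_staged_updates:
        return RecoveryAction.NOTHING
    if lock_owner is not None and lock_owner != worker_id:
        # Lock validation: another worker is still processing this tick.
        return RecoveryAction.WAIT_FOR_OWNER
    # Lock expired or is ours: safe to replay the staged updates.
    return RecoveryAction.ROLL_FORWARD
```

Because the function depends only on its inputs, the same Redis state always produces the same decision, regardless of which worker evaluates it.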
Observability and Reliability
FireMUD actively monitors Redis performance and tick health:
- Prometheus metrics (via Redis exporters):
  - Lua script latency
  - Lock contention
  - Retry queue depth
  - Keyspace and memory usage
- Grafana dashboards visualize tick throughput and hotspots
- Prometheus Alertmanager sends alerts if metrics exceed thresholds
- Graceful degradation logic reduces gameplay interruption if Redis temporarily stalls
- Redis is the only volatile coordination layer; no per-service caches are used
Redis observability feeds into the common stack described in Logging & Monitoring.
Session Keys and Gameplay Binding
Redis stores transient gameplay session state for each connected player, including:
- Socket binding metadata
- Active `playerId` and `worldId` context
- Tick region participation and queued commands
- Timer and cooldown data
- Conflict and retry metadata
This state is used by the Game Session Service to:
- Resume gameplay after disconnects
- Rebind gameplay context to a new socket
- Deduplicate reconnect attempts
- Handle character takeovers (one session per character)
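A minimal in-memory sketch of rebinding with deduplication and takeover follows. A dict keyed by character ID stands in for the session keys in Redis; the `Session` fields, the `attempt_id` used to recognize duplicate reconnect attempts, and the `rebind_session` name are all assumptions for illustration. In Redis, the equivalent check-and-update would need to run atomically.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Session:
    player_id: str
    world_id: str
    socket_id: str
    attempt_id: str  # hypothetical ID of the connect attempt that bound it


def rebind_session(
    sessions: Dict[str, Session],
    character_id: str,
    player_id: str,
    world_id: str,
    socket_id: str,
    attempt_id: str,
) -> Optional[str]:
    """Bind a character to a socket; return the old socket to close, if any."""
    current = sessions.get(character_id)
    if current is not None and current.attempt_id == attempt_id:
        return None  # duplicate reconnect attempt: nothing changes
    sessions[character_id] = Session(player_id, world_id, socket_id, attempt_id)
    if current is not None and current.socket_id != socket_id:
        return current.socket_id  # takeover: one session per character
    return None
```

The returned socket ID tells the caller which stale connection to close, which keeps the one-session-per-character rule visible at the call site.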
Key formats are internal and subject to change. Services treat Redis as a coordination layer, not a persistent or public contract.
Summary
Redis in FireMUD is:
- A transient, high-performance coordination layer
- Used for ticks, timers, locks, retries, and gameplay session state (see Session Keys)
- Scripted via Lua for atomic tick and session control
- Durable via AOF and `WAIT` guarantees
- Always shard-local to avoid cross-node inconsistencies
- Tightly coupled with the Game Session Service, which orchestrates all tick-related flow
- Not a source of truth, but treated as critical infrastructure
Related Documentation