Systems I've designed, drawn.
I think in boxes and arrows before I think in code. These are the patterns I've shipped in production — and the trade-offs baked into each.
Distributed rate limiter
Sliding-window limiter in Redis, protecting a 10K-RPS API. Composite keys (tenant, user, route). Fail-soft when Redis degrades.
- Sliding window via Redis sorted sets (ZADD + ZREMRANGEBYSCORE)
- Trim + count + add run atomically in one Lua script (no check-then-act race, no WATCH/MULTI retry loop)
- Per-tenant, per-user, per-route composite keys
- Local shadow cache as fail-soft when Redis is unreachable
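The sorted-set mechanics can be sketched in plain Python. This is a minimal in-memory model of the Redis approach, not the production limiter: a list of timestamps per composite key stands in for the sorted set, trimming old entries plays the role of `ZREMRANGEBYSCORE`, and appending plays the role of `ZADD`.

```python
import time

class SlidingWindowLimiter:
    """In-memory sketch of the Redis sorted-set sliding window."""

    def __init__(self, limit: int, window_s: float):
        self.limit = limit
        self.window_s = window_s
        self._hits = {}  # composite key -> list of request timestamps

    def allow(self, tenant: str, user: str, route: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        key = f"{tenant}:{user}:{route}"  # composite key: tenant, user, route
        hits = self._hits.setdefault(key, [])
        # Drop entries older than the window (the ZREMRANGEBYSCORE step).
        cutoff = now - self.window_s
        hits[:] = [t for t in hits if t > cutoff]
        if len(hits) >= self.limit:
            return False  # over the limit for this window
        hits.append(now)  # record the request (the ZADD step)
        return True
```

In Redis the trim, count, and add happen inside one Lua script so concurrent callers can't interleave between the check and the write; here the GIL-protected method body plays that role.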
Idempotent payment pipeline
Retry-safe charges using an idempotency key contract, atomic ledger writes, and an outbox-driven retry worker for PSP calls.
- Required `Idempotency-Key` header (UUIDv4, 24h TTL)
- Redis hot-path dedupe + Postgres unique constraint for truth
- Ledger write + idempotency write in the same transaction
- Outbox pattern → retry worker with exponential backoff & DLQ
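The key contract reduces to: first request with a key executes and stores its response; any retry with the same key replays the stored response without charging again. A minimal sketch, with a dict standing in for the Redis dedupe layer and the Postgres unique constraint (the class and field names are illustrative, not the production schema):

```python
import uuid

class PaymentService:
    """Sketch of the idempotency-key contract for retry-safe charges."""

    def __init__(self):
        self._seen = {}    # idempotency key -> stored response (Redis/Postgres stand-in)
        self.ledger = []   # append-only ledger entries

    def charge(self, idempotency_key: str, amount_cents: int, currency: str) -> dict:
        if idempotency_key in self._seen:
            # Retry: replay the original response, no second charge.
            return self._seen[idempotency_key]
        entry = {"id": str(uuid.uuid4()),
                 "amount_cents": amount_cents,
                 "currency": currency}
        # In production the ledger write and the idempotency record
        # commit in the same Postgres transaction, so a crash between
        # them can't leave a charge without its dedupe record.
        self.ledger.append(entry)
        self._seen[idempotency_key] = entry
        return entry
```

The PSP call itself isn't shown: it goes through the outbox and retry worker, never inline in the request path.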
Event-driven platform
Services publish aggregate events to Kafka; downstream consumers (search, analytics, notifications) subscribe and consume at their own pace.
- Topics versioned (`orders.v1`) and partitioned by aggregate id
- Consumer groups isolate reader pace + failure domains
- Dead-letter topic with scheduled replay worker
- Schema registry keeps producers and consumers honest
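Partitioning by aggregate id is what preserves per-aggregate ordering: every event for one order hashes to the same partition, so one consumer sees them in sequence. A sketch of that routing (SHA-1 here is a stand-in; Kafka's default partitioner uses murmur2):

```python
import hashlib

def partition_for(aggregate_id: str, num_partitions: int) -> int:
    """Stable hash of the aggregate id -> partition index.

    Same id always maps to the same partition, so all events for one
    aggregate land on one partition and arrive in order.
    """
    digest = hashlib.sha1(aggregate_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

The trade-off: a hot aggregate cannot be spread across partitions, so partition count and key choice set the parallelism ceiling per aggregate.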
Circuit breakers per dependency
One breaker per remote dependency — not per service. Jittered recovery windows. Fallbacks only for safe reads.
- Per-dependency breakers surface the real culprit
- Breaker timeout < caller timeout (or it hides failures)
- Jittered half-open windows to avoid recovery stampedes
- Fallbacks only for non-authoritative reads; writes fail loud
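The breaker is a small state machine: CLOSED until failures cross a threshold, then OPEN, then HALF_OPEN after a jittered recovery window so a fleet of recovering instances doesn't probe the dependency in lockstep. A minimal sketch (thresholds and window lengths are placeholders, not recommended values):

```python
import random
import time

class CircuitBreaker:
    """Per-dependency breaker: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold=5, recovery_s=30.0, jitter_s=5.0):
        self.failure_threshold = failure_threshold
        self.recovery_s = recovery_s
        self.jitter_s = jitter_s
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0
        self._window = recovery_s  # jittered on each trip

    def allow_request(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if self.state == "open":
            if now - self.opened_at >= self._window:
                self.state = "half_open"  # let a single probe through
                return True
            return False  # fail fast while open
        return True

    def record_success(self):
        self.failures = 0
        self.state = "closed"

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
            # Jitter the next half-open window to avoid recovery stampedes.
            self._window = self.recovery_s + random.uniform(0, self.jitter_s)
```

Note what's deliberately absent: no fallback logic lives in the breaker. Reads can degrade to a cache at the call site; writes surface the open breaker as an error.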
NFT marketplace on Terra
CosmWasm contracts on-chain (Rust), off-chain indexer projecting events into Postgres for fast queries and UX.
- On-chain: mint, escrow, royalties in CosmWasm
- Off-chain indexer subscribes to chain events
- Postgres projections enable joined queries and fast reads
- Cache hot listings; IPFS for token metadata
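The indexer is a fold: replay chain events in order and project them into read-optimized rows. A sketch with a dict standing in for the Postgres projection table; the event names (`list`, `sale`, `delist`) and fields are illustrative, not the contract's actual event schema:

```python
def project(events):
    """Fold marketplace chain events into a listings read model."""
    listings = {}
    for ev in events:
        kind, data = ev["type"], ev["data"]
        if kind == "list":
            # New or re-listed token: upsert the row.
            listings[data["token_id"]] = {
                "price": data["price"],
                "seller": data["seller"],
                "active": True,
            }
        elif kind == "sale":
            # Sold: keep the row for history, mark inactive.
            listings[data["token_id"]]["active"] = False
        elif kind == "delist":
            listings.pop(data["token_id"], None)
    return listings
```

Because the projection is derived entirely from chain events, it can be dropped and rebuilt by replaying from genesis, which keeps the off-chain store disposable and the chain the single source of truth.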
Have a system that needs to scale — or stop breaking?
I work with a small number of teams each month on architecture reviews, scaling, and hands-on backend engineering. If that sounds like you, let's talk.