The constraints that shape everything
A payment engine has unique constraints that most web applications never face. Every transaction must be processed exactly once. Balances must be accurate to the smallest currency unit. The system must remain available during traffic spikes (Black Friday, payday, end-of-month). And all of this must happen while maintaining a complete, immutable audit trail.
When we designed Crezaro's payment engine, we started with these constraints and worked backward to the architecture. The result is a system that sustains 5,000+ transactions per second with p99 latency under 200 milliseconds.
The hot path: Go and gRPC
The payment hot path (from API request to processor submission) is written in Go. We chose Go for several reasons: predictable garbage collection pauses, excellent concurrency primitives, and compilation to a single static binary with no runtime dependencies.
The payment service exposes a gRPC API internally and an HTTP/JSON API externally. When a payment request arrives, it passes through these stages:
- Validation and enrichment: Schema validation, currency checks, merchant status verification
- Risk scoring: Fraud detection rules, velocity checks, device fingerprinting
- Routing: Selecting the optimal processor based on payment method, currency, and cost
- Submission: Sending the transaction to the selected processor
- Event publication: Publishing the result to Kafka for downstream processing
Each stage is a discrete function with its own timeout and circuit breaker. If the risk scoring service is slow, we can degrade gracefully rather than failing the entire transaction.
Event streaming with Apache Kafka
Every state change in the payment lifecycle produces an event that is published to Apache Kafka. These events drive everything that happens after the initial payment submission: ledger entries, webhook deliveries, notification dispatch, analytics, and reconciliation.
We use Kafka for several reasons:
- Durability: Events are replicated across three brokers with acks=all. We have never lost an event.
- Ordering: Events for a single transaction are always processed in order (we partition by transaction ID).
- Replay: When we deploy a new consumer or fix a bug, we can replay events from any point in time.
- Decoupling: The payment engine does not need to know about webhooks, notifications, or analytics. It just publishes events.
// Simplified event structure
{
  "event_id": "evt_01H8XYZABC...",
  "event_type": "payment.completed",
  "timestamp": "2026-02-28T14:23:01.847Z",
  "data": {
    "transaction_id": "txn_01H8XYZ...",
    "amount": 500000,
    "currency": "NGN",
    "processor": "nibss_nip",
    "merchant_id": "mer_01H7ABC..."
  }
}
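The ordering guarantee above falls out of how the partition is chosen. The sketch below shows the idea with a hash-mod scheme in the spirit of Kafka's default partitioner (the field names and the FNV hash are illustrative, not our exact schema or Kafka's exact hash):

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
	"time"
)

// PaymentEvent mirrors the simplified event structure above.
// Field names are illustrative, not our exact schema.
type PaymentEvent struct {
	EventID   string          `json:"event_id"`
	EventType string          `json:"event_type"`
	Timestamp time.Time       `json:"timestamp"`
	Data      json.RawMessage `json:"data"`
}

// partitionFor maps a transaction ID to a partition: hash the key,
// mod the partition count. Every event for one transaction lands on
// the same partition, so consumers see that transaction's events in
// publication order.
func partitionFor(transactionID string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(transactionID))
	return int(h.Sum32()) % numPartitions
}

func main() {
	p1 := partitionFor("txn_01H8XYZ", 12)
	p2 := partitionFor("txn_01H8XYZ", 12)
	fmt.Println(p1, p2, p1 == p2) // same key, same partition
}
```

Because ordering is only guaranteed within a partition, this keying choice is what makes "events for a single transaction are processed in order" true; events for different transactions may interleave freely.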
The ledger: PostgreSQL and double-entry accounting
The financial ledger is the heart of any payment system. Ours is built on PostgreSQL with strict double-entry accounting principles. Every money movement creates a debit and a credit entry. Wallet balances are never updated directly; they are computed from the sum of ledger entries.
This approach has significant advantages for a payment system:
- Correctness: The sum of all debits always equals the sum of all credits. If it does not, something is fundamentally broken, and we know immediately.
- Auditability: Every balance change has a corresponding ledger entry with a timestamp, actor, and reason.
- Reconciliation: We can reconcile any account at any point in time by replaying ledger entries.
PostgreSQL handles this workload well because of its strong transactional guarantees. We use SERIALIZABLE isolation for ledger writes to prevent double-spending, and we have optimized our schema for append-only writes with minimal index overhead.
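The double-entry model can be sketched in a few lines of Go. This is an in-memory toy (in production the entries live in a PostgreSQL table and postings run inside a SERIALIZABLE transaction); the account names and `Post` API are hypothetical:

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is one side of a double-entry posting. Amounts are in the
// smallest currency unit (e.g. kobo) and never floats.
type Entry struct {
	Account string
	Amount  int64 // positive = credit, negative = debit
	Reason  string
}

// Ledger is an append-only list of entries. Balances are derived,
// never stored directly.
type Ledger struct{ entries []Entry }

// Post records a money movement as a balanced debit/credit pair.
func (l *Ledger) Post(from, to string, amount int64, reason string) error {
	if amount <= 0 {
		return errors.New("amount must be positive")
	}
	l.entries = append(l.entries,
		Entry{Account: from, Amount: -amount, Reason: reason},
		Entry{Account: to, Amount: amount, Reason: reason},
	)
	return nil
}

// Balance computes an account's balance by summing its entries.
func (l *Ledger) Balance(account string) int64 {
	var sum int64
	for _, e := range l.entries {
		if e.Account == account {
			sum += e.Amount
		}
	}
	return sum
}

// Check verifies the core invariant: all entries sum to zero,
// i.e. total debits equal total credits.
func (l *Ledger) Check() bool {
	var sum int64
	for _, e := range l.entries {
		sum += e.Amount
	}
	return sum == 0
}

func main() {
	var l Ledger
	l.Post("clearing:nibss_nip", "wallet:mer_01H7ABC", 500000, "payment.completed")
	fmt.Println(l.Balance("wallet:mer_01H7ABC"), l.Check())
}
```

`Check` is the property mentioned above: if debits and credits ever stop summing to zero, something is fundamentally broken, and an invariant this cheap can run continuously.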
Caching and read optimization
While writes go through PostgreSQL, reads are heavily cached in Redis. Merchant configurations, fee schedules, exchange rates, and processor routing rules are all cached with appropriate TTLs. We use a write-through cache pattern: when a merchant updates their configuration, we update both PostgreSQL and Redis in the same transaction.
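A minimal sketch of the write-through pattern, with two maps standing in for PostgreSQL and Redis (the key format is made up for illustration; the real code writes to both actual stores):

```go
package main

import (
	"fmt"
	"sync"
)

// writeThrough keeps a cache consistent with the store of record by
// updating both on every write. Two maps stand in for PostgreSQL and
// Redis here.
type writeThrough struct {
	mu    sync.Mutex
	db    map[string]string // stands in for PostgreSQL
	cache map[string]string // stands in for Redis
}

func newWriteThrough() *writeThrough {
	return &writeThrough{db: map[string]string{}, cache: map[string]string{}}
}

// Set writes the store of record first, then the cache, so the cache
// never holds a value the database does not.
func (w *writeThrough) Set(key, val string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.db[key] = val
	w.cache[key] = val
}

// Get serves from cache, falling back to the database on a miss and
// repopulating the cache.
func (w *writeThrough) Get(key string) (string, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if v, ok := w.cache[key]; ok {
		return v, true
	}
	v, ok := w.db[key]
	if ok {
		w.cache[key] = v
	}
	return v, ok
}

func main() {
	s := newWriteThrough()
	s.Set("merchant:mer_01H7ABC:fee_schedule", "standard_v2")
	v, _ := s.Get("merchant:mer_01H7ABC:fee_schedule")
	fmt.Println(v)
}
```

The point of write-through over cache-aside here is that configuration reads on the hot path never observe a stale merchant config after an update, only at the cost of slightly slower (and rarer) writes.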
For transaction queries and analytics, we replicate data to ClickHouse, which handles our analytical workloads. The dashboard's transaction search, revenue charts, and settlement reports all query ClickHouse rather than the primary PostgreSQL database.
Lessons learned
Building a payment engine teaches you things that no amount of reading can prepare you for:
- Idempotency is not optional. Network failures happen. Processors time out. Retries happen. If your system cannot handle the same request twice without double-charging, you will lose merchants and their trust.
- Monitoring is a feature. We spent as much time building our monitoring and alerting infrastructure as we did building the payment engine itself. Every transaction emits 47 distinct metrics.
- Test with real money early. Sandbox environments never perfectly replicate production. We ran real transactions (with our own money) for months before onboarding the first merchant.
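The idempotency lesson above reduces to a simple rule: remember the result keyed by an idempotency key, and replay it on retries. A minimal in-memory sketch (the `store` type and key format are illustrative; durable systems persist this table):

```go
package main

import (
	"fmt"
	"sync"
)

// store remembers the result of every request by idempotency key, so
// a retried request returns the original result instead of charging
// the customer a second time.
type store struct {
	mu      sync.Mutex
	results map[string]string
}

// Process runs charge only the first time a key is seen; later calls
// with the same key replay the stored result.
func (s *store) Process(key string, charge func() string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if r, ok := s.results[key]; ok {
		return r // retry: replay, no second charge
	}
	r := charge()
	s.results[key] = r
	return r
}

func main() {
	s := &store{results: map[string]string{}}
	charges := 0
	charge := func() string { charges++; return "txn_ok" }
	s.Process("idem_abc", charge)
	s.Process("idem_abc", charge) // a network retry of the same request
	fmt.Println(charges)          // charged exactly once
}
```

Holding the lock across the charge is deliberate in this toy: it also closes the race where two concurrent retries both miss the key and both charge.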
If you are interested in the technical details, reach out. We are always happy to talk shop with fellow engineers. And if you want to build on top of an infrastructure that handles this complexity for you, check out our API docs.