What is the most efficient way to evaluate poker hands at scale?
The most efficient way to evaluate poker hands at scale is not to write logic that checks for straights and flushes at runtime. Instead, the industry standard for high-frequency trading (HFT) and real-money poker is pre-computed Lookup Tables (LUTs) combined with bitwise operations and SIMD (Single

The most efficient way to evaluate poker hands at scale is not to write logic that checks for straights and flushes at runtime. Instead, the industry standard for high-frequency trading (HFT) and real-money poker is pre-computed Lookup Tables (LUTs) combined with bitwise operations and SIMD (Single Instruction, Multiple Data) vectorization. The goal is to reduce hand evaluation to a single CPU instruction or a few array lookups, achieving nanosecond-level latency. This allows the engine to evaluate millions of hands per second for Monte Carlo simulations, AI training, or real-time fairness checks. The fundamental insight is that there are a finite number of 5-card combinations from a 52-card deck: $\binom{52}{5} = 2,598,960$. The Strategy: Pre-computation: Before the server starts, generate a massive array where every possible 5-card (or 7-card) combination maps to a unique integer "strength" value. Runtime: Convert the player's hand into a unique integer key (hash). Lookup: Access the array at that index. The value returned is the hand's strength. Memory Trade-off: 5-Card LUT: ~2.6 million entries. If each entry is a 4-byte integer, this is ~10 MB. Fits easily in L3 CPU cache. 7-Card LUT: ~133 million entries. ~534 MB. Fits in RAM, but may spill out of cache on large servers. Optimization: Most engines use a 5-card LUT and a 7-to-5 reduction algorithm. They generate all $\binom{7}{5} = 21$ combinations, look them up, and pick the best. The most famous high-performance implementation is the Cactus Kev algorithm (popularized by poker-eval in C/C++). It maps cards to prime numbers or bitmasks to perform arithmetic operations that instantly detect flushes and straights. Cards: Represented as integers. Ranks: 2-14 (2-Ace). Suits: 4 bits (1 for Spade, 2 for Heart, etc.). Bitmasks: A 52-bit integer where the $i$-th bit is 1 if the $i$-th card is present. // Pre-computed table: maps a 5-card hash to a strength score (0-7462) // Lower score = better hand (or higher, depending on convention) int lookup_table[2598960]; int evaluate_hand(int hand_mask) { // 1. Check for Flush: Sum of suit bits. If any suit sum >= 4 (5 cards), it's a flush. // This is done via bitwise AND with suit masks (e.g., 0xF0F0F0F0F0F0F0F0) if (is_flush(hand_mask)) { return lookup_flush[hand_mask]; // Pre-computed flush strength } // 2. Check for Straight: Use a "rank bitboard". // Shift bits to detect sequences of 5 consecutive bits. if (is_straight(hand_mask)) { return lookup_straight[hand_mask]; } // 3. Standard Evaluation (Pairs, Trips, etc.) // Use the "Cactus Kev" prime multiplication method. // Each rank is a prime: 2=2, 3=3, ..., A=41. // Multiply the primes of the 5 cards. // The product is unique for every hand type (e.g., A-A-A-K-K always yields X). int product = 1; for (int i = 0; i < 5; i++) { product *= RANK_PRIMES[get_rank(hand_mask, i)]; } return lookup_table[product]; } Why this is fast: No Loops: The logic uses bitwise shifts and masks, which are single-cycle CPU instructions. Cache Friendly: The lookup table is small enough to stay in the CPU's L1/L2 cache. Branch Prediction: The if statements are highly predictable (most hands are high-card, few are straights), minimizing pipeline stalls. For Monte Carlo simulations (e.g., "What is my equity if I run this hand 100,000 times?"), single-threaded evaluation is too slow. You must use SIMD (AVX2, AVX-512 on x86; NEON on ARM). The Approach: Load 8 or 16 hands into a single 256-bit or 512-bit register. Perform the bitwise operations (flush check, straight check) on all 8/16 hands simultaneously. Use lookup tables that are vectorized (or use shuffle instructions to map multiple keys to multiple values). Library Example: HandEvaluator in Rust/C++ with AVX2 // Conceptual SIMD evaluation (using a crate like `hand-eval-simd`) fn evaluate_batch(hands: &[Hand], results: &mut [u32]) { // Load 8 hands into a 256-bit register let hand_vec = load_simd(hands); // Parallel Flush Check let flush_mask = check_flush_simd(hand_vec); // Parallel Straight Check let straight_mask = check_straight_simd(hand_vec); // Parallel Lookup (requires a specialized vectorized LUT or scatter/gather) let scores = vectorized_lookup(hand_vec, flush_mask, straight_mask); store_simd(results, scores); } This can achieve 100M+ hands per second per core on modern CPUs. In a distributed architecture, do not embed the heavy evaluation logic in the main game loop if you need to run simulations. Offload it to a dedicated Evaluator Service. Protocol: gRPC or UDP (for low latency). Language: C++ or Rust for the evaluator core (maximum performance). Interface: EvaluateHandsRequest: { hand_1: [c1, c2...], hand_2: [...] } EvaluateHandsResponse: { score_1: 1234, score_2: 5678 } Scaling: Run the service as a stateless pod in Kubernetes. Use Horizontal Pod Autoscaling (HPA) based on CPU utilization. Redis: Cache the results of common board textures (e.g., A-K-Q-J-T on board) if the same board appears frequently in simulations. PostgreSQL: Store the final hand history, including the hand_strength_score and winner_ids. Do not store the raw bitmasks unless needed for debugging. Determinism: The evaluator must be pure. No random numbers, no external state. If the same 7 cards are passed in, the same score must be returned. This is non-negotiable for regulatory compliance. Certification: The LUT generation algorithm must be certified by an independent lab (e.g., eCOGRA, GLI). The source code for the LUT generator is often audited, not just the runtime binary. Anti-Cheating: Never expose the evaluator logic to the client. The client sends cards only when allowed; the server evaluates. Scenario: A 6-max No-Limit Hold'em table. Pre-flop: Player A (AA) vs Player B (KK). Simulation: The engine runs 100,000 Monte Carlo simulations to determine equity for the "All-In" button. Execution: The "Simulator" service generates random boards (5 cards). It calls the Evaluator 200,000 times (2 hands x 100k simulations). The Evaluator uses the Cactus Kev LUT + AVX2 to process 8 hands in parallel. Total time: ~50ms. Result: Player A has 82% equity. The server updates the UI. Trade-offs: Memory vs. Speed: A full 7-card LUT is fast but memory-heavy (500MB+). A 5-card LUT + 21 combinations is slower (21x lookups) but uses only 10MB. Solution: Hybrid approach. Use a 5-card LUT. If the board is a flush/straight, use the optimized path. Otherwise, use the standard lookup. Since Node.js is single-threaded and slow for raw math, use N-API to bind a C++ evaluator. // evaluator.cc (C++ N-API Module) #include <node_api.h> #include "poker_eval.h" // The C++ LUT engine napi_value EvaluateHand(napi_env env, napi_callback_info info) { // 1. Extract card array from JS // 2. Convert to bitboard (uint64_t) // 3. Call C++ evaluate(bitboard) -> int // 4. Return int to JS return napi_create_int32(env, result); } // evaluator.js (Node.js) const evalModule = require('./build/Release/evaluator.node'); function getHandStrength(holeCards, communityCards) { // Hole: [0, 1], Board: [2, 3, 4, 5, 6] const allCards = [...holeCards, ...communityCards]; return evalModule.evaluate(allCards); } Q1: Why not just write a simple if (isFlush) return ... function in Python/JavaScript? if-else approach involves loops to check suits, loops to check ranks, and sorting. This is $O(N)$ or $O(N \log N)$ and involves many conditional branches. In a high-load environment (10k hands/sec), this creates a bottleneck and high CPU usage. A LUT approach is $O(1)$ and uses bitwise operations, which are orders of magnitude faster. Q2: How do you handle 7-card hands (Omaha/Hold'em) with a 5-card LUT? Q3: Can I use this for Web3/Blockchain poker? Off-chain Evaluation with On-chain Verification. The server (or a trusted oracle) evaluates the hand off-chain using the LUT, signs the result, and the smart contract verifies the signature. The contract never runs the evaluator. Q4: What is the "Cactus Kev" method? Q5: How do I ensure the LUT is correct and not buggy? verify the LUT against a known reference implementation (like the PokerStars or WSOP logic) or a brute-force generator. Generate all 2.6M 5-card hands. Evaluate them with a slow, correct (but inefficient) reference algorithm. Evaluate them with your fast LUT. Assert that the results match for every single hand. This verification process is part of your CI/CD pipeline.
Key Takeaways
- โขThe most efficient way to evaluate poker hands at scale is not to write logic that checks for straights and flushes at runtime
- โขThis story was reported by Dev.to, covering developments in the dev space.
- โขAI advancements continue to reshape industries โ read the full article on Dev.to for complete coverage.
๐ Continue reading the full article:
Read Full Article on Dev.to โShare this article



