A merchant in Karachi blocked 50 fake orders last month using phone number and IP blocklists. Good. But COD fake order detection based on blocklists alone let 30 more fraudulent orders ship anyway — same addresses, slightly different spelling, brand new phone numbers every time. Each failed delivery cost ₨850 in shipping, packaging, and courier return fees. That's ₨25,500 burned on orders that were never going to convert to cash.
This is the blocklist ceiling. You hit it somewhere between 100 and 300 daily orders, when the volume of fraud attempts outpaces your ability to manually spot patterns. The repeat offenders learn your system faster than you learn theirs. They change one digit in their phone number, swap "Street" for "St," and your blocklist treats them like a brand new customer.
Why Blocklists Stop Working at Scale
Phone number blocking catches the laziest fraudsters — the ones who reuse the same number three times in a row. IP blocking catches even fewer, because most mobile users in South Asia and MENA share dynamic IPs across thousands of devices.
The real problem isn't the data you're blocking on. It's that blocklists are binary: an order is either blocked or it isn't. There's no middle ground. A slightly suspicious order from a new number at a previously failed address sails right through.
Merchants running 200+ COD orders per day in India, Pakistan, Egypt, and Saudi Arabia report RTO rates between 25% and 35% even with active blocklists. If you haven't set up phone and IP blocking yet, start there — it's your baseline. But the fraction of fraud a blocklist catches feels like progress only until you calculate what the 25-35% of deliveries that still fail costs you every month.
Behavioral Risk Scoring Adds the Layer Blocklists Can't
Behavioral risk scoring is a weighted fraud detection layer that sits on top of your blocklist. Instead of a binary block/allow decision, every incoming order gets a score based on multiple signals. High-score orders get flagged for manual review or automatic OTP verification. Low-score orders ship normally.
The concept is borrowed from payment fraud detection, where companies like Stripe and Signifyd have used behavioral scoring for years. The difference: you don't need their software or their budget. You need a spreadsheet, your order export data, and about 2 hours to set it up.
A basic scoring model uses five to seven signals, each weighted by how strongly it predicts a failed delivery. The combined score tells you more than any single signal could alone.
What Are the Signals That Predict a Fake COD Order?
Not every signal carries the same weight. Based on patterns reported by high-volume COD merchants across multiple markets, here's what to track:
- Address fuzzy match to a previously failed delivery. "Building 4, Al Rashid St" and "Bldg 4, Al-Rashid Street" are the same address. Exact-match blocklists miss this. Use a simple string similarity check (even a basic VLOOKUP with partial matching) against your failed delivery address list. Weight: high.
- Postal code failure rate. Some postal codes have 50%+ RTO rates. If a new order comes from a zone where half your deliveries fail, that's a signal — not a block, but a flag. Weight: medium-high.
- Order velocity from the same postal code. Three orders from the same postal code within 2 hours, all from different phone numbers, all for single high-value items. That pattern doesn't happen organically. Weight: medium.
- Cart composition. Single high-value items with no add-ons carry higher RTO risk than multi-item orders across every COD market. A customer ordering one ₹4,000 item is statistically more likely to refuse delivery than someone ordering three items totaling ₹2,500. Weight: medium.
- Time-of-day clustering. Legitimate orders follow your store's normal traffic patterns. A cluster of orders between 2 AM and 5 AM local time — especially from new customers — is worth flagging. Weight: low-medium.
- Incomplete or suspicious address fields. Missing apartment numbers, single-word city names, or addresses shorter than 15 characters correlate with higher RTO. Weight: low-medium.
- First-time customer + high order value + COD. Each of these alone is fine. All three together push the risk score up. A first-time customer placing a ₹6,000 COD order at 3 AM from a high-RTO postal code? That order deserves a phone call before it ships. Weight: varies (use as a multiplier).
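The fuzzy address match at the top of that list does the heaviest lifting, so it's worth seeing concretely. Here is a minimal Python sketch using the standard library's difflib; the abbreviation table and the 0.85 similarity threshold are illustrative assumptions to tune against your own failed-delivery data, not tested values.

```python
import difflib
import re

# Hypothetical normalization table: extend it with the abbreviations
# common in your own market's address styles.
ABBREVIATIONS = {"st": "street", "bldg": "building", "apt": "apartment"}

def normalize(address: str) -> str:
    """Lowercase, strip punctuation, and expand common abbreviations."""
    address = re.sub(r"[^\w\s]", " ", address.lower())
    words = [ABBREVIATIONS.get(w, w) for w in address.split()]
    return " ".join(words)

def fuzzy_match(new_address: str, failed_addresses: list[str],
                threshold: float = 0.85) -> bool:
    """True if new_address resembles any previously failed address."""
    target = normalize(new_address)
    return any(
        difflib.SequenceMatcher(None, target, normalize(f)).ratio() >= threshold
        for f in failed_addresses
    )
```

With this, "Bldg 4, Al-Rashid Street" and "Building 4, Al Rashid St" normalize to the same string and match, while an unrelated address stays below the threshold.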
Build a Scoring Sheet in Google Sheets Using Your Order Data
You don't need fraud detection software for this. Export your last 90 days of orders — including delivery status — into Google Sheets. Then build your scoring model:
- Create a "Failed Deliveries" reference sheet. Filter for all orders that were returned, refused, or undeliverable. Pull the address, postal code, phone number, and order value into a separate tab.
- Calculate your postal code failure rates. Group orders by postal code. Divide failed deliveries by total deliveries. Any postal code above 30% failure rate gets flagged as high-risk.
- Assign point values to each signal. Start simple: address fuzzy match = 30 points, high-risk postal code = 20 points, single high-value item = 15 points, odd-hour order = 10 points, short address = 10 points, first-time customer = 5 points. These weights will evolve as you learn your store's specific patterns.
- Set your threshold. Orders scoring above 50 points go to manual review. Orders above 70 get automatic OTP verification or a confirmation call before shipping. Orders below 50 ship normally.
- Review weekly and adjust. Check which flagged orders actually converted and which confirmed orders still failed. Move the weights. This is a learning system, not a set-and-forget rule.
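If you'd rather score an exported order list as a script than with sheet formulas, the steps above translate into a few lines of Python. The signal names, weights, and the 30% postal-code cutoff mirror the starting values suggested above; treat them as assumptions to adjust weekly.

```python
from collections import Counter

# Initial weights from the steps above; evolve them as you review outcomes.
WEIGHTS = {
    "address_fuzzy_match": 30,    # resembles a previously failed address
    "high_risk_postal_code": 20,  # postal code failure rate above 30%
    "single_high_value_item": 15,
    "odd_hour_order": 10,         # placed between 2 AM and 5 AM local time
    "short_address": 10,          # fewer than 15 characters
    "first_time_customer": 5,
}

def score_order(signals: dict[str, bool]) -> int:
    """Sum the weights of every signal that fired on this order."""
    return sum(WEIGHTS[name] for name, fired in signals.items() if fired)

def postal_failure_rates(orders: list[dict]) -> dict[str, float]:
    """Failure rate per postal code from a 90-day order export.

    Each order is a dict with 'postal_code' and 'delivered' (bool).
    """
    totals, failures = Counter(), Counter()
    for order in orders:
        totals[order["postal_code"]] += 1
        if not order["delivered"]:
            failures[order["postal_code"]] += 1
    return {pc: failures[pc] / totals[pc] for pc in totals}
```

An order that fires the fuzzy match, the high-risk postal code, and the first-time-customer signal scores 55 and crosses the manual-review threshold.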
The entire setup takes one afternoon. The ongoing maintenance is 15 minutes per week reviewing your threshold accuracy.
Automate the High-Risk Response
Manual review doesn't scale past 300 orders per day. Once your scoring model is calibrated (give it 2-3 weeks of data), automate the response for high-risk orders:
- OTP verification for orders scoring 50+. If you're using EasySell's OTP verification, you can require phone verification on orders that match high-risk criteria — adding friction only where it's needed, not for every customer.
- Confirmation WhatsApp message for orders scoring 40-50. A simple "Confirm your order" message with a reply button. Legitimate customers respond in minutes. Fake orders go silent.
- Auto-hold for orders scoring 70+. Don't ship these until a team member reviews them. The 5-minute delay costs nothing compared to a ₨850 failed delivery.
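The three tiers above reduce to one routing rule. A sketch, with the returned action names standing in for whatever OTP, WhatsApp, and order-hold integrations you actually use:

```python
def route(score: int) -> str:
    """Map a risk score to the automated response tiers described above."""
    if score >= 70:
        return "hold_for_review"    # don't ship until a team member looks
    if score >= 50:
        return "require_otp"        # phone verification before shipping
    if score >= 40:
        return "whatsapp_confirm"   # reply-button confirmation message
    return "ship"                   # low risk, no added friction
```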
The goal isn't to block every suspicious order. It's to add just enough friction to the high-risk ones that fraudsters move on to easier targets. Legitimate customers barely notice a one-time OTP request.
What "Good" Looks Like After 30 Days
Merchants who've implemented behavioral scoring report consistent results across markets:
- RTO rates drop 8-12 percentage points within the first month (from ~30% to ~18-22%)
- False positive rate (legitimate orders incorrectly flagged) stays under 5% with proper threshold tuning
- Shipping cost savings of 15-25% on previously wasted deliveries
The math is straightforward. If you ship 200 COD orders per day at an average delivery cost of ₹120, and your RTO rate drops from 30% to 20%, you save ₹2,400 per day. That's ₹72,000 per month — from a Google Sheet and 15 minutes of weekly maintenance.
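The savings arithmetic scales linearly with volume, so it's worth plugging in your own numbers. A quick sketch:

```python
def monthly_savings(orders_per_day: int, cost_per_delivery: float,
                    rto_before: float, rto_after: float,
                    days: int = 30) -> float:
    """Deliveries no longer wasted, times what each one costs."""
    return orders_per_day * (rto_before - rto_after) * cost_per_delivery * days

# The example from the text: 200 orders/day, 120 per delivery, 30% -> 20% RTO.
print(monthly_savings(200, 120, 0.30, 0.20))
```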
You won't catch every fake order. That's not the point. The point is catching the 30% that your blocklist misses — the repeat offenders with new numbers, the address variations, the patterns that are invisible when you look at orders one at a time but obvious when you score them together.
The Scoring Model Improves Itself
Every order that ships gives you new data. Every failed delivery tells you which signals you underweighted. Every successful delivery to a flagged address tells you which signals you overweighted.
After 90 days, your scoring model knows your store's fraud patterns better than you do. The postal codes shift. The time-of-day patterns change. The cart composition signals get more specific to your product catalog. A static blocklist can't adapt to this. A scoring model does it automatically as long as you feed it delivery outcome data.
Start this week. Export your orders, build the reference sheet, assign your initial weights. You'll lose another ₨25,500 in failed deliveries this month if you don't — and every one of those orders will look perfectly clean to your blocklist.