SG DEALS
Unified Signals Engine for Real-Time Singapore Promotions Detection. Combined pipeline, architecture, scoring framework, and data model for detecting real promotions, promo codes, discounts, sales, vouchers, bundles, free gifts, and silently dropped prices in Singapore.
Every single Signals category and Discovery category on OnTheRice is backed by an engine of this same sophistication. SG DEALS is published here as a representative public example. The other engines follow the same canonical pipeline architecture, gating logic, evidence layers, and noise-filtering rigor.
Mission
SG DEALS is a Singapore-localized verification engine that continuously scans the web, social surfaces, community channels, email/newsletter flows, and commerce platforms to surface real, usable, current promotions. It is designed to answer one question: what are the strongest real Singapore deals that a user can use right now, each one launched, updated, or re-verified within the last 24 hours?
The system does not behave like a coupon graveyard. It behaves like a signal engine: ingest widely, parse aggressively, cluster duplicates, verify live status, penalize noise, and only publish what survives the gates.
Signal qualification
- Qualifying deal types include promo codes, fixed-dollar discounts, percentage discounts, bundles, 1-for-1 and BOGO, free gifts with purchase, free shipping, cashback stacks, app-only sales, bank-card promotions, loyalty/member deals, and limited-time food, retail, travel, service, and attraction offers.
- A signal is valid if it is newly launched, newly discovered, newly updated, or actively re-verified within the last 24 hours.
- Global or ambiguous offers do not qualify unless the Singapore applicability is proven.
1. Unified Pipeline
| Stage | What happens |
|---|---|
| Stage 1 — Internet-wide ingestion | Parallel collectors pull from official merchant pages, marketplaces, banks, social, Telegram, forums, newsletters, RSS, search-engine discovery, and price-tracking feeds. High-priority sources refresh every 5–30 min; long-tail domains every 30–60 min. |
| Stage 2 — Raw parsing and normalization | Convert HTML, JSON, email, post text, captions, OCR text, and image banners into a strict DealCandidate object. Extract merchant, code, discount, min spend, dates, channel, source, evidence, and region markers. |
| Stage 3 — Singapore relevance gate | Require clear SG grounding: .sg domain, SGD, Singapore wording, SG delivery/redemption, local malls, postal districts, local payment rails, SG banks, or timezone/context alignment. |
| Stage 4 — Temporal gate | Classify each signal as active now, starting soon, expiring soon, updated, or re-verified. Anything clearly past end date is dropped. Anything older than 24h without fresh proof decays or expires. |
| Stage 5 — Deduplication and clustering | Merge duplicate or near-duplicate sightings using merchant + code, merchant + offer signature, semantic similarity, page hash, OCR text similarity, and image/banner fingerprinting. |
| Stage 6 — Validity verification | Test whether the signal is alive without burning a user code. Validate page health, checkout availability, code-field presence, official landing page, extracted terms, and cross-source agreement. |
| Stage 7 — Value and usability scoring | Score how worthwhile the deal is: discount strength, min-spend fairness, stackability, breadth of use, redemption friction, and exclusion burden. |
| Stage 8 — Noise and fraud filtering | Penalize coupon farms, stale aggregator spam, bait offers, affiliate clones, fake urgency, region mismatches, unverifiable claims, redirect-heavy pages, and weakly specified promos. |
| Stage 9 — Publish and delivery | Only publish canonical signals that pass SG relevance, freshness, validity, and noise ceilings. Deliver to cards, APIs, alerting, dashboards, feeds, and admin review surfaces. |
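Stage 5's simplest matching rules (merchant + code, and merchant + offer signature) can be sketched as exact-key clustering. This is a minimal illustration only: the dictionary fields, the `normalize` helper, and the example merchant are hypothetical, and the semantic, page-hash, OCR, and image-fingerprint matching named in the table are not covered.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())


def cluster_key(candidate: dict) -> tuple:
    """Use merchant + code when a code exists, else merchant + offer signature."""
    merchant = normalize(candidate["merchant"])
    code = candidate.get("promo_code")
    if code:
        return (merchant, "code", normalize(code))
    return (merchant, "offer", normalize(candidate["offer_text"]))


def cluster(candidates: list[dict]) -> dict:
    """Group sightings that share the same exact-match key."""
    clusters = defaultdict(list)
    for candidate in candidates:
        clusters[cluster_key(candidate)].append(candidate)
    return dict(clusters)


# Hypothetical sightings of the same promo from two sources, plus one other deal.
sightings = [
    {"merchant": "Example Mart", "promo_code": "SAVE10",
     "offer_text": "10% off", "source": "telegram"},
    {"merchant": "example-mart", "promo_code": "save10",
     "offer_text": "Save 10 percent", "source": "official"},
    {"merchant": "Example Mart", "promo_code": None,
     "offer_text": "1-for-1 bubble tea", "source": "forum"},
]
clusters = cluster(sightings)
```

Exact keys only catch the cheap cases; near-duplicates with different wording and no code would still need the similarity-based methods from the table.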
2. Architecture
| Layer | Role |
|---|---|
| Ingestion layer | Distributed crawlers, API connectors, social listeners, newsletter inbox hooks, OCR workers, search-trigger discovery, change-detection monitors, and price-snapshot watchers. |
| Queue and stream layer | Kafka or Redis Streams for raw event durability, backpressure control, replay, and decoupling between collection, parsing, verification, and publishing. |
| Raw storage layer | Object storage for raw HTML, screenshots, images, OCR outputs, emails, page hashes, and crawl metadata. Structured landing in Postgres or a warehouse. |
| Extraction layer | Regex, rules, NLP, lightweight classifiers, and LLM extraction convert messy promo text into canonical fields. Supports HTML, app-like pages, social captions, and image-based promotions. |
| Entity and dedup layer | Canonical merchant mapping, near-duplicate clustering, fingerprint cache, embeddings, OCR signature matching, and source graph consolidation. |
| Verification layer | Headless browser checkers, passive checkout simulation, official-domain matching, expiry extraction, code syntax checks, and source-consensus validators. |
| Intelligence layer | Freshness engine, SG relevance engine, validity engine, value engine, popularity/spread engine, noise engine, ranking engine, decay engine, and TTL revalidation loop. |
| Serving layer | Final signals API, dashboard, alerting, Telegram/WhatsApp push, card generation, admin queue, and audit/history views. |
3. Core Formula Stack
The combined engine keeps the weighted logic intact instead of collapsing everything into one mushy confidence number. The master score rewards real, fresh, local, usable deals and penalizes spam, staleness, and weak evidence.
The master score combines nine components: V = validity, F = freshness, R = Singapore relevance, U = user value, E = evidence strength, P = popularity/spread quality, C = redemption clarity, M = merchant trust, and N = noise penalty.
| Component | Meaning (weights censored) |
|---|---|
| Validity V | T code/cart viability, O official-source presence, Q terms consistency, L landing page health. |
| Freshness F | Exponential decay over hours since first detection, latest update, or latest live verification. Decay constant censored. |
| SG relevance R | D SG-domain/location evidence, G SGD/commercial signals, L local serviceability, B SG payment/bank rails, T UTC+8 timing, S SG text/entity markers. |
| User value U | D_s discount strength, M_s min-spend fairness, St stackability, B_r breadth of usability, W free gift or perk value. |
| Evidence E | O_f official evidence, X cross-source corroboration, C_m community confirmations, I image/OCR/banner support. |
| Popularity P | Z_v velocity z-score, Z_c credible mentions z-score, Z_s trusted-channel social spread. |
| Clarity C | C = 1 − Fr, where Fr is redemption friction from hidden T&Cs, app-only traps, member gates, or unclear flow. |
| Merchant trust M | O ownership evidence, H historical legitimacy, K known merchant reputation, A absence of scam/affiliate risk. |
| Noise N | S_p coupon-farm spam, E_x expired/dead-code signal, D_u duplicate/recycled signal, R_m region mismatch, F_k fake/unverifiable claim. |
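Two of the components in the table have a shape that can be written down without the censored weights: freshness as exponential decay and clarity as C = 1 − Fr. The sketch below assumes a placeholder decay constant (the real one is censored), measures age from the most recent proof event, and clamps friction to the range 0 to 1; all three choices are assumptions, not the engine's actual parameters.

```python
import math

# Placeholder decay rate per hour; the real constant is not published.
ASSUMED_DECAY_PER_HOUR = 0.05


def freshness(hours_since_proof: float,
              decay: float = ASSUMED_DECAY_PER_HOUR) -> float:
    """Exponential decay: 1.0 at zero hours, falling toward 0 as proof ages."""
    if hours_since_proof < 0:
        raise ValueError("hours_since_proof must be non-negative")
    return math.exp(-decay * hours_since_proof)


def clarity(friction: float) -> float:
    """C = 1 - Fr, with Fr clamped to [0, 1] (the clamp is an assumption)."""
    fr = min(max(friction, 0.0), 1.0)
    return 1.0 - fr


fresh_2h = freshness(2)    # recently proven signal
fresh_22h = freshness(22)  # signal near the end of the 24-hour window
```

With any positive decay constant, the 2-hour value is strictly larger than the 22-hour value, which matches the ordering the 24-hour logic expects.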
Legacy PCS bridge
A Promotion Confidence Score (PCS) is retained as an inner verifier rather than discarded. It combines validity, terms, code, merchant trust, and source-agreement components with a subtractive penalty term. The exact PCS weights are censored for copyright. A PCS that clears the assist threshold can serve as a publish assist or as a verification prior inside the larger SG DEALS master score.
4. Gates, Thresholds, and 24-Hour Logic
| Gate | Rule |
|---|---|
| Singapore gate | Publish only if R clears the SG-relevance threshold. Exact cutoff censored. |
| Validity gate | Publish only if V clears the validity threshold. Exact cutoff censored. |
| Freshness gate | Publish only if F clears the freshness threshold. Exact cutoff censored. |
| Noise ceiling | Reject if N exceeds the noise ceiling. Exact cutoff censored. |
| PCS assist gate | Optional: require PCS above the assist threshold for automatic publish. Threshold censored. |
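The gate table reads as a set of independent checks that must all pass before publishing. The sketch below shows one way to express that, assuming every component is scaled to the range 0 to 1. All threshold values are placeholders, since the real cutoffs are censored, and the `GateThresholds` and `failed_gates` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateThresholds:
    # Placeholder values only; the real cutoffs are not published.
    min_relevance: float = 0.7
    min_validity: float = 0.7
    min_freshness: float = 0.3
    max_noise: float = 0.4
    min_pcs: Optional[float] = None  # None disables the optional PCS assist gate


def failed_gates(r: float, v: float, f: float, n: float,
                 pcs: Optional[float] = None,
                 t: GateThresholds = GateThresholds()) -> list[str]:
    """Return the names of every gate the signal fails (empty means publishable)."""
    failures = []
    if r < t.min_relevance:
        failures.append("singapore")
    if v < t.min_validity:
        failures.append("validity")
    if f < t.min_freshness:
        failures.append("freshness")
    if n > t.max_noise:
        failures.append("noise")
    if t.min_pcs is not None and (pcs is None or pcs < t.min_pcs):
        failures.append("pcs_assist")
    return failures


publishable = failed_gates(r=0.9, v=0.9, f=0.9, n=0.1) == []
```

Returning the list of failed gates rather than a bare boolean keeps the rejection reason available for an admin review or audit view.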
Tiering (score bands)
- 850–1000 — Strong Singapore Deal Signal
- 760–849 — Very Good Active Deal
- 680–759 — Good but narrower or more conditional
- 600–679 — Watchlist / niche / lower confidence
- Below 600 — Do not publish by default
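The score bands above map directly to a lookup function. This sketch assumes the master score lies on a 0 to 1000 scale (implied by the top band) and treats fractional scores by their lower bound; the `tier` function name is hypothetical.

```python
def tier(score: float) -> str:
    """Map a master score (assumed 0-1000) to its published tier label."""
    if not 0 <= score <= 1000:
        raise ValueError("score must be between 0 and 1000")
    if score >= 850:
        return "Strong Singapore Deal Signal"
    if score >= 760:
        return "Very Good Active Deal"
    if score >= 680:
        return "Good but narrower or more conditional"
    if score >= 600:
        return "Watchlist / niche / lower confidence"
    return "Do not publish by default"


example_tier = tier(849)  # falls in the 760-849 band
```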
24-hour interpretation
- New: first discovered in the last 24 hours.
- Existing but still live: older deal, but re-verified live in the last 24 hours.
- Updated: older deal with changed end date, new code, new stack condition, refreshed banner, or newly observed live proof in the last 24 hours.
A 2-hour-old signal should naturally score above a 22-hour-old signal unless the older signal has unusually strong evidence, value, and official confirmation.
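The three 24-hour cases can be expressed as a small classifier over timestamps. In this sketch the field names, the status labels, and the priority order (new, then updated, then re-verified) are assumptions; the source defines the three cases but not how ties between them are resolved.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WINDOW = timedelta(hours=24)


def status_24h(now: datetime,
               first_seen: datetime,
               last_updated: Optional[datetime] = None,
               last_verified: Optional[datetime] = None) -> Optional[str]:
    """Return 'new', 'updated', 'existing_live', or None if outside the window."""

    def recent(ts: Optional[datetime]) -> bool:
        # A timestamp counts only if it is in the past and at most 24 hours old.
        return ts is not None and timedelta(0) <= now - ts <= WINDOW

    if recent(first_seen):
        return "new"
    if recent(last_updated):
        return "updated"
    if recent(last_verified):
        return "existing_live"
    return None


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
old = now - timedelta(days=5)
example_status = status_24h(now, first_seen=old,
                            last_verified=now - timedelta(hours=1))
```

A `None` result means the signal has no qualifying proof inside the window and should decay or expire rather than publish.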
5. Canonical Data Objects and Output
| Object | Purpose |
|---|---|
| RawDealEvent | Unprocessed signal from crawler, API, post, email, OCR frame, app banner, or price snapshot. |
| DealCandidate | Normalized possible promotion with extracted fields and source evidence. |
| DealCluster | Deduplicated canonical grouping of the same promo seen across multiple sources. |
| DealVerification | Live-status checks, expiry status, page health, validation verdict, and verification timestamps. |
| SingaporeDealSignal | Final published record used by the feed, APIs, cards, alerts, and admin audit surfaces. |
Recommended SingaporeDealSignal fields
- merchant, canonical merchant_id, title, category, deal_type, promo_code, savings estimate, discount type, discount value, min spend, start_time, end_time, verified_timestamp
- source_count, official_link, evidence_links, source_mix, first_seen, last_confirmed, freshness_score, validity_verdict, SG relevance, master score, PCS score, confidence
- important_summary_1, important_summary_2, important_summary_3, exclusions, redemption_method, online_or_instore, stackability, noise_filtered_count, history trail
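A subset of the recommended fields could be modeled as a typed record. The sketch below is illustrative only: the field types, which fields are optional, and the defaults are all assumptions, and several listed fields (for example source_mix, PCS score, and history trail) are omitted for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class SingaporeDealSignal:
    # Identity and description
    merchant: str
    merchant_id: str
    title: str
    category: str
    deal_type: str
    # Timestamps that back the 24-hour logic
    first_seen: datetime
    last_confirmed: datetime
    # The three mandatory card summaries
    important_summary_1: str
    important_summary_2: str
    important_summary_3: str
    # Optional or defaulted details (optionality is an assumption)
    promo_code: Optional[str] = None
    min_spend: Optional[float] = None
    end_time: Optional[datetime] = None
    official_link: Optional[str] = None
    evidence_links: list[str] = field(default_factory=list)
    exclusions: list[str] = field(default_factory=list)
    source_count: int = 0
    master_score: float = 0.0

    def summaries(self) -> list[str]:
        """Return the three card summaries in display order."""
        return [self.important_summary_1,
                self.important_summary_2,
                self.important_summary_3]


# Hypothetical record; merchant and values are invented for illustration.
ts = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
signal = SingaporeDealSignal(
    merchant="Example Mart", merchant_id="m-001", title="10% off storewide",
    category="retail", deal_type="promo_code",
    first_seen=ts, last_confirmed=ts,
    important_summary_1="10% off with code SAVE10, min spend S$50.",
    important_summary_2="First seen today; landing page is live.",
    important_summary_3="App-only; excludes sale items.",
    promo_code="SAVE10", min_spend=50.0, source_count=2,
)
```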
Three mandatory summaries per signal card
| Summary slot | What it says |
|---|---|
| 1. What the deal is | State the concrete offer, merchant, threshold, and code if present. |
| 2. Why it matters now | State whether it is new, updated, or re-verified within the last 24 hours, and why the engine believes it is live. |
| 3. Important catch | State the main limitation: app-only, selected merchants, stack restrictions, bank/card requirement, low stock, or narrow redemption scope. |
Best detection patterns include:
- Official merchant post plus active landing page plus live code structure.
- Bank promo page plus merchant page plus SG checkout confirmation.
- Telegram/community alert plus official banner plus passive cart validation.
- Image-based story plus OCR plus matching redemption page.
- Silent price drops proven by price snapshots plus active product availability.
6. Final Design Notes
- Do not trust stated end dates alone. Merchants pull offers early.
- Community sources often beat official sources for speed, but official sources beat them for trust. The engine should use both, not worship one.
- Passive validation is preferred. Do not burn single-use codes.
- Silent promotions count. A real price drop with Singapore applicability is still a signal.
- Keep a 2-hour revalidation loop for active records and auto-expire records that lose live proof.
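The revalidation note can be sketched as a planner that sorts active records into "due for a recheck" and "expire now". The 2-hour interval comes from the note above; treating "lost live proof" as no confirmation within 24 hours is an assumption borrowed from the freshness window, and the record fields are hypothetical.

```python
from datetime import datetime, timedelta, timezone

REVALIDATE_EVERY = timedelta(hours=2)  # interval stated in the design notes
EXPIRE_AFTER = timedelta(hours=24)     # assumption: reuse the 24-hour window


def plan_revalidation(records: list[dict],
                      now: datetime) -> tuple[list[str], list[str]]:
    """Split record ids into (due_for_recheck, to_expire) by last confirmation age."""
    due, expired = [], []
    for rec in records:
        age = now - rec["last_confirmed"]
        if age > EXPIRE_AFTER:
            expired.append(rec["id"])
        elif age >= REVALIDATE_EVERY:
            due.append(rec["id"])
    return due, expired


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": "a", "last_confirmed": now - timedelta(minutes=30)},  # still fresh
    {"id": "b", "last_confirmed": now - timedelta(hours=3)},     # needs recheck
    {"id": "c", "last_confirmed": now - timedelta(hours=30)},    # lost live proof
]
due, expired = plan_revalidation(records, now)
```

In practice the "due" list would feed the passive verification checks, so that no single-use code is consumed during a recheck.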
SG DEALS is not a coupon scraper. It is a Singapore-localized promotion verification engine with internet-wide ingestion, 24-hour freshness enforcement, live validation, deduped canonical records, and weighted scoring that preserves the logic contributed across the different designs.
Reminder: every Signals category and Discovery category on OnTheRice is backed by an engine of this same sophistication. SG DEALS is the public example.