Our Engines · Representative Public Specification

SG DEALS

Unified Signals Engine for Real-Time Singapore Promotions Detection. Combined pipeline, architecture, scoring framework, and data model for detecting real promotions, promo codes, discounts, sales, vouchers, bundles, free gifts, and silently dropped prices in Singapore.

⚙️ System-Wide Standard

Every single Signals category and Discovery category on OnTheRice is backed by an engine of this same sophistication. SG DEALS is published here as a representative public example. The other engines follow the same canonical pipeline architecture, gating logic, evidence layers, and noise-filtering rigor.

Scope: Internet-wide, Singapore-only, 24-hour active window.
Signal types: Promo codes, discounts, sales, bundles, cashback, free gifts, bank/card deals, flash sales.
Publishing rule: Must be real or likely real, locally relevant, and fresh, updated, or re-verified within 24 hours.

Mission

SG DEALS is a Singapore-localized verification engine that continuously scans the web, social surfaces, community channels, email/newsletter flows, and commerce platforms to surface real, usable, current promotions. It is designed to answer one question: what are the strongest real Singapore deals a user can redeem right now, based on signals discovered, updated, or re-verified in the last 24 hours?

The system does not behave like a coupon graveyard. It behaves like a signal engine: ingest widely, parse aggressively, cluster duplicates, verify live status, penalize noise, and only publish what survives the gates.

Signal qualification

1. Unified Pipeline

| Stage | What happens |
|---|---|
| Stage 1: Internet-wide ingestion | Parallel collectors pull from official merchant pages, marketplaces, banks, social, Telegram, forums, newsletters, RSS, search-engine discovery, and price-tracking feeds. High-priority sources refresh every 5–30 min; long-tail domains every 30–60 min. |
| Stage 2: Raw parsing and normalization | Convert HTML, JSON, email, post text, captions, OCR text, and image banners into a strict DealCandidate object. Extract merchant, code, discount, min spend, dates, channel, source, evidence, and region markers. |
| Stage 3: Singapore relevance gate | Require clear SG grounding: .sg domain, SGD, Singapore wording, SG delivery/redemption, local malls, postal districts, local payment rails, SG banks, or timezone/context alignment. |
| Stage 4: Temporal gate | Classify each signal as active now, starting soon, expiring soon, updated, or re-verified. Anything clearly past end date is dropped. Anything older than 24h without fresh proof decays or expires. |
| Stage 5: Deduplication and clustering | Merge duplicate or near-duplicate sightings using merchant + code, merchant + offer signature, semantic similarity, page hash, OCR text similarity, and image/banner fingerprinting. |
| Stage 6: Validity verification | Test whether the signal is alive without burning a user code. Validate page health, checkout availability, code-field presence, official landing page, extracted terms, and cross-source agreement. |
| Stage 7: Value and usability scoring | Score how worthwhile the deal is: discount strength, min-spend fairness, stackability, breadth of use, redemption friction, and exclusion burden. |
| Stage 8: Noise and fraud filtering | Penalize coupon farms, stale aggregator spam, bait offers, affiliate clones, fake urgency, region mismatches, unverifiable claims, redirect-heavy pages, and weakly specified promos. |
| Stage 9: Publish and delivery | Only publish canonical signals that pass SG relevance, freshness, validity, and noise ceilings. Deliver to cards, APIs, alerting, dashboards, feeds, and admin review surfaces. |
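As a rough illustration of the Stage 5 idea, the sketch below groups sightings by a normalized merchant + code key and falls back to a merchant + offer signature when no code is present. The `Sighting` type, its field names, and the normalization rules are assumptions made for this example; the specification does not publish its clustering logic, and the semantic, page-hash, OCR, and image-fingerprint signals are left out.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sighting:
    """Hypothetical minimal view of one parsed deal sighting."""
    merchant: str
    offer_text: str
    source: str
    code: Optional[str] = None


def _norm(text: str) -> str:
    # Lowercase and collapse every run of non-alphanumerics into one space.
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def cluster_key(s: Sighting) -> tuple:
    # Prefer merchant + code; otherwise use merchant + offer signature.
    if s.code:
        return ("code", _norm(s.merchant), s.code.strip().upper())
    return ("offer", _norm(s.merchant), _norm(s.offer_text))


def cluster(sightings):
    # Group sightings that share the same key into one cluster.
    clusters = defaultdict(list)
    for s in sightings:
        clusters[cluster_key(s)].append(s)
    return dict(clusters)


sightings = [
    Sighting("ExampleMart", "10% off sitewide", "official-site", code="save10"),
    Sighting("examplemart", "Get 10% OFF!", "telegram", code="SAVE10 "),
    Sighting("ExampleMart", "Free gift with $50 spend", "newsletter"),
]
clusters = cluster(sightings)
```

Exact-key grouping only catches duplicates that normalize to the same string; near-duplicates with different wording would need the similarity-based signals the stage also lists.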

2. Architecture

| Layer | Role |
|---|---|
| Ingestion layer | Distributed crawlers, API connectors, social listeners, newsletter inbox hooks, OCR workers, search-trigger discovery, change-detection monitors, and price-snapshot watchers. |
| Queue and stream layer | Kafka or Redis Streams for raw event durability, backpressure control, replay, and decoupling between collection, parsing, verification, and publishing. |
| Raw storage layer | Object storage for raw HTML, screenshots, images, OCR outputs, emails, page hashes, and crawl metadata. Structured landing in Postgres or a warehouse. |
| Extraction layer | Regex, rules, NLP, lightweight classifiers, and LLM extraction convert messy promo text into canonical fields. Supports HTML, app-like pages, social captions, and image-based promotions. |
| Entity and dedup layer | Canonical merchant mapping, near-duplicate clustering, fingerprint cache, embeddings, OCR signature matching, and source graph consolidation. |
| Verification layer | Headless browser checkers, passive checkout simulation, official-domain matching, expiry extraction, code syntax checks, and source-consensus validators. |
| Intelligence layer | Freshness engine, SG relevance engine, validity engine, value engine, popularity/spread engine, noise engine, ranking engine, decay engine, and TTL revalidation loop. |
| Serving layer | Final signals API, dashboard, alerting, Telegram/WhatsApp push, card generation, admin queue, and audit/history views. |

3. Core Formula Stack

The combined engine keeps the weighted logic intact instead of collapsing everything into one mushy confidence number. The master score rewards real, fresh, local, usable deals and penalizes spam, staleness, and weak evidence.

🔒 Formula weights below are censored for copyright. The structure, components, and meaning of each scoring term remain visible so external auditors and AI assistants can verify rigor — but the precise weighted coefficients are proprietary to OnTheRice and intentionally redacted.
S_SG_DEALS = 1000 × (w₁V + w₂F + w₃R + w₄U + w₅E + w₆P + w₇C + w₈M − w₉N)

Where V = validity, F = freshness, R = Singapore relevance, U = user value, E = evidence strength, P = popularity/spread quality, C = redemption clarity, M = merchant trust, and N = noise penalty.
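To make the shape of the master score concrete, here is a minimal sketch of the weighted sum. The weights are placeholders chosen only so the arithmetic is easy to follow (the real coefficients are redacted), and the assumption that every component lies in [0, 1] is mine, not the specification's.

```python
# Placeholder weights only: the real coefficients are redacted in the spec.
PLACEHOLDER_WEIGHTS = {
    "V": 0.125, "F": 0.125, "R": 0.125, "U": 0.125,
    "E": 0.125, "P": 0.125, "C": 0.125, "M": 0.125,
    "N": 0.5,  # noise is subtracted, not added
}

POSITIVE_TERMS = "VFRUEPCM"


def master_score(components: dict, weights: dict = PLACEHOLDER_WEIGHTS) -> float:
    """Return 1000 * (sum of weighted positive terms - weighted noise)."""
    positive = sum(weights[k] * components[k] for k in POSITIVE_TERMS)
    return 1000 * (positive - weights["N"] * components["N"])


# A hypothetical perfect, noise-free signal and the same signal with full noise.
clean = {k: 1.0 for k in POSITIVE_TERMS}
clean["N"] = 0.0
noisy = dict(clean, N=1.0)

clean_score = master_score(clean)
noisy_score = master_score(noisy)
```

With these placeholder weights the clean signal scores 1000 and the fully noisy one 500, which shows how the noise term pulls an otherwise strong signal down rather than being averaged away.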

| Component | Meaning (weights censored) |
|---|---|
| Validity V | T = code/cart viability, O = official-source presence, Q = terms consistency, L = landing page health. |
| Freshness F | Exponential decay over hours since first detection, latest update, or latest live verification. Decay constant censored. |
| SG relevance R | D = SG-domain/location evidence, G = SGD/commercial signals, L = local serviceability, B = SG payment/bank rails, T = UTC+8 timing, S = SG text/entity markers. |
| User value U | D_s = discount strength, M_s = min-spend fairness, St = stackability, B_r = breadth of usability, W = free gift or perk value. |
| Evidence E | O_f = official evidence, X = cross-source corroboration, C_m = community confirmations, I = image/OCR/banner support. |
| Popularity P | Z_v = velocity z-score, Z_c = credible mentions z-score, Z_s = trusted-channel social spread. |
| Clarity C | C = 1 − Fr, where Fr is redemption friction from hidden T&Cs, app-only traps, member gates, or unclear flow. |
| Merchant trust M | O = ownership evidence, H = historical legitimacy, K = known merchant reputation, A = absence of scam/affiliate risk. |
| Noise N | S_p = coupon-farm spam, E_x = expired/dead-code signal, D_u = duplicate/recycled signal, R_m = region mismatch, F_k = fake/unverifiable claim. |
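Two of the components above have an explicit functional form: freshness decays exponentially with age, and clarity is one minus friction. The sketch below shows one way to compute them. The 12-hour half-life is a made-up stand-in for the redacted decay constant, and measuring age from the most recent of the three timestamps is an assumption about how the "or" in the freshness definition is resolved.

```python
import math
from datetime import datetime, timedelta, timezone

ASSUMED_HALF_LIFE_HOURS = 12.0  # hypothetical; the real decay constant is redacted


def freshness(now, first_detected, last_updated=None, last_verified=None,
              half_life_hours=ASSUMED_HALF_LIFE_HOURS):
    """Exponential decay over hours since the most recent known event."""
    known = [t for t in (first_detected, last_updated, last_verified) if t is not None]
    latest = max(known)
    hours = max((now - latest).total_seconds() / 3600.0, 0.0)
    return math.exp(-math.log(2) * hours / half_life_hours)


def clarity(friction):
    """C = 1 - Fr, with friction clamped into [0, 1]."""
    return 1.0 - min(max(friction, 0.0), 1.0)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
f_recent = freshness(now, now - timedelta(hours=2))
f_old = freshness(now, now - timedelta(hours=22))
# An old deal that was re-verified an hour ago decays from the re-verification.
f_reverified = freshness(now, now - timedelta(days=5),
                         last_verified=now - timedelta(hours=1))
```

Under this sketch a re-verification resets the decay clock, so an old but freshly confirmed deal can outrank a newer one that has not been checked since discovery.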

Legacy PCS bridge

A Promotion Confidence Score (PCS) is kept as an inner verifier instead of being discarded, combining validity, terms, code, merchant trust, and source-agreement components with a deductive penalty term. The exact PCS weights are censored for copyright. PCS clearing the assist threshold can be used as a publish assist or verification prior inside the larger SG DEALS master score.

4. Gates, Thresholds, and 24-Hour Logic

| Gate | Rule |
|---|---|
| Singapore gate | Publish only if R clears the SG-relevance threshold. Exact cutoff censored. |
| Validity gate | Publish only if V clears the validity threshold. Exact cutoff censored. |
| Freshness gate | Publish only if F clears the freshness threshold. Exact cutoff censored. |
| Noise ceiling | Reject if N exceeds the noise ceiling. Exact cutoff censored. |
| PCS assist gate | Optional: require PCS above the assist threshold for automatic publish. Threshold censored. |
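The gates combine as a conjunction: a signal publishes only if every gate passes. The following sketch returns the names of the gates a signal fails. All cutoff values are invented placeholders, since the real ones are redacted, and reading "clears" as greater than or equal to and "exceeds" as strictly greater than is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GateThresholds:
    """Hypothetical cutoffs; the published spec redacts the real values."""
    min_relevance: float = 0.6
    min_validity: float = 0.6
    min_freshness: float = 0.3
    max_noise: float = 0.4
    min_pcs: Optional[float] = None  # None disables the optional PCS assist gate


def failed_gates(R: float, V: float, F: float, N: float,
                 pcs: Optional[float] = None,
                 t: GateThresholds = GateThresholds()) -> List[str]:
    """Return the names of every gate the signal fails (empty means publishable)."""
    failures = []
    if R < t.min_relevance:
        failures.append("singapore")
    if V < t.min_validity:
        failures.append("validity")
    if F < t.min_freshness:
        failures.append("freshness")
    if N > t.max_noise:
        failures.append("noise")
    # The PCS gate only applies when a threshold is configured.
    if t.min_pcs is not None and (pcs is None or pcs < t.min_pcs):
        failures.append("pcs_assist")
    return failures


ok = failed_gates(R=0.9, V=0.8, F=0.7, N=0.1)
stale_spam = failed_gates(R=0.9, V=0.8, F=0.1, N=0.9)
```

Returning the list of failed gates rather than a single boolean keeps the rejection reason available for an admin review queue.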

Tiering (score bands)

24-hour interpretation

  1. New: first discovered in the last 24 hours.
  2. Existing but still live: older deal, but re-verified live in the last 24 hours.
  3. Updated: older deal with changed end date, new code, new stack condition, refreshed banner, or newly observed live proof in the last 24 hours.

A 2-hour-old signal should naturally score above a 22-hour-old signal unless the older signal has unusually strong evidence, value, and official confirmation.
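The three 24-hour cases can be expressed as a small classifier over timestamps. This is a sketch under assumptions: the function name, the labels, and the priority order (new, then updated, then existing but live) are illustrative choices, since the text does not say how overlapping cases are resolved.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WINDOW = timedelta(hours=24)


def classify_24h(now: datetime,
                 first_seen: datetime,
                 last_verified: Optional[datetime] = None,
                 last_changed: Optional[datetime] = None) -> Optional[str]:
    """Return 'new', 'updated', 'existing_live', or None if outside the window."""

    def recent(ts: Optional[datetime]) -> bool:
        # True when the timestamp exists and falls inside the last 24 hours.
        return ts is not None and timedelta(0) <= now - ts <= WINDOW

    if recent(first_seen):
        return "new"
    if recent(last_changed):
        return "updated"
    if recent(last_verified):
        return "existing_live"
    return None


now = datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)
five_days_ago = now - timedelta(days=5)

label_new = classify_24h(now, now - timedelta(hours=3))
label_live = classify_24h(now, five_days_ago, last_verified=now - timedelta(hours=1))
label_updated = classify_24h(now, five_days_ago,
                             last_verified=now - timedelta(hours=1),
                             last_changed=now - timedelta(hours=2))
label_expired = classify_24h(now, five_days_ago)
```

A signal that returns `None` has no proof inside the window, which matches the rule that anything older than 24 hours without fresh proof decays or expires.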

5. Canonical Data Objects and Output

| Object | Purpose |
|---|---|
| RawDealEvent | Unprocessed signal from crawler, API, post, email, OCR frame, app banner, or price snapshot. |
| DealCandidate | Normalized possible promotion with extracted fields and source evidence. |
| DealCluster | Deduplicated canonical grouping of the same promo seen across multiple sources. |
| DealVerification | Live-status checks, expiry status, page health, validation verdict, and verification timestamps. |
| SingaporeDealSignal | Final published record used by the feed, APIs, cards, alerts, and admin audit surfaces. |
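One way to picture how these objects chain together is as a set of dataclasses. The class names come from the table; every field name below is an assumption, loosely based on the fields Stage 2 says it extracts (merchant, code, discount, min spend, dates, channel, source, evidence, region markers), and the `is_live` rule is a simplification rather than the engine's verdict logic.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class RawDealEvent:
    source: str           # e.g. crawler, API, email, OCR frame
    payload: str          # unprocessed content
    fetched_at: datetime


@dataclass
class DealCandidate:
    merchant: str
    source: str
    channel: str
    code: Optional[str] = None
    discount: Optional[str] = None
    min_spend_sgd: Optional[float] = None
    starts_at: Optional[datetime] = None
    ends_at: Optional[datetime] = None
    evidence: List[str] = field(default_factory=list)
    region_markers: List[str] = field(default_factory=list)


@dataclass
class DealCluster:
    cluster_id: str
    candidates: List[DealCandidate] = field(default_factory=list)

    def sources(self) -> List[str]:
        # Distinct sources that reported this promo, in sorted order.
        return sorted({c.source for c in self.candidates})


@dataclass
class DealVerification:
    cluster_id: str
    page_healthy: bool
    expired: bool
    verified_at: datetime

    def is_live(self) -> bool:
        # Simplified assumption: live means healthy page and not expired.
        return self.page_healthy and not self.expired


@dataclass
class SingaporeDealSignal:
    cluster: DealCluster
    verification: DealVerification
    score: float


cluster = DealCluster(
    cluster_id="examplemart-save10",
    candidates=[
        DealCandidate("ExampleMart", "official-site", "web", code="SAVE10"),
        DealCandidate("ExampleMart", "telegram", "community", code="SAVE10"),
    ],
)
check = DealVerification(cluster.cluster_id, page_healthy=True, expired=False,
                         verified_at=datetime(2024, 1, 1, tzinfo=timezone.utc))
signal = SingaporeDealSignal(cluster, check, score=0.0)
```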

Recommended SingaporeDealSignal fields

Three mandatory summaries per signal card

| Summary slot | What it says |
|---|---|
| 1. What the deal is | State the concrete offer, merchant, threshold, and code if present. |
| 2. Why it matters now | State whether it is new, updated, or re-verified within the last 24 hours, and why the engine believes it is live. |
| 3. Important catch | State the main limitation: app-only, selected merchants, stack restrictions, bank/card requirement, low stock, or narrow redemption scope. |
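Because all three summaries are mandatory, a card type can refuse to be built when any slot is blank. The sketch below is one possible shape; `SignalCard`, its field names, and the rendered layout are hypothetical, and the merchant and code in the example are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalCard:
    what: str      # slot 1: what the deal is
    why_now: str   # slot 2: why it matters now
    catch: str     # slot 3: important catch

    def __post_init__(self):
        # Enforce the "three mandatory summaries" rule at construction time.
        for name in ("what", "why_now", "catch"):
            if not getattr(self, name).strip():
                raise ValueError(f"summary slot '{name}' must not be empty")

    def render(self) -> str:
        return "\n".join([
            f"What the deal is: {self.what}",
            f"Why it matters now: {self.why_now}",
            f"Important catch: {self.catch}",
        ])


card = SignalCard(
    what="ExampleMart: 10% off with code SAVE10, min spend S$50",
    why_now="Re-verified live on the official landing page 2 hours ago",
    catch="App-only, excludes sale items",
)
text = card.render()
```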

Best detection patterns include: official merchant post plus active landing page plus live code structure; bank promo page plus merchant page plus SG checkout confirmation; Telegram/community alert plus official banner plus passive cart validation; image-based story plus OCR plus matching redemption page; and silent price drops proven by price snapshots plus active product availability.
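These patterns are all conjunctions of evidence types, so they can be checked with simple set containment. In the sketch below the pattern names and evidence tags are invented labels for the five combinations listed above; how the engine actually tags evidence is not published.

```python
# Each pattern lists the evidence tags that must all be present.
PATTERNS = {
    "official_post": frozenset(
        {"official_merchant_post", "active_landing_page", "live_code_structure"}),
    "bank_promo": frozenset(
        {"bank_promo_page", "merchant_page", "sg_checkout_confirmation"}),
    "community_alert": frozenset(
        {"community_alert", "official_banner", "passive_cart_validation"}),
    "image_story": frozenset(
        {"image_story", "ocr_text", "matching_redemption_page"}),
    "silent_price_drop": frozenset(
        {"price_snapshot_drop", "active_product_availability"}),
}


def matched_patterns(evidence):
    """Return the names of every pattern fully covered by the evidence tags."""
    seen = set(evidence)
    return [name for name, required in PATTERNS.items() if required <= seen]


hits = matched_patterns(
    {"price_snapshot_drop", "active_product_availability", "ocr_text"})
partial = matched_patterns({"community_alert", "official_banner"})
```

A partial match, such as a community alert with a banner but no cart validation, returns nothing here; a fuller version might report it as a weaker signal instead of discarding it.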

6. Final Design Notes

Verdict

SG DEALS is not a coupon scraper. It is a Singapore-localized promotion verification engine with internet-wide ingestion, 24-hour freshness enforcement, live validation, deduped canonical records, and weighted scoring that preserves the logic contributed across the different designs.

Reminder: every Signals category and Discovery category on OnTheRice is backed by an engine of this same sophistication. SG DEALS is the public example.