Layer 3 — Voice Transaction

Voice Pay

Speak. Confirm. Done — for real money.

Sesim Voice Transaction (Layer 3) lets users authorize bank transfers, POS payments, and consent flows by speaking. It combines NLP intent extraction (amount, recipient, currency), a unique-per-turn dual-confirmation challenge, and anti-spoof checks on the confirmation audio. Combined with Layer 1 (coercion detection) and Layer 2 (voiceprint), a single voice intent yields three signals: who is speaking, whether they are under coercion, and what action they requested.

Private pilot

Live voice-pay execution is restricted to active partner-led pilots. To request access for a mobile banking, POS, ATM voice-pay, healthcare consent, or public e-services pilot, contact us.

Request pilot access

Demo flow (illustrative)

A typical voice-pay turn — illustrated only. No real audio is captured on this page; live execution requires a partner-led pilot.

  1. User says
    "Send 100 TRY to Ahmet"
  2. ASR + NLP intent
    { amount: 100, currency: TRY, recipient: Ahmet, action: transfer }
  3. Whitelist + cap check
    Ahmet (...4567) ✓ · 100 ≤ 5,000 ✓
  4. Dual-confirmation prompt
    "Confirm 100 TRY to Ahmet (...4567)? Say: 'Yes, code-cloud, 100 TRY confirm'"
  5. User says
    "Yes, code-cloud, 100 TRY confirm"
  6. Cross-layer verdict
    L1 stress 0.08 · L2 voiceprint 0.91 · L3 anti-spoof 0.94 → PASS
  7. Execute + audit
    Bank API → blockchain audit hash · 2.4s end-to-end
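
Steps 2 and 3 of the flow can be sketched in a few lines of Python. This is a minimal illustration, not Sesim's implementation: the regular expression stands in for the real ASR + NLP stage, and the names `Intent`, `parse_intent`, `policy_gate`, `WHITELIST`, `ALLOWED_CURRENCIES` and `PER_TX_CAP` are hypothetical, with values copied from the demo above.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple


@dataclass(frozen=True)
class Intent:
    action: str
    amount: Decimal
    currency: str
    recipient: str


# Toy grammar for utterances shaped like "Send 100 TRY to Ahmet".
# Assumption: a real deployment would use ASR plus an NLP model, not a regex.
_TRANSFER = re.compile(
    r"^\s*send\s+(?P<amount>\d+(?:\.\d{1,2})?)\s+(?P<currency>[A-Za-z]{3})"
    r"\s+to\s+(?P<recipient>\w+)\s*$",
    re.IGNORECASE,
)

# Hypothetical policy data mirroring the demo values.
WHITELIST = {"Ahmet": "...4567"}
ALLOWED_CURRENCIES = {"TRY"}
PER_TX_CAP = Decimal("5000")


def parse_intent(transcript: str) -> Optional[Intent]:
    """Extract the transfer entities, or return None if nothing matches."""
    m = _TRANSFER.match(transcript)
    if m is None:
        return None
    return Intent("transfer", Decimal(m["amount"]),
                  m["currency"].upper(), m["recipient"])


def policy_gate(intent: Intent) -> Tuple[bool, str]:
    """Deterministic checks applied before any confirmation prompt."""
    if intent.currency not in ALLOWED_CURRENCIES:
        return False, "currency_not_allowed"
    if intent.recipient not in WHITELIST:
        return False, "recipient_not_whitelisted"
    if not (Decimal("0") < intent.amount <= PER_TX_CAP):
        return False, "amount_out_of_range"
    return True, "ok"
```

Calling `policy_gate(parse_intent("Send 100 TRY to Ahmet"))` passes the gate, while an unknown recipient or an amount above the cap is rejected with a reason code before the user is ever asked to confirm.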

How voice-pay works

  • NLP intent extraction: amount + recipient + currency + action entities pulled from a free-form Turkish or English utterance.
  • Dual-confirmation: a fresh random word is injected into the confirmation prompt — the user must repeat it together with the amount.
  • Anti-spoof on confirmation: TTS clones, voice conversion, and replay attacks are rejected at a rate of ≥ 95% on the red-team adversarial set.
  • Cap / limit / whitelist enforcement: bank policy applied deterministically before execute (per-tx, daily, weekly).
  • Cross-layer signals: Layer 1 coercion + Layer 2 voiceprint scored on the same audio.
  • Immutable audit: voice intent + confirmation + cross-layer scores + blockchain hash, retained for 7 years per banking regulations.
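
The dual-confirmation step can be sketched as a single-use, short-lived challenge that the confirmation transcript must contain together with the amount. This is an illustrative sketch under stated assumptions: the word lists, the 30-second lifetime, and the names `ChallengeStore` and `verify_confirmation` are hypothetical, and the transcript is assumed to come from an upstream ASR stage.

```python
import re
import secrets
import time
import unicodedata
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical word lists; a real list would be larger and tuned for ASR.
FIRST = ("code", "blue", "north", "quiet", "amber", "swift")
SECOND = ("cloud", "river", "falcon", "maple", "harbor", "meadow")


def _tokens(text: str) -> List[str]:
    """Case-fold and split into alphanumeric tokens, dropping punctuation."""
    normalized = unicodedata.normalize("NFKC", text).casefold()
    return re.findall(r"[^\W_]+", normalized)


class ChallengeStore:
    """Issues one single-use, short-lived challenge per transaction."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, tx_id: str) -> str:
        # secrets uses the OS CSPRNG, so challenges are not predictable.
        challenge = f"{secrets.choice(FIRST)}-{secrets.choice(SECOND)}"
        self._pending[tx_id] = (challenge, self._clock() + self._ttl)
        return challenge

    def consume(self, tx_id: str) -> Optional[str]:
        """Return the challenge if still valid; it is removed either way."""
        entry = self._pending.pop(tx_id, None)
        if entry is None:
            return None
        challenge, expires_at = entry
        return challenge if self._clock() <= expires_at else None


def verify_confirmation(store: ChallengeStore, tx_id: str, transcript: str,
                        amount: int, currency: str) -> bool:
    """True only if the transcript repeats the challenge, amount and currency.

    Assumption: whole-unit amounts, so the amount is a single spoken token.
    """
    challenge = store.consume(tx_id)
    if challenge is None:
        return False
    said, want = _tokens(transcript), _tokens(challenge)
    has_challenge = any(said[i:i + len(want)] == want
                        for i in range(len(said) - len(want) + 1))
    return has_challenge and str(amount) in said and currency.casefold() in said
```

Because `consume` removes the challenge on the first attempt, a recording of a valid confirmation cannot be replayed against the same transaction, and a recording from an earlier turn carries the wrong word.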

Why bank-grade, not consumer-grade

  • Consumer voice tools (Siri Pay, Google Pay voice, Alexa Pay) are one-way speech-to-action: no per-turn random word, no voiceprint match on confirmation, no cap engine, no dispute audit chain.
  • Sesim L3 is a deterministic decision-execution layer: every transaction passes a policy gate before reaching the bank core.
  • 3-factor cross-layer single-utterance handshake (Patent claims 41, 42): voice biometric (inherence) + dynamic random word (knowledge) + NFC card-tap (possession) — PSD2 SCA aligned.
  • 3-validator Proof-of-Voice consensus (Patent claim 50): 2/3 majority required before audit anchor; validator health monitoring and auto-disable on consecutive failures.
  • IPFS/DHT-anchored off-chain audit (Patent claim 56): a tamper-evident, distributed audit chain rather than just an in-memory hash chain. A bank can run its own validator node.
  • Voice smart-contract intent extraction (Patent claim 52): NLP intent → policy gate → execution; same path serves bank transfers, healthcare consent, transit, e-commerce, exam integrity.
  • PCI-DSS scope minimization: tokenization at boundary — Sesim never sees the PAN.
  • KVKK Art. 5/2-c, 5/2-e and 5/2-f lawful basis under the Turkish Banking Act (No. 5411).
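
The 2/3 validator rule and the tamper-evident audit chain can be illustrated together. The sketch below is a simplification under loud assumptions: it uses a plain in-memory SHA-256 hash chain purely to show how entries link and how tampering is detected, it does not model IPFS/DHT anchoring, and the names `Validator`, `anchor`, `verify_chain` and the failure threshold of 3 are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, List

GENESIS = "0" * 64
MAX_CONSECUTIVE_FAILURES = 3  # hypothetical auto-disable threshold


@dataclass
class Validator:
    name: str
    check: Callable[[dict], bool]  # may raise if the validator is unhealthy
    failures: int = 0
    enabled: bool = True


def collect_votes(validators: List[Validator], record: dict) -> List[bool]:
    """Ask every enabled validator; track health and auto-disable on failures."""
    votes = []
    for v in validators:
        if not v.enabled:
            continue
        try:
            votes.append(bool(v.check(record)))
            v.failures = 0
        except Exception:
            v.failures += 1
            if v.failures >= MAX_CONSECUTIVE_FAILURES:
                v.enabled = False
    return votes


def has_quorum(votes: List[bool], total_validators: int) -> bool:
    # 2/3 of the full set: a failed or disabled validator counts as no approval.
    return 3 * sum(votes) >= 2 * total_validators


def chain_hash(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def anchor(chain: List[dict], record: dict, validators: List[Validator]) -> bool:
    """Append the record to the chain only if the validators reach quorum."""
    votes = collect_votes(validators, record)
    if not has_quorum(votes, len(validators)):
        return False
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": chain_hash(prev, record)})
    return True


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every link; any edited record or reordering breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != chain_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

With three validators, two approvals are enough to anchor a record even when the third is down, and a validator that keeps failing is taken out of rotation instead of stalling every transaction.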

KPI gates (pilot)

  • Transaction success rate ≥ 95% on the funnel (intent → confirm → execute).
  • Misrecognition rate ≤ 1% across production audit + adversarial set.
  • Dual-confirmation success ≥ 99% on the confirmation prompt.
  • Voice-to-confirmation latency p95 ≤ 1500 ms; voice-to-execution p95 ≤ 3500 ms.
  • Dispute / chargeback rate ≤ 0.1%.
  • Deepfake / synthetic-voice rejection ≥ 95% on red-team set.
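
The gates above can be checked mechanically from pilot counters. The sketch below is one possible reading, not the contractual definition: the `PilotMetrics` fields, the choice of denominators (for example, disputes divided by executed transactions), and the nearest-rank p95 method are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


def p95(samples_ms: List[float]) -> float:
    """Nearest-rank 95th percentile (assumed method; others exist)."""
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


@dataclass
class PilotMetrics:
    # Hypothetical counters collected over the pilot window.
    intents: int
    executed: int
    misrecognized: int
    confirm_prompts: int
    confirm_ok: int
    disputes: int
    spoof_attempts: int
    spoof_rejected: int
    confirm_latency_ms: List[float]
    execute_latency_ms: List[float]


def evaluate_gates(m: PilotMetrics) -> Dict[str, bool]:
    """Return pass/fail per KPI gate, using the thresholds listed above."""
    return {
        "transaction_success": m.executed / m.intents >= 0.95,
        "misrecognition": m.misrecognized / m.intents <= 0.01,
        "dual_confirmation": m.confirm_ok / m.confirm_prompts >= 0.99,
        "latency_confirm_p95": p95(m.confirm_latency_ms) <= 1500,
        "latency_execute_p95": p95(m.execute_latency_ms) <= 3500,
        "dispute_rate": m.disputes / m.executed <= 0.001,
        "deepfake_rejection": m.spoof_rejected / m.spoof_attempts >= 0.95,
    }
```

A pilot sign-off could then require `all(evaluate_gates(metrics).values())`, with the per-gate dictionary kept for the report so a failing gate is visible rather than hidden in a single pass/fail flag.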

Pilot at a glance

  • Duration: 12–16 weeks (partner-led)
  • Fee: $40–60k, creditable to Phase 2
  • Scope: outbound TRY ≤ 5,000; whitelisted recipients only
  • Payment: 30% kickoff + 30% mid-pilot + 40% on success-criteria sign-off
  • Phase 2: $500–700k yearly (volume / channel tier)