Voice Pay
Speak. Confirm. Done — for real money.
Sesim Voice Transaction (Layer 3) lets users authorize bank transfers, POS payments, and consent flows by speaking. It combines NLP intent extraction (amount, recipient, currency), a unique-per-turn dual-confirmation challenge, and anti-spoof checks on the confirmation audio. Together with Layer 1 (coercion detection) and Layer 2 (voiceprint), a single voice intent yields three signals: who is speaking, whether they are under coercion, and what action they want.
Live voice-pay execution is restricted to active partner-led pilots. To request access for a mobile banking, POS, ATM voice-pay, healthcare consent, or public e-services pilot, contact us.
Request pilot access →
A typical voice-pay turn (illustrated only). No real audio is captured on this page; live execution requires a partner-led pilot.
- 1. User says: "Send 100 TRY to Ahmet"
- 2. ASR + NLP intent: { amount: 100, currency: TRY, recipient: Ahmet, action: transfer }
- 3. Whitelist + cap check: Ahmet (...4567) ✓ · 100 ≤ 5,000 ✓
- 4. Dual-confirmation prompt: "Confirm 100 TRY to Ahmet (...4567)? Say: 'Yes, code-cloud, 100 TRY confirm'"
- 5. User says: "Yes, code-cloud, 100 TRY confirm"
- 6. Cross-layer verdict: L1 stress 0.08 · L2 voiceprint 0.91 · L3 anti-spoof 0.94 → PASS
- 7. Execute + audit: Bank API → blockchain audit hash · 2.4 s end-to-end
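Steps 4 to 6 of the walkthrough (challenge, spoken confirmation, cross-layer verdict) can be sketched roughly as follows. This is a minimal illustration, not Sesim's implementation: the word list, the `code-` prefix convention, the function names, and all three thresholds are assumptions chosen so that the example scores above pass.

```python
import re
import secrets

# Hypothetical word list; the "code-" prefix mirrors the example prompt.
CHALLENGE_WORDS = ["cloud", "river", "maple", "amber", "falcon", "harbor"]

# Assumed thresholds for illustration only, not published Sesim values.
MAX_STRESS = 0.30
MIN_VOICEPRINT = 0.80
MIN_ANTI_SPOOF = 0.90


def new_challenge() -> str:
    """Draw a fresh challenge phrase for each confirmation turn (CSPRNG)."""
    return "code-" + secrets.choice(CHALLENGE_WORDS)


def expected_phrase(challenge: str, amount: str, currency: str) -> str:
    """Phrase the user is asked to repeat: challenge word plus amount."""
    return f"Yes, {challenge}, {amount} {currency} confirm"


def _normalize(text: str) -> str:
    # Lowercase and collapse punctuation/whitespace so ASR formatting
    # differences ("code-cloud" vs "code cloud") do not matter.
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def confirmation_matches(transcript: str, challenge: str,
                         amount: str, currency: str) -> bool:
    """True only if the transcript repeats both the challenge and the amount."""
    expected = expected_phrase(challenge, amount, currency)
    return _normalize(transcript) == _normalize(expected)


def cross_layer_verdict(stress: float, voiceprint: float,
                        anti_spoof: float) -> str:
    """Combine the three layer scores into a single PASS/FAIL decision."""
    ok = (stress <= MAX_STRESS
          and voiceprint >= MIN_VOICEPRINT
          and anti_spoof >= MIN_ANTI_SPOOF)
    return "PASS" if ok else "FAIL"
```

With the scores from step 6, `cross_layer_verdict(0.08, 0.91, 0.94)` passes under these assumed thresholds. A transcript that repeats a stale challenge word or a different amount fails `confirmation_matches`, which is the point of tying the random word and the amount into one utterance.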
How voice-pay works
- NLP intent extraction: amount + recipient + currency + action entities pulled from a free-form Turkish or English utterance.
- Dual-confirmation: a fresh random word is injected into the confirmation prompt — the user must repeat it together with the amount.
- Anti-spoof on confirmation: TTS clone, voice conversion and replay are rejected at ≥95% on the red-team adversarial set.
- Cap / limit / whitelist enforcement: bank policy applied deterministically before execute (per-tx, daily, weekly).
- Cross-layer signals: Layer 1 coercion + Layer 2 voiceprint scored on the same audio.
- Immutable audit: voice intent + confirmation + cross-layer scores + blockchain hash, retained for 7 years per banking regulations.
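The intent-extraction and cap / limit / whitelist bullets can be illustrated with a small sketch. It assumes a single English utterance shape ("Send AMOUNT CURRENCY to RECIPIENT") handled by a regular expression, which stands in for the real NLP extractor. The `Intent` and `Policy` types, the `policy_gate` function, and the daily and weekly cap values in the usage note are hypothetical.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import FrozenSet, Optional, Tuple

# Stand-in for the NLP extractor: matches one utterance shape only.
_TRANSFER = re.compile(
    r"^\s*send\s+(?P<amount>\d+(?:\.\d+)?)\s+(?P<currency>[A-Za-z]{3})"
    r"\s+to\s+(?P<recipient>\S.*?)\s*$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Intent:
    amount: Decimal
    currency: str
    recipient: str
    action: str


@dataclass(frozen=True)
class Policy:
    whitelist: FrozenSet[str]
    per_tx_cap: Decimal
    daily_cap: Decimal
    weekly_cap: Decimal


def extract_intent(utterance: str) -> Optional[Intent]:
    """Return the parsed transfer intent, or None if nothing matches."""
    m = _TRANSFER.match(utterance)
    if m is None:
        return None
    return Intent(
        amount=Decimal(m.group("amount")),
        currency=m.group("currency").upper(),
        recipient=m.group("recipient"),
        action="transfer",
    )


def policy_gate(intent: Intent, policy: Policy,
                spent_today: Decimal, spent_this_week: Decimal) -> Tuple[bool, str]:
    """Deterministic checks applied before anything reaches execution."""
    if intent.recipient not in policy.whitelist:
        return False, "recipient not whitelisted"
    if intent.amount > policy.per_tx_cap:
        return False, "per-transaction cap exceeded"
    if spent_today + intent.amount > policy.daily_cap:
        return False, "daily cap exceeded"
    if spent_this_week + intent.amount > policy.weekly_cap:
        return False, "weekly cap exceeded"
    return True, "ok"
```

For example, with `Policy(frozenset({"Ahmet"}), Decimal("5000"), Decimal("10000"), Decimal("25000"))` and nothing spent yet, the intent parsed from "Send 100 TRY to Ahmet" passes the gate. Amounts use `Decimal` rather than floats so that cap comparisons are exact.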
Why bank-grade, not consumer-grade
- Consumer voice tools (Siri Pay, Google Pay voice, Alexa Pay) are one-way speech-to-action: no per-turn random word, no voiceprint match on confirmation, no cap engine, no dispute audit chain.
- Sesim L3 is a deterministic decision-execution layer: every transaction passes a policy gate before reaching the bank core.
- 3-factor cross-layer single-utterance handshake (Patent claims 41, 42): voice biometric (inherence) + dynamic random word (knowledge) + NFC card-tap (possession) — PSD2 SCA aligned.
- 3-validator Proof-of-Voice consensus (Patent claim 50): 2/3 majority required before audit anchor; validator health monitoring and auto-disable on consecutive failures.
- IPFS/DHT-anchored off-chain audit (Patent claim 56): tamper-evident, distributed audit chain; not just an in-memory hash chain. Bank can run its own validator node.
- Voice smart-contract intent extraction (Patent claim 52): NLP intent → policy gate → execution; same path serves bank transfers, healthcare consent, transit, e-commerce, exam integrity.
- PCI-DSS scope minimization: tokenization at boundary — Sesim never sees the PAN.
- KVKK Art. 5/2-c, Art. 5/2-e, and Art. 5/2-f lawful basis under the Turkish Banking Act (No. 5411).
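The 2/3 majority rule and the auto-disable behaviour described in the Proof-of-Voice bullet could look roughly like the sketch below. It is an illustration under stated assumptions, not the patented design: the `Validator` class, the three-failure threshold, treating a raised exception as a failed health check, and counting the majority against all configured validators (so a disabled node never counts as an approval) are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_CONSECUTIVE_FAILURES = 3  # assumed threshold for auto-disable


@dataclass
class Validator:
    name: str
    check: Callable[[bytes], bool]  # True = approve; raising = node failure
    consecutive_failures: int = 0
    enabled: bool = True


def reach_consensus(validators: List[Validator], record: bytes,
                    required: int = 2) -> bool:
    """Return True when at least `required` validators approve the record."""
    approvals = 0
    for v in validators:
        if not v.enabled:
            continue  # a disabled validator cannot contribute an approval
        try:
            approved = v.check(record)
        except Exception:
            # Health monitoring: count the failure, disable after too many.
            v.consecutive_failures += 1
            if v.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                v.enabled = False
            continue
        v.consecutive_failures = 0  # any successful response resets the streak
        if approved:
            approvals += 1
    return approvals >= required
```

With three validators and `required=2`, one unreachable node does not block the audit anchor, but one unreachable node plus one rejection does.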
KPI gates (pilot)
- Transaction success rate ≥ 95% on the funnel (intent → confirm → execute).
- Misrecognition rate ≤ 1% across production audit + adversarial set.
- Dual-confirmation success ≥ 99% on the confirmation prompt.
- Voice-to-confirmation latency p95 ≤ 1500 ms; voice-to-execution p95 ≤ 3500 ms.
- Dispute / chargeback rate ≤ 0.1%.
- Deepfake / synthetic-voice rejection ≥ 95% on red-team set.
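A pilot team might check these gates with something like the following sketch. The thresholds are taken from the list above; the metric names, the `GATES` table, and the choice of the nearest-rank method for p95 are assumptions, since the page does not say how the percentile is computed.

```python
from typing import Dict, List, Tuple

# (comparison, threshold) per gate; rates are fractions, latencies are ms.
GATES: Dict[str, Tuple[str, float]] = {
    "transaction_success_rate": (">=", 0.95),
    "misrecognition_rate": ("<=", 0.01),
    "dual_confirmation_success_rate": (">=", 0.99),
    "confirmation_latency_p95_ms": ("<=", 1500),
    "execution_latency_p95_ms": ("<=", 3500),
    "dispute_rate": ("<=", 0.001),
    "deepfake_rejection_rate": (">=", 0.95),
}


def p95(samples_ms: List[float]) -> float:
    """Nearest-rank 95th percentile of latency samples."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_gates(metrics: Dict[str, float]) -> Dict[str, bool]:
    """Map each gate name to True (met) or False (missed)."""
    results = {}
    for name, (op, threshold) in GATES.items():
        value = metrics[name]
        results[name] = value >= threshold if op == ">=" else value <= threshold
    return results
```

Latency samples would first be reduced with `p95(...)` and the result stored under the matching `*_p95_ms` key before calling `evaluate_gates`. The pilot passes only if every value in the returned mapping is `True`.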