Layer 3 — Voice Transaction

Voice Pay

Speak. Confirm. Done — for real money.

Sesim Voice Transaction (Layer 3) lets users authorize bank transfers, POS payments, and consent flows by speaking. It combines NLP intent extraction (amount, recipient, currency), a dual-confirmation challenge that is unique to each turn, and anti-spoof checks on the confirmation audio. Combined with Layer 1 (coercion detection) and Layer 2 (voiceprint), a single voice intent yields three signals: who is speaking, whether they are under coercion, and what action they are requesting.

Private pilot

Live voice-pay execution is restricted to active partner-led pilots. To request access for a mobile banking, POS, ATM voice-pay, healthcare consent, or public e-services pilot, contact us.

Request pilot access →

Demo flow (illustrative)

A typical voice-pay turn — illustrated only. No real audio is captured on this page; live execution requires a partner-led pilot.

1. User says: "Send 100 TRY to Ahmet"
2. ASR + NLP intent: { amount: 100, currency: TRY, recipient: Ahmet, action: transfer }
3. Whitelist + cap check: Ahmet (...4567) ✓ · 100 ≤ 5,000 ✓
4. Dual-confirmation prompt: "Confirm 100 TRY to Ahmet (...4567)? Say: 'Yes, code-cloud, 100 TRY confirm'"
5. User says: "Yes, code-cloud, 100 TRY confirm"
6. Cross-layer verdict: L1 stress 0.08 · L2 voiceprint 0.91 · L3 anti-spoof 0.94 → PASS
7. Execute + audit: Bank API → blockchain audit hash · 2.4s end-to-end
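
Steps 2 and 3 of the demo flow can be sketched in a few lines of Python. This is a minimal illustration, not Sesim's implementation: the regex stands in for the ASR + NLP stage and only handles the English demo phrase, and the names INTENT_RE, Intent, extract_intent, WHITELIST, PER_TX_CAP, and policy_check are all hypothetical. The whitelist entry and the 5,000 cap are taken from the demo values above.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical pattern covering only the English demo phrase; a real
# system would work from ASR output with a trained NLP model.
INTENT_RE = re.compile(
    r"^send\s+(?P<amount>\d+(?:\.\d+)?)\s+(?P<currency>[A-Za-z]{3})"
    r"\s+to\s+(?P<recipient>\w+)$",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Intent:
    amount: Decimal
    currency: str
    recipient: str
    action: str


def extract_intent(utterance: str) -> Optional[Intent]:
    """Step 2: return the parsed intent, or None if the phrase does not match."""
    m = INTENT_RE.match(utterance.strip())
    if m is None:
        return None
    return Intent(
        amount=Decimal(m["amount"]),
        currency=m["currency"].upper(),
        recipient=m["recipient"],
        action="transfer",
    )


# Assumed demo policy data: recipient name mapped to a masked account.
WHITELIST = {"Ahmet": "...4567"}
PER_TX_CAP = Decimal("5000")  # the cap shown in step 3 of the demo


def policy_check(intent: Intent) -> bool:
    """Step 3: recipient must be whitelisted and the amount within the cap."""
    return intent.recipient in WHITELIST and 0 < intent.amount <= PER_TX_CAP


if __name__ == "__main__":
    demo = extract_intent("Send 100 TRY to Ahmet")
    print(demo, demo is not None and policy_check(demo))
```

Amounts are held as Decimal rather than float so that cap comparisons are exact. The policy check is a pure function of the intent and the policy data, which is one way to keep the gate deterministic.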

How voice-pay works

NLP intent extraction: amount + recipient + currency + action entities pulled from a free-form Turkish or English utterance.
Dual-confirmation: a fresh random word is injected into the confirmation prompt — the user must repeat it together with the amount.
Anti-spoof on confirmation: TTS clones, voice conversion, and replay attacks are rejected at a rate of ≥95% on the red-team adversarial set.
Cap / limit / whitelist enforcement: bank policy is applied deterministically before execution (per-transaction, daily, and weekly limits).
Cross-layer signals: Layer 1 coercion + Layer 2 voiceprint scored on the same audio.
Immutable audit: voice intent, confirmation, cross-layer scores, and blockchain hash, retained for 7 years per banking regulations.
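
The dual-confirmation step can be illustrated with a short Python sketch. It assumes the phrase shape from the demo ("Yes, code-<word>, <amount> <currency> confirm"), a tiny hypothetical word list, and a single-use rule under which any attempt consumes the challenge. Challenge, new_challenge, normalize, and verify are illustrative names, and the comparison runs on a text transcript, not on audio.

```python
import secrets
from dataclasses import dataclass

# Hypothetical word list; a production list would be far larger.
WORDS = ["cloud", "river", "amber", "falcon", "maple", "copper"]


@dataclass
class Challenge:
    word: str
    amount: int
    currency: str
    used: bool = False

    def expected_phrase(self) -> str:
        # Phrase shape assumed from the demo prompt.
        return f"Yes, code-{self.word}, {self.amount} {self.currency} confirm"


def new_challenge(amount: int, currency: str) -> Challenge:
    """Draw a fresh word per turn from a CSPRNG so old recordings do not match."""
    return Challenge(word=secrets.choice(WORDS), amount=amount, currency=currency)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so minor transcript variation is tolerated."""
    return " ".join(text.lower().split())


def verify(challenge: Challenge, transcript: str) -> bool:
    """Accept one exact match; any attempt, pass or fail, burns the challenge."""
    if challenge.used:
        return False
    challenge.used = True
    return normalize(transcript) == normalize(challenge.expected_phrase())
```

Because the word is chosen after the intent is parsed and the challenge is single-use, a replayed confirmation from an earlier turn fails twice over: the word is unlikely to match, and a consumed challenge is rejected outright.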

Why bank-grade, not consumer-grade

Consumer voice tools (Siri Pay, Google Pay voice, Alexa Pay) are one-way speech-to-action: no per-turn random word, no voiceprint match on confirmation, no cap engine, no dispute audit chain.
Sesim L3 is a deterministic decision-execution layer: every transaction passes a policy gate before reaching the bank core.
3-factor cross-layer single-utterance handshake (Patent claims 41, 42): voice biometric (inherence) + dynamic random word (knowledge) + NFC card-tap (possession) — PSD2 SCA aligned.
3-validator Proof-of-Voice consensus (Patent claim 50): 2/3 majority required before audit anchor; validator health monitoring and auto-disable on consecutive failures.
IPFS/DHT-anchored off-chain audit (Patent claim 56): a tamper-evident, distributed audit chain rather than just an in-memory hash chain. The bank can run its own validator node.
Voice smart-contract intent extraction (Patent claim 52): NLP intent → policy gate → execution; same path serves bank transfers, healthcare consent, transit, e-commerce, exam integrity.
PCI-DSS scope minimization: tokenization at boundary — Sesim never sees the PAN.
KVKK Art. 5/2-c, 5/2-e, and 5/2-f lawful basis under the Turkish Banking Act (Law No. 5411).
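
The 2/3 validator rule with auto-disable, described in the Proof-of-Voice item above, can be sketched as follows. This is an illustration under stated assumptions, not the patented mechanism: the Validator class, the reach_consensus function, and the threshold of three consecutive failures are all hypothetical (the source gives no number), and a disabled or failing validator simply counts as a missing vote.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

MAX_CONSECUTIVE_FAILURES = 3  # assumed threshold; not stated in the source


@dataclass
class Validator:
    name: str
    check: Callable[[bytes], bool]  # may raise if the node is unhealthy
    failures: int = 0
    enabled: bool = True

    def vote(self, record: bytes) -> Optional[bool]:
        """Return the vote, or None if the validator is disabled or errored."""
        if not self.enabled:
            return None
        try:
            result = self.check(record)
        except Exception:
            self.failures += 1
            if self.failures >= MAX_CONSECUTIVE_FAILURES:
                self.enabled = False  # auto-disable after repeated failures
            return None
        self.failures = 0  # a healthy response resets the failure streak
        return result


def reach_consensus(validators: Sequence[Validator], record: bytes) -> bool:
    """True only if at least 2/3 of all configured validators approve."""
    if not validators:
        return False
    approvals = sum(1 for v in validators if v.vote(record) is True)
    return 3 * approvals >= 2 * len(validators)
```

Counting approvals against all configured validators, rather than only those that responded, means one failed node out of three still leaves a valid 2/3 majority, while two failed nodes block the audit anchor.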