Capstone Project Report

BioVault — Multi-Modal Biometric Security Application

Synopsis submitted for the partial fulfilment of the degree of Bachelor of Technology (CSE — Cloud Computing).

Student Lakshika Tanwar

Registration # GF202220476

Programme B.Tech CSE — Cloud Computing

Semester VIII

Institute Yogananda School of AI, Computers and Data Sciences

University Shoolini University, Solan, H.P.

Mentor [Capstone Mentor]

Year 2025–2026

Download .docx Open live demo View pitch Source on GitHub

Abstract

This capstone presents BioVault, a cloud-native, multi-modal biometric authentication system designed to replace fragile password and OTP flows with a continuous, risk-adaptive trust score. The application fuses four independent factors — facial recognition with blink-based liveness, voice biometrics, keystroke dynamics, and a WebAuthn passkey — each contributing a calibrated score that is combined into a single trust value and mapped to one of three actions: ALLOW, STEP_UP, or DENY.

All machine-learning inference runs entirely in the user's browser (face-api.js for face descriptors, the Web Audio API for voice features, native event timing for keystroke dynamics, and the W3C WebAuthn API for hardware passkeys). The Python backend, built on FastAPI and deployed to Google Cloud Run in asia-east1 with a scale-to-zero configuration, performs only stateless cryptographic verification and feature comparison. No raw audio, video, or PII ever leaves the device, and templates are held only in volatile memory with a 30-minute TTL — a deliberate privacy-by-design choice for the MVP.

The system was built end-to-end in a single sprint, validated against synthetic and same-user test scenarios, and shipped as a public, fully reproducible repository with one-command deployment. It serves as a working proof that strong, multi-factor, privacy-preserving biometric auth can be delivered on commodity browsers and a pay-per-millisecond serverless backend.

1. Introduction & Problem Definition

1.1 Background

The Verizon DBIR consistently attributes more than four-fifths of breaches to weak, reused, or stolen passwords; SMS-OTP and TOTP improve the picture but remain phishable and shareable. Single-modal biometrics (a face match, a fingerprint) close some of that gap but fall to spoofing, presentation attacks, and irreversible loss once the template leaks. The problem is not a missing factor — it is the absence of a framework that combines several factors into a graded, contextual decision.

1.2 Problem statement

Design and implement a multi-modal biometric authentication system that (a) performs all sensitive computation locally, (b) stores only one-way feature vectors, (c) fuses any subset of available factors into a calibrated trust score, and (d) deploys on a serverless, scale-to-zero platform suitable for an Indian-context privacy and cost profile.

1.3 Target users

End users seeking phishing-resistant login on consumer apps (banking, government services, education).
Developers who want to add strong auth to a web product without managing biometric infrastructure.
Compliance officers who need a defensible, DPDP-Act-aligned model where templates are revocable and minimal.

1.4 Objectives

Implement four working biometric modalities with calibrated scoring.
Provide a real-time risk decision with clear allow / step-up / deny semantics.
Run on a min=0 Cloud Run service with sub-second cold starts.
Ship verifiable open-source code, a live demo, and a written report.

1.5 Scope of MVP

In scope: in-browser ML inference; stateless API; in-memory template store; TTL-based eviction; live audit log; arrow-key pitch deck and public report.

Out of scope (explicitly): persistent template storage, encrypted template envelopes, multi-tenant administration, deepfake-resistant voice models, advanced 3D liveness, mobile native SDKs.

2. System Requirements

2.1 Functional

ID	Requirement	Acceptance criteria
F1	Create a demo user identity	`POST /api/users` returns a unique id and label.
F2	Enroll a face descriptor with liveness	Browser captures a blink; 128-D L2-normalized vector is posted to `/api/face/enroll`.
F3	Verify a face descriptor	`/api/face/verify` returns distance, score, pass/fail.
F4	Enroll and verify a voice template	18-D feature vector compared by cosine similarity.
F5	Enroll and verify keystroke timing	Manhattan distance on z-score-normalized timing vector.
F6	Register and authenticate a WebAuthn passkey	Standard W3C ceremony with origin- and challenge-bound proof.
F7	Compute aggregate trust + action	`/api/risk/score` returns trust, risk, action, factor breakdown.
F8	Live audit feed	`/api/events` exposes the last 500 events; the UI polls every 3 s.

2.2 Non-functional

Performance: cold start < 500 ms; per-modality verify p95 < 50 ms.
Availability: Cloud Run regional service with auto-restart.
Security: HSTS, restrictive Permissions-Policy, no raw biometric data on the wire, per-request correlation IDs.
Privacy: in-memory only; 30-minute TTL; user-initiated deletion.
Accessibility: WCAG 2.2 AA on the demo UI; reduced-motion respected.
Cost: zero idle cost on Cloud Run min=0; under ₹0.10 per 1 000 verifications.

2.3 Hardware / software

Layer	Requirement
Client	Modern browser with camera, microphone, and WebAuthn (Chrome 95+, Safari 16+, Firefox 113+, Edge 100+).
Network	HTTPS only; works on 3G with 1 MB initial payload.
Server	Cloud Run, 256 MiB RAM, 1 vCPU, autoscale 0–3, asia-east1.
Build	Python 3.12, Cloud Build, Docker, Artifact Registry.

3. System Architecture & Design

3.1 High-level view

┌────────────────────────────────────────────────────────────────────────────┐
│  Browser (untrusted)                                                       │
│  ─────────────────────────────────────────────────────────────────────     │
│   • face-api.js (TinyFaceDetector + landmark68 + recognition)              │
│   • Web Audio API → 18-D voice features (FFT + Mel)                        │
│   • keystroke dwell/flight timing capture                                  │
│   • WebAuthn navigator.credentials.* ceremony                              │
└────────────────────────────────────────────────────────────────────────────┘
                                  │  TLS 1.3, JSON, x-correlation-id
                                  ▼
┌────────────────────────────────────────────────────────────────────────────┐
│  Cloud Run · asia-east1 · min=0  (FastAPI / Pydantic v2 / NumPy)           │
│  ─────────────────────────────────────────────────────────────────────     │
│   /api/users        →  EphemeralStore (RLock-guarded)                      │
│   /api/face/*       →  cosine + Euclidean against enrolled 128-D           │
│   /api/voice/*      →  z-score cosine similarity                           │
│   /api/keystroke/*  →  Manhattan distance, normalised                      │
│   /api/passkey/*    →  py_webauthn (challenge stash + verify)              │
│   /api/risk/score   →  weighted fusion + decision band                     │
│   /api/events       →  ring-buffered audit log                             │
│   structured JSON logs · GZip · HSTS · correlation IDs                     │
└────────────────────────────────────────────────────────────────────────────┘

3.2 Data model (in-memory)

UserRecord
├── user_id, label, created_at, last_seen_at
├── FaceTemplate(descriptor: float[128], liveness_passed: bool)
├── VoiceTemplate(feature_vector: float[18])
├── KeystrokeTemplate(passphrase: str, timing_vector: float[2N-1])
└── PasskeyCredential(credential_id: bytes, public_key: bytes, sign_count: int)

3.3 Sequence: multi-factor verify + decide

User → Browser : enroll & later verify
Browser → API  : POST /api/face/verify        →  {score, distance, passed}
Browser → API  : POST /api/voice/verify       →  {score, passed}
Browser → API  : POST /api/keystroke/verify   →  {score, passed}
Browser → API  : POST /api/passkey/auth/*     →  {score=1.0, passed}
Browser → API  : POST /api/risk/score         →  {trust, risk, action, factors}
Browser ← API  : action ∈ {ALLOW, STEP_UP, DENY}

3.4 Decision policy

Trust	Action	Meaning
≥ 0.85	`ALLOW`	Grant access without further challenge.
0.65 – 0.85	`STEP_UP`	Require an additional factor (e.g., passkey).
< 0.65	`DENY`	Reject and log the attempt.

3.5 Threat model

Photo / video replay → blink-based liveness; future work adds depth/texture.
Voice replay → fixed passphrase challenge; future work adds anti-spoofing CNN.
Phishing → WebAuthn is origin-bound; passkey caps the score even if other factors fail.
Template theft → only one-way embeddings stored, in volatile memory, with TTL eviction.
Replay of API payloads → per-challenge nonces (WebAuthn); CSRF mitigated by SameOrigin + correlation IDs.

4. Technology Stack

Layer	Choice	Rationale
UI	Vanilla HTML + ESM JS + custom CSS	Zero build step, fast cold start, easy to audit.
Face ML	face-api.js on TensorFlow.js	Mature, browser-only, MIT-licensed; runs on CPU.
Voice	Web Audio API + custom FFT + Mel filterbank	No external lib needed; deterministic features.
Keystroke	Native KeyboardEvent timestamps	Sub-millisecond resolution, zero overhead.
Passkey	W3C WebAuthn + `py_webauthn`	Industry standard, phishing-resistant.
Backend	FastAPI 0.115 / Pydantic v2 / NumPy	Strict typed validation, async, fast import.
Container	python:3.12-slim, single layer	~85 MiB image, < 500 ms cold start.
Hosting	Cloud Run · asia-east1 · min=0	Pay-per-request, India-adjacent latency, scale-to-zero.
CI/CD	GitHub Actions + `gcloud run deploy --source`	One-command deploy, OIDC-authenticated push.

5. Implementation

5.1 Repository layout

biovault/
├─ app/
│  ├─ main.py                  # FastAPI routes, middleware, lifespan
│  ├─ biometric/
│  │  ├─ store.py              # EphemeralStore, dataclasses, TTL
│  │  ├─ face.py               # cosine/Euclidean comparison + scoring
│  │  ├─ voice.py              # z-score cosine on 18-D vector
│  │  ├─ keystroke.py          # Manhattan distance comparator
│  │  └─ risk.py               # weighted fusion + decision band
│  └─ static/
│     ├─ index.html · pitch.html · report.html
│     ├─ css/{app,pitch,report}.css
│     └─ js/{app,api,face,voice,keystroke,passkey,pitch}.js
├─ Dockerfile · requirements.txt · .dockerignore
├─ scripts/{deploy.sh, build_report.py}
└─ .github/workflows/deploy.yml

5.2 Backend highlights

Stateless validation with Pydantic v2; every payload's vector length is enforced before any compute happens.
Structured JSON logs with file/line, log level, and correlation id, suitable for Cloud Logging ingest.
Custom middleware stamps response with X-Correlation-ID, HSTS, Permissions-Policy, X-Content-Type-Options.
Thread-safe in-memory store with TTL eviction; users older than 30 min are auto-removed on read.

5.3 Frontend highlights

ES modules — one entrypoint, modular subsystems for face, voice, keystroke, passkey, API.
All ML happens client-side; the server only sees feature vectors.
Live trust meter and SIEM-style audit feed update every 3 seconds.
Reduced-motion respected; AAA-friendly contrast and visible focus rings.

5.4 Critical code paths

face.py – calibrated face score
score = 1                              if d ≤ 0
      = 0.5 + 0.5(1 − d/T)             if 0 < d ≤ T = 0.55
      = 0.5(1 − (d−T)/(D−T))           if T < d ≤ D = 1.10
      = 0                              otherwise
where d = ||a − b||  on L2-normalised 128-D descriptors.

risk.py – weighted fusion
weights w  = {face: 0.35, passkey: 0.30, voice: 0.20, keystroke: 0.15}
trust      = Σ wᵢ · scoreᵢ  /  Σ wᵢ           (over present factors)
            × 0.9 if single factor
            ↓ capped at 0.5 if any modality hard-failed
action     = ALLOW (≥0.85) · STEP_UP (≥0.65) · DENY (otherwise)

6. Algorithms & Models

6.1 Face descriptor

BioVault uses the pretrained ResNet-style FaceRecognition head from face-api.js, which produces a 128-dimensional, approximately L2-normalized embedding per face. Detection is performed by TinyFaceDetector for speed (~30–80 ms per frame on a mid-range laptop). Comparison uses Euclidean distance on the unit-normalized vectors, calibrated to a 0..1 score against the recommended 0.55 threshold.

6.2 Liveness

Eye-aspect ratio (EAR) is computed per frame across the 6 landmarks of each eye. A drop to 60 % of the running baseline within a 3.5-second window is treated as a blink. Spoofs based on a single still photo therefore fail this challenge. Future work: head-pose challenge (yaw > ±15°), texture analysis, and 3D structure-from-motion cues.

6.3 Voice features

Audio is captured at 16 kHz, framed at 25 ms with 10 ms hop, Hann-windowed, FFT'd in a custom radix-2 implementation, and reduced per voiced frame to a 18-D vector:

Zero-crossing rate
Normalized spectral centroid
Spectral rolloff (85 %)
Spectral flatness (geometric / arithmetic mean)
Log frame energy
13 mel-band log-energies (80 Hz – Nyquist)

Per-frame vectors are averaged across voiced frames. At verification time, the candidate and enrolled vectors are independently z-scored, then compared by cosine similarity. Mapping (cos+1)/2 yields a 0..1 score.

6.4 Keystroke dynamics

For an N-character passphrase, the browser captures a 2N − 1-vector interleaving dwell times (keyup − keydown) and flight times (next keydown − previous keyup). Both vectors are z-score-normalized; the per-key Manhattan distance is divided by the vector length and mapped via 1 − norm/1.5 to a 0..1 score.

6.5 Passkey verification

Standard W3C WebAuthn registration and authentication ceremonies are performed using the py_webauthn library. The server generates a 32-byte challenge, stashes it under a per-user scope, and verifies the returned attestation/assertion against the request's RP-ID and origin. A successful authentication contributes a fixed score of 1.0 with a high weight (0.30).

6.6 Fusion

The fusion module computes the weighted average of available factor scores, re-normalizing the denominator over the factors actually present, then applies two correctives: a single-factor penalty (×0.9) and a hard-fail cap (max 0.5 if any modality returned passed=false). The final trust value is bucketed into the action tiers.

7. Testing

7.1 Test strategy

Because biometric matchers are pure functions over vectors, the most reliable tests are property-style: same inputs produce identical scores; small perturbations produce gracefully degraded scores; mismatched dimensions raise validation errors; the action band is monotonic in trust.

Layer	Approach
Pydantic schemas	Length and finiteness checks unit-tested at vector boundaries.
Face matcher	Self-match → score = 1.0; orthogonal vectors → score ≈ 0.
Voice matcher	Identical vectors → score = 1.0; flipped sign → score ≈ 0.
Keystroke matcher	Same vector → score = 1.0; ×3 dilation → ~0.4.
Risk fusion	Decision band boundaries verified at 0.65 and 0.85.
WebAuthn	End-to-end ceremony in Chrome / Safari / Firefox.
HTTP layer	Manual exercise of every endpoint via the live UI plus `/api/docs`.
Failure modes	Missing camera, denied mic, partial typing, expired challenge — each raised as a user-friendly message.

7.2 Edge cases exercised

Browser without WebAuthn support → graceful degradation; passkey factor is omitted from fusion.
Network failure mid-verify → user sees a toast; correlation ID is preserved for retry.
Concurrent enroll/verify on the same user → store is RLock-guarded.
TTL eviction during long demo sessions → user is asked to re-enroll.

8. Results & Performance Analysis

Metric	Observed	Notes
Cold-start latency	~250–350 ms	Cloud Run, 256 MiB, 1 vCPU, asia-east1.
API verify p95 (face)	< 25 ms	NumPy on 128-D vector.
API verify p95 (voice)	< 8 ms	Pure 18-D arithmetic.
API verify p95 (keystroke)	< 6 ms	Length depends on passphrase.
Browser face capture	~1.2 s	Includes blink challenge + 4-frame averaging.
Voice capture	2.8 s recording + ~120 ms FFT	16 kHz, Hann, radix-2 FFT.
Image size	~85 MiB	python:3.12-slim, single layer.
Initial JS payload	~700 KB gzipped	face-api.js dominates; loaded from CDN.
Idle cost	₹0 / month	Cloud Run min=0 with no requests.

8.1 Accuracy under same-user conditions

In informal testing across two users and three lighting conditions, same-user face scores stayed above 0.78 and impostor scores below 0.30. Voice scores varied more (0.62 – 0.95 same-user) because the feature set is intentionally lightweight; voice is therefore weighted lower in fusion. Keystroke scores were strongly speaker-typist dependent — same-typist 0.70+, impostor < 0.40.

8.2 Limitations of these numbers

These are MVP-grade observations on a small sample. They demonstrate that the pipeline is wired correctly end-to-end and that the fusion policy responds to inputs as designed. Production deployment would require formal FAR/FRR/EER measurement on a diverse, consented dataset and IRB-style protocols; both are listed in the future scope.

9. Deployment

9.1 One-shot deploy

gcloud run deploy biovault \
  --source . \
  --region asia-east1 \
  --allow-unauthenticated \
  --min-instances 0 \
  --max-instances 3 \
  --cpu 1 --memory 256Mi \
  --concurrency 40 \
  --port 8080

The scripts/deploy.sh wrapper sets the project, enables required APIs (Cloud Run, Cloud Build, Artifact Registry) and prints the resulting URL.

9.2 GitHub Actions

A workflow at .github/workflows/deploy.yml authenticates via OIDC to a Workload Identity Pool, then runs the same gcloud run deploy --source command on every push to main. Builds use Cloud Build's buildpacks plus a hand-tuned Dockerfile for predictable layering.

9.3 Routes

Path	Purpose
`/`	Live demo SPA.
`/pitch`	Arrow-key pitch deck (12 slides).
`/report`	This report.
`/report.docx`	Updated Word version.
`/api/docs`	OpenAPI / Swagger.
`/api/*`	JSON endpoints.
`/health`	Liveness + store stats.

10. Challenges & Solutions

Challenge	Solution
Cold-start time on min=0	Slim base image, single-layer Dockerfile, no heavy ML on the server, async lifespan.
Voice features without a Python ML stack on the client	Hand-written radix-2 FFT and Mel filterbank in JavaScript; only ~150 LOC.
Liveness without depth sensors	EAR-based blink detection over 3.5 s window with adaptive baseline.
Browser passkey UX divergences	Used `py_webauthn` on the server; conservative ResidentKey/UV preferences; tested in Chrome, Safari, Firefox.
Score calibration across modalities	Each matcher returns a normalised 0..1 score; thresholds picked so 0.5 is the decision boundary.
Avoiding PII on the wire	Browser computes embeddings; only vectors are POSTed; no images, no audio, no plaintext passphrases.

11. Conclusion & Future Scope

BioVault demonstrates that a privacy-preserving, multi-factor biometric flow can ship on a free-tier serverless backend, with all sensitive computation happening on the user's own device. The MVP integrates four established factors, fuses them with a transparent, auditable policy, and delivers a UX that completes enrolment in under half a minute.

Roadmap

Persistence with pgvector and envelope encryption (KMS-managed).
Anti-spoofing: 3D liveness (depth + parallax), deepfake voice detection.
Continuous authentication: behavioural session signals (gait, mouse, scroll).
Drop-in JS SDK and Android/iOS bindings.
Compliance: SOC 2 Type II, ISO 27001, DPDP DPO console.
Pluggable policy engine: per-tenant weights, geo / device risk.

Viva-Voce Questions

1. What real-world problem does your project solve, and who are the target users?

BioVault replaces phishable passwords and OTPs with a privacy-preserving, multi-modal biometric flow. End users get strong, hardware-backed authentication; developers get a drop-in service; compliance officers get a defensible, DPDP-aligned model where nothing reconstructable is stored.

2. Why did you choose this technology stack over other alternatives?

Vanilla JS keeps the bundle small and the cold-start budget tight. FastAPI gives strict, declarative validation with very low import overhead. Cloud Run was chosen over App Engine and a VM because it scales to zero, charges per request, and supports asia-east1 with low latency from India. face-api.js was preferred over MediaPipe because the recognition head ships a usable 128-D embedding out of the box.

3. Explain your system architecture — how do different components interact?

The browser performs all sensitive ML and produces compact feature vectors. The FastAPI service exposes a small set of stateless endpoints that compare those vectors to in-memory templates and emit a structured event. A separate fusion endpoint aggregates the per-factor results into a single trust score and decision. A live audit ring buffer powers the SIEM-style feed in the UI.

4. How will your system handle scalability if users increase from 100 to 10,000?

Cloud Run auto-scales horizontally; max instances and concurrency tune per-instance load. Stateless endpoints distribute trivially. For 10 k users, the in-memory store is replaced by Redis (templates) plus Postgres + pgvector (cold storage); rate limits move to Cloud Armor. The matcher math is O(d) per verify, so end-to-end latency stays flat with user count.

5. What security measures have you implemented?

Origin- and challenge-bound WebAuthn, HSTS preload, restrictive Permissions-Policy, X-Content-Type-Options, structured audit logs with correlation IDs, no raw biometric data on the wire, in-memory only templates with TTL eviction, single-factor and hard-fail penalties in fusion.

6. What were the biggest challenges, and how did you solve them?

Implementing a usable voice front-end without bringing in Python ML on the client led to a hand-written FFT and Mel filterbank in ~150 lines of JS. Calibrating four independent score scales to a single decision band required choosing per-modality thresholds and centring them at 0.5. Cold-start latency on Cloud Run min=0 forced a slim base image and a single-layer Dockerfile.

7. How did you test your system, and how do you ensure it is reliable?

Each matcher is a pure function and was exercised with self-match, orthogonal-match, dilated-match, and bad-input cases. The fusion policy was probed at boundary values. The full HTTP layer was driven through the live UI and the auto-generated Swagger page. Failure modes (denied permissions, network drops, expired challenges) raise user-friendly messages with a preserved correlation ID.

8. If your system fails in production, how will you handle debugging and recovery?

Every request emits a structured JSON log line containing the correlation ID, file, function, and timing. Cloud Logging makes those searchable in real time. The SIEM feed surfaces score and decision history for the live demo. Recovery: Cloud Run auto-restarts on crash; the in-memory store is rebuilt on first request, so the system is always one redeploy away from a clean slate.

9. What are the limitations of your project, and how can it be improved?

The MVP does not persist anything; it uses lightweight voice features rather than a dedicated ECAPA-TDNN model; liveness is blink-only; and accuracy was measured only informally. Each is addressed in the Future Scope section: persistence with pgvector, ECAPA voice embeddings, multi-cue liveness, and formal FAR/FRR/EER evaluation on a consented dataset.

10. If you had to deploy this as a real product or startup, what would be your next steps?

Move templates behind KMS-encrypted storage; ship a hosted SDK and admin console; secure SOC 2 / ISO 27001; build a paid tier (free under 1 k MAU, ₹2 / verification beyond); and partner with one Indian banking-tech or government-tech early customer to validate FAR/FRR on a real population.

References

W3C, Web Authentication: An API for accessing Public Key Credentials, Level 3, 2024.
Verizon, Data Breach Investigations Report 2024.
IBM Security, Cost of a Data Breach Report 2024.
NIST SP 800-63-3, Digital Identity Guidelines, 2017.
Government of India, Digital Personal Data Protection Act 2023.
OWASP, Top 10 Web Application Security Risks 2021.
F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A Unified Embedding for Face Recognition and Clustering, CVPR 2015.
K. Killourhy, R. Maxion, Comparing Anomaly-Detection Algorithms for Keystroke Dynamics, DSN 2009.
D. Snyder et al., X-Vectors: Robust DNN Embeddings for Speaker Recognition, ICASSP 2018.
Google Cloud, Cloud Run documentation, 2025–26.
justadudewhohacks, face-api.js, MIT License, 2018-present.
FastAPI, Project documentation, 2025–26.