How I Built TASA: Face Authentication for a Virtual Assistant
Implementing hierarchical access control and real-time face recognition for privacy-aware voice assistants.
The Problem Space
Back in 2023, I worked on TASA as my undergraduate research project. The motivation was straightforward: virtual assistants like Alexa and Siri had a fundamental security flaw. They couldn't distinguish between users. Anyone within earshot could access your calendar, messages, or smart home controls. There was no concept of ownership or privacy.
TASA (Trusted Assistant with Secure Access) explored whether we could build a virtual assistant with proper authentication and hierarchical access control, without sacrificing the hands-free convenience that makes these systems useful.
Legacy Project Note
This was an academic research project completed in 2023. The code is no longer publicly available, but the technical approach and lessons learned remain relevant for anyone interested in biometric authentication or secure system design.
The Authentication Pipeline
The core challenge was building a multi-layered authentication system that worked in real-time. We settled on a two-factor approach:
Primary: Face Recognition
CNN with HOG (Histogram of Oriented Gradients) feature descriptors for real-time face identification
Secondary: Secret Passphrase
User-specific passphrase prevents photo-based spoofing and adds a knowledge factor
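To make the two-factor gate concrete, here is a minimal sketch of how the decision could be combined. The 0.85 face-score threshold, PBKDF2 key derivation, and function names are my assumptions for illustration, not details from the project:

```python
import hashlib
import hmac

def hash_passphrase(passphrase: str, salt: bytes) -> bytes:
    # Derive a key from the passphrase so it is never stored in plaintext.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000)

def authenticate(face_score: float, passphrase: str,
                 stored_hash: bytes, salt: bytes,
                 face_threshold: float = 0.85) -> bool:
    # Factor 1: the face-recognition score must clear a threshold.
    face_ok = face_score >= face_threshold
    # Factor 2: passphrase hash must match, compared in constant time.
    phrase_ok = hmac.compare_digest(hash_passphrase(passphrase, salt),
                                    stored_hash)
    # Both factors must pass; neither alone grants access.
    return face_ok and phrase_ok

salt = b"demo-salt"
stored = hash_passphrase("open sesame", salt)
print(authenticate(0.92, "open sesame", stored, salt))   # both factors pass
print(authenticate(0.92, "wrong phrase", stored, salt))  # knowledge factor fails
print(authenticate(0.60, "open sesame", stored, salt))   # face factor fails
```

The constant-time comparison matters: a naive `==` on the hash could leak timing information about how many leading bytes match.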
Why HOG for Feature Extraction?
Most modern face recognition systems use deep feature extractors like FaceNet or ArcFace. We chose Histogram of Oriented Gradients for specific reasons:
Computational Efficiency
Significantly lighter than end-to-end deep learning. Critical for real-time on a Raspberry Pi.
Interpretability
HOG captures structural features: gradient orientations that correspond to edges and contours. More transparent than black-box deep features.
Low Training Requirements
Requires far fewer training samples per user. Realistic when each user only provides 50–100 registration images.
The image is divided into 8×8 pixel cells. For each cell, we build a 9-bin histogram of gradient orientations. These histograms are then normalized using L2 normalization across overlapping blocks to handle lighting variations. The result is a feature descriptor that's robust to illumination changes but sensitive to facial structure.
The CNN Architecture
The HOG descriptors feed into a relatively shallow CNN (3 convolutional layers + 2 fully connected layers). The network learns to classify users into three tiers, detect invalid authentication attempts, and handle temporal information across video frames.
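To give a feel for how small such a network is, here is a back-of-envelope shape and parameter count for a hypothetical instantiation. The filter counts, kernel sizes, and 64×64 input are my assumptions; the write-up only specifies "3 convolutional layers + 2 fully connected layers":

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard output-size formula for convolution and pooling.
    return (size + 2 * pad - kernel) // stride + 1

# Hypothetical instantiation: 64x64 single-channel input (e.g. a HOG map),
# three 3x3 'same' conv layers each followed by 2x2 max-pooling.
size, channels = 64, 1
params = 0
for out_ch in (16, 32, 64):
    params += channels * out_ch * 3 * 3 + out_ch   # conv weights + biases
    size = conv_out(size, 3, pad=1)                # 'same' convolution
    size = conv_out(size, 2, stride=2)             # 2x2 max-pool halves size
    channels = out_ch

flat = channels * size * size       # flattened feature vector for the FC head
hidden, classes = 128, 4            # 3 access tiers + 1 "invalid attempt" class
params += flat * hidden + hidden    # FC1
params += hidden * classes + classes  # FC2
print(size, flat, params)
```

Even with these made-up layer widths, the model stays around half a million parameters, which is consistent with the claim that it can run on modest hardware.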
Anti-Spoofing: Rather than processing single images, we extract features from multiple consecutive frames. The CNN learns patterns that only appear in live video feeds: micro-movements, subtle lighting changes, natural head motion. Static photos fail this temporal consistency check.
Multi-Level Access Control
Admin (Owner)
Full system access: personalized data, system configuration, user management, and all standard assistant features.
Sub-Admin
Configurable partial access: general assistant features, limited personal data, no system configuration rights.
Guest
Minimal access: public information queries, basic entertainment. Zero access to personal data or system settings.
Key Design: This hierarchy is enforced at the backend API level, not just in the UI. Critical for actual security.
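Backend-level enforcement of the three tiers can be sketched as a decorator that checks the session's role inside each API handler, so a modified or malicious client cannot skip the check. The role names follow the tiers above; the handler names are illustrative:

```python
from enum import IntEnum
from functools import wraps

class Role(IntEnum):
    # Higher value = more privilege; integer comparison encodes the hierarchy.
    GUEST = 0
    SUB_ADMIN = 1
    ADMIN = 2

def require_role(minimum: Role):
    """Decorator that enforces the tier check inside the backend handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapped(session_role: Role, *args, **kwargs):
            if session_role < minimum:
                raise PermissionError(
                    f"{session_role.name} lacks {minimum.name} access")
            return handler(session_role, *args, **kwargs)
        return wrapped
    return decorator

@require_role(Role.ADMIN)
def update_system_config(session_role, key, value):
    return f"set {key}={value}"

@require_role(Role.GUEST)
def public_query(session_role, q):
    return f"answer to {q!r}"
```

A guest can call `public_query`, but `update_system_config(Role.SUB_ADMIN, ...)` raises `PermissionError` regardless of what the UI shows, which is the point of enforcing the hierarchy server-side.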
Implementation Challenges
Real-Time Performance
Face recognition needed to happen in <500ms to feel seamless. Our initial implementation took 2–3 seconds.
Result: ~400ms on a Raspberry Pi 4
Static Image Bypass
Early testing showed users could authenticate using printed photos, defeating the entire purpose.
Solution: Multi-frame temporal analysis. The CNN looks for consistency across 5–10 consecutive frames. A static image shows perfect consistency (too perfect), while a real face shows natural micro-variations. We also added a liveness check: prompting users to turn their head slightly.
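The "too perfect" intuition can be illustrated with a variance check over the per-frame match scores. The project used a learned CNN over frame sequences rather than a hand-set rule; the thresholds and score values below are illustrative only:

```python
import statistics

def temporal_liveness(frame_scores, min_var=1e-4, max_var=0.05):
    """Liveness heuristic over 5-10 consecutive frames' match scores.
    A printed photo yields near-identical scores (variance too low);
    wildly inconsistent scores suggest noise or a replay glitch.
    Thresholds are illustrative, not the project's tuned values."""
    if len(frame_scores) < 5:
        return False  # not enough temporal evidence to decide
    var = statistics.pvariance(frame_scores)
    return min_var < var < max_var

static_photo = [0.91] * 8   # perfectly consistent scores: rejected
live_face = [0.90, 0.93, 0.88, 0.92, 0.91, 0.89, 0.94, 0.90]
```

Here `temporal_liveness(static_photo)` is `False` while `temporal_liveness(live_face)` is `True`: the natural micro-variation of a live face lands in the accepted variance band.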
False Rejection Rate
Initial accuracy was ~90%, but the flip side was frustrating false rejections: roughly one attempt in ten by a registered user was denied.
Result: false rejections reduced from ~10% to ~3%
Results & Performance
Compared to consumer products (Alexa, Siri, Google Assistant), TASA offered user authentication, hierarchical access control, anti-spoofing protection, and transparent data access policies: features none of them offered at the time.
The tradeoff was requiring users to authenticate before each session, but for security-sensitive use cases (banking queries, medical data, confidential work), this seemed reasonable.
What I'd Do Differently Today
Improvements
- Use pre-trained face embeddings (FaceNet/dlib)
- Add voice biometrics for stronger multi-modal auth
- Edge ML optimization with TFLite or ONNX Runtime
- Differential privacy for stored biometrics
Key Takeaways
- Security and UX are in constant tension
- Traditional CV techniques still have real value
- Temporal information from video is underutilized
- 90% accuracy ≠ 90% user satisfaction
TASA taught me a fundamental lesson:
"Build security that people will actually use."
The safest system is useless if users bypass it out of frustration.
Published & Patented
TASA was published at IEEE ICCUBEA 2023 and filed as an Indian patent (202221066577).