How I Built TASA: Face Authentication for a Virtual Assistant
Implementing hierarchical access control and real-time face recognition for privacy-aware voice assistants.
The Problem Space
Back in 2023, I worked on TASA as my undergraduate research project. The motivation was straightforward: virtual assistants like Alexa and Siri had a fundamental security flaw. They couldn't distinguish between users. Anyone within earshot could access your calendar, messages, or smart home controls. There was no concept of ownership or privacy.
TASA (Trusted Assistant with Secure Access) explored whether we could build a virtual assistant with proper authentication and hierarchical access control, without sacrificing the hands-free convenience that makes these systems useful.
Legacy Project Note
This was an academic research project completed in 2023. The code is no longer publicly available, but the technical approach and lessons learned remain relevant for anyone interested in biometric authentication or secure system design.
The Authentication Pipeline
The core challenge was building a multi-layered authentication system that worked in real-time. We settled on a two-factor approach:
Primary: Face Recognition
CNN with HOG (Histogram of Oriented Gradients) feature descriptors for real-time face identification
Secondary: Secret Passphrase
User-specific passphrase prevents photo-based spoofing and adds a knowledge factor
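To make the two-factor gate concrete, here is a minimal sketch of how the decision could be combined. The 0.85 face-score threshold, PBKDF2 key derivation, and function names are my assumptions for illustration, not details from the project:

```python
import hashlib
import hmac

def hash_passphrase(passphrase: str, salt: bytes) -> bytes:
    # Derive a key from the passphrase so it is never stored in plaintext.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 100_000)

def authenticate(face_score: float, passphrase: str,
                 stored_hash: bytes, salt: bytes,
                 face_threshold: float = 0.85) -> bool:
    # Factor 1: the face-recognition score must clear a threshold.
    face_ok = face_score >= face_threshold
    # Factor 2: passphrase hash must match, compared in constant time.
    phrase_ok = hmac.compare_digest(hash_passphrase(passphrase, salt),
                                    stored_hash)
    # Both factors must pass; neither alone grants access.
    return face_ok and phrase_ok

salt = b"demo-salt"
stored = hash_passphrase("open sesame", salt)
print(authenticate(0.92, "open sesame", stored, salt))   # both factors pass
print(authenticate(0.92, "wrong phrase", stored, salt))  # knowledge factor fails
print(authenticate(0.60, "open sesame", stored, salt))   # face factor fails
```

The constant-time comparison matters: a naive `==` on the hash could leak timing information about how many leading bytes match.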
Why HOG for Feature Extraction?
Most modern face recognition systems use deep feature extractors like FaceNet or ArcFace. We chose Histogram of Oriented Gradients for specific reasons:
Computational Efficiency
Significantly lighter than end-to-end deep learning. Critical for real-time on a Raspberry Pi.
Interpretability
HOG captures structural features: gradient orientations that correspond to edges and contours. More transparent than black-box deep features.
Low Training Requirements
Requires far fewer training samples per user. Realistic when each user only provides 50–100 registration images.
The image is divided into 8×8 pixel cells. For each cell, we build a 9-bin histogram of gradient orientations. These histograms are then normalized using L2 normalization across overlapping blocks to handle lighting variations. The result is a feature descriptor that's robust to illumination changes but sensitive to facial structure.
The CNN Architecture
The HOG descriptors feed into a relatively shallow CNN (3 convolutional layers + 2 fully connected layers). The network learns to classify users into three tiers, detect invalid authentication attempts, and handle temporal information across video frames.
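To give a feel for how small such a network is, here is a back-of-envelope shape and parameter count for a hypothetical instantiation. The filter counts, kernel sizes, and 64×64 input are my assumptions; the write-up only specifies "3 convolutional layers + 2 fully connected layers":

```python
def conv_out(size, kernel, stride=1, pad=0):
    # Standard output-size formula for convolution and pooling.
    return (size + 2 * pad - kernel) // stride + 1

# Hypothetical instantiation: 64x64 single-channel input (e.g. a HOG map),
# three 3x3 'same' conv layers each followed by 2x2 max-pooling.
size, channels = 64, 1
params = 0
for out_ch in (16, 32, 64):
    params += channels * out_ch * 3 * 3 + out_ch   # conv weights + biases
    size = conv_out(size, 3, pad=1)                # 'same' convolution
    size = conv_out(size, 2, stride=2)             # 2x2 max-pool halves size
    channels = out_ch

flat = channels * size * size       # flattened feature vector for the FC head
hidden, classes = 128, 4            # 3 access tiers + 1 "invalid attempt" class
params += flat * hidden + hidden    # FC1
params += hidden * classes + classes  # FC2
print(size, flat, params)
```

Even with these made-up layer widths, the model stays around half a million parameters, which is consistent with the claim that it can run on modest hardware.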
Anti-Spoofing: Rather than processing single images, we extract features from multiple consecutive frames. The CNN learns patterns that only appear in live video feeds: micro-movements, subtle lighting changes, natural head motion. Static photos fail this temporal consistency check.
Multi-Level Access Control
Admin (Owner)
Full system access: personalized data, system configuration, user management, and all standard assistant features.
Sub-Admin
Configurable partial access: general assistant features, limited personal data, no system configuration rights.
Guest
Minimal access: public information queries, basic entertainment. Zero access to personal data or system settings.
Key Design: This hierarchy is enforced at the backend API level, not just in the UI. Critical for actual security.
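Backend-level enforcement of the three tiers can be sketched as a decorator that checks the session's role inside each API handler, so a modified or malicious client cannot skip the check. The role names follow the tiers above; the handler names are illustrative:

```python
from enum import IntEnum
from functools import wraps

class Role(IntEnum):
    # Higher value = more privilege; integer comparison encodes the hierarchy.
    GUEST = 0
    SUB_ADMIN = 1
    ADMIN = 2

def require_role(minimum: Role):
    """Decorator that enforces the tier check inside the backend handler."""
    def decorator(handler):
        @wraps(handler)
        def wrapped(session_role: Role, *args, **kwargs):
            if session_role < minimum:
                raise PermissionError(
                    f"{session_role.name} lacks {minimum.name} access")
            return handler(session_role, *args, **kwargs)
        return wrapped
    return decorator

@require_role(Role.ADMIN)
def update_system_config(session_role, key, value):
    return f"set {key}={value}"

@require_role(Role.GUEST)
def public_query(session_role, q):
    return f"answer to {q!r}"
```

A guest can call `public_query`, but `update_system_config(Role.SUB_ADMIN, ...)` raises `PermissionError` regardless of what the UI shows, which is the point of enforcing the hierarchy server-side.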
Implementation Challenges
Real-Time Performance
Face recognition needed to happen in <500ms to feel seamless. Our initial implementation took 2–3 seconds.
Result: ~400ms on a Raspberry Pi 4
Static Image Bypass
Early testing showed users could authenticate using printed photos, defeating the entire purpose.
Solution: Multi-frame temporal analysis. The CNN looks for consistency across 5–10 consecutive frames. A static image shows perfect consistency (too perfect), while a real face shows natural micro-variations. We also added a liveness check: prompting users to turn their head slightly.
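The "too perfect" intuition can be illustrated with a variance check over the per-frame match scores. The project used a learned CNN over frame sequences rather than a hand-set rule; the thresholds and score values below are illustrative only:

```python
import statistics

def temporal_liveness(frame_scores, min_var=1e-4, max_var=0.05):
    """Liveness heuristic over 5-10 consecutive frames' match scores.
    A printed photo yields near-identical scores (variance too low);
    wildly inconsistent scores suggest noise or a replay glitch.
    Thresholds are illustrative, not the project's tuned values."""
    if len(frame_scores) < 5:
        return False  # not enough temporal evidence to decide
    var = statistics.pvariance(frame_scores)
    return min_var < var < max_var

static_photo = [0.91] * 8   # perfectly consistent scores: rejected
live_face = [0.90, 0.93, 0.88, 0.92, 0.91, 0.89, 0.94, 0.90]
```

Here `temporal_liveness(static_photo)` is `False` while `temporal_liveness(live_face)` is `True`: the natural micro-variation of a live face lands in the accepted variance band.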
False Rejection Rate
Initial accuracy was ~90%, but the flip side was frustrating false rejections: roughly one attempt in ten by a registered user was denied.
Result: false rejections reduced from ~10% to ~3%
Results & Performance
Compared to consumer products (Alexa, Siri, Google Assistant), TASA offered user authentication, hierarchical access control, anti-spoofing protection, and transparent data access policies: features none of them offered at the time.
The tradeoff was requiring users to authenticate before each session, but for security-sensitive use cases (banking queries, medical data, confidential work), this seemed reasonable.
What I'd Do Differently Today
Improvements
- Use pre-trained face embeddings (FaceNet/dlib)
- Add voice biometrics for stronger multi-modal auth
- Edge ML optimization with TFLite or ONNX Runtime
- Differential privacy for stored biometrics
Key Takeaways
- Security and UX are in constant tension
- Traditional CV techniques still have real value
- Temporal information from video is underutilized
- 90% accuracy ≠ 90% user satisfaction
TASA taught me a fundamental lesson:
"Build security that people will actually use."
The safest system is useless if users bypass it out of frustration.
Published & Patented
TASA was published at IEEE ICCUBEA 2023 and filed as an Indian patent (202221066577).