Case Study 03 · Systems & AI Architecture

Argus: Private AI Search Without Sending Data to the Cloud

Most modern AI systems require users to upload sensitive information to external providers. For many organizations—and increasingly for individuals—that creates a trust problem.

Search Speed · Sub-10ms · In-memory similarity retrieval.

Concurrency · Go Worker Pools · Efficient async file parsing.

Security Model · AES-256-CTR · Zero-trust, disk-level data isolation.

Tech Model · Local RAG Stack · Eliminated external APIs and cloud pipelines.

The Problem

Modern search tools force a difficult tradeoff between privacy and functionality, and most solutions make users pick one. Organizations cannot safely run cloud-based semantic search over proprietary documentation without risking leakage. I wanted to explore whether high-precision search and full user data ownership could coexist.

The question behind Argus was simple: Can modern AI-powered search remain useful without requiring users to surrender ownership of their data?

The Solution

I designed and built a local-first retrieval system that processes, indexes, encrypts, and searches documents entirely on local hardware, requiring zero external APIs.

The system combines a background crawling daemon that parses documents asynchronously, a local FAISS similarity index, and Server-Sent Events (SSE) that stream semantic matches to the client interface.

Core Engineering

1. High-Throughput Bounded Concurrency in Go

Parsing multi-gigabyte directories of diverse document types demands I/O efficiency and resource containment. I wrote the core ingestion daemon in Go, using a bounded worker pool to cap how much parsing work runs at once.

Engineered Concurrency Boundaries

By running a fixed-size goroutine pool fed through synchronized channels, I isolated heavy parsing tasks from file crawling. Documents are processed concurrently without data races, and capping the number of active workers keeps CPU and RAM usage bounded, so the machine stays responsive while indexing runs.
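A minimal sketch of the bounded worker pool described above, assuming a single crawler goroutine that feeds file paths into an unbuffered channel and a fixed number of parsing workers. `Doc`, `parse`, and `ingest` are hypothetical names, and `parse` is only a placeholder for the real extraction work, which the case study does not show.

```go
package main

import (
	"fmt"
	"sync"
)

// Doc is a hypothetical parsed-document record; the daemon's real types are not shown.
type Doc struct {
	Path  string
	Chars int
}

// parse is a placeholder for the expensive per-file work (PDF/DOCX extraction, chunking).
func parse(path string) Doc {
	return Doc{Path: path, Chars: len(path)}
}

// ingest crawls paths on one goroutine and parses them on a fixed number of workers.
func ingest(paths []string, workers int) []Doc {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string) // unbuffered: the crawler blocks until a worker is free
	results := make(chan Doc)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs { // exits once jobs is closed and drained
				results <- parse(p)
			}
		}()
	}

	// Crawler goroutine: the only producer, so it is the only one that closes jobs.
	go func() {
		defer close(jobs)
		for _, p := range paths {
			jobs <- p
		}
	}()

	// Close results only after every worker has returned, which ends the range below.
	go func() {
		wg.Wait()
		close(results)
	}()

	var docs []Doc
	for d := range results {
		docs = append(docs, d)
	}
	return docs
}

func main() {
	docs := ingest([]string{"a.pdf", "notes/b.md", "c.docx"}, 2)
	fmt.Println(len(docs)) // 3 (completion order varies between runs)
}
```

Because `jobs` is unbuffered, the crawler can never run ahead of the workers, so the amount of in-flight parsing stays proportional to the worker count rather than to the size of the directory.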

2. Sub-10ms Similarity Queries via FAISS

To support semantic search (matching concepts rather than exact character strings), document segments are converted into dense embeddings. I integrated a local embedding pipeline that maps text to vectors, which are loaded into a custom FAISS (Facebook AI Similarity Search) index. Similarity over these high-dimensional vectors is computed entirely in memory, returning semantic results in under 10 milliseconds.

3. Zero-Trust Storage Security & Streaming

To maintain strict cryptographic isolation, plaintext document segments are never written to disk. Parsed text is encrypted with AES-256 in CTR (counter) mode and stored under the ID of its corresponding vector in the similarity index.

When a query is submitted, it is embedded and sent to the local similarity index. FAISS returns the IDs of the closest matches; the platform retrieves only the corresponding encrypted segments, decrypts them in temporary in-memory buffers, and streams them to the client interface over Server-Sent Events (SSE). Decrypted fragments are never persisted, and users see results arrive in real time, token by token.

The Stack

Low-latency languages and optimized numerical libraries, all running locally so that data never leaves the machine:

Go (Golang) · Python · FAISS Vector Indexing · AES-256-CTR · Server-Sent Events (SSE) · Next.js (TypeScript)

Outcome

Argus demonstrated that high-quality semantic search does not require cloud infrastructure. The system delivered sub-10ms retrieval with zero cloud dependencies, encrypted storage, and a fully local-first architecture.

The project became less about search and more about a broader question: What would privacy-first AI infrastructure actually look like?