Projects
Everything here was built to solve a real problem in a regulated environment. The repos are public. Judge the work for yourself.
Featured Case Study
A traceable, citation-grounded retrieval architecture tested against USCIS policy documents. The point is not immigration. The point is what trustworthy AI infrastructure looks like under pressure.
A RAG pipeline that ingests PDF documents, chunks them along the documents' legal hierarchy, stores embeddings in pgvector, and retrieves relevant context to generate grounded answers.
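To make hierarchy-aware chunking concrete, here is a minimal sketch, not the project's actual code. It assumes headings of the form "Volume N", "Part X", and "Chapter N"; those patterns, the `Chunk` type, and `chunk_by_hierarchy` are illustrative names, and a real corpus would need its own heading rules.

```python
import re
from dataclasses import dataclass

# Assumed heading patterns, ordered from outermost to innermost level.
HEADING_PATTERNS = [
    re.compile(r"^Volume\s+\d+\b"),
    re.compile(r"^Part\s+[A-Z]\b"),
    re.compile(r"^Chapter\s+\d+\b"),
]


@dataclass
class Chunk:
    path: tuple  # heading trail, e.g. ("Volume 7", "Part A", "Chapter 1")
    text: str


def heading_level(line):
    """Return the hierarchy depth of a heading line, or None for body text."""
    for depth, pattern in enumerate(HEADING_PATTERNS):
        if pattern.match(line):
            return depth
    return None


def chunk_by_hierarchy(document):
    """Split text at headings so each chunk belongs to exactly one section."""
    chunks, path, body = [], [], []

    def flush():
        # Emit the accumulated body text under the current heading trail.
        if body:
            chunks.append(Chunk(tuple(h for _, h in path), " ".join(body)))
            body.clear()

    for raw in document.splitlines():
        line = raw.strip()
        if not line:
            continue
        depth = heading_level(line)
        if depth is None:
            body.append(line)
            continue
        flush()
        # Leave any section at the same depth or deeper before entering this one.
        while path and path[-1][0] >= depth:
            path.pop()
        path.append((depth, line))
    flush()
    return chunks
```

Because every chunk carries its heading trail, the same trail can be stored next to the embedding and reused later as the citation for an answer.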
Source citation on every answer, deterministic model settings, retrieval choices designed to reduce redundancy, and architecture decisions made for repeatability.
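One common way to reduce redundancy in retrieved context is maximal marginal relevance (MMR). The page does not say which technique the project uses, so treat the following as an illustration of the idea only. The function names, the `lam` parameter, and the `(source_id, vector)` input shape are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidates, k=3, lam=0.7):
    """Pick up to k source ids, trading relevance against redundancy.

    candidates: list of (source_id, vector) pairs.
    lam: 1.0 ranks by relevance alone; lower values penalize chunks
         that resemble ones already selected.
    """
    remaining = list(candidates)
    selected = []

    def score(item):
        _, vec = item
        relevance = cosine(query_vec, vec)
        # Similarity to the closest chunk that has already been picked.
        redundancy = max((cosine(vec, s) for _, s in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [source_id for source_id, _ in selected]
```

Returning source ids rather than raw text keeps the link between each retrieved chunk and the passage it came from, which is what a citation on every answer depends on.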
If the system can hold up against dense policy manuals, it can be adapted for compliance documents, internal knowledge bases, policy libraries, and regulatory corpora.
Featured Case Study
A production-grade ETL platform for ingesting, classifying, and governing AI-processed data. Built with full local-to-cloud parity, immutable audit trails, AI-powered document classification, and cost-controlled infrastructure-as-code deployment.
An Airflow-orchestrated pipeline that ingests raw documents, classifies them using Claude on Amazon Bedrock, applies data quality gates via Great Expectations, and lands validated records in Databricks Delta Lake with full lineage tracking.
Every transformation is versioned. Every AI classification decision is logged with model version, prompt hash, and confidence score. Quality gates block bad data before it reaches the warehouse. The entire pipeline is reproducible from a single Makefile command.
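As a rough sketch of what one logged classification decision might contain, the record below captures the three fields named above: model version, prompt hash, and confidence score. The field names, the SHA-256 choice, and the JSON line format are assumptions, not the project's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: a record cannot be mutated after creation
class ClassificationAuditRecord:
    document_id: str
    label: str
    confidence: float
    model_version: str
    prompt_hash: str
    logged_at: str


def hash_prompt(prompt):
    """Fingerprint the exact prompt text without storing it in the log."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def build_audit_record(document_id, label, confidence, model_version, prompt):
    """Validate inputs and build one audit entry for a classification."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return ClassificationAuditRecord(
        document_id=document_id,
        label=label,
        confidence=confidence,
        model_version=model_version,
        prompt_hash=hash_prompt(prompt),
        logged_at=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(record):
    """Serialize with sorted keys so identical records produce identical lines."""
    return json.dumps(asdict(record), sort_keys=True)
```

Hashing the prompt lets a reviewer confirm that two decisions used the same prompt text, even when the prompt itself lives in version control rather than in the log.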
The full stack runs locally via Docker Compose and LocalStack, mirroring the production AWS deployment. Developers can test DAGs, quality gates, and classification logic without touching cloud infrastructure or incurring cost.
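Local-to-cloud parity usually comes down to pointing the same client code at a different endpoint. The sketch below shows one way to do that; the `PIPELINE_ENV` and `LOCALSTACK_URL` variable names are hypothetical, and the project's own configuration may differ.

```python
import os

# LocalStack's default edge endpoint.
DEFAULT_LOCALSTACK_URL = "http://localhost:4566"


def aws_client_kwargs(env=None):
    """Build keyword arguments for an AWS SDK client constructor.

    In local mode the client is directed at LocalStack. In any other mode
    no endpoint override is set, so the SDK resolves real AWS endpoints.
    """
    env = os.environ if env is None else env
    kwargs = {"region_name": env.get("AWS_REGION", "us-east-1")}
    # Default to local so an unconfigured shell never reaches the cloud.
    if env.get("PIPELINE_ENV", "local") == "local":
        kwargs["endpoint_url"] = env.get("LOCALSTACK_URL", DEFAULT_LOCALSTACK_URL)
    return kwargs
```

With boto3 installed, the result can be passed straight through, for example `boto3.client("s3", **aws_client_kwargs())`, so DAG and classification code stays identical in both environments.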
Compliance pipelines, document classification workflows, warehouse migrations, AI integration with audit trails. If your team needs data infrastructure that can survive a review, this is the architecture.
Discuss Your Use Case
More Work
Workflow-driven data quality monitoring with checks for schema drift, null thresholds, freshness windows, and row-count anomalies. Quality gates without the overhead of a full observability platform.
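The four checks named above can be sketched in a few lines of plain Python. This is an illustration under assumed names and thresholds, not the project's implementation: rows are dictionaries, timestamps are timezone-aware `datetime` values, and every parameter name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def run_quality_checks(rows, *, expected_columns, null_limits, timestamp_column,
                       max_age, expected_rows, row_tolerance, now=None):
    """Return a list of failure messages; an empty list means the batch passes."""
    now = now or datetime.now(timezone.utc)
    failures = []

    # Schema drift: columns that are missing from, or extra to, the expected set.
    expected = set(expected_columns)
    drifted = sorted({col for row in rows for col in set(row) ^ expected})
    if drifted:
        failures.append(f"schema drift in columns: {drifted}")

    # Null thresholds: share of missing values allowed per column.
    for column, limit in null_limits.items():
        rate = sum(row.get(column) is None for row in rows) / max(len(rows), 1)
        if rate > limit:
            failures.append(f"{column}: null rate {rate:.2f} exceeds {limit:.2f}")

    # Freshness window: the newest record must be recent enough.
    stamps = [row[timestamp_column] for row in rows if row.get(timestamp_column)]
    if not stamps or now - max(stamps) > max_age:
        failures.append(f"no record newer than {max_age}")

    # Row-count anomaly: batch size must stay within a tolerance of the baseline.
    if abs(len(rows) - expected_rows) > row_tolerance * expected_rows:
        failures.append(f"row count {len(rows)} outside expected range")

    return failures
```

A workflow step can then fail the run whenever the returned list is non-empty, which is the whole gate: no dashboard or agent is required.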
If your team is building pipelines, AI workflows, or data platforms that must be reliable, traceable, and explainable, start here.
Get in Touch