Projects — Archetype Core

Featured Case Study

Audit-Ready RAG System

A traceable, citation-grounded retrieval architecture tested against USCIS policy documents. The point is not immigration. The point is what trustworthy AI infrastructure looks like under pressure.

The Architecture

A RAG pipeline that ingests PDF documents, chunks them with respect to legal hierarchy, stores embeddings in pgvector, and retrieves relevant context to generate grounded answers.
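As a rough illustration of hierarchy-aware chunking, the sketch below splits on the coarsest structural boundary first and only falls back to finer ones when a piece is still too large. The separator patterns ("Chapter N", "Section N"), the function name, and the character budget are assumptions made for this example, not the repository's actual implementation.

```python
import re

# Hypothetical separators, coarsest first. A joiner of None means pieces at
# that level are never merged back together (structural boundaries are kept).
SEPARATORS = [
    (r"\n(?=Chapter \d+)", None),
    (r"\n(?=Section \d+)", None),
    (r"\n\n", "\n\n"),
    (r"\n", "\n"),
    (r" ", " "),
]


def recursive_chunk(text: str, max_chars: int, level: int = 0) -> list[str]:
    """Split text into chunks of at most max_chars, preferring structural breaks."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= max_chars:
        return [text]
    if level >= len(SEPARATORS):
        # Last resort: hard split on character count.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    pattern, joiner = SEPARATORS[level]
    parts = [p.strip() for p in re.split(pattern, text) if p.strip()]

    chunks: list[str] = []
    buffer = ""
    for part in parts:
        # Greedily pack small neighbours, but only below the heading levels.
        if joiner is not None and buffer and len(buffer) + len(joiner) + len(part) <= max_chars:
            buffer = buffer + joiner + part
            continue
        if buffer:
            chunks.append(buffer)
            buffer = ""
        if len(part) <= max_chars:
            buffer = part
        else:
            # Still too large: retry this piece with the next, finer separator.
            chunks.extend(recursive_chunk(part, max_chars, level + 1))
    if buffer:
        chunks.append(buffer)
    return chunks


sample = (
    "Chapter 1\nSection 1\n" + "alpha " * 10
    + "\nSection 2\nbeta gamma\nChapter 2\nSection 1\ndelta"
)
chunks = recursive_chunk(sample, max_chars=40)
```

In this sketch a chunk never spans two chapters or two sections, which is the property that matters when a citation has to point back to one place in the source document.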

What Makes It Auditable

Every answer cites its sources. Model settings are pinned for repeatable output, retrieval is tuned to reduce redundancy, and architecture decisions are made for repeatability.
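One way to make "a citation on every answer" a structural guarantee rather than a convention is to refuse to construct an answer object without one. The class and field names below are hypothetical and only sketch the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    source: str   # hypothetical: file name of the ingested PDF
    locator: str  # hypothetical: section heading or page reference


@dataclass(frozen=True)
class GroundedAnswer:
    text: str
    citations: tuple[Citation, ...]

    def __post_init__(self) -> None:
        # An answer that cannot point at its sources is rejected outright.
        if not self.citations:
            raise ValueError("answer has no citations; refusing to return it")


answer = GroundedAnswer(
    text="Example answer text.",
    citations=(Citation(source="policy_manual.pdf", locator="Chapter 2, Section 1"),),
)
```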

Why This Matters

If the system can hold up against dense policy manuals, it can be adapted for compliance documents, internal knowledge bases, policy libraries, and regulatory corpora.

Architecture Decisions

pgvector over Pinecone: Embeddings live in PostgreSQL, so they are SQL-joinable with the rest of the data, add no separate vector service to operate, cost less to run, and stay in your environment.
MMR retrieval over plain similarity: Maximal Marginal Relevance reduces redundancy when retrieving from repetitive legal text.
Temperature 0: Greedy decoding for near-deterministic outputs in compliance use cases where consistency matters more than creativity.
Recursive chunking: Respects nested legal document hierarchy instead of blindly splitting on token count.
Claude Sonnet 4.6 via AWS Bedrock: Strong performance on dense legal text with an enterprise-friendly deployment path.
Explicit configuration: Model settings, credentials, and data paths are made visible rather than assumed from ambient environment state.
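To make the MMR decision above concrete, here is a minimal, dependency-free sketch of Maximal Marginal Relevance selection. The function names, the toy two-dimensional vectors, and the lambda_mult parameter name are assumptions for illustration; the project itself may rely on a library-provided retriever rather than code like this.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query: list[float], candidates: list[list[float]], k: int,
        lambda_mult: float = 0.5) -> list[int]:
    """Return indices of up to k candidates, trading relevance against redundancy."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Redundancy is the similarity to the closest already-selected item.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: candidates 0 and 1 are near-duplicates, candidate 2 is distinct.
query = [1.0, 0.3]
candidates = [[1.0, 0.0], [1.0, 0.05], [0.5, 0.9]]

diverse = mmr(query, candidates, k=2)                        # balances both terms
plain = mmr(query, candidates, k=2, lambda_mult=1.0)         # relevance only
```

With lambda_mult set to 1.0 the selection degenerates to plain similarity ranking and returns both near-duplicates; at 0.5 the second pick shifts to the distinct passage, which is the behaviour that helps on repetitive legal text.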

AWS Bedrock, Claude Sonnet 4.6, Titan Embeddings v2, PostgreSQL, pgvector, LangChain, FastAPI, Docker, S3, Python

View Repo → Discuss Your Use Case →

Featured Case Study

ETL Architecture for AI Governance

A production-grade ETL platform for ingesting, classifying, and governing AI-processed data. Built with full local-to-cloud parity, immutable audit trails, AI-powered document classification, and cost-controlled infrastructure-as-code deployment.

The Architecture

An Airflow-orchestrated pipeline that ingests raw documents, classifies them using Claude on Amazon Bedrock, applies data quality gates via Great Expectations, and lands validated records in Databricks Delta Lake with full lineage tracking.

What Makes It Auditable

Every transformation is versioned. Every AI classification decision is logged with model version, prompt hash, and confidence score. Quality gates block bad data before it reaches the warehouse. The entire pipeline is reproducible from a single Makefile command.
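A sketch of what one logged classification decision could look like, assuming SHA-256 for the prompt hash and JSON lines as the log format. Both choices, along with every field and function name here, are illustrative assumptions rather than the platform's actual schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClassificationAuditRecord:
    document_id: str
    label: str
    confidence: float
    model_version: str
    prompt_hash: str
    logged_at: str


def hash_prompt(prompt: str) -> str:
    """Fingerprint the exact prompt text without storing it in the log."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def build_record(document_id: str, label: str, confidence: float,
                 model_version: str, prompt: str) -> ClassificationAuditRecord:
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return ClassificationAuditRecord(
        document_id=document_id,
        label=label,
        confidence=confidence,
        model_version=model_version,
        prompt_hash=hash_prompt(prompt),
        logged_at=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(record: ClassificationAuditRecord) -> str:
    """Serialize with sorted keys so identical records produce identical lines."""
    return json.dumps(asdict(record), sort_keys=True)


# Hypothetical values, for illustration only.
record = build_record(
    document_id="doc-001",
    label="invoice",
    confidence=0.93,
    model_version="example-model-v1",
    prompt="Classify the following document: ...",
)
line = to_log_line(record)
```

Hashing the prompt lets a reviewer confirm which prompt version produced a decision by recomputing the digest from version control, without the log carrying the full prompt text.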

Local-to-Cloud Parity

The full stack runs locally via Docker Compose and LocalStack, mirroring the production AWS deployment. Developers can test DAGs, quality gates, and classification logic without touching cloud infrastructure or incurring cost.

Architecture Decisions

Airflow over Step Functions: Full DAG visibility, Python-native operators, and rich observability. Airflow gives the team a single pane of glass for pipeline state, retries, and lineage.
Databricks Delta Lake over raw S3: ACID transactions, schema enforcement, time travel for audit queries, and a query layer that supports both batch analytics and ad-hoc investigation.
Claude on Bedrock for classification: Enterprise deployment path designed for AWS-based regulated environments, with controlled access, logging, and deployment boundaries.
Great Expectations for quality gates: Declarative data quality checks that run as Airflow tasks. Schema drift, null thresholds, and row-count anomalies are caught before data lands in the warehouse.
LocalStack for local AWS emulation: S3, SQS, and IAM run locally in Docker. Integration tests execute against the same API surface as production without cloud spend or credential risk.
Terraform with remote state: Versioned S3 state backend with DynamoDB locking. Every infrastructure change is tracked, reviewable, and reproducible across environments.
Makefile-driven workflows: One command to stand up the full stack, run tests, lint, or tear down. Reduces onboarding friction and eliminates undocumented tribal knowledge.
Strict code quality tooling: Ruff for linting, mypy for type checking, pytest for tests. Enforced in CI. No shortcuts on code that governs data lineage.
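The quality-gate idea can be shown without any framework. The following is a plain-Python stand-in for the kinds of checks described above (schema drift, null thresholds, row-count anomalies); it deliberately does not use the Great Expectations API, and all names and thresholds are hypothetical.

```python
class QualityGateError(Exception):
    """Raised when a batch must not be loaded into the warehouse."""


def check_batch(rows: list[dict], expected_columns: set[str],
                max_null_fraction: float, expected_rows: int,
                tolerance: float) -> list[str]:
    """Return a list of human-readable failures; an empty list means the batch passes."""
    failures: list[str] = []

    # Schema drift: every row must carry exactly the expected columns.
    for index, row in enumerate(rows):
        if set(row) != expected_columns:
            drift = sorted(set(row) ^ expected_columns)
            failures.append(f"schema drift in row {index}: {drift}")
            break

    # Null threshold: a missing column counts as a null for that row.
    for column in sorted(expected_columns):
        nulls = sum(1 for row in rows if row.get(column) is None)
        if rows and nulls / len(rows) > max_null_fraction:
            failures.append(f"too many nulls in {column}: {nulls}/{len(rows)}")

    # Row-count anomaly: batch size must stay within a band around the expectation.
    lower = expected_rows * (1 - tolerance)
    upper = expected_rows * (1 + tolerance)
    if not lower <= len(rows) <= upper:
        failures.append(f"row count {len(rows)} outside [{lower:.0f}, {upper:.0f}]")

    return failures


def enforce(failures: list[str]) -> None:
    """Block the load by raising, which would fail the surrounding task."""
    if failures:
        raise QualityGateError("; ".join(failures))


good = [{"id": i, "label": "a"} for i in range(10)]
bad = [{"id": 1, "label": None}, {"id": 2, "extra": "x"}]

good_failures = check_batch(good, {"id", "label"}, 0.1, 10, 0.2)
bad_failures = check_batch(bad, {"id", "label"}, 0.1, 10, 0.2)
```

Raising from enforce is what turns a check into a gate: run inside an orchestrated task, the exception fails the task and the load step downstream never starts.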

Apache Airflow 3.2, Amazon Bedrock, Claude Sonnet, Databricks Delta Lake, PostgreSQL, Great Expectations, Terraform, Docker Compose, LocalStack, Python

View Repo → Discuss Your Use Case →

Your data. The same rigor.

Compliance pipelines, document classification workflows, warehouse migrations, AI integration with audit trails. If your team needs data infrastructure that can survive a review, this is the architecture.

Discuss Your Use Case

More Work

Related systems.

Live

Automated Data Quality Monitoring

Workflow-driven data quality monitoring with checks for schema drift, null thresholds, freshness windows, and row-count anomalies. Quality gates without the overhead of a full observability platform.

n8n, PostgreSQL, Python, SQL

Real systems, open code, documented decisions.

Need auditable data infrastructure?