The AI Vibe — AI & Data Engineering Research

Latest Posts

Data Engineering · 10 min read

Streaming OLAP: The Post-Kafka Stack for Real-Time Analytics

The Kafka + Flink + ClickHouse/Pinot/Druid stack we built between 2018 and 2024 is fragmenting into three forks: single-engine streaming SQL, table-format-as-stream, and OLAP databases that eat the streaming layer entirely. Kafka isn't dying — it's becoming plumbing.

Jun 13, 2026

Data Engineering · 6 min read

What I Learned Writing GPU Kernels for SQL Aggregates

Three months, two abandoned designs, one breakthrough. The one-paragraph version: Apple Silicon GPUs don't have 64-bit atomic_fetch_add until very recent OS versions, and that single missing instruction shapes every other architectural decision in a Metal SQL aggregate engine.

Jun 6, 2026

Data Engineering · 5 min read

Multi-Aggregate Fusion: One Read, Four Answers

Every analytical engine treats SELECT SUM(x), MIN(x), MAX(x), COUNT(x) FROM t as four passes over the column. Fuse them into a single kernel and the speedup over four-pass code ranges from 9x to 25x. Here's why the technique works, and the data shape where it doesn't.

May 30, 2026

Data Engineering · 6 min read

Apple Silicon's Unified Memory Is the Quiet Revolution in Analytical Compute

M3 Ultra ships 512 GB of memory at 819 GB/s, addressable by the GPU with zero PCIe transfer cost. Every GPU database project from the past decade was architected around the assumption that memory bandwidth came at PCIe-tax prices. That assumption is now wrong on a fifth of the developer laptops in the world.

May 23, 2026

Data Engineering · 16 min read

samkhya v1.0: Plug Claude, GPT-4o-mini, or Local Ollama Into Your SQL Query Optimizer

samkhya v1.0 ships an LLM-pluggable corrector backend for embedded analytical engines: DataFusion, DuckDB, Polars, Postgres, Iceberg, and gpudb. Plug Claude, GPT-4o-mini, or local Ollama into the cardinality-estimation slot via a simple HTTP wire contract (Python FastAPI and Node TypeScript reference servers ship in the box). Every LLM output is clamped from above by a provable pessimistic ceiling (LpJoinBound, 40.95× tighter than the 2008 AGM bound), so the LLM can never make your plan worse than the engine's native estimate. Transport-floor latency is measured at P95 0.07–0.11 ms; live-LLM end-to-end cells are marked PROJECTED pending API budget.

May 17, 2026

Data Engineering · 27 min read

Why I built a GPU SQL engine in 2026 — when every other one died

Every standalone GPU database built between 2013 and 2024 was acqui-hired or pivoted. So why ship gpudb in 2026? Because nobody had wired Apple Silicon's unified memory into a SQL engine — and DuckDB hands you a hundred-thousand-user distribution channel without writing a database from scratch.

May 9, 2026

Stay Curious

Exploring the frontiers of AI, data, and technology. New research and insights published regularly.

About the Author

Research, Build, Story

AI & Machine Learning

Data Engineering

Stories

From The First Mind

The Hackathon

Words Become Numbers

The Map of Meaning

Friends in High Dimensions

The First Spark

The Stack of Minds
