Josef Chen says he compressed all of human cooking into two megabytes. That’s a bold claim. It also checks out.

Chen, co-founder and CEO of London food AI startup KAIKAKU.AI, published a paper on arXiv this week alongside researcher Jakub Radzikowski, presenting Epicure — three AI models trained on 4.14 million recipes pulled from 11 datasets across seven languages. The result: a map of 1,790 ingredients, each described by 300 numbers, that fits within a standard email attachment limit with room to spare.

“4.1M recipes. 7 languages. 1,790 ingredients. 300 dimensions,” Chen wrote on X. “All of human cooking compressed into 2 megabytes.”

It’s not storing recipes

Before you imagine a two-megabyte USB stick jammed with stir-fry instructions, know that the model doesn’t store a single recipe. Those two megabytes are closer to a coordinate table than a cookbook.

Think of it as a map. Every ingredient gets a precise location based on how it behaves across millions of real dishes worldwide. The math is straightforward: 1,790 ingredients × 300 numbers per ingredient × 4 bytes each (the size of a 32-bit floating-point number) comes to 2,148,000 bytes, or about 2.05 megabytes in binary units. Those numbers encode which ingredients appear together, which share flavor compounds, and which belong to the same culinary tradition. Once the model learns all that from the recipes, the recipes themselves can go. The knowledge lives in the coordinates.
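The size arithmetic is easy to check. The short sketch below redoes the multiplication; the 4-bytes-per-number figure comes from the article, while the variable names are only illustrative.

```python
# Back-of-the-envelope size of the ingredient coordinate table.
NUM_INGREDIENTS = 1_790
DIMENSIONS = 300
BYTES_PER_VALUE = 4  # figure quoted in the article; matches a 32-bit float

size_bytes = NUM_INGREDIENTS * DIMENSIONS * BYTES_PER_VALUE
size_mb = size_bytes / 1_000_000     # decimal megabytes
size_mib = size_bytes / (1024 ** 2)  # binary megabytes (MiB)

print(f"{size_bytes:,} bytes = {size_mb:.2f} MB = {size_mib:.2f} MiB")
```

Either way you count it, the table lands at roughly two megabytes: about 2.15 in decimal units, about 2.05 in binary ones.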

This is essentially the same approach word2vec applied to language in 2013, when researchers showed that meaning could be encoded through mathematical relationships. Epicure does that for food. Take beef, orient it toward American cuisine and you get bread, lettuce, maybe beer. Orient it toward Southeast Asia and the model shifts toward soy sauce, ginger, and sesame oil.

This happens through what the paper describes as a steering operator called SLERP rotation, where SLERP stands for spherical linear interpolation. Take a seed ingredient, say chicken, and rotate its vector toward a cuisine direction. At 30 degrees you start seeing Tex-Mex territory. At 60 degrees, chicken and beef converge on the same Mexican pantry: corn tortilla, salsa, monterey jack, poblano pepper. The angle acts as a dial between “stay near this ingredient” and “land somewhere new.”

Three models, three questions

Epicure comes in three versions. Cooc learns from recipe co-occurrence: what shows up together in real dishes. Chem learns from flavor chemistry: which ingredients share aroma compounds, according to the FlavorDB chemical database. Core is a blend of both.

Ask Cooc what pairs with chocolate and you may get dessert-pantry companions: cocoa powder, vanilla, almond. Ask Chem and you get flavor-chemistry peers: toffee, fudge, ganache.

Same ingredient, different question. A chef looking for a substitute has different needs than a chef mapping flavor compatibility.
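To make "same ingredient, different question" concrete, here is a hedged sketch of a nearest-neighbor lookup run against two separate coordinate tables. The tables `TOY_COOC` and `TOY_CHEM`, their two-dimensional vectors, and the `nearest` function are invented for illustration; they are not Epicure's data or API, and the numbers were picked by hand only so the two tables disagree.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def nearest(table, query, k=2):
    """Return the k ingredients closest to `query` in `table`."""
    q = table[query]
    others = [name for name in table if name != query]
    others.sort(key=lambda name: cosine(q, table[name]), reverse=True)
    return others[:k]

# Made-up 2-D vectors; the real models use 300 dimensions.
TOY_COOC = {
    "chocolate": [1.0, 0.0],
    "cocoa powder": [0.95, 0.1],
    "vanilla": [0.9, 0.2],
    "toffee": [0.2, 0.9],
    "ganache": [0.1, 1.0],
}
TOY_CHEM = {
    "chocolate": [1.0, 0.0],
    "toffee": [0.95, 0.1],
    "ganache": [0.9, 0.2],
    "cocoa powder": [0.6, 0.8],
    "vanilla": [0.1, 1.0],
}

print("Cooc-style neighbors:", nearest(TOY_COOC, "chocolate"))
print("Chem-style neighbors:", nearest(TOY_CHEM, "chocolate"))
```

The lookup code is identical in both cases. Only the table changes, and with it the kind of answer you get back.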


Source: KAIKAKU.AI Trained an Ingredient Model on 4M Recipes — It Fits in 2MB