The Full Data Stack 12
Agentic Data Modelling, the Python GIL is gone and what a Data Strategy really is
Hey y’all, I’m Hoyt and thanks for reading The Full Data Stack!
Each week, I share thoughts and findings in the world of Data and AI. I span the entire data stack and keep you up to date with the shifting landscape.
Our community has been growing because people like you are sharing and restacking this weekly newsletter. It is our lifeblood!
Not Subscribed? Come join! ⊂(◉‿◉)つ
AI and Models
Jenny Ouyang shares a really handy breakdown of different prompting strategies. I think every prompt still needs to be unique to the request, but categorizing prompt types is a really good idea.
Stanford dropped some online courses so you can go learn the math behind transformers (and subsequently LLMs). Have fun with that one! I’ll be over here trying to build stuff instead 😂
Anthropic released Haiku 4.5, the cheaper sibling of Claude Sonnet 4.5. If you are using Anthropic models directly or through the API, you need to be mindful about which model you use and when, because you are paying for usage rather than a predictable flat price like Cursor offers. The article makes a really good point about how you would use Sonnet 4.5 to build out a larger overarching plan for how to build something, then use multiple Haiku agents to go out and execute the plan. Sonnet (expensive) does the initial high-level thinking, then Haiku (cheaper) goes and does the grunt work.
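Here’s a rough sketch of that plan-then-execute split, just to make the shape concrete. Everything in it is an assumption on my part: `plan_then_execute`, `fake_planner` and `fake_worker` are hypothetical names, and the two fake functions are offline stand-ins for wherever you would actually call the expensive and cheap models. It is not the article’s implementation and it doesn’t touch any real Anthropic API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical type: any function that takes a prompt and returns model text.
ModelFn = Callable[[str], str]


def plan_then_execute(
    goal: str, planner: ModelFn, worker: ModelFn, max_workers: int = 4
) -> List[str]:
    """One expensive planning call, then many cheap worker calls."""
    # Single call to the expensive model: ask for one task per line.
    plan = planner(f"Break this goal into independent tasks, one per line:\n{goal}")
    tasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # One call to the cheap model per task, run concurrently.
    # pool.map returns results in the same order as the tasks.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, tasks))


# Offline stand-ins so the sketch runs without any API key (assumptions).
def fake_planner(prompt: str) -> str:
    return "write schema\nwrite loader\nwrite tests"


def fake_worker(task: str) -> str:
    return f"done: {task}"


if __name__ == "__main__":
    for result in plan_then_execute("build a pipeline", fake_planner, fake_worker):
        print(result)
```

The point of the split is that the expensive model gets called once, while the number of cheap calls scales with the size of the plan.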
Data contracts for LLMs (structured outputs) are really important if you are going to build a reliable AI-integrated system. The future of Data + AI isn’t letting the model do whatever it wants, but tying it down until it is almost deterministic. Paul Iusztin writes a banger about the topic.
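To show what “tying it down” can look like in practice, here’s a minimal sketch that treats the model’s JSON reply as untrusted input and checks it against a contract before anything downstream sees it. The `TicketSummary` shape, its fields and the `parse_ticket` helper are all hypothetical examples I made up for illustration, using only the standard library. This is not the approach from Paul’s article, and in a real system you would probably reach for a schema library instead.

```python
import json
from dataclasses import dataclass

# Hypothetical contract: the only shape downstream code will accept.
ALLOWED_PRIORITIES = {"low", "medium", "high"}
EXPECTED_KEYS = {"title", "priority", "estimate_hours"}


@dataclass(frozen=True)
class TicketSummary:
    title: str
    priority: str
    estimate_hours: float


def parse_ticket(raw: str) -> TicketSummary:
    """Validate raw model output; raise ValueError on any contract breach."""
    data = json.loads(raw)  # invalid JSON raises JSONDecodeError, a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != EXPECTED_KEYS:
        raise ValueError(f"expected keys {sorted(EXPECTED_KEYS)}, got {sorted(data)}")

    title, priority, hours = data["title"], data["priority"], data["estimate_hours"]
    if not isinstance(title, str) or not title:
        raise ValueError("title must be a non-empty string")
    if not isinstance(priority, str) or priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"priority must be one of {sorted(ALLOWED_PRIORITIES)}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(hours, bool) or not isinstance(hours, (int, float)) or hours < 0:
        raise ValueError("estimate_hours must be a non-negative number")

    return TicketSummary(title=title, priority=priority, estimate_hours=float(hours))


if __name__ == "__main__":
    good = '{"title": "Fix loader", "priority": "high", "estimate_hours": 3}'
    print(parse_ticket(good))
```

Anything that fails the check can be retried or rejected, so free-form model text never leaks into the rest of the pipeline.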
Anthropic released a new feature just for Claude tool users called “Claude Skills”. But honestly if you want to learn about it read Simon Willison’s article about it here. I always keep reference.md and notes.md in my projects now which I use as reference when prompting in Cursor. But Anthropic has taken that idea and put it on steroids. If Karo (Product with Attitude) thinks it’s amazing then all you vibe coders out there using Claude had better recognize it!
I found a great Substack called Gradient Flow (dope name) by Ben Lorica 罗瑞卡. Ben co-authored a recent article with Bauplan’s Ciro Greco, once again giving a great breakdown of the composable, agentic future of Data. MUST READ!
Engines and Libraries
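OMFG THEY REMOVED THE GIL IN PYTHON!!!!!! ٩(๑❛ᴗ❛๑)۶ (OK, technically it’s the optional free-threaded build that drops it, and the default build still has the GIL. But still!)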
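Not every interpreter you have installed will be running without the GIL, so here’s a small sketch for checking what you’ve actually got before you get excited about threads. It relies on `sys._is_gil_enabled()` (available from Python 3.13, and falling back to “GIL is on” where it doesn’t exist) and the `Py_GIL_DISABLED` build variable. The helper names and the toy prime-counting workload are my own assumptions for illustration, not a benchmark.

```python
import sys
import sysconfig
from concurrent.futures import ThreadPoolExecutor
from typing import List


def gil_enabled() -> bool:
    """True if this interpreter is currently running with the GIL."""
    # sys._is_gil_enabled() only exists on Python 3.13+.
    # Older versions always have the GIL, so default to True there.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


def is_free_threaded_build() -> bool:
    """True if this interpreter was compiled as a free-threaded build."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def count_primes(limit: int) -> int:
    """CPU-bound toy workload: count the primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_in_threads(limits: List[int]) -> List[int]:
    # With the GIL these threads take turns; without it they can run in parallel.
    with ThreadPoolExecutor(max_workers=len(limits)) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    print("GIL enabled right now:", gil_enabled())
    print(run_in_threads([10, 100]))
```

The results are the same either way. What changes on a free-threaded build is whether CPU-bound threads like these can actually use more than one core.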
To add to the myriad of things you can do with DuckDB, you can also stream data into DuckDB. This quick and useful post from DuckDB’s blog breaks down three different architectures for DuckDB streaming options.
I published a companion YouTube video for my Substack article about DuckLake. The video gives more colour and uses my Substack as a guide.
Data Engineering
Let’s all give a round of applause to Simon Späti for putting together this comprehensive piece on what you need for Agentic Data Modelling. It’s all happening! (っ◕‿◕)っ
It’s also good to just sit and think for a moment about what exactly a data model even really is. Juha Korpela wrote a concise thought experiment about what they are and also the levels at which they exist. Really liked it.
If you want to become a Staff Engineer (still no clue what that really means) then you NEED to embrace office politics. Byte-Sized Design offers a rare free full article showing you how to do it. Now go conquer your South Bay Engineering org!
Love to see Andrew Jones here on Substack cheering for Data Contracts. Data Contracts are sort of my favorite subject in Data, because if you can actually get them implemented you end up with the Data Panacea we were always promised. And I’m not lying about this, because I’ve done it myself before.
Data Platforms and Business
I wasn’t really excited about the new UI updates that rolled out for BigQuery. I guess my LinkedIn post hit a nerve, because the GTM leadership team for BigQuery got ahold of me and I’m having a session with their UX team to give feedback. The power of social media!
I fully agree with Nick Valiotti here that if you can’t fit your Data Strategy onto a napkin then you probably don’t have one. A strategy is a set of decisions. Building pipelines and dashboards is called being tactical. Data Teams get this confused ALL THE TIME. Thankfully we have Nick to explain it to the masses.
Philip Su reminds us that coming up with great ideas isn’t as hard as taking action and executing well. Everyone has ideas, very few successfully execute them.








