We’ve made it a habit to look up at the sky and appreciate the stars most nights now.
It reminds me of a few things all at once.
The first is how crazy the physics of this universe is. The night sky we are looking at is really the night sky from at least four years ago, since even the nearest stars’ light takes that long to reach us. That fact always blows my mind.
The second is that you see more of them once your eyes get used to the dark. We see more when we pay attention.
Third, stars are so beautiful. And yet it is so easy to not look up and notice them.
And finally, they always have a way of giving me perspective on just how big the universe is. And how insignificant I am in the grand scheme of things.
All the correlations with longevity aside, the sit-and-rise test is intriguing. It feels straightforward but it is anything but.
As part of my mobility work, I do the sit-and-rise test once every week.
This was challenging at first. I felt unstable when getting up. I could do it way better with my left leg. And I used to feel a muscle between my thigh and my groin hurt.
In the first months, out of every 10 sit-and-rises, 1 or 2 would be good quality. That number gradually went up to 3-4. Then to 6-7.
Ten months in, that uncomfortable muscle finally stopped acting up. And good quality is approaching 10. It is a lot easier to do.
Two lessons –
One is that it’s sometimes tempting to say, “I’m sore, I shouldn’t be doing this.” But soreness is such a key part of how muscles grow. There’s a beautiful parallel to learning as well – some things make us mentally sore, but that means they’re probably worth doing.
The second lesson is that it is important to stick with things that matter for a while. In this case, meaningful improvement took weekly practice over ten months.
Small improvements compound. It happens gradually, then suddenly.
Notes on LLM RecSys Product – Edition 3 of a newsletter focused on building LLM powered products.
Eval-first product building is having a moment. We all know we should run evals, ship evals, and just do more evals.
But evals are just snapshots – moments in time when you check if something works. The real breakthrough isn’t running evaluations. It’s building the evaluation loop – the end-to-end system that diagnoses continuously, improves deliberately, and makes probabilistic systems governable.
From Self-Awareness to Action
In the last post, I wrote about teacher models giving us painful self-awareness – you can evaluate millions of outputs and know what’s broken at scale. The eval loop is what you DO with that awareness: a continuous diagnostic and improvement system that makes LLM-powered products actually improvable.
In deterministic products, the process used to be: Ship → Monitor engagement → Small-batch human reviews when something breaks. Feedback was slow and sparse. In LLM recsys products, the eval loop runs continuously.
The Eval Loop
Here’s how it works:
The Product Policy defines what “good” looks like – the criteria, constraints, and guardrails for how the system should behave – including a “Golden Set” of high-quality examples.
This trains the Teacher Model, your evaluator.
The production Stack generates outputs in response to real user interactions. The Teacher continuously evaluates these outputs at a scale of millions per day.
We also meld user feedback and investigation into anecdotal issues/data to surface more defects.
This surfacing of defects at scale leads to Diagnosis, where each failure points to one of three places:
Policy problem – We didn’t define “good” clearly enough. The rubric is incomplete or wrong.
Teacher problem – Our evaluator itself is broken. It’s judging incorrectly.
Distillation problem – The production model hasn’t learned what the teacher knows. Training data or model capacity is the bottleneck.
The fix flows back into the system – refine policy, improve the teacher, update training data – and the loop continues.
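One way to picture the diagnosis step is as a small triage function. This is only a sketch under assumptions of my own: it supposes a human has reviewed each surfaced defect and answered two questions about it. The names `ReviewedDefect`, `RootCause`, and `diagnose` are hypothetical, not part of any real system described here.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class RootCause(Enum):
    POLICY = "policy"
    TEACHER = "teacher"
    DISTILLATION = "distillation"


@dataclass
class ReviewedDefect:
    """A surfaced defect after human review (both fields are hypothetical)."""
    policy_covers_case: bool     # does the written policy define "good" for this case?
    teacher_matches_human: bool  # did the teacher's verdict agree with the reviewer's?


def diagnose(defect: ReviewedDefect) -> RootCause:
    # Check the layers in dependency order: policy, then teacher, then production model.
    if not defect.policy_covers_case:
        return RootCause.POLICY
    if not defect.teacher_matches_human:
        return RootCause.TEACHER
    # The policy is clear and the teacher judged correctly, so the production
    # model has not learned what the teacher knows.
    return RootCause.DISTILLATION


# Tally root causes across a batch of reviewed defects to decide where to invest.
batch = [
    ReviewedDefect(policy_covers_case=False, teacher_matches_human=True),
    ReviewedDefect(policy_covers_case=True, teacher_matches_human=False),
    ReviewedDefect(policy_covers_case=True, teacher_matches_human=True),
    ReviewedDefect(policy_covers_case=True, teacher_matches_human=True),
]
tally = Counter(diagnose(d) for d in batch)
```

The ordering matters in this sketch: a teacher can only be judged against a policy that actually covers the case, and the production model can only be blamed once the teacher is known to be right.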
Unlocking velocity and shifting measurement
The loop unlocks velocity by operating on two planes simultaneously:
Online: The teacher judges live production outputs continuously. You know what’s breaking in real-time, not weeks later through support tickets or engagement dips.
Offline: Before shipping any change, you can test it against the teacher. Run it through thousands of test cases. See how eval metrics move. Fail fast without running live experiments on real users.
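The offline plane can be sketched as a simple gate. This is a minimal illustration, assuming the teacher can be called as a function that returns a quality score between 0 and 1. The names `mean_quality`, `passes_offline_gate`, and `min_gain` are hypothetical, and the toy teacher below stands in for a real evaluator.

```python
from statistics import mean
from typing import Callable, Sequence

Generate = Callable[[str], str]         # test case -> system output
Teacher = Callable[[str, str], float]   # (test case, output) -> score in [0, 1]


def mean_quality(generate: Generate, teacher: Teacher, cases: Sequence[str]) -> float:
    """Average teacher score for one system over a fixed set of test cases."""
    return mean(teacher(case, generate(case)) for case in cases)


def passes_offline_gate(
    baseline: Generate,
    candidate: Generate,
    teacher: Teacher,
    cases: Sequence[str],
    min_gain: float = 0.0,
) -> bool:
    """Return True if the candidate scores at least min_gain above the baseline."""
    gain = mean_quality(candidate, teacher, cases) - mean_quality(baseline, teacher, cases)
    return gain >= min_gain


# Toy stand-ins: the "teacher" rewards upper-case outputs, purely for illustration.
def toy_teacher(case: str, output: str) -> float:
    return 1.0 if output.isupper() else 0.0


cases = ["recommend a post", "recommend a job"]
ship_it = passes_offline_gate(
    baseline=lambda case: case,
    candidate=lambda case: case.upper(),
    teacher=toy_teacher,
    cases=cases,
)
```

In practice the test set would be thousands of cases and the gate would likely look at more than a single mean, but the shape is the same: no live users are involved until the candidate clears the teacher.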
In deterministic products, we measured outcome metrics: clicks, retention, revenue. In LLM recsys products, these are lagging indicators. The eval loop runs on quality metrics – leading indicators judged by your teacher.
Quality metrics thus guide daily improvement. Outcome metrics tell you if that improvement matters.
The Loop Is The Mechanism
In deterministic products, you improved by changing code. You wrote a new feature, shipped it, measured engagement.
In LLM recsys products, you improve by running the evaluation loop first. You can’t fully specify the system’s behavior upfront – it’s probabilistic. The loop is how you systematically improve something you can’t completely control.
Once again, adding an LLM into your system isn’t a panacea. You’ll be limited in production by latency and cost – so, the eval loop is the only way to build products around models that are powerful but imperfect, capable but costly, impressive in demos but messy at scale.
Next: The eval loop only works if your teacher knows what “good” looks like. That’s where we’re going next.
PS: A quick note on eval suites: As your product matures, you’ll likely move from one teacher to multiple – separate evaluators for various dimensions of the product. The same principle scales. Each evaluator judges a specific dimension, all feeding into the same diagnostic loop.
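A suite of evaluators might look something like the sketch below, assuming each evaluator is a function that scores one dimension of an output between 0 and 1. The dimension names, the toy evaluators, and the 0.5 threshold are all made up for illustration.

```python
from typing import Callable, Dict, List

Evaluator = Callable[[str], float]  # output -> score in [0, 1] for one dimension


def run_suite(output: str, evaluators: Dict[str, Evaluator]) -> Dict[str, float]:
    """Score one output on every dimension in the suite."""
    return {dimension: judge(output) for dimension, judge in evaluators.items()}


def failing_dimensions(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """List the dimensions that fall below the threshold, in alphabetical order."""
    return sorted(d for d, score in scores.items() if score < threshold)


# Toy evaluators standing in for real per-dimension teacher models.
toy_suite: Dict[str, Evaluator] = {
    "brevity": lambda out: 1.0 if len(out) <= 20 else 0.0,
    "politeness": lambda out: 1.0 if "please" in out.lower() else 0.0,
}

scores = run_suite("Please try again later today, thanks so much", toy_suite)
defects = failing_dimensions(scores)
```

Each failing dimension then enters the same diagnostic loop as before, just scoped to the policy and evaluator for that dimension.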
We had the incredible privilege of visiting New Zealand recently. We spent most of our time in National Parks – so you’ll see posts on those parks (and perhaps a couple of other experiences) coming soon to a blog near you. That aside, I had a few reflections –
(1) New Zealand is among the most beautiful countries we’ve had the privilege to visit. The landscapes are incredible – both in their diversity and beauty. Once we set this aside, there were a few other things I noticed that stayed with me.
(2) The AirBnB experience in New Zealand felt closer to what the AirBnB founders intended. Owners seemed personally invested – a couple of them even came over to say hello. They’re far from the hotel-like experiences we see so often these days in the US.
(3) I appreciated that New Zealanders take a very pragmatic approach to road signs and road rules.
For example, despite most highways being single lane, they all had a speed limit of 100 km/h (roughly 62 miles/h).
You’d be lucky to see that speed limit on a 6 lane highway in the US.
I loved the 100 km/h speed limit because it felt reasonable. So, it felt like a limit that everyone tried to abide by.
Reasonable is a word I found myself associating with New Zealand’s rules.
I also loved the high crash area sign. It felt data-driven. Once again, reasonable.
(4) I was struck by how strong the influence of Māori culture is on New Zealand. It is unlike other countries where settlers took over the lands of native people. It felt like they took a very thoughtful and, wait for it, reasonable approach.
(5) New Zealand’s heroes also have characteristically reasonable traits. I loved how Sir Edmund Hillary ensured he shared the credit for his climbing of Mount Everest with his sherpa, Tenzing Norgay.
He went on to care deeply about helping the local Sherpa communities – even after tragically losing his wife and daughter in a plane crash in Nepal.
(6) I loved a story about Sir Peter Jackson, another local hero. He was filming a scene where Frodo sees the Shire being burnt. That scene was originally going to involve just a burning building. However, while the scene was supposed to happen at 9pm, it got delayed to 4am and the firefighter crew who were on duty (in case something happened) were understandably tired.
So, Sir Peter Jackson just had them get dressed as orcs and shot the scene with them in it. Once again, a reasonable decision in the circumstances.
More to come on New Zealand in the coming weeks – suffice to say, I hope to have the good fortune to go back.
(1) One of my favorite Exponential View articles is about how energy has gone from commodity to technology. This table tells the story beautifully.
(2) The cost trend is clear – battery and solar cost curves continue to make their way downward.
This is the case on every front.
(3) We’re already beginning to see the consequences of a world with abundant energy. Australian households in 3 states are getting 3 hours of free solar-generated power this summer.
(4) This means the no-brainer strategy here is electrification. And the country that has gotten that memo most clearly is China.
From the article – Today, China produces 75% of lithium-ion batteries globally and manufactures 90% of the neodymium magnets that make motors spin. That means that China controls the means of producing electric vehicles (EVs), drones, robots, and all of the other electric products that are replacing the combustion-driven machines on which America built its might.
Everything is going to be electric.
(5) China is electrifying so rapidly that it has passed Europe in per capita consumption.
But, despite this rapid increase, CO2 emissions from China are flat or declining because new capacity is all renewable.
The bulk of the addition is solar.
Every year, they increase their investment in renewable technologies.
(6) China’s investments are so rapid that it is on track to be 100% renewable by 2051. At the current rate, the US will get there in 2148.
In December alone, China installed more battery capacity than the US had in all of 2025.
(7) All of transportation will be electric before we know it. EV markets are accelerating quickly.
(8) Meanwhile, some clean energy projects in California can take up to 20 years.
Private energy markets in Texas, meanwhile, are leading the way. So much so that solar has already overtaken coal on the Texas grid.
The trend lines are obvious – the future is electric and renewable. Energy is now technology.
Any economy that wastes time attempting to prop up fossil fuels and keep the status quo is going to be left behind.
William Meijer shared an insightful post – An extreme commitment to the truth makes relationships acutely dysfunctional but systems chronically functional (think Elon Musk).
An extreme commitment to kindness makes relationships acutely functional but systems chronically dysfunctional (think Sweden, UK).
I saw excerpts from two papers from the National Bureau of Economic Research recently –
One of them was about how the costs of climate change are being borne right now – In this paper, we examine several major vectors through which climate change affects US households, including cost increases associated with home insurance claims and increased cooling, as well as sources of increased mortality. Although we consider only a subset of climate costs over recent decades, we find an aggregate annual cost averaging between $400 and $900 per household; in 10 percent of counties, costs exceed $1,300 per household. Costs vary significantly by geography, with the largest costs occurring in some western regions of the United States, the Gulf Coast, and Florida.
Another was about the economic impact of Brexit – This paper examines the impact of the UK’s decision to leave the European Union (Brexit) in 2016. Using almost a decade of data since the referendum, we combine simulations based on macro data with estimates derived from micro data collected through our Decision Maker Panel survey. These estimates suggest that by 2025, Brexit had reduced UK GDP by 6% to 8%, with the impact accumulating gradually over time.
As with any analysis that involves so many variables, the trends matter more than the absolute numbers.
Reading the summary of both papers made me think about how complex systems change. We can debate interpretations endlessly, but the underlying forces continue to operate regardless of our opinions.
When I was in my twenties I would hitchhike to work every day. I’d walk down three blocks to Route 22 in New Jersey, stick out my thumb and wait for a ride to work. Someone always picked me up. I had to punch-in for my job as a packer at a warehouse at 8 o’clock sharp, and I can’t remember ever being late. It never ceased to amaze me even then, that the kindness of strangers could be so dependable. Each day I counted on the service of ordinary commuters who had lives full of their own worries, and yet without fail, at least one of them would do something kind, as if on schedule. As I stood there with my thumb outstretched, the question in my mind was simply: “How will the miracle happen today?”
He goes on to share wonderful experiences from his travel and experience-rich life, many of which involved incredible acts of kindness from strangers. He explains how gratitude and faith have become synonymous to him over time.
I’ve slowly changed my mind about spiritual faith. I once thought it was chiefly about believing in an unseen reality; that it had a lot in common with hope. But after many years of examining the lives of the people whose spiritual character I most respect, I’ve come to see that their faith rests on gratitude, rather than hope. The beings I admire exude a sense of knowing they are indebted, of resting upon a state of thankfulness. They recognize they are at the receiving end of an ongoing lucky ticket called being alive.
And he ends with a call to be more open to kindness, commit to gratitude and embrace pronoia.
My new age friends call that state of being pronoia, the opposite of paranoia. Instead of believing everyone is out to get you, you believe everyone is out to help you. Strangers are working behind your back to keep you going, prop you up, and get you on your path. The story of your life becomes one huge elaborate conspiracy to lift you up. But to be helped you have to join the conspiracy yourself; you have to accept the gifts.
Although we don’t deserve it, and have done nothing to merit it, we have been offered a glorious ride on this planet, if only we accept it. To receive the gift requires the same humble position a hitchhiker gets into when he stands shivering on the side of the empty highway, cardboard sign flapping in the cold wind, and says, “How will the miracle happen today?”
A good friend once went through an experience where a former manager and mentor of theirs reached out with an opportunity.
After going through the process, they went back and forth and eventually, it didn’t pan out.
However, that wasn’t the part of the story that stayed with me. It was the grace with which the person responded.
They reinforced how important this friend was to them. They wished they’d reached out at a different time when the timing might have worked. And they ended with a note saying they’d always be in their corner.
It is easy to show grace when things go your way.
The real test of character is how you respond when they don’t.