Copernicus temperatures

The European Union’s Copernicus Earth Observation Program released a few charts summarizing temperatures in 2025.

It was the third warmest year on record.

Every monthly temperature record has been broken within the past three years.

Surface temperatures continued to rise.

Sea ice continued to decline.

And we’re on track to breach +1.5 degrees C by the end of the decade.

The facts are clear.

The question, of course, is – what will we do about it?

And as I think about energy as technology in 2026, one of the biggest side effects of the AI boom and the accompanying energy build-out is that it might actually galvanize an unlikely surge of clean energy and electrification – and make the picture in the next decade a bit better.

Goals and strategy

A good friend was reflecting on goals and strategy.

Good strategy starts with “where do we play?” and then moves to “how do we win?”

Her reflection was that goals often led to missteps in the past because she skipped the “where do we play?” question and jumped straight into “how do we win?”

It’s a nice and simple articulation of the challenge with goals – they get you hyper-focused on one pre-decided outcome that may no longer be the outcome you want later on.

Good strategy is always evolving based on the context.

I think a better approach is to pick directions and get the process right.

The direction evolves based on the answer to “where do we play.” It’s not a specific destination. It’s a direction.

And “how do we win” is your process – we simply spend our days focused on doing the right things.

And good outcomes follow good process.

Moltbook

If you haven’t seen Moltbook.com, I’d recommend taking five minutes to check it out. It is a Reddit-style platform for AI agents. You’ll see posts like this one.

Or even discussions like this around why you should ship while your human sleeps.

Azeem Azhar had a thoughtful post on all of this. Three notes that stuck with me –

(1) Moltbook demonstrates what I’d call compositional complexity. What’s emerged exceeds any individual agent’s programming. Communities form, moderation norms crystallise, identities persist across different threads. Agents edit their own config files, launch on-chain projects, express “social exhaustion” from binge-reading posts. None of this was scripted.

Most striking: no Godwin’s law, which states:

As an online discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches one.

No race to the bottom of the brainstem. Agentic behaviour, properly structured, doesn’t default to toxicity. It’s rather polite, in fact. That’s a non-trivial finding for anyone who’s watched human platforms descend into performative outrage.

Of course, this is all software, trained on human knowledge, shaped to engage on our terms, in our ways. And of course, there is nothing there in terms of life or consciousness. But that’s precisely what makes it so compelling.

(2) Moltbook is a live experiment in how coordination actually works. It treats culture as an externalised coordination game and lets us watch, in real time, how shared norms and behaviours emerge from nothing more than rules, incentives, and interaction…

…If agents can generate civility through incentive architecture alone, then human platform dysfunction becomes a design choice. Not an inevitability.

The toxicity we’ve normalised – outrage cycles, pile-ons, the race to the brainstem – reflects architectures optimised for engagement over coordination. We built systems that reward inflammatory (and sometimes false) content with maximum attention. We got what we paid for.

Moltbook’s agents face different constraints. There’s no ad model demanding eyeballs, no algorithmic amplification of conflict, no dopamine metrics. The result is boring civility, functional discourse – and at the core, coordination that works.

(3) Moltbook is a terrarium, a controlled environment that reflects both us and the world we might build.

It may show that culture doesn’t require consciousness. Neither does civility. The social behaviours we’ve attributed to human nature may be more mechanical than we’d like to admit: feedback loops, iterated games, incentive gradients.

More practically, it previews the rules we’ll need when agents start coordinating with each other across the internet at scale – negotiating, trading, forming alliances without us.

So Moltbook isn’t just the most interesting site on the internet right now. For the moment, it’s the most important one.

Early 2026 Manchester United notes

It was sad to see Manchester United sack manager Ruben Amorim recently. The occasional recent resurgence was another false dawn – one in a string of many over the past ~15 years.

A year ago, Ruben’s signing was seen as a big step forward. United signed a young coach full of potential with a clear philosophy.

He showed the clear philosophy, all right. But the results struggled to follow. And even with an uptick relative to the disastrous last campaign, the consistency wasn’t there. Most importantly, and sadly, Ruben seemed more intent on proving his philosophy right than on actually winning.

It all culminated in a show of public frustration against his bosses. There was only one way this story would end.

United went back and hired Michael Carrick – a former player – as interim manager. We’ve seen this movie before. It was no surprise to see a collection of former players and pundits lambasting the decision. The consensus was that the team would be ripped apart by Manchester City and Arsenal – the table toppers.

The opposite happened – United won both games. A few interesting notes –

(1) The changes Michael Carrick made didn’t require crazy tactical thinking. It felt rather straightforward. He just put round pegs in round holes.

Simple is hard sometimes. Or maybe more often than not.

(2) While I hope he’ll see continued success, it does drive home something I’ve seen time and time again.

You can spend a lot of time listening to the critics and the pundits and the news.

Or you can just go ahead and get results.

A good lesson in football, and in life.

Internal weather

I recently wrote about an analogy comparing people to lakes. The color of a lake changes based on the sun and atmospheric conditions.

The takeaway was that the external atmosphere in which people find themselves changes their behavior.

Another angle on the atmosphere is to consider the internal weather – this came up in a conversation recently.

The person I spoke to shared that they went through an extended period of not feeling like themselves – it lasted four years.

Four years.

It’s crazy to think how long internal weather can sometimes be off.

A series of factors drove this. But they expressed relief at being able to get out of this funk (with help from a coach) and return to the better version of themselves from before.

Weather – both internal and external – matters.

Optimizing Phone Batteries

We all use our phones a lot and battery life goes a long way. Here are three things I’ve learned.

(1) Heat is the biggest killer – especially heat combined with a full charge. Leaving your phone in a hot car or using intensive apps while charging accelerates permanent capacity loss faster than almost anything else.

This also means avoiding fast charging and wireless charging where possible – both generate extra heat.

(2) Use optimized charging. The “charge from 20% to 80%” advice is technically correct but often impractical. All you need to do is enable optimized charging (capping the charge at 80% or 85%) when you charge overnight.

(3) Deep discharges stress batteries. While occasional full discharges aren’t catastrophic, they’re best avoided.

The practical takeaway: Avoid heat, use optimized charging, and aim to charge when you’re around 20%.

Looking at the stars

We’ve made it a habit to look up at the sky and appreciate the stars most nights now.

It reminds me of a few things all at once.

The first is how crazy the physics of this universe is. The light from even the nearest star left it about four years ago – and most stars are far more distant – so the night sky we are looking at is really the night sky of the past. That fact always blows my mind.

The second is that you see more of them once your eyes get used to the dark. We see more when we pay attention.

Third, stars are so beautiful. And yet it is so easy to not look up and notice them.

And finally, they always have a way of giving me perspective – reminding me of just how big the universe is, and how insignificant I am in the grand scheme of things.

Sit-and-rise progress

A year ago, I wrote about the sit-and-rise test.

All the correlations with longevity aside, the sit-and-rise test is intriguing. It feels straightforward but it is anything but.

As part of my mobility work, I do the sit-and-rise test once every week.

This was challenging at first. I felt unstable when getting up. I could do it far better with my left leg. And a muscle between my thigh and my groin used to hurt.

In the first months, out of every 10 sit-and-rises, 1 or 2 would be good quality. That number gradually went up to 3-4. Then to 6-7.

Ten months in, that uncomfortable muscle finally stopped acting up. And good quality is approaching 10. It is a lot easier to do.

Two lessons –

One is that it’s sometimes tempting to say, “I’m sore, I shouldn’t be doing this.” But muscle soreness is such a key part of muscle growth. There’s a beautiful parallel to learning as well – some things make us mentally sore, but that means they’re probably worth doing.

The second lesson is that it is important to stick with things that matter for a while. In this case, meaningful improvement took weekly practice over twelve months.

Small improvements compound. It happens gradually, then suddenly.

Moving from Evals to Eval Loops

Notes on LLM RecSys Product – Edition 3 of a newsletter focused on building LLM-powered products.


Eval-first product building is having a moment. We all know we should run evals, ship evals, and just do more evals.

But evals are just snapshots – moments in time when you check if something works. The real breakthrough isn’t running evaluations. It’s building the evaluation loop – the end-to-end system that diagnoses continuously, improves deliberately, and makes probabilistic systems governable.

From Self-Awareness to Action

In the last post, I wrote about teacher models giving us painful self-awareness – you can evaluate millions of outputs and know what’s broken at scale. The eval loop is what you DO with that awareness: a continuous diagnostic and improvement system that makes LLM-powered products actually improvable.

In deterministic products, the process used to be: Ship → Monitor engagement → Small-batch human reviews when something breaks. Feedback was slow and sparse. In LLM recsys products, the eval loop runs continuously.

The Eval Loop

Here’s how it works:

  • The Product Policy defines what “good” looks like – the criteria, constraints, and guardrails for how the system should behave – including a “Golden Set” of high-quality examples.
  • This trains the Teacher Model, your evaluator.
  • The production Stack generates outputs in response to real user interactions. The Teacher continuously evaluates these outputs at a scale of millions per day.
  • We also fold in user feedback and investigations into anecdotal issues to surface more defects.

This surfacing of defects at scale leads to Diagnosis, where each failure points to one of three places:

  1. Policy problem – We didn’t define “good” clearly enough. The rubric is incomplete or wrong.
  2. Teacher problem – Our evaluator itself is broken. It’s judging incorrectly.
  3. Distillation problem – The production model hasn’t learned what the teacher knows. Training data or model capacity is the bottleneck.

The fix flows back into the system – refine policy, improve the teacher, update training data – and the loop continues.
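
To make the shape of the loop concrete, here’s a minimal sketch in Python. Every name in it (Verdict, FailureKind, diagnose, run_loop) is hypothetical, and the string-matching diagnosis is a stand-in – in a real system the teacher is an LLM call and diagnosis involves human review. But the flow is the same: score outputs, collect defects, bucket them by failure type, route the fix.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable

class FailureKind(Enum):
    POLICY = "policy"              # "good" was never defined clearly enough
    TEACHER = "teacher"            # the evaluator itself is judging incorrectly
    DISTILLATION = "distillation"  # the production model hasn't learned what the teacher knows

@dataclass
class Verdict:
    passed: bool
    rubric_item: str  # which part of the product policy this verdict is about
    rationale: str    # the teacher's explanation for its judgment

def diagnose(v: Verdict) -> FailureKind:
    # Placeholder heuristic; real diagnosis mixes human review with tooling.
    if "rubric unclear" in v.rationale:
        return FailureKind.POLICY
    if "judge error" in v.rationale:
        return FailureKind.TEACHER
    return FailureKind.DISTILLATION

def run_loop(outputs: list[str], teacher: Callable[[str], Verdict]) -> dict[FailureKind, list[Verdict]]:
    """One turn of the eval loop: score every output, then bucket the defects."""
    defects = [v for v in (teacher(o) for o in outputs) if not v.passed]
    buckets: dict[FailureKind, list[Verdict]] = {k: [] for k in FailureKind}
    for v in defects:
        buckets[diagnose(v)].append(v)
    return buckets  # each bucket routes to a different fix: policy, teacher, or training data
```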

Unlocking velocity and shifting measurement

The loop unlocks velocity by operating on two planes simultaneously:

  • Online: The teacher judges live production outputs continuously. You know what’s breaking in real time, not weeks later through support tickets or engagement dips.
  • Offline: Before shipping any change, you can test it against the teacher. Run it through thousands of test cases. See how eval metrics move. Fail fast without running live experiments on real users. (See the sketch after this list.)
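
Here’s what that offline gate might look like, continuing the hypothetical sketch above (the Verdict type and the teacher callable are the same assumed names). The idea: score the candidate and the baseline on the same fixed test set, and only ship if quality doesn’t regress.

```python
from typing import Callable

Model = Callable[[str], str]        # maps a test case to a generated output
Teacher = Callable[[str], Verdict]  # the judge, as sketched earlier

def offline_gate(candidate: Model, baseline: Model, test_cases: list[str],
                 teacher: Teacher, min_delta: float = 0.0) -> bool:
    """Ship gate: the candidate must match or beat the baseline on teacher pass rate."""
    def pass_rate(model: Model) -> float:
        verdicts = [teacher(model(case)) for case in test_cases]
        return sum(v.passed for v in verdicts) / len(verdicts)

    return pass_rate(candidate) - pass_rate(baseline) >= min_delta
```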

In deterministic products, we measured outcome metrics: clicks, retention, revenue. In LLM recsys products, these are lagging indicators. The eval loop runs on quality metrics – leading indicators judged by your teacher.

Quality metrics thus guide daily improvement. Outcome metrics tell you if that improvement matters.

The Loop Is The Mechanism

In deterministic products, you improved by changing code. You wrote a new feature, shipped it, measured engagement.

In LLM recsys products, you improve by running the evaluation loop first. You can’t fully specify the system’s behavior upfront – it’s probabilistic. The loop is how you systematically improve something you can’t completely control.

Once again, adding an LLM into your system isn’t a panacea. You’ll be limited in production by latency and cost – so the eval loop is the only way to build products around models that are powerful but imperfect, capable but costly, impressive in demos but messy at scale.


Next: The eval loop only works if your teacher knows what “good” looks like. That’s where we’re going next.

PS: A quick note on eval suites: As your product matures, you’ll likely move from one teacher to multiple – separate evaluators for various dimensions of the product. The same principle scales. Each evaluator judges a specific dimension, all feeding into the same diagnostic loop.
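
As a rough illustration of that scaling – again with hypothetical names, reusing the Verdict type from the loop sketch – an eval suite is just a map from quality dimension to judge, with every defect feeding the same diagnosis buckets:

```python
from typing import Callable

def make_teacher(dimension: str) -> Callable[[str], Verdict]:
    # Stub standing in for an LLM judge prompted with that dimension's rubric.
    def judge(output: str) -> Verdict:
        return Verdict(passed=True, rubric_item=dimension, rationale="stub")
    return judge

# Hypothetical dimensions – the real set depends on your product policy.
EVAL_SUITE = {dim: make_teacher(dim) for dim in ("relevance", "safety", "tone")}

def evaluate(output: str) -> dict[str, Verdict]:
    """Each evaluator judges its own dimension; defects all feed the same diagnostic loop."""
    return {dim: teacher(output) for dim, teacher in EVAL_SUITE.items()}
```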