The Bitter Truth About The Bitter Lesson

This piece is a response to The Bitter Lesson by Rich Sutton. To summarize that piece, using its own words:

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.

You’re trying to build a program that wins chess games. You spend years diligently learning the intricacies of the game, reading chess books, studying openings and endgames, and pouring all that hard work into your program. Then along comes DeepMind with a bunch of GPUs and just crushes you.

Lesson learned.

General methods + compute > human knowledge.

But…

Moore’s Law has a cost.

Is this just the same mentality that got us JavaScript frontends that download the entire internet just to display basic CRUD app data?

What’s the cost of all those servers to run your neural networks?

I’ve heard figures ranging from $200,000 to $700,000 per DAY for ChatGPT server costs. That works out to roughly $73 million to $255 million a year.

But it’s chill, right? Wait 18 months and it’ll go down by half. Somehow.

Sure, there might be some “externalities”, like massive amounts of electronic waste. But you’re single-mindedly optimizing for the most effective solution to a given problem, externalities be damned.

Sure, embedding an LLM in your new app might be a little slow today, but give it another two generations of iPhones and it’ll be just fine.

Er… unless you decide to embed a bigger model at some point in the future, that is.

Sometimes it feels like software gets slower faster than hardware gets faster.
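
To make that concrete, here’s a back-of-the-envelope sketch of those two racing exponentials. Every number in it is an assumption chosen for illustration, not a measurement: compute cost halving every 18 months (the optimistic Moore’s Law reading), and the model you want to ship doubling in required compute every 12 months.

```python
# Back-of-the-envelope: does Moore's Law bail you out if models keep growing?
# All rates below are hypothetical, chosen purely for illustration:
#   - cost per unit of compute halves every 18 months (optimistic Moore's Law)
#   - the compute needed by the model you want to ship doubles every 12 months

COST_HALVING_MONTHS = 18    # hardware: cost per unit of compute halves this often
MODEL_DOUBLING_MONTHS = 12  # software: required compute doubles this often

def serving_cost(months, initial_cost=1.0):
    """Relative cost of serving the 'current' model after `months`."""
    hardware_factor = 0.5 ** (months / COST_HALVING_MONTHS)
    model_factor = 2.0 ** (months / MODEL_DOUBLING_MONTHS)
    return initial_cost * hardware_factor * model_factor

for years in range(0, 7, 2):
    print(f"year {years}: relative serving cost = {serving_cost(years * 12):.2f}x")

# With these made-up rates the model grows faster than the hardware gets
# cheaper, so serving cost climbs: roughly 1.0x, 1.6x, 2.5x, 4.0x.
```

Swap in your own rates; the point is only that if the software exponential outpaces the hardware one, serving costs go up, not down.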

Postscripts

Cunningham’s Law may be applicable here. If there is something wrong with my reasoning, please tell me what; you can email me at apchenet@gmail.com.

If the links are broken, I’ve made archival copies of the original sources on my personal site:

The Bitter Lesson

The Slow Winter
