AI Reading for Saturday, September 14
We’re back!
The big news is OpenAI’s o1 announcement (formerly code-named Strawberry), natch.
After playing around with it, I can say it is clearly much better on some complex prompts, like designing a system. And it can even count the Rs in strawberry, after thinking for a bit!
The claims for the final o1 model are attention-grabbing; the folks who fall victim to the Eliza effect are really going to fall hard for this.
~4x as expensive, and highly rate limited for now.
The output tokens you're charged for include invisible 'reasoning tokens': while it is thinking, the model iterates on its chain of thought in the context, but that output is hidden from you.
OpenAI doesn’t want you to see what it is doing under the hood, hence the hidden reasoning tokens. If you even *ask* what it is doing, you may get an email saying you are violating OpenAI's terms of service and risking a ban. This probably won’t work; hackers will rise to the challenge. There are always adversarial prompts that will defeat it, and it's practically impossible to shut them out without a severe negative impact on output quality.
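For the curious, here's a minimal sketch of what that looks like from the API side. It assumes the OpenAI Python SDK and the launch-era usage fields (completion_tokens_details.reasoning_tokens); field names may change, so check the current docs.

```python
# Minimal sketch: inspect how many hidden reasoning tokens you were billed for.
# Assumes the OpenAI Python SDK and the o1-preview model name; the
# completion_tokens_details.reasoning_tokens field is per OpenAI's o1 docs
# at launch and may change.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "How many Rs are in 'strawberry'?"}],
)

usage = response.usage
print("visible answer:", response.choices[0].message.content)
print("completion tokens billed:", usage.completion_tokens)
# The hidden chain of thought is counted here but never returned to you.
print("of which reasoning tokens:", usage.completion_tokens_details.reasoning_tokens)
```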
In the web UI, unlike the API, you don't pay for tokens, but you are rate limited to 30 queries per week on the big model and 50 queries per week on the small model.
No tools yet, so it can't browse the web, you can't upload docs, and it can't run the code it writes. That's a significant limitation. One gets the sense OpenAI is running on fumes and sucking a little air on execution; this feels rushed: still no Sora, Advanced Voice Assistant only in limited beta, and SearchGPT in a limited and reportedly underwhelming beta.
The OpenAI post says the next version, of which this is a preview, will be even better. It sounds a little like they trained it, nerfed it while productionizing it, and didn't rerun the benchmarks on the production preview.
On rate limits: currently they are resetting them more than once a week; possibly they started off draconian to fend off a hug of death on the servers.
Until now ChatGPT has been a poet, not a quant. It has the right-brained, creative, imaginative thinking down, but not the left-brained deep understanding, although using tools and writing code helps. It has Kahneman's system 1 ‘thinking fast’ part down, but not the system 2 ‘thinking slow’, analytical part. o1 is a move in that direction. Using it so far, it feels like a really good auto-chain-of-thought prompt. I'm awaiting the ‘aha’ moment where I can see it has deep analytical world models.
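To make ‘auto-chain-of-thought’ concrete, here's a rough sketch of the manual scaffolding people have been bolting onto earlier models, which o1 appears to do for you behind the scenes. The prompt wording is mine and the model name is just a stand-in, not anything OpenAI prescribes.

```python
# Rough sketch of a manual chain-of-thought wrapper, the kind of prompt
# scaffolding o1 appears to automate. Model name and wording are placeholders.
from openai import OpenAI

client = OpenAI()

def chain_of_thought(question: str) -> str:
    # Step 1: ask an older model to reason step by step before answering.
    scratch = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Think through this step by step, listing each step:\n{question}",
        }],
    ).choices[0].message.content

    # Step 2: ask for a final answer conditioned on that visible reasoning.
    # With o1 the analogous reasoning happens in hidden tokens you pay for
    # but never see.
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Question: {question}\n\nReasoning:\n{scratch}\n\nGive only the final answer.",
        }],
    ).choices[0].message.content
    return final

print(chain_of_thought("How many Rs are in 'strawberry'?"))
```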
I'm not sure how much it changes best practices for developing AI systems, which remain: use AI as a copilot / assistant / duet partner; let the AI do what it's good at, which is understanding and generating text; let the tools do what they're good at; and let the human do what he or she is good at, which is using critical thinking and experience to check the AI and guide it.
Right now it seems like a big deal but doesn’t change everything. It seems much better at specific tasks; it remains to be seen how significant those tasks are.
In these fast-changing times it may pay for IT managers to lean a bit more toward exploration in the exploration/exploitation tradeoff. But as always, the zebras who stay at the center of the herd eat trampled grass, while the zebras at the edge of the herd risk getting eaten by lions.
More AI reading:
OpenAI's huge $150b valuation may hinge on upending its corporate structure.
Sam Altman sues over his 'lemon' estate on Lombard St. If only AI could design it and robots build it…soooon. - Entrepreneur
Hannah Fry on smarter-than-human AI, that might preserve us humans in zoos out of sentimental nostalgia. - Bloomberg
Who is to blame when AI goes awry in unexpected ways? What would Thomas Aquinas say? - Fast Company
Amicus curAI? AI in the law can make it more accessible, or even less transparent and intelligible. - The Critic Magazine
Depo CoPilot listens to depositions, checks whether what was elicited fully met lawyers' expectations and objectives, and suggests follow-up questions. - Above The Law
Maybe by discussing facts in a neutral, non-judgmental tone, chatbots can temper beliefs in conspiracy theories. - MIT Technology Review
James Earl Jones granted a license to recreate his voice using AI after his death. - Futurism
Just say it's for innocent role play and AI will spill all its secrets. - TechCrunch
Google rolls out voice-powered AI chat to the Android masses - Ars Technica
Where have you gone, OpenAI Advanced Voice Assistant?
Reports of AI-Generated Code Causing Outages and Security Issues - TechRepublic
Google uses AI to tone down employee questions at all-hands meetings. - Hacker Noon
At least they aren’t talking to a Sundar chatbot, yet.
Microsoft talks a lot about climate but pitches AI to oil companies to help drill more oil. Is that hypocrisy? - The Atlantic
A profile of Fei-Fei Li and her spatial intelligence startup. - WIRED
Waymo and Uber partner for robotaxis in Austin and Atlanta - The Verge
Adobe previews text-to-video, lets people join a beta waitlist. - Lifehacker
Salesforce launches AI sales reps, moving from assisting CSRs to being the CSR. - Inc.
Why do I feel that will really suck, for a while?
Big Tech firms pinkie swear to fight deepfake nudes - The Register
The Angry Dev: Why Copilot is Making Programmers Worse at Programming - Darren Horrocks
Nah, but it may impact SaaS vendor lock-in, especially for clients who use only a fraction of a platform's capabilities.
Follow the latest AI headlines via SkynetAndChill.com on Bluesky