AI Reading for Tuesday February 25

Feb 25, 2025

Milestone timeline showing Claude progressing from assistant to pioneer

Claude 3.7, the latest model from Anthropic, can be instructed to spend a specific amount of reasoning (set as a token budget) on hard problems. - WIRED

Claude 3.7 Sonnet and Claude Code - Anthropic

Extended thinking - Anthropic

All eyes on GPT-4.5 as both Gemini 2.0 and Sonnet 3.7 match up with GPT-4o pretty well. - Ars Technica

A bit on the expensive side ... Sonnet feels great to me: it sounds more college-educated, scientific, and neurotic, versus the less-educated, commercial, direct OpenAI. When I upload a paper and ask questions about it, Sonnet seems best, OpenAI more simplistic, and Gemini sometimes goes off the rails. But despite leading on artifacts, protocols and other stuff, Sonnet has been punching below its weight in benchmarks, even though to me it is top tier with OpenAI and Google (and maybe now R1 and Grok 3). Also, OpenAI is way ahead on product: advanced voice mode, web search, agentic stuff, etc.

Not 100% clear if Sonnet 3.7 catches up with o3-mini and DeepSeek R1, but it seems promising (Gemini Flash Thinking is OK but maybe not as good as those two) - One Useful Thing

Anthropic raising $3.5b at $61.5b valuation - Bloomberg

When reasoning models expose their chain of thought, they leave a detailed breadcrumb trail for how to jailbreak them. - The Register

This generation of deeper reasoning, lower latency, advanced voice mode with vision, lower cost, is leading up to the Star Trek computer assistant you can talk to all day to help you, look at stuff, show you stuff, and an explosion of agentic products. All the pieces are there now. - TechRadar

H20 orders jump as Chinese firms want to deploy DeepSeek at scale. Nvidia earnings tomorrow should be lit. - Reuters

Trump wants tighter AI chip export restrictions but fired all the Feds who know anything about designing and implementing chip policy - Tom's Hardware

Huawei and SMIC said to be getting better at manufacturing AI chips in China - FT

Apple to open AI server factory in Texas as part of '$500 billion' U.S. investment - CNBC

Google says its AlphaGeometry2 AI now 'better than human gold medalists' at solving International Math Olympiad geometry problems. - livescience.com

"ChatGPT Saved My Life (No, Seriously, I’m Writing this from the ER)" - Hard Mode First

How Is Your Team Spending the Time Saved by Gen AI? - Harvard Business Review

Adobe ships much-improved mobile Photoshop including Firefly AI features - The Verge

"You killed a forest for this" - The Daily Dot

Federal workers will have to justify their continued employment to an AI on a weekly basis, or something - NBC News

Giant mining dumptrucks get self-driving capability - Bloomberg

OpenAI engineer takes potshots at Musk and xAI publicly when they try to recruit him (while simultaneously throwing another recent recruit under the bus with a shot at OpenAI) - Futurism

You can never be too skeptical with this stuff - Pivot to AI

Well, the BBC was a bit hyped about this story, and this piece is hating on it. Truth is, time will tell whether this stuff really helps scientists and develops good hypotheses, or whether it just got lucky or had something similar in its training data.

Perplexity wants to reinvent the web browser with AI—but there’s fierce competition - Ars Technica

Early gen AI roadkill Chegg sues Google for hurting traffic with AI as it considers strategic alternatives - CNBC

Grok gives detailed instructions on how to make WMDs - Xitter

Debate rages over whether Grok 3 really beat OpenAI's models at math - TechCrunch

Feds now have to justify their continued employment weekly to a Grok AI prompt. - NBC

Seems designed to make everyone quit who can find work in the private sector (where they get higher pay and better working conditions).

AI deepfake plays at HUD - New York Post

Follow the latest AI headlines via SkynetAndChill.com on Bluesky
