New Post 2-19-2025

Top Story

The rise of “reasoning” AI models

As we have reported previously, the CEOs and top researchers at leading AI companies have lately been exhibiting some combination of serenity, swagger, and giddiness, hinting that AGI (Artificial General Intelligence, or human-level performance across a broad swath of intellectual tasks) is just around the corner. It is as though they have seen the promised land and just need to tidy up a few more loose ends to get there. Two weeks ago, OpenAI released a paper that may hint at why they seem so confident.

Superficially, the paper is about how OpenAI is improving the ability of its AI models to write computer code. The first shocker is how far and how fast that ability is improving. It turns out that coding is a competitive sport: the website Codeforces hosts frequent competitions, with competitors ranked like chess players. Over the past several months, OpenAI’s best models have been climbing the ranks and recently became the 175th highest-ranked programmer in the world, better than 99.8% of human players. CEO Sam Altman has stated publicly that OpenAI has an unreleased model that is now the 50th highest-ranked programmer, and that by the end of the year they expect one of their models to be number one.

The secret to this rapid advancement, as laid out in the paper, is basically twofold: 1) making AI models think step by step and check their work, and 2) having them learn by trial and error, with correct attempts rewarded, in a process known as Reinforcement Learning, or RL. RL is how Google DeepMind’s AlphaGo learned to beat the human world champion at the board game Go. The point is that almost every intellectual task that produces a checkable correct (or more correct, or even less wrong) outcome should be amenable to this same refinement process. So does Reasoning + Reinforcement Learning + gobs of test-time compute = AGI? I expect the answer may not be long in coming.

OpenAI’s journey to producing an AI that scores in the 99th percentile of human coders.

Clash of the Titans

Perplexity unveils its own Deep Research product for free

Scrappy AI search startup Perplexity is thumbing its nose at OpenAI by rolling out its own product with the exact same name as OpenAI’s Deep Research, and offering it for free. OpenAI’s Deep Research is the shockingly good analysis model that has senior managers of large companies rethinking their job descriptions. Like the OpenAI product, Perplexity’s Deep Research uses a “reasoning” AI model to plan a search of the web for sources and then write a comprehensive summary of the findings. Reasoning seems to be the magic ingredient: the summaries from Perplexity’s Deep Research are more detailed and nuanced than the results of its standard AI searches. OpenAI’s Deep Research uses a more powerful reasoning engine and more computation time than Perplexity’s product, so its results can be expected to be better. But Perplexity’s product has scored well against its competitors on AI benchmarks, and “free” is everyone’s favorite price.

AI startup Perplexity announces its new Deep Research product with this perplexing graphic.

CEO Sam Altman tweets OpenAI’s future product roadmap

OpenAI still dominates the AI landscape, but its plethora of poorly differentiated models and obscure naming conventions make its offerings increasingly opaque to the average user. CEO Sam Altman has recognized the problem and, in a recent tweet, vowed to fix it, but not just yet. Soon OpenAI will release GPT-4.5, an upgrade from the current GPT-4o and what is promised to be its last “non-reasoning” model. Later this year, GPT-5 will be released as a system that has all of OpenAI’s various model capabilities under the hood. These will be hidden behind a unified interface that is smart enough to pick the right model for each user request. Users in the free tier will have access to all the tools, but with the amount of compute (thinking time) per query throttled down. Users at the “Plus” tier (currently $20/month) will get more thinking time per query, and so more intelligence. And “Pro” users (currently $200/month) will get way, way more compute time on their queries.

Sam Altman gazes into the future.

Nvidia’s total installed computing power is doubling every 10 months

Leading AI chip-maker Nvidia is selling so many chips so fast, and increasing the computing power of each new generation so swiftly, that the total computing power of Nvidia’s installed base is doubling every 10 months (see chart).

Nvidia is selling ever more chips that are ever more powerful, so its total installed compute doubles every 10 months.
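A fixed doubling time compounds quickly. The sketch below assumes the 10-month doubling rate holds steady, which the chart does not guarantee. Under that assumption, it shows how much larger the installed base would be after a given number of years. The function name `growth_factor` is just an illustrative label.

```python
# Growth implied by a constant doubling time.
# Assumption: the 10-month doubling rate from the chart stays constant.
DOUBLING_MONTHS = 10


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Return how many times larger installed compute is after `months`."""
    return 2 ** (months / doubling_months)


for years in (1, 2, 5):
    # Convert years to months before applying the doubling rule.
    print(f"{years} year(s): about {growth_factor(12 * years):.1f}x")
```

At that pace, installed compute grows about 2.3x per year and 64x over five years (60 months is exactly six doublings).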

Fun News

Ageless Innovation makes Robo-pets for the elderly

Older adults can get lonely, but cognitive and/or physical limitations may make it difficult for them to take care of a pet companion. Since 2018, Ageless Innovation, a spinout from toymaker Hasbro, has been making animatronic pets as companions for such seniors. The company cites research from AARP and other trusted sources showing that robotic pets have a measurable positive impact on the quality of life of the recipients. To date, almost all of Ageless Innovation’s robot pets have been distributed through various state programs for the elderly, the Veterans Administration, or hospice programs, but its products are available to anyone at its Joy for All website.

Ageless Innovation’s robotic pet cat meows, purrs, and responds to touch and voice.

NBA’s AI basketball robot is Steph Curry’s new shooting partner

National Basketball Association Commissioner Adam Silver recently announced 3 new AI robots to help players on and off the court. The 3 robots are being piloted with the Golden State Warriors. The team’s legendary shooter Steph Curry has been paired with the Automated Basketball Engine, or ABE, which catches rebounds and passes the ball back while Curry practices his shots. The Warriors are also trying out MIMIC, a robot that acts as a practice dummy for offensive and defensive plays, and KIT, a morale-boosting robot that can converse with the players and deliver context-appropriate motivational messages.

Steph Curry with his robot companion, ABE, who catches rebounds and passes the ball back.

Replit makes non-techy Zillow employees into app-makers

AI is getting scary good at computer programming, and now AI coding app Replit is striving to let even non-techies create useful computer applications just by typing what they want in plain English. Replit has partnered with AI startup Anthropic (the Pepsi to OpenAI’s Coke), whose AI chatbot Claude had generally been acknowledged as the best coding AI until OpenAI’s o1 and o3 models debuted. Replit has also eased deployment of apps on the web by partnering with Google Cloud. The result is that Replit can offer a soup-to-nuts, front-end to back-end, idea-to-working-web-app, all-in-one package.

As an example of what is possible, real estate giant Zillow now has non-technical employees designing, building, and deploying apps that are used every day in the operations of the company. Employees at all levels are encouraged to make apps for problems they face in their jobs. By describing to Replit’s AI what they want the app to do, they can have Replit turn that description into working code. This allows employees who understand their business problems very well to create working tools without having to know how to code. The barrier to custom software tools for everyone is falling fast.

A tweet from a Zillow employee celebrates being an app-maker as an English major.

AI designs world’s lightest and strongest nanomaterial

An international team of researchers, led by the University of Toronto, has developed an AI model that can design nanomaterials with specific mechanical properties. The team used it to help create a nearly indestructible, lightweight carbon lattice that can support over a million times its own mass. The new material is described as “as strong as carbon steels, but with the density of Styrofoam.” Potential applications include medicine, such as prosthetics and implants that need to be strong but light. Another is vehicle construction, where reducing the weight of solid components now made of steel could save a significant amount of fuel over the vehicle’s lifespan.

An AI model designed a carbon lattice that is both lightweight and as strong as steel.

Robots

Global market for humanoid robots could be $38+ billion by 2035

Robots are on fire these days. (Metaphorically, of course, except for the ones that are literally on fire, like the firefighting robots, flamethrowing dogbots, drone-fighting warbots, and others we have reported on previously.) Humanoid robots are currently having “A Moment” because they are getting so good so quickly that the big money people are starting to smell Big Money. (See next story.) A year ago, investment banking colossus Goldman Sachs projected a global market for humanoid robots of $38 billion by 2035, and this is still the most-cited statistic. More recent estimates from knowledgeable but less august sources range from $80 billion to $300 billion-plus. Little wonder that dollars are raining down on top robotics companies like Figure AI, featured in the article below.

Goldman projects rapid growth in the humanoid robot market.

Robotics startup Figure raising $1.5 billion at 15x last year’s valuation

Silicon Valley robotics startup Figure AI is in talks to raise an eye-popping $1.5 billion at a valuation of $39.5 billion, which is 15 times the value assigned to the company when it raised $675 million just last year. Figure is already generating revenue from 2 major customers, and CEO Brett Adcock is a master at creating and sustaining buzz around his company, including using his personal social media accounts to break news of new milestones achieved. Figure appears to be leading the pack of US-based humanoid robot producers, but China has multiple companies whose robots could give Figure a run for its money.

Figure’s flagship Figure 02 humanoid robot is already making cars in a US BMW assembly plant.
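As a quick arithmetic check on the Figure story, the reported $39.5 billion valuation and the “15 times” multiple together imply last year’s valuation. This sketch uses only those two numbers from the story. The result is an implied figure, not a reported one.

```python
# Back-of-the-envelope check of the "15x" claim in the Figure story.
new_valuation_bn = 39.5  # valuation under discussion, in $ billions (from the story)
multiple = 15            # "15 times" last year's valuation (from the story)

# Dividing the new valuation by the multiple gives the implied prior valuation.
implied_prior_bn = new_valuation_bn / multiple
print(f"Implied valuation last year: about ${implied_prior_bn:.2f} billion")
```

That works out to roughly $2.6 billion for the round in which Figure raised $675 million.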

AI in Medicine

AI excels at catching arrhythmias in ambulatory EKG monitoring

Ambulatory EKGs (e.g., Holter monitors), in which patients wear a device that records the electrical activity of their hearts over a period of days or weeks, are increasingly used to detect episodic heart arrhythmias such as atrial fibrillation. The tracings of these vast numbers of recorded heartbeats (upwards of 100,000 beats per day) are currently reviewed by technicians, who identify potentially problematic rhythms for review by a physician. This manual method is time-consuming, costly, and prone to errors caused by the fatigue of reviewing so much repetitive data. An international team of researchers has developed an AI model, affectionately known as DRAI MARTINI, to automatically review the tracings and flag suspicious beats. The AI model was significantly better at identifying critical arrhythmias, finding 98.6% of them compared with 80.3% found by human technicians, with only a modest increase in false positives.

On average, the AI model missed 14 times fewer arrhythmias than human technicians did.
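The detection rates in this story translate directly into miss rates, since a miss rate is simply 1 minus the detection rate. The sketch below uses only the two percentages quoted above. The study’s own “average of 14 times” figure may have been computed differently, for example averaged across arrhythmia types, so treat this as a consistency check rather than a reproduction.

```python
# Detection rates (sensitivities) for critical arrhythmias, from the story.
ai_sensitivity = 0.986
tech_sensitivity = 0.803

# Miss rate = fraction of critical arrhythmias NOT flagged.
ai_miss = 1 - ai_sensitivity      # about 1.4%
tech_miss = 1 - tech_sensitivity  # about 19.7%

# How many times more often technicians miss than the AI does.
ratio = tech_miss / ai_miss
print(f"AI misses {ai_miss:.1%}, technicians miss {tech_miss:.1%}, ratio ~{ratio:.0f}x")
```

A 19.7% miss rate versus a 1.4% miss rate gives a ratio of about 14, in line with the caption.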

People prefer chatbot responses to therapists’ responses

A team of researchers from Utah, the Massachusetts General Hospital, and elsewhere tested whether a random sample of ordinary citizens could distinguish therapeutic responses generated by ChatGPT 4.0 from those written by mental health professionals. They could not. The participants also rated the responses of the chatbot more highly than those of the professionals on therapeutically relevant dimensions. The authors foresee a future in which online AI models can significantly expand the availability of psychotherapy to the populace.

A sample prompt to ChatGPT 4.0 to provide a therapeutic response to a clinical vignette.

That's a wrap! More news next week.