- AI Weekly Wrap-Up
New Post 2-21-2024
Top Story
OpenAI’s Sora text-to-video is a smash hit
OpenAI just announced Sora, the most viral AI tool since ChatGPT. Social media has been on fire since the announcement of this groundbreaking text-to-video model. Sora blends ChatGPT-like Transformer technology with Midjourney-like image diffusion technology. The result is eye-popping: extremely realistic, detailed video with natural movement that seems almost entirely authentic (though there can still be some weirdness with the hands). For now, it generates clips of up to 60 seconds, and it’s in very limited release for safety testing. There is a massive clamor on social media for an early general release. Click the top link below to view the demos and see what all the fuss is about.

Clash of the Titans
AI-powered app-less smartphone with voice assistant
Imagine a smartphone with no apps, just a voice interface that turns it into an AI-powered assistant. German telecom company Deutsche Telekom has partnered with cellphone chipmaker Qualcomm and AI interface startup Brain AI to develop a completely voice-driven interface for your phone. They describe it as what Siri should have been: a digital concierge that performs a wide range of tasks when you simply talk to it.

Microsoft announces dual-agent assistant for Windows
Microsoft has released its own version of an intelligent assistant for Windows. Researchers posted an open-source “UI-Focused Agent” (UFO) that can carry out complex, multi-step user requests. In the example diagrammed below (taken from the paper), an agent responds to a user request by extracting information from a Word file, summarizing it, combining that summary with images on the computer to create a PowerPoint presentation, then emailing the PowerPoint file to a list of recipients. The agent is able to “see” the screen and control the mouse and keyboard appropriately to complete the requested task.

Google upgrades Gemini Pro to 1 million tokens
Google’s recently released Gemini Ultra AI model is very good, equal in almost all respects to OpenAI’s top-ranked GPT-4. The Gemini Pro model is more or less equal to ChatGPT 3.5. Now Google steps up the competition with a new version, Gemini 1.5 Pro, that has a huge upgrade in context size, or “memory”: it can now handle up to 1 million tokens, or about 700,000 words. This is equivalent to several novels, or to an hour of video. ChatGPT’s context window is only 16,000 tokens, or about 11,000 words. Google has stumbled badly in the past year, giving OpenAI a huge early lead in the AI race, but Google now seems to be regaining its footing. Look for more announcements from Google soon.
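For readers who want to check the arithmetic, here is a minimal Python sketch of the token-to-word conversion. The ratio of 0.7 words per token is simply the one implied by the figures above (700,000 words for 1 million tokens); it is a rough rule of thumb, not an official tokenizer constant, and the 100,000-word average novel length is an assumption made for illustration.

```python
# Rough words-per-token ratio implied by the figures in this story
# (700,000 words / 1,000,000 tokens). An approximation only: the real
# ratio varies with the tokenizer, the language, and the text.
WORDS_PER_TOKEN = 0.7


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a context window."""
    return round(tokens * words_per_token)


def novels_that_fit(tokens: int, words_per_novel: int = 100_000) -> float:
    """Estimate how many novels fit; words_per_novel is an assumed average."""
    return tokens_to_words(tokens) / words_per_novel


if __name__ == "__main__":
    windows = [("1M-token window", 1_000_000), ("16K-token window", 16_000)]
    for name, size in windows:
        words = tokens_to_words(size)
        print(f"{name}: ~{words:,} words, ~{novels_that_fit(size):.1f} novels")
```

By this estimate, the larger window holds roughly 60 times as much text as the smaller one.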

Fun News
AI model cage match: compare results to your prompt
Want to run head-to-head comparisons of different AI models on the same prompt? Chatbot Arena, a project of researchers from UC Berkeley, makes it easy and fun. And it contributes to science! Click the link below to go to the contest page. You enter a prompt, and two randomly chosen, widely available AI models are given that prompt simultaneously. You then view the results and choose the better response. This is entertaining for users, but it also contributes to ongoing research at Berkeley on user preferences for different AI models. Highly recommended. Does ChatGPT or Llama write the better haiku on beer pong? Does Mistral or Claude come up with a better recipe for eggplant lasagna? The site includes a continuously updated leaderboard. Currently GPT-4 is ahead by a nose, with Gemini, Mistral, and Claude close behind. The much-loved open-source Llama is trailing (but a leaked Mistral model was recently revealed to be a fine-tuned version of Llama - ?!?)
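Curious how head-to-head votes can turn into a leaderboard? Rankings built from pairwise comparisons are often computed with an Elo-style rating system, the same idea used to rank chess players. The Python sketch below is an illustration of that general technique, not a description of Chatbot Arena’s actual code; the model names, starting ratings, and K-factor of 32 are all hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one head-to-head vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)  # the winner gains what the loser gives up
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical models and votes, for illustration only.
    ratings = {"model_x": 1000.0, "model_y": 1000.0}
    votes = [
        ("model_x", "model_y", True),
        ("model_x", "model_y", True),
        ("model_x", "model_y", False),
    ]
    for a, b, a_won in votes:
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], a_won)
    for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {rating:.0f}")
```

An upset win over a higher-rated model moves the ratings more than an expected win does, which is why a few thousand votes are enough to sort the field.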

Reddit sells user posts to AI company before its IPO
Reddit has been preparing for a potential multi-billion-dollar IPO for a while now. The company stirred a user revolt recently with some of its monetization moves. Now Reddit (self-proclaimed “Front page of the Internet”) has entered into a reported $60 million deal with an unnamed AI company to allow its users’ posts to be scraped for AI model training. This is part of a trend in which AI companies are coughing up cash to gain access to rich sources of human-generated text while avoiding copyright lawsuits. Everyone wins - except maybe the users who actually generate the content being sold and scraped.

ChatGPT changing demand for freelancers
Much ink has been spilled over the potential of AI to replace jobs for… well, maybe everybody? One clever analyst took a look at perhaps the most vulnerable jobs - gigs for freelancers (no messy HR issues, amirite?). Blogger Henley Wing analyzed 5 million job postings on the online freelancer marketplace Upwork to see which jobs were being most affected by AI. TL;DR - demand for three categories of jobs (Writing, Translation, and Customer Service) decreased substantially. Almost all other categories saw an increase in job postings. This makes some sense - writing tasks, customer service chatbots, and translation are all strengths of AI models, and require little user experience to get started. And although AI can do graphic design and video editing, the current tools are clunky, not easy to work with, and hard to control well enough to get a particular desired end product. The tools will improve (see our Top Story this week about video generation) and users will become more expert with them, so these categories of jobs may see an impact in the not-so-distant future.

UPenn researchers develop super-fast photonic AI chip
Engineers at the University of Pennsylvania have developed a new computer chip that performs calculations with light, not with electrons as in conventional computers. The physics of this configuration makes it possible to radically speed up vector-matrix multiplications - the type of computation most needed by AI.
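For anyone wondering what a vector-matrix multiplication actually is, here is a minimal sketch in plain Python. It shows the operation itself on a conventional computer, with made-up numbers; it says nothing about how the photonic chip performs it. In a neural network, the vector would hold a layer’s inputs and the matrix its weights.

```python
def vector_matrix_multiply(vector, matrix):
    """Multiply a length-n vector by an n x m matrix, returning a length-m list."""
    if not matrix or len(matrix) != len(vector):
        raise ValueError("matrix must have one row per vector element")
    width = len(matrix[0])
    if any(len(row) != width for row in matrix):
        raise ValueError("all matrix rows must have the same length")
    # Each output element is a sum of products down one column of the matrix.
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(width)
    ]


if __name__ == "__main__":
    inputs = [1, 2, 3]                   # illustrative input values
    weights = [[1, 0], [0, 1], [2, 2]]   # illustrative 3 x 2 weight matrix
    print(vector_matrix_multiply(inputs, weights))
```

Even this tiny example takes six multiplications and four additions; large AI models repeat the same pattern billions of times, which is why faster hardware for it matters.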

AI in Medicine
AI model can determine autism patients’ sex from brain scans
Stanford University researchers studying Autism Spectrum Disorder (ASD) developed a spatiotemporal deep neural network to analyze functional MRI scans of patients with ASD. The model learned to classify the sex of patients scanned with approximately 85% accuracy, based on subtle differences in brain organization between the sexes. The authors conclude that gender-specific therapies may be needed for patients with ASD.

Machine Learning model predicts avoidable ER visits in Medicaid
Medicaid patients face a variety of socioeconomic barriers to care, and many state Medicaid programs employ active outreach to vulnerable populations in an attempt to head off complications that lead to ER visits and hospitalizations. Historically, these efforts have been hampered by poor predictive models of who is most likely to benefit from outreach. Researchers studied a sample of 10 million Medicaid patients from 26 states and the District of Columbia, and developed a machine learning model that tripled the success in targeting patients most at risk for these types of complications.

AI Scribes for physicians start to gain traction
AI Scribes automatically produce encounter notes for physicians by converting the spoken conversation between physician and patient into a text transcript, which is then summarized into a note suitable for entry into the medical record. A number of companies are actively pursuing this potential multi-billion-dollar market, including Nuance (a subsidiary of Microsoft), Abridge, Nabla, Suki, and others. A recent survey indicates that up to one-third of US Primary Care Physicians have tried at least one such system.

That's a wrap! More news next week.