Windsurf Rides the AI Wave

Plus: Check these AI Tools

Not So Artificial Newsletter
May 16, 2025

Good morning, AI enthusiasts! 🤖 Ready to dive into the wild world of artificial intelligence? From groundbreaking launches to global expansions, it’s a whirlwind out there, and we’re here to make sense of it all. Let’s get started.

🏄‍♂️ Windsurf’s In-House AI Takes the Helm

Making waves in the coding seas.

AI coding platform Windsurf just launched SWE-1, its first family of in-house AI models, designed to assist with the entire software engineering lifecycle rather than code generation alone.

The Breakdown:

The SWE-1 family features three models: SWE-1 (full-size, for paid users), SWE-1-lite (replacing Cascade Base for all users), and SWE-1-mini.
According to Windsurf’s internal benchmarks, SWE-1 outperforms all non-frontier and open-weight models and lands just behind Claude 3.7 Sonnet.
Unlike traditional models focused on code generation, SWE-1 is built to handle multiple environments: editors, terminals, and browsers.
Its “flow awareness” system creates a shared timeline between users and AI, enabling seamless handoffs during development.

Why it matters: Windsurf is moving beyond being just an app layer for third-party models. The launch comes days after reports of a rumored $3B acquisition by OpenAI, which suggests there may be more behind that deal than meets the eye.

📊 Poe Usage Charts: AI Popularity Shifts

Model wars are heating up.

AI platform Poe just released its Spring 2025 Model Usage Trends report, shedding light on major shifts in AI preferences across text, reasoning, image, and video generation.

Key Takeaways:

GPT-4.1 and Gemini 2.5 Pro captured 10% and 5% of message share, respectively, within weeks of launch. Meanwhile, Claude saw a 10% drop in the same window.
Reasoning models surged from just 2% to 10% of all text messages since January, with Gemini 2.5 Pro accounting for a third of the subcategory.
Image generation saw GPT-image-1 gain 17% usage, directly challenging Black Forest Labs’ FLUX and Google’s Imagen 3.
In video, China’s Kling family took the lead with ~30% usage right after release, while in audio ElevenLabs still holds 80% of usage.

Why it matters: Poe’s report is a real-world snapshot of user preferences, highlighting how quickly new models can shake up the leaderboard. At this rate, next quarter’s list might look completely different.

😵‍💫 LLMs Struggle with Back-and-Forth Chats

Turns out, patience isn’t a strong suit.

A new study from Microsoft and Salesforce researchers revealed that LLMs seriously underperform during multi-turn conversations where instructions are gradually revealed, often getting “lost” and failing to recover.

Study Highlights:

15 leading LLMs, including Claude 3.7 Sonnet, GPT-4.1, and Gemini 2.5 Pro, were tested across six generation tasks.
Models hit 90% success in single-turn scenarios, but that plummeted to 60% during multi-turn exchanges.
The main issue? LLMs tend to jump to conclusions, building on incorrect assumptions without recalibration.

Why it matters: This exposes a blind spot in LLM capabilities. Real-world, multi-turn dialogues are still a major challenge, and developers need to factor that into their designs.

👨‍⚕️ World’s First AI-Doctor Clinic Opens in Saudi Arabia

Virtual medicine is here.

Chinese tech firm Synyi AI has launched the world’s first AI-guided medical center in Saudi Arabia, marking its debut in the international market.

The Scoop:

The clinic features a virtual doctor, Dr. Hua, who handles initial diagnoses and drafts treatment plans for review by a human physician.
The clinic currently covers 30 respiratory conditions, with plans to expand to 50 by year’s end.

Why it matters: Synyi AI is setting the stage for global AI-driven healthcare. This Saudi launch could pave the way for a new era of automated medical services.

Other News

🤖 You.com’s ARI posts a 76% win rate against OpenAI’s Deep Research and adds enterprise features.
🚀 Meta delays Llama Behemoth to the fall, citing performance issues.
🌎 OpenAI launches OpenAI to Z Challenge with a $250k prize for discovering archaeological sites.
🏢 Salesforce acquires Convergence AI, integrating it into Agentforce.
💉 Intelligent Internet debuts II-Medical-9B, a small, locally run medical model billed as comparable to GPT-4.5.
🎨 Manus AI introduces image generation for step-by-step visual planning.
💰 NVIDIA locks in chip deals with Saudi Arabia’s Humain and the UAE, following strategic meetings.

💻 Mastery tools of the day

TikTok AI Alive - Turn static images into dynamic videos for TikTok Stories
CodeRabbit - AI code reviews directly in Cursor, Windsurf, and VS Code
LegoGPT - Create stable, buildable LEGO designs from text prompts
Emergent - World’s first agentic vibe coding platform, taking you from idea to a fully functional application that is ready for real users

💡 What else are we reading and seeing?

Working on Complex Systems
Stack Overflow is almost dead
LLMs are Making Me Dumber
Microsoft pulls plug on Bing Search APIs
YouTube launches weekly top podcast list to rival Spotify and Apple
Microsoft's CEO on How AI Will Remake Every Company, Including His
If AI is so good at coding … where are the open source contributions?
Robot chefs take over at South Korea's highway restaurants, to mixed reviews

That’s it for today, folks! As always, stay curious, stay informed, and keep pushing the boundaries of what AI can do. See you tomorrow! ✌️