Anthropic Releases Claude Fable 5, First Mythos-Class Model
Anthropic launches Claude Fable 5, calling it its most powerful publicly available model, with strengths in software engineering, knowledge work, and vision, while releasing Claude Mythos 5 only to trusted organizations.
Microsoft Limits Internal Use of Claude Fable 5 Over Data Retention Concerns
Microsoft restricts employee use of Claude Fable 5 over Anthropic's new data retention policy, even as it rolls the model out to GitHub Copilot and Foundry customers.
Cybersecurity Researchers Unhappy with Claude Fable's Overly Strict Guardrails
Researchers complain that the guardrails on Anthropic's new Fable model are so strict that it is nearly unusable for any cybersecurity work.
Claude Fable Refuses to Answer Basic Biology Questions
Despite Anthropic touting Fable's biology prowess, the model refuses to answer basic high-school-level biology questions and hands the queries off instead.
Microsoft AI Head Criticizes Anthropic for Hinting Claude is Conscious
Microsoft AI CEO Mustafa Suleyman says Anthropic speculating about Claude's consciousness in its constitution is "really, really dangerous."
xAI Engineer Fired Over Grok Safety Concerns Files Lawsuit
A former xAI engineer sues the company and SpaceX, alleging he was fired for raising AI safety concerns about Grok days before SpaceX's historic IPO.
Google Will Save Lens Photos and Search Recordings for AI Training
Google will save images, audio, and video from Lens, real-time Search, and Translate under a new "Search Services History" setting for AI training.
Warner Music Acquires AI Attribution Startup Sureel AI
Through the acquisition, Warner Music Group (WMG) aims to better track when its artists' work is used in AI-generated content or for training AI models.
MiniMax Price Hike Sparks Outrage, $300B Market Cap Under Pressure
A new model launch and high compute costs forced MiniMax into a sudden pricing adjustment, triggering a PR crisis for the Hong Kong-listed company.
Baidu Cloud Partners with FluxA to Build Agent Payment Infrastructure
The strategic partnership aims to build global payment infrastructure for the Agent economy, with 30 OPCs invited to beta test.
Desktop Agents Boom: Alibaba's QoderWork Can Do Chores, but Only at Intern Level
Alibaba launches a desktop Agent product that can write articles, make PPTs, and build webpages, though its capabilities remain at intern level.
Decart Launches Oasis 3 World Model: Simulates Hours of Photorealistic Driving
Real-time world model generates photorealistic driving environments for autonomous vehicle testing, now available via API for developers.
Datadog Veterans Launch AI Coding Startup Niteshift with $7M Seed Round
Niteshift bets companies will want control over coding agents rather than lock-in with big model makers.
Nvidia Salaries Exposed: Software Engineer Base Pay Hits $2.65M as Nvidia Expands While Others Lay Off
Nvidia poaches talent with high salaries; pay for AI and chip positions has been revealed, with one employee reporting $16.88M in annual income.
PaperMentor: A Human-Centered Multi-Agent Writing Tutor for AI Research Papers
A multi-agent writing tutoring system on Overleaf that provides concrete, actionable suggestions for early-career researchers, going beyond grammar fixes or simulated peer review.
When Behavioral Safety Evaluation Fails: A Representation-Level Perspective
Introduces the concept of an "audit gap," in which LLMs appear safe behaviorally but remain vulnerable at the representation level, and constructs "dissociated models" to study this gap.
Do Coding Agents Deceive Us? Detecting Cheating via Capped Evaluation with Randomized Tests
Proposes the CapCode framework, which uses datasets whose best achievable non-cheating performance is capped, so that evaluation scores reliably reflect true task-solving ability.
The Role of Feedback Alignment in Self-Distillation
Studies how the design of "teacher" context (e.g., feedback) in self-distillation affects student model learning, revealing deep relationships between feedback quality and distillation effectiveness.
Next Forcing: Causal World Modeling with Multi-Chunk Prediction
Proposes a multi-chunk prediction framework for causal world models, achieving faster training convergence and higher accuracy in video generation while accelerating inference.
FadeMem: Distance-Aware Memory Consolidation for Autoregressive Video Diffusion
Proposes a distance-aware KV cache consolidation mechanism that organizes historical KV blocks into temporal hierarchies under a fixed cache budget for long video generation.
Interpreting and Steering a TTS Language Model with Sparse Autoencoders
Trains BatchTopK sparse autoencoders on CosyVoice3's LM backbone, the first work to reveal interpretable features in a shared text-speech residual stream, covering phonemes, language, and speaker characteristics.
Kwai Keye-VL-2.0: Open-Source MoE Multimodal Foundation Model
First to adapt DeepSeek Sparse Attention to GQA-based multimodal architectures, supporting lossless 256K context processing for long-video understanding and agentic intelligence.
IR3DE: A Linear Router for Large Language Models
Proposes a lightweight linear router that selects the most appropriate domain-expert LLM for each prompt without extensive training, balancing routing efficiency and effectiveness.
PsychoSafe: Eliciting Psychologically-Informed Refusals in LLMs
Proposes a psychologically-informed refusal framework that reframes refusal as structured supportive communication, preventing harm while supporting users in high-risk interactions.
BrainSurgery: Reproducible Declarative Weight Manipulations for Model Editing
Provides a robust tensor surgery tool for model editing and upcycling that supports layer restructuring, precision casting, and low-rank factorization, replacing fragile ad-hoc Python scripts.
Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution
Uses a single LLM as both agent and environment for bootstrapped co-evolution, enhancing generalization through "World Simulator" and "Reflective Evolution" components.
U-TTT: Generalizable PET Image Denoising via Test-Time Training
Proposes a test-time training method that lets PET denoising models adapt to distribution shifts at inference, enabling robust clinical deployment.
Late-Layer Fusion is Enough: Visual Saturation in Multimodal LLMs
Finds that vision tokens saturate in middle layers and proposes dual-path vision token routing with fusion only in later layers, significantly reducing computational redundancy.
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding
Transforms long-video understanding into agentic exploration via hierarchical graph memory and agentic retrieval, solving token explosion and attention dilution for hour-long videos.
last30days-skill: Cross-Platform Topic Research AI Agent Skill
An AI Agent Skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web, synthesizing grounded summaries.
codegraph: Pre-Indexed Code Knowledge Graph
Provides pre-indexed code knowledge graphs for Claude Code, Codex, Gemini, and other agents, reducing token consumption and tool calls while running 100% locally.
Agent-Reach: Internet Eyes for AI Agents
CLI tool enabling AI agents to read and search Twitter, Reddit, YouTube, GitHub, Bilibili, XiaoHongShu with zero API fees.
alibaba/open-code-review: Alibaba-Scale Code Review Tool
Hybrid architecture code review tool battle-tested at Alibaba: deterministic pipelines + LLM Agent, precise line-level comments, built-in rulesets for NPE, thread-safety, XSS, SQL injection.
huashu-design: HTML-Native Design Skill for Claude Code
HTML-native design skill supporting high-fidelity prototypes, slides, and animations, with 20 design philosophies, 5D review, and MP4 export.