
Anthropic Claude 4 Review: Creative Genius Trapped by Old Limitations
Anthropic's Claude AI Evolves, Delivering Mixed Results
San Francisco-based Anthropic has released the fourth generation of its Claude AI models, showcasing a mix of strengths and limitations compared to its industry peers. While tech giants like Google and OpenAI push the boundaries of AI capabilities, Anthropic has taken a more focused approach, doubling down on specific areas while leaving others relatively unchanged.
Creative Writing: Sonnet 4 Shines, but Opus 4 Stumbles
In the realm of creative writing, Anthropic's Claude Sonnet 4 model edged out its counterpart, Claude Opus 4. Sonnet 4 produced vivid prose with exceptional atmospheric detail and psychological nuance, crafting an immersive narrative that balanced action, introspection, and philosophical insight. Still, the margin of victory was narrow, and Anthropic's investment in creative writing appears to have waned compared to previous iterations.
Coding Prowess: Claude Outpaces Gemini, but Maintainability Lags
In the coding evaluation, Claude Opus 4 demonstrated superior functionality, delivering a sophisticated game with advanced mechanics and rich gameplay elements. While Google's Gemini 2.5 Pro produced a cleaner, more maintainable codebase, Claude's implementation offered users a more engaging experience. Developers may still prefer Gemini's consistent coding standards and modular structure, which can facilitate easier improvements and debugging.
Mathematical Reasoning: OpenAI's o3 Outshines Claude Despite Claude's Transparency
When it comes to mathematical problem-solving, OpenAI's o3 model emerged victorious, achieving 100% accuracy on the challenging FrontierMath benchmark. In contrast, Claude Opus 4 displayed its reasoning process, allowing for error analysis and educational benefits, but ultimately fell short in terms of pure problem-solving accuracy.
Strategic Communication: Claude Excels in Nuanced Stakeholder Management
In the evaluation of strategic communication skills, Claude demonstrated exceptional thinking, crafting a detailed three-pillar messaging framework to address the diverse needs and perspectives of multiple stakeholder groups during a medical center crisis. The model's responses were more practical and adaptable compared to the more generic frameworks provided by ChatGPT.
Context Retrieval: Gemini's Expansive Capabilities Outshine Claude's Limitations
Claude's context retrieval capabilities proved reliable within an 85,000-token document, but the model's 200,000-token context window limitation became a significant constraint when processing larger documents. In contrast, competitors like Google's Gemini can handle context windows exceeding one million tokens, making it a more suitable choice for users working with extensive documentation.
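For readers deciding whether a document will fit, a rough pre-check can be sketched before sending anything to a model. The snippet below is a minimal illustration, assuming the common rule of thumb of roughly four characters per English token; real counts depend on each provider's tokenizer, and the window sizes are just the figures cited above.

```python
# Rough heuristic: ~4 characters per token for English prose. This is an
# assumption for illustration, not an official tokenizer.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Estimate token count from character length (heuristic only)."""
    return len(text) // CHARS_PER_TOKEN

def fits_context(text: str, context_window: int = 200_000,
                 reserved_for_output: int = 8_000) -> bool:
    """Check whether a document plausibly fits in a model's context
    window, leaving headroom for the model's response."""
    return estimate_tokens(text) <= context_window - reserved_for_output

doc = "word " * 100_000  # ~500,000 characters -> ~125,000 estimated tokens
print(fits_context(doc))                          # within a 200k window
print(fits_context(doc, context_window=100_000))  # too large for 100k
```

A check like this only rules documents in or out approximately; for anything near the limit, the provider's own token-counting tools are the reliable measure.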
Conclusion: Claude 4 Shines in Specific Areas, but Limitations Persist
Anthropic's Claude 4 models offer a mixed bag of capabilities, excelling in certain areas like creative writing and strategic communication while falling short in others, such as context retrieval and mathematical reasoning. Power users who appreciate Claude's strengths will find it a valuable tool, but novice users and those requiring more comprehensive AI capabilities may need to explore alternative options.