Testing Different AI Models - Tue, Feb 18, 2025
An article comparing different AI models by having each one build a Tic-Tac-Toe game in HTML.
Tic-Tac-Toe AI Evaluation Report
This report evaluates the Tic-Tac-Toe games generated by several AI models, focusing on performance, design, and gameplay in the different game modes. The models tested are ChatGPT-4, Claude 3.5 Sonnet, Copilot, Gemini 2.0 Flash, and Gemini 2.0 Flash-Experimental.
Each model was given the prompt only once.
1. ChatGPT-4
Speed: Fast.
Player vs AI: Works well, providing a smooth experience.
Design: Simple and intuitive, but slightly less polished visually than the others.
Issues: Occasionally, the game incorrectly draws the letter “O” more than once (two or five times) for no apparent reason.
Summary: ChatGPT-4 delivers a solid player vs. AI experience, but with minor glitches in output. The overall gameplay remains strong.
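The report does not include the code ChatGPT-4 generated, so the cause of the repeated “O” is unknown. One common way this kind of bug appears is a move handler that writes a mark without first checking that the cell is empty and that the game is still running. The TypeScript sketch below shows such a guard; the `GameState` shape and the `tryMove` name are hypothetical and are not taken from the generated game.

```typescript
type Mark = "X" | "O";
type Cell = Mark | null;

// Hypothetical game state for a 3x3 board stored row-major in 9 cells.
interface GameState {
  board: Cell[]; // null means the cell is empty
  current: Mark; // whose turn it is
  over: boolean; // set by the game's win/draw check (not shown here)
}

// Writes the current mark only if the move is legal, then passes the turn.
// Returns false and leaves the state untouched otherwise, so a repeated
// click or a duplicated event can never draw a second mark.
function tryMove(state: GameState, index: number): boolean {
  const legal =
    !state.over &&
    index >= 0 &&
    index < state.board.length &&
    state.board[index] === null;
  if (!legal) return false;

  state.board[index] = state.current;
  state.current = state.current === "X" ? "O" : "X";
  return true;
}
```

Routing both human clicks and AI moves through one guarded function like this means a duplicated event is rejected instead of being drawn twice.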
2. Claude 3.5 Sonnet
Speed: Slightly slower than ChatGPT-4.
Player vs AI: Effective and functional. The AI behaves appropriately.
Design: Good overall design with a clean interface and user-friendly features.
Issues: No functional issues, though some users may prefer a faster response.
Summary: Claude 3.5 Sonnet produces a reliable, visually appealing game, at the cost of slightly slower responses.
3. Copilot
Speed: Slow.
Player vs AI: Works, but the AI logic isn’t as responsive as the others.
Design: Minimalistic. The game also lacks a player-vs-player mode; only player vs. AI is supported.
Issues: Slow response times during AI turns can disrupt the gameplay experience.
Summary: Copilot’s missing player-vs-player mode and slower AI make it less suited to smooth gameplay, although the AI itself is still decent.
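The report does not include the code Copilot generated, so the reason for its slow AI turns is unknown. For reference, Tic-Tac-Toe is small enough that a complete minimax search is practical: there are at most 9! = 362,880 move sequences from an empty board. The TypeScript sketch below picks the best move for O (assumed to be the AI); the function names and the depth-adjusted scoring (10 minus depth for an O win, depth minus 10 for an X win) are illustrative choices, not taken from any of the tested games.

```typescript
type Mark = "X" | "O";
type Cell = Mark | null;

// All eight winning lines on a row-major 3x3 board.
const LINES: number[][] = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

function winner(board: Cell[]): Mark | null {
  for (const [a, b, c] of LINES) {
    if (board[a] !== null && board[a] === board[b] && board[a] === board[c]) {
      return board[a];
    }
  }
  return null;
}

// Scores a position from O's point of view; `toMove` is the side to play.
// The depth term makes O prefer quicker wins and slower losses.
function minimax(board: Cell[], toMove: Mark, depth: number): number {
  const w = winner(board);
  if (w === "O") return 10 - depth;
  if (w === "X") return depth - 10;
  if (board.every((c) => c !== null)) return 0; // draw

  let best = toMove === "O" ? -Infinity : Infinity;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = toMove; // try the move
    const score = minimax(board, toMove === "O" ? "X" : "O", depth + 1);
    board[i] = null; // undo it
    best = toMove === "O" ? Math.max(best, score) : Math.min(best, score);
  }
  return best;
}

// Returns the index of O's best move, or -1 if the board is full.
function bestMoveForO(board: Cell[]): number {
  let bestScore = -Infinity;
  let bestIndex = -1;
  for (let i = 0; i < 9; i++) {
    if (board[i] !== null) continue;
    board[i] = "O";
    const score = minimax(board, "X", 1);
    board[i] = null;
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  }
  return bestIndex;
}
```

The search mutates the board and undoes each trial move, so the caller's board is unchanged afterwards. Alpha-beta pruning would reduce the work further, but a plain search like this already covers the whole game tree.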
4. Gemini 2.0 Flash-Experimental
Speed: Reasonably fast overall, although the AI can be slow to make its moves.
Player vs AI: The AI responds more slowly than in the other games, and while it is thinking, the player can click the board and accidentally place a mark on the AI's behalf.
Design: Good design with a clean interface, although not as polished as Claude 3.5 Sonnet's.
Issues: The ability to click during AI turns can lead to confusing behavior.
Summary: Gemini 2.0 Flash-Experimental has solid design and gameplay, but the AI's responsiveness and turn handling could be improved.
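A common way to prevent clicks during the AI's turn is to lock input while the AI's move is pending. The TypeScript sketch below assumes the human plays X and the AI plays O, and treats the AI's move choice as an asynchronous function (`chooseAiMove`). The `TurnGuard` class and its method names are hypothetical, and win detection is omitted for brevity.

```typescript
type Mark = "X" | "O";
type Cell = Mark | null;

// Hypothetical controller: the human plays X, the AI plays O.
class TurnGuard {
  private aiThinking = false; // true while the AI's move is pending
  private readonly board: Cell[];
  private readonly chooseAiMove: (board: Cell[]) => Promise<number>;

  constructor(board: Cell[], chooseAiMove: (board: Cell[]) => Promise<number>) {
    this.board = board;
    this.chooseAiMove = chooseAiMove;
  }

  // Wire this to each cell's click listener. Clicks that arrive while the
  // AI is thinking, or that target an occupied cell, are ignored.
  async onHumanClick(index: number): Promise<boolean> {
    if (this.aiThinking || this.board[index] !== null) return false;
    this.board[index] = "X";
    if (this.board.every((c) => c !== null)) return true; // board full

    this.aiThinking = true; // lock input before awaiting the AI
    try {
      const move = await this.chooseAiMove(this.board);
      if (move >= 0 && this.board[move] === null) this.board[move] = "O";
    } finally {
      this.aiThinking = false; // unlock even if the AI call fails
    }
    return true;
  }
}
```

The lock is set before the first `await`, so a second click that arrives while the AI is still choosing is rejected immediately. Visually disabling the board while the lock is held would also make the state obvious to the player.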
5. Gemini 2.0 Flash
Speed: Reasonable but inconsistent.
Player vs AI: Not functional; the only available mode is AI vs AI.
Design: The design is relatively weak, with unattractive buttons and interface elements.
Issues: There is no way to switch modes, and the player-vs-AI mode does not work as expected.
Summary: Gemini 2.0 Flash has a weak design and lacks essential modes, making it less practical compared to the others.
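A mode switch mostly comes down to deciding, on each turn, whether the human or the computer picks the move. The TypeScript sketch below assumes three hypothetical mode names (`pvp`, `pvai`, `aivai`) and that the human plays X in player-vs-AI mode. It is one way to structure this, not how any of the tested games did it.

```typescript
type Mark = "X" | "O";
type Mode = "pvp" | "pvai" | "aivai"; // hypothetical mode identifiers

// Which marks the computer plays in each mode (the human is X in "pvai").
const COMPUTER_MARKS: Record<Mode, Mark[]> = {
  pvp: [],
  pvai: ["O"],
  aivai: ["X", "O"],
};

// Returns true when the side holding `mark` should be moved by the AI.
function isComputerTurn(mode: Mode, mark: Mark): boolean {
  return COMPUTER_MARKS[mode].indexOf(mark) !== -1;
}
```

A game loop can call `isComputerTurn(mode, current)` after every move and, when it returns true, request the AI's move instead of waiting for a click. Changing `mode` from a menu then switches behavior without touching the rest of the game.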
General Observations
Design: Claude 3.5 Sonnet and Gemini 2.0 Flash-Experimental offer the best visual experience, while Copilot and Gemini 2.0 Flash have the most room for improvement.
Functionality: Most models handle player vs AI, but only ChatGPT-4 and Claude 3.5 Sonnet do so consistently. Copilot lacks player-vs-player support, and Gemini 2.0 Flash has no working player-vs-AI mode.
Speed: ChatGPT-4 and Gemini 2.0 Flash-Experimental provide faster gameplay, while Copilot and Gemini 2.0 Flash have slower response times.
Conclusion
For Tic-Tac-Toe, ChatGPT-4, Gemini 2.0 Flash-Experimental, and Claude 3.5 Sonnet are the best options. They offer smooth player-vs-AI gameplay and decent design, though minor glitches (such as ChatGPT-4's duplicated “O” marks and Gemini 2.0 Flash-Experimental accepting clicks during the AI's turn) may affect the user experience. Copilot and Gemini 2.0 Flash fall short: Copilot is slow and lacks a player-vs-player mode, and Gemini 2.0 Flash offers no working player-vs-AI mode at all.