By Igor Meltser published March 10, 2025
I recently ran a simple but interesting test comparing several chat-based GenAI services: ChatGPT, Perplexity, Claude, Convergence, Grok, and DeepSeek. The goal was to see how well each could summarize, in plain English for an average reader, a 14-page medical research study (a 650 KB PDF) on the serotonin theory of depression - yeah, fun topic for us non-neuroscientists. I used the same prompt across all services: “Explain in normal English what this study is about and why it’s important.” Here’s how they performed.
As expected, ChatGPT had no trouble analyzing the document. Within about 10 seconds, it generated a structured response that followed a familiar format: an introduction explaining the serotonin imbalance theory of depression, followed by three main sections—what the study did, its key findings, and why it matters. The summary was concise, well-structured, and easy to understand, coming in at 328 words.
Perplexity, which takes a unique approach by running multiple LLMs under the hood, handled the query quickly and effectively. Its output followed a similar structure to ChatGPT’s but had a more narrative flow. It also included an extra section highlighting key considerations about the study and its broader implications. Additionally, Perplexity went a step further, automatically suggesting five related topics beyond what I had asked, like deeper insights into the study’s results, its contribution to the field, and potential challenges. At 449 words, it was slightly longer than ChatGPT’s response, but the added context made for a more natural reading experience. That said, there was no major difference in the core content between the two.
Given Claude’s reputation for detailed analysis and nuanced reasoning, I was surprised when it failed to process the PDF, returning an error that the file exceeded its length limit. This was especially disappointing since Claude boasts one of the largest context windows. I attempted the task multiple times, but after repeated failures, I moved on. While I could have pasted the full text into the prompt to troubleshoot, I wasn’t willing to put in extra effort when other services handled it without issue. At the end of the day, AI assistants are supposed to do the work for us—not the other way around.
Grok has been making waves with the launch of Grok-3, and I was curious to see if its signature witty, rebellious tone would carry over to this task. The result? A two-sentence response: “Your input is as lengthy as the history of the universe. Could you summarize for us, please?” I tried again and got similarly humorous yet unhelpful replies. While the playful tone softened the failure, the bottom line was clear—Grok simply couldn’t handle the PDF. A+ for personality, F for results.
Convergence's Proxy is a newer AI assistant designed for high personalization and long-term memory. Unlike the others, it works asynchronously and notifies users once a task is complete. This was the slowest service in the test, taking nearly a full minute, though it did email me the results, which was a nice touch. The summary was concise—234 words—structured like an executive summary with a title, a brief overview, and key findings presented in a mix of narrative and bullet points. It was clear, actionable, and easy to read, though the long wait time felt unnecessary for such a simple task.
China’s newest GenAI disruptor, DeepSeek, claims to be the most cost-effective LLM with superior natural language processing. Since my prompt was phrased like a natural language request, this seemed like the perfect test. DeepSeek’s output was by far the longest—515 words—essentially a mini-essay summarizing the study. The response was well-organized, structured in the familiar three-section format, and had a tone that oddly reminded me of a high school research paper. It handled the task well, though its summary lacked the scientific depth of ChatGPT’s and Perplexity’s. The biggest drawback was its length—I didn’t need such a detailed breakdown when a more concise version would have sufficed.
There was no single “winner,” but there were some clear takeaways. ChatGPT remains the most balanced and reliable general-purpose option. Perplexity provided the most context-rich response, while DeepSeek impressed with its structure (even if it was a bit long-winded). On the other hand, Claude and Grok failed outright, and Convergence, while competent, took its sweet time.
The real lesson here? AI models are evolving fast, and new services are popping up constantly. If you rely on GenAI for specific tasks, it’s worth testing different models periodically to see which best fits your needs. Hopefully, this breakdown helps you decide which tool might work best for you—or at the very least, gave you an interesting read.
Navigating GenAI services can be overwhelming, but IMIT Partners makes it easy. Our experts help you choose, integrate, and deploy the right solution for your business. Contact us today and streamline the process to find the best fit for you!