Anthropic, an artificial intelligence (AI) and “public benefit” company, launched Claude 2 on July 11, marking another milestone in a year full of seemingly nonstop progress from the burgeoning generative AI sector.
Introducing Claude 2! Our latest model has improved performance in coding, math and reasoning. It can produce longer responses, and is available in a new public-facing beta website at https://t.co/uLbS2JNczH in the US and UK. pic.twitter.com/jSkvbXnqLd
— Anthropic (@AnthropicAI) July 11, 2023
According to a company blog post, Claude 2 shows improvements across nearly every measurable category. Perhaps most noteworthy among the differences between it and its predecessor is how the researchers discuss their work.
There’s no mention of traditional machine learning benchmarking or computational scores against similar models in the blog post announcing Claude 2. Instead, Anthropic compared Claude and Claude 2 head-to-head on numerous exams meant to represent real-world knowledge, skills and problem-solving.
Claude 2 beat its predecessor across the board on knowledge, coding and other exams and, according to Anthropic, even scores well against human averages:
“When compared to college students applying to graduate school, Claude 2 scores above the 90th percentile on the GRE reading and writing exams, and similarly to the median applicant on quantitative reasoning.”
It is worth noting that many experts believe comparisons between human and AI test takers are of limited value, both because human cognitive reasoning works differently and because a large language model’s training data set is likely to contain test information. Essentially, tests designed for humans may not actually “test” an AI’s ability to reason or provide a proper demonstration of actual knowledge or skill.
Along with the launch of Claude 2, Anthropic debuted a beta version of a web-based “Talk to Claude” interface providing general access to the chatbot for users in the United States and the United Kingdom.
Cointelegraph conducted brief testing of the new version and, anecdotally speaking, the improvements were immediately noticeable. Claude 2 responded to Cointelegraph prompts nearly instantly with clear, concise answers.
According to Anthropic, the new model’s prompt limit is 100,000 tokens, or roughly 75,000 words. The site’s user interface indicates that users can upload PDF, TXT, CSV and similar documents for parsing; however, this functionality did not work in Cointelegraph’s limited testing prior to publishing this article.
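For readers who want to sanity-check the token figure, the two numbers quoted above imply a ratio of about 0.75 words per token. The short Python sketch below applies that ratio in both directions. The function names are hypothetical and the ratio is only a rough average for English text, not something taken from Anthropic's tokenizer; actual counts vary with language and content.

```python
# Ratio implied by the figures quoted in the article:
# 75,000 words per 100,000 tokens. This is an assumption for
# illustration, not a property of any specific tokenizer.
WORDS_PER_TOKEN = 75_000 / 100_000  # 0.75


def estimate_words(tokens: int) -> int:
    """Approximate number of words that fit in a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)


def estimate_tokens(words: int) -> int:
    """Approximate number of tokens needed for a given word count."""
    return round(words / WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(estimate_words(100_000))   # words that fit in the stated prompt limit
    print(estimate_tokens(75_000))   # tokens needed for a 75,000-word document
```

By this estimate, a 75,000-word document would roughly fill the stated limit, so a novel-length text would sit right at the edge of what the model could accept in one prompt.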