So, Claude 3 (Haiku, Sonnet and Opus, Opus being the strongest model right now) by Anthropic was released. Very light preliminary testing by yours truly gives me the impression that Opus is at or slightly above GPT4 level, which also fits the benchmarks. This is hard to quantify objectively, but it seems more "there" than GPT4, for whatever that is worth. By that I mostly mean better reasoning and memory recall. I'd have to see it in action longer, but the differences are small enough that the two are at the very, very least on the same level.
Google might as well give up at this point.
A test I like to do with these is making them write German prose and converse in German. Usually the models don't fully fail at this, but they sound very unnatural and use a weird vocabulary. Mixtral and Mistral came closest to sounding good so far, and those prided themselves on being multilingual models. Opus speaks German in such a natural way that it sounds more natural to me than its English. This is probably because I haven't yet picked out the typical LLM language patterns that it probably also has in German, but I still wanted to note it. I am very impressed.
Also, Anthropic seems to have removed a lot of the guardrails that made series two perform so poorly. This might be Amazon's influence.