We are seeing the first practical benchmarks for AI vision:
1) CharXiv, a challenging benchmark of real-life chart reading, shows humans get 80% right. Claude 3.5, the best-performing LLM, gets 60%.
2) Chatbot Arena measures which AI vision answers humans prefer. GPT-4o wins this one.
pic.twitter.com/jeBgObA4kE