Machine Achieves Better Results Than Humans In Understanding Visuals For The First Time

Alibaba secured first place on the latest global VQA (Visual Question Answering) Challenge leaderboard, surpassing human performance on the same task. This is the first time a machine has outperformed humans at understanding images to answer text questions: Alibaba's algorithm achieved 81.26% accuracy in answering image-related questions, compared with a human performance of 80.83% (on the test-standard split).

The challenge, held annually since 2015 at CVPR, the world's leading computer vision conference, attracts global participants including Facebook, Microsoft and Stanford University. Each test item presents an image and a related natural-language question, and participants must provide an accurate natural-language answer. This year, the challenge contained more than 250,000 images and 1.1 million questions.

This breakthrough in machine intelligence for answering image-related questions was made possible by innovative algorithm design from Alibaba DAMO Academy, the global research and development initiative of Alibaba Group. By leveraging its proprietary technologies, including diverse visual representations, multimodal pretrained language models, and adaptive cross-modal semantic fusion and alignment, the Alibaba team made significant progress not only in analyzing images and understanding the intent of questions, but also in answering them with sound reasoning and expressing those answers in a human-like conversational style.

VQA technology has already been widely applied across Alibaba's ecosystem. For example, it is built into Alibaba's intelligent chatbot Alime Shop Assistant, which serves tens of thousands of merchants on Alibaba's retail platforms.

“We are proud that we have achieved another significant milestone in machine intelligence, which underscores our continuous efforts in driving the research and development in related AI fields,” said Si Luo, Head of Natural Language Processing (NLP) at Alibaba DAMO Academy. “This is not implying humans will be replaced by robots one day. Rather, we are confident that smarter machines can be used to assist our daily work and life, and hence, people can focus on the creative tasks that they are best at.”

VQA can be applied in a wide range of areas, Si Luo added: for example, in product search on e-commerce sites, in supporting the analysis of medical images for initial disease diagnosis, and in smart driving, where an in-car AI assistant can offer basic analysis of photos captured by the vehicle's camera.

This is not the first time Alibaba's machine-learning models have eclipsed the competition. Alibaba's model also topped the GLUE benchmark leaderboard, widely regarded as the most important benchmark for NLP models. The model significantly outperformed the human baseline, marking a key milestone in the development of robust natural language understanding systems.

In 2019, Alibaba's model exceeded human scores on the Microsoft Machine Reading Comprehension (MS MARCO) dataset, one of the most challenging reading-comprehension tests in artificial intelligence. The model scored 0.54 on the MS MARCO question-answering task, outperforming the human benchmark score of 0.539 provided by Microsoft. In 2018, Alibaba also scored higher than the human benchmark on the Stanford Question Answering Dataset (SQuAD), another of the most popular machine reading-comprehension challenges worldwide.

Alibaba’s model “AliceMind” earned the top spot in the global VQA Challenge 2021
