Alibaba launches AI model that can understand images
As the global race for leadership in artificial intelligence heats up, Alibaba on Friday launched a new AI model that it says can understand images and conduct more complex conversations than its previous products.
The Chinese technology giant announced that its two new models, Qwen-VL and Qwen-VL-Chat, will be open source, meaning researchers, academics and companies worldwide can use them to build their own AI applications without having to train systems from scratch, saving time and money.
Alibaba says Qwen-VL can answer open-ended questions about images and generate captions for them.
Qwen-VL-Chat, meanwhile, enables more “complex interactions,” according to Alibaba, such as comparing multiple image inputs and answering multiple rounds of questions. The company says the model can also write stories, create images based on photos supplied by users, and solve mathematical equations.
In one example provided by Alibaba, the model was shown a picture of a sign at a Chinese hospital and, by interpreting the image, was able to answer questions about where particular departments are located.
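Because the models are being open-sourced, a developer could try this kind of image question-answering directly. The snippet below is a minimal sketch, assuming the model is published on Hugging Face under the "Qwen/Qwen-VL-Chat" identifier and uses the custom chat helpers loaded via trust_remote_code; the image file and the question are placeholders for illustration.

# Minimal sketch: asking Qwen-VL-Chat a question about an image.
# Assumptions: the "Qwen/Qwen-VL-Chat" Hugging Face identifier and the
# repository's custom chat interface (from_list_format, model.chat).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-VL-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-VL-Chat", device_map="auto", trust_remote_code=True
).eval()

# Build a mixed image-and-text prompt; "hospital_sign.jpg" is a placeholder file.
query = tokenizer.from_list_format([
    {"image": "hospital_sign.jpg"},
    {"text": "Which floor is the radiology department on?"},
])

# The model answers based on what it reads in the image; passing `history`
# back in on a later call allows multi-round follow-up questions.
response, history = model.chat(tokenizer, query=query, history=None)
print(response)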
So far, much of generative AI has focused on producing text in response to human prompts. Like Qwen-VL-Chat, the latest version of OpenAI’s ChatGPT can also understand images and respond in text.