The most capable open source AI model with visual abilities yet could see more developers, researchers, and startups build AI agents that carry out useful chores on your computer for you.
Released today by the Allen Institute for AI (Ai2), the Multimodal Open Language Model, or Molmo, can interpret images as well as converse through a chat interface. This means it can make sense of a computer screen, potentially helping an AI agent perform tasks such as browsing the web, navigating through file directories, and drafting documents.
“With this release, many more people can deploy a multimodal model,” says Ali Farhadi, CEO of Ai2, a research organization based in Seattle, Washington, and a computer scientist at the University of Washington. “It should be an enabler for next-generation apps.”
So-called AI agents are being widely touted as the next big thing in AI, with OpenAI, Google, and others racing to develop them. Agents have become a buzzword of late, but the grand vision is for AI to go well beyond chatting to reliably take complex and sophisticated actions on computers when given a command. This capability has yet to materialize at any kind of scale.
Some powerful AI models already have visual abilities, including GPT-4 from OpenAI, Claude from Anthropic, and Gemini from Google DeepMind. These models can be used to power some experimental AI agents, but they are hidden from view and accessible only via a paid application programming interface, or API.
Meta has released a family of AI models called Llama under a license that restricts their commercial use, but it has yet to provide developers with a multimodal version. Meta is expected to announce several new products, perhaps including new Llama AI models, at its Connect event today.
“Having an open source, multimodal model means that any startup or researcher that has an idea can try to do it,” says Ofir Press, a postdoc at Princeton University who works on AI agents.
Press says that because Molmo is open source, developers will be more easily able to fine-tune their agents for specific tasks, such as working with spreadsheets, by providing additional training data. Models like GPT-4 can only be fine-tuned to a limited degree through their APIs, whereas a fully open model can be modified extensively. “When you have an open source model like this then you have many more options,” Press says.
Ai2 is releasing several sizes of Molmo today, including a 70-billion-parameter model and a 1-billion-parameter one that is small enough to run on a mobile device. A model’s parameter count refers to the number of units it contains for storing and manipulating data and roughly corresponds to its capabilities.
Ai2 says Molmo is as capable as considerably larger commercial models despite its relatively small size, because it was carefully trained on high-quality data. The new model is also fully open source in that, unlike Meta’s Llama, there are no restrictions on its use. Ai2 is also releasing the training data used to create the model, providing researchers with more details of its workings.
Releasing powerful models is not without risk. Such models can more easily be altered for malicious ends; we may someday, for example, see the emergence of harmful AI agents designed to automate the hacking of computer systems.
Farhadi of Ai2 argues that the efficiency and portability of Molmo will allow developers to build more powerful software agents that run natively on smartphones and other portable devices. “The billion parameter model is now performing in the level of or in the league of models that are at least 10 times bigger,” he says.
Building useful AI agents may depend on more than just more efficient multimodal models, however. A key challenge is making the models work more reliably. This may well require further breakthroughs in AI’s reasoning abilities, something that OpenAI has sought to tackle with its latest model o1, which demonstrates step-by-step reasoning skills. The next step may well be giving multimodal models such reasoning abilities.
For now, the release of Molmo means that AI agents are closer than ever, and could soon be useful even outside of the giants that dominate the world of AI.