
Amazon announces multimodal AI models Nova

Xinhua, December 4, 2024


Amazon Web Services (AWS), Amazon's cloud computing division, announced Tuesday a new family of multimodal generative AI models called Nova at its re:Invent conference.

There are four text-focused models in total: Micro, Lite, Pro, and Premier. The first three became available to AWS customers on Tuesday, while Premier will launch in early 2025.

"We've continued to work on our own frontier models," Amazon CEO Andy Jassy said, "and those frontier models have made a tremendous amount of progress over the last four to five months."

The text-focused Nova models, which are optimized for 15 languages, are mainly differentiated by their capabilities and sizes.

Micro can only take in text and output text, and delivers the lowest latency of the bunch -- processing text and generating answers the fastest. Lite can process image, video, and text inputs reasonably quickly. Pro offers the best combination of accuracy, speed, and cost for various tasks. And Premier is the most capable, designed for complex workloads, according to AWS.

Micro has a 128,000-token context window, which can hold up to around 100,000 words. Lite and Pro have 300,000-token context windows, which work out to around 225,000 words, 15,000 lines of computer code, or 30 minutes of video, AWS said.
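
The word counts quoted for these context windows are consistent with a rule of thumb of roughly 0.75 English words per token. A minimal Python sketch of that arithmetic follows; the 0.75 ratio is an assumption inferred from the 300,000-token and 225,000-word figures, not a number stated by AWS, and the function name is hypothetical.

```python
# Assumed ratio, inferred from the article's figures
# (300,000 tokens is described as about 225,000 words).
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Roughly estimate how many English words fit in a context window."""
    return round(tokens * words_per_token)


# Context window sizes as reported in the article.
context_windows = {"Micro": 128_000, "Lite/Pro": 300_000}

for name, tokens in context_windows.items():
    print(f"{name}: {tokens:,} tokens is about {approx_words(tokens):,} words")
```

Under this assumed ratio, Micro's window comes to about 96,000 words, which matches the "around 100,000" figure once rounded. Actual counts depend on the tokenizer and the language of the text.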

In early 2025, certain Nova models' context windows will expand to support over 2 million tokens, AWS said.

"We've optimized these models to work with proprietary systems and APIs, so that you can do multiple orchestrated automatic steps -- agent behavior -- much more easily with these models," Jassy said.

In addition, there's an image-generation model, Nova Canvas, and a video-generation model, Nova Reel. Both have launched on AWS.

Jassy said AWS is also working on a speech-to-speech model for the first quarter of 2025, and an "any-to-any" model for around mid-2025. "You'll be able to input text, images, or video and output text, speech, images, or video," Jassy said of the any-to-any model.
