Amazon Nova’s Competitive Price/Performance, OpenAI o1 Pro’s High Price/Performance, Google’s Game Worlds on Tap, Factual LLMs




Dear friends,

AI Product Management is evolving rapidly. The growth of generative AI and AI-based developer tools has created many opportunities to build AI applications. This is making it possible to build new kinds of things, which in turn is driving shifts in best practices in product management — the discipline of defining what to build to serve users — because what is possible to build has shifted. In this letter, I’ll share some best practices I have observed.

Use concrete examples to specify AI products. Starting with a concrete idea helps teams gain speed. If a product manager (PM) proposes building “a chatbot to answer banking inquiries that relate to user accounts,” this is a vague specification that leaves much to the imagination. For instance, should the chatbot answer questions only about account balances, or also about interest rates, processes for initiating a wire transfer, and so on? But if the PM writes out a number (say, between 10 and 50) of concrete examples of conversations they’d like a chatbot to carry out, the scope of the proposal becomes much clearer. Just as a machine learning algorithm needs training examples to learn from, an AI product development team needs concrete examples of what we want an AI system to do. In other words, the data is your PRD (product requirements document)!
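
One lightweight way to keep such a specification concrete is to write it as data the team can review and extend. Here’s a minimal sketch in Python; the fields and examples are hypothetical, not a standard PRD format:

    # A concrete spec for the banking chatbot: example inputs, desired
    # behavior, and explicit boundary conditions. Illustrative only.
    spec_examples = [
        {
            "user": "What's my checking account balance?",
            "desired": "Report the balance after authenticating the user.",
            "in_scope": True,
        },
        {
            "user": "How do I initiate a wire transfer?",
            "desired": "Explain the steps and link to the transfer form.",
            "in_scope": True,
        },
        {
            "user": "Where will mortgage rates be next year?",
            "desired": "Decline to speculate; cite current published rates.",
            "in_scope": False,  # a boundary condition, recorded explicitly
        },
    ]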

In a similar vein, if someone asks for “a vision system to detect pedestrians outside our store,” it’s hard for a developer to understand the boundary conditions. Is the system expected to work at night? What is the range of permissible camera angles? Is it expected to detect pedestrians who appear in the image even though they’re 100m away? But if the PM collects a handful of pictures and annotates them with the desired output, the meaning of “detect pedestrians” becomes concrete. An engineer can assess whether the specification is technically feasible and, if so, build toward it. Initially, the data might be obtained via a one-off, scrappy process, such as the PM walking around taking pictures and annotating them. Eventually, the data mix will shift to real-world data collected by a system running in production.

Using examples (such as inputs and desired outputs) to specify a product has been helpful for many years, but the explosion of possible AI applications is creating a need for more product managers to learn this practice.

Assess technical feasibility of LLM-based applications by prompting. When a PM scopes out a potential AI application, whether the application can actually be built — that is, its technical feasibility — is a key criterion in deciding what to do next. For many ideas for LLM-based applications, it’s increasingly possible for a PM, who might not be a software engineer, to try prompting — or to write just small amounts of code — to get an initial sense of feasibility.

For example, a PM may envision a new internal tool for routing emails from customers to the right department (such as customer service, sales, etc.). They can prompt an LLM to see if they can get it to select the right department based on an input email, and see if they can achieve high accuracy. If so, this gives engineering a great starting point from which to implement the tool. If not, the PM can invalidate the idea themselves, and perhaps improve the product idea much faster than if they had to rely on an engineer to build a prototype.
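
A feasibility test like this can fit in a few dozen lines. The sketch below assumes the OpenAI Python SDK and an API key in the OPENAI_API_KEY environment variable; the departments, emails, and labels are made up for illustration:

    # Quick feasibility test: can an LLM route customer emails correctly?
    from openai import OpenAI

    client = OpenAI()
    DEPARTMENTS = ["customer service", "sales", "billing"]

    def route_email(email_text: str) -> str:
        """Ask the model to pick exactly one department for an email."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": "Classify the email into exactly one department: "
                            + ", ".join(DEPARTMENTS)
                            + ". Reply with the department name only."},
                {"role": "user", "content": email_text},
            ],
        )
        return response.choices[0].message.content.strip().lower()

    # Score the prompt against a handful of hand-labeled examples.
    test_set = [
        ("My invoice shows a duplicate charge.", "billing"),
        ("Do you offer volume discounts for 50 seats?", "sales"),
        ("The app logs me out every few minutes.", "customer service"),
    ]
    correct = sum(route_email(text) == label for text, label in test_set)
    print(f"Accuracy: {correct}/{len(test_set)}")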

Often, testing feasibility requires a little more than prompting. For example, perhaps the LLM-based email system needs simple RAG capability to help it make decisions. Fortunately, the barrier to writing small amounts of code is now quite low, since AI can help by acting as a coding companion, as I describe in the course “AI Python for Beginners.” This means that PMs can do much more technical feasibility testing, at least at a basic level, than was possible before.
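
For instance, the routing test above could be extended with a bare-bones retrieval step. This sketch assumes the OpenAI SDK for embeddings; the policy snippets are made up, and a real system would use a vector database rather than an in-memory list:

    # Bare-bones retrieval: find the most relevant policy snippet and
    # prepend it to the prompt as context.
    import math
    from openai import OpenAI

    client = OpenAI()
    docs = [
        "Billing handles refunds, duplicate charges, and invoice disputes.",
        "Sales handles pricing questions, quotes, and volume discounts.",
        "Customer service handles login, bugs, and general product help.",
    ]

    def embed(text: str) -> list[float]:
        result = client.embeddings.create(
            model="text-embedding-3-small", input=text
        )
        return result.data[0].embedding

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    doc_vectors = [embed(d) for d in docs]

    def retrieve(query: str) -> str:
        """Return the document most similar to the query."""
        q = embed(query)
        return max(zip(docs, doc_vectors), key=lambda dv: cosine(q, dv[1]))[0]

    query = "My invoice shows a duplicate charge."
    prompt = f"Context: {retrieve(query)}\n\nRoute this email: {query}"
    print(prompt)  # feed this to the LLM as in the previous sketch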

Prototype and test without engineers. User feedback to initial prototypes is also instrumental in shaping products. Fortunately, barriers to building prototypes rapidly are falling, and PMs themselves can move prototypes forward without needing software developers.

In addition to using LLMs to help write code for prototyping, tools like Replit, Vercel’s V0, Bolt, and Anthropic’s Artifacts (I’m a fan of all of these!) are making it easier for people without a coding background to build and experiment with simple prototypes. These tools are increasingly accessible to non-technical users, though I find that those who understand basic coding are able to use them much more effectively, so it’s still important to learn basic coding. (Interestingly, highly technical, skilled developers use them too!) Many members of my teams routinely use such tools to prototype, get user feedback, and iterate quickly.

AI is enabling a lot of new applications to be built, creating massive growth in demand for AI product managers who know how to scope out and help drive progress in building these products. AI product management existed before the rise of generative AI, but the increasing ease of building applications is creating greater demand for AI applications, and thus a lot of PMs are learning AI and these emerging best practices for building AI products. I find this discipline fascinating, and will keep on sharing best practices as they grow and evolve.

Keep learning!

Andrew


A MESSAGE FROM DEEPLEARNING.AI

Write and code more effectively with OpenAI Canvas, a user-friendly workspace for collaborating with AI. In this free course, explore use cases like building game apps and designing SQL databases from screenshots, and gain insights into how GPT-4o powers Canvas’ features. Join for free

Amazon introduced a range of models that face competitors head-on.

What’s new: The Nova line from Amazon includes three vision-language models (Nova Premier, Nova Pro, and Nova Lite), one language model (Nova Micro), an image generator (Nova Canvas), and a video generator (Nova Reel). All but Nova Premier are available on Amazon’s Bedrock platform; Nova Premier, the most capable, is expected in early 2025. In addition, Amazon plans to release a speech-to-speech model in early 2025 and a multimodal model that processes text, images, video, and audio by mid-year. (Disclosure: Andrew Ng serves on Amazon’s board of directors.)

How it works: Nova models deliver competitive performance at relatively low prices. Amazon hasn’t disclosed parameter counts or details about how the models were built, except to say that Nova Pro, Lite, and Micro were trained on a combination of proprietary, licensed, open, and open-source text, images, and video in over 200 languages.

  • Nova Pro’s performance is roughly comparable to that of Anthropic Claude 3.5 Sonnet, OpenAI GPT-4o, and Google Gemini Pro. It has a 300,000-token input context window, enabling it to process relatively large vision-language inputs. Nova Pro outperforms its primary competitors in tests of following complex instructions (IFEval), summarizing long texts (SQuALITY), understanding videos (LVBench), and reading and acting on websites (MM-Mind2Web). It processes 95 tokens per second. At $0.80/$3.20 per million tokens of input/output, it’s significantly less expensive than GPT-4o ($2.50/$10) and Claude 3.5 Sonnet ($3/$15) but slower than GPT-4o (115 tokens per second).
  • Nova Lite compares favorably with Anthropic Claude 3.5 Haiku, Google Gemini 1.5 Flash, and OpenAI GPT-4o mini. Optimized for processing speed and efficiency, it too has a 300,000-token input context window. Nova Lite bests Claude 3.5 Sonnet and GPT-4o on VisualWebBench, which tests visual understanding of web pages. It also beats Claude 3.5 Haiku, GPT-4o mini, and Gemini 1.5 Flash on multimodal agentic tasks including MM-Mind2Web and the Berkeley Function-Calling Leaderboard. It processes 157 tokens per second and costs $0.06/$0.24 per million tokens of input/output, making it less expensive than GPT-4o mini ($0.15/$0.60), Claude 3.5 Haiku ($0.80/$4), or Gemini 1.5 Flash ($0.075/$0.30), but slower than Gemini 1.5 Flash (189 tokens per second).
  • Nova Micro is a text-only model with a 128,000-token context window. It surpasses Llama 3.1 8B and Gemini Flash 8B on all 12 tests reported by Amazon, including generating code (HumanEval) and reading financial documents (FinQA). It also beats the smaller Claude, Gemini, and Llama models on retrieval-augmented generation tasks (CRAG). It processes 210 tokens per second (the lowest latency among Nova models) and costs $0.035/$0.14 per million input/output tokens. That’s cheaper than Gemini Flash 8B ($0.0375/$0.15) and Llama 3.1 8B ($0.10/$0.10), but slower than Gemini Flash 8B (284.2 tokens per second).
  • Nova Canvas accepts English-language text prompts up to 1,024 characters and generates images up to 4.2 megapixels in any aspect ratio. It also performs inpainting, outpainting, and background removal. It excels on ImageReward, a measure of human preference for generated images, surpassing OpenAI DALL·E 3 and Stability AI Stable Diffusion 3.5. Nova Canvas costs between $0.04 per image up to 1,024×1,024 pixels and $0.08 per image up to 2,048×2,048 pixels. Prices are hard to compare because many competitors charge by the month or year, but this is less expensive and higher-resolution than DALL·E 3 ($0.04 to $0.12 per image).
  • Nova Reel accepts English-language prompts up to 512 characters and image prompts up to 720×1,280 pixels. It generates video clips of 720×1,280 pixels up to six seconds long. It shows greater ability to maintain consistent imagery from frame to frame, winning 67 percent of head-to-head comparisons with the next highest-scoring model, Runway Gen-3 Alpha. Nova Reel costs $0.08 per second of output, which is less expensive than Runway Gen-3 Alpha ($0.096 per second) and Kling 1.5 ($0.12 per second) on their standard monthly plans.

Behind the news: The company launched Bedrock in April 2023 with Stability AI’s Stable Diffusion for image generation, Anthropic’s Claude and AI21’s Jurassic-2 for text generation, and its own Titan models for text generation and embeddings. Not long afterward, it added language models from Cohere as well as services for agentic applications and medical applications. It plans to continue to offer models from other companies (including Anthropic), providing a range of choices.

Why it matters: While other AI giants raced to outdo one another in models for text and multimodal processing, Amazon was relatively quiet. With Nova, it has staked out a strong position in those areas, as well as the startup-dominated domains of image and video generation. Moreover, it’s fortifying its cloud AI offerings with competitive performance, pricing, and speed. Nova’s pricing continues the rapid fall in AI prices over the last year. Falling per-token prices help make AI agents and applications that process large inputs more practical. For example, Simon Willison, developer of the Django Python framework for web applications, found that Nova Lite generated descriptions for his photo library (tens of thousands of images) for less than $10.
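
Willison’s figure is easy to sanity-check against Nova Lite’s posted prices. In this back-of-the-envelope sketch, the image count and per-image token counts are assumptions for illustration only:

    # Rough cost estimate for captioning a photo library with Nova Lite.
    INPUT_PRICE = 0.06 / 1_000_000   # dollars per input token (posted price)
    OUTPUT_PRICE = 0.24 / 1_000_000  # dollars per output token (posted price)

    images = 40_000        # "tens of thousands of images" (assumed)
    input_tokens = 1_300   # image plus instruction prompt (assumed)
    output_tokens = 100    # short description per image (assumed)

    cost = images * (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE)
    print(f"Estimated total: ${cost:.2f}")  # about $4, consistent with < $10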

We’re thinking: The Nova suite is available via APIs only, without a web-based user interface. This accords with Amazon Web Services’ focus on developers. For consumers, Amazon offers the Rufus shopping bot.


OpenAI launched not only its highly anticipated o1 model but also an operating mode that enables the model to deliver higher performance — at a hefty price.

What’s new: Kicking off a 12-day holiday blitz, OpenAI launched o1 (previously available in preview and mini versions) and introduced o1 pro mode, which processes more tokens at inference to generate more accurate output. Both options accept text and image inputs and generate text outputs. They’re available exclusively through a new ChatGPT Pro subscription for $200 monthly. API access is not yet available.

How it works: According to an updated system card, o1 models were trained on a mix of open, licensed, and proprietary text, code, and images, with a focus on technical, academic, and structured datasets. They respond to prompts by breaking them down into intermediate steps, each of which consumes a number of hidden “reasoning tokens.” The models don’t reveal these steps, but ChatGPT presents a natural-language summary of the reasoning process. The new o1 and o1 pro mode perform better than o1-preview and o1-mini, but their additional reasoning requires more processing, which translates into higher costs and slower responses.

  • o1 consistently outperforms o1-preview in one-shot benchmarks that measure accuracy in advanced math problems (AIME 2024), coding contests (Codeforces), and graduate-level science questions (GPQA Diamond).
  • o1 pro mode performs only slightly better than o1 on one-shot tests, but its higher accuracy is more evident when it’s asked to respond to the same input four times in a row. For example, given a problem from the American Invitational Mathematics Examination, o1 solves it correctly 78 percent of the time, o1 pro mode 86 percent of the time. Given the same problem four times, o1 solves it correctly in all four tries 67 percent of the time, while o1 pro mode solves it correctly in all four tries 80 percent of the time (see the quick check after this list).
  • o1 and o1 pro mode are less prone to generating false or irrelevant information than o1-preview, as measured by OpenAI’s SimpleQA, which tests the ability to recall facts about science, geography, history, and the like, and PersonQA, which tests the ability to recall facts about people.
  • ChatGPT Pro provides chatbot access to o1, o1 pro mode, and other OpenAI models. Subscribers get unlimited use of o1. OpenAI has not clarified whether o1 pro mode is subject to usage limits or other constraints.
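
Those four-in-a-row numbers measure consistency, not just accuracy. If tries were independent, the four-in-a-row rate would be the single-try rate raised to the fourth power; the reported rates are far higher, which means each model tends to be consistently right (or wrong) on a given problem:

    # Compare reported four-in-a-row rates with the independence assumption.
    for name, p, observed in [("o1", 0.78, 0.67), ("o1 pro mode", 0.86, 0.80)]:
        print(f"{name}: independent p**4 = {p**4:.2f}, reported = {observed:.2f}")
    # o1:          independent 0.37 vs. reported 0.67
    # o1 pro mode: independent 0.55 vs. reported 0.80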

Behind the news: Since September, when OpenAI introduced o1-preview and o1-mini, other model providers have implemented similar reasoning capabilities. DeepSeek’s R1 displays reasoning steps that o1 models keep hidden. Alibaba’s QwQ 32B excels at visual reasoning but is slower and has a smaller context window. Amazon’s Nova Premier, which is billed as a model for “complex reasoning tasks,” is expected in early 2025, but Amazon has not yet described its performance, architecture, or other details.

Why it matters: o1 and o1 pro mode highlight a dramatic shift in model development and pricing. Giving models more processing power at inference enables them to provide more accurate output, and it’s a key part of agentic workflows. It also continues to boost performance even as scaling laws that predict better performance with more training data and compute may be reaching their limits. However, it also raises OpenAI’s costs, and at $200 a month, the price of access to o1 and o1 pro mode is steep. It’s a premium option for developers who need exceptional accuracy or extensive reasoning.

We’re thinking: Discovering scaling laws for using more processing at inference, or test-time compute, is an unsolved problem. Although OpenAI hasn’t disclosed the algorithm behind o1 pro mode, recent work at Google allocated tokens dynamically at inference based on a prompt’s difficulty. This approach boosted compute efficiency by four times and enabled a model that had shown “nontrivial success rates” to outperform one that was 14 times larger.
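
Neither company has published production code for this, but the general idea can be illustrated with a toy sketch: sample more candidate answers for prompts estimated to be harder, then take a majority vote. The difficulty heuristic and the model stub below are entirely hypothetical, not OpenAI’s or Google’s actual method:

    # Toy illustration of difficulty-based test-time compute.
    import random
    from collections import Counter

    def generate(prompt: str) -> str:
        """Stub for an LLM call; replace with a real API call."""
        return random.choice(["A", "A", "B"])

    def estimate_difficulty(prompt: str) -> float:
        """Hypothetical heuristic: treat longer prompts as harder (0 to 1)."""
        return min(1.0, len(prompt) / 500)

    def answer(prompt: str) -> str:
        n_samples = 1 + int(estimate_difficulty(prompt) * 15)  # 1 to 16 samples
        votes = Counter(generate(prompt) for _ in range(n_samples))
        return votes.most_common(1)[0][0]  # majority-vote answer

    print(answer("Short, easy question?"))
    print(answer("A long, intricate, multi-step problem... " * 10))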


A new model improves on recent progress in generating interactive virtual worlds from still images.

What’s new: Jack Parker-Holder and colleagues at Google introduced Genie 2, which generates three-dimensional video game worlds that respond to keyboard inputs in real time. The model’s output remains consistent (that is, elements don’t morph or fade) for up to a minute, and it includes first-person shooters, walking simulators, and driving games from viewpoints that include first person, third person, and isometric. Genie 2 follows up on Genie, which generates two-dimensional games.

How it works: Genie 2 is a latent diffusion model that generates video frames, made up of an encoder, a transformer, and a decoder. The developers didn’t reveal how they built it or how they improved on earlier efforts.

  • Given video frames, the encoder embeds them. Using those embeddings and keyboard input, the transformer generates the embedding of the next video frame. The decoder takes the new embedding and generates an image.
  • At inference, given an image as the starting frame, the encoder embeds it. Given the embedding and keyboard input, the transformer generates the embedding of the next frame, which the decoder uses to generate an image. After the initial frame, the transformer uses embeddings it generated previously plus keyboard input to generate the next embedding.
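
Google hasn’t released implementation details, but the loop the bullets above describe can be sketched abstractly. The components below are placeholders, not Genie 2’s actual architecture:

    # Abstract sketch of the frame-by-frame inference loop described above.
    import numpy as np

    class StubModel:
        """Placeholder for the encoder, transformer, or decoder."""
        def __call__(self, *inputs):
            return np.zeros(64)  # stand-in for an embedding or image

    encoder, dynamics, decoder = StubModel(), StubModel(), StubModel()

    def play(start_image, key_inputs):
        """Generate one frame per keyboard input, feeding embeddings back in."""
        history = [encoder(start_image)]  # embed the starting frame
        frames = []
        for keys in key_inputs:
            next_embedding = dynamics(history, keys)  # predict next embedding
            frames.append(decoder(next_embedding))    # render it as an image
            history.append(next_embedding)            # autoregressive feedback
        return frames

    frames = play(np.zeros((720, 1280, 3)), key_inputs=["W", "W", "A"])
    print(len(frames), "frames generated")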

Behind the news: Genie 2 arrives on the heels of Oasis, which generates a Minecraft-like game in real time. Unlike Oasis, Genie 2’s worlds are more consistent and not limited to one type of game. It also comes at the same time as another videogame generator, World Labs. However, where Genie 2 generates the next frame given previous frames and keyboard input (acting, in terms of game development, as both graphics and physics engines), World Labs generates a 3D mesh of a game world from a single 2D image. This leaves the implementation of physics, graphics rendering, the player’s character, and other game mechanics to external software.

Why it matters: Genie 2 extends models that imagine 3D scenes based on 2D images to encompass interactive worlds, a capability that could prove valuable in design, gaming, virtual reality, and other 3D applications. It generates imagery that, the authors suggest, could serve as training data for agents to learn how to navigate and respond to commands in 3D environments.

We’re thinking: Generating gameplay directly in the manner of Genie 2 is a fast approach to developing a game, but the current technology comes with caveats. Developers can’t yet control a game’s physics or mechanics, and they must manage any flaws in the model (such as a tendency to generate inconsistent worlds). In contrast, generating a 3D mesh, as World Labs does, is a more cumbersome approach, but it gives developers more control.


Large language models that remember more hallucinate less.

What’s new: Johnny Li and colleagues at Lamini introduced Mixture of Memory Experts (MoME), a method that enables large language models (LLMs) to memorize many facts with relatively modest computational requirements. (Disclosure: Andrew Ng invested in Lamini.)

Key insight: The key to getting factual answers from LLMs is to keep training until the model picks the correct answer every time. In technical terms, train past the point where tokens relevant to the answer have a similar probability distribution, and continue until a single token has 100 percent probability. But this amount of training takes a lot of computation, and, since the model may overfit the training set, it also may degrade performance on the test set. Fine-tuning is one solution, and fine-tuning a LoRA adapter to memorize facts reduces the computational burden. But a single LoRA adapter isn’t enough to store all of the knowledge in a large dataset. Training multiple adapters that are selected by cross-attention enables the LLM to memorize a wide variety of facts.
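
In terms of the usual next-token training objective, memorizing an answer y* to a question x means driving the cross-entropy loss on the answer’s tokens to zero, which happens exactly when the model assigns each correct token probability 1:

    \mathcal{L}(x, y^{*}) = -\sum_{t} \log p_{\theta}\left(y^{*}_{t} \mid x, y^{*}_{<t}\right) \to 0
    \quad\Longleftrightarrow\quad
    p_{\theta}\left(y^{*}_{t} \mid x, y^{*}_{<t}\right) \to 1 \ \text{for all } t.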

How it works: The authors extended a pretrained Llama-3-8B with a large number (on the order of 1 million) of LoRA adapters and a cross-attention layer. They froze Llama-3-8B and trained the LoRA adapters to predict the next token in a custom dataset of over 1 million questions and answers.

  • For any given question, the model learned to select 32 LoRA adapters, each of which was associated with an embedding. The model selected adapters by performing cross-attention between an embedding of the input query and all adapter embeddings.
  • The authors trained the LoRA adapters until they memorized all the answers, as measured by the loss function (100 epochs).
  • At inference, given a query, the model used cross-attention to select a subset of LoRA adapters and responded accordingly.
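
The paper’s code isn’t reproduced here, but the selection step can be sketched schematically. The dimensions below are made up, and the scale is reduced from the paper’s roughly 1 million adapters:

    # Schematic sketch of selecting LoRA adapters via cross-attention scores.
    import numpy as np

    rng = np.random.default_rng(0)
    n_adapters, dim, k = 100_000, 64, 32  # paper scale: ~1 million adapters

    # Each adapter has a learned embedding (random here for illustration).
    adapter_embeddings = rng.standard_normal((n_adapters, dim))

    def select_adapters(query_embedding: np.ndarray) -> np.ndarray:
        """Return indices of the k adapters with the highest attention scores."""
        scores = adapter_embeddings @ query_embedding  # dot-product logits
        return np.argpartition(scores, -k)[-k:]        # top-k adapter indices

    query = rng.standard_normal(dim)
    active = select_adapters(query)
    print(active.shape)  # (32,) -- adapters applied to the frozen base model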

Results: The authors tested their LoRA-enhanced model’s ability to answer questions about a database via SQL queries. The model, which was equipped for retrieval-augmented generation (RAG), achieved 94.7 percent accuracy. An unnamed model with RAG achieved 50 percent accuracy.

Yes, but: It stands to reason that the authors’ approach saves processing, but it’s unclear how much. The authors didn’t mention the cost of fine-tuning Llama-3-8B in the usual way on their training dataset for the same number of epochs.

Why it matters: The authors argue that eliminating hallucinations is possible with standard training; it’s just computationally very expensive (not to mention the risk of overfitting). An architecture designed to store and retrieve facts, via LoRA adapters in this case, makes the process more feasible.

We’re thinking: While some researchers want large language models to memorize facts, others want them to avoid memorizing their training data. These aims address very different problems. Preventing LLMs from memorizing training data would make them less likely to regurgitate it verbatim and thus violate copyrights. On the other hand, this work memorizes facts so the model can deliver consistent, truthful responses that might be stated in a variety of ways.

Source link

