AI Pioneer Fei-Fei Li Has a Vision for Computer Vision

Stanford University professor Fei-Fei Li has already earned her place in the history of AI. She played an important role in the deep learning revolution by laboring for years to create the ImageNet dataset and competition, which challenged AI systems to recognize objects and animals across 1,000 categories. In 2012, a neural network called AlexNet sent shockwaves through the AI research community when it resoundingly outperformed all other types of models and won the ImageNet contest. From there, neural networks took off, powered by the immense amounts of free training data now available on the Internet and GPUs that deliver unprecedented compute power.

In the 13 years since ImageNet, computer vision researchers mastered object recognition and moved on to image and video generation. Li cofounded Stanford's Institute for Human-Centered AI (HAI) and continued to push the boundaries of computer vision. Just this year she launched a startup, World Labs, which generates 3D scenes that users can explore. World Labs is dedicated to giving AI "spatial intelligence," or the ability to generate, reason within, and interact with 3D worlds. Li delivered a keynote yesterday at NeurIPS, the massive AI conference, about her vision for machine vision, and she gave IEEE Spectrum an exclusive interview before her talk.

Why did you title your talk "Ascending the Ladder of Visual Intelligence"?

Fei-Fei Li: I think it's intuitive that intelligence has different levels of complexity and sophistication. In the talk, I want to deliver the sense that over the past decades, especially the past 10-plus years of the deep learning revolution, the things we have learned to do with visual intelligence are just breathtaking. We are becoming more and more competent with the technology. And I was also inspired by Judea Pearl's "ladder of causality" [in his 2018 book The Book of Why].

The talk also has a subtitle, "From Seeing to Doing." This is something that people don't appreciate enough: that seeing is closely coupled with interaction and doing things, both for animals as well as for AI agents. And this is a departure from language. Language is fundamentally a communication tool that's used to get ideas across. In my mind, these are very complementary, but equally deep, modalities of intelligence.

Do you mean that we instinctively react to certain sights?

Li: I'm not just talking about instinct. If you look at the evolution of perception and the evolution of animal intelligence, it's deeply, deeply intertwined. Every time we're able to get more information from the environment, the evolutionary force pushes capability and intelligence forward. If you don't sense the environment, your relationship with the world is very passive; whether you eat or are eaten is a very passive act. But as soon as you are able to get cues from the environment through perception, the evolutionary pressure really heightens, and that drives intelligence forward.

Do you think that's how we're creating deeper and deeper machine intelligence? By allowing machines to sense more of the environment?

Li: I don't know if "deep" is the adjective I would use. I think we're creating more capabilities. I think it's becoming more complex, more competent. I think it's absolutely true that tackling the problem of spatial intelligence is a fundamental and critical step toward full-scale intelligence.

I've seen the World Labs demos. Why do you want to study spatial intelligence and build these 3D worlds?

Li: I think spatial intelligence is where visual intelligence is going. If we are serious about cracking the problem of vision and also connecting it to doing, there's an extremely simple, laid-out-in-the-daylight fact: The world is 3D. We don't live in a flat world. Our physical agents, whether they're robots or devices, will live in the 3D world. Even the virtual world is becoming more and more 3D. If you talk to artists, game developers, designers, architects, doctors, even when they are working in a virtual world, much of this is 3D. If you just take a moment and acknowledge this basic but deep fact, there is no question that cracking the problem of 3D intelligence is fundamental.

I'm curious about how the scenes from World Labs maintain object permanence and obey the laws of physics. That feels like an exciting step forward, since video-generation tools like Sora still struggle with such things.

Li: Once you respect the 3D-ness of the world, a lot of this is natural. For example, in one of the videos that we posted on social media, basketballs are dropped into a scene. Because it's 3D, it allows you to have that kind of capability. If the scene is just 2D-generated pixels, the basketball will go nowhere.

Or, like in Sora, it might go somewhere but then disappear. What are the biggest technical challenges that you're dealing with as you try to push that technology forward?

Li: No one has solved this problem, right? It's very, very hard. You can see [in a World Labs demo video] that we have taken a Van Gogh painting and generated the entire scene around it in a consistent style: the artistic style, the lighting, even what kind of buildings that neighborhood would have. If you turn around and it becomes skyscrapers, it would be completely unconvincing, right? And it has to be 3D. You have to navigate into it. So it's not just pixels.

Can you say anything about the data you've used to train it?

Li: A lot.

Do you have technical challenges regarding the compute load?

Li: It is a lot of compute. It's the kind of compute that the public sector cannot afford. This is part of the reason I feel excited to take this sabbatical, to do this in the private sector way. And it's also part of the reason I have been advocating for public sector compute access, because my own experience underscores the importance of innovation with an ample amount of resourcing.

It would be nice to empower the public sector, since it's usually more motivated by gaining knowledge for its own sake and knowledge for the benefit of humanity.

Li: Knowledge discovery needs to be supported by resources, right? In the times of Galileo, it was the best telescope that let the astronomers observe new celestial bodies. It was Hooke who realized that magnifying glasses could become microscopes, and discovered cells. Every time there is new technological tooling, it supports knowledge-seeking. And now, in the age of AI, technological tooling involves compute and data. We have to acknowledge that for the public sector.

What would you like to happen at the federal level to provide resources?

Li: This has been the work of Stanford HAI for the past five years. We have been working with Congress, the Senate, the White House, industry, and other universities to create NAIRR, the National AI Research Resource.

Assuming that we can get AI systems to really understand the 3D world, what does that give us?

Li: It will unlock a lot of creativity and productivity for people. I would love to design my house in a much more efficient way. I know that lots of medical applications involve understanding a very particular 3D world, which is the human body. We always talk about a future where humans will create robots to help us, but robots navigate in a 3D world, and they require spatial intelligence as part of their brain. We also talk about virtual worlds that will allow people to visit places or learn concepts or be entertained. And those use 3D technology, especially the hybrids, what we call AR [augmented reality]. I would love to walk through a national park with a pair of glasses that give me information about the trees, the path, the clouds. I would also love to learn different skills through the help of spatial intelligence.

What kind of skills?

Li: My lame example is if I have a flat tire on the highway, what do I do? Right now, I find a "how to change a tire" video. But if I could put on glasses and see what's going on with my car and then be guided through that process, that would be cool. But that's a lame example. You can think about cooking, you can think about sculpting. Fun things.

How far do you think we're going to get with this in our lifetime?

Li: Oh, I think it's going to happen in our lifetime because the pace of technology progress is really fast. You have seen what the past 10 years have brought. It's definitely an indication of what's coming next.
