AI systems are like three-year-olds who see the world around them and try to understand it. If you give them the right training wheels, they can put together a perfect photo album entirely unaided, says Fei-Fei Li.

That some AI systems can “see” the world is largely thanks to researcher Fei-Fei Li. For more than 15 years, she has immersed herself in the topic, first at the California Institute of Technology and subsequently as director of the Stanford Artificial Intelligence and Stanford Vision Labs. In this capacity, Li was largely responsible for the development of ImageNet, a huge database of images used to help software learn more quickly what a table, an animal or a human looks like. The technology sets the standard for the major photo-sharing services.


Since fall 2016, the Chinese scientist has headed the Google Cloud AI and Machine Learning research group. Its aim is to give users access to artificial intelligence as and when needed, in much the same way as storage and computing capacity are available on demand today.


Li likes to compare a contemporary AI system’s performance with the intellectual curiosity of a three-year-old child who can initially only name simple objects but advances rapidly from there. “If you consider a child’s eyes as a pair of biological cameras, they take one picture about every 200 milliseconds. So by age three, a child will have seen hundreds of millions of pictures of the real world.”


Li and her team applied the same technique when teaching computers not only to develop an understanding of objects and people based on a billion images, but also to take the next step and associate the visual information with words. As a result, the software can accurately describe a picture it has never seen before in a full sentence.


The road to intelligent machine vision is a long and winding one, says Li. “We have prototyped cars that can drive by themselves, but without smart vision, they cannot really tell the difference between a crumpled paper bag on the road, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool.”


Fei-Fei Li’s vision is a symbiotic relationship between man and machine. “Photos and videos are becoming an integral part of global life. First, we teach machines to see. Then, they help us to see better. We will not only use the machines for their intelligence, we will also collaborate with them in ways that we cannot even imagine.”

Seeing is understanding.