
Aided by AI language models, Google’s robots are getting smart

Google has begun plugging state-of-the-art language models into its robots, giving them the equivalent of artificial brains

Kevin Roose Published 14.08.23, 10:57 AM


A one-armed robot stood in front of a table. On the table sat three plastic figurines: a lion, a whale and a dinosaur. An engineer gave the robot an instruction: “Pick up the extinct animal.”

The robot whirred, then its arm extended and its claw opened. It grabbed the dinosaur.


Until very recently, this would have been impossible. Robots weren’t able to reliably manipulate objects they hadn’t seen before, and certainly weren’t capable of making the logical leap from “extinct animal” to “plastic dinosaur”.

A quiet revolution is underway in robotics, one that piggybacks on recent advances in so-called large language models. Google has begun plugging state-of-the-art language models into its robots, giving them the equivalent of artificial brains. I got a glimpse of that progress during a private demonstration of Google’s latest robotics model, called RT-2.

“We’ve had to reconsider our entire research programme as a result of this change,” said Vincent Vanhoucke, Google DeepMind’s head of robotics. “A lot of the things that we were working on before have been invalidated.”

Robots still fall short of human-level dexterity and fail at some basic tasks, but Google’s use of AI language models to give robots new skills of reasoning and improvisation represents a promising breakthrough, said Ken Goldberg, a robotics professor at the University of California, Berkeley, US. “What’s very impressive is how it links semantics with robots,” he said.

For years, the way engineers at Google and other companies trained robots to do a mechanical task — flipping a burger, for example — was by programming them with a specific list of instructions. Robots would then practise the task again and again, with engineers tweaking the instructions each time until they got it right.

But training robots this way is slow and labour-intensive. It requires collecting lots of data from real-world tests. And if you wanted to teach a robot to do something new — to flip a pancake instead of a burger, say — you usually had to reprogram it from scratch.

Researchers at Google had an idea. What if, instead of being programmed for specific tasks one by one, robots could use an AI language model — one that had been trained on vast swaths of Internet text — to learn new skills?

“We started playing with these language models around two years ago, and then we realised that they have a lot of knowledge in them,” said Karol Hausman, a Google research scientist.

Google’s first attempt to join language models and physical robots was a research project called PaLM-SayCan. But its usefulness was limited. The robots lacked the ability to interpret images — a crucial skill if you want them to be able to navigate the world.

Google’s new robotics model, RT-2, can do just that. It’s what the company calls a “vision-language-action” model, or an AI system that has the ability not just to see and analyse the world around it, but to tell a robot how to move.

It does so by translating the robot’s movements into a series of numbers — a process called tokenising — and incorporating those tokens into the same training data as the language model. Eventually, just as ChatGPT or Bard learns to guess what words should come next in a poem or an essay, RT-2 can learn to guess how a robot’s arm should move to pick up a ball or throw an empty soda can into the recycling bin.
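The article doesn’t spell out Google’s exact tokenising scheme, but a minimal sketch of the idea, discretising each dimension of an arm movement into bins and mapping those bins onto reserved token IDs in the model’s vocabulary, might look like the Python below. The bin count, token offset and seven-dimension action layout are illustrative assumptions, not details confirmed by Google.

```python
# A minimal sketch of action tokenising, under the assumption that each
# continuous action dimension is discretised into a fixed number of bins
# and the bins are mapped onto a reserved block of token IDs.
import numpy as np

NUM_BINS = 256                 # assumed resolution per action dimension
ACTION_TOKEN_OFFSET = 50_000   # hypothetical start of the reserved action-token range


def tokenise_action(action, low=-1.0, high=1.0):
    """Map a continuous action vector (e.g., arm deltas plus gripper) to token IDs."""
    clipped = np.clip(action, low, high)
    bins = np.round((clipped - low) / (high - low) * (NUM_BINS - 1)).astype(int)
    return (ACTION_TOKEN_OFFSET + bins).tolist()


def detokenise_action(tokens, low=-1.0, high=1.0):
    """Invert the mapping, so tokens the model predicts become motor commands."""
    bins = np.array(tokens) - ACTION_TOKEN_OFFSET
    return low + bins / (NUM_BINS - 1) * (high - low)


# Example: a hypothetical 7-dimensional action (x, y, z, roll, pitch, yaw, gripper).
action = np.array([0.1, -0.3, 0.0, 0.0, 0.2, 0.0, 1.0])
tokens = tokenise_action(action)
print(tokens)                     # token IDs that sit alongside ordinary text tokens
print(detokenise_action(tokens))  # approximately recovers the original action
```

Once actions look like tokens, the model can treat “predict the next arm movement” exactly like “predict the next word”, which is what lets the same training machinery serve both.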

In an hourlong demonstration, which took place in a Google office kitchen littered with objects from a dollar store, my podcast co-host and I saw RT-2 perform a number of impressive tasks. One was successfully following complex instructions such as “move the Volkswagen to the German flag,” which RT-2 did by finding and snagging a model VW Bus and setting it down on a miniature German flag several feet away.

It could also follow instructions in languages other than English, and make abstract connections between related concepts. When I wanted it to pick up a soccer ball, I said “pick up Lionel Messi”. RT-2 got it right on the first try.

“This really opens up using robots in environments where people are,” Vanhoucke said. “In office environments, in home environments, in all the places where there are a lot of physical tasks.”
