GO-1 is built on vision-language models that learn from massive amounts of images and videos, so that the robots can better understand human actions. The algorithms for planning and action help the ...