Exploring the Frontier of Embodied AI: Virtual and Real Worlds Collide

A highly intelligent robot as imagined by Midjourney

There has been a significant amount of recent news regarding embodied AI. In artificial intelligence, an embodied agent is an intelligent entity that interacts with its environment through a physical body within that environment. As Wikipedia explains, agents represented graphically with a body, such as a digital avatar or a cartoon animal, are also called embodied agents, even though they possess a virtual rather than a physical embodiment.

According to the Financial Times, Meta Platforms is currently developing digital AI agents with distinct personalities. These agents can serve as coaches, facilitators, or conversational partners, while also collecting more personalized user data for Meta. Mark Zuckerberg is contemplating the incorporation of such AI agents into the metaverse, envisioning them as highly intelligent non-player characters.

There is also news about physically embodied AI agents, in the form of intelligent robots that can generalize knowledge and skills, transferring abilities from one context to another. Google DeepMind presented RT-2, a model which “translates vision and language into action”. The model also incorporates data from other robots that worked in a company kitchen; those data were “tokenized”, meaning they were transformed into numerical form and integrated into the model as a kind of language. This step is crucial, as it enables the robot to move around efficiently, extending beyond mere communication and understanding of instructions.
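To make the idea of “tokenizing” robot data concrete, here is a minimal sketch of how continuous robot actions could be discretized into integer tokens that a language-style model can consume. The bin count, value ranges, and example values are illustrative assumptions, not DeepMind’s actual RT-2 implementation:

```python
# Illustrative sketch: turning continuous robot actions into discrete
# "tokens" so they can be treated like words by a language-style model.
# The bin count and value range are made-up assumptions for illustration.

def tokenize_action(values, low=-1.0, high=1.0, n_bins=256):
    """Map each continuous value in [low, high] to an integer token ID."""
    tokens = []
    for v in values:
        v = min(max(v, low), high)  # clamp to the valid range
        tokens.append(int((v - low) / (high - low) * (n_bins - 1)))
    return tokens

def detokenize_action(tokens, low=-1.0, high=1.0, n_bins=256):
    """Invert the mapping: token IDs back to approximate continuous values."""
    return [low + t / (n_bins - 1) * (high - low) for t in tokens]

# A hypothetical action vector (e.g. arm movement deltas from a demonstration):
action = [0.5, -0.25, 1.0]
tokens = tokenize_action(action)
recovered = detokenize_action(tokens)
```

The point of the round trip is that once actions are integers, they can sit in the same vocabulary as text tokens, which is roughly what “integrated into the model as a kind of language” means.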

In conclusion, we are on the verge of encountering intelligent creatures in various forms—virtual space, mixed reality (where digital avatars can coexist), and even in physical embodiments within fully physical environments. While we have already encountered non-human intelligent creatures known as ‘animals,’ we are now venturing into the realm of building our very own intelligent entities from scratch. Perhaps it is time to expand our definition of what an “avatar” can be.

The metaverse as spatial computing

Image generated by Midjourney. Woman playing chess in a mixed reality setting.

Watching the Apple presentation of their headset, Vision Pro, made me ponder different perspectives on our digital future. Is it a metaverse in the sense of an interconnected universe of 3D environments where we perform various activities as avatars? This metaverse would be persistent, possess an economy, and allow the co-presence of millions of avatars. Let’s consider this the more traditional conception of the metaverse.

Alternatively, is it simply our physical environment where we interact with both physical and virtual objects, observe virtual screens, and encounter photorealistic faces? Would it involve interaction with other humans in a sort of hologram form, and with AI agents embodied as avatars or physical entities? This perspective more closely aligns with Apple’s view of the metaverse, which they would prefer to call ‘spatial computing’.

While Apple is keen to differentiate itself from Meta Platforms, their visions of our digital futures may not be that dissimilar. Both are developing glasses that appear ordinary but could enable us to spend extensive periods in a mixed reality, a world where digital and physical elements blend seamlessly. Their approaches, however, differ. Apple enjoys portraying the “real world” enriched with virtual elements, while Meta initially focused on a virtual environment replete with games. Nonetheless, even the upcoming Quest 3 is preparing us for a future more akin to mixed reality. On the other hand, the Vision Pro also allows for fully immersive virtual experiences. Headsets will evolve into glasses that support both mixed reality and fully immersive experiences. But chances are that most of the time we’ll be in a mixed reality.

Fans of NFTs and web3 need not despair; their digital art will look striking in mixed reality. Those more inclined to experiment with avatars and identities will still have the freedom to do so. After a number of years, for those who will become native citizens of this mixed world, terms like “metaverse” and even “mixed reality” may sound dated. It will simply be the world, their world.

Read also this post on New World Notes: Vision Pro Developer Avi Bar-Zeev On Apple’s Spatial Computing / Personas Model Vs. The Metaverse / Avatars Vision

AI, avatars and the end of humanity

More than 350 AI experts and industry leaders have issued a warning, expressing concerns that AI has the potential to cause the extinction of humanity. This is the message as published by the Center for AI Safety (CAIS):

“Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”

Among the signatories of this message is Sam Altman, the CEO of OpenAI, the company behind ChatGPT. Other prominent figures from OpenAI, as well as experts from partner company Microsoft, have also endorsed the message. Notably, competitors like Google and Anthropic, as well as renowned academic experts, have joined in as well.

The message doesn’t provide specific examples of how AI could lead to our extinction. However, CAIS’s website highlights various dangers associated with AI, ranging from immediate concerns like misinformation and manipulation to potential future risks, such as rogue systems that could mislead humans and pursue their own objectives, ultimately posing a significant threat to humanity.

Believers in artificial general intelligence (AGI), including Altman, envision a scenario where superintelligences exponentially and autonomously surpass human capabilities, potentially leaving us behind.

However, not everyone shares this view. Chief AI scientist Yann LeCun of Meta argues on Twitter, “Super-human AI is nowhere near the top of the list of existential risks, largely because it doesn’t exist yet. Until we have a basic design for even dog-level AI (let alone human level), discussing how to make it safe is premature.”

Expert Andrew Ng highlights various existential risks, such as the next pandemic, climate change, and the threat of massive depopulation, even mentioning the possibility of another asteroid impact. He believes that AI will play a crucial role in addressing these challenges and suggests that accelerating AI development, rather than slowing it down, is essential for the survival and thriving of humanity over the next thousand years.


On the other hand, waiting for AI to make significant advancements before taking action might be too risky. Considering potential future scenarios doesn’t necessarily imply slowing down AI development.

Yuval Harari, an Israeli historian, argues that AI and AGI have the ability to “hack” humanity’s operating system, which is natural language. As humans engage in emotional relationships and discussions with AI agents, these systems gain insights into our behavior and become more proficient in language and argumentation.

In a previous post, I discussed embodied artificial intelligence in the metaverse, leading me to believe that the intimacy we feel when conversing with AI chatbots will only intensify. Harari asserts that AI doesn’t require sentience or a Terminator-like physical form to become highly dangerous and manipulative. While this may hold true, having an avatar representation, whether it’s 2D, 3D, or physical, could make AI even more influential and raise ethical concerns.

Embodied AI: The Next Wave of Artificial Intelligence and the Metaverse

A woman seems to pass a screen with a humanoid being
Who is the AI agent and who is the human? Image created on Playground AI

Nvidia’s Jensen Huang delivered an inspiring video message at the recent ITF World conference in Antwerp, Belgium. He explained that the first wave of AI focused on learning perception, while the second wave revolves around understanding and generating information. According to Huang, this new wave is revolutionary due to its ease of use and incredible capabilities. More than a thousand startups are currently inventing new applications for generative AI.

Huang envisions the next wave of AI as embodied AI, referring to intelligent systems that can understand, reason about, and interact with the physical world. He provided examples such as robotics, autonomous vehicles, and even chatbots that are smarter because they possess an understanding of the physical world.


This concept led me to contemplate avatars. In Sanskrit, the term “avatar” literally means “descent.” According to Wikipedia, it signifies the material appearance or incarnation of a powerful deity, goddess, or spirit on Earth. In a virtual environment, an avatar represents a human being. However, artificial intelligence can also manifest as a digital avatar in a virtual world or as a physical avatar in the real world. As an embodied AI, it can learn by interacting in the physical world and acquire new meaning for us. Rather than being a relatively limited and specialized robot, it would possess a more generalized intelligence. We catch glimpses of this potential when we interact with our new generative AI helpers.

When discussing the “physical world” in this context, we are actually referring to a physical world enhanced with digital objects. It will be a mixed reality world where digital objects interact with physical ones, and we can manipulate them. These digital objects can persist alongside physical structures, rather than existing only while we look at them or use a specific device.


On the other hand, avatars of fellow human beings in virtual worlds will not necessarily be cartoonish. They could very well resemble highly realistic holograms in virtual environments, either fully immersing us or allowing us to maintain a connection with our physical surroundings. From these virtual worlds, we will be able to interact with and manipulate the “physical” reality, potentially with the assistance of embodied avatars powered by artificial intelligence.

In summary, the concept of the metaverse expands to such an extent that it merges with “reality” itself, erasing the boundaries between the virtual and the physical, fully intertwining them. As we reach this point, the term “metaverse” will no longer be used; it will merely serve as a representation of how humans once perceived these ideas.

Navigating the Avalanche: User-Generated Content and the Future of Virtual Worlds

Media confusion, image created by Bing Image Creator

User-generated content is a significant trend in virtual worlds, and its importance only appears to be increasing. In a recent development, Epic Games has offered its Unreal Editor for Fortnite to Fortnite creators. Users of Unreal Engine, the famous game engine developed by Epic Games, will find in this editor many familiar tools and workflows, as well as some key differences and limitations compared to Unreal Engine, as explained in Epic’s blog post. Epic Games is also creating a marketplace and an economic model for its creators, with the intention of maintaining an open ecosystem where creators can sell their products elsewhere.

This move helps broaden Fortnite’s appeal not just for individual creators and small studios, but also for major brands seeking to launch their own unique experiences. Potential connections can be made with major trends such as web3 and generative AI. Non-fungible tokens (NFTs) facilitate ownership, while generative AI simplifies the use of creative tools in virtual environments, making the process more efficient. The practice of equipping creators with sophisticated tools is evolving quickly and is expected to see significant advancements in the coming years.

The medium of virtual worlds has evolved beyond a one-way channel serving prepackaged experiences. Instead, it is now an interactive platform that allows users to become co-creators. However, examining user-generated worlds such as Roblox, Fortnite Creative, Rec Room, and Second Life, it appears that only a minority of people are currently creating new content. This minority is set to grow exponentially as content creation becomes more straightforward. People have a natural inclination towards creation, as demonstrated by platforms like Instagram, TikTok, and YouTube, to name a few. However, the process needs to be accessible (i.e., not requiring professional coding skills) and enjoyable. Generative AI and improved interfaces will simplify the setup of 3D environments.


For content consumers, the challenge will be discovering intriguing material amid the avalanche of digital creations. Interesting content and niche communities will be dispersed across numerous platforms. Major brands and social media influencers will have an edge, but what about content that might be fascinating yet lacks the push from significant social media accounts? Internet users often belong to subcultures and specialized communities, but even there, curation is becoming more difficult. AI might offer a solution, but will we understand how these recommendation engines work and be able to adjust them accordingly?

Content producers may struggle to get noticed. For the majority, reaching a mass audience will be unattainable. However, it may be sufficient to find a smaller group of intensely engaged followers. Perhaps web3 can assist in finding a business model that utilises NFTs for token gating, reserving content for those who hold a particular NFT or a collection of NFTs.
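The token-gating idea above can be sketched in a few lines: content is unlocked only for wallets that hold a qualifying NFT. The wallet addresses and NFT names below are hypothetical, and the lookup table stands in for what a real system would fetch from a blockchain node:

```python
# Hypothetical sketch of NFT token gating: access is granted only if a
# wallet holds at least one NFT from a required set. The holdings table
# is a stand-in; a real system would query a blockchain for ownership.

HOLDINGS = {
    "0xABC": {"creator-pass-001", "artwork-042"},  # made-up wallet and NFTs
    "0xDEF": {"artwork-042"},
}

def has_access(wallet, required_nfts):
    """Return True if the wallet owns any of the required NFTs."""
    owned = HOLDINGS.get(wallet, set())
    return bool(owned & set(required_nfts))

# Gate a piece of content behind the hypothetical "creator-pass-001" NFT:
print(has_access("0xABC", ["creator-pass-001"]))  # True
print(has_access("0xDEF", ["creator-pass-001"]))  # False
```

The appeal for creators is that the gate checks ownership, not identity: whoever buys or is transferred the NFT inherits access, which is what makes it a tradable membership pass.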

What will the intersection of the metaverse, web3 and generative AI look like?

After a period of dormancy, this long-standing blog, founded in 2008, is being resurrected. In the past, our focus centered on the exploration of virtual worlds, with particular attention paid to Second Life and the burgeoning field of online education epitomized by the advent of Massive Open Online Courses (MOOCs)… Continue reading

The history of the future

I’m reading The History of the Future, a book written by Blake J. Harris, author of Console Wars. It contains a foreword by Ernest Cline, author of Ready Player One, the book that was (and perhaps still is) required reading for all those working at Oculus VR. The book… Continue reading

Back to the early days of the web with Docker and Jupyter Notebook

Since I love all things virtual, I really looked forward to the video conversation between Stephen Downes, the facilitator of our course E-learning 3.0 (#el30), and Tony Hirst from the Open University in the UK about virtual boxes and containers. The Open University is all about distance learning. People bring… Continue reading

An Experience API for learning everywhere (also in virtual worlds)

I had never heard of the Experience API (xAPI), but now I have, thanks to a video conversation between education expert Stephen Downes and Shelly Blake-Plock, Co-Founder, President and CEO of Yet Analytics, in the course #el30. xAPI is a new specification for learning technology that makes it possible to collect data… Continue reading
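An xAPI “statement” records a learning experience in an actor–verb–object shape. Here is a minimal example using a standard ADL vocabulary verb; the learner and activity identifiers are hypothetical, chosen to show how an experience in a virtual world could be recorded:

```python
import json

# A minimal xAPI statement: actor (who), verb (did what), object (to what).
# The learner email and activity URL are hypothetical placeholders; the
# verb URI comes from the standard ADL xAPI verb vocabulary.

statement = {
    "actor": {
        "mbox": "mailto:learner@example.com",
        "name": "Example Learner",
    },
    "verb": {
        "id": "http://adlnet.gov/expapi/verbs/experienced",
        "display": {"en-US": "experienced"},
    },
    "object": {
        "id": "http://example.com/virtual-worlds/second-life-tour",
        "definition": {"name": {"en-US": "Second Life campus tour"}},
    },
}

print(json.dumps(statement, indent=2))
```

A learning record store would collect many such statements from different tools, including, as discussed in the conversation, experiences in virtual worlds.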