Press release

AgiBot GO-1: The Evolution of Generalist Embodied Foundation Model from VLA to ViLLA

03-10-2025 10:04 PM CET | Leisure, Entertainment, Miscellaneous

Press release from: Getnews

/ PR Agency: LianPR

Today, AgiBot launches Genie Operator-1 (GO-1), an innovative generalist embodied foundation model. GO-1 introduces the novel Vision-Language-Latent-Action (ViLLA) framework, combining a Vision-Language Model (VLM) and Mixture of Experts (MoE). The VLM utilizes internet-scale heterogeneous data to establish a solid foundation for scene and object understanding.

The MoE consists of two key components: the Latent Planner, which learns from cross-embodiment and human operation data to develop general action understanding, and the Action Expert, which uses over a million real robot demonstrations to achieve high-frequency and dexterous manipulation.

These components work in synergy to give GO-1 its distinctive capabilities:

* Learning from Human Videos
* Few-shot Generalization
* Cross-Embodiment Adaptation
* Continuous Self-Evolution

Paper: https://agibot-world.com/blog/agibot_go1.pdf

YouTube Link: https://youtu.be/9dvygD4G93c

At the end of 2024, AgiBot launched the AgiBot World dataset, a large-scale, high-quality real world robotics dataset comprising over 1 million trajectories across 217 tasks in five application domains. Building on top of AgiBot World, today AgiBot introduces Genie Operator-1 (GO-1), a generalist embodied foundation model.

GO-1: An Evolution from VLA to ViLLA

To maximize the value of the high-quality AgiBot World dataset as well as web-scale heterogeneous videos while improving the policy's generalization capability, AgiBot proposes a hierarchical Vision-Language-Latent-Action (ViLLA) framework. Compared to the Vision-Language-Action (VLA) model, where actions are directly conditioned on vision and language inputs, the ViLLA model predicts latent action tokens, bridging the gap between image-text inputs and robot actions generated by the action expert.

The ViLLA framework consists of a VLM and MoE. The VLM uses massive multimodal data on the internet to obtain general scene understanding and language comprehension. The Latent Planner in MoE harnesses data from various embodiments and human actions to build action comprehension. Meanwhile, the Action Expert, trained with over a million real world robot demonstrations, refines action execution. During inference, the VLM, Latent Planner, and Action Expert cooperate as follows:

* VLM: Using the InternVL-2B model, it processes multi-view images, force signals and language inputs to provide scene understanding and instruction comprehension.
* Latent Planner: This expert predicts Latent Action Tokens based on intermediate outputs from the VLM, forming a Chain of Planning (CoP) for general action understanding and planning.
* Action Expert: It generates the final fine-grained action sequences based on intermediate outputs from the VLM and the Latent Action Tokens.
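The data flow between the three stages can be sketched as follows. This is only an illustrative outline, not AgiBot's implementation: the `Observation` type, the three functions, and all sizes (token count, codebook size, horizon, degrees of freedom) are hypothetical stand-ins that use simple arithmetic in place of neural networks.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Observation:
    images: List[str]      # placeholder identifiers for multi-view images
    force: List[float]     # placeholder force signals
    instruction: str       # natural-language instruction


def vlm_encode(obs: Observation) -> List[float]:
    """Stand-in for the VLM: returns a toy feature vector."""
    return [float(len(obs.images)), sum(obs.force), float(len(obs.instruction.split()))]


def latent_planner(features: Sequence[float], num_tokens: int = 4,
                   codebook_size: int = 32) -> List[int]:
    """Stand-in for the Latent Planner: maps VLM features to discrete token ids."""
    seed = int(sum(features) * 10)
    return [(seed + 7 * i) % codebook_size for i in range(num_tokens)]


def action_expert(features: Sequence[float], latent_tokens: Sequence[int],
                  horizon: int = 3, dof: int = 2) -> List[List[float]]:
    """Stand-in for the Action Expert: conditions on features AND latent tokens."""
    scale = 1.0 / (1.0 + abs(sum(features)))
    return [[scale * latent_tokens[(t + j) % len(latent_tokens)] for j in range(dof)]
            for t in range(horizon)]


obs = Observation(images=["head", "left_wrist", "right_wrist"],
                  force=[0.2, 0.1], instruction="pour water into the cup")
features = vlm_encode(obs)                   # stage 1: scene and instruction features
tokens = latent_planner(features)            # stage 2: latent action tokens
actions = action_expert(features, tokens)    # stage 3: fine-grained action sequence
```

The point of the sketch is the dependency structure: the Action Expert receives both the VLM outputs and the latent tokens, whereas a plain VLA policy would map the VLM outputs to actions directly.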

Image: https://www.globalnewslines.com/uploads/2025/03/590b371adf7ec06ad1908ff565e00671.jpg

The following is an introduction to the two key components of MoE: Latent Planner and Action Expert.

Image: https://www.globalnewslines.com/uploads/2025/03/72e888acd1890700058f0247f84948ff.jpg

Expert 1: Latent Planner

Although the AgiBot World dataset is the largest real world robot dataset globally, the volume of action-labeled robot data remains limited relative to internet-scale datasets. To address this, AgiBot employs latent actions to model the inverse dynamics of consecutive frames. This approach enables the transfer of real-world dynamics from heterogeneous data sources into universal manipulation knowledge.

* Latent Action Model (LAM): This model extracts ground-truth Latent Actions between the current and historical frames. It consists of an encoder and a decoder:
  * The encoder employs a spatial-temporal transformer with causal temporal masks.
  * The decoder uses a spatial transformer, taking the initial frame and the discretized Latent Action Tokens as input.
  * Latent Action Tokens are quantized using VQ-VAE.

* Latent Planner: The Latent Planner is responsible for predicting discrete Latent Action Tokens. It shares the same Transformer architecture as the VLM backbone but utilizes two independent sets of Feed-Forward Networks (FFN) and Q/K/V/O (Query, Key, Value, Output) projection matrices. The Latent Planner integrates intermediate VLM outputs layer-by-layer and is trained using cross entropy loss.
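A minimal sketch of how these two pieces could fit together, under stated assumptions: continuous latents are mapped to discrete token ids by nearest-neighbor lookup in a codebook (the quantization step used in VQ-VAE), and those ids then serve as targets for a cross-entropy loss on the planner's predicted logits. The codebook, latents, and logits below are made-up toy values, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to the latent vector."""
    return min(range(len(codebook)), key=lambda i: squared_distance(latent, codebook[i]))


def softmax(logits: Sequence[float]) -> List[float]:
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_per_step: Sequence[Sequence[float]],
                  target_tokens: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target token at each step."""
    losses = [-math.log(softmax(logits)[target])
              for logits, target in zip(logits_per_step, target_tokens)]
    return sum(losses) / len(losses)


# Hypothetical 4-entry codebook of 2-D latent action codes.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Toy continuous latents standing in for LAM encoder outputs.
latents = [[0.9, 0.2], [0.1, 0.8]]
target_tokens = [quantize(z, codebook) for z in latents]   # [1, 2]

# An untrained planner (uniform logits) versus one that favors the targets.
loss_uniform = cross_entropy([[0.0] * 4, [0.0] * 4], target_tokens)
loss_confident = cross_entropy([[0.0, 5.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]], target_tokens)
```

With uniform logits over four codes the loss equals ln 4 (about 1.386); it falls as the predicted distribution concentrates on the quantized targets.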

Expert 2: Action Expert

To achieve high-frequency and dexterous manipulation, AgiBot integrates an action expert that utilizes a diffusion objective to model the continuous distribution of low-level actions.

* The Action Expert shares the same architectural design as the Latent Planner, utilizing the same Transformer backbone as the VLM but with two independent sets of Feed-Forward Networks (FFN) and Q/K/V/O (Query, Key, Value, Output) projection matrices. It employs a denoising process to iteratively regress the action sequence.
* The Action Expert is hierarchically integrated with the VLM and Latent Planner, ensuring consistency in information flow and collaborative optimization.
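The idea of iteratively refining an action sequence from noise can be illustrated with a toy loop. This is a simplified sketch, not a real diffusion sampler and not AgiBot's method: `toy_predictor` is a hand-written stand-in for the learned network, and the `1 / step` interpolation rule is an assumption chosen so that the loop is easy to follow.

```python
import random
from typing import Callable, List

Chunk = List[List[float]]  # horizon x degrees of freedom


def denoise_actions(predict_clean: Callable[[Chunk, int], Chunk],
                    horizon: int, dof: int,
                    num_steps: int = 10, seed: int = 0) -> Chunk:
    """Start from Gaussian noise and move toward the predicted clean actions."""
    rng = random.Random(seed)
    actions = [[rng.gauss(0.0, 1.0) for _ in range(dof)] for _ in range(horizon)]
    for step in range(num_steps, 0, -1):
        estimate = predict_clean(actions, step)
        # Close 1/step of the remaining gap; the final step (step == 1) closes it fully.
        actions = [[a + (e - a) / step for a, e in zip(row_a, row_e)]
                   for row_a, row_e in zip(actions, estimate)]
    return actions


def toy_predictor(noisy: Chunk, step: int) -> Chunk:
    """Hypothetical stand-in for the network: always predicts the same trajectory."""
    return [[0.1 * t, -0.1 * t] for t in range(len(noisy))]


chunk = denoise_actions(toy_predictor, horizon=4, dof=2)
```

In a trained model the predictor would also be conditioned on the VLM outputs and the Latent Action Tokens, and the update rule would follow the noise schedule used during training.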

Experimental Results

Image: https://www.globalnewslines.com/uploads/2025/03/38a7eaee5afe680842e20dcf160f40e3.jpg

Using the novel Vision-Language-Latent-Action (ViLLA) framework, AgiBot evaluated GO-1 across five tasks of varying complexity. GO-1 significantly outperforms current state-of-the-art models, raising the success rate by 32 percentage points (from 46% to 78%). Tasks such as "Pour Water" and "Restock Beverage" showed particularly large improvements. AgiBot also validated the contribution of the Latent Planner within the ViLLA framework, which accounts for a 12 percentage point improvement in success rate (from 66% to 78%).
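Because the reported gains are differences between two success rates, they are gains in percentage points rather than relative increases. The short calculation below separates the two readings for the figures quoted above; the function name is illustrative.

```python
from typing import Tuple


def improvement(baseline_pct: float, new_pct: float) -> Tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = new_pct - baseline_pct
    relative = points / baseline_pct * 100.0
    return points, relative


vs_prior_models = improvement(46.0, 78.0)      # 32 points, about 69.6% relative
vs_no_latent_planner = improvement(66.0, 78.0)  # 12 points, about 18.2% relative
```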

GO-1: Comprehensive Innovation of Embodied Intelligence

AgiBot GO-1 leverages human and diverse types of robot data, enabling robots to acquire revolutionary learning capabilities. It can generalize across various environments and objects, quickly adapt to new tasks, and learn new skills. At the same time, it can be deployed across various robotic embodiments, enabling efficient implementation and continuous evolution in real-world environments.

The key characteristics of GO-1 can be summarized as follows:

* Learning from Human Videos: GO-1 can learn from internet videos and real human demonstrations to enhance its understanding of human actions.
* Few-Shot Generalization: GO-1's strong generalization ability enables fast adaptation to new scenes and tasks with minimal data, even in zero-shot scenarios, resulting in very low post-training costs.
* Cross-Embodiment Adaptation: GO-1 is a generalist robot policy model, capable of transferring between different kinds of robots and quickly adapting to various embodiments.
* Continuous Self-Evolution: GO-1 can continuously evolve from data generated by issues encountered during real-world execution, within AgiBot's complete data feedback system.

The launch of GO-1 marks a rapid advancement of embodied intelligence towards generalization, openness, and enhanced capabilities:

* From Single Task to Multi-Task: Robots can now perform multiple tasks across diverse scenarios without needing to retrain for each new task.
* From Closed Environments to Open Worlds: Robots are no longer limited to controlled lab settings but can operate in dynamic real-world environments.
* From Predefined Programs to Instruction Generalization: Robots can now understand and follow natural language instructions, reasoning and combining tasks based on semantics, rather than being confined to predefined programs.

AgiBot GO-1 will accelerate the widespread adoption of embodied intelligence, transforming robots from task-specific tools into autonomous agents with general intelligence. It will play a greater role across various domains, including manufacturing, service, and household applications, paving the way for a more versatile and intelligent future.

https://www.linkedin.com/feed/update/urn:li:activity:7304747190139150338

https://fb.watch/yefx6B0bsC/
Media Contact
Company Name: AgiBot
Contact Person: William Peng
Email: Send Email [http://www.universalpressrelease.com/?pr=agibot-go1-the-evolution-of-generalist-embodied-foundation-model-from-vla-to-villa]
Country: China
Website: https://www.agibot.com/

Legal Disclaimer: Information contained on this page is provided by an independent third-party content provider. GetNews makes no warranties or responsibility or liability for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this article. If you are affiliated with this article or have any complaints or copyright issues related to this article and would like it to be removed, please contact retract@swscontact.com



This release was published on openPR.

