Press release

Agibot Released the Industry's First Open-Source Robot World Model Platform - Genie Envisioner

10-17-2025 02:30 PM CET | IT, New Media & Software

Press release from: King Newswire


Shanghai, China, 17th Oct 2025 -- Agibot has officially launched Genie Envisioner (GE), a unified world model platform for real-world robot control. Departing from the traditional fragmented pipeline of data-training-evaluation, GE integrates future frame prediction, policy learning, and simulation evaluation for the first time into a closed-loop architecture centered on video generation. This enables robots to perform end-to-end reasoning and execution--from seeing to thinking to acting--within the same world model. Trained on 3,000 hours of real robot data, GE-Act not only significantly surpasses existing state-of-the-art (SOTA) methods in cross-platform generalization and long-horizon task execution but also opens a new technical pathway for embodied intelligence, from visual understanding to action execution.

Current robot learning systems typically adopt a phased development model--data collection, model training, and policy evaluation--where each stage is independent and requires specialized infrastructure and task-specific tuning. This fragmented architecture increases development complexity, prolongs iteration cycles, and limits system scalability. The GE platform addresses this by constructing a unified video-generative world model that integrates these disparate stages into a closed-loop system. Built upon approximately 3,000 hours of real robot manipulation video data, GE establishes a direct mapping from language instructions to the visual space, preserving the complete spatiotemporal information of robot-environment interactions.
01/ Core Innovation: A Vision-Centric World Modeling Paradigm
The core breakthrough of GE lies in constructing a vision-centric modeling paradigm based on world models. Unlike mainstream Vision-Language-Action (VLA) methods that rely on Vision-Language Models (VLMs) to map visual inputs into a linguistic space for indirect modeling, GE directly models the interaction dynamics between the robot and the environment within the visual space. This approach fully retains the spatial structures and temporal evolution information during manipulation, achieving more accurate and direct modeling of robot-environment dynamics. This vision-centric paradigm offers two key advantages:

Efficient Cross-Platform Generalization Capability: Leveraging powerful pre-training in the visual space, GE-Act requires minimal data for cross-platform transfer. On new robot platforms like the Agilex Cobot Magic and Dual Franka, GE-Act achieved high-quality task execution using only 1 hour (approximately 250 demonstrations) of teleoperation data. In contrast, even models like π0 and GR00T, which are pre-trained on large-scale multi-embodiment data, underperformed GE-Act with the same amount of data. This efficient generalization stems from the universal manipulation representations learned by GE-Base in the visual space. By directly modeling visual dynamics instead of relying on linguistic abstractions, the model captures underlying physical laws and manipulation patterns shared across platforms, enabling rapid adaptation.

Accurate Execution Capability for Long-Horizon Tasks: More importantly, vision-centric modeling endows GE with powerful future spatiotemporal prediction capabilities. By explicitly modeling temporal evolution in the visual space, GE-Act can plan and execute complex tasks requiring long-term reasoning. In ultra-long-step tasks such as folding a cardboard box, GE-Act demonstrated performance far exceeding existing SOTA methods. Taking box folding as an example, this task requires the precise execution of over 10 consecutive sub-steps, each dependent on the accurate completion of the previous ones. GE-Act achieved a 76% success rate, while π0 (specifically optimized for deformable object manipulation) reached only 48%, and UniVLA and GR00T failed completely (0% success rate). This enhancement in long-horizon execution capability stems not only from GE's visual world modeling but also benefits from the innovatively designed sparse memory module, which helps the robot selectively retain key historical information, maintaining precise contextual understanding in long-term tasks. By predicting future visual states, GE-Act can foresee the long-term consequences of actions, thereby generating more coherent and stable manipulation sequences. In comparison, language-space-based methods are prone to error accumulation and semantic drift in long-horizon tasks.
02/ Technical Architecture: Three Core Components
Based on the vision-centric modeling concept, the GE platform consists of three tightly integrated components:

GE-Base: Multi-View Video World Foundation Model: GE-Base is the core foundation of the entire platform. It employs an autoregressive video generation framework, segmenting output into discrete video chunks, each containing N frames. The model's key innovations lie in its multi-view generation capability and sparse memory mechanism. By simultaneously processing inputs from three viewpoints (head camera and two wrist cameras), GE-Base maintains spatial consistency and captures the complete manipulation scene. The sparse memory mechanism enhances long-term reasoning by randomly sampling historical frames, enabling the model to handle manipulation tasks lasting several minutes while maintaining temporal coherence.
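
The sparse memory idea can be illustrated with a short sketch. The Python snippet below shows one way a model might condition each new video chunk on a few randomly sampled historical frames plus the most recent ones; the function name, frame counts, and window sizes are illustrative assumptions, not GE-Base's published configuration.

```python
import random

def sample_sparse_memory(history_len, num_memory_frames, recent_window):
    """Pick frame indices to condition the next video chunk on.

    Illustrative sparse memory scheme: keep the most recent frames for local
    continuity and randomly sample a few older frames so long-horizon context
    is retained without attending to the full history.
    Index 0 is the oldest frame; history_len - 1 is the newest.
    """
    recent = list(range(max(0, history_len - recent_window), history_len))
    older_pool = list(range(0, max(0, history_len - recent_window)))
    k = min(num_memory_frames, len(older_pool))
    sparse = sorted(random.sample(older_pool, k)) if k > 0 else []
    return sparse + recent

# Example: a 2-minute episode at 5 Hz is 600 frames; condition on
# 4 randomly sampled memory frames plus the 8 most recent ones.
print(sample_sparse_memory(history_len=600, num_memory_frames=4, recent_window=8))
```

Sampling older frames at random keeps the conditioning cost roughly constant as an episode grows, while still exposing the model to long-range context.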

Training uses a two-stage strategy: first, temporal adaptation training (GE-Base-MR) with multi-resolution sampling at 3-30Hz makes the model robust to different motion speeds; subsequently, policy alignment fine-tuning (GE-Base-LF) at a fixed 5Hz sampling rate aligns with the temporal abstraction of downstream action modeling. The entire training was completed in about 10 days using 32 A100 GPUs on the AgiBot-World-Beta dataset, comprising approximately 3,000 hours and over 1 million real robot data instances.
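
As a rough illustration of the two sampling regimes, the sketch below builds training clips at a random rate between 3 and 30 Hz for the first stage and at a fixed 5 Hz for the second. The 30 Hz capture rate, clip length, and function name are assumptions made for the example, not values reported for the AgiBot-World-Beta dataset.

```python
import random

RAW_FPS = 30  # assumed capture rate of the source videos

def sample_clip_indices(num_raw_frames, clip_len, target_hz):
    """Return frame indices for a clip subsampled at target_hz from RAW_FPS video."""
    stride = max(1, round(RAW_FPS / target_hz))
    span = stride * (clip_len - 1) + 1
    if num_raw_frames < span:
        raise ValueError("episode too short for requested clip")
    start = random.randrange(0, num_raw_frames - span + 1)
    return list(range(start, start + span, stride))

# Stage 1 (multi-resolution, GE-Base-MR): random rate in [3, 30] Hz per clip.
mr_indices = sample_clip_indices(num_raw_frames=900, clip_len=16,
                                 target_hz=random.uniform(3, 30))
# Stage 2 (policy alignment, GE-Base-LF): fixed 5 Hz to match action modeling.
lf_indices = sample_clip_indices(num_raw_frames=900, clip_len=16, target_hz=5)
```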

GE-Act: Parallel Flow Matching Action Model: GE-Act serves as a plug-and-play action module, converting the visual latent representations from GE-Base into executable robot control commands through a lightweight architecture with 160M parameters. Its design cleverly parallels GE-Base's visual backbone, using DiT blocks with the same network depth as GE-Base but smaller hidden dimensions for efficiency. Via a cross-attention mechanism, the action pathway fully utilizes semantic information from visual features, ensuring generated actions align with task instructions.
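
The cross-attention coupling between action tokens and visual latents can be sketched as follows (PyTorch). The module name, token counts, and widths are placeholders chosen for illustration and do not reproduce the actual 160M-parameter GE-Act architecture.

```python
import torch
import torch.nn as nn

class ActionCrossAttentionBlock(nn.Module):
    """Illustrative block: action tokens attend to visual latents from the
    video backbone (cross-attention), then pass through a small MLP.
    All dimensions are placeholders, not the published GE-Act configuration."""

    def __init__(self, action_dim=256, visual_dim=1024, num_heads=8):
        super().__init__()
        self.proj_visual = nn.Linear(visual_dim, action_dim)   # match widths
        self.cross_attn = nn.MultiheadAttention(action_dim, num_heads,
                                                batch_first=True)
        self.norm1 = nn.LayerNorm(action_dim)
        self.norm2 = nn.LayerNorm(action_dim)
        self.mlp = nn.Sequential(nn.Linear(action_dim, 4 * action_dim),
                                 nn.GELU(),
                                 nn.Linear(4 * action_dim, action_dim))

    def forward(self, action_tokens, visual_latents):
        # action_tokens: (B, T_act, action_dim); visual_latents: (B, T_vis, visual_dim)
        ctx = self.proj_visual(visual_latents)
        attn_out, _ = self.cross_attn(self.norm1(action_tokens), ctx, ctx)
        x = action_tokens + attn_out
        return x + self.mlp(self.norm2(x))

# Example: 54 action tokens attending to 16 visual latent tokens.
block = ActionCrossAttentionBlock()
out = block(torch.randn(1, 54, 256), torch.randn(1, 16, 1024))  # (1, 54, 256)
```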

GE-Act's training involves three stages: action pre-training projects visual representations into the action policy space; task-specific video adaptation updates the visual generation component for specific tasks; task-specific action fine-tuning refines the full model to capture fine-grained control dynamics. Notably, its asynchronous inference mode is key: the video DiT runs at 5Hz for single-step denoising, while the action model runs at 30Hz for 5-step denoising. This slow-fast two-layer optimization enables the system to complete 54-step action inference in 200ms on an onboard RTX 4090 GPU, achieving real-time control.
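
A schematic of this slow-fast scheduling might look like the sketch below, where the video model is refreshed at 5 Hz and the action model is queried at 30 Hz. The stub functions stand in for the real denoising models and are assumptions made purely for illustration.

```python
import time

VIDEO_HZ, ACTION_HZ = 5, 30          # slow visual loop, fast action loop
ACTION_DENOISE_STEPS = 5             # denoising steps per action inference

def video_dit_step(latent):
    """Stub for one single-step denoising update of the video world model."""
    return latent  # placeholder

def action_model_denoise(latent, steps):
    """Stub for flow-matching denoising of an action chunk conditioned on latent."""
    return [0.0] * 54  # placeholder 54-step action chunk

def control_loop(duration_s=1.0):
    latent = None
    start = time.monotonic()
    next_video_t, next_action_t = start, start
    while time.monotonic() - start < duration_s:
        now = time.monotonic()
        if now >= next_video_t:                      # 5 Hz: refresh visual latent
            latent = video_dit_step(latent)
            next_video_t += 1.0 / VIDEO_HZ
        if now >= next_action_t:                     # 30 Hz: emit actions
            actions = action_model_denoise(latent, ACTION_DENOISE_STEPS)
            # send actions (or the first few steps) to the robot controller here
            next_action_t += 1.0 / ACTION_HZ
        time.sleep(0.001)

control_loop()
```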

GE-Sim: Hierarchical Action-Conditioned Simulator: GE-Sim extends GE-Base's generative capability into an action-conditioned neural simulator, enabling precise visual prediction through a hierarchical action conditioning mechanism. This mechanism includes two key components: Pose2Image conditioning projects 7-degree-of-freedom end-effector poses (position, orientation, gripper state) into the image space, generating spatially aligned pose images via camera calibration; motion vectors capture the incremental motion between consecutive poses and are encoded as motion tokens injected into each DiT block via cross-attention.
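
The two conditioning signals can be approximated numerically as in the sketch below: a delta between consecutive 7-DoF poses (the raw input to motion tokens) and a pinhole projection of the end-effector position into pixel coordinates (the starting point of a Pose2Image map). The pose layout, calibration values, and function names are illustrative assumptions rather than GE-Sim's actual interface.

```python
import numpy as np

def delta_pose(pose_prev, pose_next):
    """Incremental motion between consecutive 7-DoF poses.
    Pose layout assumed here: [x, y, z, roll, pitch, yaw, gripper]."""
    d = np.asarray(pose_next, float) - np.asarray(pose_prev, float)
    d[3:6] = (d[3:6] + np.pi) % (2 * np.pi) - np.pi  # wrap angle deltas
    return d

def project_position(pose, K, R, t):
    """Project the end-effector position into pixel coordinates with a
    calibrated pinhole camera (intrinsics K, extrinsics R, t)."""
    p_cam = R @ np.asarray(pose[:3], float) + t
    u, v, w = K @ p_cam
    return np.array([u / w, v / w])

# Illustrative calibration and two consecutive poses.
K = np.array([[600.0, 0, 320], [0, 600.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 1.0])
pose_a = [0.30, 0.00, 0.20, 0.0, 0.0, 0.0, 1.0]
pose_b = [0.32, 0.01, 0.19, 0.0, 0.0, 0.1, 1.0]
motion_input = delta_pose(pose_a, pose_b)      # basis of a motion token
pixel = project_position(pose_b, K, R, t)      # basis of a Pose2Image map
```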

This design allows GE-Sim to accurately translate low-level control commands into visual predictions, supporting closed-loop policy evaluation. In practice, action trajectories generated by the policy model are converted by GE-Sim into future visual states; these generated videos are then fed back to the policy model to produce the next actions, forming a complete simulation loop. Parallelized on distributed clusters, GE-Sim can evaluate thousands of policy rollouts per hour, providing an efficient evaluation platform for large-scale policy optimization. Furthermore, GE-Sim also acts as a data engine, generating diverse training data by executing the same action trajectories under different initial visual conditions.
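
In pseudocode terms, the closed loop described above reduces to alternating policy inference and simulator rollout, as in this hedged sketch; `policy` and `simulator` are toy stand-ins, not the actual GE-Act or GE-Sim interfaces.

```python
def closed_loop_rollout(policy, simulator, initial_obs, max_chunks=20):
    """Illustrative closed-loop evaluation: the policy proposes an action
    chunk, the action-conditioned simulator renders the resulting future
    frames, and those frames become the next observation."""
    obs, trajectory = initial_obs, []
    for _ in range(max_chunks):
        actions = policy(obs)              # action chunk from current frames
        obs = simulator(obs, actions)      # predicted future frames
        trajectory.append((actions, obs))
    return trajectory

# Toy stand-ins so the sketch runs end to end.
dummy_policy = lambda obs: [0.0] * 54
dummy_sim = lambda obs, actions: obs
rollout = closed_loop_rollout(dummy_policy, dummy_sim, initial_obs="frames_t0")
```

Because each rollout is just a sequence of model calls, many such loops can be batched or distributed, which is what makes evaluating thousands of policy rollouts per hour feasible.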

These three components work closely together to form a complete vision-centric robot learning platform: GE-Base provides powerful visual world modeling capabilities, GE-Act enables efficient conversion from vision to action, and GE-Sim supports large-scale policy evaluation and data generation, collectively advancing embodied intelligence.
EWMBench: World Model Evaluation Suite
Additionally, to evaluate the quality of world models for embodied tasks, the team developed the EWMBench evaluation suite alongside the core GE components. It provides comprehensive scoring across dimensions including scene consistency, trajectory accuracy, motion dynamics consistency, and semantic alignment. Subjective ratings from multiple experts showed high consistency with EWMBench rankings, validating its reliability for assessing robot task relevance. In comparisons with advanced models like Kling, Hailuo, and OpenSora, GE-Base achieved top results on multiple key metrics reflecting visual modeling quality, aligning closely with human judgment.
Open-Source Plan & Future Outlook
The team will open-source all code, pre-trained models, and evaluation tools. Through its vision-centric world modeling, GE pioneers a new technical path for robot learning. The release of GE marks a shift for robots from passive execution towards active imagine-verify-act cycles. In the future, the platform will be expanded to incorporate more sensor modalities and to support full-body mobility and human-robot collaboration, continuing to advance the practical deployment of intelligent manufacturing and service robots.

Contact Details

Organization: Shanghai Zhiyuan Innovation Technology Co., Ltd.

Contact Person: Jocelyn Lee

Website: https://www.zhiyuan-robot.com

Email: via contact form at https://dashboard.kingnewswire.com/release-contact/35600

City: Shanghai

Country: China

Release Id: 17102535600

