Press release

Z.ai Launches GLM-4.5V: Open-source Vision-Language Model Sets New Bar for Multimodal Reasoning

08-14-2025 12:08 AM CET | IT, New Media & Software

Press release from: Getnews

PR Agency: Global Social

Z.ai (formerly Zhipu) today announced GLM-4.5V, an open-source vision-language model engineered for robust multimodal reasoning across images, video, long documents, charts, and GUI screens.

Multimodal reasoning is widely viewed as a key pathway toward AGI. GLM-4.5V advances that agenda with a 100B-class architecture (106B total parameters, 12B active) that pairs high accuracy with practical latency and deployment cost. The release follows July's GLM-4.1V-9B-Thinking, which hit #1 on Hugging Face Trending and has surpassed 130,000 downloads, and scales that recipe to enterprise workloads while keeping developer ergonomics front and center. The model is accessible through multiple channels, including Hugging Face [http://huggingface.co/zai-org/GLM-4.5V], GitHub [http://github.com/zai-org/GLM-V], Z.ai API Platform [http://docs.z.ai/guides/vlm/glm-4.5v], and Z.ai Chat [http://chat.z.ai], ensuring broad developer access.
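For developers exploring the API route, the sketch below shows one plausible way to send an image-plus-text request to GLM-4.5V from Python. The endpoint URL, payload fields, model identifier, and response shape are assumptions modeled on common OpenAI-style vision chat APIs, not details confirmed by this release; the linked API documentation is authoritative.

```python
# Minimal sketch of querying GLM-4.5V over HTTP, assuming an OpenAI-style
# chat-completions endpoint. Endpoint path, field names, and response shape
# are hypothetical; see http://docs.z.ai/guides/vlm/glm-4.5v for the real API.
import base64
import requests

API_KEY = "YOUR_ZAI_API_KEY"                        # assumption: bearer-token auth
ENDPOINT = "https://api.z.ai/v1/chat/completions"   # hypothetical URL

with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "glm-4.5v",                            # assumption: model identifier
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            {"type": "text",
             "text": "Summarize the main trend shown in this chart."},
        ],
    }],
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # assumed response shape
```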

Open-Source SOTA

Built on the new GLM-4.5-Air text base and extending the GLM-4.1V-Thinking lineage, GLM-4.5V delivers SOTA performance among similarly sized open-source VLMs across 41 public multimodal evaluations. Beyond leaderboards, the model is engineered for real-world usability and reliability on noisy, high-resolution, and extreme-aspect-ratio inputs.

The result is all-scenario visual reasoning in practical pipelines: image reasoning (scene understanding, multi-image analysis, localization), video understanding (shot segmentation and event recognition), GUI tasks (screen reading, icon detection, desktop assistance), complex chart and long-document analysis (report understanding and information extraction), and precise grounding (accurate spatial localization of visual elements).

Image: https://www.globalnewslines.com/uploads/2025/08/1ca45a47819aaf6a111e702a896ee2bc.jpg

Key Capabilities

Visual grounding and localization

GLM-4.5V precisely identifies and locates target objects based on natural-language prompts and returns bounding coordinates. This enables high-value applications such as safety and quality inspection or aerial/remote-sensing analysis. Compared with conventional detectors, the model leverages broader world knowledge and stronger semantic reasoning to follow more complex localization instructions.

Users can switch to the Visual Positioning mode, upload an image and a short prompt, and get back the bounding box and a rationale. For example, ask "Point out any non-real objects in this picture." GLM-4.5V reasons about plausibility and materials, then flags the insect-like sprinkler robot (the item highlighted in red in the demo) as non-real, returning a tight bounding box, a confidence score, and a brief explanation.
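To make that round trip concrete, here is a hedged sketch of asking for localization and parsing boxes out of the reply. The JSON answer format and the [x1, y1, x2, y2] box convention are illustrative assumptions for this example; the model's actual output format is documented in the GitHub repository.

```python
# Hypothetical grounding workflow: ask GLM-4.5V to localize objects and
# return bounding boxes. The JSON instruction and the [x1, y1, x2, y2]
# box convention are illustrative assumptions, not the documented format.
import json

def extract_boxes(model_reply: str):
    """Parse a JSON list of labeled boxes from the model's text reply (assumed format)."""
    start, end = model_reply.find("["), model_reply.rfind("]") + 1
    return json.loads(model_reply[start:end])

prompt = (
    "Point out any non-real objects in this picture. "
    'Answer as JSON: [{"label": ..., "box": [x1, y1, x2, y2]}].'
)

# reply = <call GLM-4.5V with the image and prompt, e.g. via the API sketch above>
reply = '[{"label": "sprinkler robot", "box": [412, 230, 585, 398]}]'  # placeholder output
for obj in extract_boxes(reply):
    print(obj["label"], obj["box"])
```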

Image: https://www.globalnewslines.com/uploads/2025/08/8dcbdd7939f12f7a2239bfbb0528b3f7.jpg

Design-to-code from screenshots and interaction videos

The model analyzes page screenshots (and even interaction videos) to infer hierarchy, layout rules, styles, and intent, then emits faithful, runnable HTML/CSS/JavaScript. Beyond element detection, it reconstructs the underlying logic and supports region-level edit requests, enabling an iterative loop between visual input and production-ready code.

Open-world image reasoning

GLM-4.5V can infer background context from subtle visual cues without external search. Given a landscape or street photo, it can reason from vegetation, climate traces, signage, and architectural styles to estimate the shooting location and approximate coordinates.

For example, given a classic scene from Before Sunrise and the prompt "Based on the architecture and streets in the background, can you identify the specific location in Vienna where this scene was filmed?", the model parses facade details, street furniture, and layout cues to localize the exact spot in Vienna and return coordinates and a landmark name. (See demo: https://chat.z.ai/s/39233f25-8ce5-4488-9642-e07e7c638ef6)

Image: https://www.globalnewslines.com/uploads/2025/08/f51fdc9fae815cfaf720bb07467a54db.jpg

Beyond single images, GLM-4.5V's open-world reasoning extends to competitive settings: in a global "Geo Game," it beat 99% of human players within 16 hours and climbed to rank 66 within seven days, clear evidence of robust real-world performance.

Complex document and chart understanding

The model reads documents visually (pages, figures, tables, and charts) rather than relying on brittle OCR pipelines. That end-to-end approach preserves structure and layout, improving accuracy for summarization, translation, information extraction, and commentary across long, mixed-media reports.

GUI agent foundation

Built-in screen understanding lets GLM-4.5V read interfaces, locate icons and controls, and combine the current visual state with user instructions to plan actions. Paired with agent runtimes, it supports end-to-end desktop automation and complex GUI agent tasks, providing a dependable visual backbone for agentic systems.
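The schematic loop below illustrates how such a visual backbone might slot into an agent runtime: capture the screen, ask the model for the next action, execute, repeat. The action schema and the helper functions are hypothetical stand-ins, not part of any announced toolkit.

```python
# Schematic perceive-plan-act loop around GLM-4.5V. The action JSON schema
# and the stubs take_screenshot(), query_glm45v(), and execute() are
# hypothetical placeholders; a real runtime would wire in a screenshot
# library, the GLM-4.5V API, and an input-automation layer.
import json

def take_screenshot() -> bytes:
    return b""  # stub: return raw screenshot bytes

def query_glm45v(image: bytes, text: str) -> str:
    # Stub: in practice, send image + text to the model (see the API sketch above).
    return '{"action": "done", "target": [0, 0], "text": ""}'

def execute(action: dict) -> None:
    print("executing", action)  # stub: dispatch click/type events

def run_gui_agent(goal: str, max_steps: int = 10) -> bool:
    """Read the screen, ask the model for the next action, execute until done."""
    for _ in range(max_steps):
        screen = take_screenshot()
        prompt = (
            f"Goal: {goal}\n"
            "Given the screenshot, reply with JSON: "
            '{"action": "click|type|done", "target": [x, y], "text": ""}'
        )
        action = json.loads(query_glm45v(screen, text=prompt))
        if action["action"] == "done":
            return True
        execute(action)
    return False

print(run_gui_agent("Open the settings panel"))
```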

Built for Reasoning, Designed for Use

GLM-4.5V is built on the new GLM-4.5-Air text base and uses a modern VLM pipeline (vision encoder, MLP adapter, and LLM decoder) with 64K multimodal context, native image and video inputs, and enhanced spatial-temporal modeling, so the system handles high-resolution and extreme-aspect-ratio content with stability.

The training stack follows a three-stage strategy: large-scale multimodal pretraining on interleaved text-vision data and long contexts; supervised fine-tuning with explicit chain-of-thought formats to strengthen causal and cross-modal reasoning; and reinforcement learning that combines verifiable rewards with human feedback to improve STEM, grounding, and agentic behaviors. A simple thinking / non-thinking switch allows builders to trade depth for speed on demand, aligning the model with varied product latency targets.
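In practice, such a switch would typically surface as a request-time flag. The snippet below sketches one plausible shape for that toggle; the "thinking" field name and structure are assumptions for illustration, and the official API reference is authoritative.

```python
# Hypothetical request payloads toggling thinking / non-thinking modes.
# The "thinking" field name and its structure are illustrative assumptions.
deep_payload = {
    "model": "glm-4.5v",
    "thinking": {"type": "enabled"},    # deeper reasoning, higher latency
    "messages": [...],                  # image + text messages as shown earlier
}
fast_payload = {
    "model": "glm-4.5v",
    "thinking": {"type": "disabled"},   # shallower reasoning, lower latency
    "messages": [...],
}
```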

Image: https://www.globalnewslines.com/uploads/2025/08/8c8146f0727d80970ed4f09b16f3b316.jpg
Media Contact
Company Name: Z.ai
Contact Person: Zixuan Li
Email: Send Email [http://www.universalpressrelease.com/?pr=zai-launches-glm45v-opensource-visionlanguage-model-sets-new-bar-for-multimodal-reasoning]
Country: Singapore
Website: https://chat.z.ai/



