Alibaba's Qwen 3.7-Plus supports text, video and imagery inputs at low cost of 1.6 per 1M token — but it's proprietary | Venture Beat
Overview
Alibaba's Qwen 3.7-Plus supports text, video and imagery inputs at low cost of 1.6 per 1M token — but it's proprietary
Credit: Venture Beat made with Open AI Chat GPT-Images-2.0 and Google Nana Banana 2
Details
Credit: Venture Beat made with Open AI Chat GPT-Images-2.0 and Google Nana Banana 2
Alibaba this week released Qwen 3.7-Plus, the latest AI large language model (LLM) in its globally beloved and increasingly expansive Qwen family, boasting more multimodal capabilities and a 60% lower cost than the prior, text-only Qwen 3.7-Max model released just weeks ago.
However, like its immediate predecessor Qwen 3.7-Plus is available only under a "closed" commercial license via proprietary application programming interfaces (API) and Qwen Chat.
That marks a big departure from the Qwen strategy to date, which was focused mainly on releasing powerful, near state-of-the-art open source models. Those enterprises and users who relied on the open source Qwen models — among them, U. S. giants such as Airbnb — will no doubt be disappointed to see that Alibaba is going closed for its newer releases.
Still, the model is worth a look because of its low cost and high performance on multimodal tasks like creating enterprise-grade visuals or analyzing video, imagery and screenshots, which Qwen 3.7-Max cannot do (it's text-only). It is among the cheaper powerful AI models available now, coming in price-wise just above Chinese rival's new Mini Max-M3's limited-time discount pricing.
Venture Beat Frontier AI Model API Pricing Snapshot
Maintaining continuity during complex tool execution loops
For technical decision-makers deploying autonomous agents, the primary bottleneck has rarely been initial model intelligence. Instead, it is state decay—the tendency of an agent framework to lose its analytical trajectory over multi-step, long-horizon tasks.
Qwen 3.7-Plus addresses this architectural vulnerability through a combined approach to context management and reasoning state preservation.
The model ships with a 1-million token context window and allocates up to 256K tokens specifically for internal chain-of-thought processing. To contextualize this capacity, imagine an automated cloud migration agent: it can ingest an entire codebase, map out the dependencies, and spend thousands of tokens quietly evaluating edge cases before executing a single line of bash script.
Crucially, the API exposes a parameter called 'preserve_thinking.' Across Alibaba's ecosystem, the capability serves as a standardized architectural bridge rather than a tiered perk. Alibaba introduced the feature during the prior Qwen 3.6 generation, integrating it into both the open-weight Qwen 3.6-27B and the proprietary Max models.
At its core, the parameter operates at the API and template level to retain internal
This structural continuity solves a critical bottleneck for developers engineering long-horizon tasks. By keeping these internal logic loops intact, the feature prevents the model from dropping its context or needlessly recomputing its cached history midway through an operation.
When a model executes complex, multi-step agentic coding assignments, this retention allows the system to hold onto its original train of thought without losing the plot or forgetting the underlying logic of its previous actions.
Alibaba remains far from alone in recognizing this technical necessity, as the underlying concept now dictates the architecture of nearly all major artificial intelligence laboratories.
Anthropic deploys this exact capability under the moniker "Extended Thinking" for its advanced models, including its latest Claude Opus 4.8. This framework requires developers to feed unmodified thinking blocks directly back into the API on subsequent turns to maintain an unbroken chain of reasoning.
Open AI tackles the same challenge through an encrypted reasoning pass-back mechanism for models like GPT-5.5. Within the Open AI ecosystem, developers must return specific reasoning items generated alongside previous function calls, ensuring the model explicitly remembers the rationale behind its tool executions.
Ultimately, preserve_thinking simply represents Alibaba's terminology for what has rapidly become the undisputed table stakes for modern multi-turn reasoning.
Benchmarks show a competitive, yet sub state-of-the-art model
On raw capability metrics, this deep-thinking architecture translates to structural gains across multimodal and agentic benchmarks. However, it still falls below many of the leading and prior generations of U. S. proprietary models such as Anthropic's Claude Opus 4.6 and Open AI's GPT-5.4.
Qwen 3.7-Plus benchmark comparison chart. Credit: Alibaba Qwen
Qwen 3.7-Plus benchmark comparison chart. Credit: Alibaba Qwen
On Terminal Bench 2.0-Terminus, which measures an model's capability to run actual terminal-level code safely and iteratively, Qwen 3.7-Plus scored 70.3, outperforming Deep Seek-V4-Pro Max (67.9) and Gemini-3.1 Pro (63.5).
On computer vision benchmarks that demand localized interface understanding, such as Screen Spot Pro, the model hit 79.0, significantly outpacing legacy industry standouts like GPT-5.4 (xhigh) at 67.4 and Claude-Opus-4.6 at 49.5. Agent Evaluation Metrics (Selected Benchmarks)
What should enterprises consider Qwen 3.7-Plus for?
For an enterprise architect, the key question when analyzing Qwen 3.7-Plus is clear: What does this replace in our current tech stack?
The model is designed to step in as a direct replacement for premier frontier models (such as GPT-5-tier or Claude-Max-tier models) within high-frequency developer workflows, robotic process automation (RPA), and data engineering pipelines.
Rather than deploying an expensive, general-purpose flagship model to handle repetitive system operations, technical teams can route these tasks to Qwen 3.7-Plus. It handles visual interface interpretation, command execution, and code generation simultaneously.
Alibaba has structured its API delivery to align with existing open-source and proprietary enterprise frameworks. The endpoints are fully Open AI-compatible, meaning swapping out existing dependencies requires minimal infrastructure adjustment. For groups leveraging autonomous terminal frameworks, the integration is natively supported across multiple environments.
Engineers can run Qwen 3.7-Plus directly through their local terminal setups by altering base environment targets.
From a pure cost perspective, running an agent framework that constantly references massive code repositories or visual layout histories can quickly become cost-prohibitive.
Alibaba addresses this by exposing granular caching price points.
Standard input processing sits at
This tier makes high-frequency, multi-turn agent iterations economically practical at an enterprise scale.
No open source license or open weights raises the compliance question for enterprises
When evaluating any model in the Qwen ecosystem, a primary concern for legal and security teams is the licensing framework and operational boundary of the data pipeline.
While previous iterations of the Qwen family gained significant enterprise traction via fully open-source weight availability under the Apache 2.0 or customized open-use licenses, Qwen 3.7-Plus is delivered strictly as a managed, commercial cloud API via Alibaba Cloud Model Studio. For enterprise risk management, this distinction carries specific implications:
No Local Weight Deployment: Organizations cannot download, sandbox, or locally host the weights of Qwen 3.7-Plus within their completely air-gapped internal data centers. All data verification, visual processing, and execution calls must step through Alibaba Cloud's international endpoints (e.g., the Singapore instance highlighted in developer documentation).
No Local Weight Deployment: Organizations cannot download, sandbox, or locally host the weights of Qwen 3.7-Plus within their completely air-gapped internal data centers. All data verification, visual processing, and execution calls must step through Alibaba Cloud's international endpoints (e.g., the Singapore instance highlighted in developer documentation).
Compliance and Sovereignty: Since the model requires cloud-based inference, companies operating under strict sovereign data boundaries (such as healthcare entities subject to local HIPAA/GDPR constraints or defense contractors) must explicitly evaluate whether external API routing complies with their specific data-residency obligations.
Compliance and Sovereignty: Since the model requires cloud-based inference, companies operating under strict sovereign data boundaries (such as healthcare entities subject to local HIPAA/GDPR constraints or defense contractors) must explicitly evaluate whether external API routing complies with their specific data-residency obligations.
Managed Risk Mitigation: Conversely, a managed API structure removes the internal infrastructure burden of provisioning, optimizing, and maintaining multi-GPU clusters (such as dedicated Nvidia H100 arrays) simply to host an internal agent network.
Managed Risk Mitigation: Conversely, a managed API structure removes the internal infrastructure burden of provisioning, optimizing, and maintaining multi-GPU clusters (such as dedicated Nvidia H100 arrays) simply to host an internal agent network.
Still, Qwen 3.7-Plus offers high intelligence across modalities at low cost
The initial reception from developer communities and technical venture capital highlights the shifting economics of agent deployment.
Prominent industry voice and Web 3 venture capitalist @Boxmining highlighted the strategic cost advantage, stating:
"Qwen 3.7 Plus being 40% cheaper than Max changes the conversation. If the output is close enough for most coding and much stronger for visual workflows, do you really need Max every day or only for the heavy terminal-only jobs?"
"Qwen 3.7 Plus being 40% cheaper than Max changes the conversation. If the output is close enough for most coding and much stronger for visual workflows, do you really need Max every day or only for the heavy terminal-only jobs?"
This perspective aligns with the current trend of optimizing enterprise operational budgets: shifting away from raw, unconstrained compute toward targeted task automation. At the same time, specialized researchers deep within the ecosystem point out that this isn't merely an incremental optimization of text generation.
Dunjie Lu, a research intern at Alibaba Qwen, remarked:
"It shows clear gains over Qwen 3.6-Plus in computer-use capabilities, with stronger generalization beyond general desktop tasks into professional workflows such as data engineering and scientific research."
"It shows clear gains over Qwen 3.6-Plus in computer-use capabilities, with stronger generalization beyond general desktop tasks into professional workflows such as data engineering and scientific research."
Ultimately, for enterprise buyers deciding on their next infrastructure roadmap, Qwen 3.7-Plus presents a practical alternative. If your organization's primary objective is building resilient, visual-capable autonomous software loops that interact directly with developer environments and cloud consoles—without blowing out your inference budget—the model provides a compelling reason to shift execution away from more expensive frontier alternatives.
Deep insights for enterprise AI, data, and security leaders
By submitting your email, you agree to our Terms and Privacy Notice.
Key Takeaways
- Credit: Venture Beat made with Open AI Chat GPT-Images-2
- Credit: Venture Beat made with Open AI Chat GPT-Images-2
- Alibaba this week released Qwen 3
- However, like its immediate predecessor Qwen 3



