Google Gemini 2.0 is now open to all, boosts virtual agent push

Google has opened its latest AI model suite, Gemini 2.0, to the public, marking a significant step in its push toward advanced AI agents. The suite includes Gemini 2.0 Pro Experimental, designed for coding and complex tasks, and Gemini 2.0 Flash Thinking, now available in the Gemini app.

Gemini 2.0 family is now open to all

Gemini 2.0 Pro Experimental is described as Google’s most capable model yet, excelling in coding and handling intricate prompts. It boasts a context window of 2 million tokens, enabling it to process approximately 1.5 million words at once. The model can call tools like Google Search and execute code on behalf of users. Initially teased in the Gemini app’s changelog last week, it is now accessible via Google’s AI development platforms, Vertex AI and Google AI Studio, as well as to Gemini Advanced subscribers in the Gemini app.
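For developers, access follows the standard Gemini API pattern. Below is a minimal sketch using the google-generativeai Python SDK; note that the model identifier is a placeholder, since the exact experimental name is set by Google and may change.

```python
import google.generativeai as genai  # pip install google-generativeai

genai.configure(api_key="YOUR_API_KEY")

# Placeholder model identifier; check Google AI Studio or Vertex AI
# for the current name of the experimental Pro model.
model = genai.GenerativeModel("gemini-2.0-pro-exp")

response = model.generate_content(
    "Review this function for edge cases: def div(a, b): return a / b"
)
print(response.text)
```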

Gemini 2.0 Flash, introduced in December, is now generally available. Billed as a “workhorse model,” it is optimized for high-volume, high-frequency tasks and costs developers 10 cents per million tokens for text, image, and video inputs. Additionally, Google unveiled Gemini 2.0 Flash-Lite, its most cost-efficient model, which matches the performance of its predecessor, Gemini 1.5 Flash, at the same price and speed. Flash-Lite costs 0.75 cents per million tokens.
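At those rates, per-request costs are easy to estimate. A quick back-of-the-envelope calculation, using only the prices quoted above:

```python
# Rates quoted in this article, converted to dollars per token.
FLASH_RATE = 0.10 / 1_000_000        # 10 cents per million tokens
FLASH_LITE_RATE = 0.0075 / 1_000_000  # 0.75 cents per million tokens

tokens = 50_000  # e.g. a long document plus the model's reply
print(f"Flash:      ${tokens * FLASH_RATE:.4f}")       # $0.0050
print(f"Flash-Lite: ${tokens * FLASH_LITE_RATE:.6f}")  # $0.000375
```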

Focus on AI agents

The release aligns with Google’s broader strategy of advancing agentic AI—models capable of performing complex, multistep tasks autonomously. In a December blog post, Google emphasized its focus on developing models that “understand more about the world around you, think multiple steps ahead, and take action on your behalf.” Gemini 2.0 introduces new multimodal capabilities, including native image and audio output, as well as tool use, bringing Google closer to its vision of a universal assistant.

This push places Google in direct competition with other tech giants and startups like Meta, Amazon, Microsoft, OpenAI, and Anthropic, all of which are investing heavily in agentic AI. Anthropic’s AI agents, for instance, can navigate computers much as humans do, completing tasks that span tens or hundreds of steps. OpenAI recently released Operator, an agent capable of automating tasks such as vacation planning and grocery ordering, while its Deep Research tool compiles complex reports for users.

Google also launched its own Deep Research tool in December, which functions as a research assistant exploring topics and compiling detailed reports. CEO Sundar Pichai emphasized the importance of execution over being first, stating in a December strategy meeting, “I think that’s what 2025 is all about.”

Competition with DeepSeek

Google’s releases come amid growing attention to DeepSeek, the Chinese AI startup whose models rival or surpass those of leading American companies. DeepSeek’s R1 model gained significant traction due to its affordability and performance. To counter this, Google is making its Gemini 2.0 Flash Thinking model more accessible through the Gemini app, potentially aiming to draw greater attention to its offerings.

AI can now click, scroll, and type for you—but is that a good thing?

A recent study from Zurich University of Applied Sciences by Pascal J. Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F. Grewe, and Thilo Stadelmann reveals that AI agents have officially outgrown their chatbot phase.

AI agents are running the show, clicking, scrolling, and typing their way through workflows with eerie precision. These instruction-based computer control agents (CCAs) execute commands and interact with digital environments like seasoned human operators. But as they edge closer to full autonomy, one thing becomes clear: the more power we give them, the harder it becomes to keep them in check.

How AI agents are learning to use computers like you

Traditional automation tools are glorified macros—repetitive, rigid, and clueless outside their scripted paths. CCAs, on the other hand, are built to improvise. They don’t just follow instructions; they observe, interpret, and act based on what they “see” on a screen, thanks to vision-language models (VLMs) and large language models (LLMs). This allows them to:

  • Read screens like a human, identifying text, buttons, and input fields without predefined coordinates.
  • Execute multi-step tasks, like opening an email, copying data, pasting it into a spreadsheet, and hitting send—all without direct supervision.
  • Understand natural language instructions, removing the need for users to learn complex automation scripts.
  • Adapt to changing interfaces, making them significantly more flexible than rule-based automation tools.

Tell a CCA to “find today’s top sales leads and email them a follow-up,” and it moves through apps, extracts relevant data, composes an email, and sends it, just like a human assistant. Unlike old-school RPA (Robotic Process Automation) that falls apart when a UI changes, CCAs can adjust in real time, identifying visual elements and making decisions on the fly.
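In architectural terms, a CCA runs an observe-interpret-act loop. The sketch below illustrates the idea in Python: the query_vlm helper is hypothetical and stands in for whatever vision-language model the agent is built on, while the screen actions use the real pyautogui library.

```python
import pyautogui  # pip install pyautogui


def query_vlm(screenshot, instruction):
    """Hypothetical VLM call: given a screenshot and an instruction,
    return a structured action such as {"type": "click", "x": 120, "y": 340}."""
    raise NotImplementedError("wire in the VLM of your choice")


def run_agent(instruction, max_steps=20):
    for _ in range(max_steps):
        screenshot = pyautogui.screenshot()          # observe the screen
        action = query_vlm(screenshot, instruction)  # interpret what it sees
        if action["type"] == "click":                # act on the UI
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.write(action["text"])
        elif action["type"] == "done":               # the VLM reports success
            break
```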

The next frontier? Integration with cloud-based knowledge repositories and autonomous decision-making. The more these agents learn, the more sophisticated their capabilities become—raising questions about just how much trust we should place in them.


The benefits: Productivity, accessibility, and automation

There’s no denying that CCAs come with serious advantages:

  • Productivity on steroids: Tedious, time-consuming tasks vanish, allowing workers to focus on higher-value decisions rather than clicking through dashboards.
  • Accessibility revolution: People with disabilities can interact with technology more seamlessly through AI-powered navigation and task automation.
  • Enterprise-wide scalability: Businesses can automate entire workflows without hiring an army of IT specialists to build custom solutions.
  • System-wide integration: CCAs work across different platforms and applications, ensuring seamless digital interactions.
  • Always-on efficiency: Unlike human workers, these agents don’t get tired, distracted, or take lunch breaks.

The risks: Privacy, security, and trust

For every productivity win, there’s an equal and opposite security nightmare lurking in the background. Giving AI control over user interfaces isn’t just automation—it’s granting an unblinking machine access to sensitive workflows, financial transactions, and private data. And that’s where things get complicated.

CCAs operate by “watching” screens and analyzing text. Who ensures that sensitive information isn’t being misused or logged? Who’s keeping AI-driven keystrokes in check?

If an AI agent can log into your banking app and transfer money with a single command, what happens if it’s hacked? We’re handing over the digital keys to the kingdom with few safeguards. If a CCA makes a catastrophic error—deletes the wrong file, sends the wrong email, or approves a disastrous transaction—who’s responsible? Humans can be fired, fined, or trained. AI? Not so much.

And if a malicious actor hijacks a CCA, they don’t just get access—they get a tireless, automated accomplice capable of wreaking havoc at scale. Lawmakers are scrambling to keep up, but there’s no playbook for AI-driven digital assistants making high-stakes decisions in real time.

What comes next?

Businesses are moving cautiously, trying to balance the undeniable efficiency gains with the looming risks. Some companies are enforcing “human-in-the-loop” models, where AI agents handle execution but require manual approval for critical actions. Others are investing in AI governance policies to create safeguards before these agents become standard in enterprise operations.
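A human-in-the-loop gate can be as simple as an allowlist of critical action types that pause for confirmation before the agent proceeds. A minimal sketch, with hypothetical action names chosen purely for illustration:

```python
# Hypothetical action names; in practice this set would come from policy.
CRITICAL_ACTIONS = {"transfer_funds", "delete_file", "send_email"}


def execute(action, handler):
    """Run an agent action, pausing for human sign-off on critical ones."""
    if action["name"] in CRITICAL_ACTIONS:
        prompt = f"Agent wants to run {action['name']}({action['args']}). Approve? [y/N] "
        if input(prompt).strip().lower() != "y":
            return {"status": "rejected", "by": "human"}
    return handler(action)
```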

What’s certain is that CCAs aren’t a passing trend—they’re the next phase of AI evolution, quietly embedding themselves into workflows and interfaces everywhere. As they grow more capable, the debate won’t be about whether we should use them, but how we can possibly control them.


Images: Kerem Gülen/Midjourney
