Harnessing Large Language Models for Iterative Optimization

Optimization has long sat at the center of machine intelligence. It is the mechanism by which models improve, systems adapt, and decisions get refined from rough guesses into usable strategies. For most of modern machine learning, that process has been dominated by numerical methods: gradients where they are available, heuristics where they are not. But a new possibility has begun to emerge. Large language models are not merely objects to be optimized. They can themselves function as optimizers.

That is a subtle but important shift.

Classical optimization methods work best when the world is smooth, numerical, and well-behaved. They assume objectives can be differentiated, approximated, or at least sampled in structured ways. Yet many of the problems that now matter most in agentic systems do not fit neatly into that mold. Prompt design is linguistic. Workflow structure is combinatorial. Tool use is contextual. Multi-step reasoning is neither fully continuous nor easily reducible to a simple loss surface.

This is where large language models begin to matter in a different way. They can operate directly over structured, discrete, and language-rich spaces. They can interpret feedback in words rather than just numbers. They can propose revisions, infer useful directions from critique, and iteratively refine strategies across prompts, plans, tools, and whole workflows. In effect, they offer a new optimization paradigm: one rooted not in calculus alone, but in reasoning.

This article examines that shift. It begins by placing LLM-based optimization in the broader landscape of optimization paradigms. It then explores the main iterative strategies now being used to turn LLMs into optimization engines, including random search, gradient-like refinement, and surrogate modelling. From there, it turns to hyperparameters, dynamic optimization over time, and the emerging theoretical picture of why transformers appear capable of this kind of behavior at all.

The larger point is not that language models replace classical optimization. It is that they expand what optimization can mean.

Optimization Paradigms: From Gradients to Language

Optimization methods differ mainly in one respect: what they assume can be known about the function or system being improved.

The most classical family is gradient-based optimization. These methods rely on explicit gradient information. If the objective is differentiable, then tools such as stochastic gradient descent or Newton-style methods can follow the slope of the landscape toward better solutions. This is the workhorse of deep learning and for good reason: when gradients exist and can be computed efficiently, they are powerful. They make learning scalable, systematic, and surprisingly robust.

But their strength is also their limitation. They assume differentiability. That becomes a problem the moment the optimization target is not a smooth numerical object but a prompt, a graph, a tool chain, or a structured workflow.

That leads to the second class: zeroth-order optimization. These methods do not require gradients. Instead, they infer useful directions from function evaluations alone. Bayesian optimization, evolutionary strategies, and finite-difference methods all belong here. They are useful for black-box problems where the objective can be queried but not differentiated. They broaden the range of tractable problems considerably, but they still tend to assume that the search space is ultimately numerical or at least can be made numerically manageable.

Then comes the emerging third class: LLM-based optimization.

This is different in kind, not just degree. Rather than optimizing only over continuous parameters or black-box numeric objectives, language models can optimize directly in spaces that are symbolic, discrete, and semantically structured. They can revise prompts in natural language. They can critique workflows in prose. They can compare multiple candidate solutions and infer which direction appears more promising. They can even produce something resembling a “gradient” in words: not a vector, but an instruction about how to improve.

That makes them especially well-suited to the design problems surrounding intelligent agents. Prompt refinement, tool orchestration, agent workflows, planning strategies, and multi-stage reasoning are all domains where classical optimization either struggles or becomes awkwardly indirect. Language models can operate there more naturally because the search space already resembles the medium they were trained on.

This does not make them magic. It does make them unusually flexible.

And it helps explain why recent reasoning systems and “slow-thinking” models increasingly look like hybrids between inference engines and optimizers.

Iterative Optimization with LLMs

The central feature of LLM-based optimization is iteration.

The model proposes something, evaluates or receives feedback on it, and then refines the next version accordingly. That broad loop is familiar from optimization theory, even if the objects being updated are now prompts, workflows, plans, or other structured artifacts rather than numerical vectors.

At a high level, the procedure looks simple. Sample a task. Execute the current candidate solution. Evaluate the outcome. Then update the candidate. But the real question is how to perform that update when the space is discrete and richly structured.
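The four steps above can be sketched as a short loop. This is a minimal illustration rather than a reference implementation: `optimize`, `execute`, `evaluate`, and `update` are hypothetical names, and the toy functions stand in for what would be LLM calls and a real scorer.

```python
from typing import Callable, List, Tuple


def optimize(
    candidate: str,
    tasks: List[str],
    execute: Callable[[str, str], str],
    evaluate: Callable[[str, str], float],
    update: Callable[[str, str, str, float], str],
    steps: int,
) -> Tuple[str, List[float]]:
    """Run sample -> execute -> evaluate -> update for a fixed number of steps."""
    scores: List[float] = []
    for step in range(steps):
        task = tasks[step % len(tasks)]         # sample a task (round-robin here)
        output = execute(candidate, task)       # run the current candidate
        score = evaluate(task, output)          # score the outcome
        scores.append(score)
        candidate = update(candidate, task, output, score)  # revise the candidate
    return candidate, scores


# Toy stand-ins (assumptions for illustration only; no model is called).
def toy_execute(prompt: str, task: str) -> str:
    return f"{prompt} | {task}"


def toy_evaluate(task: str, output: str) -> float:
    return 1.0 if "step by step" in output else 0.0


def toy_update(prompt: str, task: str, output: str, score: float) -> str:
    return prompt if score >= 1.0 else prompt + " Think step by step."


final_prompt, history = optimize(
    "Solve the problem.", ["2+2", "3*3"], toy_execute, toy_evaluate, toy_update, steps=3
)
print(final_prompt)  # Solve the problem. Think step by step.
print(history)       # [0.0, 1.0, 1.0]
```

Everything interesting lives in `update`. The three strategies that follow are different answers to what that function should do.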

Three broad strategies have emerged.

Random search: brute force with structure

The simplest is random search, or more accurately, guided random search.

This family of methods samples a pool of candidate prompts or workflows, evaluates them, and keeps the best-performing ones. New candidates are then generated by mutating or recombining the survivors. In spirit, this resembles evolutionary search.

The attraction is obvious. It is simple, parallelizable, and often surprisingly effective, especially when optimizing a single prompt or a relatively compact workflow. The system does not need to know why one candidate is better than another. It only needs a way to rank them.

That makes random search one of the easiest entry points for LLM-based optimization. It works especially well when evaluation is cheap enough to run at scale and when the search space, though large, still admits useful diversity through recombination or mutation.

Its weakness is cost. Each iteration may require many parallel model calls. For simple prompt tuning that can be tolerable. For large multi-stage agentic workflows, it quickly becomes expensive. The search can also be wasteful, because it explores broadly without learning much about the structure of the objective beyond which candidates happened to work.

It is therefore effective, but blunt.
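A guided random search of this kind can be sketched as below. The phrase list, the `mutate` rule, and `toy_score` are all assumptions made for illustration; in a real system the mutation would typically be an LLM paraphrase and the score would come from running the prompt on tasks. The sketch uses mutation only, with no recombination.

```python
import random
from typing import Callable, List

Prompt = List[str]  # a prompt is modelled as a list of instruction phrases

PHRASES = ["Be concise.", "Show your work.", "Cite sources.", "Check edge cases."]


def mutate(prompt: Prompt, rng: random.Random) -> Prompt:
    """Toggle one randomly chosen phrase in or out of the prompt."""
    phrase = rng.choice(PHRASES)
    if phrase in prompt:
        return [p for p in prompt if p != phrase]
    return prompt + [phrase]


def guided_random_search(
    score: Callable[[Prompt], float],
    pool_size: int = 8,
    survivors: int = 2,
    generations: int = 40,
    seed: int = 0,
) -> Prompt:
    rng = random.Random(seed)
    pool: List[Prompt] = [[] for _ in range(pool_size)]  # start from empty prompts
    for _ in range(generations):
        # Rank the pool and keep only the best-performing candidates.
        elite = sorted(pool, key=score, reverse=True)[:survivors]
        # Refill the pool by mutating randomly chosen survivors.
        children = [mutate(rng.choice(elite), rng) for _ in range(pool_size - survivors)]
        pool = elite + children
    return max(pool, key=score)


def toy_score(prompt: Prompt) -> float:
    """Hypothetical objective: two phrases help, the others hurt."""
    wanted = {"Show your work.", "Check edge cases."}
    return float(sum(1 if p in wanted else -1 for p in prompt))


best = guided_random_search(toy_score)
print(best, toy_score(best))
```

Note that the search only ever ranks candidates. It never asks why a phrase helped, which is exactly the bluntness described above.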

Gradient approximation: textual descent rather than numerical descent

The second strategy is more interesting. It tries to import the intuition of gradient descent into a non-numerical world.

In continuous optimization, gradients tell you how to move in order to improve the objective. In language-based optimization, there is no literal gradient over prompts or workflows in the classical sense. But there can be something analogous: feedback that points in a better direction.

This is where the idea of textual gradients comes in. Rather than receiving a vector, the system receives a critique, diagnosis, or refinement suggestion. Instead of saying “move 0.02 in this dimension,” it says something like: be more explicit about the edge case, reduce ambiguity in the task decomposition, or clarify the evaluation criteria.

That may sound informal, but it captures something real. These systems use model-generated or judge-generated feedback to infer a direction of improvement. The optimizer then rewrites the prompt or workflow accordingly.

This has clear advantages over random search. It makes use of history. It can incorporate accumulated lessons from previous failures. It often converges faster because it is not merely sampling alternatives blindly; it is revising based on a reasoned critique.

It also scales better to multi-stage workflows, where the system may need to optimize not just a single prompt but an interconnected chain of reasoning modules. In those cases, the analogy to backpropagation becomes stronger. Feedback on downstream failures can sometimes be propagated back to earlier components in the workflow, allowing targeted improvement.

The downside is complexity. Textual-gradient methods require additional design choices: how to aggregate feedback, how to prompt the optimizer itself, how to distinguish useful critique from noise, and how much refinement history to preserve. They are more elegant than random search, but also more delicate.
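A textual-gradient loop can be sketched as follows. Here `critic` and `rewrite` are hypothetical stand-ins for LLM calls: a judge that returns a critique in words, and an optimizer that revises the prompt in response. The toy versions use plain string checks so the example runs without a model.

```python
from typing import Callable, List, Optional, Tuple


def textual_descent(
    prompt: str,
    critic: Callable[[str], Optional[str]],
    rewrite: Callable[[str, str, List[str]], str],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Repeatedly critique the prompt and rewrite it until the critic is satisfied."""
    history: List[str] = []  # accumulated critiques, available to the rewriter
    for _ in range(max_steps):
        critique = critic(prompt)   # the "gradient": a direction expressed in words
        if critique is None:        # nothing left to fix
            break
        history.append(critique)
        prompt = rewrite(prompt, critique, history)  # the "update step"
    return prompt, history


# Toy stand-ins (assumptions for illustration only).
def toy_critic(prompt: str) -> Optional[str]:
    if "edge case" not in prompt:
        return "Be explicit about the edge case of empty input."
    if "criteria" not in prompt:
        return "State the evaluation criteria."
    return None


def toy_rewrite(prompt: str, critique: str, history: List[str]) -> str:
    # A real rewriter would ask an LLM to integrate the critique; here we append it.
    return f"{prompt} {critique}"


final_prompt, critiques = textual_descent("Summarize the report.", toy_critic, toy_rewrite)
print(final_prompt)
print(len(critiques))  # 2
```

The design choices listed above map onto this sketch directly: how `critic` is prompted, how much of `history` the rewriter sees, and when to stop.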

Surrogate modelling: learning where not to query

The third strategy borrows from Bayesian optimization and amortized search.

The problem with many LLM-based optimization loops is that querying a frontier model is expensive. If each candidate prompt or workflow has to be fully evaluated by the LLM, the optimization process quickly becomes costly in both money and time. Surrogate modelling offers a way around that.

The idea is to build a lighter-weight model of the objective. Instead of asking the LLM to evaluate everything, the system learns a proxy that predicts which candidates are promising. The surrogate may take the form of a Gaussian process, a score predictor, or another learned estimator. It proposes which options are worth testing, and only those candidates are passed through the full expensive evaluation pipeline.

This does two things. First, it reduces the number of LLM calls required. Second, it can smooth over some of the noise that makes direct language-model optimization unstable.

The trade-off is that the optimization process now depends on the quality of the surrogate. If the proxy is poor, the search becomes biased or shortsighted. But when it works, surrogate modelling offers an attractive route toward scaling LLM-based optimization beyond toy settings.
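The idea can be sketched with a deliberately crude surrogate. The example below assumes candidates are encoded as binary feature vectors (for instance, which instruction phrases a prompt includes) and uses a 1-nearest-neighbour proxy instead of a Gaussian process. The names `surrogate_predict`, `surrogate_search`, and `toy_expensive_eval` are hypothetical, and the search is purely greedy with no exploration term.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[int, ...]


def hamming(a: Candidate, b: Candidate) -> int:
    return sum(x != y for x, y in zip(a, b))


def surrogate_predict(c: Candidate, evaluated: Dict[Candidate, float]) -> float:
    """Cheap proxy: reuse the score of the closest already-evaluated candidate."""
    nearest = min(evaluated, key=lambda e: hamming(c, e))
    return evaluated[nearest]


def surrogate_search(
    candidates: List[Candidate],
    expensive_eval: Callable[[Candidate], float],
    n_init: int,
    budget: int,
) -> Tuple[Candidate, Dict[Candidate, float]]:
    # Seed the surrogate with a few real evaluations.
    evaluated = {c: expensive_eval(c) for c in candidates[:n_init]}
    while len(evaluated) < budget:
        untested = [c for c in candidates if c not in evaluated]
        # Only the candidate the surrogate likes best gets the expensive evaluation.
        pick = max(untested, key=lambda c: surrogate_predict(c, evaluated))
        evaluated[pick] = expensive_eval(pick)
    best = max(evaluated, key=evaluated.get)
    return best, evaluated


calls = {"n": 0}


def toy_expensive_eval(c: Candidate) -> float:
    """Stand-in for a full LLM evaluation; counts how often it is invoked."""
    calls["n"] += 1
    return float(sum(c))


candidates = list(product([0, 1], repeat=4))  # 16 candidates in total
best, evaluated = surrogate_search(candidates, toy_expensive_eval, n_init=3, budget=6)
print(best, calls["n"])  # (1, 1, 1, 1) 6
```

In this toy case the best of 16 candidates is found with 6 expensive evaluations, but only because the objective is smooth enough for the proxy to track. With a poor surrogate the same greedy loop would commit to the wrong region, which is the bias described above.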

Taken together, these three approaches show how classical optimization ideas are being translated into the LLM era. Random search provides breadth. Textual gradients provide direction. Surrogates provide efficiency.

None is sufficient on its own. All are useful.

Hyperparameters: The Underappreciated Bottleneck

For all the excitement around LLM-based optimization, one awkward fact remains: it is often highly sensitive to hyperparameters, and the field still has only a limited understanding of how best to set them.

This is not unusual. Traditional optimization also depends heavily on hyperparameters such as learning rate, batch size, momentum, exploration schedules, and regularization strength. What is different here is that the hyperparameters often govern not only numerical behavior but reasoning behavior.

Consider feedback aggregation. If an optimizer receives multiple critiques from different examples, how should they be combined? Averaging works in continuous optimization because gradients live in a vector space. In language-based optimization, aggregation is much murkier. The choice of how to synthesize feedback can change the entire trajectory of refinement.

Batch size matters too. Aggregating signals across multiple samples often yields more stable updates, much as minibatches stabilize stochastic gradient descent. But larger batches also increase cost and can blur specific failure modes that would have been useful to preserve.

Momentum has its analogue as well. Systems that incorporate prior refinements and not just the latest critique often perform better, just as momentum can stabilize and accelerate classical descent. Yet how to encode and manage that historical information remains underdeveloped.
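To make these three knobs concrete, here is one possible aggregation scheme, offered as an assumption rather than an established method. It treats each critique as a short label, counts labels across a batch, and blends the counts with a decayed tally from earlier rounds as a rough momentum analogue. The function name `aggregate_feedback` and the parameters `decay` and `top_k` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def aggregate_feedback(
    batch: List[str],
    momentum: Dict[str, float],
    decay: float = 0.5,
    top_k: int = 2,
) -> Tuple[List[str], Dict[str, float]]:
    """Combine this batch's critique labels with a decayed running tally."""
    counts = Counter(batch)
    labels = set(momentum) | set(counts)
    updated = {
        label: decay * momentum.get(label, 0.0) + counts.get(label, 0)
        for label in labels
    }
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(updated, key=lambda label: (-updated[label], label))
    return ranked[:top_k], updated


round1 = ["ambiguous task", "missing edge case", "ambiguous task"]
focus1, tally = aggregate_feedback(round1, {})
print(focus1)  # ['ambiguous task', 'missing edge case']

round2 = ["too verbose", "too verbose", "missing edge case"]
focus2, tally = aggregate_feedback(round2, tally)
print(focus2)  # ['too verbose', 'missing edge case']
```

In the second round "missing edge case" stays in focus because it recurs, while "ambiguous task" fades. A larger batch or a higher `decay` would change which critiques survive, which is the sensitivity the section describes.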

Then there are the hyperparameters specific to agentic workflows. Which model should play which role in a multi-agent architecture? Which demonstrations should be included in-context? How frequently should tools be invoked? How deep should a reasoning chain be allowed to go? When should the system branch, stop, or self-correct?

These are not peripheral settings. They often determine whether an agent feels sharp, bloated, brittle, or robust.

The difficulty is that brute-force search over these choices scales badly. The interaction effects are too strong. Prompt style affects tool use. Tool use affects workflow depth. Workflow depth affects latency and cost. In multi-agent systems, role assignment and orchestration structure create additional layers of combinatorial complexity.

That is why meta-optimization has become increasingly important. The idea is to use language models not only to optimize task-level objects, but also to optimize their own optimization strategies: prompts for critique, aggregation schemes, workflow structures, even the policies that govern search itself.

This is where the field begins to turn recursive. The optimizer becomes another thing to optimize.

That recursion is powerful. It is also where things get complicated quickly.

Dynamic and Iterative Optimization Over Time

A numerical optimizer is often imagined as something static: an objective, a gradient, an update rule, repeated until convergence. LLM-based optimization is more fluid. It operates not only across a search space, but across time.

This temporal dimension matters.

One way to think about it is through depth. In many workflows, optimization happens across a structured sequence of stages. One component generates a plan. Another critiques it. Another retrieves information. Another synthesizes the final answer. In that sense, the workflow resembles a feedforward computation graph: each layer or module contributes to the final result.

But LLM-based optimization also has a second dimension, closer to recurrence. It often revisits earlier states, incorporates past feedback, and updates behavior over multiple cycles. That makes it less like a one-pass network and more like a recurrent or iterative system that learns through repeated engagement.

This opens up a rich design space.

An agent may improve not just because each individual workflow stage is better designed, but because the system keeps learning from its previous attempts. It may revise prompts after each run. It may adjust planning depth depending on past failures. It may retain useful refinements as persistent strategies. It may change the allocation of compute over time, devoting more effort to uncertain or failure-prone stages.

The analogy with recurrent architectures is therefore helpful. Just as recurrent systems refine hidden state over time, LLM-based optimizers refine strategies over repeated task cycles.

Yet this area remains relatively underexplored. Deep learning practice has developed many tricks for managing iterative optimization efficiently, such as checkpointing intermediate states or truncating backpropagation through time to reduce cost. Similar ideas have barely begun to be translated into LLM-based workflows.

That looks like a major opportunity.

There is no reason, in principle, that a language-based optimizer should be limited to naïve full-loop revision every time. Smarter temporal strategies could make these systems more stable, cheaper, and better suited for long-horizon agentic tasks.

The challenge is that temporal refinement introduces its own risks. Early mistakes can propagate. Long iterative chains can drift. Exploration and exploitation become harder to balance. Uncertainty compounds. What looks like learning can become instability if feedback is noisy or misinterpreted.
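One simple guard against drift, in the spirit of the checkpointing idea mentioned earlier, is to keep the best candidate seen so far and roll back whenever a revision scores worse. The sketch below is an assumption about how that could look, not a method taken from the source paper. `propose` and `score` are hypothetical stand-ins for an LLM revision step and an evaluator.

```python
from typing import Callable, List, Tuple


def refine_with_rollback(
    candidate: str,
    propose: Callable[[str, int], str],
    score: Callable[[str], float],
    steps: int,
) -> Tuple[str, float, List[str]]:
    """Iteratively revise, checkpointing improvements and rolling back regressions."""
    best, best_score = candidate, score(candidate)
    current = candidate
    trace: List[str] = []
    for step in range(steps):
        current = propose(current, step)
        current_score = score(current)
        if current_score > best_score:
            best, best_score = current, current_score  # checkpoint the improvement
            trace.append("accept")
        else:
            current = best                              # discard the bad revision
            trace.append("rollback")
    return best, best_score, trace


# Toy stand-ins (assumptions for illustration only).
EDITS = [" Check edge cases.", " Ignore the user.", " Show your work.", " Be vague."]
GOOD = ("Check edge cases.", "Show your work.")
BAD = ("Ignore the user.", "Be vague.")


def toy_propose(prompt: str, step: int) -> str:
    return prompt + EDITS[step % len(EDITS)]


def toy_score(prompt: str) -> float:
    return float(sum(g in prompt for g in GOOD) - 2 * sum(b in prompt for b in BAD))


best, best_score, trace = refine_with_rollback("Answer the question.", toy_propose, toy_score, steps=4)
print(best)   # Answer the question. Check edge cases. Show your work.
print(trace)  # ['accept', 'rollback', 'accept', 'rollback']
```

Rollback only helps if the score is trustworthy. With a noisy evaluator, a strict accept-if-better rule can itself lock in a lucky bad revision.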

The future of LLM optimization will depend in part on how well these temporal dynamics are understood and controlled.

Why Transformers Can Optimize at All

The practical success of LLM-based optimization raises a deeper question: why should transformers be able to do this in the first place?

Part of the answer lies in in-context learning.

A growing body of work suggests that transformers can behave like implicit learners inside the forward pass itself. In sufficiently rich contexts, they appear capable of approximating procedures that look surprisingly similar to regression, Bayesian updating, or gradient-based adaptation. They do not update parameters in the conventional sense, but they can still adapt behavior based on the examples and patterns provided in context.
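One simplified way to see the connection is that a single gradient-descent step on a least-squares objective, starting from zero weights, produces the same prediction as an unnormalized linear-attention readout over the context examples. The sketch below checks that identity numerically. It illustrates the analogy only and makes no claim about what trained softmax transformers actually compute; the function names and the toy data are assumptions.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def one_step_gd_prediction(xs: List[Vector], ys: List[float], query: Vector, lr: float) -> float:
    """Predict for `query` after one GD step on 0.5 * sum((w.x_i - y_i)^2) from w = 0."""
    dim = len(query)
    # The gradient at w = 0 is -sum(y_i * x_i), so one step gives w = lr * sum(y_i * x_i).
    w = [lr * sum(y * x[d] for x, y in zip(xs, ys)) for d in range(dim)]
    return dot(w, query)


def linear_attention_readout(xs: List[Vector], ys: List[float], query: Vector, lr: float) -> float:
    """Unnormalized linear attention with keys x_i, values y_i, and the query as x_q."""
    return lr * sum(y * dot(x, query) for x, y in zip(xs, ys))


# In-context examples (x_i, y_i) and a query point.
xs = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
ys = [1.0, 4.0, 3.0]
query = [2.0, 1.0]

gd = one_step_gd_prediction(xs, ys, query, lr=0.1)
attn = linear_attention_readout(xs, ys, query, lr=0.1)
print(abs(gd - attn) < 1e-12)  # True
```

The two functions compute the same sum in a different order: one builds the weights first, the other weights each example's label by its similarity to the query. No parameters are updated in the second form, which is the sense in which adaptation can happen inside a forward pass.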

This matters because many forms of optimization can be reinterpreted as special cases of structured adaptation. If a model can absorb evidence from prior attempts, infer what went wrong, and generate a better next step, then it is already behaving like an optimizer, even if the mechanism is quite unlike classical gradient descent.

Mechanistic interpretability pushes this question further. Recent work has begun to isolate circuits inside transformers that appear responsible for particular reasoning behaviors or contextual adjustments. The field is still young, and much of the evidence remains incomplete, but the broader implication is clear: transformers are not only storing language. They may also be implementing reusable computational motifs that support search, revision, and internal adaptation.

That said, the theory is still far behind the practice.

The biggest weakness appears under uncertainty. While transformers can reason impressively in structured settings, they often struggle in situations requiring robust exploration, calibrated confidence, or strategic action under stochastic conditions. That matters because optimization in the real world is rarely clean. Tasks are noisy. Evaluations are imperfect. Environments shift. A system that optimizes beautifully in a deterministic benchmark may falter badly when uncertainty becomes central.

This is why the theoretical agenda matters so much. Without a stronger account of how transformers perform optimization-like computation — and where that computation breaks down — the field risks overestimating both capability and reliability.

For now, the fairest conclusion is that transformers seem able to act as optimizers in ways that are empirically useful but not yet fully understood.

That is promising. It is also a warning.

The Limits of LLM-Based Optimization

It is tempting to treat LLM-based optimization as a universal solvent for messy agent design. That would be a mistake.

The first limitation is cost. Iterative optimization with large models can quickly become expensive, especially when candidate generation, critique, evaluation, and revision all require separate model calls. Even elegant algorithms can become financially impractical at scale.

The second is variance. Language-model outputs are noisy. The same input can yield different refinements, and some of those refinements may be superficially plausible but directionally wrong. This makes optimization less stable than in classical settings.

The third is weak guarantees. Gradient descent on a well-defined objective has mathematical structure behind it: under standard assumptions such as smoothness and convexity, its convergence can be analyzed and bounded. LLM-based optimization is often heuristic and interpretive. It can work extremely well without providing much assurance about when or why.

The fourth is brittleness under uncertainty. As noted earlier, these systems can perform poorly when environments are dynamic, ambiguous, or stochastic. Their talent for fluent refinement does not automatically translate into robust decision-making.

The fifth is overfitting to the judge. If optimization depends on another model’s evaluation, there is always a risk that the system learns to satisfy the evaluator rather than the task itself. This is not new — reward hacking is an old problem — but it takes on new forms when both optimizer and evaluator are language models.

These limitations do not invalidate the paradigm. They simply make clear that it is still a frontier, not a settled engineering discipline.

What This Means for Intelligent Agents

The significance of LLM-based optimization becomes clearest when viewed through the lens of agent design.

Agents are not just models. They are systems composed of prompts, workflows, memory mechanisms, tool interfaces, action policies, evaluation loops, and decision procedures. Optimizing such systems through purely numerical means is often cumbersome, indirect, or simply infeasible. Language models open a new path because they can optimize in the same medium in which many of these systems are specified.

That creates a powerful possibility: agents that can revise their own instructions, redesign their own workflows, improve their tool strategies, and increasingly participate in their own development.

In other words, LLM-based optimization is one of the enabling mechanisms of self-evolution in intelligent agents.

It is not the whole story. Reinforcement learning, search, external verifiers, tool systems, and environment feedback all remain essential. But language models provide something distinct: a flexible optimization layer that can bridge symbolic structure, natural language reasoning, and iterative refinement.

That is why this paradigm matters. It is not merely about making prompts better. It is about giving intelligent systems a more natural way to improve the structures through which they think and act.

What Comes Next in This Series

This article has focused on a crucial mechanism inside the emerging agent stack: how large language models can serve not just as generative engines, but as iterative optimizers.

That matters because self-improving agents need optimization machinery. They need ways to refine prompts, workflows, tools, and strategies without requiring everything to be hand-tuned or fully differentiable. LLM-based optimization is one of the clearest candidates for how that improvement loop may actually function in practice.

But once agents can optimize their own components, another question comes into view.

How should we think about agents not as isolated optimizers, but as participants in broader systems of coordination and collective behavior?

The next article turns to that larger problem. It moves from optimization within the agent to interaction among agents, examining how multiple intelligent systems communicate, coordinate, divide labor, and form larger architectures of collective intelligence.

If this article was about how agents learn to improve, the next is about what happens when many such agents begin to improve and operate together.

Series Note: Derived from Advances and Challenges in Foundation Agents

This series draws heavily from the paper Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems (Aug 2, 2025). The work brings together an impressive group of researchers from institutions including MetaGPT, Mila, Stanford, Microsoft Research, Google DeepMind, and many others to explore the evolving landscape of foundation agents and the challenges that lie ahead. We would like to sincerely thank the authors and researchers who contributed to this outstanding work for compiling such a comprehensive and insightful resource. Their research provides an important foundation for many of the ideas explored throughout this series.
