Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts
arXiv AIArchived May 20, 2026✓ Full text saved
arXiv:2605.19093v1 Announce Type: new Abstract: System prompts are a central control mechanism in modern AI systems, shaping behavior across conversations, tasks, and user populations. Yet they are difficult to tune when feedback is available only as aggregate metrics rather than per-example labels, failures, or critiques. We study this aggregate feedback setting as sample-constrained black-box optimization over discrete, variable-length text. We introduce ReElicit, a Bayesian optimization frame
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Artificial Intelligence
[Submitted on 18 May 2026]
Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts
Zhiyuan Jerry Lin, Benjamin Letham, Samuel Dooley, Maximilian Balandat, Eytan Bakshy
System prompts are a central control mechanism in modern AI systems, shaping behavior across conversations, tasks, and user populations. Yet they are difficult to tune when feedback is available only as aggregate metrics rather than per-example labels, failures, or critiques. We study this aggregate feedback setting as sample-constrained black-box optimization over discrete, variable-length text. We introduce ReElicit, a Bayesian optimization framework based on \emph{embedding by elicitation}. Given a task description, previously evaluated prompts, and scalar scores, an LLM elicits a compact, interpretable feature space and maps prompts into it. Leveraging a probabilistic Gaussian process surrogate, an acquisition function then selects target feature vectors, which the LLM realizes and refines into deployable system prompts. Re-eliciting the feature space as new evaluations arrive lets the representation adapt to the observed prompt-score history. We evaluate the setting using offline benchmark accuracy as a controlled aggregate proxy: the optimizer observes one scalar score per prompt and no per-example labels, errors, or critiques. Across ten system prompt optimization tasks with a 30 total evaluation budget, ReElicit achieves the strongest aggregate performance profile among representative aggregate-only prompt-optimization baselines. These results suggest that LLMs can serve as adaptive semantic representation builders, not only prompt generators, for Bayesian optimization over natural-language artifacts.
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2605.19093 [cs.AI]
(or arXiv:2605.19093v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2605.19093
Focus to learn more
Submission history
From: Zhiyuan Jerry Lin [view email]
[v1] Mon, 18 May 2026 20:28:17 UTC (142 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.AI
< prev | next >
new | recent | 2026-05
Change to browse by:
cs
cs.LG
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)