InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language
arXiv AIArchived Apr 24, 2026✓ Full text saved
arXiv:2604.21061v1 Announce Type: new Abstract: The application of artificial intelligence (AI) in IVF has shown promise in improving consistency and standardization of decisions, but often relies on annotated data and does not make use of the multimodal nature of IVF data. We investigated whether foundational vision-language models can be fine-tuned to predict natural language descriptions of embryo morphology and development. Using a publicly available embryo time-lapse dataset, we fine-tuned
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Artificial Intelligence
[Submitted on 22 Apr 2026]
InVitroVision: a Multi-Modal AI Model for Automated Description of Embryo Development using Natural Language
Nicklas Neu, Thomas Ebner, Jasmin Primus, Raphael Zefferer, Bernhard Schenkenfelder, Mathias Brunbauer, Florian Kromp
The application of artificial intelligence (AI) in IVF has shown promise in improving consistency and standardization of decisions, but often relies on annotated data and does not make use of the multimodal nature of IVF data. We investigated whether foundational vision-language models can be fine-tuned to predict natural language descriptions of embryo morphology and development. Using a publicly available embryo time-lapse dataset, we fine-tuned PaliGemma-2, a multi-modal vision-language model, with only 1,000 images and corresponding captions, describing embryo morphology, embryonic cell cycle and developmental stage. Our results show that the fine-tuned model, InVitroVision, outperformed a commercial model, ChatGPT 5.2, and base models in overall metrics, with performance improving with larger training datasets. This study demonstrates the potential of foundational vision-language models to generalize to IVF tasks with limited data, enabling the prediction of natural language descriptions of embryo morphology and development. This approach may facilitate the use of large language models to retrieve information and scientific evidence from relevant publications and guidelines, and has implications for few-shot adaptation to multiple downstream tasks in IVF.
Comments: 15 pages, 2 figures
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.21061 [cs.AI]
(or arXiv:2604.21061v1 [cs.AI] for this version)
https://doi.org/10.48550/arXiv.2604.21061
Focus to learn more
Submission history
From: Florian Kromp [view email]
[v1] Wed, 22 Apr 2026 20:05:37 UTC (1,535 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.AI
< prev | next >
new | recent | 2026-04
Change to browse by:
cs
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)