Mirage is an AI-native video platform that intelligently orchestrates production and editing through natural language. Our models leverage contextual awareness to execute the same creative decisions a professional editor would, dramatically improving productivity for experienced teams while making video creation accessible to anyone. We're an interdisciplinary team addressing some of the most difficult technical and creative challenges in generative media. As an early member of our team, you'll tackle foundational problems that remain largely unsolved across the industry, driving an outsized impact on the future of creative expression.

More about us
Product: Captions by Mirage
Research: Seeing Voices, technical white paper
Updates: Mirage on X (Twitter)
Press: TechCrunch, Forbes AI 50, Fast Company

Our Investors
We're very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, General Catalyst, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

Please note that all of our roles require you to be in person at our NYC HQ (located in Union Square).
About the Role
Mirage is seeking a Research Scientist to push the boundaries of large language models for multimodal creative tasks. You'll develop new approaches for adapting and extending LLMs to understand and operate over complex, real-world data, particularly video. This role focuses on advancing model capabilities, improving reasoning and control, and enabling new forms of interaction between language and time-based media.
Responsibilities
- Develop novel approaches for training and adapting large language models
- Design new objectives, datasets, and fine-tuning strategies
- Explore multimodal reasoning and structured generation
- Run systematic experiments to improve model behavior and reliability
- Design evaluation frameworks for complex, real-world tasks in video analysis
- Analyze failure modes and iterate on model improvements

What makes you a great fit
- MS/PhD in ML, CS, or related field
- Strong track record in LLMs, NLP, video understanding, or multimodal learning
- Deep understanding of transformers and modern LLM techniques
- Experience with fine-tuning, alignment, or post-training methods, especially for adapting models to generate structured outputs
- Strong experimental rigor and research taste
Benefits
- Comprehensive medical, dental, and vision plans
- 401K with employer match
- Commuter benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a bite!
- Grubhub subscription
- Health & wellness perks
- Multiple team offsites per year, with team events every month
- Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.

Please note benefits apply to full-time employees only.
Salary
$175,000 - $275,000
Location
New York, NY, USA
Total raised
$175.0M
Last stage
Growth