Data Engineer
3-5 Years
Full-Time

Who are we?
We are Spyne, redefining how cars are marketed and sold with cutting-edge Generative AI. What started as a bold idea, using AI-powered visuals to help dealers sell online faster, has evolved into a full-fledged AI-first automotive retail ecosystem.

Backed by $16M in Series A funding from Vertex Ventures, Accel, and other top investors, we're scaling fast:
✔ Expanded across the US & EU markets
✔ Launched industry-first AI-powered Image & 360° solutions
✔ Achieved a 5× revenue surge in 15 months, aiming for 3–4× growth this year

🚀 Know Our Journey
2020: Launched as a visual merchandising platform
2023: Pivoted to AI-driven automotive retail solutions
2024: Achieved 5× revenue growth in 15 months, aiming for 3–4× more
Today: Driving the GenAI revolution with AI-powered sourcing, pricing, CRM, and Agentic AI for dealerships

👉 Read more about us:
Studio AI Product
Vini AI Product
Series A Announcement on YourStory
Series A Coverage in Economic Times
Autocar Pro News

What Are We Looking For?
We're seeking a highly skilled Data Engineer to establish and own Spyne's dedicated data engineering function, our first. As Spyne AI accelerates market penetration across US rooftops and processes massive, high-velocity streams of unstructured computer vision payloads (Studio AI) and conversational state events (Vini AI), our data warehousing volume and complexity have scaled exponentially.

This is not a maintenance role. You will actively restructure our entire data platform, taking absolute ownership from our DevOps team and raising our data infrastructure to true tech-industry standards. You'll build the foundational data layer that powers our BI platforms, ML observability, and long-term analytics roadmap.

📍 Location: Gurugram (Work from Office, 5 days a week)
🖥 Role: Full-Time, Data Engineer

What Will You Do?
Data Warehousing Architecture & Modeling: Design, scale, and own our core enterprise Data Warehouse built on ClickHouse Cloud, implementing robust data modeling methodologies, efficient time-based partitioning, and the centralized foundational data layer that powers all downstream BI platforms.
CDC & Schema Evolution: Spearhead our transition from self-hosted Debezium/Kafka to managed ClickPipes; design performant ELT pipelines using ClickHouse Materialized Views, JSONExtract, and arrayJoin functions to parse deep, complex MongoDB Atlas JSON arrays into clean, flattened analytical tables.
Advanced ClickHouse Engine Tuning: Manage SharedReplacingMergeTree tables partitioned by time; handle complex edge cases including cross-partition physical deletions (MongoDB tombstone events) and eliminate Cartesian explosions during array joins and LEFT JOIN operations.
Event Sourcing for ML Pipelines: Maintain and optimize our append-only observability architecture (SQS → Lambda → ClickHouse Async Inserts) to track GPU and CPU ML workloads orchestrated via AWS Step Functions and AWS Batch, leveraging AggregatingMergeTree and anyLast state combinators to unify partial state updates.
Performance Optimization & OOM Prevention: Troubleshoot and optimize heavy analytical queries to prevent concurrent Out-Of-Memory crashes when Metabase dashboards fire heavy models simultaneously; aggressively push down filters and leverage GLOBAL IN hash-lookups to eliminate broadcast overhead.
AWS Data Networking: Navigate secure cross-account data transit within public-internet-denied cloud perimeters using AWS PrivateLink, VPC Lattice Service Networks, and MSK Multi-VPC connectivity secured via IAM authentication.
Data Platform Ownership: Define and drive our long-term data warehousing roadmap; document architecture decisions, establish data quality standards, and reduce the data workload currently falling on the DevOps team.
Collaboration: Partner closely with ML Engineering, Product, and DevOps teams to ensure data pipelines are reliable, observable, and aligned with evolving product requirements.

What You Must Have?
Experience: 3-5 years in a dedicated data engineering role, with proven ownership of production-grade data warehouse or analytics infrastructure.
ClickHouse: Deep, hands-on expertise with ClickHouse, including engine selection (ReplacingMergeTree, AggregatingMergeTree), Materialized Views, partitioning strategies, and query optimization. ClickHouse Cloud experience is strongly preferred.
ELT & CDC Pipelines: Demonstrated experience designing and operating Change Data Capture pipelines using Debezium, Kafka, or managed equivalents (ClickPipes, AWS DMS); strong command of schema evolution and data transformation patterns.
Complex Data Modeling: Proficiency in parsing and flattening deeply nested JSON structures (JSONExtract, arrayJoin); experience modeling data from NoSQL sources (MongoDB Atlas) into analytical schemas.
Event-Driven & Streaming Architectures: Hands-on experience with event sourcing patterns, async insert architectures, and streaming systems such as Apache Kafka / AWS MSK and AWS SQS/Lambda.
AWS Infrastructure: Strong working knowledge of AWS data services (S3, Lambda, Step Functions, Batch, MSK) and AWS networking constructs including PrivateLink, VPC, and IAM-based authentication.
Performance Debugging: Proven ability to diagnose and resolve OOM errors, slow analytical queries, and pipeline bottlenecks at scale.
Scripting & Automation: Proficient in Python and SQL for pipeline development, data validation, and operational tooling.
Multi-Database Expertise (Strong Plus): Architecture and performance-tuning experience across MySQL, PostgreSQL, MongoDB, and Kafka; comfort navigating polyglot data environments is highly valued.
Education: Bachelor's or master's degree in Computer Science, Data Engineering, or a related field.

Why is Spyne an Employee-Centric Company?
🚀 Comprehensive Health & Life Coverage – GMC, GPA, and GTLI benefits for you and your family
Performance-Driven Growth – Fast career progression, ownership from Day 1, and stock options for top performers
Elevate Learning & Development – Access LinkedIn Learning, mentorship programs, and hands-on AI data projects to upskill daily
Collaborative Office Culture – Thrive in our energetic, innovation-first workplace

Why Spyne?
Strong Culture: A supportive, transparent environment with high autonomy
Competitive Compensation: Market-leading salary, equity, and benefits
Dynamic Growth: Join us at a pivotal growth stage and be the architect of our entire data platform, not just a contributor
Cutting-Edge Tech: Work with ClickHouse Cloud, distributed ML pipelines, and real-time automotive AI data at a scale very few engineers encounter
Total raised: $25.4M
Last stage: Series A