Software Engineer (Hadoop and Spark)

Dremio

The Intelligent Lakehouse Platform - Reinventing the data warehouse for the AI Era

About the Company

Dremio is the unified lakehouse platform for self-service analytics and AI, serving hundreds of global enterprises including Maersk, Amazon, Regeneron, NetApp, and S&P Global. Customers rely on Dremio for cloud, hybrid, and on-prem lakehouses to power data mesh, data warehouse migration, data virtualization, and unified data access use cases. Built on open source technologies such as Apache Iceberg and Apache Arrow, Dremio provides an open lakehouse architecture that enables the fastest time to insight and platform flexibility at a fraction of the cost.

About the Role

This role involves working on Dremio’s core features, including Reflections — a key component of Dremio’s query engine that combines materialized views with sophisticated automatic query rewrites through deep integration with Dremio’s distributed query optimizer. The position offers opportunities for leadership growth through mentoring, collaboration with other developers, and ownership of complex issues to deliver high-quality distributed systems at massive scale.
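To make the Reflections idea concrete, here is a deliberately simplified Java sketch (not Dremio's actual implementation; the table, column, and materialization names are made up): a reflection precomputes a query's result, and the rewrite step substitutes a scan of that materialization for a matching incoming query instead of recomputing it.

```java
import java.util.Map;
import java.util.Optional;

public class ReflectionRewriteSketch {
    // Hypothetical registry: a normalized query signature mapped to the
    // name of a materialization holding its precomputed result.
    static final Map<String, String> REFLECTIONS = Map.of(
        "SELECT region, SUM(amount) FROM sales GROUP BY region",
        "refl_sales_by_region");

    // If the incoming query matches a known reflection, rewrite it to read
    // the materialized table instead of re-running the aggregation.
    static Optional<String> rewrite(String query) {
        String target = REFLECTIONS.get(query.trim());
        return target == null
            ? Optional.empty()
            : Optional.of("SELECT region, total FROM " + target);
    }

    public static void main(String[] args) {
        System.out.println(
            rewrite("SELECT region, SUM(amount) FROM sales GROUP BY region")
                .orElse("no rewrite"));
    }
}
```

In a real engine the matching is done on relational algebra plans inside the optimizer (Apache Calcite provides this kind of materialized-view substitution), not on SQL strings as in this toy.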

Responsibilities

  • Own design, implementation, testing, and support of next-generation features related to Dremio’s Query Planner and Reflections technologies
  • Contribute to open source projects such as Apache Calcite and Apache Iceberg
  • Apply modular design patterns to deliver an architecture that is elegant, simple, extensible, and maintainable
  • Solve complex technical problems and customer issues while improving telemetry and instrumentation to proactively detect issues and enhance debugging efficiency
  • Design and deliver architectures optimized for public clouds including GCP, AWS, and Azure
  • Mentor team members to maintain high quality and strong design standards
  • Collaborate with Product Management to innovate and deliver customer requirements, and work with Support and field teams to ensure customer success

Required Skills

  • Bachelor’s or Master’s degree (or equivalent experience) in Computer Science or related technical field
  • Minimum 2 years of experience developing production-level software
  • Fluency in Java, C++, or another modern programming language
  • Strong foundation in data structures, algorithms, multi-threaded and asynchronous programming models, with experience developing distributed and scalable systems
  • Experience delivering, deploying, and managing microservices
  • Passion for learning and for platform quality: zero-downtime upgrades, availability, resiliency, and uptime, using the latest technologies
  • Motivation to work in a fast-moving startup environment with a collaborative and accomplished team

Preferred Qualifications

  • Strong database fundamentals, including SQL, performance tuning, and schema design
  • Understanding of distributed file and object storage systems such as HDFS, S3, or ADLS
  • Experience with AWS, Azure, Google Cloud Platform, and large-scale data processing systems (e.g., Hadoop, Spark)
  • Familiarity with materialized views and incremental view maintenance
  • Experience with distributed query engines and query processing or optimization
  • Knowledge of concurrency control, data replication, code generation, networking, storage systems, heap management, Apache Arrow, SQL operators, caching techniques, and disk spilling
  • Hands-on experience with multi-threaded and asynchronous programming models
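Incremental view maintenance, one of the topics above, can be illustrated with a minimal Java sketch (illustrative only; the region/amount schema is invented): rather than recomputing SUM(amount) GROUP BY region from scratch, each base-table change is applied as a delta to the stored aggregate.

```java
import java.util.HashMap;
import java.util.Map;

public class IncrementalViewSketch {
    // Materialized aggregate: region -> running SUM(amount).
    private final Map<String, Long> totals = new HashMap<>();

    // Apply one inserted base-table row as a positive delta.
    void onInsert(String region, long amount) {
        totals.merge(region, amount, Long::sum);
    }

    // Apply one deleted base-table row as a negative delta.
    void onDelete(String region, long amount) {
        totals.merge(region, -amount, Long::sum);
    }

    long total(String region) {
        return totals.getOrDefault(region, 0L);
    }

    public static void main(String[] args) {
        IncrementalViewSketch view = new IncrementalViewSketch();
        view.onInsert("EMEA", 100);
        view.onInsert("EMEA", 50);
        view.onDelete("EMEA", 30);
        System.out.println(view.total("EMEA")); // prints 120
    }
}
```

SUM is self-maintainable under inserts and deletes; aggregates like MIN/MAX need extra bookkeeping, which is where the harder engineering in production systems lives.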

Copyright © 2025 hadoop-jobs. All Rights Reserved.