BI in the service of businesses.
About the Company
DGTL Performance is a digital services company specializing in data and business intelligence. Founded in 2018, the company serves clients in France and abroad, offering services such as subcontracting, try-and-hire placement, recruitment, and business outsourcing. With a focus on proximity, fair pricing, and ethical responsibility, DGTL helps clients optimize their data strategies and solutions.
About the Role
The Data Engineer (Spark/Hadoop) will play a crucial role in managing and optimizing big data pipelines. The position involves working with cutting-edge technologies across the Hadoop ecosystem, with a focus on high-performance batch processing, data transformations, and system optimization. The engineer will join a dynamic team, collaborating with clients in the insurance sector and ensuring the effective implementation of data solutions.
Responsibilities
- Lead and mentor the team, providing guidance on both technical and functional aspects.
- Analyze complex technical concepts and explain them in clear, accessible terms.
- Manage and optimize Spark-based data pipelines, ensuring high efficiency and performance.
- Work on Hadoop ecosystem components such as Hive, HDFS, YARN, and HBase.
- Implement data quality measures, data transformations, and mapping processes.
- Apply Agile methodologies (SAFe) and tools like JIRA in daily operations.
- Support data governance practices and ensure effective data lifecycle management.
- Collaborate with teams in the insurance industry (life insurance and provident insurance).
- Ensure optimal performance of Spark batch processing (see the illustrative sketch after this list).
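To give candidates a concrete sense of the work, here is a minimal sketch in Java of the kind of Spark batch job the role involves. It is illustrative only, not an artifact from a client project: the class name, input path, table name, and columns (ClaimsBatchJob, /data/claims, clean_claims, policy_id, amount) are all hypothetical.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ClaimsBatchJob {
    public static void main(String[] args) {
        // Local session for illustration; a production job would run on YARN.
        SparkSession spark = SparkSession.builder()
                .appName("ClaimsBatchJob")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical Parquet input.
        Dataset<Row> claims = spark.read().parquet("/data/claims");

        // Repartition by key before the wide aggregation to balance work across
        // executors, and cache because the result feeds two downstream writes.
        Dataset<Row> valid = claims
                .filter("amount > 0")
                .repartition(200, claims.col("policy_id"))
                .cache();

        valid.groupBy("policy_id").count()
                .write().mode("overwrite").parquet("/data/claims_by_policy");

        // In production this would typically target a Hive-managed table.
        valid.write().mode("overwrite").format("parquet").saveAsTable("clean_claims");

        spark.stop();
    }
}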
Required Skills
- Java: Strong experience in Java programming for data solutions.
- Spark: Deep expertise in Spark and its optimization for data processing.
- Hadoop Ecosystem: Excellent knowledge of Hadoop components, including Hive, HDFS, YARN, and HBase.
- Databases: Experience with both relational and NoSQL databases.
- CI/CD Tools: Familiarity with Git, Jenkins, and Kubernetes for continuous integration and deployment.
- Data Governance: Strong understanding of data governance and management practices.
Preferred Qualifications
- Experience with streaming systems like Spark Streaming and Kafka.
- Familiarity with MapR, Oozie, and Airflow.
- Expertise in data modeling for analytical and transactional systems.
- Knowledge of best practices for data governance and lifecycle management.