Initiate. Ideate. Innovate.
About the Company
Salem Infotech is a global product engineering company that delivers strategic, technology-driven solutions through an integrated Software-as-a-Service (SaaS) model. Headquartered in Herndon, Virginia (USA) with a development center in Chennai, India, the company serves clients across three continents in key sectors such as Education, Logistics, and Healthcare. Leveraging the Salesforce platform, Salem Infotech specializes in application development, cloud-based services, platform re-engineering, predictive analytics, digital transformation (VR/AR, NLP), and API management. As a full-service consulting partner, the company's goal is to empower organizations to unlock business value through CRM, Cloud, and Digital technologies.
About the Role
The Hadoop Developer will play a key role in designing, developing, and optimizing large-scale data engineering pipelines within a cloud-native environment. This position requires strong expertise in Apache Spark, AWS Glue, and data lake architectures, with an emphasis on performance, scalability, and governance. The successful candidate will collaborate with cross-functional teams to ensure high-quality data delivery that supports analytical and business intelligence initiatives.
Responsibilities
- Design, develop, and maintain scalable data pipelines using Apache Spark, PySpark, and AWS Glue (see the example pipeline sketch after this list).
- Implement and optimize ETL workflows to support data ingestion, transformation, and analytics.
- Manage and monitor data storage solutions within AWS S3 and AWS Lake Formation.
- Apply SQL and data modeling best practices to build high-performance analytical systems.
- Work with orchestration tools such as Airflow or AWS Step Functions to automate data workflows (a minimal DAG sketch follows this list).
- Integrate data governance frameworks and implement IAM-based access controls for compliance.
- Contribute to CI/CD pipelines and use infrastructure-as-code tools like Terraform or CloudFormation for deployment automation.
- Collaborate with analysts, architects, and business users to understand data requirements and deliver reliable solutions.
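For context on the Spark/Glue pipeline work described in the first responsibility, below is a minimal sketch of an AWS Glue PySpark job that reads raw JSON from S3, applies a simple transformation, and writes partitioned Parquet to a curated zone. The bucket names, column names, and job structure are illustrative assumptions for this posting, not a prescribed implementation.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job bootstrap: resolve the job name and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw JSON from a hypothetical landing-zone bucket.
raw = spark.read.json("s3://example-landing-bucket/orders/")

# Drop rows missing key fields and derive a date column for partitioning.
clean = (
    raw.dropna(subset=["order_id", "order_ts"])
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write partitioned Parquet to a hypothetical curated-zone bucket.
(clean.write
      .mode("overwrite")
      .partitionBy("order_date")
      .parquet("s3://example-curated-bucket/orders/"))

job.commit()
```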
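Likewise, the orchestration responsibility noted above might look like the following minimal Airflow DAG sketch, which triggers a Glue job on a daily schedule via the Amazon provider's GlueJobOperator. The DAG ID, Glue job name, and region are hypothetical placeholders, and the `schedule` argument assumes Airflow 2.4 or later.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Hypothetical DAG that runs the curated-zone Glue job once per day.
with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_glue_job = GlueJobOperator(
        task_id="run_orders_glue_job",
        job_name="orders-curation-job",  # hypothetical Glue job name
        region_name="us-east-1",         # assumed region
        wait_for_completion=True,
    )
```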
Required Skills
- Minimum 8 years of experience in software development or data engineering.
- Strong hands-on expertise with Apache Spark and PySpark.
- Proven experience with AWS Glue (v2 or v3), AWS S3, Athena, and AWS Lake Formation.
- Proficient in SQL, data partitioning, and data modeling techniques.
- Familiarity with CI/CD, Git, and infrastructure-as-code tools (Terraform, CloudFormation).
- Experience working with Redshift, Snowflake, or similar cloud data warehouses.
- Knowledge of data governance, IAM permissions, and Lake Formation access policies.
- Exposure to Glue Studio, Glue DataBrew, or AWS DataZone.
Preferred Qualifications
- Strong problem-solving skills and ability to work independently.
- Excellent communication and documentation skills.
- Familiarity with cloud cost optimization and data performance tuning.
- Certification in AWS Data Analytics or AWS Solutions Architect is an advantage.