Value-Driven Solutions Delivered
About the Company
HSTechnologies LLC specializes in technology-intensive transformations, offering a comprehensive suite of data engineering services powered by Big Data tools. The company’s expertise spans product design and development, Big Data strategy and advisory, decision science and AI, technology evaluation, and end-to-end data services including integration, mining, profiling, cleansing, and enrichment.
About the Role
Lead the architecture, design, and development of components and services that enable Machine Learning at scale. Own data ingestion from disparate SAP and non-SAP sources, such as iEnergy and MDMS. Maintain and extend the big data platform infrastructure that supports the client’s business use cases. Identify and recommend the most appropriate paradigms and technologies for batch and real-time scenarios.
Responsibilities
- Design cluster-level solutions for complex systems and develop enterprise-level applications, backed by unit testing.
- Build data pipelines from source systems to SFTP, and from SFTP to the Hadoop landing layer, using Talend.
- Develop an automated data ingestion framework in Talend that synchronizes data between Hadoop and SAP HANA in both directions.
- Run complex queries and optimize them using bucketing, partitioning, joins, and sub-queries (see the partitioning and bucketing sketch after this list).
- Write advanced big data business application code using both functional and object-oriented programming.
- Implement complex transformations and actions on Spark DataFrames and Datasets in Scala.
- Develop standalone Spark/Scala applications that read error logs from multiple upstream sources and run validations (see the error-log validation sketch after this list).
- Write build scripts using Apache Maven, Ant, and SBT; deploy code via Jenkins for CI/CD (a sample SBT build definition follows this list).
- Write complex workflow jobs using Redwood and set up multiple program schedulers to manage Hadoop, Hive, Sqoop, and Spark jobs.
- Monitor pipeline jobs, troubleshoot failed jobs, and configure Redwood SC properties.
- Develop Kafka producers and consumers to publish and subscribe to streaming data within specified durations (see the Kafka producer sketch after this list).
- Mentor and teach other engineers on the team.
- Document functional and technical requirements following company methodologies.
- Perform data cleanups and validations on streaming data using Spark, Spark Streaming, and Scala (see the streaming cleanup sketch after this list).
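Illustrative Examples
The sketches below are minimal, hypothetical illustrations of the work described above, not the client’s actual code: every table, topic, path, and column name in them is invented for illustration.

First, a partitioned and bucketed write in Spark/Scala, the kind of physical layout that underpins the query optimization work above:

```scala
import org.apache.spark.sql.SparkSession

object PartitionedWriteExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-write-example")
      .enableHiveSupport() // bucketed tables are saved through the Hive metastore
      .getOrCreate()

    // Hypothetical staging table of meter readings.
    val readings = spark.table("staging.meter_readings")

    // Partition by date and bucket by meter_id so that date filters prune
    // partitions and joins on meter_id can avoid a full shuffle.
    readings.write
      .partitionBy("reading_date")
      .bucketBy(32, "meter_id")
      .sortBy("meter_id")
      .format("parquet")
      .saveAsTable("curated.meter_readings")

    spark.stop()
  }
}
```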
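Next, a standalone error-log validation application using Spark DataFrames and a typed Dataset; the record shape and validation rules are assumptions made for the sake of the example:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions._

// Hypothetical shape of one upstream error-log record.
case class ErrorRecord(source: String, timestamp: String, level: String, message: String)

object ErrorLogValidator {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("error-log-validator").getOrCreate()
    import spark.implicits._

    // Read JSON error logs landed by several upstream sources.
    val raw = spark.read.json("/landing/errors/*/current/")

    // Project into a typed Dataset, then keep only records that fail
    // validation: missing timestamps or unrecognized severity levels.
    val records: Dataset[ErrorRecord] = raw
      .select($"source", $"timestamp", $"level", $"message")
      .as[ErrorRecord]

    val invalid = records.filter(r =>
      r.timestamp == null || !Set("INFO", "WARN", "ERROR").contains(r.level))

    // Summarize failures per source for downstream alerting.
    invalid.groupBy($"source").agg(count("*").as("bad_records")).show()

    spark.stop()
  }
}
```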
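A sample SBT build definition for applications like the ones above; the versions shown are illustrative, not prescribed:

```scala
// build.sbt
name := "bigdata-pipelines"
version := "0.1.0"
scalaVersion := "2.12.18"

// Spark is supplied by the cluster at runtime, so it is marked "provided".
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"       % "3.3.2" % "provided",
  "org.apache.spark" %% "spark-streaming" % "3.3.2" % "provided",
  "org.apache.kafka" %  "kafka-clients"   % "3.3.2"
)
```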
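A minimal Kafka producer in Scala; the broker address, topic, key, and payload are placeholders:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object MeterEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // hypothetical broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one event; a real producer would loop over a source feed.
      val record = new ProducerRecord[String, String](
        "meter-events", "meter-42", """{"reading": 13.7}""")
      producer.send(record).get() // block until the broker acknowledges
    } finally {
      producer.close()
    }
  }
}
```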
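Finally, a streaming cleanup sketch using Spark Structured Streaming, one possible way to implement the last responsibility; the topic, paths, and cleanup rules are assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingCleanup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("streaming-cleanup").getOrCreate()
    import spark.implicits._

    // Read a stream of raw events from a hypothetical Kafka topic.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "meter-events")
      .load()

    // Cleanup and validation: decode the payload, trim whitespace,
    // and drop empty records before they reach the landing layer.
    val cleaned = raw
      .selectExpr("CAST(value AS STRING) AS payload")
      .withColumn("payload", trim($"payload"))
      .filter($"payload".isNotNull && length($"payload") > 0)

    // Write validated records to the Hadoop landing layer as Parquet.
    val query = cleaned.writeStream
      .format("parquet")
      .option("path", "/landing/clean/meter_events")
      .option("checkpointLocation", "/checkpoints/meter_events")
      .start()

    query.awaitTermination()
  }
}
```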
Required Skills
- Bachelor’s degree in Computer Science or equivalent.
- Expertise with Cloudera Hadoop (CDH), Cloudera Manager, Informatica Big Data Edition (BDM), HDFS, YARN, MapReduce, Hive, Impala, Kudu, Sqoop, Spark, Kafka, HBase, Teradata Studio Express, Teradata, Tableau, Kerberos, Active Directory, Sentry, TLS/SSL, Linux/RHEL, Unix, and Windows.
- Experience with SBT, Maven, Jenkins, Oracle, MS SQL Server, Shell scripting, Eclipse IDE, Git, and SVN.
- Strong problem-solving and analytical skills.
- Ability to identify complex problems, evaluate options, and implement effective solutions.