Value-Driven Solutions Delivered
Job Summary:
We’re looking for a skilled professional to lead the design, architecture, and development of scalable machine learning components and services. You’ll play a key role in enabling advanced analytics by managing large-scale data ingestion and infrastructure.
Key Responsibilities:
- Design and develop systems for large-scale machine learning applications.
- Manage data ingestion from a variety of sources, including SAP and non-SAP systems such as iEnergy and MDMS.
- Maintain and enhance a big data platform tailored to the organization's needs.
- Recommend the right tools, frameworks, and architectures for batch and real-time data processing.
- Design solutions at the cluster level and develop enterprise-grade applications with thorough unit testing.
- Build data pipelines from source systems to SFTP, and from SFTP to Hadoop landing zones, using Talend.
- Create automated ingestion frameworks in Talend to synchronize data between Hadoop and SAP HANA in both directions.
- Execute complex data queries using techniques such as bucketing, partitioning, joins, and sub-queries.
- Develop advanced big data applications using both functional and object-oriented programming techniques.
- Perform complex data transformations in Spark/Scala using DataFrames and Datasets (see the transformation sketch after this list).
- Build standalone Spark/Scala applications that analyze error logs from multiple sources and perform data validations.
- Write build scripts with tools such as Apache Maven, Ant, and SBT, and deploy code through Jenkins as part of a CI/CD pipeline (a build definition sketch follows this list).
- Develop complex workflow jobs in Redwood and configure job scheduling for Hadoop, Hive, Sqoop, and Spark.
- Monitor and troubleshoot pipeline jobs and configure Redwood Scheduler properties.
- Create Kafka producers to process real-time streaming data (see the producer sketch after this list).
- Mentor junior engineers and contribute to team knowledge sharing.
- Document technical and functional specifications in line with company standards.
- Perform real-time data validation and cleanup using Spark, Spark Streaming, and Scala (see the streaming sketch after this list).
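Illustrative Sketches:
A minimal Spark/Scala sketch of the DataFrame work described above: joining two Hive tables, filtering invalid rows, and writing the result partitioned and bucketed for efficient downstream queries. The database, table, and column names (staging.meter_readings, customer_id, and so on) are hypothetical and only indicate the kind of work involved.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ReadingEnrichmentJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ReadingEnrichmentJob")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical staging tables
    val readings  = spark.table("staging.meter_readings")
    val customers = spark.table("staging.customers")

    // Drop null readings, join on customer_id, derive a date column for partitioning
    val enriched = readings
      .filter(col("reading_value").isNotNull)
      .join(customers, Seq("customer_id"))
      .withColumn("reading_date", to_date(col("reading_ts")))

    // Partition by date and bucket by customer_id so later queries can prune data
    enriched.write
      .mode("overwrite")
      .partitionBy("reading_date")
      .bucketBy(8, "customer_id")
      .saveAsTable("curated.enriched_readings")

    spark.stop()
  }
}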
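A minimal Kafka producer sketch in Scala, matching the real-time streaming responsibility above. The broker address and topic name (meter-events) are assumptions made for illustration.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object MeterEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // hypothetical broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one illustrative event; a real job would loop over incoming records
      val record = new ProducerRecord[String, String]("meter-events", "meter-42", """{"reading": 13.7}""")
      producer.send(record)
      producer.flush()
    } finally {
      producer.close()
    }
  }
}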
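A sketch of real-time validation and cleanup using Spark Structured Streaming in Scala (the team's stack may use the older DStream API instead): it reads JSON events from Kafka, keeps only well-formed, non-negative readings, and writes the clean stream to Parquet. The topic, schema, and paths are hypothetical.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object StreamingValidationJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingValidationJob")
      .getOrCreate()

    // Expected shape of incoming JSON events (hypothetical schema)
    val schema = new StructType()
      .add("meter_id", StringType)
      .add("reading", DoubleType)
      .add("event_ts", TimestampType)

    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // hypothetical broker
      .option("subscribe", "meter-events")               // hypothetical topic
      .load()

    // Parse the JSON payload and keep only records with a present, non-negative reading
    val parsed = raw
      .select(from_json(col("value").cast("string"), schema).as("event"))
      .select("event.*")
    val valid = parsed.filter(col("reading").isNotNull && col("reading") >= 0)

    val query = valid.writeStream
      .format("parquet")
      .option("path", "/data/curated/meter_events")           // hypothetical output path
      .option("checkpointLocation", "/checkpoints/meter_events")
      .start()

    query.awaitTermination()
  }
}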
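A minimal build.sbt sketch showing how such Spark/Scala jobs might be packaged with SBT before a Jenkins pipeline deploys them; the project name and library versions are illustrative, not requirements of the role.

// Hypothetical project name and versions
name := "bigdata-jobs"
version := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Spark is provided by the cluster at runtime
  "org.apache.spark" %% "spark-sql"            % "3.3.2" % "provided",
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.3.2",
  "org.apache.kafka" %  "kafka-clients"        % "3.3.2"
)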
Required Qualifications:
- Bachelor's degree in Computer Science or a related field.
- Experience with technologies such as Cloudera Hadoop (CDH), Cloudera Manager, Informatica BDM, HDFS, YARN, MapReduce, Hive, Impala, Kudu, Sqoop, Spark, Kafka, HBase, Teradata, Tableau, Kerberos, Active Directory, Sentry, TLS/SSL, Linux/RHEL, Unix, Windows, SBT, Maven, Jenkins, Oracle, MS SQL Server, shell scripting, Eclipse IDE, Git, and SVN.
- Strong analytical and problem-solving skills.
- Ability to assess complex issues, analyze related data, and develop effective solutions.
To Apply:
If you’re interested in a dynamic, fast-paced work environment and enjoy challenging and impactful projects, please send your resume to:
HSTechnologies LLC
2801 W Parker Road, Suite #5
Plano, TX 75023
Email: [email protected]