Sr. Hadoop Administrator

  • Full Time
  • San Francisco
  • 150 - 200
InfoCepts


The mission of the Big Data Operations team is to help teams harness the power of Big Data by providing a reliable and robust platform. We’re currently building a Next‑Gen Big Data platform on AWS while maintaining and scaling the existing platform in our data centers to meet current demands. We’re responsible for capacity planning, security, and disaster recovery for our Next‑Gen platform in AWS. It is very important for us to provide greater visibility into the operational telemetry of our Big Data platform by collecting logs and metrics from various sources and setting alarms so that we identify issues proactively rather than reacting to them.

Responsibilities

  • Design, build, scale, and maintain the infrastructure in both our data centers and AWS to support Big Data applications.
  • Design, build, and own the end‑to‑end availability of the Big Data platform in both AWS and the data center.
  • Improve the efficiency, reliability, and security of our Big Data infrastructure, while ensuring a smooth experience for developers and analysts.
  • Work on automation to build and maintain the new platform on AWS.
  • Build custom tools to automate day‑to‑day operational tasks.
  • Be responsible for setting the standards for our production environment.
  • Take part in 24×7 on‑call rotation with the rest of the team and respond to pages and alerts to investigate issues in our platform.

Qualifications

  • Strong experience with Hadoop ecosystem components such as HDFS, YARN, Hive, Spark, Oozie, Presto, and Ranger.
  • MUST have strong experience with Amazon EMR.
  • Good working experience with RDS and a good understanding of IaaS and PaaS.
  • Strong foundation in Hadoop security, including SSL/TLS, Kerberos, and role‑based authorization.
  • Experience tuning the performance of Hadoop clusters, ecosystem components, and MapReduce/Spark jobs.
  • Experience with infrastructure automation using Terraform, CI/CD pipelines (Git, Jenkins, etc.), and configuration management tools like Ansible.
  • Able to leverage technologies such as Kubernetes, Docker, and the ELK stack to help our Data Engineers/Developers scale their efforts in creating new and innovative products.
  • Experience providing and implementing log‑based monitoring solutions using CloudWatch, CloudTrail, and Lambda.
  • Ability to conduct a post‑mortem when something goes wrong with your systems: identify what went wrong and provide a detailed root‑cause analysis (RCA).
  • Proficiency in Bash, plus Python or Java.
  • Good understanding of all aspects of the JRE/JVM, including GC tuning.
  • Hands‑on experience with RDBMS (Oracle, MySQL) and basic SQL.
  • Hands‑on experience with Snowflake is a plus.
  • Hands‑on experience with Qubole and Airflow is a plus.
