As the Lead Kafka Engineer, you will lead the technical direction and implementation of the software pipeline and observability platform. You will be responsible for ensuring that our hybrid cloud Kafka infrastructure is highly available, sustainable, performant, cost-effective, and usable by hundreds of developers. You will work closely with the Kafka Engineering Manager, software development leadership, technical operations leadership, architecture, PMO, and vendors from around the globe to deliver these services.
ROLE RESPONSIBILITIES:
Work with the Manager – Kafka Engineering to assess the needs of Software Engineering within the Kafka ecosystem
Establish, communicate, and advocate best practices and design patterns related to Kafka consumption
Participate in and occasionally lead the daily, weekly, and sprint-cycle team ceremonies, and ensure the team's activities are efficient and aligned to goals
Provide mentoring to team members
Assess and size effort associated with work backlog and participate in grooming
Advise and inform a program of work to mature the streams processing service offering
Collaborate with other operations teams to ensure a highly available service and responsive developer support
Interview and participate in building a team to establish a center of excellence and expertise in streams processing
Participate in regular planning cycles to align business priorities to programs of work within your organization
Inform recommendations, including resourcing, for strategic projects to mature and improve the service
Lead proofs of concept, engineering, and implementation projects
TECHNICAL REQUIREMENTS:
Strong operational background running Kafka clusters at scale
Knowledge of both physical/on-prem systems and public cloud infrastructure
Strong understanding of Kafka broker, Kafka Connect, and topic tuning and architectures
Strong understanding of Linux fundamentals as related to Kafka performance
Filesystem tuning and related kernel tuning and troubleshooting
Storage hardware and trade-offs
Network TCP stack tuning and troubleshooting
Background in both Systems and Software Engineering
Competent developing software in one or more high-level languages
Competent with configuration management in code/IaC including Ansible and Terraform
Competent operating the Java Runtime Environment (JRE) in large-scale environments (runtime settings, JMX, troubleshooting, garbage collection, etc.)
Passionate about data-driven operations; building and leveraging observability with tools such as Prometheus and Grafana
Knowledge of and experience with containers and Kubernetes clusters
Hands-on experience delivering complex software in an enterprise environment
Experience working in a remote team across multiple regions and time zones
Comfortable working with structured Change and Incident Management