• Location: Remote
  • Type: Direct Hire
  • Job #257


As the Lead Kafka Engineer, you will lead the technical direction and implementation of the software pipeline and observability platform. You will be responsible for the technical aspects of ensuring that our hybrid cloud Kafka infrastructure is highly available, sustainable, performant, cost-effective, and usable by hundreds of developers. You will work closely with the Kafka Engineering Manager, software development leadership, technical operations leadership, architecture, the PMO, and vendors from around the globe to deliver these services.



  • Work with the Manager – Kafka Engineering to assess the needs of Software Engineering within the Kafka ecosystem
  • Establish, communicate, and advocate best practices and design patterns related to Kafka consumption
  • Participate in, and occasionally lead, the daily, weekly, and sprint-cycle team ceremonies, and ensure the team's activities are efficient and aligned to goals
  • Provide mentoring to team members
  • Assess and size effort associated with work backlog and participate in grooming
  • Advise and inform a program of work to mature the streams processing service offering
  • Collaborate with other operations teams to ensure a highly available service and responsive developer support
  • Interview and participate in building a team to establish a center of excellence and expertise in streams processing
  • Participate in regular planning cycles to align business priorities to programs of work within your organization
  • Inform recommendations, including resourcing, for strategic projects to mature and improve the service
  • Lead proofs of concept, engineering, and implementation projects



  • Strong operational background running Kafka clusters at scale
  • Knowledge of both physical/on-prem systems and public cloud infrastructure
  • Strong understanding of Kafka broker, Kafka Connect, and topic tuning and architectures
  • Strong understanding of Linux fundamentals as related to Kafka performance
    • Filesystem tuning and related kernel tuning and troubleshooting
    • Storage hardware and trade-offs
    • Network TCP stack tuning and troubleshooting
  • Background in both Systems and Software Engineering
  • Competent developing software in one or more high-level languages
  • Competent with configuration management in code/IaC including Ansible and Terraform
  • Competent operating the Java Runtime Environment (JRE) in large-scale environments (runtime settings, JMX, troubleshooting, garbage collection, etc.)
  • Passionate about data-driven operations; building and leveraging observability with tools such as Prometheus and Grafana
  • Knowledge of and experience with containers and Kubernetes clusters
  • Hands-on experience delivering complex software in an enterprise environment
  • Experience working in a remote team across multiple regions and time zones
  • Comfortable working with structured Change and Incident Management

