Spartan Capital Group is seeking a Software Engineer/Site Reliability Engineer (SRE) to join the new enterprise Site Reliability Engineering (SRE) team.
In This Role, You Will:
Lead complex technology initiatives including those that are companywide with broad impact;
Act as a key participant in developing standards and companywide best practices for engineering complex, large-scale technology solutions across technology engineering disciplines;
Design, code, test, debug, and document for projects and programs;
Review and analyze complex, large-scale technology solutions for tactical and strategic business objectives, enterprise technological environment, and technical challenges that require in-depth evaluation of multiple factors, including intangibles or unprecedented technical factors;
Make decisions in developing standards and companywide best practices for engineering and technology solutions, which requires an understanding of industry best practices and new technologies, and influence and lead the technology team to meet deliverables and drive new initiatives;
Collaborate and consult with key technical experts, the senior technology team, and external industry groups to resolve complex technical issues and achieve goals;
Share 24x7 on-call responsibilities for major incident response, and lead advanced troubleshooting across the stack in complex failure scenarios;
Establish the Site Reliability Engineering practice, igniting its principles and culture and leading by example. Assist in training the organization by building the practice and contributing to the Spartan SRE Playbook, Engineering Academies, Tech Talks, and Demos;
Automate key SRE metrics and IT Service Operations processes, including customer impact, % availability of critical business flows, SLO/SLA adherence, and error budget. Automate the incident process for IT Service Operations by integrating data with unified communications and alerting/notification systems;
Share support responsibilities for critical applications and customer journeys onboarded to SRE, including remediating issues through Agile, conducting blameless postmortems and root cause analysis, and introducing continuous improvement that solves problems once and for all, with the goal of no repeats.
Qualifications:
5+ years of Software Engineering experience.
5+ years of systems analysis experience, systems programming experience, or a combination of both.
5+ years of experience with Agile methodologies.
3+ years of experience in DevOps.
Knowledge of messaging systems (e.g., Apache Kafka).
Strong systems engineering and administration skills.
Experience with monitoring tools such as Prometheus.
Experience working in an SOA environment.
Experience working with cloud-based systems such as AWS.
Working knowledge of one or more database products (e.g., Oracle, MySQL, MongoDB).