About the company:
Avenue Code is the leading software consultancy focused on delivering end-to-end development solutions for digital transformation across every vertical. We’re privately held, profitable, and have been on a solid growth trajectory since day one. We care deeply about our clients, our partners, and our people. We prefer the word ‘partner’ over ‘vendor’, and our investment in professional relationships is a reflection of that philosophy. We pride ourselves on our technical acumen, our collaborative problem-solving ability, and the warm professionalism of our teams.
For over 10 years, Avenue Code has believed in and promoted plurality initiatives, understanding that recognizing differences and fostering a safe environment, employment opportunities, representation, and support are the best ways to build an increasingly equitable culture.
About the opportunity:
We are seeking a highly motivated temporary worker with experience in Site Reliability Engineering to join the SiteOps Engineering team and collaborate closely with one of our clients, a leading multinational finance company.
As an SRE, you will be reporting to the Senior Manager of Engineering and will be responsible for ensuring the reliability of our ecosystem (both Native and Web), contributing to overall change, incident and problem management, and partnering with cross-functional teams to drive continuous improvement.
This is a hybrid opportunity to work in Oakland, CA, or Charlotte, NC, 2-3x at the office.
Responsibilities:
- Contribute to overall change, incident, and problem management in our environment, with a focus on troubleshooting, quickly restoring our essential services, and preventing future outages.
- Participate in a once-a-month 24×7 on-call rotation and take leadership of severe incidents to help minimize impact.
- Assist engineering teams by conducting truly blameless post-mortems with focused action items to drive continuous improvements.
- Provide insights on trends of issues affecting reliability and partner in cross-functional projects to provide scalable solutions.
- Review and advise on high-risk platform changes to minimize impact to the site and maximize success for stakeholders.
- Work within a large distributed system based on Cloud Native services.
- Maintain an automation-centric vision and incorporate SRE methodologies to increase reliability and decrease toil.
- Create operating standards to help drive reliability.
Required Qualifications:
- Experience with Site Reliability Engineering with a focus on Infrastructure, Platform, and Application (Cloud, Containerization, Container Orchestration, Network, Application Reliability, Database Architecture) and an understanding of full-stack and Software Development Life Cycle (SDLC) practices in a DevOps or continuous-release environment.
- Experience in running critical incidents in a global or company-wide context, engaging with executives and senior leadership, and leading root cause analysis sessions.
- Experience running and monitoring applications at scale, using metrics and tracing tools such as New Relic, Datadog, Stackdriver, Zipkin, and Prometheus.
- Professional experience with Python, Go, or similar programming languages.
- Familiarity with SRE methodologies; passionate about solving operational challenges by using automation and software.
- Ability to communicate effectively, both vertically and horizontally within the organization, with strong written and verbal communication skills.
Nice to Have:
- Ability to drive troubleshooting through a pragmatic and collaborative approach.
- Ability to construct clear and concise insights from data to promote and champion measurable improvements.
- Experience working with Cloud Native services in a Public Cloud, e.g. Google Cloud Platform, AWS, Azure.
Avenue Code discloses salary range information based on our commitment to fairness and transparency. We consider a wide range of factors such as internal equity, geographic location, relevant education, qualifications, certifications, experience, skills, seniority, business or organizational needs, and others. At Avenue Code, it is not typical for an individual to be hired at or near the top of the range for their role, and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range for a Senior Site Reliability Engineer is from 120k to 150k per year.
Avenue Code reinforces its commitment to privacy and to all the principles guaranteed by global data protection laws such as GDPR, LGPD, CCPA, and CPRA. Candidate data shared with Avenue Code will be kept confidential and will not be transmitted to unrelated third parties, nor will it be used for purposes other than the application for open positions. As a consultancy, Avenue Code may share your information with its clients and with other companies from the CompassUol Group to which Avenue Code’s consultants are allocated to perform their services.