Job Locations: US-MO-Saint Louis
Primary Posting Location (City): Saint Louis
Primary Posting Location (State/Province): MO
Primary Posting Location (Postal Code): 63101
Primary Posting Location (Country): US
Requisition ID: 2024-434297
Position Type: Full Time
Category: Professional (IT, Finance, Legal, HR, Talent Acquisition, Administrative, Customer Service)
Minimum: USD $111,700.00/Yr.
Maximum: USD $145,200.00/Yr.
Summary
As an SRE Architect with a specialization in DevOps, monitoring, and diagnostics, you will play a critical role in ensuring the reliability, availability, and performance of our mission-critical services. You will design and implement end-to-end monitoring solutions, build observability pipelines, and help create scalable systems for proactive incident detection, diagnostics, and root cause analysis. In this role, you will work closely with engineering, product, and operations teams to drive a culture of reliability and continuous improvement.
Monitoring & Observability:
- Design and implement comprehensive monitoring and alerting solutions for production systems across multiple environments (cloud, on-prem, hybrid).
- Develop and refine metrics collection and visualization strategies using tools like Prometheus, Grafana, OpenTelemetry, and others.
- Build dashboards and custom monitoring solutions to ensure system health, performance, and security.
- Establish SLIs (Service Level Indicators), SLOs (Service Level Objectives), and SLAs (Service Level Agreements) to align with business goals.
Incident Management & Diagnostics:
- Develop and implement tools and systems for real-time diagnostics and root cause analysis during incidents.
- Lead post-mortem analysis and drive remediation of systemic issues to prevent future incidents.
- Design diagnostic tools and automation to reduce mean time to detection (MTTD) and mean time to resolution (MTTR).
- Collaborate with engineering teams to define monitoring standards and ensure that new features and services meet reliability and observability requirements.
System Design & Architecture:
- Architect scalable, resilient, and highly available systems with observability baked in from the start.
- Apply SRE principles to design and optimize services for reliability, availability, and performance.
- Identify and address single points of failure, bottlenecks, and other operational risks in production environments.
Automation & Tooling:
- Create, maintain, and improve automation tools that enhance monitoring, diagnostics, and incident response.
- Integrate monitoring and observability tools into CI/CD pipelines for proactive issue detection and remediation.
- Contribute to the development of custom diagnostic tools for troubleshooting complex, distributed systems.
Collaboration & Knowledge Sharing:
- Collaborate with software engineering, platform engineering, and DevOps teams to ensure seamless integration of monitoring and diagnostics practices.
- Mentor and coach junior SREs and other team members on best practices for observability and incident management.
- Stay up to date with the latest industry trends and innovations in monitoring, diagnostics, and reliability engineering.
Education & Training Experience:
- Experience with advanced observability techniques, such as synthetic monitoring, canary deployments, and feature flags.
- Certification in cloud platforms (AWS, GCP, Azure), or monitoring tools (e.g., Prometheus Certified Associate).
- Previous experience in an SRE or DevOps leadership role.
- Knowledge of serverless architecture, microservices, and edge computing environments.
- Strong experience in distributed systems, cloud platforms (AWS, GCP, Azure), and container orchestration (Kubernetes, Docker).
- Deep knowledge of monitoring tools such as Datadog and Cloud Monitoring.
- Proficient in instrumentation techniques (e.g., OpenTelemetry, StatsD, custom metrics).
- Experience with log aggregation and analysis tools like ELK Stack (Elasticsearch, Logstash, Kibana), Splunk, or similar.
- Expertise in alerting and notification systems, including PagerDuty, Opsgenie, or VictorOps.
Architect position
This position is an individual contributor.
Travel required: 5%
Job Will Remain Open Until Filled
The Company is one of North America's leading sales and marketing agencies specializing in outsourced sales, merchandising, category management, and marketing services to manufacturers, suppliers, and producers of food products and consumer packaged goods. The Company services a variety of trade channels including grocery, mass merchandise, specialty, convenience, drug, dollar, club, hardware, consumer electronics, and home centers. We bridge the gap between manufacturers and retailers, providing consumers access to the best products available in the marketplace today.
Responsibilities
- Leads medium- to large-scale projects throughout the entire lifecycle: solution architecture, engineering design, development, testing, production, and subsequent fixes and improvements.
- Provides technical guidance to the executive team and makes wide-scale architectural and design decisions. Estimates, assesses, and manages project timelines with the management and executive teams.
- Reviews designs and mission-critical code to ensure it is clear, concise, tested, and easily understood by others, and that it meets standards, architectural principles, and non-functional requirements (NFRs).
- Demonstrates mastery of all components of key features and architecture for multiple products, with a high-level understanding of several other products, integrations, and capabilities.
- Understands, advocates for, and contributes to ADV technology and engineering standards and technology best practices.
- Demonstrates an ability to succeed in a wide range of complex technical situations across multiple axes: e.g., scale, uncertainty, and interconnectedness.
- Is a resource for other teams that need help with adjacent features.
- Advises the management team with insights and recommendations that will improve the team. Helps to create job description requirements, and participates in interview loops. Mentors multiple teammates.
Supervisory Responsibilities
Direct Reports: This position does not have supervisory responsibilities for direct reports.
Indirect Reports: May delegate work to others and provide guidance, direction, and mentoring to indirect reports.
Travel Requirements
This position requires 10% travel.
Minimum Qualifications
Education Level: Bachelor's degree in Computer Science, Software Engineering, or related field. Master's degree preferred.
Experience Requirements: 5-10+ years of experience in engineering, programming, software development, data structures, algorithms, operating systems, networks, and concurrent/event-based development.
Environmental & Physical Requirements
Office / Sedentary Requirements
Incumbent must be able to perform the essential functions of the job. Work is performed primarily in an office environment. Typically requires the ability to sit for extended periods of time (66%+ each day), the ability to hear the telephone, and the ability to enter data on a computer; may require the ability to lift up to 10 lbs.
Knowledge, Skills, and Abilities
- Advanced understanding of engineering, programming, and software development foundations.
- Strong knowledge of data structures, algorithms, operating systems, networks, and programming languages.
- Expertise in concurrent and event-based development, and development/test frameworks.
- Exceptional leadership and strategic decision-making skills.
- Ability to work collaboratively and influence senior leadership in shaping the company's technology and product direction.
Additional Information Regarding Job Duties and Job Descriptions
Job duties include additional responsibilities as assigned by one's supervisor or other manager related to the position/department. This job description is meant to describe the general nature and level of work being performed; it is not intended to be construed as an exhaustive list of all responsibilities, duties, and skills required for the position. The Company reserves the right at any time with or without notice to alter or change job responsibilities, reassign or transfer job positions, or assign additional job responsibilities, subject to applicable law. The Company shall provide reasonable accommodations of known disabilities to enable a qualified applicant or employee to apply for employment, perform the essential functions of the job, or enjoy the benefits and privileges of employment as required by the law.
Important Information
The above statements are intended to describe the general nature and level of work being performed by people assigned to this position. They are not intended to be an exhaustive list of all responsibilities, duties and skills required of associates so classified.
The Company is committed to providing equal opportunity in all employment practices without regard to age, race, color, national origin, sex, sexual orientation, religion, physical or mental disability, or any other category protected by law. As part of this commitment, the Company shall provide reasonable accommodations of known disabilities to enable an applicant or employee to apply for employment, perform the essential functions of the job, or enjoy the benefits and privileges of employment as required by the law.