Remote

Observability Architect

AHEAD
United States
Nov 16, 2024
We are looking for talented, creative, and proactive individuals who are passionate about solving complex business problems and contributing to the next generation of modern applications. Our goal is to help our customers understand the connections between application performance, user experience, and business outcomes, thereby creating exceptional customer experiences. Join us in shaping the future of Observability Engineering within our Intelligent Operations team, using innovative data and integration solutions and tools. Internally, this role is known as Principal Technical Consultant.

Responsibilities
- Architect and implement scalable, resilient observability solutions for large-scale enterprise environments, leveraging tools such as Datadog, New Relic, AppDynamics, Dynatrace, and open-source technologies.
- Lead the design, development, and optimization of monitoring, logging, and tracing systems, ensuring scalability, cost-efficiency, and business alignment.
- Serve as a technical advisor and thought leader, driving the integration of observability best practices across our clients' engineering and product teams.
- Collaborate with cross-functional stakeholders, including engineering leads, product managers, and business executives, to design observability strategies aligned with business goals.
- Mentor and coach team members, building technical expertise within the organization and fostering a culture of innovation and operational excellence.
- Conduct architectural reviews and evaluations to ensure observability solutions meet business, security, and compliance requirements.
- Stay ahead of industry trends in AI-powered observability, logging, monitoring, and cloud-native technologies, and guide teams in adopting these innovations.
- Create and maintain comprehensive documentation for observability systems, frameworks, and workflows to support knowledge sharing across teams.
- Establish and refine KPIs, develop custom dashboards, implement AIOps rules, and proactively address performance bottlenecks and anomalies.
- Participate in post-sales activities, including customer onboarding, training, and advanced technical escalation support when required.
- Drive strategic technology planning so that observability infrastructure remains scalable, resilient, and cost-effective as the business grows.

Qualifications
- 10-15 years of progressive, hands-on experience in Observability, Application Performance Management, or related fields, including at least 5 years in senior or lead roles.
- Extensive experience with Application Performance Management tools, including Datadog, New Relic, AppDynamics, Dynatrace, Splunk ITSI, Honeycomb, Chronosphere, Riverbed Aternity/Alluvio, ExtraHop, and LogicMonitor.
- Deep expertise in cloud-native, open-source observability solutions such as Prometheus, Grafana, the ELK stack/Elastic, and OpenTelemetry (OTel).
- Advanced understanding of public cloud observability tools, including AWS CloudWatch, Azure Application Insights, and Google Cloud Operations Suite (formerly Stackdriver).
- Strong foundational knowledge of distributed systems, networking, and database technologies, with experience architecting and scaling solutions in enterprise environments.
- Proven experience leading teams and setting technical direction, including architecting and building complex solutions from the ground up.
- Operational background with familiarity with ITIL/ITSM, SRE, and DevOps practices, and the ability to implement and evangelize these principles across teams.
- Demonstrated ability to mentor and develop junior engineers and to foster a culture of innovation and collaboration.
- Exceptional problem-solving, communication, and organizational skills, with experience presenting to executive stakeholders.