
Senior Software Engineer - Application Reliability, Hybrid

Cisco Systems, Inc.
$199,700.00 to $254,600.00
life insurance, vision insurance, parental leave, paid holidays, sick time, 401(k)
United States, California, San Jose
170 W Tasman Dr
May 19, 2026
The application window is expected to close on 06/20/2026.

Job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.

This position is based in San Jose, CA or North Carolina and operates under a hybrid work model.

Meet the Team

Join Cisco's Enterprise AI team, the core group enabling Generative AI-powered experiences across Cisco. Our mission is to build secure, scalable AI platforms that empower teams to safely develop, deploy, and operationalize AI-powered solutions. We operate at the intersection of applied AI, cloud infrastructure, and security, partnering with engineering, security, compliance, and product teams to bring trusted AI to life at enterprise scale.

We are a fast-growing, highly collaborative team of platform engineers, AI engineers, and data scientists who value technical depth, ownership, and pragmatic execution. What makes this team exciting is the opportunity to define how secure Generative AI is built and governed inside a global technology leader.

As a Senior Software Engineer in Application Reliability, you will own the reliability of our AI-powered applications and features from the user's perspective.

While our infrastructure SRE team keeps the platform healthy, your focus will be on feature uptime, usage trends, automated issue identification, and self-healing remediation at the application layer. You will build LangGraph-based agents for automated diagnostics, Looker dashboards for observability, and evaluation harnesses for agent quality, all powered by BigQuery, BigTable, and Python. You will partner closely with application developers, data engineers, and infrastructure SREs to ensure our APIs, RAG systems, agents, and user-facing features are reliable, observable, and continuously improving.

Your Impact
  • Define, implement, and enforce feature-level SLIs, SLOs, and error budgets for APIs, RAG systems, AI agents, and user-facing applications.

  • Build and maintain application observability systems using Looker dashboards on BigQuery and BigTable, providing real-time visibility into feature health, error patterns, and usage trends for developers, PMs, and leadership.

  • Design and build LangGraph-based agents for automated issue identification and remediation: anomaly detection on BQ logs, root cause diagnosis, auto-rollback, feature flag kill switches, and self-healing workflows.

  • Develop agent evaluation harnesses to benchmark agent performance, test multi-step workflows, handle non-deterministic outputs, and run regression testing as agents evolve.

  • Write complex SQL (BigQuery) for usage trend analysis, anomaly detection, and operational analytics; design BQ table schemas optimized for observability and debugging.

  • Analyze application usage trends and adoption metrics to proactively identify reliability risks, capacity needs, and degraded user experiences before they become incidents.

  • Partner with application development teams to embed reliability practices into the development lifecycle: deployment safety (canary, progressive rollout), structured logging standards, and distributed tracing.

  • Lead application-level incident response, root cause analysis, and blameless postmortems focused on feature impact rather than infrastructure symptoms.

  • Build Python-based tooling and automation to reduce mean time to detect (MTTD) and mean time to resolve (MTTR) for application-layer issues.

  • Stay current with the rapidly evolving AI landscape (new frameworks, tools, and paradigms) and apply emerging techniques to improve platform reliability and developer productivity.

Minimum Qualifications
  • 10+ years of experience in software engineering with significant focus on reliability, observability, or production operations; Bachelor's or Master's Degree in Computer Science, Engineering, or a related technical discipline.

  • Strong Python development skills, with experience building production tooling, automation, and agent-based systems.

  • Production GCP experience: deploying and managing applications on GKE (Kubernetes), deep SQL expertise with BigQuery (complex queries, window functions, schema design, cost optimization), and hands-on experience with BigTable (or equivalent) for high-throughput operational data.

  • Proven experience designing and operating application-level SLI/SLO frameworks, burn-rate alerting, and error budget policies.

  • Strong debugging skills at the application layer: distributed tracing, profiling, structured log analysis, and dependency mapping.

Preferred Qualifications
  • Experience building agent evaluation harnesses (benchmarking, regression testing, guardrail validation for AI agents).

  • Familiarity with A2A protocols, streaming architectures, and event-driven systems.

  • Experience with deployment safety patterns: feature flags, canary deployments, progressive rollouts, and automated rollback.

  • Experience with GCP observability services (Cloud Logging, Cloud Trace, Cloud Monitoring).

  • Exposure to AIOps concepts: ML-driven anomaly detection, automated root cause analysis, intelligent alerting.

  • Experience driving reliability culture across engineering teams: SLO adoption, postmortem processes, and reliability reviews.

  • Active engagement with the evolving AI ecosystem; awareness of emerging tools and frameworks.

  • Hands-on experience with GenAI application development: LangGraph, agent engineering, prompt design, and agentic workflows.

  • Experience building Looker dashboards and LookML models for operational observability.

Why Cisco?

At Cisco, we're revolutionizing how data and infrastructure connect and protect organizations in the AI era and beyond. We've been innovating fearlessly for 40 years to create solutions that power how humans and technology work together across the physical and digital worlds. These solutions provide customers with unparalleled security, visibility, and insights across the entire digital footprint.

Fueled by the depth and breadth of our technology, we experiment and create meaningful solutions. Add to that our worldwide network of doers and experts, and you'll see that the opportunities to grow and build are limitless. We work as a team, collaborating with empathy to make really big things happen on a global scale. Because our solutions are everywhere, our impact is everywhere.

We are Cisco, and our power starts with you.

Message to applicants applying to work in the U.S. and/or Canada: The starting salary range posted for this position is $199,700.00 to $254,600.00 and reflects the projected salary range for new hires in this position in U.S. and/or Canada locations, not including incentive compensation*, equity, or benefits.

Individual pay is determined by the candidate's hiring location, market conditions, job-related skillset, experience, qualifications, education, certifications, and/or training. The full salary range for certain locations is listed below. For locations not listed below, the recruiter can share more details about compensation for the role in your location during the hiring process.

U.S. employees are offered benefits, subject to Cisco's plan eligibility rules, which include medical, dental, and vision insurance, a 401(k) plan with a Cisco matching contribution, paid parental leave, short- and long-term disability coverage, and basic life insurance. Please see the Cisco careers site to discover more benefits and perks. Employees may be eligible to receive grants of Cisco restricted stock units, which vest following continued employment with Cisco for defined periods of time.

U.S. employees are eligible for paid time away as described below, subject to Cisco's policies:

  • 10 paid holidays per full calendar year, plus 1 floating holiday for non-exempt employees

  • 1 paid day off for employee's birthday, paid year-end holiday shutdown, and 4 paid days off for personal wellness determined by Cisco

  • Non-exempt employees** receive 16 days of paid vacation time per full calendar year, accrued at a rate of 4.92 hours per pay period for full-time employees

  • Exempt employees participate in Cisco's flexible vacation time off program, which has no defined limit on how much vacation time eligible employees may use (subject to availability and some business limitations)

  • 80 hours of sick time off provided on hire date and each January 1st thereafter, and up to 80 hours of unused sick time carried forward from one calendar year to the next

  • Additional paid time away may be requested to deal with critical or emergency issues for family members

  • Optional 10 paid days per full calendar year to volunteer
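The 4.92-hour accrual rate quoted for non-exempt vacation time is consistent with 16 eight-hour days spread evenly over 26 biweekly pay periods. Both the 8-hour day and the biweekly pay cycle are assumptions; the posting does not state either.

```latex
% Assumes 8-hour days and 26 biweekly pay periods per year (not stated in the posting)
\frac{16\ \text{days} \times 8\ \text{h/day}}{26\ \text{pay periods}}
  = \frac{128\ \text{h}}{26}
  \approx 4.92\ \text{h per pay period}
```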

For non-sales roles, employees are also eligible to earn annual bonuses subject to Cisco's policies.

Employees on sales plans earn performance-based incentive pay on top of their base salary, which is split between quota and non-quota components, subject to the applicable Cisco plan. For quota-based incentive pay, Cisco typically pays as follows:

  • 0.75% of incentive target for each 1% of revenue attainment up to 50% of quota;

  • 1.5% of incentive target for each 1% of attainment between 50% and 75%;

  • 1% of incentive target for each 1% of attainment between 75% and 100%; and

  • Once performance exceeds 100% attainment, incentive rates are at or above 1% for each 1% of attainment with no cap on incentive compensation.
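For quota-based sales roles, the tiers above describe a piecewise-linear payout curve. The sketch below is one illustrative reading of that schedule, not Cisco's actual calculator. `quota_incentive_pct` is a hypothetical name, and because the posting says only that rates above 100% attainment are "at or above 1%", the `above_100_rate` parameter defaults to that 1% floor as an assumption.

```python
def quota_incentive_pct(attainment_pct: float, above_100_rate: float = 1.0) -> float:
    """Return the quota-based payout as a percentage of incentive target.

    Hypothetical helper: tiers follow the posting's stated schedule; the rate
    above 100% attainment is an assumption (the posting says "at or above 1%").
    """
    if attainment_pct < 0:
        raise ValueError("attainment must be non-negative")

    # Each tier: (upper bound of attainment %, payout % per 1% of attainment)
    tiers = [(50.0, 0.75), (75.0, 1.5), (100.0, 1.0)]

    payout = 0.0
    lower = 0.0
    for upper, rate in tiers:
        if attainment_pct <= lower:
            break
        # Credit only the portion of attainment that falls inside this tier
        payout += (min(attainment_pct, upper) - lower) * rate
        lower = upper

    # Uncapped accelerator beyond 100% (assumed flat rate)
    if attainment_pct > 100.0:
        payout += (attainment_pct - 100.0) * above_100_rate
    return payout
```

Under these assumptions, 50% attainment pays 37.5% of target, 75% pays 75%, and 100% pays exactly 100%. For a hypothetical $40,000 incentive target at 60% attainment, the sketch gives 52.5%, or $21,000.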

For non-quota-based sales performance elements such as strategic sales objectives, Cisco may pay 0% up to 125% of target. Cisco sales plans do not have a minimum threshold of performance for sales incentive compensation to be paid.

The applicable full salary ranges for this position, by specific state, are listed below:

New York City Metro Area:

$199,700.00 - $292,800.00

Non-Metro New York state & Washington state:

$174,500.00 - $260,500.00

* For quota-based sales roles on Cisco's sales plan, the ranges provided in this posting include base pay and sales target incentive compensation combined.

** Employees in Illinois, whether exempt or non-exempt, will participate in a unique time off program to meet local requirements.
