[Remote] Senior Manager, Resiliency Engineering & L4 Support (Digital Banking)
BMO
Note: This is a remote job open to candidates in the USA. BMO is a financial institution focused on creating positive change for customers and communities. It is seeking a Senior Manager, Resiliency Engineering & L4 Support to lead resiliency engineering and support for its digital banking channels, with a focus on performance optimization, stability, and scalability.
Responsibilities
- Leads resiliency engineering and L4 development support for the digital banking channels (web and mobile)
- Owns assessment, governance, and implementation of resilience patterns (circuit breakers, timeouts/retries, bulkheads, graceful degradation, failover/DR) and code-level observability
- Acts as the development arm for SRE/Service Delivery to deliver instrumentation, production hardening, and fixes for incidents requiring code changes
- Defines the strategy for continuous enhancement, performance optimization, stability, and scalability across our modern Java services, web clients, and native/hybrid mobile apps
- Establishes and governs resiliency standards, design guardrails, readiness checks, and production change controls
- Implements fault-tolerance patterns and libraries in services and clients; enables kill-switches, rate limiting, and backpressure
- Delivers observability: distributed tracing, metrics, logs, health endpoints, synthetic probes, and error taxonomies
- Serves as L4 development support for production incidents
- Defines and maintains runbooks, SLIs/SLOs for critical journeys, DR playbooks; conducts regular failover exercises
- Partners with SRE/Service Delivery to translate operational needs into code-level instrumentation and monitoring enhancements
- Collaborates with API Governance and Platform Engineering on gateway policies, dependency hardening, and release safety (canary/blue-green)
- Improves performance and stability via caching, connection pooling, dependency isolation, capacity planning, and traffic shaping
- Guides DR architecture (multi-AZ/region, active-active/passive) aligned to RTO/RPO and regulatory requirements
- Influences cross-functional delivery and provides technical mentorship without formal line management
- Fosters a culture aligned to BMO purpose, values and strategy and role models BMO values and behaviours in all that they do
- Ensures alignment between values and behaviour that fosters diversity and inclusion
- Regularly connects work to BMO’s purpose, sets inspirational goals, defines clear expected outcomes, and ensures clear accountability for follow through
- Builds interdependent teams that collaborate across functional and operating groups to create the highest value for all stakeholders
- Improves team performance, recognizes and rewards performance, coaches employees, supports their development, and manages poor performance
- Operates at a group/enterprise-wide level and serves as a specialist resource to senior leaders and stakeholders
- Applies expertise and thinks creatively to address unique or ambiguous situations and to find solutions to problems that can be complex and non-routine
- Implements changes in response to shifting trends
- Broader work or accountabilities may be assigned as needed
Skills
- Resiliency patterns: circuit breakers, retries/timeouts, bulkheads, fallbacks, rate limiting, backpressure, feature flags/kill-switches
- Observability: OpenTelemetry-based tracing; metrics/logging with Prometheus/Grafana and Splunk/ELK; health endpoints and synthetic monitoring
- DR/BCP and failover: multi-AZ/region, active-active/passive designs; clear RTO/RPO ownership
- CI/CD and release safety: Git-based pipelines, canary/blue-green, automated rollbacks, progressive delivery
- Cloud and platforms: AWS (networking, compute, storage, databases, monitoring) and Red Hat OpenShift/Kubernetes; containerization
- Backend: Java with Spring/Spring Boot; API gateways and governance; resilience libraries (e.g., Resilience4j)
- Web and mobile: modern web apps (Angular) and hybrid/native mobile (Ionic, iOS, Android) including offline-first and graceful degradation
- Data and integration: RDBMS/NoSQL, caching (Redis), messaging/streaming (Kafka/SQS); idempotency and exactly-once patterns
- Web architecture, server-side concepts, and version control
- Technical writing/documentation; verbal & written communication
- Organization, collaboration & team skills; relationship building
- Analytical and problem-solving skills; influence skills; data-driven decision making
- Learning agility; ability to operate across multiple stakeholder groups
- Technical leadership role with direct reports; candidates with informal team lead experience are encouraged to apply
- Typically 7+ years of relevant experience in software engineering/tech lead roles and post-secondary degree in a related field, or equivalent experience
Benefits
- Health insurance
- Tuition reimbursement
- Accident and life insurance
- Retirement savings plans
Company Overview
- We’re a bank, but there’s more to it than that. When you join BMO, it opens a world of opportunities. BMO was founded in 1817 and is headquartered in Toronto, Ontario, Canada, with a workforce of 10,001+ employees. Its website is http://www.bmo.com.
Company H1B Sponsorship
- BMO has a track record of offering H1B sponsorships, with 7 in 2025, 2 in 2024, 6 in 2023, 4 in 2022, 2 in 2021, 2 in 2020. Please note that this does not guarantee sponsorship for this specific role.
Job Type
- Job Type: Full Time
- Location: United States
