Staff DevOps Engineer — Alchemy
As an engineer focused on node operations and infrastructure at Alchemy, you’ll work within a fast-paced engineering team on the design, deployment, and continuous improvement of the blockchain infrastructure that powers our globally used developer platform. You’ll operate state-of-the-art systems for running blockchain RPC nodes at scale across many chains and regions, and draw on your experience to keep a highly available live system running for our customers.
Responsibilities:
- Deploy, operate, and maintain blockchain RPC nodes across multiple chains and geographic regions
- Manage the Kubernetes clusters that serve as the underlying platform for all blockchain nodes
- Perform rolling upgrades and hard fork migrations for blockchain clients across EVM and non-EVM chains
- Participate in on-call rotations, triage live incidents via PagerDuty, and coordinate resolution across teams for node outages, latency spikes, and SLO breaches
- Develop and maintain AI agents and automation tooling for health checks, auto-healing, and hard fork notifications
- Deploy and manage services via ArgoCD and GitOps workflows (Helm charts)
- Manage bare-metal and cloud infrastructure, including provisioning, benchmarking, and hardware replacement
- Respond to security advisories promptly and coordinate upgrades with minimal downtime
- Contribute to postmortems and async review processes; track action items and follow up on resolutions
- Collaborate cross-functionally with product, customer success, and other engineering teams on chain deprecations, capacity planning, and SLO reporting
What We’re Looking For:
- Experience designing and operating large-scale, multi-region, multi-cloud production systems
- Experience with Kubernetes (k3s or similar), including StatefulSets, storage management, Secrets, and service mesh (Istio)
- Experience with secrets management and access control in multi-cluster environments
- Familiarity with automation frameworks for node health checks, upgrades, and remediation workflows
- Experience with Infrastructure-as-Code (e.g., Terraform, Ansible, Pulumi, CloudFormation, Chef, Puppet)
- Experience with GitOps tooling (ArgoCD, Helm) and managing deployments
- (Preferred) Experience with service mesh deployments such as Istio
- Proficiency with cloud infrastructure and bare-metal management, including storage provisioning and snapshot management
- Strong grasp of observability tooling (Grafana, Prometheus, Alertmanager) and experience building or tuning dashboards and alert rules
- Comfort working in an on-call environment, triaging production incidents quickly and calmly using PagerDuty and structured runbooks
- Ability to write clear technical documentation and postmortems, and contribute to async-first team communication
- Experience with networking, including configuring and managing VPC networks
- A basic understanding of security best practices
- (Preferred) Good understanding of web applications and microservice architecture
- (Preferred) Experience working with startups
- Passion for blockchain technologies and Web3
Perks:
- Attractive salary package
- Opportunity to work with the latest cloud and blockchain technologies
- Flexible time away
- Private medical insurance
- Start-up environment: internal off-site hackathons and access to a company-rented hacker house during the summer
- Opportunity to travel across offices
When applying, mention the word CANDYSHOP to show you read the job post completely.
