Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Much of our software development focuses on building infrastructure and eliminating work through automation.
We are distributed across time zones and continents, and we embrace remote work. In the EngOps team, we follow the infrastructure-as-code approach and practice GitOps. Our on-call rotation uses the follow-the-sun pattern.
We all have different backgrounds and are determined to help you succeed no matter where you are or who you are. If you think you would do a great job at Chainlink, we are looking forward to speaking with you, even if you don’t match 100% of the job requirements: those describe people we’ve usually had a great time working with, but they’re not a tick-box exercise.
As a Site Reliability Engineer for the Engineering Operations team you will:
- Maintain all on-chain and job orchestration configurations
- Automate and reduce complexities around product operations
- Evangelize and enact best practices as experts to guide high-quality Site Reliability Engineering
- Make tooling user-friendly and accessible to create self-sufficient operational experts across the company and our network of Node Operators
- Continue delivering operational tasks within agreed SLAs to expand scalability and reliability
- Deliver high product velocity while protecting reliability and operability
- Support production systems by being on-call
- Deploy and maintain various externally-facing services
- Improve the reliability and observability of Chainlink services
- Provide our engineers with reliable automations and empower them to deploy and maintain Chainlink services in a repeatable and stable manner
- Support monitoring services that watch over the entire Chainlink network
- Support Incident Response by shortening the duration of incidents while keeping an active feedback loop that ensures the operations and reliability of our systems improve over time
- Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
- Engage in and improve the whole lifecycle of services—from inception and design, through to deployment, operation and refinement
- Manage execution of project priorities, deadlines, and deliverables
- Provide technical leadership for the local team and work closely with partner team technical leads
Skills and Qualifications
- Excellent communication skills and a sense of ownership
- 4+ years of relevant professional experience. You have a software engineering background and/or an operations background and have worked as an SRE or in a related role before
- Experience architecting, developing, and troubleshooting distributed systems
- Fluency in design patterns to build performant, resilient and highly available systems
- Proficiency as a software developer. You can not only read and write code, but also identify opportunities and implement sound solutions to automate routine tasks and eliminate toil
- Experience with system architecture. You can create a design document for a performant and highly available application, involving multiple types of storage, cross-region load-balancing, caching layers and messaging infrastructure
- Excitement for blockchain and Web 3.0
- Willingness to go on-call. Reliability is our most important feature, and because on-call is an essential component of a reliable system, we take it very seriously
- Professional experience with Golang, TypeScript, or both
- Experience running a blockchain full node as an operator is a big plus
- Experience with Chainlink as a developer or a node operator is a big plus
- Comfort working with network protocols, proxies, and load balancers
- Experience with CI/CD pipelines. You’ve worked on both software delivery and cloud-based services deployment
- Experience with information security and DevSecOps
- Experience working remotely in a distributed team
- Experience with container orchestration
- Some of the tools and services we use daily or almost daily are:
- AWS; Terraform/Terragrunt; Kubernetes, Calico and ArgoCD; Prometheus and Grafana; GitHub Actions; Packer
- We expect you to be comfortable with most of those tools and very proficient in several of them.
At Chainlink Labs, we’re committed to the key operating principles of ownership, focus, and open dialogue. We practice complete ownership, where everyone goes the extra mile to own outcomes into success. We understand that unflinching focus is a superpower and is how we channel our activity into technological achievements for the benefit of our entire ecosystem. We embrace open dialogue and critical feedback to arrive at an accurate and truthful picture of reality that promotes both personal and organizational growth.
About Chainlink Labs
Chainlink is the industry standard oracle network for connecting smart contracts to the real world. With Chainlink, developers can build hybrid smart contracts that combine on-chain code with an extensive collection of secure off-chain services powered by Decentralized Oracle Networks. Managed by a global, decentralized community of hundreds of thousands of people, Chainlink is introducing a fairer model for contracts. Its network currently secures billions of dollars in value for smart contracts across the decentralized finance (DeFi), insurance, and gaming ecosystems, among others. The full vision of the Chainlink Network can be found in the Chainlink 2.0 whitepaper. Chainlink is trusted by hundreds of organizations—from global enterprises to projects at the forefront of the blockchain economy—to deliver definitive truth via secure, reliable data.
This role is location-agnostic, so you can work from anywhere in the world, but we ask that some of your working hours overlap with Eastern Standard Time (EST).
We are a fully distributed team and have the tools and benefits to support you in your remote work environment.