We are looking for a full-time mid- to senior-level DevOps Engineer / Site Reliability Engineer (SRE)
who is comfortable applying modern “dev” principles to sysadmin practices across the whole lifecycle,
from design and release through production support to ongoing development. We’re a dynamic
engineering technology company with plenty of opportunities to take the initiative in a highly
technical environment.
The role is multifaceted: around one-third to one-half of your time will involve working with our
software, hardware, and test engineering teams to help them deliver our awesome products; the
balance will be systems engineering and SRE work, delivering the infrastructure that powers the
company’s services and improving the flexibility and performance of our production environments.
- Write a Python program to poll a network device for health metrics for ingestion into the
monitoring and alerting system.
- Create a file sharing service so internal teams can securely supply sensitive data to partners.
- Troubleshoot layer two networking issues on trunk and access ports for the Software Engineering
team’s testing rack.
- Re-engineer our CI/CD system’s implementation, including configuration management, to deliver a more reliable, scalable, and secure solution.
- Design and build a company-wide nearline storage and backup system based on ZFS.
- Provide guidance and consultation to engineering on improvements to build processes,
including Dockerisation and artefact management.
- Deliver an authenticated application edge router (reverse proxy) solution, enabling authentication
for legacy services and future integration into an SSO identity service.
- Linux or Windows systems and higher-layer services expertise (OS through to web and database services)
- Containerisation, container management and orchestration: Docker, Kubernetes, and LXC
- Provisioning and configuration management: Kickstart, Ansible, Ansible AWX/Tower, Puppet, Chef
- Solid networking fundamentals, design, and troubleshooting knowledge (to the level of CCNA / CompTIA Network+; certification is nice to have but by no means required)
- Proven troubleshooting and fault analysis skills
- Graceful under fire – able to stay calm and focused under pressure
- Interpersonal skills (working with developers, engineers, business, operations, and partners)
- Ability to work independently and as part of a team
- Driven, with grit, and a let’s-get-it-done-right attitude
- Agile: flexible, open to change and able to preach the “continuous improvement” message
- Version control and repository management: Git, Artifactory
- Monitoring and alerting systems: Prometheus, Grafana
- Traffic management and content distribution systems: Traefik, Envoy, HAProxy, etc.; Istio; Cloudflare
- CI/CD systems: Jenkins, Bitbucket Pipelines, Travis CI, CircleCI, etc.
- Web stack: Nginx or Apache, MongoDB, Node.js, Meteor