Infrastructure/Site Reliability Engineer
CV-Library is one of the leading job boards in the UK, attracting over 10 million monthly visits and boasting a CV database of over 15 million UK workers. This is why we are the job board of choice for over 10,000 customers who are looking to hire the best talent for their business. Our U.S. brand, Resume-Library, is one of the fastest-growing job boards in North America, and we have ambitious plans for further global expansion.
We are looking for an Infrastructure/Site Reliability Engineer to join an existing, highly skilled team working in a DevOps environment. You will support production applications, back-office services, cloud services and platform improvements, and act as both advisor and coach to other team members. You will have experience of diagnosing and fault-finding incidents using data insights, and of liaising closely with the delivery teams: reporting progress and gathering data, intelligence and information as requested. In this role you will be a key member of our team and will help shape the ongoing technical strategy across both CV-Library and Resume-Library.
Be responsible for the performance and reliability of the company’s global online platforms. Working within the Technology Team, troubleshoot issues with services via proactive and reactive monitoring, alerts and logging, and handle service requests communicated via Jira, email, Sprint meetings and stand-ups.
Enhance the tech stack and configuration of existing services to improve site performance and reduce issues through forensic analysis, and take responsibility for availability management, latency, efficiency, change management, monitoring, emergency response and capacity planning.
Record data and manage issues with a view to participation in reviews and Blameless Post-Mortems.
Explore and deliver on opportunities to implement automation and scripting of services, environments and toolsets.
Liaise closely with the application Developers, Sprint Teams and Development Managers, reporting progress and gathering data, readings and information as requested.
Design, implement, calibrate and validate systems in line with company procedures and processes, alongside routine service, emergency service and product updates as required.
Create a bridge between Development and Operations teams by applying an ‘as-a-service’ mindset to system administration, management and build topics. Gain exposure to systems in both staging and production, as well as to all technical teams. Take part in work with software development, support and IT operations, and in on-call duties.
Be an advocate for change with an innovative growth mindset, be an engaging, collaborative member of the Technology Team, and actively support your colleagues in Operations and the wider team.
Skills and Experience –
Infrastructure as Code – Terraform or similar
Monitoring platforms – ELK, Grafana, Prometheus or similar
Desirable (Using or Supporting):
Google Cloud Platform
Observability/APM Platforms (preferably New Relic)
PHP/Perl/Go/Java/Python
You will understand the mechanics of high-traffic, high-availability websites and related back-office services well enough to support and evolve pragmatic solutions, and you will be able to explain technical details to non-technical stakeholders. You will have previously operated within an Agile environment using both Kanban and Scrum (preferably via Jira).