Site Reliability Analyst
Calero MDSL, Edinburgh
- Full time
- Permanent
Salary not available. View on company website.
Calero MDSL, Edinburgh
- Onsite working
- Full time
- Permanent
Posted 4 days ago, 4 Apr
Job ref: 69d9d3d87374463dbf9e5e95e7c6b67c
Full Job Description
The Site Reliability Analyst (SRA) will develop an in-depth understanding of the hosted application platforms at both hardware and application level, and will diagnose and resolve issues efficiently using defined playbooks. The SRA is responsible for monitoring the application platforms to identify performance issues, unexpected increases in load, application errors found through log analysis, capacity concerns, and any risks from single points of failure. The SRA will work closely with IT, DBAs, Developers, client support teams and internal Client Service Delivery Managers to investigate recurring issues, client performance problems and outages. Working as part of the Site Reliability Team, the SRA will be required to research and integrate alerts to provide proactive monitoring and awareness for other teams.
Duties and Responsibilities of the job:
- Actively monitor the performance and availability of the hosted application, investigating common issues across clients or platform versions. Develop performance baselines to measure clients against, in order to identify areas that need investigation or to alert development teams to potential issues as part of a regular report at Development and Operations meetings.
- Investigate issues across different types of servers and gateways, including Web and Database, using inbuilt tools to run performance diagnostics in support of investigations into client-reported performance issues.
- Work as part of the wider Site Reliability Team, which includes using communication channels to update others on active issues, investigating overnight alerts, and reacting to client-specific reported issues as a priority.
- Work with IT Infrastructure and IT Security to understand data flow, service dependencies, permissions, data throughput and security.
- Work with Database Administrators, Development and client Service Operations Managers (SOMs) to provide data analysis that evidences an issue or demonstrates that problems have been resolved.
- During outages and incidents, provide regular analysis reports to key business leads, the Support desk and the Technology team.
- Enhance monitoring and logging of key client-impacting areas. Implement alerts and perform alert reviews to understand how effective those alerts are.
- Provide regular reporting to the business on common trends, improvements made, and areas that have degraded.
- Understand the application logging process, events, and what to look for in different situations, including before and after hotfixes and upgrades.
- Develop automation through scripting for common tasks.
- Build wiki articles for new processes and update existing wiki articles for internal use. Maintain Service Catalogues for platform services.
- Respond to requests assigned through the ticketing service in a timely and efficient manner. Work with the Support desk, take ownership of running critical outage events, and engage with Escalation Managers.
- Seek to learn and apply new technologies, analyze new situations, and design solutions using a variety of technologies.
- Assist with the IT Disaster Recovery plans for the hosted application, including testing and review.
A minimum of 4 years of professional application administration or equivalent experience is required. A combination of experience and education may be considered.
Experience and Training:
- Experience with Windows and Linux operating system logs.
- Experience of how web services and databases operate, specifically Microsoft IIS and SQL.
- 3 years in a Client Support role, IT Support, or another related role that required log data analysis and research to identify issues.
- 3 years using monitoring tools such as Azure Monitor, Application Insights, SolarWinds, or similar cloud or Application Performance Monitoring tools.