Rather than defining that all IT service requests will be fulfilled in five hours, for example, create separate SLAs for each IT service you want to track. A service level agreement is created to describe the quality of service a customer or end user can expect from a service provider. The final document is typically called an operations support plan. In summary, service level management allows an organization to move from a reactive support model to a proactive support model where network availability and performance levels are determined by business requirements, not by the latest set of problems. It includes critical success factors for service-level management and performance indicators to help evaluate success. You will not achieve the desired service level overnight. This scenario works well when the organization is building basic reactive support SLAs. The network operations group and the necessary tools groups can perform the following metrics. New phones will be ordered and delivered within one week of request. Additional details include the following: onsite support business hours and procedures for off-hours support; priority definitions, including problem type, maximum time to begin work on the problem, maximum time to resolve the problem, and escalation procedures; products or services to be supported, ranked in order of business criticality; support for expertise expectations, performance-level expectations, status reporting, and user responsibilities for problem resolution; geographic or business unit support-level issues and requirements; problem management methodology and procedures (call-tracking system); network error detection and service response; network availability measurement and reporting; network capacity and performance measurement and reporting. Networking organizations tend to struggle with proactive service definitions for several reasons.
Let them become part of the process so they can understand your service levels and you can write your SLAs to their needs. Keep in mind that these statistics may apply only to completely redundant core networks and don't factor in non-availability due to local-loop access, which is a major contributor to non-availability in WAN networks. For more proactive management SLA aspects, we recommend a technical team of network architects and application architects. You should also cover current initiatives and progress in improving individual situations. The environment uses backup generators and UPS systems for all network components and properly manages power. Build a Clear Strategy. Your SLA should define any usual and unusual situations that will hinder or prevent IT service processing. Here are some tips for taking SLAs to a whole new level of ease and effectiveness. The availability model in the next section can help you set realistic goals. Step 8: Determine the Parties Involved in the SLA; Step 10: Understand Customer Business Needs and Goals; Step 11: Define the SLA Required for Each Group; Step 14: Hold Workgroup Meetings and Draft the SLA; Step 16: Measure and Monitor SLA Conformance. We recommend general definitions by geographic area. Make sure that user groups understand that additional levels of service will cost more, and let them decide whether the higher level is a critical business requirement. The charter should express the goals, initiatives, and time frames for the SLA. Include the first area of proactive service definitions in all operations support plans. These individuals communicate SLA issues to their respective workgroups. Another measure of service level management success is the service level management review. However, you may be interested in comparing the two to understand potential theoretical availability compared to the actual measured result.
The service level definition for primary goals, availability, and performance should include: parties responsible for measuring availability and performance; parties responsible for availability and performance targets. For example, a customer might insist his application is the most critical within the corporation when in reality the cost of downtime for that application is significantly less than others in terms of lost revenue, lost productivity, and lost customer goodwill. Primary service/support SLAs will normally have many components, including the level of support, how it will be measured, the escalation path for SLA reconciliation, and overall budget concerns. In these cases, a set budget is allocated to the network, which may overreact to current needs or grossly underestimate the requirement, resulting in failure. This can lead a support organization into providing premier service to individual groups, a scenario that may undermine the overall service culture of the organization. This helps make the SLA process similar to any modern quality improvement program. By understanding the needs of the various business groups, the initial SLA document will be much closer to the business requirement and desired result. A network analyst and an application or server support specialist should create the application profile. The operations group must be prepared for this initial flood of issues and additional short-term resources to fix or resolve these previously undetected conditions. Joe Hertvik works in the tech industry as a business owner and an IT Director, specializing in Data Center infrastructure management and IBM i management. The organization then set service level goals for availability and made agreements with user groups. Failing to implement SLAs is not detrimental if the networking organization can build service level definitions that meet general business requirements.
The following table shows how an organization might create a service definition for link/device-down conditions. The first category of proactive service level definitions is network errors. The meeting helps target individual problems and determine solutions based on root cause. These guarantee levels are sometimes simply marketing and sales methods used to promote the carrier. Best Practices in Service-Level Management Published: 02 November 1998 ID: G0074018 Analyst(s): Mike Rhone, Tammy Kirk Summary SLAs must focus on local business requirements, or risk end users developing alternative resources at each site. It’s just as important to define where an SLA does not apply as where it does apply. Without this definition (or management support), the organization can expect variable support, unrealistic user expectations, and ultimately lower network availability. When looking at service and support metrics, representatives of the organization found that hardware replacement was taking approximately 24 hours, much longer than the original estimate because the organization had budgeted only four. Do not create SLAs that cover all your organization’s divisions. The workgroup should have the authority to rank business-critical processes and services for the network, as well as availability and performance requirements for individual services. We recommend the following steps for building and supporting a service-level model: Create application profiles detailing network characteristics of critical applications. By measuring availability, the company found the major problem to be a few WAN sites. This is primarily because they have not performed a requirements analysis for proactive service definitions based on availability risks, the availability budget, and application issues. Unfortunately, most networking organizations today have limited service level definitions and no performance indicators. 
Proactive definitions describe how the organization will identify and resolve potential network problems, including repair of broken "standby" network components, error detection, and capacity thresholds and upgrades. If we factor in potential non-availability due to user or process error and assume that this non-availability is four times the non-availability due to technical factors, we could assume that the availability budget is 99.95 percent. The last step in creating the SLA is final negotiation and sign-off. Metrics should also be available on response time and resolution time for each priority, number of calls by priority, and response/resolution quality. The primary goals of the service level definition should be availability and performance because these are the primary user requirements. The last reason organizations may struggle is that creating a new set of proactive alerts can often generate an initial flood of messages that have previously gone undetected. These days, many companies have found these external SLAs so useful that internal SLAs within or between departments are becoming ever more common. More sophisticated network organizations have attempted to resolve this issue by simply creating goals for the percentage of problems that are proactively identified, as opposed to problems reactively identified by user problem report or complaint. You can easily perform a cost analysis on many aspects of the SLA such as hardware replacement time. Hopefully the organization has application profiles on each application, but if not, consider doing a technical evaluation of the application to determine network-related issues. Application profiling helps you better understand these issues; the next section covers this feature. You may also think about providing higher availability in certain areas of the network that have fewer constraints.
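The availability-budget arithmetic mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the source: the function name `availability_budget` is hypothetical, and the 99.99 percent technical availability is an assumed input chosen because a fourfold user/process multiplier then reproduces the 99.95 percent budget quoted in the text.

```python
def availability_budget(technical_availability: float,
                        process_multiplier: float = 4.0) -> float:
    """Combine technical and user/process non-availability into one budget.

    Assumption: user/process non-availability is `process_multiplier` times
    the technical non-availability, so total non-availability is
    (1 + process_multiplier) times the technical figure.
    """
    technical_unavailability = 1.0 - technical_availability
    total_unavailability = technical_unavailability * (1.0 + process_multiplier)
    return 1.0 - total_unavailability


# Assumed technical availability of 99.99 percent (not stated in the source).
budget = availability_budget(0.9999)
print(f"Availability budget: {budget:.2%}")
```

Changing the multiplier shows how sensitive the overall budget is to the share of outages caused by user or process error.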
If the organization has no sparing plan and relies on a standard Cisco SMARTnet™ agreement, then the potential average replacement time is approximately 24 hours. It is clear, however, that only a small percentage of people will actually report network problems to a help desk, and when they do report the problem, it will clearly take time to explain the problem or isolate the problem as being network-related. A simple example would be a MTBF of 35,433 hours for each of two redundant identical devices and a switchover time of 30 seconds. A Signed SLA isn't Enough. User groups may also be present when SLAs are involved. From time to time, you may also need to adjust availability numbers because of add/move/change errors, undetected errors, or availability measurement problems. The networking group was then viewed as having higher professionalism and expertise, and as an overall asset to the organization. This information will be used to create priorities for different business-impacting problem types, prioritize business-critical traffic on the network, and create future standard networking solutions based on business requirements. Try to understand the cost of downtime for the customer's service. Some possible goals are: meeting reactive support business objectives; providing the highest level of availability by defining proactive SLAs. If they don't help create an SLA for a specific service and communicate business impact with the network group, then they may actually be accountable for the problem. Standardize these tasks and record them in a service catalogue. When you think about it, this is the most logical approach when you want to be really customer-oriented. Availability is the probability that a product or service will operate when needed. Some organizations may require a platinum or gold solution if a priority 1 or 2 ticket is required for an outage.
Perform the service level management review in a monthly meeting with individuals responsible for measuring and providing defined service levels. The group should also develop the reporting process for measuring the support level against support criteria. The service level definition for reactive secondary goals defines how the organization will respond to network or IT-wide problems after they are identified. In general, these goals define who will be responsible for problems at any given time and to what extent those responsible should drop their current tasks to work on the defined problems. While ITIL is probably the most widely used iteration of ITSM best practices, it is rarely used in isolation. This leads to unclear requirements for proactive service definitions and unclear benefits, especially because additional resources may be needed. In creating a critical service level definition, define how the service level will be measured and reported. If we use 30 seconds as a switchover time, we can then assume that each device will experience, on average, 7.5 seconds per year of non-availability due to switchover. SLA best practices. Once you’ve brokered the best SLAs for your current business and customer needs, you’re ready to implement them. This is then a natural point to begin SLA discussions or funding/budgeting models that can achieve the business requirements. Monitoring service levels entails conducting a periodic review meeting, normally every month, to discuss periodic service. The way the application was written may also create constraints. For example, you might have an availability level of 99.999 percent, or 5 minutes of downtime per year. This sets goals for how quickly problems are resolved, including hardware replacement. User error and process availability issues are the major causes of non-availability in enterprise and carrier networks.
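The two figures quoted above (about 5 minutes of downtime per year at 99.999 percent, and a few seconds per year lost to switchover) can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, a year is assumed to be 8,760 hours, and the 35,433-hour MTBF is the figure quoted elsewhere in this document. With these assumptions the switchover result comes out near 7.4 seconds, roughly in line with the 7.5 seconds stated in the text.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


def switchover_seconds_per_year(mtbf_hours: float,
                                switchover_seconds: float) -> float:
    """Expected seconds per year lost to failover for a single device.

    Assumption: failures occur at an average rate of one per MTBF, and each
    failure costs exactly one switchover interval.
    """
    failures_per_year = HOURS_PER_YEAR / mtbf_hours
    return failures_per_year * switchover_seconds


five_nines = downtime_minutes_per_year(0.99999)
switchover = switchover_seconds_per_year(mtbf_hours=35433, switchover_seconds=30)
print(f"99.999% availability: {five_nines:.2f} minutes of downtime per year")
print(f"Switchover loss per device: {switchover:.1f} seconds per year")
```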
The document also provides significant detail for SLAs that follow best practice guidelines identified by the high availability service team. Measuring the service level determines whether the organization is meeting objectives and also identifies the root cause of availability or performance issues. A different carrier would provide each T1 line. Measuring SLA conformance and reporting results are important aspects of the SLA process that help to ensure long-term consistency and results. The example shows an enterprise organization that may have different notification and response requirements based on the time of day and area of the network. The next step is to create the matrix for the service response and service resolution service definition. They just want you to help them. SLAs are a collection of promises the service provider... The goal of the application profile is to understand business requirements for the application, business criticality, and network requirements such as bandwidth, delay, and jitter. Other service providers will concentrate on the technical aspects of improving availability by creating strong service level definitions that are measured and managed internally. System applications may include software distribution, user authentication, network backup, and network management. Once you better understand these risks and inhibitors, network planners may wish to factor in some quantity of non-availability due to these issues. As a result, these issues are ignored or handled sporadically. The next table defines service level definitions for end-to-end performance and capacity. The other successful method of calculating availability is to use trouble tickets and a measurement called impacted user minutes (IUM).
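The impacted user minutes (IUM) measurement mentioned above lends itself to a small worked example. The source does not give the formula, so the sketch below assumes a common form: IUM is the sum over trouble tickets of users impacted times outage minutes, and availability is one minus IUM divided by total user minutes in the period. The `TroubleTicket` class, the function names, and the sample tickets are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TroubleTicket:
    users_impacted: int    # number of users affected by the outage
    outage_minutes: float  # duration of the outage in minutes


def impacted_user_minutes(tickets: Iterable[TroubleTicket]) -> float:
    """Sum of (users impacted x outage minutes) over all tickets."""
    return sum(t.users_impacted * t.outage_minutes for t in tickets)


def ium_availability(tickets: Iterable[TroubleTicket],
                     total_users: int,
                     period_minutes: float) -> float:
    """Availability as 1 - IUM / total user minutes (assumed formula)."""
    total_user_minutes = total_users * period_minutes
    return 1.0 - impacted_user_minutes(tickets) / total_user_minutes


# Hypothetical month: 1,000 users, two outages logged as trouble tickets.
tickets = [TroubleTicket(50, 120), TroubleTicket(1000, 10)]
period = 30 * 24 * 60  # minutes in a 30-day reporting period
ium = impacted_user_minutes(tickets)
availability = ium_availability(tickets, total_users=1000, period_minutes=period)
print(f"IUM: {ium:.0f}, availability: {availability:.4%}")
```

Because the calculation depends on ticket data, it only reflects outages that were actually logged, which is worth keeping in mind given that few users report network problems.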
A more comprehensive methodology for creating service level definitions includes more detail on how the network is monitored and how the operations organization reacts to defined network management station (NMS) thresholds on a 7 x 24 basis. Create separate SLAs for each IT service you need to measure. We always recommend that any defined service level goal be measurable, allowing the organization to measure service levels, identify root-cause service issues that are inhibiting the primary goal of availability and performance, and make improvements that are aimed at specific targets. Monthly networking service-level review meeting to review service-level compliance and implement improvements. Results from previous service level definition steps will help to create the standard. These SLAs manage the numbers, but lack context for the customer’s desired outcomes. This process is not unlike a quality circle or quality improvement process. General deployment software, such as IOS version 11.2(18), has been measured at over 99.9999 percent availability. SLA targets will be temporarily waived in the event of a. This value is typically called "system switchover time" and is a factor of the self-healing protocol capabilities within the system. Of course very few organizations have completely redundant, geographically dispersed WAN systems because of the expense and availability, so use proper judgement regarding this capability. Over time, the organization may also trend service level compliance to determine the effectiveness of the group.
Performance may also be defined in terms of round-trip delay, jitter, maximum throughput, bandwidth commitments, and overall scalability. On-hold is meant to ensure service level agreement deadlines aren’t missed while awaiting a response. Service Level Management in ITIL 4. Not measuring service level definitions also negates any positive proactive work done because the organization is forced into a reactive stance. An example might be a platinum, gold, and silver solution based on business need. This may include quality definitions, measurement definitions, and quality goals. Note: For organizations without SLAs, we recommend you perform service-level definitions and service-level reviews in addition to metrics. Service Level Management (SLM) is one of the well-defined main processes under the Service Design process group of the ITIL best practice framework. Unfortunately, many applications have significant constraints that require careful management. Create application profiles any time you introduce new applications to the network. When customer/business initiatives are aligned with IT activities, the networking organization can more easily be in tune with new application rollouts, new services, or other business requirements. These individuals may include both managerial and technical individuals who can help define technical issues related to the SLA and make IT-level decisions (e.g., help desk manager, server operations manager, application managers, and network operations manager). This methodology has been used successfully in data environments with only slight variation, and currently is being used as a target in the packet cable specification for service-provider cable networks. In some cases, organizations are able to automatically generate trouble tickets for network events or e-mail requests. Too often a network is put in place to meet a particular goal, yet the networking group loses sight of that goal and subsequent business requirements.
These thresholds are generally based on application requirements but can also be used to indicate some type of network performance or capacity problem. See Creating and Maintaining SLAs for more information. Service-provider SLAs do not normally include user input because they are created for the sole purpose of gaining a competitive edge on other service providers. They also found that they didn't have the personnel to make improvements. End-to-end connectivity for phones has an approximate availability budget of 99.94 percent using an availability budget methodology similar to the one described in this section. For this reason, service level management is highly recommended in any network planning and design phase and should start with any newly defined network architecture. Network design is another major contributor to availability. The workgroup should initially create a workgroup charter. The site would have two routers configured so that if any T1 or router failed the site would not experience an outage. You can add information on availability, QoS, and performance. This step includes: This cycle of reviewing the draft, negotiating the contents, and making revisions may take multiple cycles before the final version is sent to management for approval. This may seem like an impossible task given the sheer number of Management Information Base (MIB) variables and the amount of network management information available that is pertinent to network health. Like other service level definitions, the service level document should detail how the goals will be measured, parties responsible for measurement, and non-conformance processes. True performance and capacity management includes exception management, baselining and trending, and what-if analysis. Link and carrier failures are major factors concerning availability in WAN environments. 
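The redundant-site design described above (two routers, two T1 lines from different carriers) can be illustrated with a simple series/parallel availability model. This is a sketch under stated assumptions rather than the source's own availability model: failures are assumed independent, each path is modeled as one router in series with one T1, and the component availabilities below are hypothetical values chosen only for illustration.

```python
from math import prod
from typing import Iterable


def series_availability(components: Iterable[float]) -> float:
    """All components must work: multiply their availabilities."""
    return prod(components)


def parallel_availability(paths: Iterable[float]) -> float:
    """At least one path must work: 1 minus the product of path failures."""
    return 1.0 - prod(1.0 - a for a in paths)


# Hypothetical component availabilities (not taken from the source).
ROUTER = 0.9999
T1_LINE = 0.999

single_path = series_availability([ROUTER, T1_LINE])
redundant_site = parallel_availability([single_path, single_path])
print(f"Single path:    {single_path:.5%}")
print(f"Redundant site: {redundant_site:.5%}")
```

The independence assumption is optimistic when both lines share a building entrance or local loop, which is why carrier and path diversity matter in WAN environments.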
In general, service response definitions require a tiered support structure coupled with a help desk software support system to track problems via trouble tickets. This information is normally used for capacity planning and trending, but can also be used to understand service-level issues. If an organization has multiple building entrance facilities, redundant local-loop providers, Synchronous Optical Network (SONET) local access, and redundant long-distance carriers with geographic diversity, WAN availability will be considerably enhanced. This and similar situations may require more detail on services by region or separate SLAs for each region. Some work may also be done using availability modeling and the proactive cases to determine the effect on availability achieved by implementing proactive service definitions. Organizations of all shapes and sizes can use any number of metrics. You’ll need … Technology limitations cover any constraint posed by the technology itself. This allows the organization to properly evaluate vendors, carriers, processes, and staff. Make sure your service management software is up to both tasks. The organization may still need additional efforts as defined above to ensure success. The following is a recommended example outline for the network SLA: problem severity definitions based on business impact for MTTR definitions; business-critical service priorities for QoS definitions; defined solution categories based on availability and performance requirements; first-level response and call repair ratio; problem diagnosis and call-closure requirements; network management problem detection and service response; problem resolution categories or definitions; mean time to initiate problem resolution by problem priority; mean time to resolve problem by problem priority; mean time to replace hardware by problem priority. These thresholds may then apply to all three performance and capacity management processes in some way.
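The outline items on mean time to initiate and mean time to resolve problems by priority can be turned into a small conformance check. The sketch below is a hypothetical illustration: the targets, the ticket tuples, and the `conformance_report` function are invented for this example and do not come from the source or from any particular help desk product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical SLA targets in minutes: priority -> (initiate, resolve).
TARGETS = {1: (15, 240), 2: (60, 480)}

# Hypothetical tickets: (priority, minutes to initiate, minutes to resolve).
TICKETS = [(1, 10, 200), (1, 30, 300), (2, 45, 400)]


def conformance_report(tickets, targets):
    """Compare mean initiate/resolve times per priority against targets."""
    by_priority = defaultdict(list)
    for priority, initiate, resolve in tickets:
        by_priority[priority].append((initiate, resolve))

    report = {}
    for priority, rows in sorted(by_priority.items()):
        mean_initiate = mean(r[0] for r in rows)
        mean_resolve = mean(r[1] for r in rows)
        target_initiate, target_resolve = targets[priority]
        report[priority] = {
            "mean_initiate": mean_initiate,
            "mean_resolve": mean_resolve,
            "initiate_met": mean_initiate <= target_initiate,
            "resolve_met": mean_resolve <= target_resolve,
        }
    return report


report = conformance_report(TICKETS, TARGETS)
for priority, row in report.items():
    print(f"Priority {priority}: {row}")
```

A report like this could feed the monthly service-level review, where missed targets are traced to root causes.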
Another example may be the raw speed that data can traverse on terrestrial links, which is approximately 100 miles per millisecond. We recommend the following steps for building SLAs after service level definitions have been created: