Why Resolution Practices Matter
Resolution practices (incident management, problem management, and service request management) are the highest-priority implementation area for ISO 20000 compliance, for three reasons. First, these are the practices customers experience directly. When a service fails, the customer interaction is with incident management; when a customer needs something new, it is with service request management. Customer satisfaction depends largely on how well these practices work. Second, resolution practices generate more audit evidence than any other management system area. Incident records, problem investigation documentation, and service request fulfillment records are examined in minute detail during Stage 2 audits: an auditor examining a sample of 25 incident records will check each one for proper classification, correct priority assignment, appropriate escalation, SLA compliance, and closure verification. Third, resolution practices are the most common source of nonconformities in ISO 20000 audits. Organizations implement change management and availability management, but without sound incident, problem, and service request management, the audit will reveal systemic gaps.
Implementing Incident Management
Define the Incident Classification Scheme
The incident classification scheme is the foundation of incident management, and it must align with your service portfolio. Typical primary categories include hardware (servers, storage, network devices, laptops), software (applications, middleware, operating systems), network (connectivity, firewalls, routers), security (security incidents, data breaches, unauthorized access), and application (software defects, performance, integration issues). For each primary category, define secondary categories; under "software," for example, you might have "application performance degradation," "application unavailable," "integration failure," and "data corruption." The scheme must be exhaustive enough to cover every incident you encounter, specific enough that each category is meaningful, and taught to all frontline technical staff so that incidents are classified consistently. Maintain the classification scheme as a documented procedure; update it annually or when new incident categories emerge.
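A classification scheme can be held as simple structured data that the ticketing tool validates against, which is what keeps classification consistent in practice. A minimal sketch in Python (the category names echo the examples above and are illustrative, not a prescribed taxonomy):

```python
# Two-level incident classification scheme: primary category -> secondary categories.
# Category names are illustrative; align them with your own service portfolio.
CLASSIFICATION_SCHEME = {
    "hardware": ["server failure", "storage failure", "network device failure", "laptop fault"],
    "software": ["application performance degradation", "application unavailable",
                 "integration failure", "data corruption"],
    "network": ["connectivity loss", "firewall issue", "routing issue"],
    "security": ["security incident", "data breach", "unauthorized access"],
    "application": ["software defect", "performance issue", "integration issue"],
}

def validate_classification(primary: str, secondary: str) -> bool:
    """Accept only category pairs that exist in the documented scheme,
    which forces consistent classification at ticket creation."""
    return secondary in CLASSIFICATION_SCHEME.get(primary, [])

assert validate_classification("software", "data corruption")
assert not validate_classification("software", "laptop fault")
```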
Design the Priority Matrix
The priority matrix is the instrument that translates incident characteristics into priority levels. It uses two dimensions: impact (how many users or services are affected, and how critical the affected service is to the business) and urgency (how time-sensitive the issue is). The output is a priority level (P1, P2, P3, P4 or equivalent). Define each level concretely. For example: P1 = full outage of a critical service affecting more than 100 users; P2 = partial degradation affecting 50 or more users, or full outage of a non-critical service affecting more than 100 users; P3 = partial degradation affecting fewer than 50 users, or isolated impact on critical functionality; P4 = no business impact, informational, or cosmetic issues. Make the priority levels unambiguous so that different technical staff members assign the same incident to the same priority level.
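Encoding the matrix as a lookup table removes interpretation from the assignment step entirely. A sketch, assuming illustrative three-point impact and urgency scales (the mapping itself is an example, not a mandated one):

```python
# Priority matrix: (impact, urgency) -> priority level.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("high", "low"): "P3",
    ("medium", "high"): "P2",
    ("medium", "medium"): "P3",
    ("medium", "low"): "P4",
    ("low", "high"): "P3",
    ("low", "medium"): "P4",
    ("low", "low"): "P4",
}

def assign_priority(impact: str, urgency: str) -> str:
    """Deterministic priority assignment, so two staff members
    classify the same incident identically."""
    return PRIORITY_MATRIX[(impact, urgency)]

print(assign_priority("high", "high"))  # P1: full outage of a critical service
```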
Set SLA Targets by Priority
SLA targets must be set by priority level and include response time (how quickly the incident is acknowledged and investigation begins) and resolution time (how quickly the incident is fully resolved). Example targets: P1 response 15 minutes, resolution 4 hours; P2 response 1 hour, resolution 8 hours; P3 response 4 hours, resolution 2 business days; P4 response 1 business day, resolution 5 business days. Critically, SLA targets must be achievable given your actual technical capability: setting a 1-hour P1 resolution target when your average P1 investigation takes 3 hours guarantees SLA non-compliance. Connect internal targets to the commitments in customer SLAs. If you commit to customers that critical incidents are resolved within 4 hours, your internal P1 target must be 4 hours or better.
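Targets can be stored next to the priority matrix and turned into concrete deadlines when an incident is opened. A sketch using the example targets above (business-day targets are treated as calendar time here for brevity; a real implementation would apply a business calendar):

```python
from datetime import datetime, timedelta

# Response/resolution targets per priority, mirroring the example targets above.
SLA_TARGETS = {
    "P1": {"response": timedelta(minutes=15), "resolution": timedelta(hours=4)},
    "P2": {"response": timedelta(hours=1),    "resolution": timedelta(hours=8)},
    "P3": {"response": timedelta(hours=4),    "resolution": timedelta(days=2)},
    "P4": {"response": timedelta(days=1),     "resolution": timedelta(days=5)},
}

def sla_deadlines(priority: str, opened_at: datetime) -> dict:
    """Compute the response and resolution deadlines for a new incident."""
    targets = SLA_TARGETS[priority]
    return {
        "respond_by": opened_at + targets["response"],
        "resolve_by": opened_at + targets["resolution"],
    }

deadlines = sla_deadlines("P1", datetime(2025, 3, 3, 9, 0))
print(deadlines["resolve_by"])  # 2025-03-03 13:00:00, the 4-hour P1 target
```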
Design Escalation Procedures
Escalation procedures ensure that incidents exceeding normal resolution capability are escalated appropriately. Functional escalation moves an incident to a more experienced resolver or specialized support team (frontline support → specialized technical team → vendor support). Hierarchical escalation moves an incident to management (incident manager → service manager → director). Define the escalation triggers: when functional escalation occurs (typically after the current resolver has spent a defined amount of effort without resolving the incident), what the escalation path is, how the escalation is communicated, and whether there are time gates between escalation levels. Example: if a P1 is not resolved in 2 hours, escalate to Level 3 Engineering; if not resolved in 4 hours, escalate to the Incident Manager.
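Time gates translate directly into a scheduled check against the escalation path. A sketch mirroring the P1 example (the P2 gates and team names are illustrative):

```python
from datetime import datetime, timedelta

# Time gates per priority: elapsed time -> escalation target.
ESCALATION_GATES = {
    "P1": [(timedelta(hours=2), "Level 3 Engineering"),
           (timedelta(hours=4), "Incident Manager")],
    "P2": [(timedelta(hours=4), "Level 3 Engineering"),
           (timedelta(hours=8), "Incident Manager")],
}

def due_escalations(priority: str, opened_at: datetime, now: datetime) -> list[str]:
    """Return the escalation targets whose time gate has elapsed
    for an unresolved incident."""
    elapsed = now - opened_at
    return [target for gate, target in ESCALATION_GATES.get(priority, [])
            if elapsed >= gate]

opened = datetime(2025, 3, 3, 9, 0)
print(due_escalations("P1", opened, datetime(2025, 3, 3, 11, 30)))
# ['Level 3 Engineering']: the 2-hour gate has passed, the 4-hour gate has not
```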
Major Incident Procedure
Define the criteria for declaring a "major incident." Typically, a major incident is a P1 incident (full outage of a critical service) or an incident with widespread customer impact or regulatory implications. When a major incident is declared, a major incident manager is assigned, a war room or bridge call is convened, and accelerated communication protocols are activated: updates go to stakeholders every 15–30 minutes instead of at standard intervals. A post-incident review is scheduled and conducted within 2–5 business days. The major incident procedure must be documented and practiced.
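The declaration criteria are simple enough to encode as a guard in tooling or a runbook check. A minimal sketch, assuming a three-part test that mirrors the criteria above:

```python
def is_major_incident(priority: str, widespread_customer_impact: bool,
                      regulatory_implications: bool) -> bool:
    """Declare a major incident for any P1, or for any incident with
    widespread customer impact or regulatory implications."""
    return priority == "P1" or widespread_customer_impact or regulatory_implications

assert is_major_incident("P1", False, False)
assert is_major_incident("P2", widespread_customer_impact=True,
                         regulatory_implications=False)
```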
Incident Record Structure
Incident records are the direct evidence auditors examine. Required fields in an incident record: incident ID (unique identifier), date/time opened, reporter (who reported the incident), affected service or component, initial priority assessment, classification (the category assigned), assigned resolver or team, status (open, investigating, resolved, closed), description of what happened, actions taken to resolve, root cause (if identified), resolution applied, date/time closed, actual SLA status (Met/Missed), and for missed SLA incidents, the justification or root cause of the SLA miss. These fields form the basis of the audit evidence trail. Auditors will sample 20–30 incident records and examine each field for completeness and accuracy.
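The required fields map naturally onto a record schema that tooling can enforce at closure. A sketch in Python (field names are illustrative, not mandated by the standard):

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class IncidentRecord:
    """The fields auditors sample during Stage 2; names are illustrative."""
    incident_id: str
    opened_at: datetime
    reporter: str
    affected_service: str
    priority: str                      # e.g. "P1".."P4"
    classification: tuple[str, str]    # (primary, secondary) category
    assigned_to: str
    status: str                        # open / investigating / resolved / closed
    description: str
    actions_taken: list[str] = field(default_factory=list)
    root_cause: Optional[str] = None
    resolution: Optional[str] = None
    closed_at: Optional[datetime] = None
    sla_status: Optional[str] = None   # "Met" or "Missed"
    sla_miss_justification: Optional[str] = None  # required when SLA is missed

    def is_audit_complete(self) -> bool:
        """A closed record must carry a resolution, closure date, SLA status,
        and, if the SLA was missed, a documented justification."""
        if self.status != "closed":
            return True
        if not (self.resolution and self.closed_at and self.sla_status):
            return False
        return self.sla_status != "Missed" or bool(self.sla_miss_justification)
```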
Closure and Verification
Before an incident is closed, verification from the reporter or customer that the incident is truly resolved is required. This prevents "we fixed it from our perspective but the customer still has the problem" outcomes. Closure verification may be conducted via email, call, or portal confirmation. Auto-closure (system automatically closing an incident after X days of no activity) carries risk—the customer may never have confirmed resolution. If auto-closure is used, the confirmation attempt must be logged and the incident must be escalated for manual verification if the customer does not respond.
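If auto-closure is used, the safety conditions above can be enforced in code. A sketch, assuming a five-day inactivity window and the field names shown (both are illustrative):

```python
from datetime import datetime, timedelta

AUTO_CLOSE_AFTER = timedelta(days=5)  # illustrative inactivity window

def try_auto_close(incident: dict, now: datetime) -> str:
    """Auto-close only when a confirmation attempt was logged; otherwise
    route the incident to manual verification instead of closing silently."""
    inactive = now - incident["last_activity_at"] >= AUTO_CLOSE_AFTER
    if incident["status"] != "resolved" or not inactive:
        return "no_action"
    if not incident.get("confirmation_attempt_logged"):
        return "escalate_for_manual_verification"
    incident["status"] = "closed"
    incident["closure_note"] = "Auto-closed: confirmation attempt logged, no customer response."
    return "auto_closed"
```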
Incident Metrics
Track and report: incident volume by priority (are P1 incidents increasing or stable?), SLA achievement rate by priority (what % of P1 incidents met SLA), mean time to resolve by priority (average P1 resolution time), and reopened incident rate (what % of incidents were reopened after closure—indicates closure without true resolution). These metrics feed into Clause 9 performance evaluation and provide data for management review.
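All four metrics fall out of the incident records directly. A sketch, assuming each record carries its priority, SLA status, resolution time in hours, and a reopened flag:

```python
from statistics import mean

def incident_metrics(records: list[dict]) -> dict:
    """Compute volume, SLA achievement, and mean time to resolve per
    priority, plus the overall reopened-incident rate."""
    closed = [r for r in records if r.get("resolution_hours") is not None]
    by_priority = {}
    for p in sorted({r["priority"] for r in closed}):
        subset = [r for r in closed if r["priority"] == p]
        by_priority[p] = {
            "volume": len(subset),
            "sla_achievement_pct": 100 * sum(r["sla_status"] == "Met" for r in subset) / len(subset),
            "mean_hours_to_resolve": mean(r["resolution_hours"] for r in subset),
        }
    reopened_pct = 100 * sum(r["reopened"] for r in closed) / len(closed) if closed else 0
    return {"by_priority": by_priority, "reopened_pct": reopened_pct}
```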
Implementing Service Request Management
Define the Service Request Catalogue
Service requests are requests from users for something to be provided: a new account, hardware, a software installation, information. A service request is not a problem to be fixed; it is a standard fulfillment process. Common service request types: access provisioning (new user account, group membership, application access), hardware requests (new laptop, monitor, phone), software installation or license requests, information requests (password reset, documentation, how-to guidance), and configuration changes within policy (update personal proxy settings, change email forwarding). Define clearly what is a service request versus an incident: "the email server is down" is an incident; "reset my email password" is a service request. A service request can also escalate into another record type: if the fulfillment process uncovers a technical failure, an incident is raised, and if the request exposes a recurring policy or process issue, it may be escalated as a problem.
Pre-Approved Fulfillment Procedures
For each service request type in the catalogue, define a pre-approved fulfillment procedure. A procedure includes the steps to fulfill the request, required approvals (e.g., manager approval for a hardware purchase), the SLA target (how quickly the request must be fulfilled), and the documentation to be generated. This prevents ad-hoc handling that produces no records. Example for a new user account request: user submits via portal → manager approves → IT creates account in AD → credentials sent → CMDB updated → request closed. SLA target: 1 business day. All steps must be documented in the service request record.
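Pre-approved procedures work well as catalogue configuration that the fulfillment tool walks through step by step. A sketch using the new-account example (the hardware entry and all names are illustrative):

```python
# One catalogue entry per request type; the first entry mirrors the
# new-account example above. Structure and names are illustrative.
SERVICE_REQUEST_CATALOGUE = {
    "new_user_account": {
        "approvals": ["line manager"],
        "sla": "1 business day",
        "steps": [
            "user submits request via portal",
            "manager approves",
            "IT creates account in AD",
            "send credentials to user",
            "update CMDB",
            "close request",
        ],
    },
    "hardware_request": {
        "approvals": ["line manager", "budget holder"],
        "sla": "5 business days",
        "steps": ["submit via portal", "obtain approvals", "procure",
                  "deliver", "update CMDB", "close request"],
    },
}

def fulfillment_checklist(request_type: str) -> list[str]:
    """Return the documented steps, so every fulfillment follows the
    pre-approved procedure and leaves a complete record."""
    return SERVICE_REQUEST_CATALOGUE[request_type]["steps"]
```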
Distinguishing Service Requests from Changes
Standard, low-risk changes that are pre-approved may be handled as service requests rather than requiring a full change management process. Example: "reset password" is a low-risk standard change handled as a service request. "Install new version of application" is a change requiring change management. The boundary must be clearly defined. Generally, if the change is pre-approved, repeatable, and poses minimal risk, it can be a service request. Otherwise, it is a change.
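The boundary rule reduces to a simple conjunction that intake tooling can apply consistently; a minimal sketch:

```python
def route_as_service_request(pre_approved: bool, repeatable: bool,
                             minimal_risk: bool) -> bool:
    """Apply the boundary rule: pre-approved + repeatable + minimal risk
    routes to service request fulfillment; anything else goes to change
    management."""
    return pre_approved and repeatable and minimal_risk

assert route_as_service_request(True, True, True)        # e.g. password reset
assert not route_as_service_request(False, True, False)  # e.g. application upgrade
```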
Service Request Records
Service request records must show: request ID, date opened, requester, type (from the catalogue), description of what is requested, approvals required and their status, fulfillment steps completed, completion date, and SLA status (Met/Missed). Service request records generate the evidence for audit: auditors will examine them to verify that requests are fulfilled consistently and on time.
Implementing Problem Management
Reactive Problem Management
Reactive problem management is triggered by recurring incidents (the same root cause producing multiple incidents), major incidents (the post-incident review process is invoked), or customer escalation (the customer demands root cause analysis). When triggered, a problem record is created and linked to the incident records that triggered it. The problem team conducts root cause analysis (RCA) using a defined methodology. Common RCA methodologies: Five Whys (ask why the incident occurred, then why that cause occurred, and iterate until the root cause is identified), fishbone diagram (list contributing factors in categories: people, process, technology, environment), and fault tree analysis (decompose the failure into causal chains). The RCA must be documented in the problem record, and the identified root cause recorded. If a permanent fix can be implemented immediately, the problem record is marked for fix implementation via change management. If a permanent fix is not yet available, a known error record is created (see below) and the problem remains open until the fix is implemented.
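Because the RCA must survive in the problem record rather than in conversation, it helps to capture the chain as structured data. A Five Whys sketch (the IDs and the example chain are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class FiveWhysRCA:
    """A documented Five Whys chain attached to a problem record.
    Field names are illustrative."""
    problem_id: str
    linked_incidents: list[str]
    whys: list[str] = field(default_factory=list)  # each entry answers "why?" for the previous one

    def root_cause(self) -> str:
        """The last answer in the chain is recorded as the root cause."""
        return self.whys[-1] if self.whys else "RCA not yet completed"

rca = FiveWhysRCA(
    problem_id="PRB-0042",
    linked_incidents=["INC-1001", "INC-1017", "INC-1033"],
    whys=[
        "The application timed out under load.",
        "Database queries slowed during the morning peak.",
        "An index was dropped during last month's schema migration.",
        "The migration script was not peer-reviewed against the index list.",
    ],
)
print(rca.root_cause())
```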
Proactive Problem Management
Proactive problem management actively hunts for problems before they cause multiple incidents. Review incident data monthly to identify trends: are the same components failing repeatedly? Is a particular application generating increasing incident volume? Conduct availability and performance data review to identify emerging issues: is a database slowly degrading in performance, approaching a crisis? Is a storage array approaching capacity? Identify capacity-related problems: when utilization reaches defined thresholds, trigger a capacity problem. Maintain a proactive problem register and review it monthly or quarterly. Assign owners to proactive problems and set target resolution dates.
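Threshold-based triggers make the hunt systematic rather than ad hoc. A minimal sketch, with illustrative thresholds:

```python
# Illustrative thresholds; tune these to your own capacity plan.
CAPACITY_THRESHOLDS = {"storage_utilization_pct": 80, "db_response_ms": 500}

def proactive_problem_triggers(measurements: dict) -> list[str]:
    """Compare current measurements against thresholds and return the
    metrics that should open a proactive problem record."""
    return [metric for metric, limit in CAPACITY_THRESHOLDS.items()
            if measurements.get(metric, 0) >= limit]

print(proactive_problem_triggers({"storage_utilization_pct": 84, "db_response_ms": 310}))
# ['storage_utilization_pct'] -> open a capacity problem with an owner and target date
```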
Known Error Database
A known error is an incident or problem that has a documented root cause and a documented workaround or temporary solution, but the permanent fix has not yet been implemented. The known error is recorded in a known error database so that if the same symptoms occur in an incident, the incident resolver can quickly apply the workaround without waiting for root cause analysis. Example: "Email connection failures on morning startups—known error: network load during business start window. Workaround: restart Outlook. Permanent fix: network infrastructure redesign, planned for Q3." The known error database must be actively used during incident management. When an incident is being investigated, the resolver should search the known error database for matching symptoms and apply the workaround if found. The known error database is reviewed when permanent fixes are implemented (the known error is marked as resolved and removed or archived).
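The "actively used" requirement can be supported by a symptom search that runs during incident investigation. A keyword-matching sketch (a real tool would use full-text search; the entry mirrors the email example above):

```python
# Minimal known error database keyed by symptom keywords. Structure is illustrative.
KNOWN_ERRORS = [
    {
        "id": "KE-0007",
        "symptoms": {"email", "connection", "failure", "morning"},
        "root_cause": "network load during business start window",
        "workaround": "restart Outlook",
        "permanent_fix": "network infrastructure redesign, planned for Q3",
        "status": "open",
    },
]

def search_known_errors(incident_description: str) -> list[dict]:
    """Match open known errors whose symptom keywords appear in the
    incident description, so the resolver can apply the workaround."""
    words = set(incident_description.lower().split())
    return [ke for ke in KNOWN_ERRORS
            if ke["status"] == "open" and len(ke["symptoms"] & words) >= 2]

for ke in search_known_errors("email connection failure reported at 09:05"):
    print(ke["id"], "->", ke["workaround"])
```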
Problem Management Metrics
Track: open problem count and trend (is problem count increasing or decreasing?), problems by age (how many problems are >30 days old?), mean time to root cause (average time from problem opening to RCA completion), known error database utilization rate (what % of incidents match a known error and apply the documented workaround?), and percentage of problems leading to permanent fixes (are problems actually being fixed, or just documented?).
Integration with Change Management
When an RCA identifies a required fix, a change record must be raised. The problem record and change record must be linked. Example: problem record states root cause is "outdated network driver causing intermittent connectivity," change record describes installing updated driver on affected servers. The change goes through the change management approval process. Once the change is implemented, the problem record is closed and the associated known error is archived.
Common Implementation Pitfalls
Pitfall 1: Implementing incident management without service request management. Service requests are handled informally, no records are kept, and there is no documented fulfillment process. This creates confusion (is this an incident or a request?) and no audit evidence for service request handling. Solution: implement both incident and service request management from the start.
Pitfall 2: Creating problem records but never progressing them. A problem queue fills with aging, stale records. Root cause analysis is never completed, and no known errors are created. The queue becomes an appendix and evidence of a non-functional process. Solution: assign ownership, set closure targets, and review the problem queue at management review.
Pitfall 3: Conducting RCA verbally but not documenting it. "We found the root cause" but no record exists. When the same issue occurs weeks later, the knowledge is lost. Solution: require RCA findings to be documented in the problem record and reviewed by a second party for completeness.
Pitfall 4: Creating a known error database but never using it. The database exists but incident resolvers do not search it. Known errors accumulate and become a useless appendix. Solution: make searching the known error database a required step in incident investigation.
Pitfall 5: Setting SLA targets without checking achievability. SLA targets are set to please customers but are unachievable given actual operational capability. Incidents routinely miss SLA. Solution: analyze historical incident data to understand achievable targets before setting them.
| KEY CONCEPT | The three-tier resolution structure operates as an integrated loop: Incident (restore service immediately) → Problem (investigate root cause) → Change (implement permanent fix) → back to CMDB (update configuration record). All three practices must work together. Without problem management, incidents repeat. Without change management, root causes are never fixed. Without CMDB accuracy, change impact assessment fails. |
Incident Priority Matrix
| Priority | Impact Description | Urgency Level | Resolution Target | Update Frequency | Escalation Trigger |
|---|---|---|---|---|---|
| P1 | Full outage of critical service, >100 users affected | Immediate | 4 hours | 30 minutes | Not resolved in 2 hours |
| P2 | Partial degradation or full outage non-critical service, 50–100 users | High | 8 hours | 1 hour | Not resolved in 4 hours |
| P3 | Partial degradation <50 users or isolated functionality impact | Medium | 2 business days | 4 hours | Not resolved in 8 hours |
| P4 | No business impact, cosmetic issues, informational | Low | 5 business days | Daily | As needed |
| Emergency | Widespread customer impact, regulatory breach, data loss | Critical | 1 hour | 15 minutes | Immediate executive escalation |
Resolution Practice Implementation Checklist
| Practice | Design Elements Required | Records Required | Operational Readiness Criteria |
|---|---|---|---|
| Incident Management | Classification scheme, priority matrix, SLA targets by priority, escalation procedures, major incident procedure | Incident records with complete fields: ID, date/time, classification, priority, resolver, status, resolution, SLA status | All staff trained, 2 weeks of incident records, SLA targets achievable, 95% incidents classified correctly in sample audit |
| Service Request Management | Service request catalogue, pre-approved fulfillment procedures per request type, approval authorities | Service request records with request type, approvals, fulfillment steps, closure verification, SLA status | 3 common request types with documented procedures, 2 weeks of request records, all requests have closure verification, 100% SLA achievement |
| Problem Management (Reactive) | Trigger criteria for problem creation, RCA methodology, problem record template, known error criteria | Problem records with RCA documentation, known error records with workaround and permanent fix status, problem-to-change links | RCA methodology trained to problem team, 1 problem record per audit sample showing complete RCA, known errors searchable in incident management |
| Problem Management (Proactive) | Incident data analysis process, availability/performance review process, capacity problem criteria, proactive problem review schedule | Monthly incident trend analysis, proactive problem register with owner and target date, trend reports | Monthly trend analysis conducted, proactive problem register reviewed monthly at operations meeting, 1–2 proactive problems identified per month |
| IMPORTANT | Incident records are the single most-examined document type in Stage 2 audits. Auditors will sample 20–30 records and examine each in detail for: (1) classification accuracy: is the incident assigned to the correct category? (2) priority correctness: does the priority match the incident impact and urgency? (3) escalation compliance: were escalation procedures followed? (4) SLA status: was the SLA met or missed, and if missed, is there documented justification? (5) closure verification: is there evidence the customer confirmed resolution? Incident records with incomplete data, missing escalation documentation, or incorrect SLA status will generate audit findings. |
| BITLION INSIGHT | Bitlion GRC includes an integrated ITSM module with incident, problem, and service request workflows specifically designed to meet the ISO 20000 Clause 8.6 resolution and fulfilment requirements (8.6.1 incident management, 8.6.2 service request management, 8.6.3 problem management). Incident records include all required ISO 20000 fields. The service request catalogue and pre-approved procedures are configurable. The problem management workflow includes an RCA template and a known error database. Integration with change management ensures problem-to-change linkage. All records generate the documented information required for audit. |