Overview: The Operational Core of the SMS
Clauses 8.6 and 8.7 contain the service management practices that are most visible to customers, most frequently executed day-to-day, and most heavily tested in ISO 20000 certification audits. These clauses are the operational core of the SMS. Clause 8.6 addresses resolution practices (how the organization responds when incidents or problems occur) and control practices (how the organization maintains control of the IT environment). Clause 8.7 addresses assurance practices (how the organization maintains agreed service levels). Understanding and effectively implementing Clauses 8.6 and 8.7 is essential for ISO 20000 compliance.
Clause 8.6: Resolution and Control Practices
Clause 8.6.1: Incident Management
Purpose and Scope
An incident is an unplanned interruption to a service or a reduction in service quality. Incident management is the process for responding to incidents: classifying them, determining their priority, escalating them, working to resolve them, restoring service, and closing the incident once service is restored.
Incident Classification and Prioritization
Not all incidents are equal. A critical application being down creates more urgency than a single user unable to print. Incident management requires classification (what type of incident is this) and prioritization (how urgently must it be addressed).
A common classification scheme includes:
• Hardware failures: network outages, server failures, storage issues
• Software or application failures: crashes, hung processes, data corruption
• Access or permission issues: users unable to access systems or data they need
• Performance issues: systems running slowly due to resource exhaustion
• Data or configuration issues: incorrect data or misconfiguration causing service degradation
• Security incidents: intrusions, malware, unauthorized access
Prioritization typically combines impact (how many users or services are affected) and urgency (how quickly the business needs the service restored) into priority levels:
• P1/Critical: widespread impact, multiple services down, significant business impact
• P2/High: significant impact, important service affected, business process interrupted
• P3/Medium: moderate impact, workaround available, single user or small group affected
• P4/Low: minimal impact, cosmetic issues, non-critical function affected
SLA targets (resolution time) are typically set by priority level. P1 incidents might have 1-hour resolution targets; P2 might be 4 hours; P3 might be 24 hours. These targets drive escalation: if a P1 incident is not resolved within its SLA window, it is escalated to senior management.
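The impact/urgency combination and the SLA-driven escalation described above can be sketched as follows. This is a minimal illustration, not a prescribed scheme: the matrix values, the `P4` target of 72 hours, and all function names are assumptions, and each organization defines its own levels in its SLAs.

```python
from datetime import datetime, timedelta

# Hypothetical impact/urgency matrix; real schemes are defined per organization.
PRIORITY_MATRIX = {
    ("high", "high"): "P1",
    ("high", "medium"): "P2",
    ("medium", "high"): "P2",
    ("high", "low"): "P3",
    ("medium", "medium"): "P3",
    ("low", "high"): "P3",
    ("medium", "low"): "P4",
    ("low", "medium"): "P4",
    ("low", "low"): "P4",
}

# P1-P3 use the example targets from the text; the P4 value is an assumption.
SLA_TARGETS = {
    "P1": timedelta(hours=1),
    "P2": timedelta(hours=4),
    "P3": timedelta(hours=24),
    "P4": timedelta(hours=72),
}


def assign_priority(impact: str, urgency: str) -> str:
    """Look up the priority level for an impact/urgency pair."""
    return PRIORITY_MATRIX[(impact, urgency)]


def sla_deadline(opened_at: datetime, priority: str) -> datetime:
    """Return the time by which the incident should be resolved."""
    return opened_at + SLA_TARGETS[priority]


def needs_escalation(opened_at: datetime, priority: str,
                     now: datetime, resolved: bool = False) -> bool:
    """An unresolved incident past its SLA deadline should be escalated."""
    return not resolved and now > sla_deadline(opened_at, priority)
```

Keeping the matrix and targets as plain data makes them easy to review against the agreed SLA document when either changes.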
Major Incident Management
When a P1 incident occurs or when an incident has very high business impact, separate major incident procedures often apply. Major incident management includes:
• Crisis team activation: senior leadership, service owners, and subject matter experts convened immediately
• Frequent communication: crisis team updates every 15-30 minutes to leadership
• Rapid diagnosis: using all available resources to identify the root cause quickly
• Alternative solutions: if normal resolution is slow, determining whether the service can be restored through a workaround or alternative configuration
• Post-incident review: once service is restored, a formal post-incident review (sometimes called a postmortem) is conducted
Incident Records and Evidence
ISO 20000 requires documented evidence of incidents and their resolution. Incident records should include:
• Incident ID and date/time opened
• Description of the incident and affected services
• Classification and priority
• Assigned team and individual owner
• Activities and diagnostics performed to resolve
• Root cause (identified during resolution or problem management)
• Solution implemented and date/time resolved
• Date/time closed (after verification that service is restored)
• SLA target and whether the incident was resolved within SLA
Auditors often examine incident records to verify that incidents are being handled according to procedures. Common findings include missing classifications, no SLA tracking, or incidents marked resolved without verification that service was actually restored to the customer.
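A simple self-check can flag the common findings listed above before an auditor does. The sketch below is illustrative only: the `IncidentRecord` fields are a reduced, hypothetical subset of the record contents listed earlier, and the `audit_gaps` function is an assumed helper, not part of any ticketing tool.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class IncidentRecord:
    # Hypothetical, reduced field set for illustration.
    incident_id: str
    opened_at: datetime
    description: str
    classification: Optional[str] = None
    priority: Optional[str] = None
    sla_target_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None
    restoration_verified: bool = False


def audit_gaps(rec: IncidentRecord) -> List[str]:
    """Return the common audit findings that apply to one record."""
    gaps = []
    if not rec.classification:
        gaps.append("missing classification")
    if not rec.priority:
        gaps.append("missing priority")
    if rec.sla_target_at is None:
        gaps.append("no SLA target recorded")
    if rec.resolved_at is not None and not rec.restoration_verified:
        gaps.append("resolved without verified service restoration")
    return gaps
```

Running a check like this across all records closed in a period gives a quick picture of record completeness ahead of an internal audit.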
Clause 8.6.2: Service Request Management
Service requests are distinct from incidents. An incident is an unplanned disruption; a service request is a standard, expected, pre-approved activity. Examples include:
• Password resets
• Granting or revoking access to systems or data
• Provisioning new hardware or software licenses
• Requesting IT support or consultation
• Requesting changes to existing configurations
Service request management requires:
• A service request catalogue: the list of standard requests that are available
• Fulfillment procedures: how each request type is handled
• SLA targets for request types: e.g., password resets fulfilled within 1 hour, access provisioning within 1 business day
• Records: documentation that requests were received, fulfilled, and closed
Many organizations integrate service requests with a self-service portal (allowing users to request standard items directly) or with email and ticketing systems. The key requirement is that service requests be documented and tracked to closure.
Clause 8.6.3: Problem Management
Reactive and Proactive Problem Management
Problem management has two modes:
• Reactive: investigating the root cause of incidents that have already occurred
• Proactive: identifying and eliminating potential causes before they result in incidents
Root Cause Analysis
When an incident occurs, reactive problem management investigates the root cause. Root cause analysis techniques include:
• 5 Whys: repeatedly asking "why" to trace the issue back to its fundamental cause
• Fishbone (Ishikawa) diagram: visually mapping causes and sub-causes
• Fault tree analysis: mapping how multiple factors combine to produce failure
• Correlation analysis: examining logs and data to identify what events preceded the incident
The goal is not just to fix the immediate problem but to identify and eliminate the underlying cause, preventing recurrence.
Known Error Management
As problems are investigated and resolved, the organization builds a known error database made up of known error records. A known error record documents:
• The problem and symptoms
• Root cause
• Workaround (if the problem cannot be immediately fixed)
• Permanent fix (when available)
• Related incidents: which incidents have the same root cause
When a new incident occurs, the help desk or support team can search the known error database to see if the incident matches a known problem. If it does, the known workaround can be applied immediately, reducing resolution time while the permanent fix is developed.
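The lookup described above can be sketched as a keyword match between a new incident's description and the symptoms stored on each known error record. This is a deliberately simple assumption: the `KnownError` structure and `find_known_errors` function are hypothetical, and real tools usually offer richer search.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KnownError:
    # Hypothetical structure mirroring the known error record contents.
    problem_id: str
    symptoms: List[str]  # lowercase phrases describing the symptoms
    root_cause: str
    workaround: Optional[str] = None
    permanent_fix: Optional[str] = None
    related_incidents: List[str] = field(default_factory=list)


def find_known_errors(description: str,
                      kedb: List[KnownError]) -> List[KnownError]:
    """Return known errors whose symptom phrases appear in the description."""
    text = description.lower()
    return [ke for ke in kedb if any(s in text for s in ke.symptoms)]
```

When a match is found, the support team can apply the recorded workaround and link the new incident ID to `related_incidents` so the problem record reflects its true frequency.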
Problem Records and Evidence
ISO 20000 requires documented evidence of problems and their resolution. Problem records should include:
• Problem ID and date opened
• Description and symptoms
• Related incidents: which incidents have the same root cause
• Root cause analysis documentation
• Known error record (if applicable)
• Permanent fix plan and progress
• Status: whether the problem is active, in progress, or resolved
• Review and closure documentation
A common audit finding is problem records opened but never progressed to closure. The organization may create a problem ticket but never complete the root cause investigation or track the problem to a permanent fix.
Clause 8.6.4: Configuration Management
Configuration management maintains an accurate record of all IT components (configuration items or CIs) that make up the services the organization manages. A CI might be a server, network switch, application, database, virtual machine, software license, or any other component that is tracked.
CMDB: The Configuration Management Database
The Configuration Management Database (CMDB) is the repository of CI data. The CMDB should include:
• CI attributes: name, type, version, owner, location, procurement date
• CI relationships: which CIs depend on or connect to other CIs
• CI lifecycle status: active, retired, in development
• Associated documentation: for each CI, links to technical specifications, support contacts, etc.
A well-maintained CMDB enables other service management practices: incident management (tracing the impact of a CI failure through related CIs), change management (understanding the blast radius of a proposed change), and problem management (identifying which CIs are involved in a problem).
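Tracing impact through CI relationships amounts to walking a dependency graph. The sketch below assumes the relationships have been exported as a simple mapping from each CI to the CIs that depend on it; the CI names and the `impacted_cis` function are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical export: each key is a CI, each value lists the CIs that depend on it.
DEPENDENTS: Dict[str, List[str]] = {
    "db-server-01": ["orders-db"],
    "orders-db": ["orders-api"],
    "orders-api": ["web-shop", "mobile-app"],
    "web-shop": [],
    "mobile-app": [],
}


def impacted_cis(start_ci: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk returning every CI downstream of start_ci."""
    seen: Set[str] = {start_ci}
    queue = deque([start_ci])
    while queue:
        ci = queue.popleft()
        for dep in dependents.get(ci, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # The starting CI itself is not reported as downstream impact.
    return seen - {start_ci}
```

The same traversal serves both uses mentioned above: starting from a failed CI during an incident, or from the CI targeted by a proposed change.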
CMDB Verification and Accuracy
Many organizations populate a CMDB initially but fail to maintain it. CIs are added to the environment but not added to the CMDB. Configurations change but CMDB records are not updated. The result is a CMDB that diverges from reality.
ISO 20000 requires that the organization verify the accuracy of the CMDB at defined intervals. This means:
• Regular audits: comparing the CMDB to the actual environment to identify discrepancies
• Reconciliation: updating CMDB records to match actual configuration
• Documented evidence: audit reports showing verification activities and reconciliation actions
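A verification audit of this kind reduces to comparing two inventories. The following sketch assumes both the CMDB export and the discovery scan are available as dictionaries keyed by CI identifier; the `reconcile` function and the category names are hypothetical.

```python
from typing import Dict, List


def reconcile(cmdb: Dict[str, dict],
              discovered: Dict[str, dict]) -> Dict[str, List[str]]:
    """Compare CMDB records with a discovery scan and list discrepancies."""
    cmdb_ids, found_ids = set(cmdb), set(discovered)
    return {
        # In the environment but never recorded in the CMDB.
        "missing_from_cmdb": sorted(found_ids - cmdb_ids),
        # Recorded in the CMDB but not found in the environment.
        "not_found_in_environment": sorted(cmdb_ids - found_ids),
        # Present in both, but the recorded attributes differ.
        "attribute_mismatch": sorted(
            ci for ci in cmdb_ids & found_ids if cmdb[ci] != discovered[ci]
        ),
    }
```

Saving the returned discrepancy lists together with the date and the corrective actions taken gives the documented evidence an auditor will ask for.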
Clause 8.6.5: Change Management
Change Types and Processes
Not all changes are handled the same way. Change management under ISO 20000 is commonly organized around three change types:
• Standard changes: pre-approved, low-risk changes that can be implemented quickly using defined procedures (e.g., password resets, adding users to standard groups)
• Normal changes: changes that require assessment and formal approval before implementation
• Emergency changes: urgent changes required to resolve critical incidents; approval is expedited, but the change is reviewed post-implementation
Standard changes have minimal bureaucracy; they use established, tested procedures. Normal changes go through a change review process (often including a Change Advisory Board or CAB) where the change is assessed for impact and risk before approval. Emergency changes follow an expedited approval path instead of the full review cycle, then undergo post-implementation review to confirm the emergency was justified.
Change Records and Required Information
The organization must maintain documented change records. Each change record should include:
• Change description: what is being changed and why
• Change type: standard, normal, or emergency
• Reason for change: business justification
• Risk assessment: what could go wrong if the change is implemented
• Impact analysis: which services, CIs, and customers are affected
• Implementation plan: step-by-step plan for implementing the change
• Rollback plan: how to revert if the change causes problems
• Approval: sign-off from appropriate stakeholders (CAB, service owner, etc.)
• Implementation evidence: records showing that the change was executed as planned
• Post-implementation review: verification that the change achieved its objective and caused no unexpected issues
• Change success: whether the change achieved its intended purpose without creating new problems
The Change Advisory Board (CAB)
The Change Advisory Board is a forum for reviewing and approving normal changes. CAB membership typically includes:
• Service owners: representatives for each service that might be affected by changes
• Infrastructure and operations: technical experts who understand system dependencies
• Risk and compliance: ensuring changes do not violate policies or regulatory requirements
• Change management: facilitating the process
The CAB meets regularly (weekly, bi-weekly, etc.) to review proposed changes, assess risks, and make approval decisions. Decisions and reasoning must be documented in meeting minutes.
Clause 8.6.6: Release and Deployment Management
Release and deployment management addresses how software, patches, and updates are packaged and deployed to the production environment. This includes:
• Release planning: determining what components will be included in a release and when the release will be deployed
• Release composition: gathering components and conducting release-level testing
• Deployment: rolling out the release to production environments
• Documentation and approval: release notes, deployment verification, sign-off
Most releases require a change request (Clause 8.6.5) and must go through the change management approval process.
| KEY CONCEPT | The incident/problem/change/configuration quadrant shows how these four practices are deeply interdependent. The CMDB feeds change impact assessment; incidents feed problem analysis; problems drive changes; changes must be reflected in the CMDB. Failure in any one area weakens all four. |
Clause 8.7: Service Assurance Practices
While Clause 8.6 practices address how the organization responds to events (incidents, problems) and controls changes, Clause 8.7 practices address how the organization maintains and improves agreed service levels. There are four main service assurance practices.
Clause 8.7.1: Availability Management
Availability is a measure of whether the service is accessible and functioning when users need it. Availability management includes:
• Availability targets: defining what availability the service should provide (e.g., 99.5% availability)
• Availability planning: designing and architecting services to meet availability targets
• Availability monitoring: continuous measurement of actual availability
• Availability reporting: regular reports comparing actual availability to targets
• Availability improvement: identifying causes of unavailability and taking action to improve
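To make a target such as 99.5% concrete, it helps to convert it into permitted downtime. The sketch below uses the common definition of availability as agreed service time minus downtime, divided by agreed service time; the function names and the 30-day, 24x7 service window are assumptions for illustration.

```python
def availability_pct(agreed_minutes: float, downtime_minutes: float) -> float:
    """Availability as a percentage of the agreed service time."""
    return (agreed_minutes - downtime_minutes) / agreed_minutes * 100


def allowed_downtime(agreed_minutes: float, target_pct: float) -> float:
    """Maximum downtime (in minutes) that still meets the target."""
    return agreed_minutes * (1 - target_pct / 100)


# Assumed example: a 24x7 service measured over a 30-day month.
MONTH_MINUTES = 30 * 24 * 60  # 43,200 minutes
budget = allowed_downtime(MONTH_MINUTES, 99.5)   # roughly 216 minutes
actual = availability_pct(MONTH_MINUTES, 300)    # 300 minutes of outage
target_met = actual >= 99.5
```

Under these assumptions a 99.5% target leaves about 216 minutes (3.6 hours) of downtime per month, so 300 minutes of outage misses the target.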
Clause 8.7.2: Service Continuity Management
Service continuity management addresses how the organization prepares for and responds to disasters or major disruptions. It includes:
• Continuity planning: developing plans to maintain critical services during or after a disaster
• Recovery objectives: defining the Recovery Time Objective (RTO: how quickly service must be restored) and the Recovery Point Objective (RPO: how much data loss is acceptable)
• Backup and recovery: ensuring that data is backed up and can be recovered
• Testing: regularly testing continuity plans to ensure they work
• Training: ensuring staff know their roles in a disaster recovery scenario
For organizations certified to ISO 22301 (business continuity management), Clause 8.7.2 aligns with and supports ISO 22301 requirements.
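The distinction between RTO and RPO is easiest to see when a continuity test is scored against both. In this sketch the function name, the result keys, and the example values are all hypothetical; the only idea taken from the text is that RTO bounds recovery time and RPO bounds data loss.

```python
from datetime import timedelta
from typing import Dict


def evaluate_continuity_test(rto: timedelta, rpo: timedelta,
                             measured_recovery: timedelta,
                             measured_data_loss: timedelta) -> Dict[str, bool]:
    """Compare measured results of a continuity test with the objectives."""
    return {
        # RTO: service was restored within the agreed time.
        "rto_met": measured_recovery <= rto,
        # RPO: the data lost (time since last recoverable backup) is acceptable.
        "rpo_met": measured_data_loss <= rpo,
    }


# Assumed example: 4-hour RTO, 1-hour RPO, test restored service in 3 hours
# from a backup that was 90 minutes old.
result = evaluate_continuity_test(
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
    measured_recovery=timedelta(hours=3),
    measured_data_loss=timedelta(minutes=90),
)
```

In this example the recovery time is within the RTO but the data loss exceeds the RPO, which would point to more frequent backups rather than a faster restore procedure.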
Clause 8.7.3: Capacity Management
Capacity management ensures that services have sufficient resources (computing, storage, network) to meet user demand and performance targets. It includes:
• Capacity monitoring: continuously measuring resource utilization (CPU, memory, disk, network)
• Capacity planning: forecasting future demand based on growth trends
• Capacity incidents: managing situations where resources become exhausted and service performance degrades
• Optimization: rightsizing resources to balance cost and performance
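Forecasting from growth trends can be as simple as fitting a straight line to recent utilization and projecting when it crosses a threshold. The sketch below assumes one utilization sample per day and linear growth; the function name is hypothetical, and real demand is often seasonal, so treat the result as a rough planning signal.

```python
from typing import List, Optional


def days_until_threshold(daily_utilization: List[float],
                         threshold: float) -> Optional[float]:
    """Project days from the last sample until utilization reaches threshold.

    Fits a least-squares line through the samples (one per day).
    Returns None when utilization is flat or falling.
    """
    n = len(daily_utilization)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_utilization) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(xs, daily_utilization))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    # Measure from the most recent sample; never report a negative horizon.
    return max(0.0, crossing_day - (n - 1))
```

For example, disk utilization samples of 50, 52, 54, 56, and 58 percent grow by two points per day, so an 80 percent threshold is projected to be reached 11 days after the last sample.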
Clause 8.7.4: Information Security Management
Information security management ensures that services and the data they process are protected from unauthorized access, modification, or loss. ISO 20000 requires that the organization implement information security controls relevant to service management. For organizations certified to ISO 27001 (information security management), Clause 8.7.4 aligns with ISO 27001 requirements.
Information security management in the SMS context includes:
• Access controls: ensuring that users can access only the data and systems they are authorized to use
• Data protection: protecting data at rest and in transit
• Incident response: procedures for responding to security incidents (unauthorized access, data breaches, malware)
• Security monitoring: continuous monitoring for security threats and anomalies
| IMPORTANT | Clause 8.6 generates more audit nonconformities than any other section of ISO 20000. Most commonly, the issue is not the absence of processes but the absence or incompleteness of records. Processes may exist, but evidence that the processes have been executed—incident records, change approvals, problem investigations—is missing or incomplete. |
| BITLION INSIGHT | Bitlion GRC's integrated ITSM and GRC capabilities enable comprehensive management of Clause 8.6 practices: incident and problem tracking, change management workflows, configuration item registers, and integrated performance monitoring, all within a unified platform. |
Resolution Practice Requirements Summary
| Practice | Key ISO 20000 Requirements | Required Records | Common Audit Finding |
|---|---|---|---|
| Incident Management | Classify incidents; set priorities; escalate; resolve within SLA; close with verification | Incident records with classification, priority, resolution, SLA status | Incidents logged but not classified consistently; no escalation evidence; SLA breaches not tracked |
| Service Request Management | Maintain service request catalogue; define fulfillment procedures; SLA targets by request type | Service request records from receipt to fulfillment | No service request catalogue; requests tracked informally without documented closure |
| Problem Management | Investigate incident root causes; maintain known error database; progress problems to resolution | Problem records with RCA documentation; known error records; problem closure evidence | Problem records opened but never progressed; no documented RCA; known error database not maintained or used |
| Configuration Management | Identify all CIs; maintain CMDB with CI relationships; verify CMDB accuracy periodically | CI records; relationship maps; CMDB verification audit reports | CMDB populated initially but not maintained; no verification audits; significant divergence from actual environment |
Service Assurance Practice Summary
| Practice | Planning Requirement | Operational Requirement | Evidence for Audit |
|---|---|---|---|
| Availability Management | Define availability targets (uptime %); establish availability metrics | Continuously monitor actual availability; compare to targets; investigate and remediate failures | Availability reports; incident records; improvement action tracking |
| Service Continuity Management | Document recovery objectives (RTO, RPO); develop continuity plans; identify recovery procedures | Test continuity plans at defined intervals; maintain backup and recovery capability; update plans when services change | Continuity plans; test results and sign-off; recovery procedure documentation |
| Capacity Management | Forecast capacity demand; establish capacity thresholds and alerts | Monitor utilization; alert on threshold exceedance; plan for capacity additions before exhaustion | Capacity reports; monitoring alerts; capacity incident records; expansion planning documentation |
| Information Security Management | Identify security requirements and threats; design security controls into services | Monitor for security threats; respond to security incidents; verify security control effectiveness | Security policies; incident records; monitoring logs; control effectiveness evidence |
Integration and Interdependency
The practices in Clauses 8.6 and 8.7 do not operate in isolation. They form an integrated ecosystem:
• Incidents reveal problems; problems drive changes; changes are tracked in the CMDB; the CMDB supports impact assessment for future changes
• Availability targets (Clause 8.7.1) inform incident priority (Clause 8.6.1) and define the SLAs that drive incident resolution timeframes
• Capacity incidents (Clause 8.7.3) may be classified as incidents and tracked through incident management (Clause 8.6.1)
• Security incidents (Clause 8.7.4) are a high-priority category of incident requiring escalated response
• Configuration accuracy (Clause 8.6.4) enables effective change impact assessment (Clause 8.6.5) and availability improvement (Clause 8.7.1)