Between 09:30 and 10:28 CET on October 24th, the Commerce Services API and Admin API experienced degraded performance resulting in increased response times and, in some cases, failed requests. The incident was caused by an unexpected surge in load on one of the SQL clusters within our multi-tenant production environment — a so-called “noisy neighbor” scenario where one tenant’s workload impacted the performance of others. The elevated load led to a cascading performance degradation, with automatic scale-out events and external retry mechanisms further amplifying the strain on both the database and application layers.
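For illustration, the sketch below shows the kind of client-side retry behavior that avoids this amplification: a hard cap on attempts combined with exponentially growing, randomized delays, so retries spread out instead of piling onto an already degraded service. The function names and parameters are examples only and do not refer to our client libraries.

```typescript
// Illustrative only: capped retries with exponential backoff and full jitter.
// Unbounded, immediate retries are what turn a slow dependency into a
// self-amplifying overload; a cap plus growing, randomized delays spreads
// retry traffic out instead of concentrating it on the degraded service.

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithBackoff(
  url: string,
  maxAttempts = 4,    // hard cap on attempts per logical request
  baseDelayMs = 200,  // first retry waits up to ~200 ms
): Promise<Response> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const response = await fetch(url);
      // Only retry server-side errors; client errors will not improve on retry.
      if (response.status < 500) return response;
      lastError = new Error(`HTTP ${response.status}`);
    } catch (err) {
      lastError = err;
    }
    // Full jitter: wait a random duration up to an exponentially growing ceiling.
    const ceiling = baseDelayMs * 2 ** attempt;
    await sleep(Math.random() * ceiling);
  }
  throw lastError;
}
```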
The issue was mitigated by isolating the tenant workload that caused the initial problem and restarting API instances that were stuck in a failed state, restoring full service functionality by 10:28 CET. Our alerting systems functioned as intended, enabling a coordinated and timely response from our engineering and infrastructure teams.
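Instances "stuck in a failed state" are the kind of condition a liveness check can surface automatically. The following minimal sketch (names and thresholds are illustrative, not our actual implementation) shows an instance reporting itself unhealthy once it has gone too long without a successful database query, so an orchestrator can recycle it without manual intervention.

```typescript
// Illustrative only: a liveness endpoint that lets the orchestrator restart
// an instance that is wedged, instead of waiting for a manual restart.
import { createServer } from "node:http";

// Assumed application state: the time of the last successfully completed
// database query, updated by the request-handling code elsewhere.
let lastSuccessfulQueryAt = Date.now();

export function recordSuccessfulQuery(): void {
  lastSuccessfulQueryAt = Date.now();
}

const STALENESS_LIMIT_MS = 30_000; // treat the instance as stuck after 30 s

createServer((req, res) => {
  if (req.url === "/healthz") {
    const stale = Date.now() - lastSuccessfulQueryAt > STALENESS_LIMIT_MS;
    // A 503 here tells the liveness probe to recycle this instance.
    res.writeHead(stale ? 503 : 200, { "content-type": "text/plain" });
    res.end(stale ? "unhealthy" : "ok");
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);
```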
As a result of this incident, we are implementing additional safeguards to reduce the risk of similar occurrences. These include enhanced tenant workload isolation and faster automated recovery at the application layer. Together, these measures will strengthen the overall resilience of the platform and shorten recovery times in the event of future disruptions.
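As a rough illustration of tenant workload isolation at the application layer, the sketch below caps the number of in-flight database calls per tenant so that one tenant's burst cannot monopolize the shared cluster. The limit, identifiers, and function names are assumptions for the example rather than a description of our actual safeguards.

```typescript
// Illustrative only: a per-tenant concurrency cap in front of a shared
// database pool, so a single tenant's burst cannot consume every connection.
// The limit and tenant identifiers here are assumptions for the example.

const MAX_CONCURRENT_PER_TENANT = 10;
const inFlight = new Map<string, number>();

export async function withTenantLimit<T>(
  tenantId: string,
  work: () => Promise<T>,
): Promise<T> {
  const current = inFlight.get(tenantId) ?? 0;
  if (current >= MAX_CONCURRENT_PER_TENANT) {
    // Fail fast for the noisy tenant rather than queuing work that would
    // pile onto the shared cluster and slow every other tenant down.
    throw new Error(`Tenant ${tenantId} is over its concurrency limit`);
  }
  inFlight.set(tenantId, current + 1);
  try {
    return await work();
  } finally {
    inFlight.set(tenantId, (inFlight.get(tenantId) ?? 1) - 1);
  }
}
```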