In September, we experienced three incidents that resulted in degraded performance across GitHub services.
September 15 17:55 UTC (lasting 25 minutes)
On September 15, 2025, between 17:55 and 18:20 UTC, Copilot experienced degraded availability for the majority of its features. This was due to a partial deployment of a feature flag to a global rate limiter. The flag triggered behavior that unintentionally limited 100% of requests, returning 403 errors. The issue was resolved by reverting the feature flag, which resulted in immediate recovery.
The root cause of the incident was an undetected edge case in our rate limiting logic. The flag was meant to scale down rate limiting for a subset of users, but unintentionally put our rate limiting configuration into an invalid state.
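The report does not describe the limiter's internals, so the following Go sketch is purely illustrative: the RateLimitConfig type, the applyScaleFlag function, and the zero-limit failure mode are assumptions, not GitHub's actual implementation. It shows how an unvalidated scaling flag can push a rate limit configuration into an invalid state that ends up rejecting every request.

```go
package main

import (
	"errors"
	"fmt"
)

// RateLimitConfig is a hypothetical per-user rate limit configuration.
type RateLimitConfig struct {
	RequestsPerMinute int
}

// applyScaleFlag models a feature flag that scales the limit down for a
// subset of users. Without validation, a bad scale factor (or integer
// truncation) can silently leave the configuration in an invalid state:
// a limit of zero, which a naive limiter treats as "allow nothing".
func applyScaleFlag(cfg RateLimitConfig, scalePercent int) (RateLimitConfig, error) {
	scaled := cfg.RequestsPerMinute * scalePercent / 100
	if scaled <= 0 {
		// Reject invalid states instead of persisting them.
		return cfg, errors.New("scaled rate limit must be positive")
	}
	cfg.RequestsPerMinute = scaled
	return cfg, nil
}

// allow is a naive limiter check: with a zero limit it denies every
// request, which callers would see as a 403.
func allow(cfg RateLimitConfig, requestsThisMinute int) bool {
	return requestsThisMinute < cfg.RequestsPerMinute
}

func main() {
	cfg := RateLimitConfig{RequestsPerMinute: 600}

	// A flag payload of 0% would zero the limit; the validation above
	// rejects it instead of applying it.
	if _, err := applyScaleFlag(cfg, 0); err != nil {
		fmt.Println("rejected invalid config:", err)
	}

	// Without that check, a zero limit makes the limiter deny every request.
	broken := RateLimitConfig{RequestsPerMinute: 0}
	fmt.Println("request allowed:", allow(broken, 0)) // false -> surfaced as 403
}
```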
The issue has been resolved, and we are enhancing system resilience by adding traffic anomaly monitors for early issue detection and increasing coverage of rate limit scaling tests to strengthen pre-production validation.
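The post does not detail those traffic anomaly monitors, but a minimal sketch of the kind of check they imply might look like the following; the errorRateAnomaly function, its thresholds, and the rejected-request framing are hypothetical, not GitHub's monitoring implementation.

```go
package main

import "fmt"

// errorRateAnomaly flags an anomaly when the share of rejected responses
// (e.g., 403s) in the current window greatly exceeds a recent baseline.
func errorRateAnomaly(baselineRejected, baselineTotal, currentRejected, currentTotal int, factor float64) bool {
	if baselineTotal == 0 || currentTotal == 0 {
		return false
	}
	baselineRate := float64(baselineRejected) / float64(baselineTotal)
	currentRate := float64(currentRejected) / float64(currentTotal)
	// An absolute floor keeps a near-zero baseline from masking a sudden
	// jump to rejecting most traffic.
	return currentRate > baselineRate*factor || currentRate > 0.5
}

func main() {
	// Baseline window: 0.1% rejected. Current window: ~100% rejected.
	fmt.Println(errorRateAnomaly(10, 10000, 9800, 9800, 10)) // true -> alert
}
```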
September 24 14:02 UTC (lasting 50 minutes)
On September 23, 2025, between 15:29 UTC and 17:38 UTC, and again on September 24, 2025, between 14:02 UTC and 15:12 UTC, email deliveries were delayed, resulting in significant delays for most types of email notifications. While the combined impact of the two incidents totaled ~130 minutes, the peak delay experienced by customers was ~50 minutes. This occurred due to an unusually high volume of traffic, which caused resource contention on some of our outbound email servers.
We have updated the configuration to better allocate capacity when there is a high volume of traffic and are also updating our monitors to improve our detection capabilities.
September 29 16:26 UTC (lasting 67 minutes)
On September 29, 2025, between 16:26 UTC and 17:33 UTC, the Copilot API experienced a partial degradation, causing intermittent erroneous 404 responses for an average of 0.2% of GitHub MCP server requests, with peaks of around 2% of requests. The issue stemmed from an upgrade of an internal dependency, which exposed a misconfiguration in the service.
We resolved the incident by rolling back the upgrade to address the misconfiguration. We have since fixed the configuration issue and will improve our documentation and rollout processes to prevent similar issues.
Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.