GitHub Availability Report: September 2023

In September, we experienced two incidents that resulted in degraded performance across GitHub services.

September 5 16:24 UTC (lasting 19 minutes)

On September 5, from 16:24-16:43 UTC, multiple GitHub services were down or degraded due to an outage in one of our primary databases. The primary host for a shared datastore for GitHub experienced an underlying file system write error, which affected availability for the majority of public-facing GitHub services. SAML login was affected, as was access to GitHub Actions, GitHub Issues, pull requests, GitHub Pages, GitHub API, Webhooks, GitHub Codespaces, and GitHub Packages.

The primary database suffered a partial host failure when the disk storage for the operating system became unreachable. Our automatic failover was unable to detect this partial file system failure mode, so it did not trigger. We mitigated the incident by manually failing over to a healthy host; the failover was initiated 17 minutes after our first alert and completed 2 minutes later.

With the incident mitigated, we have assessed the impact on each affected service in more detail and identified resilience improvements that reduce the scope of any future incident involving this shared dependency. Some of those improvements are complete, and the rest will be completed within our standard repair item SLAs. To increase the resiliency of our system, we have improved our automation so that it detects this type of partial host failure and initiates a failover. Additionally, we have identified a source of resource contention that is consistent with this type of failure and applied a fix to reduce the likelihood of recurrence.
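To illustrate why a partial file system failure can slip past a liveness check, here is a minimal sketch of a write probe. It is not GitHub's failover automation; the function names, the probe file prefix, and the decision rule are all assumptions made for this example. The idea is that a host whose database process still responds may nevertheless be unable to write to disk, so the health check attempts a real write and read-back instead of only checking that the process is alive.

```python
import os
import tempfile


def filesystem_is_writable(directory: str) -> bool:
    """Return True if a small file can be written, synced, and read back."""
    payload = b"health-probe"
    path = None
    try:
        # Hypothetical probe file; the ".probe-" prefix is an arbitrary choice.
        fd, path = tempfile.mkstemp(dir=directory, prefix=".probe-")
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the write to disk
        with open(path, "rb") as handle:
            return handle.read() == payload
    except OSError:
        # Covers I/O errors, read-only remounts, and missing directories.
        return False
    finally:
        if path is not None:
            try:
                os.remove(path)
            except OSError:
                pass  # cleanup is best effort on a failing disk


def should_fail_over(process_alive: bool, probe_directories: list[str]) -> bool:
    """Assumed decision rule: fail over if the process is dead or any probe fails."""
    if not process_alive:
        return True
    return not all(filesystem_is_writable(d) for d in probe_directories)
```

A probe like this can itself hang when a disk stops responding entirely, so a real check would also need a timeout around the probe, which this sketch leaves out.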

September 19 20:36 UTC (lasting 7 hours 30 minutes)

On September 19 at 20:36 UTC, while we were migrating the primary datastore for GitHub Projects, an incident disrupted 95% of GitHub Projects data availability for 3.5 hours. A misconfigured index constraint on the primary GitHub Projects database table caused GitHub Projects to become fully unavailable between 20:36 UTC and 00:06 UTC. By 00:06 UTC, we had restored GitHub Projects data to its state from the beginning of the incident. New project data created by users while the incident was being mitigated was fully recovered and available to users by 04:28 UTC.

In addition, a database replication interruption caused by our remediation steps limited the availability of some Git operations, APIs, and GitHub Issues for 1.25 hours from 21:48 UTC to 23:00 UTC.

To prevent similar incidents in the future, we have improved validation of data migrations in testing and during rollout. We have evaluated the constraints applied to data migrations and are improving them to prevent the unexpected behavior that led to this data loss. To reduce the time to mitigate similar incidents, we are also rolling out improvements that reduce both the time to restore data and the time to fix replication issues.
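As a rough illustration of what validating a data migration can look like, the sketch below runs a migration inside a transaction and rolls it back if a simple invariant (the table's row count is unchanged) no longer holds. It uses Python's built-in sqlite3 module purely so the example is self-contained; the helper names, the choice of invariant, and the use of SQLite are assumptions for this example and do not describe GitHub's tooling or the actual migration involved in the incident.

```python
import sqlite3
from typing import Callable


def row_count(conn: sqlite3.Connection, table: str) -> int:
    # The table name is expected to come from trusted code, not user input.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def run_validated_migration(
    conn: sqlite3.Connection,
    table: str,
    migrate: Callable[[sqlite3.Connection], None],
) -> bool:
    """Apply `migrate` in one transaction; keep it only if no rows were lost.

    Assumes `conn` was opened with isolation_level=None so that the explicit
    BEGIN/COMMIT/ROLLBACK statements below control the transaction.
    """
    before = row_count(conn, table)
    conn.execute("BEGIN")
    try:
        migrate(conn)
        after = row_count(conn, table)
        if after != before:
            raise ValueError(f"row count changed from {before} to {after}")
    except (sqlite3.Error, ValueError):
        # Undo schema and data changes together; SQLite DDL is transactional.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        return False
    conn.execute("COMMIT")
    return True
```

A caller would pass a function that issues the migration's statements on the connection, rehearse it against a copy of production-like data, and proceed only if the helper returns True. A row count is a deliberately crude invariant; checksums or per-key comparisons catch more, at a higher cost.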

Please follow our status page for real-time updates on status changes. To learn more about what we’re working on, check out the GitHub Engineering Blog.

The post GitHub Availability Report: September 2023 appeared first on The GitHub Blog.

Posted by Types Digital Programming
