Packet loss due to an upstream carrier issue
Incident Report for Bigleaf Networks
Resolved
The network is stable and no further impact is expected.
Posted Aug 30, 2020 - 16:08 PDT
Update
At about 8am Pacific, CenturyLink/Level3 reported that they had resolved a BGP routing problem in their network that had prevented some router sessions from establishing correctly since approximately 4am. Around 8am, we noticed traffic return to normal levels, and the network problems appear to have been resolved.

Site-to-cloud traffic and some CenturyLink/Level3 and other carrier circuits were affected while the problem was occurring. We worked to route around their network, which helped restore service for some customers. However, even though we withdrew our route announcements to them, CenturyLink/Level3 inadvertently continued to announce the withdrawn routes and black-holed some customers' traffic. The problem wasn't fully resolved until the carrier fixed their BGP peering issues around 8am Pacific time.

CenturyLink/Level3 is a major part of the global internet, so any issue with their network can affect all services on many carrier networks. Due to the widespread nature of this problem, multiple major internet carriers have temporarily de-peered from CenturyLink/Level3, causing some services hosted in that network to be unavailable. Please let us know if you are experiencing any such problems through your Bigleaf service, and we'll be happy to help troubleshoot them.

We are working with CenturyLink/Level3 to determine the root cause of the problem on their network and to ensure the problem does not happen again. We will leave this incident open for the next 6 hours while we monitor the situation.
Posted Aug 30, 2020 - 10:05 PDT
Monitoring
About 15 minutes ago, CenturyLink/Level3 reported that they had resolved a BGP routing problem in their network that was preventing router sessions from establishing correctly. We've noticed a restoration of traffic levels, and most problems appear to have been resolved. We're monitoring the situation now.
Posted Aug 30, 2020 - 08:17 PDT
Update
We have pushed out a change to disable our CenturyLink/Level3 carrier nationwide due to the continuing effects of the packet loss in multiple Gateway Clusters. We're reviewing data and impact reports to ensure this is resolved.
Posted Aug 30, 2020 - 07:13 PDT
Identified
Our automated monitoring systems have detected packet loss on an upstream carrier circuit in our Atlanta Gateway Cluster over the past two hours. The problem may be affecting transit traffic for other POPs as well, since this is a top-tier internet carrier. Unfortunately, the carrier is continuing to announce routes and black-hole providers' traffic even when their BGP sessions are disabled. Some customers are experiencing packet loss due to this issue, and we are currently mitigating the effects.

We will provide an update once the issue is fully mitigated.
Posted Aug 30, 2020 - 06:37 PDT
This incident affected: Gateway Cluster: Atlanta, GA, Gateway Cluster: Chicago, IL, Gateway Cluster: Dallas, TX, Gateway Cluster: Los Angeles, CA, Gateway Cluster: New York, NY, Gateway Cluster: San Jose, CA, and Gateway Cluster: Seattle, WA.