On March 25, 2023, a very high rate of HTTP 500 responses was detected in the Service Management API frontend, affecting 100% of traffic. The root cause was traced to an intermediate proxy layer between the frontend instances and the sharded storage layer. Remediation actions were taken in this proxy layer to restore the service.
Timeline:
Root cause:
An unexpected error in the intermediate proxy layer between the Service Management API frontend instances and the sharded storage layer.