5xx Server Errors – Why They Occur, What Causes Them, and How to Resolve Them

March 31, 2023

Whether you are new to Kubernetes or an experienced operator of any production IT system, you will occasionally run into errors that you are responsible for resolving. There are many kinds of server and client errors, and in a production system they need to be fixed quickly. This article explains what 5xx errors are, why they matter, and what can cause them.

What are HTTP Status Codes

HTTP is a client-server protocol—the client, known as a user-agent, connects to a server and makes requests. The server receives each request, handles it, and returns a response. It is common to have intermediaries known as proxies between the client and server, which relay requests and responses to their destination.

An HTTP request begins with a request line made up of a method, a request target, and a protocol version:

The method indicates what operation the client wants to perform on the server. For example, GET means the client wants to read information.
The version indicates which HTTP version is used by the client.

An HTTP response begins with a status line made up of a protocol version, a status code, and a status message:

The version indicates which HTTP version is implemented by the server.
The status code is the three-digit response code. If it starts with 5 (written 5xx), the response indicates a server error.
The status message is a short textual description of the status, which the client can display to the end user.
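As a rough illustration of the fields described above, here is a minimal Python sketch that splits a request line and a status line into their parts. The function names and the sample lines are assumptions made for this example. A real client would normally rely on an HTTP library rather than parse these lines by hand.

```python
def parse_request_line(line: str) -> dict:
    """Split a request line such as 'GET /index.html HTTP/1.1'."""
    method, target, version = line.strip().split(" ", 2)
    return {"method": method, "target": target, "version": version}


def parse_status_line(line: str) -> dict:
    """Split a status line such as 'HTTP/1.1 503 Service Unavailable'."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The status message is optional, so tolerate its absence.
    message = parts[2] if len(parts) > 2 else ""
    return {"version": version, "status_code": code, "message": message}


if __name__ == "__main__":
    print(parse_request_line("GET /index.html HTTP/1.1"))
    print(parse_status_line("HTTP/1.1 503 Service Unavailable"))
```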

HTTP groups status codes into the following classes:

1xx informational response – request was received, and server continues working.
2xx successful – request was received and successfully performed.
3xx redirection – the request was redirected to another URL.
4xx client error – the request was incorrect or invalid and cannot be fulfilled.
5xx server error – problem on the server preventing it from fulfilling the request.
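The grouping above depends only on the first digit of the status code. A minimal Python sketch of that rule follows. The names `status_class` and `is_server_error` are hypothetical helpers, not part of any library.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "successful",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of an HTTP status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def is_server_error(code: int) -> bool:
    """True for any 5xx status code."""
    return 500 <= code <= 599


if __name__ == "__main__":
    for code in (200, 404, 503):
        print(code, status_class(code), is_server_error(code))
```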

The rest of this article focuses on 5xx server errors.

What are 5xx Errors

5xx errors are returned as part of the Hypertext Transfer Protocol (HTTP), which is the basis for much of the communication on the Internet and private networks. A 5xx error means “an error number starting with 5”, such as 500 or 503. 5xx errors are server errors—meaning the server encountered an issue and is not able to serve the client’s request.

5xx errors can be encountered when:

A user browses a website, and the web server is experiencing an error.
An application tries to access an API, and the API server returns an error.
A component of a distributed system like Kubernetes fails to serve requests from other components.

The most common 5xx errors are:

500 – Internal Server Error
501 – Not Implemented
502 – Bad Gateway
503 – Service Unavailable
504 – Gateway Timeout
509 – Bandwidth Limit Exceeded
511 – Network Authentication Required

In most cases, the client cannot do anything to resolve a 5xx error. Typically, the error indicates that the server has an application, hardware, or configuration problem that must be remediated.

Why You Should Care About 5xx Errors

Significance of 5xx Errors for Web Admins

For a website owner or developer, a 5xx error indicates that a website user attempted to access a URL and could not view it. In addition, if search engine crawlers access a website and receive a 5xx error, they might abandon the request and remove the URL from the search index, which can have severe consequences for a website’s traffic.

Significance of 5xx Errors for API Developers

A 5xx error returned by an API indicates that the API is down, undergoing maintenance, or is experiencing another issue. When an API endpoint experiences a problem, returning a 5xx error code is good, expected behaviour, and can help clients understand what is happening and handle the error on the client side.

In microservices architectures, it is generally advisable to make services resilient to errors in upstream services, meaning that a service can continue functioning even if an API it relies on returns an error.
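One way to picture this kind of resilience is a caller that retries a failing dependency a few times and then falls back to a default value instead of failing itself. The Python sketch below assumes the dependency is wrapped in a hypothetical `fetch` callable that returns a `(status_code, body)` pair. The retry count, delays, and fallback value are illustrative assumptions, not recommendations.

```python
import time
from typing import Callable, Tuple


def call_with_fallback(
    fetch: Callable[[], Tuple[int, str]],
    fallback: str,
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(); retry on 5xx with exponential backoff, then fall back."""
    for attempt in range(attempts):
        status, body = fetch()
        if status < 500:
            # Not a server error: hand the body back to the caller.
            return body
        if attempt < attempts - 1:
            # Wait longer after each failed attempt: 0.1s, 0.2s, 0.4s, ...
            sleep(base_delay * 2 ** attempt)
    # The dependency kept failing; degrade gracefully instead of erroring.
    return fallback


if __name__ == "__main__":
    responses = iter([(503, ""), (503, ""), (200, "fresh data")])
    print(call_with_fallback(lambda: next(responses), fallback="cached data"))
```

Injecting `sleep` as a parameter keeps the sketch easy to exercise without real delays.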

Significance of 5xx Errors for Kubernetes Users

In Kubernetes, a 5xx error can indicate:

A node-level terminating condition: the node is not functioning or is unable to respond to a request.
A pod-level terminating condition: the pod may have been terminated (SIGKILL), or is about to be terminated and is currently in the termination grace period (SIGTERM).

Understanding Different 5xx Server Error Codes

500 – Internal Server Error

This error indicates that the server experienced an unexpected condition that was not specifically handled. Typically, this means a request could not be fulfilled because of an unhandled exception in the application or an incorrect configuration.

501 – Not Implemented

This error indicates the server does not support the functionality requested by the client or does not recognize the requested method. This could indicate that the server might support this type of request in the future.

502 – Bad Gateway

This error indicates that the server is a proxy or gateway and received an invalid response from an upstream server. In other words, the proxy could not obtain a valid response from the destination server to relay back to the client.

503 – Service Unavailable

This error indicates that the server is temporarily incapable of handling the request, for example because it is undergoing maintenance or is experiencing excessive loads.

The server may indicate the expected length of the delay in the Retry-After header. If there is no value in the Retry-After header, this response is functionally equivalent to response code 500.
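A client can use the Retry-After header to decide how long to wait before retrying. The header value is either a number of seconds or an HTTP date. The following Python sketch handles both forms. The function name `retry_after_seconds` is a hypothetical helper, and returning `None` for a missing or unparseable value is a design choice of this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Return the delay in seconds requested by a Retry-After value."""
    if value is None:
        return None
    value = value.strip()
    if value.isdecimal():
        # Form 1: delay in seconds, e.g. "120".
        return float(int(value))
    try:
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat dates without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())


if __name__ == "__main__":
    print(retry_after_seconds("120"))
    print(retry_after_seconds(None))
```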

504 – Gateway Timeout

This error indicates that a server upstream is not responding to the proxy in a timely manner. This does not necessarily indicate a fault in the upstream server itself, only a delay in receiving a response, which might be due to a connectivity or latency issue.

505 – HTTP Version Not Supported

This error indicates that the application does not support the major HTTP version that was used by the request. The response contains an entity stating why the version is not supported, and providing other protocol versions that the server does support.

506 – Variant Also Negotiates

This error occurs when using Transparent Content Negotiation—a protocol that enables clients to retrieve one of several variants of a given resource. A 506 error code indicates a server configuration error, where the chosen variant starts a content negotiation, meaning that it is not appropriate as a negotiation endpoint.

507 – Insufficient Storage

This error indicates that the client request cannot be executed because the server is not able to store a representation needed to finalize the request. This is a temporary condition, like a 503 error. It is commonly related to RAM or disk space limitations on the server.

508 – Loop Detected

This error occurs in the context of the WebDAV protocol. It indicates that the server aborted a client operation because it detected an infinite loop. This can happen when a client performs a WebDAV request with Depth: Infinity.

509 – Bandwidth Limit Exceeded

This is a non-standard code, not defined in the HTTP specifications, that some servers and hosting control panels return. It indicates that the request exceeded the bandwidth limit defined by the server’s administrator. The server configuration defines an interval for bandwidth checks, and the limit is reset only after this interval. Client requests will continue to fail until the bandwidth limit is reset in the next cycle.

510 – Not Extended

This error indicates that the policy for accessing the requested resource was not met by the request. The server should send back the information the client needs to issue an extended request.

511 – Network Authentication Required

This error indicates that the client needs to authenticate to gain network access, for example through a captive portal on a Wi-Fi network. The response should provide a link to a resource that allows users to authenticate themselves.

Causes of 5xx Server Errors

5xx errors can occur at multiple layers of the server environment. In an application, these layers include:

Web server
Reverse proxy server
Database server
API Server
Web development framework
Content distribution network (CDN)
Content management system

Here are a few common reasons for 5xx server errors, regardless of the type of application:

Code bugs – the application serving the request is experiencing an error because of an internal bug.
Updates – the application has been updated and the new version is not able to serve the request correctly.
Incompatibilities – the application is not compatible with other software on the host or with hardware on the host.
Operating system issues – operating system crashed, corrupted, or misconfigured.
Hardware issues – hardware failure or misconfiguration on the host.
Back-end failure – a back-end component the application relies on has failed or is not responding.
Insufficient resources – the host may not have sufficient resources to serve the current application load.
Insufficient bandwidth – the host’s network bandwidth may be exhausted by the current application load.

How to Resolve 5xx Errors

Debugging Server-Side Scripts in Applications

5xx server errors are often caused by custom scripts you are running on a server. Here are a few things you should check if your application returns a 5xx error:

Check server permissions – your script may not have permission to perform the necessary operations on a file or folder. For example, the script may need to write files but may not have write permission to its folder.
Check for script timeouts – the script may have timed out. Coding errors or other issues might cause a script to use excessive resources or get stuck in a loop.
Check for server timeouts – in some cases the script itself is working properly but the server is not, for example because it is restarting or disconnected from the network.
Check for .htaccess errors – on an Apache web server, the .htaccess file defines the configuration of the web server for a certain directory. A syntax error in the .htaccess file can result in 500 errors.
Check for script-specific errors – turn on error logging in your web framework to identify what is wrong with the custom script. There may be errors returned by the runtime environment or logged by the script itself.
Check for server-specific errors – consult with the hosting provider or server administrator to see if they are familiar with an error caused by the specific server or a component interacting with the server.
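The first check in the list above, server permissions, is easy to script. The Python sketch below tests whether the current user can create files in a directory. `can_write_to` is a hypothetical helper name, and the result is only meaningful when the check runs as the same user account as the failing script.

```python
import os


def can_write_to(directory: str) -> bool:
    """Return True if the current user can create files in the directory."""
    # Creating a file needs write permission on the directory, plus
    # execute (search) permission to access entries inside it.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    import tempfile

    print(can_write_to(tempfile.gettempdir()))
    print(can_write_to("/path/that/does/not/exist"))
```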

Debugging 5xx Server Errors in NGINX

The NGINX documentation recommends an interesting technique to debug 5xx errors in an NGINX server when it is used as a reverse proxy or load balancer—setting up a special debug server and routing all error requests to that server. The debug server is a replica of the production server, so it should return the same errors.

There are a few benefits to this approach:

The debug server only receives error requests, so its logs will contain only errors, making investigation and resolution easy.
The debug server does not need high performance, so it is possible to enable all logging and diagnostic tools, including stack trace and application profiling.
You can use the max_conns parameter to limit the number of requests directed to the debug server, to avoid overwhelming it if there is a sudden spike of errors.
Errors caused by resource issues on the production server are easy to identify: the request fails on the production server but works fine on the debug server.

You can use the following configuration to set up an application server and route errors to a debug server:

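# These blocks belong inside the http context of the NGINX configuration.

# Assumption: the "detailed" log format used below is not shown in the
# original configuration, so this definition is only an illustrative placeholder.
log_format detailed '$remote_addr [$time_local] "$request" $status '
                    '$request_time $upstream_addr $upstream_status';

# Production application servers
upstream app_server {
    server 172.16.0.1;
    server 172.16.0.2;
    server 172.16.0.3;
}

# Debug server; max_conns caps the number of simultaneous connections to it
upstream debug_server {
    server 172.16.0.9 max_conns=20;
}

server {
    listen *:80;
    location / {
        proxy_pass http://app_server;
        # Let NGINX handle upstream error responses instead of relaying them
        proxy_intercept_errors on;
        # Re-route requests that failed with these codes to the debug server
        error_page 500 503 504 @debug;
    }

    location @debug {
        proxy_pass http://debug_server;
        access_log /var/log/nginx/access_debug_server.log detailed;
        error_log  /var/log/nginx/error_debug_server.log;
    }
}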
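Once error requests are being logged, a quick tally by status code can show which error dominates. The Python sketch below is only an illustration: it assumes log lines follow NGINX's default combined format, where the status code comes right after the quoted request line, so a custom log format would need a different pattern. `count_5xx` and the sample lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# In the combined format the status follows the quoted request line:
# ... "GET /path HTTP/1.1" 502 157 "-" "user-agent"
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')


def count_5xx(lines: Iterable[str]) -> Counter:
    """Count 5xx responses per status code in access-log lines."""
    counts: Counter = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group(1).startswith("5"):
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [31/Mar/2023:10:00:00 +0000] "GET /api HTTP/1.1" 502 157 "-" "curl/7.88"',
        '203.0.113.6 - - [31/Mar/2023:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.88"',
    ]
    print(dict(count_5xx(sample)))
```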

Debugging 5xx Errors in Kubernetes Nodes

There are two common causes of 5xx errors returned by a Kubernetes node: node-level termination and pod-level termination.

Node-level termination events

Nodes can return 5xx errors if an automated mechanism, or a human administrator, makes changes to the nodes without first draining them of Kubernetes workloads. For example, the following actions can result in a 5xx error on a node:

An administrator performing maintenance on a node
An administrator restarting a node
A cloud service scaling down and terminating a node
A process on the node attempting to restart it or shut it down

To diagnose and resolve a 5xx error on a node:

Determine whether the node was shut down or modified by a staff member or an external process.
Check if the node is running, and if not, restart it and ensure it rejoins the cluster.
Log in to the node and review its logs to see what caused it to fail or misbehave.

Pod-level termination events

When a pod is terminated due to eviction from a node, the following process occurs:

1. The Kubernetes control plane instructs the kubelet to terminate the pod.
2. The kubelet instructs the operating system to send a SIGTERM (15) signal to all containers running in the pod.
3. There is a configurable grace period, during which applications can gracefully shut down and close existing connections.
4. When the grace period expires, the operating system sends SIGKILL to kill any remaining containers in the pod.

5xx errors can occur in between steps 3 and 4. When applications are shutting down, they might fail to serve certain requests and return errors, which will typically be 502 (bad gateway) or 504 (gateway timeout).
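An application can reduce these errors by using the grace period to finish requests that are already in progress before it exits. The Python sketch below shows one possible shape for this. `GracefulServer` and its methods are hypothetical names for this example, and the drain timeout should be shorter than the pod's configured grace period.

```python
import signal
import threading


class GracefulServer:
    """Tracks in-flight requests so shutdown can wait for them to finish."""

    def __init__(self) -> None:
        self.stopping = threading.Event()
        self._in_flight = 0
        self._idle = threading.Condition()

    def on_sigterm(self, signum, frame) -> None:
        # Only record that shutdown was requested; the main loop reacts to it.
        self.stopping.set()

    def handle(self, work):
        """Run one request while counting it as in flight."""
        with self._idle:
            self._in_flight += 1
        try:
            return work()
        finally:
            with self._idle:
                self._in_flight -= 1
                self._idle.notify_all()

    def drain(self, timeout: float) -> bool:
        """Wait until no requests are in flight; False if the timeout expires."""
        with self._idle:
            return self._idle.wait_for(lambda: self._in_flight == 0, timeout)


if __name__ == "__main__":
    server = GracefulServer()
    signal.signal(signal.SIGTERM, server.on_sigterm)
    print(server.handle(lambda: "response"))
    # After SIGTERM arrives, a main loop would stop accepting new
    # connections and call drain() before exiting.
    print(server.drain(timeout=1.0))
```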

We hope this article helps you understand 5xx errors. In upcoming articles, we will look at how to troubleshoot specific 5xx errors with real-world examples. In the meantime, you can check our Kubernetes Troubleshooting articles to learn more about troubleshooting.
