The reliability of a web infrastructure can be measured by system uptime: the percentage of time that a website is online and fully operational. System uptime is critical for ecommerce organizations, especially as a growing number of businesses rely on internet-generated revenue as their primary sales channel.
Many pure-play ecommerce websites rely on service level guarantees of uptime as close to 100% as possible. Any outage or downtime can result in substantial lost sales, weaker brand perception, reduced business productivity, and the risk of loyal customers looking elsewhere for services, most likely to your competitors.
Downtime is a real possibility if your business does not invest in resilient hosting services. It has struck several major online retailers in recent years, including the worldwide fashion giant ASOS, which went offline for over 20 hours. No official figures were released for the outage, but it is expected to have had a significant financial impact.
The world’s biggest online retailer, Amazon.com, went offline for 13 minutes in July 2018 during its Prime Day discount promotion. Some online sources suggest the outage may have cost up to $90 million. It is particularly common for online retailers to experience downtime during peak sales periods, especially on retail events such as Black Friday and Cyber Monday.
There are many technical solutions that can help protect web assets from downtime. Three of the most popular are fault tolerance, high availability, and load balancing. Often these technologies intertwine and work directly with one another. Beyond these recommended solutions, it is essential that all infrastructure architecture is designed from a foundation of uptime, scalability, and stability.
Private and public managed service providers (MSPs) design and build resilient computer infrastructure from the ground up. Data center facilities are built to be fault tolerant, which often includes multiple power feeds from the national grid, backed up by fossil fuel generators. Cooling systems are usually deployed in pairs for redundancy, and there are typically redundant power feeds direct to each server rack.
Within the server racks, dual power feeds supply the compute infrastructure. At the hardware level there is at least N+1 redundancy to prevent single-component failure; that might mean dual power supplies, bonded dual network connections, or a dual SAN setup. Storage is also configured in a RAID array to prevent data loss when a disk fails.
Fault-tolerant, software-defined infrastructure is a key component to consider when implementing failover services. Fault tolerance enables a system to continue operating in the event of a failure or unexpected outage. The technology is commonly found within hyper-converged virtual infrastructure services.
The technology designates a fault-tolerant lead node that runs on a specific host. The virtual infrastructure creates a fault-tolerant secondary copy of the node on a different host and keeps the two in sync. In the event of a failure, the hypervisor seamlessly fails over by powering off the failing node and powering up the secondary. Users will often never notice that such an event has taken place; the worst-case scenario might be a few seconds of micro-stutter.
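The failover sequence described above can be sketched in a few lines of Python. This is an illustrative simulation only: the `Node` class, the `heartbeat_ok` check, and the power on/off calls stand in for real hypervisor APIs and are not taken from any specific product.

```python
class Node:
    """Minimal stand-in for a virtual machine node managed by a hypervisor."""

    def __init__(self, name):
        self.name = name
        self.powered_on = False

    def power_on(self):
        self.powered_on = True

    def power_off(self):
        self.powered_on = False


def heartbeat_ok(node):
    # A real hypervisor would probe the host over the network; here we
    # simply report the simulated power state.
    return node.powered_on


def failover(primary, secondary):
    """Power off the failing primary and promote the in-sync secondary."""
    primary.power_off()
    secondary.power_on()
    return secondary


primary, secondary = Node("host-a"), Node("host-b")
primary.power_on()

# Simulate a host failure detected via a missed heartbeat.
primary.powered_on = False
if not heartbeat_ok(primary):
    active = failover(primary, secondary)

print(active.name)  # the secondary node has taken over
```

In a real deployment the secondary is kept in lockstep with the primary, which is why the switchover is near-instant from the user's perspective.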
High availability (HA) is another technology that can help increase uptime on your web infrastructure. HA is similar to fault tolerance; the key difference is that in an HA setup, at least two instances of the same server run concurrently, often in an active-active or active-passive configuration. Clustered resources such as databases, disks, and networking are typically shared between the HA servers, which are often geographically disparate. Should one server fail, all of the load is transferred to the second server seamlessly.
One of the most popular failover services is the load balancer. Load balancers direct ingress traffic to the available resources on the network (usually compute nodes). They are intelligent, routing traffic to available nodes according to a load-balancing policy. If a node fails, the load balancer removes it from the pool and continues to serve traffic to the remaining healthy nodes.
There are many types of load balancers: HTTP(S), SSL proxy, TCP proxy, and internal network load balancers. Traffic is routed to the closest available node or instance, typically using a metric such as CPU utilization, requests per second, or weighted round robin to determine which node receives the ingress traffic.
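To make the routing behavior concrete, here is a hedged Python sketch of a weighted round-robin policy combined with the pool-removal behavior described above. The node names, weights, and the `mark_down` health-check hook are invented for the example and do not correspond to any particular load balancer product.

```python
import itertools


class LoadBalancer:
    """Toy weighted round-robin balancer over a pool of healthy backends."""

    def __init__(self, weighted_nodes):
        # weighted_nodes: dict of node name -> weight (higher = more traffic)
        self.weights = dict(weighted_nodes)
        self.healthy = set(weighted_nodes)
        self._rebuild()

    def _rebuild(self):
        # Repeat each healthy node by its weight, then cycle through the list.
        expanded = [n for n in self.weights if n in self.healthy
                    for _ in range(self.weights[n])]
        self._cycle = itertools.cycle(expanded) if expanded else None

    def mark_down(self, node):
        # A failed health check removes the node from the pool entirely.
        self.healthy.discard(node)
        self._rebuild()

    def route(self):
        """Return the backend the next request should be sent to."""
        if self._cycle is None:
            raise RuntimeError("no healthy backends available")
        return next(self._cycle)


lb = LoadBalancer({"web-1": 2, "web-2": 1})
print([lb.route() for _ in range(3)])  # web-1 receives twice web-2's share

lb.mark_down("web-1")                  # health check fails: drop the node
print(lb.route())                      # traffic continues on web-2 only
```

Real load balancers combine a policy like this with active health checks (e.g. periodic HTTP probes), but the core idea of weighting traffic and evicting unhealthy nodes is the same.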
A load balancer consists of a frontend service, usually an external IP address, and a backend service, which is the endpoint for the traffic, usually a pool of servers. When traffic hits the frontend service, it is rerouted to the backend service using a predefined policy.
Load balancers offer great flexibility and are extremely useful for web assets, as they can be used to push website updates live in a seamless manner. Typically, a systems engineer will drain several nodes from the load balancer, patch or update the website code, and then return the servers to the pool. This process can be repeated, and the result for the user is zero downtime and an easy upgrade path. Existing user sessions are gracefully terminated, and new sessions are routed to the updated nodes.
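The drain-update-return cycle can be sketched as a simple loop. This is a simulation under stated assumptions: the pool is an ordinary list, and the `update` callback stands in for real drain and deployment API calls.

```python
def rolling_update(pool, update):
    """Update every node one at a time, keeping the rest of the pool serving.

    pool   -- mutable list of node names currently behind the load balancer
    update -- callable applied to each drained node (simulates the deploy)
    """
    for node in list(pool):
        pool.remove(node)          # drain: stop sending new sessions here
        assert pool, "never drain the last healthy node"
        update(node)               # patch or update the website code
        pool.append(node)          # return the updated node to the pool


servers = ["web-1", "web-2", "web-3"]
updated = []
rolling_update(servers, updated.append)

print(sorted(updated))  # every node was updated
print(sorted(servers))  # and the pool is fully restored
```

The assertion inside the loop captures the key operational rule: at least one healthy node must remain in the pool at all times, which is what makes the upgrade invisible to users.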
Without a load balancer, every visitor to a site is directed to the same server, which can quickly become inundated with requests during peak times or as the site grows in popularity. When such an upswing in traffic occurs, visitors will either experience slow page loads or the server will start denying requests.
For assistance with implementing failover services as part of a HIPAA-compliant web infrastructure or some other implementation, contact the sales team at Atlantic.Net today!