In one of my previous articles, I configured HAProxy on my local virtual machine setup.
Today I will do a similar configuration in the cloud. This is closer to real-world practice, since web servers are usually deployed in the cloud rather than on local virtual machines.
Before beginning, I will give a brief overview of why we are doing this configuration.
In today’s world, hosting a website is a very common yet critical task. No company wants its web server to go down under heavy traffic, because that is a clear loss of business opportunities.
Millions of users access the internet every day. All content on the web is hosted on web servers, which are responsible for handling the load from clients.
However, there is an upper limit to the number of users a single server can handle. So if we want to deploy multiple web servers, we face a few challenges:
- First, each web server has a different IP address, so a user who wants to connect to them would have to remember all of these addresses. This is not practical, since some websites run thousands of servers.
- Second, not all web servers are busy at all times, so the user would have to check the traffic on each server before connecting. Again, this is not a time-efficient way of managing things.
So what is the solution?
This is the architecture of a typical load balancer. The client does not connect to a web server directly. Instead, it first connects to the load balancer, whose job is to spread incoming requests across the web servers and forward each client to a suitable one.
This approach directly solves both challenges. First, the user only needs the IP address of the load balancer, since the load balancer forwards requests to the web servers behind the scenes.
Second, because the load balancer is responsible for managing the traffic, the user does not need to care about which server is busy.
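To make the idea concrete, here is a minimal Python simulation of two common ways a load balancer can pick a server. It assumes two backend servers, using the two private IPs that appear later in this article. Round robin simply rotates through the servers. Least connections picks the server with the fewest open connections, which is closer to "checking the traffic" on each server. HAProxy supports both (balance roundrobin, which is its default, and balance leastconn). The class names are hypothetical, and this is not how HAProxy is implemented internally.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backend servers in a fixed, rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one backend server is required")
        self._ring = cycle(servers)

    def pick(self):
        return next(self._ring)


class LeastConnectionsBalancer:
    """Pick the backend that currently has the fewest open connections."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one backend server is required")
        self.active = {server: 0 for server in servers}

    def pick(self):
        # min() returns the first minimal key, so ties go to the earlier server.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when a client's connection to `server` closes.
        self.active[server] -= 1


backends = ["172.31.37.242", "172.31.11.43"]
rr = RoundRobinBalancer(backends)
print([rr.pick() for _ in range(4)])
# ['172.31.37.242', '172.31.11.43', '172.31.37.242', '172.31.11.43']
```

Round robin needs no bookkeeping and works well when servers are identical and requests are short. Least connections adapts better when some requests keep a server busy for longer.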
Let’s do the configuration
Whenever we automate a task, we need a clear plan of what we want to achieve.
The task can be divided into two parts. First, we configure one EC2 instance as the reverse proxy server. Then we configure the other nodes as web servers. Today I will configure two nodes as web servers.
For this, I will create two Ansible playbooks, one for each part.
Before creating the playbooks, we need to set up the inventory and the configuration file.
I will use localhost as the proxy server, and the hosts in the <backend_servers> group will be configured as web servers. Note that authentication is based on a private/public key pair. This is because password authentication is less secure, and EC2 does not allow password-based SSH login by default.
When we configure the backend servers, we need to escalate privileges so that we can execute commands as the root user, because installing packages and starting services require root access. Hence we enable privilege escalation (Ansible's become settings) in the configuration file.
Now that these prerequisite files are in place, we can begin the configuration.
Configuring the reverse proxy
To configure the system as an HAProxy server, we need to do the following tasks:
- Install the HAProxy software
- Set up the HAProxy configuration file
- Start the HAProxy service
The playbook configures localhost as a reverse proxy. When the playbook runs, it prompts for the port the proxy server should bind to and for the port on which the backend web servers listen, which the proxy uses to connect to them.
The most critical task in this playbook is setting up the HAProxy configuration file.
Since the template module is used to copy the file to the required location, Ansible first renders its content as a Jinja2 template. Here we used a for loop over the hosts in the <backend_servers> group defined in the inventory file, so every web server gets its own entry in the configuration.
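The template itself is not reproduced here, so the following Python function is only a sketch of what such a loop produces: one server line per host in <backend_servers>. In an Ansible template, the loop would typically iterate over groups['backend_servers']. The section names main and app are hypothetical. So are the parameter names frontend_port and backend_port, which stand for the two ports the playbook prompts for.

```python
def render_haproxy_cfg(backend_servers, frontend_port, backend_port):
    """Build the frontend/backend part of an haproxy.cfg as a string.

    Mirrors the template's for loop: one `server` line per backend host.
    """
    lines = [
        "frontend main",
        f"    bind *:{frontend_port}",      # port the proxy listens on
        "    default_backend app",
        "",
        "backend app",
        "    balance roundrobin",           # rotate requests across servers
    ]
    for index, host in enumerate(backend_servers, start=1):
        # `check` enables HAProxy health checks for this server.
        lines.append(f"    server app{index} {host}:{backend_port} check")
    return "\n".join(lines) + "\n"


print(render_haproxy_cfg(["172.31.37.242", "172.31.11.43"], 8080, 80))
```

Because the server list comes from the inventory, adding a third web server only requires adding its address to <backend_servers> and re-running the playbook. The configuration file does not need to be edited by hand.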
Once this is done, the next task is to configure the web servers.
Configuring the web server
The steps for configuring a web server node are as follows:
- install the Apache web server software
- install Python
- copy the content we want to host on the server
- set the permissions of the transferred files so that they can be executed
- start the web server
I installed Python because, for this experiment, each web server displays its own IP address on the page, and I used Python for that.
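The article does not show the Python script itself, so here is a hypothetical stand-in. It is a small standard-library HTTP server that answers every GET with the IP address of the machine serving it. In the setup above, the page is served through Apache. This standalone server is a simplification and uses port 8000 by default to avoid clashing with Apache on port 80. The helper names local_ip, page_body, WhoAmIHandler and serve are illustrative.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


def local_ip():
    """Return the IPv4 address this host would use for outbound traffic."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # connect() on a UDP socket sends no packets; it only picks a route,
            # so getsockname() then reveals the local address for that route.
            sock.connect(("10.255.255.255", 1))
            return sock.getsockname()[0]
    except OSError:
        # No usable route (or sockets unavailable): fall back to loopback.
        return "127.0.0.1"


def page_body(ip):
    """HTML returned to the client, identifying which web server answered."""
    return f"<h1>Served by web server {ip}</h1>\n".encode("utf-8")


class WhoAmIHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = page_body(local_ip())
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve until interrupted; ports below 1024 normally require root."""
    HTTPServer(("0.0.0.0", port), WhoAmIHandler).serve_forever()
```

On an EC2 instance, local_ip() returns the instance's private address (in this setup, something like 172.31.x.x). This is what lets you tell the two web servers apart when you go through the proxy.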
The content of my file is static, so I used the copy module for this task rather than the template module.
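The copy-and-permissions steps can be mimicked locally in Python to show what they do. This is a hypothetical helper, not Ansible's implementation. It copies a file byte for byte, with no variable substitution (that substitution is what distinguishes the template module from copy). It then sets the file's mode to 0755, the kind of value you would pass to the copy module's mode option so the file can be executed.

```python
import os
import shutil
import stat


def deploy_static_file(src, dest, mode=0o755):
    """Copy src to dest unchanged, set its permission bits, return the new mode."""
    shutil.copyfile(src, dest)          # content only, transferred as-is
    os.chmod(dest, mode)                # e.g. 0o755: owner rwx, group/others r-x
    return stat.S_IMODE(os.stat(dest).st_mode)
```

Setting the mode in the same step as the copy ensures that the execute bit is never missing after a deployment.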
After executing this playbook, we can try connecting to the web servers through the proxy server.
Let’s try this.
In the output, you can see that when I run the command for the first time, the web server IP displayed is 172.31.37.242. The next time I run the command, the IP displayed is 172.31.11.43.
This means the proxy server forwards successive requests to different web servers without any manual intervention, which shows that the load balancing is working correctly.
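The same check can be scripted instead of rerunning the command by hand. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the get parameter lets you pass in a different fetcher (for example, a stub when testing).

```python
from urllib.request import urlopen


def fetch(url, timeout=5):
    """GET url and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def distinct_responses(url, attempts, get=fetch):
    """Request url `attempts` times; return the distinct bodies in first-seen order."""
    seen = []
    for _ in range(attempts):
        body = get(url)
        if body not in seen:
            seen.append(body)
    return seen
```

For example, distinct_responses("http://PROXY_IP:8080/", 4) would query the proxy four times. Here PROXY_IP and 8080 are placeholders for your proxy's address and the frontend port you entered. With two healthy web servers behind a round-robin proxy, you would expect two distinct pages, one per server IP.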
The task we did today might seem simple, but in real industry use cases it is very important. Automating such tasks matters because managing many servers manually is very difficult. Automation also makes these tasks less error-prone, and even when something does go wrong, the issue is easier to track down since all the configuration files live in one place.