Adding Elasticity to the datanode in Hadoop

Arjun Chauhan
Nov 5, 2020


I have recently started learning about the Hadoop ecosystem. HDFS, Hadoop's distributed file system, is something I have been researching for quite some time.

Elastic storage in a Hadoop cluster is something that fascinates me. Integrating LVM storage with a Hadoop cluster shows how many ways different technologies can be combined to improve an existing architecture.

So what is LVM?

LVM (Logical Volume Manager) is the Linux tool for logical volume management. With LVM, a hard drive or a set of hard drives is allocated to one or more physical volumes. LVM physical volumes can also be placed on other block devices, which may themselves span two or more disks.

In layman's terms, LVM lets us create dynamic storage units that can be resized as needed, without repartitioning the disk and without losing the data already stored on them. Growing a volume often takes just a single command.

Sounds interesting, right? Let's see what we are going to do today:

  1. First I will create a physical volume. A physical volume is the basic building block of storage in LVM.
  2. Next we will create a volume group. A volume group is like a canvas on which several physical volumes can be combined. Volume groups don't have any storage of their own.
  3. Next we will carve a logical volume out of the volume group and use it like a traditional partition, formatting it as we always do.
  4. We will then mount this logical volume on the DataNode's directory.
  5. Finally, to test the power of LVM, we will extend the volume on the fly.

Creating physical volume

A raw disk cannot be added to a volume group directly; it first has to be initialized as a physical volume. We can list the disks currently attached with the following command:

fdisk -l

We have a 2 GB disk attached, which shows up as /dev/xvdf.

A volume group only accepts physical volumes as storage, so we need to convert our disk into a physical volume.

Before using any LVM commands, we first need to install the lvm2 package:

yum install lvm2

pvcreate /dev/xvdf
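pvcreate writes LVM metadata onto the device, so pointing it at the wrong disk can destroy data. As a precaution, here is a minimal sketch of a guard that checks whether the device, or one of its partitions, is currently mounted, by reading /proc/mounts. The function name device_in_use is hypothetical, and the check only covers file systems mounted directly from the device, not other uses such as swap or an existing LVM setup.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if DEVICE or one of its
# partitions (e.g. /dev/xvdf1, /dev/nvme0n1p1) is listed as a mount source.
# The mounts file defaults to /proc/mounts; it is a parameter so it can be tested.
device_in_use() {
  dev=$1
  mounts=${2:-/proc/mounts}
  awk -v d="$dev" '
    $1 == d { found = 1 }
    index($1, d) == 1 && substr($1, length(d) + 1) ~ /^p?[0-9]+$/ { found = 1 }
    END { exit !found }
  ' "$mounts"
}

# Usage (as root):
#   if device_in_use /dev/xvdf; then echo "/dev/xvdf is mounted" >&2; else pvcreate /dev/xvdf; fi
```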

Creating Volume Group

The next step is to create a volume group. A volume group combines one or more physical volumes into a single pool of storage. Here VG_NAME is a name of your choice:

vgcreate VG_NAME /dev/xvdf

To check the status of the volume group, the following command can be used:

vgdisplay VG_NAME

Keep in mind that the real storage lives in the physical volumes. The volume group pools their capacity but does not have any storage of its own.

Creating a partition

Now we have a pool of storage, so the next task is to create a partition from it. In LVM this is called a logical volume. To create a logical volume we use the following command, where SIZE is a value such as 1G or 500M:

lvcreate --size SIZE --name LV_NAME VG_NAME

This creates a logical volume inside the volume group. Now we need to format it: we might be using a volume group, but a logical volume is formatted just like an ordinary partition.

mkfs.ext4 /dev/VG_NAME/LV_NAME

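Now our storage is ready to be mounted. Since we want elasticity in our DataNode, we mount this logical volume on the DataNode's data directory, i.e. the directory configured for the DataNode in hdfs-site.xml (dfs.data.dir in Hadoop 1.x, dfs.datanode.data.dir in later versions).

mount /dev/VG_NAME/LV_NAME /DATANODE_DIR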
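The setup steps so far can be collected into one small script. This is a minimal sketch, assuming the spare disk is /dev/xvdf and using made-up names (dnvg, dnlv, /dn) that you should replace with your own; the directory must match the DataNode directory in hdfs-site.xml. The run helper and the DRY_RUN switch are additions of this sketch: with DRY_RUN=1 the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch of the LVM setup for a DataNode. All default names are hypothetical.
set -eu

DEVICE=${DEVICE:-/dev/xvdf}   # spare disk found with fdisk -l
VG=${VG:-dnvg}                # volume group name (hypothetical)
LV=${LV:-dnlv}                # logical volume name (hypothetical)
SIZE=${SIZE:-1G}              # initial size of the logical volume
DN_DIR=${DN_DIR:-/dn}         # DataNode data directory (hypothetical)

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Call as root: setup_lvm_datanode
setup_lvm_datanode() {
  run pvcreate "$DEVICE"                          # disk -> physical volume
  run vgcreate "$VG" "$DEVICE"                    # physical volume -> volume group
  run lvcreate --size "$SIZE" --name "$LV" "$VG"  # carve out a logical volume
  run mkfs.ext4 "/dev/$VG/$LV"                    # format it like a partition
  run mkdir -p "$DN_DIR"                          # make sure the mount point exists
  run mount "/dev/$VG/$LV" "$DN_DIR"              # mount it on the DataNode directory
}
```

Printing the plan first with DRY_RUN=1 is a cheap way to double-check the device and names before anything is written to disk.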

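To check whether the storage has been mounted, we can use the following command:

df -h

We can also check it from Hadoop's side with the dfsadmin report, since the DataNode contributes the storage of that directory to the cluster:

hadoop dfsadmin -report

(On newer Hadoop versions the equivalent command is hdfs dfsadmin -report.)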
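When comparing capacities before and after a change, it helps to pull just the number out of the report. Here is a small sketch, assuming the report's usual line format "Configured Capacity: <bytes> (<human-readable size>)", where the first such line is the cluster total and later ones belong to individual DataNodes. The function name configured_capacity is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a dfsadmin report on stdin and print the
# cluster-wide configured capacity in bytes (the first matching line).
configured_capacity() {
  awk '/^Configured Capacity:/ { print $3; exit }'
}

# Usage on a node where the hadoop command is on the PATH:
#   hadoop dfsadmin -report | configured_capacity
```

Running it once before and once after a resize gives two plain byte counts that can be subtracted directly.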

Increasing the storage

Now I would like to show the power of the LVM storage we just created. Let's say that in some scenario we urgently need more space on the DataNode. Without elasticity we would have to attach a new disk, partition and format it, and move data around, but LVM makes this very easy: we simply take some free space from the volume group and add it to the logical volume. Let's see how this is done:

lvextend --size +10M /dev/VG_NAME/LV_NAME
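lvextend can only hand out space that the volume group still has free, and it allocates that space in whole physical extents (4 MiB by default), rounding a request up. With the default extent size, +10M therefore actually takes 12 MiB. Below is a minimal pre-check sketch; the helper name vg_has_room is hypothetical, and it assumes vgs is asked for sizes in MiB without headings or unit suffix. awk does the arithmetic because vgs prints decimals, which shell arithmetic cannot handle.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a volume group with FREE_MIB free can satisfy
# a request of REQUEST_MIB, once the request is rounded up to whole extents
# of EXTENT_MIB (LVM's default extent size is 4 MiB).
vg_has_room() {
  awk -v free="$1" -v req="$2" -v ext="${3:-4}" 'BEGIN {
    needed = int(req / ext) * ext      # whole extents that fit below the request
    if (needed < req) needed += ext    # round up to the next full extent
    exit !(free + 0 >= needed)
  }'
}

# Usage (as root; VG_NAME and LV_NAME are placeholders):
#   free=$(vgs --noheadings --nosuffix --units m -o vg_free VG_NAME)
#   ext=$(vgs --noheadings --nosuffix --units m -o vg_extent_size VG_NAME)
#   vg_has_room "$free" 10 "$ext" && lvextend --size +10M /dev/VG_NAME/LV_NAME
```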

Now let us check again with dfsadmin whether the storage capacity reported by the NameNode has changed.

We can see clearly that there is no increase in the storage yet. The reason is that lvextend only enlarged the logical volume; the ext4 file system on it still has its old size, so the extra space is not usable. To fix this we use resize2fs, which grows an existing ext4 file system to fill the enlarged volume. It does not reformat anything: the data already on the file system is left untouched, and for ext4 the resize can be done online while the volume stays mounted.

resize2fs /dev/VG_NAME/LV_NAME
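Both resize steps can be wrapped into one helper. This is a sketch using made-up names (dnvg, dnlv) that you should override to match your setup; the run helper and the DRY_RUN switch, which prints the commands instead of executing them, are additions of this sketch. lvextend's --resizefs option is a real alternative that performs both steps in a single command.

```shell
#!/bin/sh
# Sketch: grow the DataNode's logical volume, then its ext4 file system.
# VG and LV default to hypothetical names; override them to match your setup.
set -eu

VG=${VG:-dnvg}
LV=${LV:-dnlv}

# With DRY_RUN=1 the commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Call as root, e.g.: grow_datanode +10M
grow_datanode() {
  run lvextend --size "$1" "/dev/$VG/$LV"   # 1. take free space from the volume group
  run resize2fs "/dev/$VG/$LV"              # 2. grow ext4 to fill the volume (online)
  # Alternative: lvextend --resizefs --size "$1" "/dev/$VG/$LV" does both steps.
}
```

After running grow_datanode +10M, the dfsadmin report should show the larger capacity, possibly after a few seconds.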

Let us check the storage reported by the Hadoop NameNode again.

Now the storage has increased compared with last time. I added just 10 MB of space for this test.

By now it should be clear how useful elastic storage can be: the DataNode's capacity grew on the fly while the volume stayed mounted. It takes only lvextend followed by resize2fs, and lvextend's --resizefs (-r) option can even do both in a single command.

Thanks.
