Adding Elasticity to the datanode in Hadoop

I recently started learning about the Hadoop environment. The HDFS cluster, Hadoop's distributed storage layer, is something I have been researching for quite some time.

Elastic storage in a Hadoop cluster fascinates me. Integrating LVM storage with a Hadoop cluster shows how many ways existing technologies can be combined to improve an architecture.

So what is LVM?

In layman's terms, LVM (Logical Volume Manager) lets us create storage units that can be resized as needed, without any hassle and without losing the data already stored on them. Each operation takes a single command.

Sounds interesting, right? Here is what we are going to do today:

  1. First, I will create a physical volume. A physical volume is the basic building block of storage in LVM.
  2. Next, we will create a volume group. A volume group is like a canvas onto which physical volumes are placed. It has no storage of its own.
  3. Then we will carve a partition (a logical volume, in LVM terms) out of the volume group and format it like any traditional partition.
  4. We will then mount this partition on the datanode folder.
  5. Finally, to test the power of LVM, we will expand the partition on the fly.
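The setup steps (1 to 4) can be sketched as one small script. This is only a sketch under assumptions: the names hadoop_vg, datanode_lv, the 1G size and the /dn folder are hypothetical placeholders, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch of setup steps 1 to 4. Names and sizes are hypothetical.
DISK=/dev/xvdf        # the disk used in this post
VG=hadoop_vg          # hypothetical volume group name
LV=datanode_lv        # hypothetical logical volume name
SIZE=1G               # hypothetical initial size
MOUNT_POINT=/dn       # hypothetical datanode folder

# Print the command by default; run it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_datanode_storage() {
    run pvcreate "$DISK"                             # step 1: physical volume
    run vgcreate "$VG" "$DISK"                       # step 2: volume group
    run lvcreate --size "$SIZE" --name "$LV" "$VG"   # step 3: logical volume
    run mkfs.ext4 "/dev/$VG/$LV"                     # step 3: format it
    run mount "/dev/$VG/$LV" "$MOUNT_POINT"          # step 4: mount it
}

setup_datanode_storage
```

Running it with DRY_RUN=0 as root would execute the commands for real, so it is worth reading the printed commands first.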

Creating physical volume

fdisk -l

We have a spare 2 GB disk to work with.

A volume group accepts only physical volumes as its storage, so we need to convert our hard disks into physical volumes.

Before using the LVM commands, we need to install lvm2.

yum install lvm2

pvcreate /dev/xvdf

Creating Volume Group

vgcreate "name_of_vg" /dev/xvdf

To check the status of the volume group, the vgdisplay command can be used.

The real storage lives in the physical volume. The volume group pools that storage but has none of its own.

Creating a partition

lvcreate --size "add_size" --name "name_of_partition" "name_of_VG"

This creates a partition (a logical volume, in LVM terms) in the volume group. Now we need to format it. Even though it lives in a volume group, it is formatted like any ordinary partition.

mkfs.ext4 /dev/"vgname"/"lvname"

Now our storage is ready to be mounted. Since we want elasticity in our datanode, we mount this storage on the datanode folder.

mount /dev/"vgname"/"lvname" /"datanode_folder"

To see whether the storage has been mounted, we can use df -h.

To confirm that Hadoop sees it, we can check the dfsadmin report (hadoop dfsadmin -report, or hdfs dfsadmin -report on newer versions), since the storage in that directory is what the datanode contributes to the namenode.
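To compare capacity before and after a resize, it helps to pull just the number out of the report. Here is a minimal sketch, assuming the report contains a line of the form "Configured Capacity: <bytes> (<human readable>)", as recent Hadoop versions print. The helper name configured_capacity_bytes is my own, and the sample text is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read dfsadmin report text on stdin and print the
# first "Configured Capacity" value in bytes.
configured_capacity_bytes() {
    awk -F': ' '/^Configured Capacity:/ { split($2, parts, " "); print parts[1]; exit }'
}

# Made-up sample standing in for real report output.
sample_report='Configured Capacity: 1040187392 (992 MB)
Present Capacity: 985661440 (940 MB)'

printf '%s\n' "$sample_report" | configured_capacity_bytes   # prints 1040187392
```

On a live cluster the same function could be fed directly, for example with hdfs dfsadmin -report | configured_capacity_bytes.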

Increasing the storage
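The logical volume is grown with lvextend. A minimal sketch, assuming the hypothetical names hadoop_vg and datanode_lv and the 10 MB test increase used in this post. It prints the command by default rather than running it.

```shell
#!/bin/sh
# Dry-run sketch of growing the logical volume. Names are hypothetical.
VG=hadoop_vg        # hypothetical volume group name
LV=datanode_lv      # hypothetical logical volume name
EXTRA=10M           # amount to add, as in the test in this post

# Print the command by default; run it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

extend_lv() {
    # The leading "+" makes the size relative to the current size.
    run lvextend --size "+$EXTRA" "/dev/$VG/$LV"
}

extend_lv
```

LVM allocates space in whole extents, so a small request like this may be rounded up to the next extent boundary.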

Now let us check again using dfsadmin whether the storage capacity reported by the namenode has changed.

We can clearly see that the storage has not increased. The reason is that the filesystem still covers only the original size, so the new space we added is not yet part of it. For this we can use resize2fs. This tool grows the existing ext4 filesystem to fill the enlarged volume and leaves the data already on it untouched, so nothing has to be reformatted.
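A minimal sketch of this step, again with the hypothetical names hadoop_vg and datanode_lv and printing commands by default. The second function shows an alternative I am assuming you may prefer: lvextend with its --resizefs option, which extends the volume and grows the filesystem in one go.

```shell
#!/bin/sh
# Dry-run sketch of growing the filesystem. Names are hypothetical.
VG=hadoop_vg        # hypothetical volume group name
LV=datanode_lv      # hypothetical logical volume name

# Print the command by default; run it only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Grow the ext4 filesystem to fill the already extended logical volume.
grow_filesystem() {
    run resize2fs "/dev/$VG/$LV"
}

# Alternative: extend the volume by $1 and grow the filesystem together.
extend_and_grow() {
    run lvextend --resizefs --size "+$1" "/dev/$VG/$LV"
}

grow_filesystem
```

With no size argument, resize2fs grows the filesystem to the full size of the device, which is what we want here.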

Let us check the storage again in the Hadoop namenode.

The storage is now larger than before. I added just 10 MB of space for testing.

By now you can see how valuable elastic storage is. The storage grows on the fly, while the datanode folder stays mounted, with only a couple of commands doing all the work.

