Restricting Hadoop DataNode storage using LVM

B.V.Rohan Bharadwaj
Nov 16, 2020


Logical Volume Manager

LVM, for short, is a logical volume manager for Linux, built on top of the kernel's device-mapper framework. Most modern Linux distros are LVM-ready to the point that they can keep their root file system on a logical volume (LV).

How LVM Organizes Storage:

There are 3 concepts that LVM manages:

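1️⃣ Physical Volume (PV): a whole disk or a disk partition that has been initialized for use by LVM.

2️⃣ Volume Group (VG): a pool of storage built from one or more physical volumes, which can be of varying sizes and types.

3️⃣ Logical Volume (LV): a virtual partition allocated from a volume group's free space. It can span several physical volumes and is formatted and mounted just like a regular partition.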

Task Objective:

🔹 Integrating LVM with Hadoop and providing Elasticity to DataNode Storage

Procedure:

1] Setup:

This can be done in two ways:

i) Using local systems

ii) Using remote instances

I've launched 2 instances on AWS for my Hadoop cluster: one NameNode and one DataNode.

2] Create EBS volumes of the size of your choice and attach them to your DataNode:
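If you prefer the command line to the AWS console, the volume can be created and attached with the AWS CLI. This is a sketch that assumes the AWS CLI is installed and configured; the size, Availability Zone, volume ID, instance ID and device name are hypothetical placeholders, and the run helper with its DRY_RUN switch is only a convention of this sketch (by default it prints the commands instead of running them).

```shell
#!/usr/bin/env bash
# Sketch: create an EBS volume and attach it to the DataNode via the AWS CLI.
# run() only prints each command while DRY_RUN=1 (the default);
# with DRY_RUN=0 it executes the command instead.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: create_ebs_volume <size_gib> <availability_zone>
# The zone must match the DataNode instance's zone, or attaching will fail.
create_ebs_volume() {
  run aws ec2 create-volume --size "$1" --availability-zone "$2" --volume-type gp2
}

# Usage: attach_ebs_volume <volume_id> <instance_id> <device_name>
attach_ebs_volume() {
  run aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3"
}
```

Depending on the instance type, the disk may show up inside the DataNode as /dev/xvdf or as an NVMe device such as /dev/nvme1n1 rather than under the name passed to --device.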

To check whether the EBS volume is attached:

fdisk -l
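Before moving on, it helps to confirm that the exact device path you plan to hand to LVM exists as a block device. A minimal sketch, assuming the disk appears under a name such as /dev/xvdf (hypothetical; use whatever fdisk -l reported); the function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the newly attached disk is visible as a block device.

# Usage: is_block_device <path>  (exit status 0 only for a block device)
is_block_device() {
  [ -b "$1" ]
}

# Usage: require_disk <path>
# Prints a confirmation, or an error on stderr and returns 1 if the disk is missing.
require_disk() {
  if is_block_device "$1"; then
    echo "found block device: $1"
  else
    echo "no block device at $1; is the volume attached?" >&2
    return 1
  fi
}

# A compact alternative to fdisk -l that lists whole disks only:
#   lsblk -d -o NAME,SIZE,TYPE
```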

3] Creating a Logical Volume from a Volume Group of Physical Volumes:

In some cases, LVM might not be installed on your system or remote instance. To install it:

yum install lvm2
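To make this step safe to re-run, the install can be skipped when the LVM tools are already present. A sketch assuming a yum-based distro (such as Amazon Linux or RHEL) and root privileges; ensure_lvm2, run and DRY_RUN are conventions of this sketch, and by default it only prints the install command.

```shell
#!/usr/bin/env bash
# Sketch: install lvm2 only if the LVM tools are missing (yum-based distros).
# run() only prints each command while DRY_RUN=1 (the default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

ensure_lvm2() {
  # pvcreate ships with lvm2, so its presence is a good proxy for the package.
  if command -v pvcreate >/dev/null 2>&1; then
    echo "lvm2 already installed"
  else
    run yum install -y lvm2
  fi
}
```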

Next, we create a Physical Volume from the disk we just attached using:

pvcreate <disk_name>
[Screenshot: creating a demo PV]

Now we can create a Volume Group using that Physical Volume we just made:

vgcreate <vg_name> <disk_name1> <disk_name2> ... <disk_nameN>
[Screenshot: creating a demo VG]

To see more details we can use:

vgdisplay <vg_name>
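vgdisplay shows the free space among many other fields. For scripting, vgs can print just the free space, which makes it easy to check that the LV you are about to create will fit. A sketch: vg_free_gib needs root and lvm2, while fits_in_vg is a plain numeric comparison; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check that a VG has room for the LV you are about to create.

# Usage: vg_free_gib <vg_name>  (needs root and lvm2)
# --units g reports binary gigabytes (GiB); --nosuffix drops the unit letter.
vg_free_gib() {
  vgs --noheadings --nosuffix --units g -o vg_free "$1" | tr -d ' '
}

# Usage: fits_in_vg <free_gib> <wanted_gib>  (exit status 0 if it fits)
fits_in_vg() {
  awk -v free="$1" -v want="$2" 'BEGIN { exit !(want + 0 <= free + 0) }'
}

# Example (hypothetical VG name):
#   fits_in_vg "$(vg_free_gib dn_vg)" 5 && echo "5 GiB will fit"
```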

Now, we can create a Logical Volume from the existing VG:

lvcreate --size <size> --name <lv_name> <vg_name>
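Putting step 3 together, the PV, VG and LV can be created in one go. A minimal sketch: make_lv and its argument order are hypothetical, and with the default DRY_RUN=1 it only prints the commands it would run (set DRY_RUN=0 as root to execute them).

```shell
#!/usr/bin/env bash
# Sketch of step 3: PV -> VG -> LV.
# run() only prints each command while DRY_RUN=1 (the default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: make_lv <vg_name> <lv_name> <lv_size> <disk> [<disk> ...]
make_lv() {
  if [ "$#" -lt 4 ]; then
    echo "usage: make_lv <vg_name> <lv_name> <lv_size> <disk>..." >&2
    return 1
  fi
  local vg=$1 lv=$2 size=$3
  shift 3
  # Every remaining argument is a disk to turn into a PV and add to the VG.
  run pvcreate "$@" || return 1
  run vgcreate "$vg" "$@" || return 1
  run lvcreate --size "$size" --name "$lv" "$vg" || return 1
  # LVM exposes the new LV as /dev/<vg_name>/<lv_name>.
  echo "/dev/$vg/$lv"
}
```

For example, make_lv dn_vg dn_lv 5G /dev/xvdf prints the three LVM commands followed by /dev/dn_vg/dn_lv, the path that the next steps format and mount.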

4] Formatting the LV (our dynamic partition):

mkfs.ext4 /dev/<vg_name>/<lv_name>
[or]

mkfs.ext4 <partition_name>

5] Mounting the partition on the DataNode directory of our Hadoop cluster:

mount /dev/<vg_name>/<lv_name> /<DN_folder>
[or]
mount <partition_name> /<DN_folder>
[Screenshot: steps 4 and 5]
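Steps 4 and 5 can be scripted together as well. The sketch below formats the LV, makes sure the mount point exists, mounts it and prints its size with df. It assumes /<DN_folder> is the directory listed under dfs.data.dir (Hadoop 1.x) or dfs.datanode.data.dir (Hadoop 2 and later) in hdfs-site.xml; format_and_mount is a hypothetical name, and DRY_RUN=1 (the default) only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of steps 4 and 5: format the LV and mount it on the DataNode directory.
# run() only prints each command while DRY_RUN=1 (the default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: format_and_mount <lv_path> <dn_dir>
format_and_mount() {
  if [ "$#" -ne 2 ]; then
    echo "usage: format_and_mount <lv_path> <dn_dir>" >&2
    return 1
  fi
  # Creating an ext4 file system erases anything already on the LV.
  run mkfs.ext4 "$1" || return 1
  run mkdir -p "$2" || return 1
  run mount "$1" "$2" || return 1
  # df reports the size of the file system now backing the DataNode directory.
  run df -h "$2"
}
```

A plain mount does not survive a reboot unless a matching entry is added to /etc/fstab, and if the DataNode was already running against that directory it should be restarted after mounting so it uses the new file system.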

With this, we can check whether the storage the DataNode contributes has been reduced to the size of the LV.
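Besides the NameNode web UI, the capacity can be read from the dfsadmin report (hdfs dfsadmin -report on Hadoop 2 and later, hadoop dfsadmin -report on Hadoop 1.x). A small sketch that extracts the first Configured Capacity value, which is the cluster total, and converts bytes to GiB; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read the cluster's configured capacity from a dfsadmin report.

# Usage: <dfsadmin report> | configured_capacity_bytes
# The first "Configured Capacity:" line is the cluster-wide total;
# per-DataNode sections repeat the field further down, so stop after the first.
configured_capacity_bytes() {
  awk -F': ' '/^Configured Capacity:/ { split($2, parts, " "); print parts[1]; exit }'
}

# Usage: bytes_to_gib <bytes>  (prints GiB with two decimals)
bytes_to_gib() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / (1024 * 1024 * 1024) }'
}

# Example on the NameNode (Hadoop 2+; use "hadoop dfsadmin -report" on 1.x):
#   bytes_to_gib "$(hdfs dfsadmin -report | configured_capacity_bytes)"
```

Expect the reported figure to be somewhat smaller than the raw LV size, since file-system metadata and any space reserved through dfs.datanode.du.reserved are not counted.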

6] Expanding the LV capacity:

LVM's biggest advantage is that both logical and physical volumes can be resized without a reboot.

Now we attach another volume to the DataNode (in my case, I went with a whopping 5 GB of expansion!).

Next, we extend the Volume Group using:

vgextend <vg_name> <disk_name>

Now we can add that space to the LV (--size <size> sets the LV's new total size, while --size +<size> grows it by that amount):

lvextend --size <size> <partition_name>

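Now that we've extended the LV, all that's left is to grow the ext4 file system so it fills the new space. This does not reformat the partition, so existing data is kept, and ext4 can be grown while it is mounted: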

resize2fs <partition_name>
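Step 6 can be sketched the same way. This version also runs pvcreate on the new disk for symmetry with step 3, and uses the +<size> form so the LV grows by exactly the added amount. grow_lv and its argument order are hypothetical; with DRY_RUN=1 (the default) it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of step 6: add a disk to the VG, grow the LV, then grow ext4.
# run() only prints each command while DRY_RUN=1 (the default).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: grow_lv <vg_name> <lv_name> <new_disk> <extra_size>
# <extra_size> uses lvextend's "+" form, so the LV grows BY that amount.
# A 5 GiB disk yields slightly less than 5 GiB of free extents once LVM
# metadata is written, so pass a bit less, or use "--extents +100%FREE".
grow_lv() {
  if [ "$#" -ne 4 ]; then
    echo "usage: grow_lv <vg_name> <lv_name> <new_disk> <extra_size>" >&2
    return 1
  fi
  run pvcreate "$3" || return 1
  run vgextend "$1" "$3" || return 1
  run lvextend --size "+$4" "/dev/$1/$2" || return 1
  # Grows the mounted ext4 file system online; existing data is kept.
  run resize2fs "/dev/$1/$2"
}
```

As an alternative, lvextend --resizefs (-r) resizes the file system in the same command, folding the last two steps into one.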

Now, if we go back to the NameNode web UI, we should see the increased storage capacity.

In conclusion:

Integrating LVM with Hadoop clusters lets us dynamically control how much DataNode storage we lend to the cluster for our customers to access, and grow it on the fly.

Thank you for your time~!
