Restricting Hadoop DataNode storage using LVM
Logical Volume Manager
LVM for short, is a device-mapper framework that acts as a logical volume manager for Linux systems. Most modern Linux distros are LVM-ready, to the point that they can have their root file systems on an LV.
Ways of Managing LVM:
There are 3 concepts that LVM manages:
→1️⃣ Physical Volume: a physical volume is a disk or disk partition that has been initialized for use by LVM; it is the raw storage everything else is built on.
→2️⃣ Volume Groups: a volume group is a pool of storage made up of one or more physical volumes, which can be of varying sizes and types.
→3️⃣ Logical Volumes: a logical volume is a slice carved out of a volume group; it behaves like a partition and is what we format and mount.
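A quick way to see all three layers side by side on a system that already uses LVM is with LVM's standard reporting commands, shown here as a minimal sketch:
pvs   # lists physical volumes and the VG each belongs to
vgs   # lists volume groups with their total and free sizes
lvs   # lists logical volumes and the VG each was carved from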
Task Objective:
🔹 Integrating LVM with Hadoop and providing Elasticity to DataNode Storage
Procedure:
1] Setup:
This can go two ways:
i) Using local systems
ii) Using remote instances
→I’ve launched 2 instances for my Hadoop cluster on AWS: one NameNode and one DataNode.
2] Create volumes with the size of your choice and then attach them to your DataNode:
To test whether the EBS volume is attached:
fdisk -l
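If you prefer the AWS CLI over the console, the volume can be created and attached along these lines; the volume ID, instance ID, availability zone and device name below are placeholders for illustration:
aws ec2 create-volume --size 5 --availability-zone ap-south-1a
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf
lsblk   # on Xen-based instances the new disk typically shows up as /dev/xvdf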
3] Creating a Logical Volume From a Volume Group which consists of Physical Volumes
In some cases, LVM might not be installed on your system or remote instance, so install it first:
yum install lvm2
Next, we create a Physical Volume from the disk we just attached using:
pvcreate <disk_name>
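For example, assuming the attached disk showed up in fdisk -l as /dev/xvdf (a hypothetical device name; substitute your own):
pvcreate /dev/xvdf        # initialize the disk for use by LVM
pvdisplay /dev/xvdf       # verify the PV and check its size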
Now we can create a Volume Group using that Physical Volume we just made:
vgcreate <vg_name> <disk_name1> <disk_name2> ... <disk_nameN>
To see more details we can use:
vgdisplay <vg_name>
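A minimal sketch, using a hypothetical VG name hadoop_vg built on the PV from the previous step:
vgcreate hadoop_vg /dev/xvdf
vgdisplay hadoop_vg       # shows the VG's size, free space and physical-extent details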
Now, we can create a Logical Volume from the existing VG:
lvcreate --size <size> --name <lv_name> <vg_name>
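For instance, carving a 3 GiB LV named dn_lv out of hadoop_vg (both names hypothetical):
lvcreate --size 3G --name dn_lv hadoop_vg
lvdisplay hadoop_vg/dn_lv     # the LV is now available as /dev/hadoop_vg/dn_lv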
4] Making it a Dynamic Partition:
mkfs.ext4 /dev/<vg_name>/<lv_name>
[or]
mkfs.ext4 <partition_name>
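Continuing with the hypothetical names from above:
mkfs.ext4 /dev/hadoop_vg/dn_lv    # lays down an ext4 file system on the LV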
5] Mounting the partition to our Hadoop Cluster
mount /dev/<vg_name>/<lv_name> /<DN_folder>
[or]
mount <partition_name> /<DN_folder>
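Assuming the DataNode directory configured in hdfs-site.xml under dfs.datanode.data.dir is /dn (a placeholder path), the sketch looks like:
mount /dev/hadoop_vg/dn_lv /dn
df -h /dn        # confirm the mount point and its restricted size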
With this, we can check whether the DataNode's contributed storage has been restricted to the size of the LV:
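Apart from the NameNode WebUI, one way to confirm this from the command line is the DFS admin report, which prints each DataNode's configured capacity:
hdfs dfsadmin -report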
6] Expanding the LV capacity:
LVM’s biggest advantage is that logical volumes and volume groups can be resized online, without any restarts or unmounting.
→Now we attach another volume (in my case, I went with a whopping 5 GiB of expansion).
Next, we extend the Volume Group using:
vgextend <vg_name> <disk_name>
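Sketching it with the same hypothetical names, and assuming the new 5 GiB disk appeared as /dev/xvdg:
pvcreate /dev/xvdg            # initialize the new disk as a PV first
vgextend hadoop_vg /dev/xvdg
vgs hadoop_vg                 # free space should have grown by roughly 5 GiB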
Now we can add that new capacity to the LV using:
lvextend --size +<size> /dev/<vg_name>/<lv_name>
Now that we’ve extended the LV, all that’s left is to grow the file system to match (resize2fs does this online without touching the data, so it is a resize, not a reformat):
resize2fs /dev/<vg_name>/<lv_name>
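Putting the last two commands together with the hypothetical names (note the + prefix, which adds to the current size rather than setting an absolute one):
lvextend --size +5G /dev/hadoop_vg/dn_lv
resize2fs /dev/hadoop_vg/dn_lv    # grows the ext4 file system online, data intact
Newer versions of lvextend can also do both steps at once via the -r (--resizefs) flag.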
Now, if we go back to our NameNode WebUI, we can see the increase in storage capacity:
In conclusion..
Integrating LVM with Hadoop clusters lets us dynamically control how much storage we wish to lend to our customers to access, and resize it on the fly without downtime.
Thank you for the time~!