Sunday 7 July 2013

Hadoop VS. RDBMS



I am posting this because I have been asked several times about the difference between a database and Hadoop. People often confuse the two. At first glance, some people believe Hadoop is a replacement for a database. That is not true. At its core, Hadoop is built on a file system, much like your current OS: Windows uses NTFS, Linux uses EXT3/EXT4, and Hadoop uses HDFS (Hadoop Distributed File System).

Below I have listed some major differences between a database (RDBMS) and Hadoop. An RDBMS is used for real-time transactions (for example, the active database behind your web application, with frequent read, write, update and delete operations), whereas Hadoop is used for batch processing of large data volumes in terabytes (TBs) and petabytes (PBs). Unlike an RDBMS, Hadoop works on a "write once, read many" principle.
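To make the "write once, read many" principle concrete, here is a minimal Python sketch of a toy in-memory store. The `WriteOnceStore` class and its methods are hypothetical and are not the HDFS API; the sketch only models the idea that a file is written once and then served for reads, whereas an RDBMS row can be updated or deleted at any time. Real HDFS versions may differ in details such as support for appending to existing files.

```python
class WriteOnceStore:
    """Toy model of write-once, read-many semantics (hypothetical, not the HDFS API)."""

    def __init__(self):
        self._files = {}  # path -> tuple of records, immutable once written

    def write(self, path, records):
        # A file may be created only once; rewriting an existing path is rejected.
        if path in self._files:
            raise PermissionError(f"{path} already exists and cannot be modified")
        self._files[path] = tuple(records)

    def read(self, path):
        # Reads can happen any number of times, e.g. by many batch jobs.
        return list(self._files[path])


store = WriteOnceStore()
store.write("/logs/2013-07-07.log", ["GET /index.html", "GET /about.html"])
print(store.read("/logs/2013-07-07.log"))

try:
    # An in-place "update" is not allowed in this model.
    store.write("/logs/2013-07-07.log", ["DELETE everything"])
except PermissionError as err:
    print("rejected:", err)
```

In a batch-processing workload this trade-off is acceptable: data such as logs is collected once and then scanned repeatedly by analysis jobs, so giving up in-place updates simplifies replication and parallel reads.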

Hadoop is built around a distributed file system (HDFS) and provides a cluster environment of master and slave nodes. With this architecture, large data sets can be stored across many machines and processed in parallel. A variety of data can be analysed, for example structured (tables), unstructured (logs, email bodies, blog text) and semi-structured (media file metadata, XML, HTML).
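Parallel processing on such a cluster is easiest to picture through MapReduce, Hadoop's original processing model. The following is a minimal single-machine Python sketch, assuming a word-count job over log lines; the function names are hypothetical, threads merely stand in for slave nodes, and this is not Hadoop's actual Java API.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(split):
    # Mapper: emit (word, 1) for every word in one input split.
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_outputs):
    # Shuffle: group all values by key across every mapper's output.
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reducer: sum the counts collected for each word, sorted by word.
    return {key: sum(values) for key, values in sorted(groups.items())}


def word_count(splits, workers=3):
    # Threads simulate slave nodes; on a real cluster each mapper would
    # run on a machine that stores the corresponding data block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
    return reduce_phase(shuffle(mapped))


# Each inner list plays the role of one block of a large log file.
splits = [
    ["error disk full", "info job started"],
    ["error network down"],
    ["info job finished", "error disk full"],
]
print(word_count(splits))
```

The design point is that each split is processed independently, so adding machines adds map capacity; only the much smaller grouped output has to be brought together for the reduce step.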

An RDBMS, as its name suggests, is better suited to relational data and to fast access to individual records.
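For contrast, here is a minimal Python sketch using the standard-library `sqlite3` module, with SQLite standing in for any RDBMS. The `customers` and `orders` tables are hypothetical. It shows the kind of work relational databases are built for: transactions, in-place updates and deletes, an index for fast lookups, and a join across related tables.

```python
import sqlite3

# In-memory database standing in for an application's active RDBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount REAL NOT NULL
    );
    -- Index so lookups of a customer's orders avoid a full table scan.
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")

# A transaction: the statements commit together, or roll back on error.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 250.0)")
    conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 99.5)")

# Rows can be updated and deleted in place, unlike write-once storage.
with conn:
    conn.execute("UPDATE orders SET amount = 120.0 WHERE amount = 99.5")
    conn.execute("DELETE FROM orders WHERE amount = 250.0")

# Relational query: join the two tables on the customer key.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
""").fetchall()
print(rows)  # [('Asha', 120.0)]
```

Each of these operations touches only a few rows and returns almost immediately, which is the real-time workload described above, while a Hadoop job typically scans whole files.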

The figure below summarizes some major differences between Hadoop and RDBMS:


Please post your queries below for further details.
