Tuesday 16 July 2013

What is BigData?



Beginners may be wondering: what is this new thing everyone is buzzing about these days?

I would like to take this opportunity to explain BigData in a simple, friendly way.

The image below illustrates what BigData is.






A natural question arises: why should we care about this?

Other valid questions follow: How can it be useful? Who can make real use of it? How can we adopt it? What solutions exist for it?

Here are some simple examples:

-  How has weather forecasting been done for ages? Is it astrologers doing this? NO, it's BIGDATA: forecasters rely on the historical data they hold, which helps them predict the weather.

-  Why does Google suggest what you might want to search for as soon as you start typing? BIGDATA: it's the humongous amount of data they collect every day from user searches.

-  Why do Amazon/eBay show you products similar to the ones you like? BIGDATA: the data stored in their back end constantly reveals user trends and helps them understand what users like.

-  Why do social media sites such as Twitter, Facebook and LinkedIn show "People You May Know" / "Who to Follow" on your home page? BIGDATA: they constantly compare your profile with others' and find similar people.
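The "People You May Know" idea can be sketched in a few lines. This is a toy illustration, not the algorithm any of these sites actually uses: it assumes a hypothetical in-memory `friends` dictionary and a hypothetical `people_you_may_know` function, and ranks candidates by how many mutual friends they share with you, which is one simple heuristic. At BigData scale the same counting would be spread across many machines.

```python
from collections import Counter


def people_you_may_know(user, friends, top_n=3):
    """Rank people `user` is not yet connected to by number of mutual friends.

    `friends` is a hypothetical dict mapping each person to a set of friends.
    """
    direct = friends.get(user, set())
    counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the user themself and people they are already connected to.
            if candidate != user and candidate not in direct:
                counts[candidate] += 1
    # Highest mutual-friend count first; ties broken by name for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical sample data: a tiny social graph.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}

print(people_you_may_know("alice", friends))
```

Here "dave" ranks above "erin" for alice because he is a friend of both bob and carol, while erin is only carol's friend.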

These simple examples show how, why and what BIGDATA can do. There are plenty more cases, but the examples above will help you recognize similar scenarios whenever you come across them.

BigData also raises major problems: where to store it, how to make it scalable, how to process it and how to make use of it.

Nowadays there are solutions on the market to cope with all of these problems. Popular ones include Apache Hadoop, Cloudera CDH, Hortonworks HDP, NoSQL databases, columnar databases, graph databases, massively parallel processing (MPP) databases and Oracle Exadata.

From what I have explored, I firmly stand with Apache Hadoop as the best solution for working with BigData. It comes with its own utilities that help developers, business analysts and business owners adopt BigData easily.

Sunday 7 July 2013

Hadoop VS. RDBMS



I am posting this because I have been asked several times about databases and Hadoop. People often confuse the two. At first glance, some people believe Hadoop is a replacement for a database. That is not true: at its core Hadoop is built on a file system, much as your current OS is. Windows uses NTFS, Linux uses EXT3/EXT4 and Hadoop uses HDFS.

Below are some major differences between a database (RDBMS) and Hadoop. An RDBMS is used for real-time transactional data (for example, the active database behind your web application, with frequent reads, writes, updates and deletes). Hadoop is used for batch processing of large data sets, in terabytes and petabytes. Unlike an RDBMS, it works on a write-once, read-many principle.
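To make "batch processing" concrete, here is the MapReduce word-count pattern that is commonly used to introduce Hadoop, simulated in a single Python process. This is a sketch, not Hadoop's API: the function names and the sample `log` lines are hypothetical, and on a real cluster the map and reduce steps would run in parallel on many nodes over files stored in HDFS.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group all emitted values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine every value emitted for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # The input lines are only read, never modified (write once, read many).
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


# Hypothetical sample input: a few lines of an application log.
log = ["error disk full", "INFO job started", "error network down"]
print(word_count(log))
```

Because each map call looks at only one line and each reduce call at only one key, both steps can be split across machines without coordination, which is what makes the pattern suit large batch jobs.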

At its core, Hadoop is built on a distributed file system (HDFS). It provides a cluster environment of master and slave nodes, so with this architecture large data can be stored and processed in parallel. A variety of data can be analysed: structured (tables), unstructured (logs, email bodies, blog text) and semi-structured (media file metadata, XML, HTML).
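A rough picture of how the master/slave storage works: HDFS splits each file into large fixed-size blocks, and the master (the NameNode) records which slave nodes (DataNodes) hold copies of each block, three copies being the usual default. The sketch below is a simplification under stated assumptions: the `place_blocks` function and node names are hypothetical, the 128 MB block size is an assumed value (the default depends on the Hadoop version), and it places replicas round-robin, whereas real HDFS uses a rack-aware placement policy.

```python
def place_blocks(file_size, block_size, datanodes, replication=3):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes (simple round-robin, NOT HDFS's real policy)."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block_id in range(num_blocks):
        start = block_id % len(datanodes)
        # Pick `replication` consecutive nodes, wrapping around the list.
        placement[block_id] = [
            datanodes[(start + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


MB = 1024 * 1024
nodes = ["slave1", "slave2", "slave3", "slave4"]  # hypothetical slave nodes
layout = place_blocks(file_size=300 * MB, block_size=128 * MB, datanodes=nodes)
for block, replicas in layout.items():
    print(block, replicas)
```

A 300 MB file becomes three blocks, and each block lives on three different nodes, so losing one slave does not lose data and several nodes can read different blocks at the same time.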

An RDBMS, as its name hints, is better suited to relational data and to fast access to individual records.
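For contrast, here is the kind of workload an RDBMS is built for: small, frequent in-place updates and quick lookups by key. It uses Python's built-in `sqlite3` module as a lightweight stand-in for a production RDBMS; the `users` table and its rows are hypothetical.

```python
import sqlite3

# An in-memory database stands in for the "active DB" behind a web application.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ben", "Leeds")],
)

# Update a single row in place; the `with` block commits the transaction on success.
with conn:
    conn.execute("UPDATE users SET city = ? WHERE id = ?", ("Mumbai", 1))

# Look up one record by primary key instead of scanning the whole table.
row = conn.execute("SELECT name, city FROM users WHERE id = ?", (1,)).fetchone()
print(row)
conn.close()
```

In Hadoop's write-once model the same change would typically be handled by writing new data and reprocessing it in a batch, rather than editing a record in place.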

The figure below summarizes some major differences between Hadoop and RDBMS:


Please post your queries below for further details.