Thursday, 6 June 2013

Indexing Big Data

Have you been thinking about indexing Big Data?

Yes, Cloudera is about to launch a very exciting utility. Indexing structured data has proved very successful and efficient for fast search. But as data sources grow day by day, we are increasingly concerned with indexing unstructured data and searching it just as quickly. Cloudera is building a utility called Cloudera Search, which provides fast search over unstructured data.

A very positive sign is that it is an open source utility. It is built on the well-known search and indexing projects Apache Lucene and Apache Solr, together with Apache Flume for bringing the data in.

A very good architectural model could be Flume + Apache Solr + HDFS.

Flume feeds all the live streaming (unstructured) data to Apache Solr for indexing, and Apache Solr stores the large volume of indexed data in HDFS.
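To make that flow a little more concrete, here is a toy sketch in plain Python. It does not use Flume, Solr or HDFS at all: `event_stream` is a hypothetical stand-in for a Flume source, `index_batch` for the indexing step, and a local directory for HDFS. All names and sample lines are made up for illustration.

```python
import json
from itertools import islice
from pathlib import Path


def event_stream():
    """Hypothetical stand-in for a Flume source: yields raw, unstructured lines."""
    yield "patient 17 admitted with fever"
    yield "patient 42 reports chest pain"
    yield "patient 17 fever subsided"


def index_batch(batch, start_id):
    """Stand-in for the indexing step: map each word to the ids of events containing it."""
    index = {}
    for offset, line in enumerate(batch):
        for word in set(line.lower().split()):
            index.setdefault(word, []).append(start_id + offset)
    return {word: sorted(ids) for word, ids in index.items()}


def run_pipeline(events, store_dir, batch_size=2):
    """Consume events in batches, index each batch, and write it out as a segment file.

    store_dir is a local directory standing in for HDFS.
    """
    store = Path(store_dir)
    events = iter(events)
    segments = []
    next_id = 0
    while True:
        batch = list(islice(events, batch_size))
        if not batch:
            break
        segment = store / f"segment-{len(segments)}.json"
        segment.write_text(json.dumps(index_batch(batch, next_id)))
        segments.append(segment)
        next_id += len(batch)
    return segments
```

Calling `run_pipeline(event_stream(), some_directory)` writes one small index file per batch. A real deployment would rely on the actual Flume and Solr components rather than anything like this; the sketch only shows the order of the steps.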

Just think: if you have data from a healthcare organization that holds micro-level information on individual patients, down to their second-by-second health records, searching becomes very difficult without indexing. Once the data is indexed, searching becomes many times faster, even though the data is unstructured. In this case, records for a particular hospital, a particular group of patients and a particular disease can be found easily.
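The idea behind this speed-up can be shown with a small inverted index, sketched here in plain Python. This is only an illustration of the general technique, not how Cloudera Search or Solr is implemented, and the records, field names and values are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are made up for illustration.
RECORDS = [
    {"id": 1, "hospital": "City General", "group": "adult", "disease": "diabetes"},
    {"id": 2, "hospital": "City General", "group": "child", "disease": "asthma"},
    {"id": 3, "hospital": "Lakeside", "group": "adult", "disease": "diabetes"},
    {"id": 4, "hospital": "City General", "group": "adult", "disease": "diabetes"},
]


def build_index(records):
    """Map each (field, value) pair to the set of record ids that contain it."""
    index = defaultdict(set)
    for record in records:
        for field, value in record.items():
            if field != "id":
                index[(field, value)].add(record["id"])
    return index


def search(index, **criteria):
    """Return the sorted ids of records that match every field=value criterion."""
    matches = [index.get((field, value), set()) for field, value in criteria.items()]
    return sorted(set.intersection(*matches)) if matches else []


index = build_index(RECORDS)
print(search(index, hospital="City General", group="adult", disease="diabetes"))
# prints [1, 4]
```

Without the index, every query has to scan every record. With it, a query only looks up a few keys and intersects the small sets of ids it finds, which is why the search stays fast as the data grows.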

Indexing will make real-time transactions faster for Big Data.