Wednesday, February 22, 2017

Big Data and Machine Learning

Big Data and Machine Learning are the hot buzzwords right now. Almost everybody is either involved with them or wants to be. Articles are full of companies that want to use machine learning to come out with innovative products. Traditionally hardware-centric companies are realizing the importance of the data generated by their products and are trying to imitate software companies in generating value from that data.

Most people take Big Data and Machine Learning to mean the same thing. The two are related, but they are different technologies.

Big Data refers mainly to the tools that run in a distributed manner across multiple commodity hardware nodes. These tools are used to deal with big data, that is, data having the following three characteristics:
1. Velocity
2. Volume
3. Variety

Machine learning, on the other hand, refers to a group of algorithms that find patterns in data and use those patterns to make predictions. Machine learning does not always need Big Data tools. A typical workbench dataset is comparatively small, often just a couple of hundred rows in an Excel sheet. In this scenario the machine learning algorithms can be run on a single machine with a Python or R setup.
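To make the single-machine case concrete, here is a minimal sketch in plain Python. It fits a straight line to a handful of points (the pattern) and uses it to predict a value it has not seen. The data and the names `fit_line` and `predict` are made up for illustration, and least-squares regression is just one simple example of a machine learning algorithm.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: hours a machine ran vs. units it produced.
hours = [1, 2, 3, 4, 5]
units = [12, 22, 32, 42, 52]

model = fit_line(hours, units)   # learn the pattern
forecast = predict(model, 6)     # predict for an unseen input
print(model, forecast)
```

Nothing here needs a cluster: the whole dataset fits in memory and the fit takes a few arithmetic operations.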

However, in a commercial setup, as the volume of data increases, we need Big Data tools to perform the first step, which is data wrangling. Big Data tools help because the processing is broken into small chunks that can be handled on different nodes in parallel. Data wrangling is the step where you clean up the data and transform it into a form that a machine learning algorithm can consume to produce useful information. Examples of data wrangling include creating new elements from existing ones, dropping redundant elements, and replacing missing data with some value. For numerical data, depending on the case, that value can be the mean or the median. Once this initial processing is done, one can make predictions by implementing machine learning algorithms with tools like Spark.
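The wrangling steps described above can be sketched on a tiny in-memory dataset. This is only an illustration in plain Python, not how a Big Data tool would do it at scale. The records, the field names, and the helpers `fill_missing` and `wrangle` are all hypothetical.

```python
from statistics import median

# Hypothetical sensor records; None marks a missing reading.
# "serial" duplicates "device_id", so it is treated as redundant.
records = [
    {"device_id": "a1", "serial": "a1", "temp_c": 20.0, "humidity": 40.0},
    {"device_id": "b2", "serial": "b2", "temp_c": None, "humidity": 55.0},
    {"device_id": "c3", "serial": "c3", "temp_c": 30.0, "humidity": None},
    {"device_id": "d4", "serial": "d4", "temp_c": 25.0, "humidity": 50.0},
]


def fill_missing(rows, field, strategy=median):
    """Replace None in `field` with the median (or mean) of the known values."""
    known = [r[field] for r in rows if r[field] is not None]
    fill = strategy(known)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def wrangle(rows):
    """Fill gaps, drop the redundant field, and derive a new one."""
    rows = fill_missing(rows, "temp_c")
    rows = fill_missing(rows, "humidity")
    cleaned = []
    for r in rows:
        # Drop the redundant element.
        row = {k: v for k, v in r.items() if k != "serial"}
        # Create a new element from an existing one.
        row["temp_f"] = row["temp_c"] * 9 / 5 + 32
        cleaned.append(row)
    return cleaned


cleaned = wrangle(records)
for row in cleaned:
    print(row)
```

Passing `strategy=mean` (imported from `statistics`) would fill with the mean instead. Which one fits depends on the data: the median is less sensitive to outliers.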

Thursday, January 5, 2017

I have a keen interest in machine learning. This blog is where I share my journey with it.