Big Data and Machine Learning are the buzzwords of the moment. Almost everybody is either involved with them or wants to be. Articles abound about companies that want to use machine learning to come out with innovative products. Traditionally hardware-centric companies are realizing the importance of the data generated by their products and are trying to imitate software companies in generating value from that data.
Many people take Big Data and Machine Learning to mean the same thing. The two are related, but they are different technologies.
Big Data refers mainly to the tools that run in a distributed manner over multiple commodity hardware nodes. These tools are used to deal with big data, that is, data having the following three characteristics:
1. Velocity
2. Volume
3. Variety
Machine learning, on the other hand, refers to a group of algorithms that are used to find patterns in data and use those patterns to make predictions. Machine learning does not always need Big Data tools. A typical workbench dataset is comparatively small, often just a few hundred rows in an Excel sheet. In this scenario, machine learning algorithms can be run on a single machine with a Python or R setup.
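As a rough illustration of the small-data case, here is a minimal sketch in plain Python that finds a pattern (a straight-line fit) in a handful of rows and uses it to predict a new value. The data, the variable names, and the helper functions fit_line and predict are all made up for this example. Least-squares linear regression stands in for "a machine learning algorithm" and is only one of many possible choices.

```python
from statistics import mean

# Toy data invented for illustration: one input column and one output column,
# the kind of thing that would fit in a few rows of a spreadsheet.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
energy = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the learned pattern (the fitted line) to an unseen input."""
    slope, intercept = model
    return slope * x + intercept


# "Training": learn the pattern from the existing rows.
model = fit_line(hours, energy)

# "Prediction": estimate the output for an input not in the data.
print(predict(model, 6.0))
```

Nothing here needs a cluster. With a few hundred rows, the whole fit runs in well under a second on a laptop, which is why Big Data tools are optional at this scale.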
However, in a commercial setup, as the volume of data increases, we need Big Data tools to perform the first step, data wrangling. Big Data tools help because the processing is broken into small chunks that can run on several nodes in parallel. Data wrangling is the step in which you clean up and transform the data into a form that a machine learning algorithm can consume to produce useful information. Examples of data wrangling include creating new elements from existing ones, dropping redundant elements, and replacing missing data with substitute values. For a numerical value, depending on the case, the substitute can be the mean or the median. Once the initial processing has been done, one can make predictions by implementing machine learning algorithms using tools like Spark,