There’s a lot of talk today of data science and machine learning. Like any buzzwords, their meaning is muddied as we try to include more of the work we do in their definitions (everyone wants to be a Data Scientist, right?).
Data science, also known as data-driven science, is the process of gaining business insights from data. It is not a tool, but a behaviour. The role of a Data Scientist encompasses business strategy, hypothesis generation, data wrangling, analysis, modelling and insight communication (often in the form of data visualisation).
Machine learning is a data science technique. It describes a type of modelling, similar to statistical modelling, with the aim of finding hidden patterns in large data sets. Statistical modelling techniques are often described as a type of machine learning because, over time, the two disciplines have merged and now overlap considerably.
Statistics relies on assumptions to infer events.
Machine learning uses data to predict them.
There are two main types of machine learning:
Supervised learning techniques (also known as predictive or directed) are used when you know what you are trying to predict (your modelling target) – customer churn for example. These models analyse historical data to identify variables that will help to predict your target in future data. Businesses have been building these types of models for decades, with common modelling techniques including classification and regression.
Unsupervised learning techniques (also known as descriptive or undirected) are used when you don’t know what you are trying to predict – product recommendations for example. The modelling (such as clustering or feature extraction) is nothing new, but the volume of data these models are exposed to has led to the hype around this type of machine learning.
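To make the difference concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the customer numbers are invented, the function names are hypothetical, and nearest-neighbour and k-means are just two simple stand-ins for classification and clustering. The supervised half has a known target (churned or stayed); the unsupervised half has no target and simply looks for groups.

```python
# Supervised: historical rows carry a known target (churned or stayed).
# Each row is (months_as_customer, support_calls); the values are invented.
history = [
    ((2, 5), "churned"),
    ((3, 4), "churned"),
    ((24, 0), "stayed"),
    ((30, 1), "stayed"),
]


def predict_churn(customer):
    """1-nearest-neighbour: copy the label of the most similar past customer."""
    def distance(row):
        features, _label = row
        return sum((a - b) ** 2 for a, b in zip(features, customer))

    _features, label = min(history, key=distance)
    return label


# Unsupervised: no target, just values to group (monthly spend, invented).
spend = [10, 12, 11, 95, 100, 98]


def two_means(values, rounds=10):
    """Tiny 1-D k-means with k=2: split values into a low and a high group."""
    low, high = min(values), max(values)  # starting centres
    low_group, high_group = list(values), []
    for _ in range(rounds):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical, nothing to split
            break
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low_group, high_group
```

Calling `predict_churn((4, 3))` returns "churned" because the closest past customer churned, while `two_means(spend)` separates the low spenders from the high spenders without ever being told what the groups mean. Interpreting those groups is still a human job.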
Some things to watch out for when using machine learning:
Not all models work for all business problems. Many machine learning models are black boxes. You put data in and you get an answer out, but you are unable to understand what influenced the output. The model may be more accurate than traditional statistical approaches, but it’s a process you can’t explain to the business.
Machine learning is not magic. It is a set of tools you can use to help solve a problem. There will always be some sort of human judgement as part of the machine learning process.
Machine learning can be biased. Models that learn from data showing prejudice will have this bias built into their output. There are many ethical questions that will need to be considered in a future of machine learning.
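The point about bias can be shown with a deliberately simple sketch. The decisions below are invented, the groups "A" and "B" are placeholders, and the "model" is nothing more than a hypothetical majority vote per group, so treat it as an illustration rather than a real modelling approach.

```python
from collections import Counter, defaultdict

# Invented historical decisions: (group, decision).
# Group "B" was mostly rejected in the past, whatever the reason.
past_decisions = [
    ("A", "approve"), ("A", "approve"), ("A", "reject"),
    ("B", "reject"), ("B", "reject"), ("B", "approve"),
]


def train(decisions):
    """Learn the most common past decision for each group."""
    counts = defaultdict(Counter)
    for group, decision in decisions:
        counts[group][decision] += 1
    return {group: c.most_common(1)[0][0] for group, c in counts.items()}


model = train(past_decisions)
```

Here `model` ends up approving group "A" and rejecting group "B". Nothing in the code is prejudiced; it has simply learned the pattern in the historical data and will repeat it on new cases.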
Does your business need data science and machine learning?
With regard to data science: yes, absolutely. All companies can benefit from using parts of the data science process to help drive their business. Regardless of the amount of data a business collects, putting structure around how this data is used will lead to insight.
Machine learning isn’t for everyone.
To build a machine learning model you generally need lots of data, and that data needs to be clean. Once the model is built, there also needs to be a way to operationalise it (for example, by running it in-database).
Businesses are quick to jump to more advanced machine learning solutions before mastering the data science basics. The solution should always be what’s best for the business problem. Sometimes, understanding and applying a data science approach is enough to generate great results.