In this post I am providing specific instructions for setting up a data science strategy for CPG and consumer product companies. It also includes the following tools:
- a downloadable template to use as a guide;
- a dataset of use cases for data in a CPG or consumer product company, to give you ideas for your own company.
What is data science in a CPG and consumer product company
When I say data, I define it as follows:
- data: quantitative in nature…numbers
- information: qualitative in nature…documents, media like speech, video and images
So when I refer to data, it includes the above.
Then there is data about data.
- knowledge, which I define as synthesizing data to make it more useful (also called analysis)
- wisdom, which I define as the ultimate purpose of data: delivering insights from knowledge, honed through the experience of the practitioner.
So in its purest form, I define data science as obtaining wisdom (or insights) from data.
In the context of a CPG or consumer product company, getting insights from data ranges from easy to hard.
Analysis of the past, like total top-line revenue from last month’s sales, is usually pretty easy. This tells us what happened. I put this easier analysis, which I define as general calculations like summing or averaging, into a category called “Standard Analysis” for data science workflows.
But analyzing data to discover patterns and correlations (also called relationships) is harder, and using past data to ask “what if” and to make predictions can be a lot harder, especially if we want it in real time based on incoming data. I put analysis like correlations, patterns and predictions into a category called “Advanced Analysis” for data science workflows.
And analysis can be iterative: by repeating it continuously as new data comes in, our insights can improve.
So, data science is the practice of analyzing data to give us insights into what happened in the past to help us make future decisions, done continuously to improve our decision-making potential.
In this context, all CPG and consumer product companies practice data science, especially at the standard analysis level, and in some cases, at the advanced analysis level.
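To make the two categories concrete, here is a minimal Python sketch. The figures are invented for illustration, and the function names are my own assumptions rather than part of any particular tool. Summing and averaging stand in for standard analysis, and a Pearson correlation stands in for one simple kind of advanced analysis.

```python
from math import sqrt

# Hypothetical monthly figures, invented purely for illustration.
monthly_revenue = [120_000.0, 135_000.0, 128_000.0, 150_000.0]
monthly_ad_spend = [10_000.0, 12_000.0, 11_000.0, 14_000.0]


def total(values):
    """Standard analysis: add up past values (what happened)."""
    return sum(values)


def average(values):
    """Standard analysis: arithmetic mean of past values."""
    return sum(values) / len(values)


def pearson_correlation(xs, ys):
    """Advanced analysis: strength of the linear relationship between two series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x, mean_y = average(xs), average(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


revenue_total = total(monthly_revenue)
revenue_average = average(monthly_revenue)
relationship = pearson_correlation(monthly_ad_spend, monthly_revenue)
```

A value of `relationship` close to 1 suggests the two series move together. It does not show that one causes the other, which is part of why this kind of analysis is harder than a sum.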
What is a data science strategy for CPG and consumer product companies
A data science strategy is simply a document (information in this case) that defines how data (which is data and information as defined above), including its collection, storage, and general use, supports the CPG or consumer product company in achieving its vision, mission, culture and goals. I define these elements more clearly and explain how they are related and different from each other in this post.
Why is a data science strategy important
CPG and consumer product companies succeed off of data. Having a documented strategy and process in place for data will help ensure that its value is maximized and nothing is missed or overlooked.
How to develop a data science strategy for CPG and consumer product companies
Identify the data to collect and its governance
Let’s start with our key performance indicators (KPIs for short). They are numerical data points that should tell you in real time what is going on in your business.
KPIs that are reporting good values should indicate that you are on the right path to achieving your goals, which, if achieved, help get you to the culture, mission and vision you want for your company. There should be congruency from KPIs to goals to culture, mission and vision. Bad KPIs can mean everything after that is at risk.
The KPIs important to you will point you towards the data to collect and analyze.
Your data leads to analysis, which leads to your KPIs, which lead to your goals, which lead to culture, mission and vision.
So, your first step in a data science strategy is to define, at a minimum, the data you should collect to support these elements.
There may be other required data sets you need to collect for regulatory purposes, so define them as well.
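The chain described above (goals to KPIs to data) can be sketched as a simple lookup. This is only an illustration: the goals, KPIs and dataset names below are hypothetical, and `minimum_datasets` is a name I made up for the example.

```python
# Hypothetical goals, KPIs and dataset names, invented for illustration.
kpis_by_goal = {
    "grow online revenue": ["monthly_revenue", "repeat_purchase_rate"],
    "reduce stockouts": ["in_stock_rate"],
}
datasets_by_kpi = {
    "monthly_revenue": ["orders"],
    "repeat_purchase_rate": ["orders", "customers"],
    "in_stock_rate": ["inventory_snapshots"],
}
regulatory_datasets = ["product_safety_records"]


def minimum_datasets(kpis_by_goal, datasets_by_kpi, regulatory_datasets):
    """Return the datasets needed to support every KPI, plus regulatory ones."""
    required = set(regulatory_datasets)
    for kpis in kpis_by_goal.values():
        for kpi in kpis:
            required.update(datasets_by_kpi[kpi])
    return sorted(required)


minimum = minimum_datasets(kpis_by_goal, datasets_by_kpi, regulatory_datasets)
```

Writing the mapping down this way also exposes gaps: a KPI with no entry in `datasets_by_kpi` raises a `KeyError`, which is a prompt to decide what data that KPI actually depends on.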
Additional datasets to collect
While defining the minimum data you need to collect is a start, I strongly recommend going much further than that. Define what you would like to have, in an ideal world. You may not be able to get it, but by at least identifying it, you can work towards getting it in the future.
A problem with data is that we tend to work with the data we can most easily obtain and/or easily work with. But we need to think about finding data that is hard to get and/or hard to work with because there may be great nuggets of insight and wisdom to be found from that data. Furthermore, because it might be hard, most other companies won’t do it, which may give us a competitive advantage.
I recommend collecting as much data as possible, because collecting and storing data is increasingly cheap and easy. That does not mean the analysis, or the data preparation it requires, is cheap and easy, or even doable today, but as technology advances it will likely become so.
For example, I started collecting detailed customer usage data for one of my businesses back in 2006. I knew then that I could not do anything with it, but that once machine learning came of age, as it has, I would be able to make use of it, which I now do.
Another example is to record all meetings and calls and use machine learning tools to transcribe them to text. Taking summary notes during calls, which is what I do, often misses important context in what people say, so having text transcriptions for reference is a significant benefit. Further, natural language processing can work through volumes of recorded calls and meetings to potentially unearth key nuggets of information or identify patterns and correlations.
Collect as much data as you can because it could become of significant value in the future through technology.
Protect the business by getting data, information, knowledge and wisdom out of people’s heads
You absolutely do not want people to keep information only in their heads, where they can walk out with it and disrupt the business through their absence. Put policies in place that require people to record everything digitally: standard operating procedures, workflows, and knowledge bases.
Set access rights
Identify who can access what information.
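One simple way to write access rights down is a mapping from role to the datasets that role may see, with everything else denied by default. The sketch below assumes hypothetical role and dataset names; a real system would enforce this in the storage layer rather than in a dictionary.

```python
# Hypothetical roles and dataset names, invented for illustration.
access_rights = {
    "finance": {"orders", "invoices"},
    "marketing": {"orders", "customers"},
}


def can_access(role, dataset, rights=access_rights):
    """Return True only if the role is explicitly granted the dataset."""
    return dataset in rights.get(role, set())
```

For example, `can_access("marketing", "invoices")` is `False`, and so is any check for a role that has not been listed at all.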
Identify how available the data needs to be
My general rule is to have real-time, up-to-date access to all data and analysis that rolls up to my KPIs, and to be able to get it from my mobile device. Beyond that, set your own policies for how close to real time data and its analysis must be available. Usually, the closer to real time, the more cost and effort involved.
Identify who collects the data
Identify, down to each employee, who is responsible for collecting and digitally storing which data. There are many common datasets that every employee would collect (work emails, files, recorded calls and meetings, workflows, standard operating procedures, knowledge sets), all of which are automatically digitized and stored. But for specific datasets, identify who has responsibility for them and put in place processes to verify that the data is being collected as scheduled and in the proper format.
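A verification process can start as something very small. The sketch below flags datasets whose last collection is older than their schedule allows and names the person responsible. It checks only the schedule, not the format, and the record fields, dataset names and owners are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class CollectionRecord:
    dataset: str
    owner: str                # employee responsible for collecting it
    max_age: timedelta        # how often it is scheduled to be collected
    last_collected: datetime  # when it was last stored


def overdue(records, now):
    """Return (dataset, owner) pairs whose last collection is older than scheduled."""
    return [
        (record.dataset, record.owner)
        for record in records
        if now - record.last_collected > record.max_age
    ]


# Hypothetical records, invented for illustration.
records = [
    CollectionRecord("retailer_sell_through", "Dana", timedelta(days=7), datetime(2024, 1, 1)),
    CollectionRecord("customer_support_calls", "Lee", timedelta(days=1), datetime(2024, 1, 9)),
]
late = overdue(records, now=datetime(2024, 1, 10))
```

Here only the first dataset is late: it was last collected nine days before `now` against a seven-day schedule, while the second is exactly on schedule.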
Identify the data storage capabilities
It used to be that you would try to identify what you thought you would need based on your vision of the size and scope of your company, put in place what you could afford at the time, and add capacity and capabilities later when you needed them.
Now, just go big to start. Cloud storage and additional services (like machine learning) are cheap to start with even on a small scale, and since pricing is based on use, you can start small and easily add storage and services as needed.
It is better to set up scalable infrastructure from the start, which is quite inexpensive to do even for a consumer product startup, and get that out of the way rather than later face having to move to another system, which can be very disruptive to the business.
Put redundancies in place so that data is backed up and backups are easy to access and restore.
Data directory tool
Use this worksheet to document your datasets. Be sure to read the notes inserted into each header field label. This worksheet is a simple and basic approach to using a directory of your data so you can see what you have and the governance around each dataset.
This is a good place to start for any company, startups especially, but it is better to build this tool into a relational database so that it becomes a metadata store for your data, capturing more information such as semantics, tags, provenance, lineage, evolution of the data, the processes that extract, transform and load the data, versioning, when verifications were performed to confirm the data was collected, and more.
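As a rough idea of what a relational version could look like, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names are my assumptions, not the layout of the worksheet, and it covers only a few of the items listed above (owner, source, tags, lineage, and a last-verified date).

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE datasets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    owner TEXT NOT NULL,
    source TEXT NOT NULL CHECK (source IN ('internal', 'external')),
    last_verified TEXT
);
CREATE TABLE dataset_tags (
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    tag TEXT NOT NULL,
    PRIMARY KEY (dataset_id, tag)
);
CREATE TABLE dataset_lineage (
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    derived_from_id INTEGER NOT NULL REFERENCES datasets(id),
    process TEXT NOT NULL,
    PRIMARY KEY (dataset_id, derived_from_id)
);
"""


def create_directory():
    """Create an in-memory data directory; use a file path to persist it."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(SCHEMA)
    return connection


def add_dataset(connection, name, owner, source, tags=()):
    """Register a dataset with its tags and return its id."""
    cursor = connection.execute(
        "INSERT INTO datasets (name, owner, source) VALUES (?, ?, ?)",
        (name, owner, source),
    )
    dataset_id = cursor.lastrowid
    connection.executemany(
        "INSERT INTO dataset_tags (dataset_id, tag) VALUES (?, ?)",
        [(dataset_id, tag) for tag in tags],
    )
    return dataset_id


def link_lineage(connection, dataset_id, derived_from_id, process):
    """Record that one dataset is produced from another by a named process."""
    connection.execute(
        "INSERT INTO dataset_lineage (dataset_id, derived_from_id, process) VALUES (?, ?, ?)",
        (dataset_id, derived_from_id, process),
    )


def datasets_with_tag(connection, tag):
    """Return the names of all datasets carrying the given tag."""
    rows = connection.execute(
        "SELECT d.name FROM datasets d JOIN dataset_tags t ON t.dataset_id = d.id "
        "WHERE t.tag = ? ORDER BY d.name",
        (tag,),
    )
    return [row[0] for row in rows]


def upstream_of(connection, dataset_id):
    """Return (source dataset name, process) pairs the dataset is derived from."""
    rows = connection.execute(
        "SELECT d.name, l.process FROM dataset_lineage l "
        "JOIN datasets d ON d.id = l.derived_from_id "
        "WHERE l.dataset_id = ? ORDER BY d.name",
        (dataset_id,),
    )
    return [(name, process) for name, process in rows]
```

A short usage example: register a raw `orders` dataset and a derived `monthly_revenue` dataset, call `link_lineage` with a process name such as `"monthly rollup"`, and `upstream_of` will then report where the revenue figures come from.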
Set regular data science strategy reviews
Set triggers to update the data directory worksheet. For myself and my companies, I use a daily log that contains prompts to record any changes to the data directory so that it stays updated. But those changes have to go through department heads for approval first. This is another reason to set up a metadata store for your data in a database: people can access the database at the same time, and rules can be set up so that department heads must approve changes to the metadata database before they go live.
Then, set up a time (monthly is my suggestion) for management to review the data directory worksheet or metadata database to determine whether existing datasets support the company strategy or whether new datasets need to be collected.
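The approval rule described above, where a department head signs off before a change to the directory goes live, can be sketched in a few lines. The class and field names here are hypothetical, and a real metadata database would enforce this with user accounts and permissions rather than a name comparison.

```python
from dataclasses import dataclass


@dataclass
class ProposedChange:
    dataset: str
    field_name: str
    new_value: str
    department_head: str  # the person who must approve this change


class DataDirectory:
    """Holds live metadata plus changes waiting for approval."""

    def __init__(self):
        self.live = {}     # {dataset: {field_name: value}}
        self.pending = []  # proposed changes not yet approved

    def propose(self, change):
        """Queue a change; nothing goes live yet."""
        self.pending.append(change)

    def approve(self, change, approver):
        """Apply a pending change, but only if the right person approves it."""
        if change not in self.pending:
            raise ValueError("change is not pending")
        if approver != change.department_head:
            raise PermissionError(f"{approver} cannot approve this change")
        self.pending.remove(change)
        self.live.setdefault(change.dataset, {})[change.field_name] = change.new_value
```

The monthly management review then has two things to look at: what is in `live`, and what has been sitting in `pending`.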
The tactics of the data science strategy
The tactics of a data science strategy would include more about people, skills, roles and responsibilities, software applications, workflows or projects, etc.
Data Science Use Cases
The following dataset includes data science use cases for data in a CPG or consumer product company. There are 2 ways to use it:
- I include generic data sets and their business use by department or company division so you can see if they apply to your company and include them in your data strategy. The company field is blank for these records.
- I track more specific data use cases practiced by other companies, with emphasis on what they are doing with more advanced data analysis, like using machine learning. This data is only available to Basecamp InQb8r members.
Please let me know if I am missing any (either in the comments below or through our contact page). I update this dataset regularly, so subscribe to our emails to be notified of new additions.
This dataset can be filtered and/or exported.
- Basic: addition, subtraction, multiplication, division, percentages, averages
- Intermediate: utilizing more advanced statistics to analyze past data
- Advanced: utilizing modeling (AI and machine learning)
- Internal: data produced by the company
- External: data produced outside the company, either sold by other vendors or available in the public domain
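If you export the dataset, filtering it yourself is straightforward. The sketch below assumes each exported record carries the two classifications listed above in fields I have called `level` and `source`. Those field names, and the sample records, are my assumptions and not the actual export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    department: str
    description: str
    level: str   # "basic", "intermediate" or "advanced"
    source: str  # "internal" or "external"


def filter_use_cases(use_cases, level=None, source=None):
    """Keep use cases matching the given level and/or source (None means any)."""
    return [
        use_case
        for use_case in use_cases
        if (level is None or use_case.level == level)
        and (source is None or use_case.source == source)
    ]


# Hypothetical records, invented for illustration.
use_cases = [
    UseCase("sales", "total revenue by month", "basic", "internal"),
    UseCase("marketing", "forecast demand from past orders", "advanced", "internal"),
    UseCase("marketing", "benchmark against purchased market data", "intermediate", "external"),
]
advanced_internal = filter_use_cases(use_cases, level="advanced", source="internal")
```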