The KPIs that matter to you will point you toward the data to collect and analyze
Your data leads to analysis, which leads to your KPIs, which lead to your goals, which lead to your culture, mission and vision.
So, your first step in a data science strategy is to define, at a minimum, the data you should collect to support these elements.
You may also need to collect other datasets for regulatory purposes, so define those as well.
Additional datasets to collect
While defining the minimum data you need to collect is a start, I strongly recommend going much further. Define what you would like to have in an ideal world. You may not be able to get it, but by at least identifying it, you can work toward getting it in the future.
A common problem with data is that we tend to work with the data that is easiest to obtain or easiest to work with. But we also need to seek out data that is hard to get or hard to work with, because it may hold great nuggets of insight and wisdom. Furthermore, because it is hard, most other companies won't do it, which may give us a competitive advantage.
I recommend collecting as much data as possible, because collecting and storing data keeps getting cheaper and easier. That does not mean the analysis, or the data preparation it requires, is cheap, easy or even doable today, but as technology advances, it will likely become doable, cheap and easy.
For example, I started collecting detailed customer usage data for one of my businesses back in 2006. I knew at the time I could not do anything with it, but I expected that eventually, once machine learning came of age, I could make use of it. Machine learning has come of age, and I am now using that data.
Another example is recording all meetings and calls and using machine learning tools to transcribe them to text. Taking summary notes during calls, which is what I do, often misses important context in what people say, so having text transcriptions for reference is a significant benefit. Further, natural language processing (NLP) can work through large volumes of recorded calls and meetings to help unearth key nuggets of information or identify patterns and correlations.
Collect as much data as you can, because advancing technology could make it significantly valuable in the future.
Protect the business by getting data, information, knowledge and wisdom out of people’s heads
You absolutely do not want people keeping information in their heads, where they can walk out with it and disrupt the business by their absence. Put policies in place that require people to record everything digitally, including standard operating procedures, workflows and knowledge bases.
Set access rights
Identify who can access what information.
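One way to make "who can access what" concrete is to write it down as an explicit, deny-by-default table that can be reviewed and audited. What follows is a minimal Python sketch that assumes a simple role-based model. The role names, dataset names, the `POLICY` table and the `can` helper are all hypothetical placeholders. None of them is a specific product's access-control API.

```python
from enum import Enum


class Access(Enum):
    NONE = 0
    READ = 1
    WRITE = 2  # WRITE implies READ in this sketch


# Hypothetical access policy: role -> dataset -> granted access level.
# Roles and datasets are illustrative placeholders only.
POLICY: dict[str, dict[str, Access]] = {
    "finance": {"sales": Access.READ, "payroll": Access.WRITE},
    "support": {"call_transcripts": Access.READ},
    "executive": {
        "sales": Access.READ,
        "payroll": Access.READ,
        "call_transcripts": Access.READ,
    },
}


def can(role: str, dataset: str, needed: Access) -> bool:
    """Deny by default: unknown roles or unlisted datasets get no access."""
    granted = POLICY.get(role, {}).get(dataset, Access.NONE)
    return granted.value >= needed.value
```

In practice you would enforce the same rules through your storage platform's own permission system. The value of the table is that every grant is written down and anything not listed is denied.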
Identify how available the data needs to be
My general rule is to have real-time access to all data and analysis that roll up to my KPIs, and to be able to get it from my mobile device. Beyond that, set your own policies for how close to real time each dataset and its analysis must be available. Usually, the closer to real time, the more cost and effort involved.
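Availability targets can be written down per dataset and checked automatically. The Python sketch below assumes a hypothetical policy that maps each dataset to the maximum staleness you will tolerate. The dataset names and time windows are illustrative only. How you obtain each dataset's last-update time depends on your own systems.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical availability policy: dataset -> maximum acceptable staleness.
# Tighter windows usually mean more cost and effort to maintain.
MAX_STALENESS: dict[str, timedelta] = {
    "kpi_dashboard": timedelta(minutes=5),  # rolls up to KPIs: near real time
    "weekly_marketing": timedelta(days=7),
    "archived_usage": timedelta(days=30),
}


def stale_datasets(last_updated: dict[str, datetime], now: datetime) -> list[str]:
    """Return datasets whose last update is older than their policy allows.

    A dataset that has a policy but no recorded update is treated as stale.
    """
    stale = []
    for name, limit in MAX_STALENESS.items():
        updated = last_updated.get(name)
        if updated is None or now - updated > limit:
            stale.append(name)
    return stale


if __name__ == "__main__":
    now = datetime(2024, 1, 8, 12, 0, tzinfo=timezone.utc)
    last = {
        "kpi_dashboard": now - timedelta(minutes=20),
        "weekly_marketing": now - timedelta(days=2),
    }
    print(stale_datasets(last, now))  # ['kpi_dashboard', 'archived_usage']
```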
Identify who collects the data
Identify, down to each employee, who is responsible for collecting and digitally storing which data. There are many common datasets that every employee collects (work emails, files, recorded calls and meetings, workflows, standard operating procedures, knowledge bases), all of which are automatically digitized and stored. For specific datasets, however, identify who is responsible for them and put processes in place to verify that the data is being collected on schedule and in the proper format.
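Responsibility and verification for a specific dataset can be recorded as data: an owner, a collection schedule and a minimal format. Incoming data can then be checked against that record. What follows is a minimal Python sketch. It assumes "proper format" means a set of required fields and that schedules are expressed in days. The registry entries, owner names and the `check_dataset` helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetSpec:
    owner: str                 # employee responsible for collecting the data
    required_fields: set[str]  # minimal stand-in for "proper format"
    expected_every_days: int   # collection schedule


# Hypothetical registry; dataset names, owners and fields are placeholders.
REGISTRY = {
    "customer_usage": DatasetSpec("alice", {"customer_id", "timestamp", "feature"}, 1),
    "supplier_invoices": DatasetSpec("bob", {"invoice_id", "amount", "date"}, 7),
}


def check_dataset(name: str, records: list[dict], last_collected: date, today: date) -> list[str]:
    """Return problems to route to the dataset's owner (empty list if none)."""
    spec = REGISTRY[name]
    problems = []
    # Schedule check: how many days past the expected collection interval?
    overdue = (today - last_collected).days - spec.expected_every_days
    if overdue > 0:
        problems.append(f"{name}: {overdue} day(s) overdue, owner {spec.owner}")
    # Format check: every record must contain all required fields.
    for i, rec in enumerate(records):
        missing = spec.required_fields - set(rec)
        if missing:
            problems.append(f"{name}: record {i} missing {sorted(missing)}, owner {spec.owner}")
    return problems
```

Each message names the responsible owner, so a scheduled job could send it straight to that person.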
Identify the data storage capabilities
It used to be that you tried to identify what you would need based on your vision of the size and scope of your company, put in place what you could afford at the time, and added capacity and capabilities later as you needed them.
Now, just go big from the start. Cloud storage and additional services (like machine learning) are cheap even at a small scale, and since pricing is based on usage, you can start small and easily add storage and services as needed.
It is better to set up scalable infrastructure from the start. Doing so is quite inexpensive, even for a consumer product startup, and it saves you from later having to migrate to another system, which can be very disruptive to the business.
Put redundancies in place so that data is backed up and backups are easy to access and restore.
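A backup is only useful if it can actually be restored, so it is worth verifying every copy rather than assuming the copy worked. The following Python sketch runs locally and uses only the standard library. It copies a file to several backup locations, confirms each copy by comparing SHA-256 checksums, and re-verifies the file after a restore. The directory layout and helper names are assumptions for illustration. In a cloud setup you would more likely rely on the provider's own replication, versioning and backup services, which this sketch does not model.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_and_verify(source: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy source into each backup directory and check each copy's checksum."""
    expected = sha256_of(source)
    copies = []
    for d in backup_dirs:  # each directory stands in for a separate location
        d.mkdir(parents=True, exist_ok=True)
        dest = d / source.name
        shutil.copy2(source, dest)  # copy2 also preserves file metadata
        if sha256_of(dest) != expected:
            raise OSError(f"backup at {dest} does not match {source}")
        copies.append(dest)
    return copies


def restore(backup: Path, target: Path) -> None:
    """Copy a backup back into place and confirm the restored file matches."""
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(backup, target)
    if sha256_of(target) != sha256_of(backup):
        raise OSError(f"restored file {target} does not match backup {backup}")
```

Running the restore step on a schedule is a cheap way to confirm that backups stay easy to restore.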