By , Senior Systems Engineer
Before I begin, let me assure you that dark data has nothing to do with evil forces or anything of the like, so you may now put down your holy water and call off the IT exorcist. Dark data is any data asset that an organization stores but never uses. In other words, it is data that an organization does not currently analyze, even though it has the potential to provide insights, value and a competitive advantage that could improve the organization’s processes and outcomes. Some of this data is stored purely to meet compliance requirements, and some is stored as a result of daily operations. The former includes data such as…
- Indexed documents
- Support ticket history
- Customer interactions history, etc.
Data in the latter category includes things like…
- Log files
- Event logs
- IoT device data, etc.
Now that you know (hopefully) what dark data is, you may be asking yourself: okay, so what is the big deal about dark data?
“Data is the New Natural Resource”
Technically, this isn’t true, since a natural resource is defined as a naturally occurring source of wealth and data does not occur naturally, but it can prove to be a source of wealth. Advances in data analytics have given birth to a huge set of tools and methodologies that have the potential to discover insights, learn patterns and predict outcomes. And, with significant growth in computing power, artificial intelligence systems can learn from data and act in ways that revolutionize entire industries.
Ways to Leverage Dark Data
For example, organizations can use data from emails to train a natural language processing model to automatically extract and act on certain information. By determining the tone of an email from a customer, the model can prioritize a response if the customer’s tone sounds agitated.
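To make the email idea concrete, here is a minimal sketch in Python. It does not train a model at all: it stands in for one with a hand-picked word list, which is purely an assumption for illustration. The names `AGITATED_TERMS`, `Email`, `agitation_score` and `prioritize` are hypothetical; a real system would likely learn tone from labeled emails instead.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical word list standing in for a trained tone model.
AGITATED_TERMS = {
    "unacceptable", "angry", "frustrated", "refund",
    "immediately", "terrible", "worst", "cancel",
}


@dataclass
class Email:
    sender: str
    body: str


def agitation_score(text: str) -> float:
    """Return the fraction of words found in the agitated-term list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in AGITATED_TERMS)
    return hits / len(words)


def prioritize(emails: List[Email]) -> List[Email]:
    """Order emails from most to least agitated (ties keep input order)."""
    return sorted(emails, key=lambda e: agitation_score(e.body), reverse=True)


inbox = [
    Email("calm@example.com", "Thanks for the quick help, all good."),
    Email("upset@example.com",
          "This is unacceptable. I am frustrated and want a refund immediately."),
]
# The agitated message sorts to the front of the queue.
queue = prioritize(inbox)
```

A word list like this misses sarcasm, negation and context, which is exactly where a model trained on your own historical emails could do better.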
An organization could use data in the form of images to train a neural network model to automatically classify incoming images and invoke specific workflows for each category. Think about an auto insurance company that receives claims with pictures and first needs to classify the damage, then assess it.
A business that works with data in the form of videos could train a neural network to detect when an activity is taking place at any given time. Think about security officers with dozens of surveillance camera screens to monitor. The model can notify them of suspicious activity in real time, instead of having them constantly watch the live footage.
Not all dark data is a hidden treasure, though. Some of it does nothing but accrue storage costs. Worse yet, it can expose an organization to legal challenges if data that is currently stored is not supposed to be. Okay, maybe we should get our holy water and call the IT exorcist here! Still, determining that your dark data is useless, costly and possibly a liability is an insight in and of itself.
So, after reading about the possibilities (and repercussions) that dark data holds, why is dark data still a thing? Why do so many organizations have so much of it?
- They’re unaware of the power their data holds
- They’re unaware that they even have dark data
- There’s a lack of resources (money, time, human capital [data scientists and AI experts], knowledge) to investigate the data they’re storing and determine its potential value
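A first step toward finding out whether you even have dark data can be as simple as taking inventory. The sketch below is one rough way to do that in Python, assuming the data lives on a file system you can walk and that "not modified in a long time" is an acceptable stand-in for "never used" (it often is not, so treat the results as leads rather than answers). The function name `find_stale_files` and the one-year default are hypothetical choices.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days=365, now=None):
    """List files under root not modified within max_age_days.

    Returns (path, size_bytes, age_days) tuples, largest files first.
    Modification time is only a proxy for "unused"; this is an assumption.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # skip files that vanish or cannot be read
            if info.st_mtime < cutoff:
                age_days = (now - info.st_mtime) / 86400
                stale.append((str(path), info.st_size, age_days))
    # Largest files first, since they cost the most to keep around.
    return sorted(stale, key=lambda row: row[1], reverse=True)
```

Calling `find_stale_files("/path/to/share")` on a hypothetical file share would give a list to review with the people who own the data, which is where the question of potential value actually gets answered.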
The number of use cases for analytics and AI grows by the day as more data is generated and as people devise creative ways to harvest it. Data is undoubtedly the fuel for artificial intelligence and its sub-components, machine learning and deep learning. Not being aware of the data you already have, and not putting it to use, may be a missed opportunity for your organization. So, don’t let your data live in the dark! Shed some light on it and you may be surprised at the power it holds.
To stay up to date on advances in technology and how to better leverage dark data, subscribe to our blog.
About the Author: As a Senior Software Engineer at Pyramid Solutions, I have a passion for data and artificial intelligence. I’m currently pursuing a master’s degree in computer science with a specialization in Machine Learning.