Learn how to improve the governance of unstructured data. Keep data organized for better use, and ensure the effective and efficient use of information.
Businesses have to govern their data to keep it clean and organized for better use. They may focus on data governance for their systems of record and structured data, but what about big, unstructured data like photos, videos, digitized hardcopy documents and continuous text messages from social media?
To improve unstructured data governance, businesses need to take several proactive steps, including using trusted sources and establishing guidelines for user access. However, there are some limitations that may hinder the effective governance of unstructured data.
The nature of unstructured data, and the complexity of ensuring its quality, security and compliance, pose several challenges to big data governance.
So, how can we improve the governance of unstructured data that now comprises roughly 80% of corporate data under management? Here are five ways to tackle the problem in the enterprise.
The data that organizations have directly created and accumulated is trusted, but most organizations also acquire data from outside cloud sources as they build an aggregated data repository for analytics.
How do you know that data from these outside sources is trustworthy? You don’t — unless you vet the data provider, understand where the provider has gotten its data, and know how the provider has prepared and secured the data. For example, if you’re in a sensitive industry such as healthcare, you’ll want to know that data on individual patients has been anonymized to meet privacy requirements.
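As a first-pass illustration of the anonymization check described above, the sketch below screens incoming vendor records for fields that look like direct identifiers. The field names in DIRECT_IDENTIFIER_FIELDS and the helper names are hypothetical, chosen only for this example. A batch that passes the screen is not proven to be anonymized; the check only catches obvious failures before the data enters your repository.

```python
# Hypothetical set of field names treated as direct identifiers.
# A real list would come from your own privacy and compliance requirements.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "ssn", "email", "phone", "address", "date_of_birth",
}


def identifier_fields(record: dict) -> set:
    """Return the keys in a record whose names match a known identifier field."""
    return {key for key in record if key.lower() in DIRECT_IDENTIFIER_FIELDS}


def screen_batch(records: list) -> list:
    """Return (index, offending_fields) for every record that fails the screen."""
    findings = []
    for index, record in enumerate(records):
        found = identifier_fields(record)
        if found:
            findings.append((index, found))
    return findings
```

An empty result from screen_batch means no known identifier field names were found. It says nothing about identifiers hidden inside free text, images or other unstructured content, which still need the vendor vetting described here.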
Checking vendor governance standards to ensure they align with your own should be a routine task performed before any contract is entered into. Prior to signing a contract, you should also request the vendor’s latest IT audit so recent governance and security performance can be reviewed.
Structured data in systems of record has firm rules in place for user access and permissions, but unstructured data may not. Access to unstructured data should follow the same rules that apply to structured data.
In other words, access to unstructured data should be limited to those users who require the data. Within the category of access, there are likely to be levels of permission, with some users getting more access to data than others, depending on job function or role.
These user access decisions should be made jointly by IT and end-user departments. Access should be reviewed at least annually, and procedures should be in place so that when an individual leaves the company, access is removed immediately as part of the separation process.
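The access rules above can be sketched in a few lines of code. This is a minimal illustration, not a replacement for an identity and access management product: the Permission levels, the AccessPolicy class and the role and collection names used with it are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Permission(IntEnum):
    """Ordered permission levels; a higher level includes the lower ones."""
    NONE = 0
    READ = 1
    WRITE = 2


@dataclass
class AccessPolicy:
    # role name -> {data collection name -> permission level granted}
    role_grants: dict
    # user ID -> set of role names the user currently holds
    user_roles: dict = field(default_factory=dict)

    def permission_for(self, user: str, collection: str) -> Permission:
        """Return the highest level that any of the user's roles grants."""
        levels = [
            self.role_grants.get(role, {}).get(collection, Permission.NONE)
            for role in self.user_roles.get(user, set())
        ]
        return max(levels, default=Permission.NONE)

    def can(self, user: str, collection: str, needed: Permission) -> bool:
        """Check whether the user has at least the needed level of access."""
        return self.permission_for(user, collection) >= needed

    def offboard(self, user: str) -> None:
        """Remove all of a user's roles at once, as part of separation."""
        self.user_roles.pop(user, None)
```

Because access is derived from roles rather than granted to individuals, an annual review only has to confirm two things: which roles each user holds, and what each role grants. A user with no roles, including one who has been offboarded, gets Permission.NONE on every collection.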
The basics of data security are trusted networks; strong user access methods and monitoring; perimeter monitoring that checks for vulnerabilities and potential breaches; and user habits that align with security best practices (such as not sharing passwords and not copying data to thumb drives that can be carried away). If data is stored on hardware at the edge of the enterprise, that hardware should be physically caged and secured when possible, so that only authorized people can gain access.
Most of these standards and practices are in place with structured data but not necessarily with data that is unstructured, such as Internet of Things data.
Unstructured data should be governed by the same security guidelines and practices as its structured counterpart.
Robust logging and traceability software should be continuously at work where big data is concerned. Who or what is accessing the data? When and from where is the data being accessed? If there is an issue that arises, what event initiated the issue?
Logging, tracing and (in the future) observability all reduce the time it takes to resolve a problem, and they are integral to security.
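To make those questions concrete, here is a minimal sketch of a structured audit record built with Python's standard logging and json modules. The logger name, the field names and the correlation_id parameter are assumptions made for this example; a production setup would route these records to whatever logging or tracing platform the organization already runs.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical logger name; handlers and destinations are configured elsewhere.
audit_logger = logging.getLogger("unstructured_data.audit")


def build_access_event(actor, action, resource, source_ip, correlation_id):
    """Capture who accessed what, when, from where, and under which request."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when
        "actor": actor,                    # who or what accessed the data
        "action": action,                  # e.g. "read", "write", "delete"
        "resource": resource,              # which object was touched
        "source_ip": source_ip,            # from where
        "correlation_id": correlation_id,  # ties related events together
    }


def log_access(actor, action, resource, source_ip, correlation_id):
    """Emit one JSON line per access and return the event that was logged."""
    event = build_access_event(actor, action, resource, source_ip, correlation_id)
    audit_logger.info(json.dumps(event, sort_keys=True))
    return event
```

Writing each event as a single JSON line keeps the log machine-searchable. The shared correlation_id is what lets an investigator trace an issue back to the event that initiated it.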
As an upfront data cleaning practice, bad data should be eliminated as raw big data streams in. There is a lot of bad big data, whether it’s documents that aren’t needed, IoT streams that contain as many device handshakes as salient information or superfluous social media threads.
The data preparation process that’s part of data ingestion should eliminate this data so it never takes up real estate in storage. Big data repositories should also be regularly refreshed and revisited, and data that’s no longer needed discarded.
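Both halves of this practice, filtering at ingestion and discarding stale data later, can be sketched as follows. The record layout (type, payload and ingested_at keys), the NOISE_TYPES set and the function names are assumptions made for the example; real rules for what counts as bad data depend on the source and the business need.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical message types treated as noise rather than salient information.
NOISE_TYPES = {"handshake", "heartbeat"}


def is_salient(record: dict) -> bool:
    """Keep a record only if it is not noise and carries a non-empty payload."""
    if record.get("type") in NOISE_TYPES:
        return False
    payload = record.get("payload")
    return payload is not None and payload != ""


def clean_stream(records):
    """Yield only salient records so that noise never reaches storage."""
    for record in records:
        if is_salient(record):
            yield record


def purge(repository: list, now: datetime, max_age: timedelta) -> list:
    """Return the records that are still within the retention window."""
    return [r for r in repository if now - r["ingested_at"] <= max_age]
```

Because clean_stream is a generator, it filters records one at a time as they arrive instead of buffering the whole stream. purge is intended to run on a schedule as part of the regular refresh of the repository.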
Unstructured data is usually far more complex than structured data to process and analyze for insights, which is one reason it is not often used for business intelligence. AI technologies can make indexing, tracking, mining and analyzing unstructured data, and deriving insights from it, more efficient. AI-enabled tools offer several capabilities for handling information that is not organized in a predefined manner.
When shopping for a data governance solution, it’s best to select a tool that aligns with governance practices for unstructured data. Such a tool will help you enforce consistent standards throughout your organization. It will promote adherence to industry regulations and data protection laws and offer data quality assurance, which will give your data long-term value.
Remember that when it comes to data governance, there is no one-size-fits-all solution. The best data governance tool for your business depends on your data needs and preferences.