Data Governance

Data Governance is a strategic framework that ensures data is aligned with an organization's goals and objectives.

Data Governance is about the why and what of data management.

Data Governance is the cornerstone of managing and leveraging organizational data assets strategically. Before you define the architectures for corralling your data chaos, you must understand the constraints that are inherent in your organization.

Data Governance is not just a technical endeavor. It involves establishing policies, processes, and controls to ensure data quality, security, and compliance. A well-implemented data governance framework not only mitigates risks but also enhances decision-making capabilities. It creates a culture of data accountability, transparency, and collaboration across the enterprise, ultimately driving innovation and business growth. Finally, it informs the tactical approach to “how” data is operationalized to deliver value to the business, which is the heart of Data Management.

Dimensions of Data Governance

Governance Structure

Governance Structure describes how data governance is organized: who is involved, how decisions are made, which roles have been defined, and how people collaborate.

Policy Development

Policies are a framework of rules, guidelines, and standards that govern the collection, use, storage, and sharing of data within an organization.

Data Compliance

Data Compliance is the act of handling and managing sensitive data in a way that adheres to regulatory requirements, industry standards, and internal policies involving data security and privacy.

Data Quality

Data Quality describes the degree of business and consumer confidence in data’s usefulness based on agreed-upon business requirements. These expectations evolve based on changing contexts in the marketplace.

Governance Structure

It is important to define up front how Data Governance is organized and to revisit that model periodically to make sure it is still fit for purpose. While there is no one-size-fits-all solution, there are established best practices, and certain roles need to be defined. The Data Governance Institute provides one useful framework, with a diagram, for further reading. Typical roles include:

Data Owners (business department heads) shall:

  • Be responsible for defining the sensitivity level of data assets

  • Ensure accurate classification based on established criteria

  • Authorize access controls based on business needs and data sensitivity

  • Collaborate with relevant stakeholders to determine appropriate data classification

  • Document the rationale behind classification decisions

  • Regularly review data classifications to align with business needs and regulations

Data Stewards (IT administrators, security analysts, and compliance officers) shall:

  • Manage and protect data assets according to their classification level

  • Implement and enforce security measures to safeguard data integrity, confidentiality, and availability

  • Ensure appropriate technical controls are in place to prevent unauthorized access, disclosure, or modification

  • Oversee data storage, transmission, and disposal processes in compliance with policies and regulations

  • Collaborate with data owners to implement security controls consistent with classification levels

Data Users (all employees, including data scientists, as well as contractors, vendors, and third-party entities who access or handle data as part of their roles) shall:

  • Adhere to established classification guidelines and security protocols

  • Exercise due diligence in handling sensitive information

  • Apply appropriate security measures when accessing, processing, transmitting, or disposing of data assets

  • Report any security incidents or breaches promptly

  • Undergo regular training and awareness programs to ensure compliance with data classification policies and best practices in data security management

Data Advisors (specialist knowledge workers) shall:

  • Support the Data Governance Committee’s objectives

  • Offer expert advice on specific technical issues as requested

  • Ensure that policies comply with relevant standards and regulations

  • Identify risks that the committee should address

  • Provide relevant samples and solutions as needed

  • Make the committee aware of recent or upcoming impactful regulatory and legislative changes

It is also important to define who chairs the Data Governance Committee. Often this is a senior executive in the role of Chief Data Officer (CDO), but sometimes the chair rotates among members of the committee.

Policy Development

Policies are a framework of rules, guidelines, and standards that govern the collection, use, storage, and sharing of data within an organization. These policies ensure that data is handled in a way that aligns with the organization's strategic objectives, complies with relevant regulations, and protects the privacy and security of individuals. Policies that are not enforced have no value.

Policies are never one-and-done. Your organization changes, and the rules that apply to data change over time, so policies are living documents too. That means you must review and update them on a regular basis.
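
Regular review is easier to enforce when each policy carries a review date. The Python sketch below is one minimal way to flag policies whose review is overdue; the `Policy` fields, the policy names, and the review intervals are hypothetical, and in practice the cadence is whatever your Data Governance Committee agrees on.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Policy:
    name: str
    last_reviewed: date
    review_interval_days: int  # hypothetical cadence set by the committee


def overdue_policies(policies: list[Policy], today: date) -> list[str]:
    """Return the names of policies whose next scheduled review date has passed."""
    return [
        p.name
        for p in policies
        if p.last_reviewed + timedelta(days=p.review_interval_days) < today
    ]


# Hypothetical example: two policies, each reviewed annually.
policies = [
    Policy("Data Retention", date(2023, 1, 15), 365),
    Policy("Access Control", date(2024, 3, 1), 365),
]
print(overdue_policies(policies, today=date(2024, 6, 1)))  # ['Data Retention']
```

Running a check like this on a schedule turns "review regularly" into a concrete, auditable list of policies that need attention.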

Policies are of no use if nobody knows about them. It is very important to have a mechanism for communicating policies regularly, conducting training, and generally making them easily available for reference.

Data Compliance

Data compliance is the act of handling and managing sensitive data in a way that adheres to regulatory requirements, industry standards, and internal policies. Its focus is on classifying the data you store so that you understand what it takes to keep that data secure and private. While data compliance has always been important, it is an increasing area of focus in today’s evolving regulatory climate.

The General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) are just two of the regulations likely to affect your business. The landscape continues to evolve with laws such as the California Consumer Privacy Act (CCPA), and more regulations are coming, which makes any Personally Identifiable Information (PII) important data that must be protected from exposure.

Non-compliance not only exposes your organization to increased cybersecurity risk but also carries costs in fines, legal penalties, and reputational damage that cannot be overstated.

The first step is agreeing on your classification scheme; NIST IR 8496 is a good place to start. You need to classify all of your data and begin creating policies for the capture, use, and storage of that data throughout its Data Lifecycle. Then the real work starts: making sure those policies are enforced. The overall process must include training and compliance audits.
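
As a minimal illustration of a classification scheme, the Python sketch below assumes a hypothetical four-level scheme (Public, Internal, Confidential, Restricted) with example controls required at each level. The level names, field assignments, and control names are assumptions for illustration, not taken from NIST IR 8496. A dataset inherits the highest sensitivity of any field it contains, and unclassified fields default to Restricted.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical four-level scheme; a higher value means more protection."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical controls required at each level (each level includes the one below).
REQUIRED_CONTROLS = {
    Sensitivity.PUBLIC: set(),
    Sensitivity.INTERNAL: {"authenticated_access"},
    Sensitivity.CONFIDENTIAL: {"authenticated_access", "encrypt_at_rest"},
    Sensitivity.RESTRICTED: {"authenticated_access", "encrypt_at_rest",
                             "owner_approval", "access_logging"},
}

# Hypothetical field-level decisions recorded by data owners.
FIELD_CLASSIFICATION = {
    "product_name": Sensitivity.PUBLIC,
    "order_total": Sensitivity.INTERNAL,
    "email": Sensitivity.CONFIDENTIAL,
    "national_id": Sensitivity.RESTRICTED,
}


def dataset_sensitivity(fields: list[str]) -> Sensitivity:
    """A dataset takes the highest level of any field it contains.

    Fields nobody has classified yet are treated as RESTRICTED;
    an empty field list is treated as PUBLIC.
    """
    return max(
        (FIELD_CLASSIFICATION.get(f, Sensitivity.RESTRICTED) for f in fields),
        default=Sensitivity.PUBLIC,
    )


level = dataset_sensitivity(["product_name", "email"])
print(level.name, sorted(REQUIRED_CONTROLS[level]))
# CONFIDENTIAL ['authenticated_access', 'encrypt_at_rest']
```

Defaulting unknown fields to the most restrictive level is a conservative choice: new data stays protected until a data owner makes, and documents, an explicit classification decision.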

Data Quality

Data Quality (DQ) describes the degree of business and consumer confidence in data’s usefulness based on agreed-upon business requirements. These expectations evolve based on changing contexts in the marketplace. There are many dimensions to Data Quality; here we focus on a few of the most important:

Data Accuracy

Data accuracy is the degree to which data is correct, precise, and free of errors. It is a key aspect of data quality and is essential for making reliable decisions. It is also critical in AI/ML applications: a model trained on inaccurate data can make predictions with the accuracy of a deck of Tarot cards.
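
Accuracy is usually measured against something trusted. The sketch below assumes you have a verified reference source (for example, a validated address register) keyed the same way as your records, and computes the share of compared records whose value matches exactly. The function name, the record layout, and the sample data are hypothetical.

```python
from typing import Optional


def accuracy_rate(records: dict, reference: dict, field: str) -> Optional[float]:
    """Share of records whose `field` exactly matches a trusted reference value.

    Both inputs map a record key to a dict of fields. Only keys present in
    both are compared; returns None when there is nothing to compare.
    """
    compared = [k for k in records if k in reference]
    if not compared:
        return None
    correct = sum(records[k].get(field) == reference[k].get(field) for k in compared)
    return correct / len(compared)


# Hypothetical CRM extract checked against a hypothetical verified register.
crm = {"C1": {"postcode": "10115"}, "C2": {"postcode": "8001"}, "C3": {"postcode": "75001"}}
verified = {"C1": {"postcode": "10115"}, "C2": {"postcode": "8002"}, "C3": {"postcode": "75001"}}
print(round(accuracy_rate(crm, verified, "postcode"), 2))  # 0.67
```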

Data Completeness

Data completeness is the extent to which all required and expected data elements are present within a dataset, ensuring that no essential information is missing. It is crucial for accurate insights, as missing information can lead to incomplete analyses and flawed decision-making.
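
A simple completeness metric is the share of records that have a usable value in each required field. In this sketch, the list of required fields is a hypothetical example, and a value counts as missing if the field is absent, `None`, or a blank string; your own definition of “missing” may differ.

```python
def is_present(value) -> bool:
    """Treat None and blank strings as missing."""
    return value is not None and not (isinstance(value, str) and not value.strip())


def completeness(rows: list[dict], required: tuple[str, ...]) -> dict:
    """Fraction of rows with a usable value for each required field."""
    if not rows:
        return {field: None for field in required}
    return {
        field: sum(is_present(row.get(field)) for row in rows) / len(rows)
        for field in required
    }


REQUIRED_FIELDS = ("customer_id", "name", "email")  # hypothetical
rows = [
    {"customer_id": "C1", "name": "Ada", "email": "ada@example.com"},
    {"customer_id": "C2", "name": "", "email": "bob@example.com"},
    {"customer_id": "C3", "name": "Cy", "email": None},
    {"customer_id": "C4", "name": "Di"},
]
print(completeness(rows, REQUIRED_FIELDS))
# {'customer_id': 1.0, 'name': 0.75, 'email': 0.5}
```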

Data Consistency

While it very much depends on the nature of the organization and the underlying architecture, there should be a high degree of awareness of how consistent data is across different systems, applications, and data stores. For example, information about a customer, such as the customer ID, address, and contact information, should be consistent across every system that works with customer data. A lack of data consistency lowers trust in the data and can lead to other issues, such as conflicting reports.
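
One way to surface consistency problems is to compare the same customer across two systems by customer ID. The sketch assumes each system can be exported as a dictionary keyed by customer ID; the system names, fields, and normalization rule (ignoring case and surrounding whitespace) are illustrative assumptions. It reports field-level mismatches and IDs that exist in only one system.

```python
def normalize(value):
    """Ignore case and surrounding whitespace when comparing strings."""
    return value.strip().casefold() if isinstance(value, str) else value


def compare_systems(system_a: dict, system_b: dict, fields: list[str]):
    """Compare records that share a customer ID across two systems.

    Returns (mismatches, only_in_a, only_in_b), where each mismatch is
    (customer_id, field, value_in_a, value_in_b).
    """
    mismatches = []
    for cid in sorted(system_a.keys() & system_b.keys()):
        a, b = system_a[cid], system_b[cid]
        for field in fields:
            if normalize(a.get(field)) != normalize(b.get(field)):
                mismatches.append((cid, field, a.get(field), b.get(field)))
    only_in_a = sorted(system_a.keys() - system_b.keys())
    only_in_b = sorted(system_b.keys() - system_a.keys())
    return mismatches, only_in_a, only_in_b


# Hypothetical extracts from a CRM and a billing system.
crm = {
    "C1": {"email": "ada@example.com", "city": "Berlin"},
    "C2": {"email": "bob@example.com", "city": "Paris"},
    "C3": {"email": "cy@example.com", "city": "Rome"},
}
billing = {
    "C1": {"email": "ADA@example.com ", "city": "Berlin"},
    "C2": {"email": "bob@example.org", "city": "Paris"},
    "C4": {"email": "di@example.com", "city": "Oslo"},
}
mismatches, only_crm, only_billing = compare_systems(crm, billing, ["email", "city"])
print(mismatches)              # [('C2', 'email', 'bob@example.com', 'bob@example.org')]
print(only_crm, only_billing)  # ['C3'] ['C4']
```

Whether a case or whitespace difference counts as an inconsistency is a policy decision; the normalization step is where that decision lives.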

Data Timeliness

Data timeliness is the degree to which data is up-to-date and available at the required time for its intended use. This is important for enabling businesses to make quick and accurate decisions based on the most current information available. Data timeliness affects data quality by determining the reliability and usefulness of the company’s information. Ideally, metrics such as data freshness, data latency, data accessibility, and time-to-insight are defined, benchmarks suited to the nature of the business are agreed on, and tools and procedures exist to collect the metrics, compare them against the benchmarks, and take action.
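
To make freshness and latency concrete, the sketch below checks each dataset’s age against an agreed benchmark and computes the average delay between when events occur and when they become available. The dataset names, benchmarks, and timestamps are hypothetical; the current time is passed in explicitly so the check is reproducible.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness benchmarks agreed with the business.
FRESHNESS_BENCHMARK = {
    "orders": timedelta(hours=1),
    "customers": timedelta(days=1),
}


def freshness_report(last_loaded: dict, now: datetime) -> dict:
    """Map each dataset to (age, meets_benchmark) at time `now`."""
    report = {}
    for name, loaded_at in last_loaded.items():
        age = now - loaded_at
        report[name] = (age, age <= FRESHNESS_BENCHMARK[name])
    return report


def mean_latency(events: list) -> timedelta:
    """Average delay between (occurred, available) pairs; events must be non-empty."""
    delays = [available - occurred for occurred, available in events]
    return sum(delays, timedelta()) / len(delays)


def at(hour: int, minute: int) -> datetime:
    """Example helper: a UTC timestamp on a fixed, hypothetical date."""
    return datetime(2024, 6, 1, hour, minute, tzinfo=timezone.utc)


now = at(12, 0)
last_loaded = {"orders": at(10, 30), "customers": at(0, 0)}
for name, (age, ok) in freshness_report(last_loaded, now).items():
    print(name, age, "OK" if ok else "STALE")
# orders 1:30:00 STALE
# customers 12:00:00 OK

events = [(at(10, 0), at(10, 5)), (at(10, 10), at(10, 25))]
print(mean_latency(events))  # 0:10:00
```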

Data Uniqueness

Data uniqueness matters because duplicate records can bias results. For example, duplicates can skew the mean, median, and mode. They can also cause problems in decision-making and operational processes.

Uniqueness can be assessed at three levels:

  • Record uniqueness: the degree to which each data record occurs only once in a data file.

  • Object mapping uniqueness: the degree to which each real-world object is represented by exactly one data record in a data file.

  • Primary key uniqueness: the degree to which each primary key value occurs only once in a data file.
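
The sketch below checks primary key uniqueness and shows how duplicates skew summary statistics. The `order_id` key, the amounts, and the keep-first deduplication rule are hypothetical; real deduplication usually needs an agreed survivorship rule for deciding which duplicate to keep.

```python
from collections import Counter
from statistics import mean, median


def duplicate_keys(rows: list[dict], key: str) -> list:
    """Key values that occur more than once (primary key uniqueness violations)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def uniqueness_ratio(rows: list[dict], key: str) -> float:
    """Distinct key values divided by record count; 1.0 means fully unique."""
    return len({row[key] for row in rows}) / len(rows) if rows else 1.0


def keep_first(rows: list[dict], key: str) -> list[dict]:
    """Deduplicate by keeping the first record seen for each key."""
    seen, result = set(), []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            result.append(row)
    return result


# Hypothetical orders table where one order was loaded three times.
orders = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 120},
    {"order_id": 3, "amount": 900},
    {"order_id": 3, "amount": 900},
    {"order_id": 3, "amount": 900},
]
print(duplicate_keys(orders, "order_id"), uniqueness_ratio(orders, "order_id"))  # [3] 0.6

amounts = [o["amount"] for o in orders]
deduped = [o["amount"] for o in keep_first(orders, "order_id")]
print(mean(amounts), median(amounts))            # 584 900
print(round(mean(deduped), 2), median(deduped))  # 373.33 120
```

The duplicated order pulls the mean from about 373 up to 584 and moves the median from 120 to 900, which is exactly the kind of skew described above.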

Data Validity

Data Validity is the degree to which data conforms to the standards, rules, and constraints defined for it, such as formats, allowed values, and ranges. Valid data is trustworthy and suitable for its intended purpose. Methods for ensuring data validity include rule-based validation, review, comparison against reference data, and consistency checking.
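
Rule-based validation is one common way to check validity. The sketch below expresses each rule as a predicate on a single field; the rules themselves (a deliberately loose email pattern, an allowed country list, and a plausible birth-date range) are hypothetical examples, not a recommended standard.

```python
import re
from datetime import date

EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately loose


def valid_email(v) -> bool:
    return isinstance(v, str) and EMAIL_PATTERN.fullmatch(v) is not None


def valid_country(v) -> bool:
    return v in {"DE", "FR", "US"}  # hypothetical allowed values


def valid_birth_date(v) -> bool:
    # Plain dates only, between a hypothetical lower bound and today.
    return type(v) is date and date(1900, 1, 1) <= v <= date.today()


# Hypothetical rule set: each field maps to the predicate it must satisfy.
RULES = {"email": valid_email, "country": valid_country, "birth_date": valid_birth_date}


def validate(record: dict) -> list[str]:
    """Return the fields that break their rule; a missing field fails its rule."""
    return [field for field, rule in RULES.items() if not rule(record.get(field))]


def validity_rate(records: list[dict]) -> float:
    """Share of records that pass every rule."""
    if not records:
        return 1.0
    return sum(not validate(r) for r in records) / len(records)


records = [
    {"email": "ada@example.com", "country": "DE", "birth_date": date(1990, 5, 17)},
    {"email": "not-an-email", "country": "XX", "birth_date": date(1985, 1, 1)},
]
print(validate(records[1]))    # ['email', 'country']
print(validity_rate(records))  # 0.5
```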
