Summary
Data governance in data engineering refers to a structured system of policies, practices, and processes that ensure data is managed effectively throughout its lifecycle. It encompasses data quality, security, access, and usability, ensuring data is reliable, consistent, and compliant with regulations.
Source: Gemini AI Overview
OnAir Post: Data Governance
About
Key Aspects
- Data Quality
Ensuring data is accurate, complete, and reliable. This involves establishing standards and controls for data collection, storage, and transformation.
- Data Security
Protecting sensitive data from unauthorized access and breaches. This includes implementing security measures, access controls, and monitoring data access.
- Data Access and Usage
Defining who can access what data and how it can be used. This involves creating access policies and procedures for data utilization.
- Compliance
Ensuring data handling practices adhere to relevant regulations and legal requirements.
- Metadata Management
Maintaining information about data, including its source, transformations, and lineage.
- Data Stewardship
Assigning responsibility for specific data assets to individuals or teams.
- Data Management
Implementing processes and technologies for data storage, processing, and retrieval.
- Risk Mitigation
Identifying and addressing potential risks associated with data management, such as data breaches or non-compliance.
- Collaboration and Communication
Ensuring effective communication and collaboration between data engineers, data scientists, and business stakeholders.
Source: Google Gemini Overview
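As an illustration of the "Data Access and Usage" aspect above, the following is a minimal sketch of a default-deny access check with an audit trail. The names used here (`AccessPolicy`, `is_allowed`, `check_and_log`, the roles, and the `customer_orders` dataset) are hypothetical and chosen only for this example; real platforms typically express such policies in their own access-control systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    """One grant: a role may perform certain actions on one dataset."""
    dataset: str
    role: str
    allowed_actions: frozenset


# Hypothetical policy table; roles and dataset names are illustrative only.
POLICIES = [
    AccessPolicy("customer_orders", "analyst", frozenset({"read"})),
    AccessPolicy("customer_orders", "data_engineer", frozenset({"read", "write"})),
]


def is_allowed(role: str, dataset: str, action: str) -> bool:
    """Return True only if a policy explicitly grants the action (default deny)."""
    return any(
        p.dataset == dataset and p.role == role and action in p.allowed_actions
        for p in POLICIES
    )


def check_and_log(role: str, dataset: str, action: str, audit_log: list) -> bool:
    """Evaluate a request and record the decision so access can be monitored."""
    allowed = is_allowed(role, dataset, action)
    audit_log.append(
        {"role": role, "dataset": dataset, "action": action, "allowed": allowed}
    )
    return allowed


audit_log = []
check_and_log("analyst", "customer_orders", "read", audit_log)   # granted
check_and_log("analyst", "customer_orders", "write", audit_log)  # denied
```

Denying anything that is not explicitly granted keeps the policy table the single place where access decisions are defined, and the audit log supports the monitoring mentioned under "Data Security".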
Importance
- Improved data quality and reliability
Data governance helps ensure data is accurate, consistent, and trustworthy, leading to better decision-making.
- Enhanced security and compliance
Data governance practices help protect sensitive data and ensure compliance with regulations.
- Increased efficiency and productivity
By streamlining data management processes, data governance can improve efficiency and reduce the time it takes to access and use data.
- Reduced risks and costs
Effective data governance can help mitigate risks associated with data breaches, non-compliance, and poor data quality.
- Enabling data-driven decision-making
By ensuring data is accessible, reliable, and trustworthy, data governance enables organizations to make better, more informed decisions.
Source: Google Gemini Overview
Challenges
Data governance faces several key challenges, including data quality issues, security and privacy concerns, the complexity of data ecosystems, and resistance to change. Other significant hurdles involve defining data ownership and accountability, ensuring consistency across data sources, and managing the volume and variety of data.
Overcoming these challenges requires a well-defined data governance strategy, strong leadership, effective communication, and a commitment to continuous improvement.
Initial Source for content: Gemini AI Overview
1. Data Quality and Consistency
- Data quality issues
Inaccurate, incomplete, or outdated data can lead to poor decision-making and reduced business value.
- Ensuring consistency
Data needs to be consistent across different systems and departments, which can be difficult to achieve with disparate data sources.
- Managing data variety
Modern organizations deal with data from various sources and formats, making it challenging to maintain consistency and quality.
2. Security and Privacy
- Data security
Protecting sensitive data from breaches and unauthorized access is crucial, especially with increasing regulatory requirements.
- Data privacy
Organizations must comply with regulations like GDPR and CCPA, ensuring data is handled responsibly and ethically.
3. Complexity of Data Ecosystems
- Data silos
Data stored in separate systems or managed by different teams can hinder collaboration and access.
- Complex data environments
Organizations deal with diverse data formats, legacy systems, and multiple data sources, making it challenging to govern all data effectively.
4. People-related Challenges
- Resistance to change
Employees may resist new data governance policies and processes, requiring change management strategies.
- Lack of understanding
Employees may not fully understand their roles in data governance or the importance of data quality and security.
- Lack of leadership
Strong leadership support is crucial for successful data governance, including setting priorities and allocating resources.
5. Other Challenges
- Defining data ownership and accountability
Clearly defining who is responsible for specific datasets is essential for effective governance.
- Scalability
Data governance frameworks need to be scalable to accommodate the growing volume and complexity of data.
- Measuring success
Quantifying the benefits of data governance can be difficult, making it challenging to justify ongoing investment.
- Resource constraints
Implementing and maintaining a data governance program requires resources and budget, which can be a challenge.
- Compliance with evolving regulations
Regulatory landscapes are constantly changing, requiring organizations to adapt their data governance practices.
- Managing ROT data (Redundant, Obsolete, and Trivial)
Organizations need to identify and manage data that is no longer needed to reduce storage costs and risks.
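To make the ROT item above more concrete, here is a minimal sketch of how records might be flagged as redundant, obsolete, or trivial. The function name `find_rot`, the record fields, and the thresholds (`max_age_days`, `min_bytes`) are assumptions made for this example; what counts as obsolete or trivial is a policy decision that each organization would set for itself.

```python
import hashlib
from datetime import date, timedelta


def find_rot(records, today, max_age_days=365, min_bytes=10):
    """Flag records as redundant, obsolete, or trivial.

    Each record is a dict with hypothetical fields:
    'id' (str), 'content' (bytes), 'last_accessed' (date).
    """
    first_seen = {}  # content hash -> id of the first record with that content
    report = {"redundant": [], "obsolete": [], "trivial": []}
    for rec in records:
        # Redundant: byte-identical content was already seen on an earlier record.
        digest = hashlib.sha256(rec["content"]).hexdigest()
        if digest in first_seen:
            report["redundant"].append(rec["id"])
        else:
            first_seen[digest] = rec["id"]
        # Obsolete: not accessed within the assumed retention window.
        if today - rec["last_accessed"] > timedelta(days=max_age_days):
            report["obsolete"].append(rec["id"])
        # Trivial: too small to be meaningful under the assumed size threshold.
        if len(rec["content"]) < min_bytes:
            report["trivial"].append(rec["id"])
    return report


records = [
    {"id": "a", "content": b"quarterly sales report 2024", "last_accessed": date(2025, 1, 10)},
    {"id": "b", "content": b"quarterly sales report 2024", "last_accessed": date(2025, 1, 12)},
    {"id": "c", "content": b"old inventory export, never reopened", "last_accessed": date(2020, 3, 1)},
    {"id": "d", "content": b"tmp", "last_accessed": date(2025, 1, 5)},
]
report = find_rot(records, today=date(2025, 2, 1))
```

A report like this would normally feed a review step rather than automatic deletion, since retention rules may require keeping data that looks obsolete.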
Research
Research for Data Governance involves studying and developing frameworks, policies, and best practices for managing data within research contexts. It aims to ensure data quality, security, and compliance with relevant regulations while supporting the research lifecycle. This includes guiding researchers on data handling, sharing, and storage, as well as establishing processes for accountability and decision-making around data usage.
In essence, research data governance is about establishing a framework that ensures data is managed responsibly, ethically, and in a way that supports the advancement of knowledge.
Initial Source for content: Gemini AI Overview
1. Key Aspects of Research Data Governance
- Data Quality
Ensuring data is accurate, complete, consistent, and reliable for research purposes.
- Data Security
Protecting research data from unauthorized access, use, disclosure, or modification.
- Data Compliance
Adhering to relevant laws, regulations, and ethical guidelines related to data handling and privacy.
- Data Management
Implementing procedures for data storage, access, sharing, and preservation throughout the research lifecycle.
- Accountability and Responsibility
Defining roles and responsibilities for managing data and ensuring that individuals are accountable for their actions related to data.
- Policy Development
Creating and implementing policies that guide data-related decisions and practices.
2. Why is it important?
- Ensures Data Integrity
Data governance helps maintain the accuracy and reliability of research data, which is crucial for drawing valid conclusions.
- Protects Sensitive Information
It helps safeguard sensitive data, such as Personally Identifiable Information (PII), from unauthorized access and misuse.
- Promotes Ethical Research Practices
It supports responsible data handling and promotes ethical research practices.
- Facilitates Collaboration
Well-defined data governance practices enable researchers to collaborate effectively and share data safely.
- Reduces Risks
It helps mitigate risks associated with data breaches, non-compliance, and reputational damage.
Projects
Recent and future data governance projects are heavily influenced by emerging technologies like AI and cloud computing, and by increasing regulatory pressures related to data privacy and ethics. Key areas of focus include integrating AI and machine learning into data management, ensuring data privacy and compliance with regulations, enhancing data quality, and managing data lineage and metadata. Additionally, there’s a growing emphasis on data democratization, real-time data governance, and the use of blockchain technology for enhanced security and transparency.
Initial Source for content: Gemini AI Overview
1. Recent Trends and Projects
- AI and Machine Learning Integration
AI and ML are being used to automate data governance tasks, improve data quality, and enhance data discovery and lineage tracking.
- Data Quality Management
Ensuring data accuracy, completeness, and consistency remains a top priority. This often involves implementing data quality rules, monitoring data quality metrics, and addressing data quality issues.
- Data Lineage and Metadata Management
Understanding where data comes from, how it’s transformed, and how it’s used is crucial for effective data governance. Organizations are investing in tools and technologies to track data lineage and manage metadata effectively.
- Cloud Computing
The adoption of cloud-based data solutions is driving the need for cloud-native data governance strategies.
- Data Democratization
Making data accessible and usable for a wider audience is a growing trend, often involving user-friendly tools and training programs.
- Real-time Data Governance
The need for real-time insights is driving the adoption of real-time data governance solutions.
- Data Mesh and Data Fabric Architectures
These architectures, focused on decentralized data ownership and self-service data access, rely on robust data governance for success.
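The "Data Lineage and Metadata Management" item in the list above can be pictured as a small dependency graph. The sketch below is illustrative only: the `LINEAGE` dictionary, its dataset names, and the `upstream` helper are hypothetical, and dedicated lineage tools store far richer metadata than this.

```python
from collections import deque

# Hypothetical lineage metadata: each dataset records its direct inputs
# and a short description of the transformation that produced it.
LINEAGE = {
    "raw_orders": {"inputs": [], "transform": "ingested from source system"},
    "raw_customers": {"inputs": [], "transform": "ingested from source system"},
    "clean_orders": {"inputs": ["raw_orders"], "transform": "deduplicate, cast types"},
    "orders_by_customer": {
        "inputs": ["clean_orders", "raw_customers"],
        "transform": "join and aggregate",
    },
}


def upstream(dataset, lineage):
    """Return every dataset that `dataset` depends on, directly or indirectly.

    Uses a breadth-first walk, so nearer dependencies come first.
    """
    seen = set()
    order = []
    queue = deque(lineage[dataset]["inputs"])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        order.append(current)
        queue.extend(lineage[current]["inputs"])
    return order


sources = upstream("orders_by_customer", LINEAGE)
```

Walking the graph in the other direction (from a source to everything derived from it) would answer the impact question: which downstream datasets are affected if a source changes.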
2. Future Trends and Projects
- Autonomous Data Governance
As data volumes and complexity increase, organizations will need more automated and self-governing systems.
- Blockchain-based Data Governance
Blockchain technology offers potential for enhancing data security and transparency in certain applications.
- Data Governance as a Service (DGaaS)
Cloud-based solutions will make data governance more accessible and scalable.
- Data Ethics and Responsible AI
Organizations will need to address ethical considerations related to data usage, particularly in the context of AI and machine learning.
- Hybrid and Multi-Cloud Data Governance
Managing data across multiple cloud environments will become increasingly important.
- Data Literacy and Accessibility
Empowering employees with the skills and tools to understand and use data effectively will be a key focus.
- Data Productization
Treating data as a product with clear ownership and defined usage policies will become more common.
- Measuring the ROI of AI-driven Data Governance
Organizations will need to quantify the benefits of their investments in AI-powered data governance solutions.
- Unstructured Data Management
Solutions for unstructured data management will need to address AI data governance and monitoring needs.
- Interoperability Standards
Ensuring data can be easily shared and used across different systems and platforms will become more important.
- Privacy-Enhancing Computation
Techniques like federated learning will help organizations balance data privacy and utility.
Wikipedia
Data governance is a term used on both a macro and a micro level. The former is a political concept and forms part of international relations and Internet governance; the latter is a data management concept and forms part of corporate/organisational data governance.
Data governance involves delegating authority over data and exercising that authority through decision-making processes.[1] It plays a crucial role in enhancing the value of data assets.[2]
Macro level
Data governance at the macro level involves regulating cross-border data flows among countries, which is more precisely termed international data governance. This field formed in the early 2000s[3] and consists of "norms, principles and rules governing various types of data." [4]
There have been several international groups established by research organizations that aim to grant access to their data. These groups that enable an exchange of data are, as a result, exposed to domestic and international legal interpretations that ultimately decide how data is used. However, as of 2023, there are no international laws or agreements specifically focused on data protection.[5]
Micro level
Data governance in an organisational/corporate sense is the capability that enables an organization to manage data effectively, securely, and responsibly. It comprises the policies, processes, roles, responsibilities, and technologies in place to ensure that the right entities have access to accurate, complete, high-quality data. The key focus areas of data governance include availability, usability, consistency, data integrity and security, and standards compliance. The practice also includes establishing processes to ensure effective data management throughout the enterprise, such as accountability for the adverse effects of poor data quality, and ensuring that the data an enterprise holds can be used by the entire organization.
A data steward is a role that ensures that data governance processes are followed and that guidelines are enforced, and recommends improvements to data governance processes.
Data governance involves the coordination of people, processes, and information technology necessary to ensure consistent and proper management of an organization's data across the business enterprise. It provides all data management practices with the necessary foundation, strategy, and structure needed to ensure that data is managed as an asset and transformed into meaningful information. Goals may be defined at all levels of the enterprise and doing so may aid in acceptance of processes by those who will use them. Some goals include:
- Increasing consistency and confidence in decision making
- Decreasing the risk of regulatory fines
- Improving data security
- Defining and verifying the requirements for data distribution policies[6]
- Maximizing the income generation potential of data
- Designating accountability for information quality
- Enabling better planning by supervisory staff
- Minimizing or eliminating re-work
- Optimizing staff effectiveness
- Establishing process performance baselines to enable improvement efforts
- Acknowledging and holding all gain
These goals are realized by the implementation of data governance programs, or initiatives using change management techniques.
When companies seek to take charge of their data, whether by choice or necessity, they empower their employees, establish processes, and utilize technology to accomplish this objective.[7]
Data governance drivers
While data governance initiatives can be driven by a desire to improve data quality, they are often driven by C-level leaders responding to external regulations. In a 2017 report by the CIO WaterCooler community, 54% of respondents stated that the key driver was efficiencies in processes, 39% regulatory requirements, and only 7% customer service.[8] Examples of these regulations include the Sarbanes–Oxley Act, Basel I, Basel II, HIPAA, GDPR, cGMP,[9] and a number of data privacy regulations. To achieve compliance with these regulations, business processes and controls require formal management processes to govern the data subject to these regulations.[10] Successful programs identify drivers meaningful to both supervisory and executive leadership.
Common themes among the external regulations center on the need to manage risk. The risks can be financial misstatement, inadvertent release of sensitive data, or poor data quality for key decisions. Methods to manage these risks vary from industry to industry. Examples of commonly referenced best practices and guidelines include COBIT, ISO/IEC 38500, and others. The proliferation of regulations and standards creates challenges for data governance professionals, particularly when multiple regulations overlap the data being managed. Organizations often launch data governance initiatives to address these challenges.
Data governance initiatives (Dimensions)
Data governance initiatives improve quality of data by assigning a team responsible for data's accuracy, completeness, consistency, timeliness, validity, and uniqueness.[11] This team usually consists of executive leadership, project management, line-of-business managers, and data stewards. The team usually employs some form of methodology for tracking and improving enterprise data, such as Six Sigma, and tools for data mapping, profiling, cleansing, and monitoring data.
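As a rough illustration of how some of the quality dimensions named above could be measured, the sketch below scores a small table on completeness, uniqueness, and timeliness. The `profile` function, its parameters, and the sample rows are assumptions made for this example; it covers only three of the six dimensions and is not a substitute for a profiling tool.

```python
from datetime import date


def profile(rows, key, required, max_age_days, today):
    """Score a list of row dicts on three data quality dimensions (0.0 to 1.0).

    Each row is assumed to carry an 'updated' field holding a date.
    """
    if not rows:
        raise ValueError("cannot profile an empty dataset")
    total = len(rows)
    # Completeness: share of rows where every required field is filled in.
    complete = sum(
        all(r.get(field) not in (None, "") for field in required) for r in rows
    )
    # Uniqueness: share of distinct key values relative to the row count.
    distinct_keys = len({r.get(key) for r in rows})
    # Timeliness: share of rows updated within the assumed freshness window.
    fresh = sum((today - r["updated"]).days <= max_age_days for r in rows)
    return {
        "completeness": complete / total,
        "uniqueness": distinct_keys / total,
        "timeliness": fresh / total,
    }


rows = [
    {"id": 1, "email": "a@example.com", "updated": date(2025, 1, 20)},
    {"id": 2, "email": "", "updated": date(2024, 12, 1)},
    {"id": 2, "email": "c@example.com", "updated": date(2024, 6, 1)},
    {"id": 4, "email": "d@example.com", "updated": date(2025, 1, 30)},
]
scores = profile(rows, key="id", required=["id", "email"],
                 max_age_days=30, today=date(2025, 2, 1))
```

Tracking scores like these over time gives the responsible team a baseline to monitor, which is one way to support the improvement methodologies mentioned above.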
Data governance initiatives may pursue a number of objectives, including offering better visibility to internal and external customers (such as supply chain management), complying with regulatory law, improving operations after rapid company growth or corporate mergers, or aiding the efficiency of enterprise knowledge workers by reducing confusion and error and increasing their scope of knowledge.[citation needed] Many data governance initiatives are also inspired by past attempts to fix information quality at the departmental level, which led to incongruent and redundant data quality processes. Most large companies have many applications and databases that cannot easily share information. As a result, knowledge workers within large organizations often do not have access to the data they need to best do their jobs, and when they do have access, the data quality may be poor. Setting up a data governance practice or corporate data authority (an individual or area responsible for determining how to proceed, in the best interest of the business, when a data issue arises) can mitigate these problems.
Implementation
Implementation of a data governance initiative may vary in scope as well as origin. Sometimes, an executive mandate will arise to initiate an enterprise wide effort. Sometimes the mandate will be to create a pilot project or projects, limited in scope and objectives, aimed at either resolving existing issues or demonstrating value. Sometimes an initiative will originate lower down in the organization's hierarchy and will be deployed in a limited scope to demonstrate value to potential sponsors higher up in the organization. The initial scope of an implementation can vary greatly as well, from review of a one-off IT system, to a cross-organization initiative.
Data governance tools
Leaders of successful data governance programs declared at the Data Governance Conference in Orlando, FL, in December 2006 that data governance is about 80 to 95 percent communication.[12] That said, many of the objectives of a data governance program must be accomplished with appropriate tools. Many vendors are now positioning their products as data governance tools. Because data governance initiatives differ in their focus areas, a given tool may or may not be appropriate. Additionally, many tools that are not marketed as governance tools address governance needs and demands.
See also
- Asset Description Metadata Schema
- Basel II
- Business semantics management
- COBIT
- Corporate governance of information technology
- Data Protection Directive (EU)
- Data sovereignty
- Health Insurance Portability and Accountability Act
- Information architecture
- Information governance
- Information technology controls
- ISO/IEC 38500
- ISO/TC 215
- List of datasets for machine-learning research
- Master data management
- Operational risk management
- Sarbanes–Oxley Act
- Semantics of Business Vocabulary and Business Rules
- Simulation governance
- Universal Data Element Framework
References
- ^ Janssen, Marijn; Brous, Paul; Estevez, Elsa; Barbosa, Luis S.; Janowski, Tomasz (2020-07-01). "Data governance: Organizing data for trustworthy Artificial Intelligence". Government Information Quarterly. 37 (3): 101493. doi:10.1016/j.giq.2020.101493. hdl:1822/69192. ISSN 0740-624X.
- ^ Abraham, Rene; Schneider, Johannes; vom Brocke, Jan (2019-12-01). "Data governance: A conceptual framework, structured review, and research agenda". International Journal of Information Management. 49: 424–438. doi:10.1016/j.ijinfomgt.2019.07.008. ISSN 0268-4012.
- ^ "The Evolution of Data Governance". DATAVERSITY. 18 May 2020. Retrieved 2024-11-14.
- ^ "FAQ". Digital Trade and Data Governance Hub. Retrieved 2023-02-20.
- ^ Bernier, Alexander; Molnár-Gábor, Fruzina; Knoppers, Bartha Maria (2022). "The international data governance landscape". Journal of Law and the Biosciences. 9 (1). Oxford University Press: lsac005. doi:10.1093/jlb/lsac005. PMC 8977111. PMID 35382430.
- ^ Gianni, Daniele (2014). "Data Policy Definition and Verification for System of Systems Governance". Modeling and Simulation Support for System of Systems Engineering Applications. pp. 99–130. doi:10.1002/9781118501757.ch5. ISBN 9781118460313.
- ^ Sarsfield, Steve (2009). The Data Governance Imperative. IT Governance Publishing. ISBN 9781849281102.
- ^ Warburton, Daniel (2017-03-15). "The Data Governance Report 2017 – Your Copy". CIOWaterCooler.co.uk. Retrieved 2023-02-20.
- ^ "eCFR — Code of Federal Regulations". eCFR.gov. Retrieved 2023-02-20.
- ^ "Rimes Data Governance Handbook". RIMES. 2013-10-16. Archived from the original on 2016-03-05. Retrieved 2023-02-20.
- ^ Dai, Wei; Wardlaw, Isaac (2016). "Data Profiling Technology of Data Governance Regarding Big Data: Review and Rethinking". Information Technology, New Generations. Advances in Intelligent Systems and Computing. Vol. 448. pp. 439–450. doi:10.1007/978-3-319-32467-8_39. ISBN 978-3-319-32466-1.
- ^ Hopwood, Peter (June 2008). "Data Governance: One Size Does Not Fit All". DM Review Magazine. Archived from the original on 2008-09-28. Retrieved 2023-02-20. "At the inaugural Data Governance Conference in Orlando, Florida, in December 2006, leaders of successful data governance programs declared that in their experience, data governance is between 80 and 95 percent communication. Clearly, data governance is not a typical IT project."
Further reading
- Sargiotis, Dimitrios (2024). Data Governance: A Guide. Springer International. doi:10.1007/978-3-031-67268-2. ISBN 9783031672675. OCLC 1442731250.
- Reichental, Jonathan (2023). Data Governance for Dummies. Hoboken, NJ: John Wiley & Sons, Inc. ISBN 9781119906773. OCLC 1356475580.
- Stedman, Craig. "What is data governance and why does it matter?". www.techtarget.com. Retrieved 12 December 2024.