
Active metadata management and the rise of intelligent data architecture platforms

Today, organizations cannot afford to wait for data insights; they need to meet business needs and deliver results at the speed of decision-making. Yet many data professionals have been overly focused on technology, which can lead to suboptimal and costly choices. To address this, many are adopting a business-outcome-first mindset. This shift requires not only a different thought process, but also a different technology approach. One emerging alternative, the “Intelligent Data Architecture Platform” (IDAP), accomplishes this by unifying data and metadata, enabling faster development of data products.

As an intelligent data orchestrator, IDAP uses machine learning to underpin the metadata collection and discovery needed to perform the required tasks. Here, metadata powers the automation and orchestration backplane, creating a unified engine that lets data and business teams build and manage data products collaboratively. Taking it one step further is active metadata management (AMM). Unlike traditional metadata management, AMM continuously analyzes metadata and delivers timely alerts and recommendations for addressing issues like data pipeline failures and schema drift. This proactive approach helps keep the modern data stack healthy and up to date.

More specifically, IDAP includes the following components that work together:

  • Ingestion and Profiling: Data ingestion is the process of importing or receiving data from various sources into a target system or database for storage, processing, and analysis. It involves extracting data from source systems, transforming it into a usable format, and loading it into the target system, and it is a critical step in creating a reliable and efficient data pipeline. Some data is ingested in batch mode using data movement options like secure FTP, while other sources allow real-time ingestion through pub/sub mechanisms like Apache Kafka or through APIs. The IDAP needs to manage varying ingestion frequencies as well as discover each source's schema and handle changes such as schema drift. Once ingested, data from operational and transactional sources is loaded into a data warehouse or a data lake, where it is integrated and modeled for consumption by downstream systems and data consumers. Before this data can be used intelligently, however, it needs to be profiled.
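
To make the schema-handling step concrete, here is a minimal sketch of how an ingestion layer might discover the schema of an incoming batch and flag drift against the previously registered version. It uses pandas purely for illustration; the registered_schema dictionary and the detect_schema_drift helper are hypothetical names invented for this example, not part of any specific IDAP product.

```python
import pandas as pd

def discover_schema(df: pd.DataFrame) -> dict:
    """Capture a simple technical schema: column name -> inferred dtype."""
    return {col: str(dtype) for col, dtype in df.dtypes.items()}

def detect_schema_drift(registered: dict, incoming: dict) -> dict:
    """Compare the registered schema with the incoming batch and report drift."""
    return {
        "added_columns": sorted(set(incoming) - set(registered)),
        "dropped_columns": sorted(set(registered) - set(incoming)),
        "type_changes": {
            col: (registered[col], incoming[col])
            for col in set(registered) & set(incoming)
            if registered[col] != incoming[col]
        },
    }

if __name__ == "__main__":
    # Schema captured on a previous ingestion run (hypothetical values).
    registered_schema = {"order_id": "int64", "amount": "float64", "region": "object"}

    # New batch from the source system: 'region' is gone, 'country' appears,
    # and 'amount' now arrives as strings.
    batch = pd.DataFrame({
        "order_id": [101, 102],
        "amount": ["19.99", "5.00"],
        "country": ["US", "DE"],
    })

    drift = detect_schema_drift(registered_schema, discover_schema(batch))
    if any(drift.values()):
        print("Schema drift detected:", drift)
```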

Conventional systems provide mechanisms to profile ingested data and extract technical metadata, such as column statistics, schema information, and basic data quality attributes like completeness, uniqueness, and missing values. IDAP does this too, but it also uses ML to build a knowledge graph so it can infer relationships and data quality rules. The approach also helps generate operational metadata: information on how and when data was created or transformed.
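
As a rough illustration of the profiling step (not the knowledge-graph inference, which requires far more machinery), the sketch below computes basic technical metadata per column. The column names, the profile_dataframe helper, and the naive "candidate key" rule are assumptions made for this example.

```python
import pandas as pd

def profile_dataframe(df: pd.DataFrame) -> dict:
    """Produce basic technical metadata per column: completeness, uniqueness, ranges."""
    profile = {}
    total = len(df)
    for col in df.columns:
        series = df[col]
        stats = {
            "dtype": str(series.dtype),
            "completeness": float(series.notna().mean()),            # share of non-missing values
            "uniqueness": float(series.nunique() / total) if total else 0.0,
        }
        if pd.api.types.is_numeric_dtype(series):
            stats["min"], stats["max"] = series.min(), series.max()
        # A naive inferred rule: fully complete, fully unique columns look like keys.
        stats["candidate_key"] = stats["completeness"] == 1.0 and stats["uniqueness"] == 1.0
        profile[col] = stats
    return profile

if __name__ == "__main__":
    df = pd.DataFrame({
        "customer_id": [1, 2, 3, 4],
        "email": ["a@x.com", None, "c@x.com", "d@x.com"],
        "lifetime_value": [120.0, 80.5, None, 42.0],
    })
    for column, stats in profile_dataframe(df).items():
        print(column, stats)
```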

Traditionally, metadata was treated as a static resource, created and stored alongside the data it describes. With the increasing complexity and volume of data in modern systems, however, active metadata management has become essential. It treats metadata as a dynamic, valuable asset that can be actively leveraged for many purposes. IDAP activates the metadata so it can travel across the modern data tool stack and actively manage all data workloads. IDAP analyzes this metadata to give data engineers recommendations for managing data pipelines effectively, to surface data quality alerts that improve productivity, and to ensure reliable data delivery to data consumers.
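
To show what "activating" metadata can look like in practice, the sketch below compares a column's current profile against the profile recorded on the previous run and turns the difference into a recommendation for a data engineer. The threshold, the profile structures, and the recommend_actions helper are illustrative assumptions, not a prescribed IDAP interface.

```python
def recommend_actions(previous: dict, current: dict, drop_threshold: float = 0.10) -> list[str]:
    """Turn changes in profiling metadata into actionable recommendations."""
    recommendations = []
    for column, prev_stats in previous.items():
        curr_stats = current.get(column)
        if curr_stats is None:
            recommendations.append(f"Column '{column}' disappeared; check upstream schema changes.")
            continue
        drop = prev_stats["completeness"] - curr_stats["completeness"]
        if drop > drop_threshold:
            recommendations.append(
                f"Completeness of '{column}' fell from {prev_stats['completeness']:.0%} "
                f"to {curr_stats['completeness']:.0%}; inspect the ingestion job before publishing."
            )
    return recommendations

if __name__ == "__main__":
    # Profiles captured on two consecutive pipeline runs (hypothetical values).
    previous_profile = {"email": {"completeness": 0.98}, "region": {"completeness": 1.00}}
    current_profile = {"email": {"completeness": 0.72}}

    for message in recommend_actions(previous_profile, current_profile):
        print("RECOMMENDATION:", message)
```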

  • Curation: Data curation involves the selection, organization, and maintenance of data to ensure its accuracy, reliability, and usefulness for analysis and decision-making. It includes activities such as data cleansing, transformation, and enrichment, as well as metadata creation and documentation. Effective data curation is essential to normalize, standardize, and harmonize datasets and to deliver successful data-driven projects.

To speed up business-led data product development, the technical metadata, which consists of technical column names, is translated into business-friendly terms to create business metadata. In this step, the business metadata is linked to the technical metadata and added to the business glossary.
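
A minimal sketch of that mapping step follows: technical column names are translated into business-friendly terms, and the link between the two is retained so the glossary stays connected to the underlying technical metadata. The glossary contents and column names here are invented for illustration.

```python
# Hypothetical business glossary: technical name -> business term and definition.
glossary = {
    "cust_id": {"term": "Customer ID", "definition": "Unique identifier of a customer."},
    "ltv_amt": {"term": "Lifetime Value", "definition": "Total revenue attributed to a customer."},
    "ord_dt":  {"term": "Order Date", "definition": "Date the order was placed."},
}

def to_business_metadata(technical_columns: list[str]) -> list[dict]:
    """Link technical columns to business terms, flagging anything not yet curated."""
    entries = []
    for col in technical_columns:
        entry = glossary.get(col)
        entries.append({
            "technical_name": col,
            "business_term": entry["term"] if entry else None,
            "status": "curated" if entry else "needs_curation",
        })
    return entries

if __name__ == "__main__":
    for row in to_business_metadata(["cust_id", "ltv_amt", "shp_cd"]):
        print(row)
```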

  • Data Quality: Embedding quality checks directly into data pipelines addresses data inaccuracy, duplication, and inconsistency. By offering this capability, IDAP delivers more trustworthy data products and enhances the reliability of data across the organization. (A brief sketch of such an embedded check appears after the transformation discussion below.)
  • Transformation/Testing: This layer is designed to provide a strong developer experience and boost productivity. A collaborative workspace is used to develop and deploy code, and the IDAP borrows agile and lean best practices from software engineering, including reuse of data transformation code.

Additionally, it uses a no/low-code transformation engine that can be built into the IDAP or integrated with an existing engine to speed up development. Finally, it applies key components of the DevOps philosophy, such as continuous testing and automation, to data management. This discipline is called DataOps, and it is maturing fast.
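
The sketch below illustrates both ideas at once, assuming a simple pandas-based pipeline: a small, reusable transformation with quality checks embedded directly in it, plus a unit test that a DataOps workflow could run on every change. The table, column names, and checks are invented for this example and are not taken from any particular product.

```python
import pandas as pd

def transform_orders(raw: pd.DataFrame) -> pd.DataFrame:
    """A small, reusable transformation with embedded quality checks."""
    df = raw.copy()
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

    # Embedded quality checks: fail fast instead of publishing bad data downstream.
    assert df["order_id"].is_unique, "Duplicate order_id values found"
    assert df["amount"].notna().all(), "Non-numeric or missing amounts found"
    assert (df["amount"] >= 0).all(), "Negative order amounts found"

    return df[["order_id", "amount"]]

def test_transform_orders():
    """Continuous-testing style check that can run on every commit or deployment."""
    raw = pd.DataFrame({"order_id": [1, 2], "amount": ["10.50", "3.00"]})
    result = transform_orders(raw)
    assert list(result.columns) == ["order_id", "amount"]
    assert result["amount"].sum() == 13.50

if __name__ == "__main__":
    test_transform_orders()
    print("All transformation tests passed.")
```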

  • Continuous Development and Deployment: DataOps best practices are used in deployment to push code into production in a governed and secure manner. This allows teams to accelerate experimentation by branching and testing new features without introducing breaking changes into production pipelines, and features can be rolled back quickly if needed. Finally, the IDAP introduces much-needed A/B testing capabilities into the development of data products.
  • Observability: Traditional observability systems were rule-based and generated so many notifications that they caused “alert fatigue.” Modern systems, including the IDAP, leverage ML to detect anomalies and use an alerting and notification engine to escalate critical issues. This allows the business to catch anomalies proactively and avoid downtime, while handling notifications intelligently to reduce the overload.
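
As a rough illustration of the idea, the sketch below applies a simple statistical check (a z-score against recent history) to a pipeline's daily row count and raises an alert only when the deviation is large. A production IDAP would typically use learned models rather than this hand-rolled threshold; the metric, values, and helper names are assumptions for illustration.

```python
import statistics

def detect_anomaly(history: list[int], latest: int, z_threshold: float = 3.0) -> bool:
    """Flag the latest observation if it deviates strongly from recent history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    z_score = abs(latest - mean) / stdev
    return z_score > z_threshold

if __name__ == "__main__":
    # Daily row counts for a pipeline's output table (hypothetical values).
    recent_row_counts = [10_120, 9_980, 10_340, 10_050, 10_210, 9_890, 10_150]
    todays_row_count = 2_300   # A sharp drop that likely signals an upstream failure.

    if detect_anomaly(recent_row_counts, todays_row_count):
        # Only anomalous runs raise an alert, which keeps notification volume low.
        print(f"ALERT: row count {todays_row_count} is anomalous versus recent history.")
    else:
        print("Row count within expected range; no alert raised.")
```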

Building Better Business Value Begins by Being “Business Led”

The future belongs to organizations that are led by business outcomes rather than driven by technology. These companies are laser-focused on delivering business value at all times and have an urgency to transform fast, quickly stand up analytics use cases, and continuously innovate. This often requires a hybrid approach that integrates the best of centralized infrastructure with domain-driven data product development, and it means leading with user experience and user needs in mind. As a result, this method delivers results faster, aligns with organizational culture and skills, and creates solutions with more value for clients and customers.

Partners who provide an integrated platform that supports active metadata management save their customers time and money while delivering trusted business outcomes. The time savings come from avoiding the need to integrate several technologies and from making the business significantly more efficient. For example, organizations can measure benefits such as the proportion of successful projects, the number of deployed use cases, and the frequency of new releases, all of which build greater trust in data. They can also use the approach to create economies of scale and avoid unnecessary downtime.

Finally, these products benefit from economies of scale: just as an ML model improves through frequent retraining, so do these cloud-native, multi-tenant data frameworks. By flipping the focus from technology to outcomes, organizations that adopt an IDAP can finally achieve the aspirational goal of becoming truly data driven.

 
