
Three considerations to assess your data’s readiness for AI

Organizations are getting caught up in the hype cycle of AI and generative AI, but in many cases they don’t have the data foundation needed to execute AI projects. A third of executives believe that less than 50% of their organization’s data is consumable, underscoring how many organizations simply aren’t prepared for AI.

For this reason, it’s critical to lay the right groundwork before embarking on an AI initiative. As you assess your readiness, here are the primary considerations: 

  • Availability: Where is your data? 
  • Catalog: How will you document and harmonize your data?
  • Quality: Is your data accurate, consistent, and complete enough to support your AI initiatives?

AI underscores the garbage-in, garbage-out problem: if you feed a model data that is poor-quality, inaccurate, or irrelevant, your output will be, too. These projects are far too involved and expensive, and the stakes are too high, to start off on the wrong foot with your data.

The importance of data for AI

Data is AI’s stock-in-trade: a model is trained on data and then processes data for a designed purpose. When you’re planning to use AI to help solve a problem – even when using an existing large language model, such as a generative AI tool like ChatGPT – you’ll need to feed it the right context for your business (i.e., good data) to tailor its answers to your business context (e.g., for retrieval-augmented generation). It’s not simply a matter of dumping data into a model.
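
To make that concrete, here is a minimal sketch of the retrieval step behind retrieval-augmented generation, assuming a toy in-memory corpus and simple keyword-overlap scoring; a real system would use embeddings and a vector store, and the documents and function names here are hypothetical.

```python
# Minimal RAG-style retrieval sketch (illustrative only): score a small
# in-memory corpus by keyword overlap and assemble a grounded prompt.
# A production system would use embeddings and a vector database instead.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(query_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Combine retrieved business context and the question into one prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical company data standing in for a curated knowledge base.
corpus = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Premium support is available to enterprise customers only.",
    "Orders over $50 ship free within the continental US.",
]
print(build_prompt("What is the return policy?", corpus))
```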

And if you’re building a new model, you have to know what data you’ll use to train and validate it. That data needs to be separated out so you can train the model on one dataset, then validate it against a different dataset to determine whether it’s working.
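
As a simple illustration of that separation, here is a minimal Python sketch of a reproducible train/validation split; the 80/20 ratio and fixed seed are assumptions, not a prescription.

```python
import random

def train_val_split(records: list, val_fraction: float = 0.2, seed: int = 42):
    """Shuffle once, then hold out a fraction of records for validation."""
    shuffled = records[:]                    # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed for reproducibility
    cut = int(len(shuffled) * (1 - val_fraction))
    return shuffled[:cut], shuffled[cut:]    # (train, validation)

records = list(range(100))                   # stand-in for real training rows
train, val = train_val_split(records)
print(len(train), len(val))                  # 80 20
```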

Challenges to establishing the right data foundation

For many companies, knowing where their data is and whether it’s available is the first big challenge. If you already have some level of understanding of your data – what data exists, what systems it lives in, what the rules are for that data and so on – that’s a good starting point. The fact is, though, that many companies don’t have this level of understanding.

Data isn’t always readily available; it may reside in many systems and silos. Large companies in particular tend to have very complicated data landscapes. They don’t have a single, curated database where everything the model needs is neatly organized in rows and columns, ready to retrieve and use.

Another challenge is that the data is not just in many different systems but in many different formats. There are SQL databases, NoSQL databases, graph databases, and data lakes; sometimes data can only be accessed via proprietary application APIs. There’s structured data, and there’s unstructured data. Some data sits in files, and some may be streaming in real time from your factories’ sensors, and so on. Depending on your industry, your data can come from a plethora of systems and formats. Harmonizing that data is difficult, and most organizations don’t have the tools or systems to do it.
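
As a record-level illustration of what harmonizing can involve, the sketch below maps customer records from two hypothetical systems, each with its own field names, onto one canonical shape; real integrations would use dedicated tooling, and all names here are invented.

```python
# Illustrative harmonization: map records from two hypothetical source
# systems, each with its own field names, into one canonical customer shape.

CANONICAL_FIELDS = ("customer_id", "name", "email")

# Per-source mapping from canonical field -> source field name (assumed names).
FIELD_MAPS = {
    "crm":     {"customer_id": "CustID", "name": "FullName", "email": "EmailAddr"},
    "billing": {"customer_id": "acct_no", "name": "acct_name", "email": "contact"},
}

def to_canonical(record: dict, source: str) -> dict:
    """Rename a raw record's fields into the canonical model."""
    mapping = FIELD_MAPS[source]
    return {field: record.get(mapping[field]) for field in CANONICAL_FIELDS}

crm_row = {"CustID": "C-17", "FullName": "Ada Lovelace", "EmailAddr": "ada@example.com"}
billing_row = {"acct_no": "C-17", "acct_name": "Ada Lovelace", "contact": "ada@example.com"}
print(to_canonical(crm_row, "crm") == to_canonical(billing_row, "billing"))  # True
```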

Even if you can find your data and put it into one common format (a canonical model) that the business understands, you then have to think about data quality. Data is messy; it may look fine from a distance, but take a closer look and you’ll find errors and duplicates – when data comes from multiple systems, inconsistencies are inevitable. You can’t feed AI training data of low quality and expect high-quality results.
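
Here is a tiny sketch of what that closer look might involve: collapsing exact duplicates by key and flagging records that share a key but disagree. The key choice and conflict rule are assumptions for illustration.

```python
# Illustrative cleanup pass: collapse duplicate canonical records by key and
# surface conflicts, since the same customer can arrive from several systems.

def dedupe(records: list[dict], key: str = "customer_id"):
    seen: dict = {}
    conflicts = []
    for rec in records:
        k = rec[key]
        if k not in seen:
            seen[k] = rec
        elif seen[k] != rec:               # same key, different values: flag it
            conflicts.append((seen[k], rec))
    return list(seen.values()), conflicts

rows = [
    {"customer_id": "C-17", "email": "ada@example.com"},
    {"customer_id": "C-17", "email": "ada@example.com"},    # exact duplicate
    {"customer_id": "C-17", "email": "ada@old-domain.com"}, # inconsistent copy
]
clean, conflicts = dedupe(rows)
print(len(clean), len(conflicts))  # 1 1
```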

How to lay the right foundation: Three steps to success

The first brick of the AI project’s foundation is understanding your data. You must be able to articulate what data your business is capturing, what systems it lives in, how it’s physically implemented versus the business’s logical definition of it, and what the business rules for it are.
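
One lightweight way to capture that understanding is a catalog entry per dataset. The sketch below mirrors the questions in this paragraph; the field names and example values are illustrative, not a standard.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One dataset's answers to: what is it, where does it live, how is it
    physically stored vs. logically defined, and what rules govern it."""
    name: str                  # business name of the dataset
    source_system: str         # where it physically lives
    physical_schema: str       # how it is implemented (table, topic, files)
    logical_definition: str    # what the business means by it
    business_rules: list = field(default_factory=list)

orders = CatalogEntry(
    name="Customer Orders",
    source_system="ERP (Postgres)",
    physical_schema="sales.orders (order_id, cust_id, total, placed_at)",
    logical_definition="A confirmed purchase placed by a known customer",
    business_rules=["total must be positive", "cust_id must exist in CRM"],
)
print(orders.name, "->", orders.source_system)
```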

Next, you must be able to evaluate your data. That comes down to asking, “What does good data mean for my business?” You need a definition of what good quality looks like, rules in place for validating and cleansing the data, and a strategy for maintaining that quality over its lifecycle.
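
A minimal sketch of turning such a definition into executable checks: each rule is a named predicate, and a record’s violations are collected so it can be cleansed or rejected. Production systems would typically use a validation framework rather than hand-rolled rules, and the rules below are invented examples.

```python
# Illustrative data-quality rules: a "good" record is one that passes every
# named predicate; failures are collected so they can be cleansed or rejected.

RULES = {
    "has_email": lambda r: bool(r.get("email")),
    "email_shape": lambda r: "@" in r.get("email", ""),
    "positive_total": lambda r: r.get("total", 0) > 0,
}

def validate(record: dict) -> list[str]:
    """Return the names of every rule this record violates."""
    return [name for name, rule in RULES.items() if not rule(record)]

good = {"email": "ada@example.com", "total": 120.0}
bad = {"email": "", "total": -5}
print(validate(good))  # []
print(validate(bad))   # ['has_email', 'email_shape', 'positive_total']
```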

Even if you’re able to get the data into a canonical model from heterogeneous systems and wrangle it to improve the quality, you still have to address scalability. This is the third foundational step. Many models require a lot of data for training; you also need lots of data for retrieval-augmented generation, a technique for enhancing generative AI models with information obtained from external sources that weren’t included in the model’s training. And all of this data is continuously changing and evolving.

You need a methodology for creating a data pipeline that scales to handle the load and volume of the data you might feed into it. Initially, you’re so bogged down figuring out where to get the data, how to clean it and so on that you may not have fully thought through how challenging it will be to scale with continuously evolving data. So consider the platform you’re using to build the project, and make sure it can scale up to the volume of data you’ll bring into it.
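
As a sketch of the load-handling idea, the pipeline below validates and transforms records in fixed-size batches, so memory stays bounded no matter how large or fast-changing the feed is; the batch size and stage logic are assumptions for illustration.

```python
from itertools import islice

def batched(source, size=1000):
    """Yield fixed-size batches from any iterable, so memory stays bounded."""
    it = iter(source)
    while batch := list(islice(it, size)):
        yield batch

def pipeline(source):
    """Validate, then transform, records batch by batch (stages are illustrative)."""
    for batch in batched(source):
        valid = [r for r in batch if r.get("total", 0) > 0]  # cheap quality gate
        yield from ({**r, "total_cents": int(r["total"] * 100)} for r in valid)

# A generator stands in for a continuously evolving feed of any size.
feed = ({"total": n % 7 - 1} for n in range(10_000))
print(sum(1 for _ in pipeline(feed)))  # count of records that survive the gate
```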

Creating the environment for trustworthy data

When working on an AI project, treating data as an afterthought is a sure recipe for poor business outcomes. Anyone who is serious about building and sustaining a business edge by developing and using AI must start with the data. The complexity and challenge of cataloging and readying data for business use are a huge concern, especially because time is of the essence. That’s why you don’t have time to do it wrong; a platform and methodology that help you maintain high-quality data are foundational. Understand and evaluate your data, then plan for scalability, and you’ll be on your way to better business outcomes.
