Preparing content for GenAI

Generative AI (GAI) is one of the newest AI frontiers and one of the most difficult to perfect. It is said that natural language processing is the most complex of all AI tasks, far more difficult than visual recognition and robotic movement. Human language, with all of its rules, exceptions, and nuances, is complex. Processing it is one thing; creating it in the form of writing or speech is even more challenging.

GAI is exciting and terrifying, all at the same time. There are so many reasons to be excited. Think of all the amazing things GAI is going to provide. AI’s ability to parse enormous volumes of data to find correlations and disconnects is mind-boggling. When we combine three things:

  • Huge amounts of data
  • Huge amounts of processing power via GPUs
  • Advanced natural language generation

We can expect answers to questions that we have never before considered. The potential for AI to do good in the world is tantalizing.

That said, when AI goes wrong, it can go very, very wrong. For example, if you train your AI engine with incorrect information, it will produce incorrect results. GAI will write the most convincing incorrect answer you can imagine. Most of the time, GAI won’t second-guess itself. It will find the information, put some nice wording around it, and present it to you as gospel.

So, if your company is considering implementing AI now or in the future, you must clean up your content before you do anything else.

You need to go through your corpus and curate it:

  • Delete information that is out of date
  • Correct information that is inaccurate

If you don’t, you will not be able to guarantee the results that AI or GAI produces. You can curate your content before or after the AI system ingests it, but it must be done. This is a step that you cannot skip.

Many companies think they can bypass structured content, bypass content cleanup, and leapfrog directly to AI. This is magical thinking at best and a recipe for disaster at worst.

Your AI system doesn’t ‘know’ anything until you train it. And then, it only knows what you’ve trained it on and how you’ve trained it. Sure, it can use natural language generation (NLG) to write pretty sentences. But it doesn’t know whether its information is accurate. Only the people who are training the engine know.

Here is my best advice to prepare for AI:

  • Structure content into discrete components
  • Deduplicate content so you have a single source of truth
  • Make sure your single source of truth is accurate
  • Resolve conflicting information
  • Remove outdated information
  • Clean, clean, clean
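
To make the checklist above more concrete, here is a minimal Python sketch of a first-pass audit. It is only an illustration, not a finished tool. The `Component` structure, the `audit` function, and the one-year staleness cutoff are all assumptions made up for this example. It flags exact duplicates (after normalizing case and whitespace), components that have not been reviewed recently, and topics with more than one distinct version of the text. A human still has to decide which version is the accurate one.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Component:
    """One discrete content component (hypothetical structure for this sketch)."""
    topic: str           # what the component is about
    body: str            # the text itself
    last_reviewed: date  # when a person last confirmed it was accurate


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def fingerprint(text: str) -> str:
    """Hash the normalized text so identical content gets an identical key."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def audit(components, today, max_age_days=365):
    """Flag duplicates, stale components, and topics with conflicting text.

    max_age_days is an assumed cutoff; pick one that suits your content.
    """
    seen = {}      # fingerprint -> first component with that text
    by_topic = {}  # topic -> set of distinct fingerprints
    report = {"duplicates": [], "stale": [], "conflicts": []}

    for component in components:
        fp = fingerprint(component.body)
        if fp in seen:
            report["duplicates"].append(component)
        else:
            seen[fp] = component

        if (today - component.last_reviewed).days > max_age_days:
            report["stale"].append(component)

        by_topic.setdefault(component.topic, set()).add(fp)

    # A topic with more than one distinct text needs a person to resolve it.
    report["conflicts"] = sorted(
        topic for topic, fps in by_topic.items() if len(fps) > 1
    )
    return report
```

Exact-match hashing only catches copies that differ in case or spacing. Near-duplicates that have been reworded need fuzzier comparison and, in the end, human review.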

When you are confident that you have cleaned your content, give it one more pass just to be sure. After this, have the AI engine ingest it. Create your question/answer pairs for training. Validate the training. Then and only then can you be confident that the answers your AI provides will be accurate. Skip these steps at your company’s peril.
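
The validation step can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: `answer_fn` is a hypothetical stand-in for however you query your own AI system, and the comparison is a deliberately strict match on normalized text. Real validation usually needs a subject matter expert to judge whether an answer is acceptable.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparing answers."""
    return re.sub(r"\s+", " ", text.strip().lower())


def validate(answer_fn, qa_pairs):
    """Ask each curated question and compare the answer to the expected one.

    answer_fn is a hypothetical callable: question text in, answer text out.
    Returns the fraction of matching answers and the list of failures.
    """
    if not qa_pairs:
        raise ValueError("qa_pairs must contain at least one pair")

    failures = []
    for question, expected in qa_pairs:
        actual = answer_fn(question)
        if normalize(actual) != normalize(expected):
            failures.append((question, expected, actual))

    accuracy = 1 - len(failures) / len(qa_pairs)
    return accuracy, failures
```

Every entry in `failures` is a prompt to go back to the content: either the source is wrong, or the expected answer is.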

Val Swisher