Data Not Documents

Ever since moving to word processing, the pharmaceutical industry has relied on unstructured authoring environments such as Microsoft Word to create submission content. For decades, pharma has used a combination of MS Word and file storage (such as SharePoint) to author, manage, store, and publish content destined for regulatory agencies.

Content that emanates from MS Word, Google Docs, and other unstructured paradigms is, by definition, worked on in documents. The document is the “unit” of content. Documents, by their very nature, have certain characteristics, features, and drawbacks.

Characteristics of Documents

Here are some of the characteristics that are intrinsic to documents:

Usually contain all information about a subject in a single file
Written, stored, managed as a single file
The entire file is updated, even if only a single word is affected
Published as a single file
Translated as a single file
Metadata is applied at the file level
Reusing content involves copying and pasting various words from one document to another
Multiple versions of the document can co-exist
Documents are always formatted
Each type of format (e.g., .DOCX or .PDF) requires its own file to be created, managed, stored, and published

The Problem with Documents

We often refer to unstructured documents as monoliths because all functions must be done across the entire unit of content.

Working with content at the document level creates and reinforces problems:

Copy and paste eliminates a single source of truth
To keep them in sync, changes made to one document also need to be made to the copied and pasted derivatives of that document
Expensive to translate, particularly if only a small part of the document has been added or changed
Cumbersome, time consuming, and expensive to manage
Difficult to find specific information because tagging is at the file level (particularly with .PDF files)
Difficult and cumbersome to scale
Difficult to track and audit each specific change to a document

Data Not Documents

Treating content as data, rather than as documents, remedies most of the problems inherent in monolithic documents. When the unit of content is data, rather than a complete document, the content is created, stored, and managed as small, format-free components.

Each component exists on its own and is treated as a single entity. A “document” is a collection of data-sized components that are woven together using a map. The map is then the unit of publishing – all components that belong to the same map are published together as a single document.
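The relationship between components and a map can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the Component and ContentMap classes, the assemble function, and the sample label text are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A small, format-free unit of content (hypothetical model)."""
    component_id: str
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class ContentMap:
    """An ordered list of component IDs that together form one "document"."""
    title: str
    component_ids: list


def assemble(content_map, store):
    """Resolve each ID in the map against the component store, in map order."""
    return [store[cid] for cid in content_map.component_ids]


# Hypothetical component store: each component exists exactly once.
store = {
    "indication": Component("indication", "Drug X is indicated for condition Y."),
    "dosage": Component("dosage", "Take one tablet daily."),
    "storage": Component("storage", "Store below 25 C."),
}

# Two maps draw on the same store; neither holds a copy of the text.
label = ContentMap("Product Label", ["indication", "dosage", "storage"])
leaflet = ContentMap("Patient Leaflet", ["dosage", "storage"])

label_parts = assemble(label, store)
leaflet_parts = assemble(leaflet, store)
```

Here the map holds only references. The "document" comes into being when the map is resolved, which is why the map, not the component, is the unit of publishing.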

Characteristics of Data

Here are some characteristics that are intrinsic to data:

Each piece contains details about a single, small, standalone piece of information or topic
Created, stored, and managed in small components
Only the single component is updated when a change is made
Content is published as a collection of components
Translation occurs at the component level
Metadata is applied to each component
A single component can be reused in multiple documents
The component is a single source of truth (it exists once and is used everywhere)
Created, managed, and stored without format
Format is applied at publishing, so one component can be published to any number of formats
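Two of the characteristics above, component-level updates and reuse from a single source, can be illustrated with a small Python sketch. The component IDs, map names, and the resolve and where_used helpers are assumptions made up for this example; a real system would keep this information in a content management system rather than in dictionaries.

```python
# Hypothetical component store: ID -> format-free text.
components = {
    "dosage": "Take one tablet daily.",
    "storage": "Store below 25 C.",
    "warnings": "Do not use if pregnant.",
}

# Two hypothetical maps that reuse the same "dosage" component by reference.
maps = {
    "label": ["dosage", "warnings", "storage"],
    "leaflet": ["dosage", "storage"],
}


def resolve(map_name):
    """Return the current text of every component a map references."""
    return [components[cid] for cid in maps[map_name]]


def where_used(component_id):
    """Return the names of all maps that reference a component."""
    return sorted(name for name, ids in maps.items() if component_id in ids)


# One edit to one component...
components["dosage"] = "Take one tablet twice daily."

# ...is picked up by every map that references it; nothing is copied or pasted.
label_text = resolve("label")
leaflet_text = resolve("leaflet")
affected = where_used("dosage")
```

Because the maps store IDs rather than text, there is no second copy to fall out of sync, and the where_used lookup shows which outputs a change touches.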

Benefits of Data

Structured content authoring, using components of content that are treated as data, offers many benefits:

Maintain a single source of truth, regardless of how many outputs a component appears in
Create a variety of file types from the same components
Only send affected component(s) to translation (faster and less expensive to translate)
Easy to find granular pieces of information
Easy to track and audit changes
Scalable across very large quantities of content
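The translation benefit is easy to see in code. The sketch below assumes a simple approach in which a hash of each component's source text is recorded when it is translated; only components whose hash is new or different are sent out again. The fingerprint and needs_translation functions and the sample content are hypothetical, and real systems may detect changes differently.

```python
import hashlib


def fingerprint(text):
    """Stable hash of a component's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_translation(components, translated_fingerprints):
    """Return IDs whose source text is new or changed since last translation."""
    return sorted(
        cid
        for cid, text in components.items()
        if translated_fingerprints.get(cid) != fingerprint(text)
    )


# State at the time of the last translation round (hypothetical content).
previous = {
    "dosage": "Take one tablet daily.",
    "storage": "Store below 25 C.",
}
translated_fingerprints = {cid: fingerprint(t) for cid, t in previous.items()}

# Current source content.
current = {
    "dosage": "Take one tablet twice daily.",  # changed
    "storage": "Store below 25 C.",            # unchanged
    "warnings": "Do not use if pregnant.",     # new
}

to_translate = needs_translation(current, translated_fingerprints)
print(to_translate)  # ['dosage', 'warnings']
```

The unchanged storage component is never sent back to the translator. With a monolithic document, by contrast, the whole file goes out even when a single sentence has changed.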

Infrastructure for Treating Content as Data

To treat content as data, rather than documents, requires a change in how we create, store, manage, and publish that content. It also requires different tools. Here are some of the tools that allow content to be treated as data:

Structured Content Authoring (SCA) software – The SCA tool is used by medical writers and others to create content. Some SCA tools look and feel similar to MS Word or Google Docs. Others have more sophisticated interfaces that support more advanced tasks.
Structured Content Management System (SCMS) – The SCMS is used to store and manage components of content. It tracks where each component is used, stores metadata about each component, and handles security and version control. Most SCMSs manage the review process and have built-in workflow capabilities. Many store translated components with their source components for easy access. Beyond these built-in features, most can connect to external systems that perform functions such as regulatory information management (RIM) and sophisticated metadata management.
Delivery Engine – The delivery engine produces the final output. It builds the final “document” according to a map and applies formatting to the components just before sending the output to its destination. Delivery engines can produce output in a variety of file types, depending on how they are configured. Typical outputs include .PDF and .HTML; engines can also be configured to publish .DOCX, JSON, and other formats.
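The delivery engine's role, applying format only at publishing time, can be sketched as follows. This is an illustration under stated assumptions, not a description of any commercial engine: the publish function, the renderer names, and the sample content are hypothetical, and only HTML and JSON are shown because they need nothing beyond the Python standard library.

```python
import html
import json

# Hypothetical format-free components and a map that orders them.
components = {
    "dosage": "Take one tablet daily.",
    "storage": "Store below 25 C & away from light.",
}
content_map = {"title": "Patient Leaflet", "component_ids": ["dosage", "storage"]}


def render_html(content_map, components):
    """Wrap each component in a paragraph; escaping happens here, not at authoring."""
    paragraphs = "".join(
        f"<p>{html.escape(components[cid])}</p>"
        for cid in content_map["component_ids"]
    )
    return f"<h1>{html.escape(content_map['title'])}</h1>{paragraphs}"


def render_json(content_map, components):
    """Emit the same content as structured JSON for downstream systems."""
    return json.dumps(
        {
            "title": content_map["title"],
            "sections": [
                {"id": cid, "text": components[cid]}
                for cid in content_map["component_ids"]
            ],
        }
    )


# The configured output types: adding a format means adding a renderer.
RENDERERS = {"html": render_html, "json": render_json}


def publish(content_map, components, output_format):
    """Build the final output for one map in the requested format."""
    return RENDERERS[output_format](content_map, components)


html_output = publish(content_map, components, "html")
json_output = publish(content_map, components, "json")
```

The same two components yield both outputs without any change to the source text. Supporting another file type would mean adding a renderer, not maintaining another copy of the content.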

Many additional items can be added to a Structured Content Management (SCM) ecosystem. However, all SCM ecosystems include an SCA tool, an SCMS, and a delivery engine.

As an industry, pharma is realizing that using a document as the unit of content is cumbersome, costly, and complicated to manage on a large scale. Many companies are contemplating the move to a Structured Content Authoring environment. Structured content has been used in other vertical markets for decades. Industries that adopted SCA long ago include hardware, software, manufacturing, and even medical devices, and they have realized its benefits over the years. The time for pharma to make the move is now.

Val Swisher

© 2026 Content Rules, Inc. All rights reserved.