Ever since moving to word processing, the pharmaceutical industry has relied on unstructured authoring environments such as Microsoft Word to create submission content. For decades, pharma has used a combination of MS Word and file storage (such as SharePoint) to author, manage, store, and publish content destined for regulatory agencies.
Content that emanates from MS Word, Google Docs, and other unstructured paradigms is, by definition, worked on in documents. The document is the “unit” of content. Documents, by their very nature, have certain characteristics, features, and drawbacks.
Characteristics of Documents
Here are some of the characteristics that are intrinsic to documents:
- Usually contain all information about a subject in a single file
- Written, stored, and managed as a single file
- The entire file is updated, even if only a single word is affected
- Published as a single file
- Translated as a single file
- Metadata is applied at the file level
- Reusing content involves copying and pasting text from one document to another
- Multiple versions of the document can co-exist
- Documents are always formatted
- Each type of format (e.g., .DOCX or .PDF) requires its own file to be created, managed, stored, and published
The Problem with Documents
We often refer to unstructured documents as monoliths because all functions must be done across the entire unit of content.
Working with content at the document level creates and reinforces problems:
- Copy and paste eliminates a single source of truth
- To keep them in sync, changes made to one document also need to be made to the copied and pasted derivatives of that document
- Expensive to translate, particularly if only a small part of the document has been added or changed
- Cumbersome, time consuming, and expensive to manage
- Difficult to find specific information due to file-level tagging (particularly with .PDF files)
- Difficult and cumbersome to scale
- Difficult to track and audit each specific change to a document
Data Not Documents
Treating content as data, rather than as documents, remedies most of the problems inherent in monolithic documents. When the unit of content is data, rather than a complete document, the content is created, stored, and managed as small, format-free components.
Each component exists on its own and is treated as a single entity. A “document” is a collection of data-sized components that are woven together using a map. The map is then the unit of publishing – all components that belong to the same map are published together as a single document.
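The relationship between components and a map can be sketched in a few lines of Python. This is only an illustration, not how any particular product stores content: the `Component` and `ContentMap` classes, the `assemble` function, and the sample text are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A small, format-free unit of content (hypothetical model)."""
    component_id: str
    text: str


@dataclass(frozen=True)
class ContentMap:
    """An ordered list of component IDs that together form one 'document'."""
    title: str
    component_ids: tuple[str, ...]


def assemble(content_map: ContentMap, store: dict[str, Component]) -> list[str]:
    """Resolve the map against the component store, preserving map order."""
    return [store[cid].text for cid in content_map.component_ids]


# Illustrative sample content; the IDs and wording are made up.
store = {
    "indication": Component("indication", "Drug X is indicated for condition Y."),
    "storage": Component("storage", "Store in a dry place."),
}
label = ContentMap("Product Label", ("indication", "storage"))

print(assemble(label, store))
```

The map holds only references and ordering. The content itself lives in the components, which is what makes the later reuse and publishing steps possible.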
Characteristics of Data
Here are some characteristics that are intrinsic to data:
- Each piece contains details about a single, small, standalone piece of information or topic
- Created, stored, and managed in small components
- Only the single component is updated when a change is made
- Content is published as a collection of components
- Translation occurs at the component level
- Metadata is applied to each component
- A single component can be reused in multiple documents
- The component is a single source of truth (exist once, use everywhere)
- Created, managed, and stored without format
- Format is applied at publishing, so one component can be published to any number of formats
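To make the reuse and single-source-of-truth characteristics concrete, here is a minimal Python sketch. It assumes a plain dictionary as the component store and hypothetical map names (`patient_leaflet`, `prescriber_guide`). A real system would add versioning, metadata, and access control on top.

```python
# Component store: each piece of content exists exactly once (IDs are hypothetical).
components = {
    "dosage": "Take one tablet daily.",
    "warnings": "Do not use during pregnancy.",
    "contact": "Call the medical information line with questions.",
}

# Two maps reference components by ID instead of copying text.
maps = {
    "patient_leaflet": ["dosage", "warnings", "contact"],
    "prescriber_guide": ["dosage", "warnings"],
}


def build(map_name: str) -> list[str]:
    """Assemble a document from the current state of the store."""
    return [components[cid] for cid in maps[map_name]]


def where_used(component_id: str) -> list[str]:
    """List every map that references the given component."""
    return sorted(name for name, ids in maps.items() if component_id in ids)


# One edit to one component...
components["dosage"] = "Take one tablet twice daily."

# ...shows up in every document that reuses it.
print(build("patient_leaflet")[0])
print(build("prescriber_guide")[0])
print(where_used("dosage"))
```

Because both maps point at the same `dosage` component, there is no second copy to keep in sync.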
Benefits of Data
Structured content authoring, using components of content that are treated as data, offers many benefits:
- Maintain a single source of truth, regardless of how many outputs a component appears in
- Create a variety of file types from the same components
- Only send affected component(s) to translation (faster and less expensive to translate)
- Easy to find granular pieces of information
- Easy to track and audit changes
- Scalable across very large quantities of content
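The translation benefit can be illustrated with a small Python sketch that picks out only the components that are new or changed since the last translation round. Comparing content hashes is one possible approach, assumed here for illustration. The function names and sample text are hypothetical, and an actual SCMS may track changes differently.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a stable hash of a component's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_translation(current: dict[str, str], translated_hashes: dict[str, str]) -> list[str]:
    """Return IDs of components that are new or changed since the last translation."""
    return sorted(
        cid for cid, text in current.items()
        if translated_hashes.get(cid) != fingerprint(text)
    )


# State of the content when it was last translated (sample data).
previous = {
    "dosage": "Take one tablet daily.",
    "storage": "Store in a dry place.",
}
translated_hashes = {cid: fingerprint(text) for cid, text in previous.items()}

# Since then, one component was edited and one was added.
current = dict(previous)
current["dosage"] = "Take one tablet twice daily."
current["disposal"] = "Return unused tablets to a pharmacy."

print(needs_translation(current, translated_hashes))
```

The unchanged `storage` component is skipped, so it is never sent back to the translator.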
Infrastructure for Treating Content as Data
To treat content as data, rather than documents, requires a change in how we create, store, manage, and publish that content. It also requires different tools. Here are some of the tools that allow content to be treated as data:
- Structured Content Authoring (SCA) software – The SCA tool is used by medical writers and others to create content. Some SCA tools are similar to MS Word or Google Docs in the way they look and feel. Others have more advanced interfaces that support complex tasks.
- Structured Content Management System (SCMS) – The SCMS is used to store and manage components of content. The SCMS tracks where each component is used, stores metadata about each one, and handles security and version control. Most manage the review process and even have built-in workflow capabilities. Many store translated components with their source component for easy access. The SCMS has additional built-in features, and most can connect to a variety of external systems that perform functions such as regulatory information management (RIM) and sophisticated metadata management.
- Delivery Engine – The delivery engine is responsible for producing the final output. It builds the final “document” according to a map and adds the formatting to the components right before sending the final output to its destination. Delivery engines can produce output in a variety of file types. The available output types depend on how the delivery engine is configured and usually include .PDF and .HTML. They can also be configured to publish .DOCX, JSON, and other types of outputs.
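As a rough illustration of the delivery engine's role, the Python sketch below resolves one map against format-free components and then applies formatting only at publish time, producing both HTML and JSON from the same source. The function names (`resolve`, `to_html`, `to_json`) and the data layout are assumptions made for this example, not the interface of any real delivery engine.

```python
import html
import json

# Format-free components and one map (sample data with hypothetical IDs).
components = {
    "indication": "Drug X is indicated for condition Y.",
    "storage": "Store in a dry place.",
}
content_map = {"title": "Product Label", "components": ["indication", "storage"]}


def resolve(content_map: dict, components: dict[str, str]) -> list[str]:
    """Collect component text in the order the map specifies."""
    return [components[cid] for cid in content_map["components"]]


def to_html(content_map: dict, components: dict[str, str]) -> str:
    """Apply HTML formatting at publish time."""
    title = html.escape(content_map["title"])
    body = "\n".join(
        f"<p>{html.escape(text)}</p>" for text in resolve(content_map, components)
    )
    return f"<h1>{title}</h1>\n{body}"


def to_json(content_map: dict, components: dict[str, str]) -> str:
    """Publish the same content as structured JSON."""
    payload = {
        "title": content_map["title"],
        "sections": resolve(content_map, components),
    }
    return json.dumps(payload, indent=2)


print(to_html(content_map, components))
print(to_json(content_map, components))
```

Adding another output type means adding another renderer. The components and the map stay untouched.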
There are many additional items that can be added to a Structured Content Management ecosystem. However, all SCM ecosystems include an SCA tool, an SCMS, and a delivery engine.
As an industry, pharma is realizing that using a document as the unit of content is cumbersome, costly, and complicated to manage on a large scale. Many companies are contemplating the move to a Structured Content Authoring environment. Other vertical markets have used structured content for decades. Industries that adopted SCA long ago include hardware, software, manufacturing, and even medical devices, and over the years they have realized its benefits. The time for pharma to make the move is now.