The LTO Show – AI and Storage Infrastructure Trends with Tom Coughlin

Host Intro

Welcome to The LTO Show — the premier podcast for leaders in the tape and storage industries. I’m your host, Pete Paisley. Each episode, I bring you conversations with industry leaders who are contributing in meaningful ways to the tape storage community. Our goal is to deliver fresh insights into the business case for LTO tape storage — and today, we’re exploring how tape storage is becoming the unsung hero in the age of artificial intelligence.

If you’ve been following the AI revolution, you know that large language models — LLMs like ChatGPT, Claude, and Gemini — are transforming how we work and communicate. But behind these incredible AI systems lies an enormous data challenge. We’re talking about petabytes — even exabytes — of training data that must be stored, managed, and preserved.

And that’s where tape comes in.

Today, I’m thrilled to welcome Tom Coughlin, CEO of Coughlin Associates, a leading storage consulting and market research firm. Tom is one of the foremost experts on storage technology trends and has been closely tracking the intersection of AI and storage infrastructure.

Segment 1 – Introduction

PETE:
Tom, welcome to The LTO Show.

TOM COUGHLIN:
Thanks, Pete. Great to be here. I’ve been looking forward to this conversation.

PETE:
Before we dive into the technical details, tell us a bit about your background.

TOM:
I’ve been in the data storage industry for over 40 years, holding engineering and senior management positions. Coughlin Associates provides consulting, publishes books and market research reports, and organizes digital storage and memory-focused events.

I’m a regular contributor to Forbes.com and media and entertainment publications. I’m an IEEE Fellow, 2025 IEEE Past President, Past President of IEEE-USA, Past Director of IEEE Region 6, and Past Chair of the Santa Clara Valley IEEE Section. I’m also active with SNIA and SMPTE.

For more information, visit www.tomcoughlin.com.

PETE:
When most people think about AI training, they picture massive GPU clusters and ultra-fast storage. Tape storage seems almost counterintuitive. Why are we talking about tape in the context of cutting-edge AI?

TOM:
That’s a common question. AI training is a multi-stage process with very different storage requirements at each stage. During active training, you absolutely need high-speed NVMe SSDs feeding GPUs. But once a training run is complete, you’re left with massive datasets that still have enormous value — yet don’t require constant access. That’s where tape becomes essential.

Segment 2 – The Scale of LLM Training Data

PETE:
When we say “massive datasets,” what does that really mean?

TOM:
The scale is staggering. Modern LLMs are trained on datasets ranging from hundreds of terabytes to multiple petabytes. GPT-3 used around 45 terabytes of text data. GPT-4 and newer models? Estimates suggest 10 to 20 petabytes or more when you include preprocessing data, intermediate checkpoints, and model versions.

PETE:
And companies are running multiple training cycles, correct?

TOM:
Exactly. They’re training multiple versions, running experiments, fine-tuning for different use cases, and iterating continuously. Each training run generates checkpoints, logs, validation data, and evaluation datasets. All of this must be preserved for compliance, reproducibility, and future reference.

PETE:
So it’s not just the final model — it’s the entire data lineage.

TOM:
Correct. And you can’t simply delete it. Regulatory requirements — particularly in healthcare and finance — mandate retention. If a model fails months later, you must trace what happened. Plus, the training process itself can cost tens of millions of dollars in compute resources. That data is a major investment.

Segment 3 – The Economics of Tape for AI

PETE:
Let’s talk economics. Why is tape so attractive for AI archives?

TOM:
Three reasons: cost per terabyte, energy efficiency, and longevity.

LTO-10 offers 30 terabytes native per cartridge, up to 75 terabytes compressed. Tape costs about $6 to $10 per terabyte. Enterprise hard drives cost roughly $15 to $30 per terabyte, and SSDs are significantly more.

PETE:
So a 3x to 5x cost advantage.

TOM:
Yes — and that’s just acquisition cost. Tape consumes zero power when idle. Hard drives spin continuously and require cooling. For petabyte-scale archives, energy savings can exceed 90% compared to disk-based storage.
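[Editor's note: the cost math Tom quotes can be made concrete with a quick back-of-the-envelope calculation. The archive size and per-terabyte midpoints below are illustrative assumptions drawn from the ranges above, not vendor quotes.]

```python
# Back-of-the-envelope media cost comparison for a 10 PB AI archive,
# using midpoints of the per-terabyte ranges quoted above.
ARCHIVE_TB = 10_000  # 10 petabytes, expressed in terabytes

tape_per_tb = 8.0    # midpoint of the $6-$10/TB tape range
hdd_per_tb = 22.5    # midpoint of the $15-$30/TB enterprise HDD range

tape_cost = ARCHIVE_TB * tape_per_tb
hdd_cost = ARCHIVE_TB * hdd_per_tb

print(f"Tape media: ${tape_cost:,.0f}")              # $80,000
print(f"HDD media:  ${hdd_cost:,.0f}")               # $225,000
print(f"Disk premium: {hdd_cost / tape_cost:.1f}x")  # 2.8x
```

And that is acquisition cost only; because idle tape draws no power, the operating-cost gap widens further at petabyte scale.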

PETE:
And longevity?

TOM:
Tape has a shelf life of 30 years or more if stored properly. Hard drives typically last 3 to 5 years in operation. For long-term AI archives, tape is extremely reliable.

PETE:
What about the perception that tape is slow?

TOM:
That’s why we use a tiered storage model. You’re not accessing archival data frequently. If retrieval takes a few hours for an audit or retraining process, that’s acceptable. The key is not paying premium prices for rarely accessed data.

Segment 4 – Real-World Architecture

TOM:
Think of AI storage as a three-tier pyramid:

Hot Tier: NVMe SSDs and high-bandwidth memory for active training.
Warm Tier: High-capacity hard drives for recent datasets and fine-tuning.
Cold Tier: Tape for long-term archives, compliance, and completed training runs.

Modern software automates migration between tiers. After 30 to 60 days of inactivity, data can automatically move to tape.
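[Editor's note: the age-based migration Tom describes can be sketched as a simple policy function. The tier names and exact thresholds here are illustrative assumptions, not any specific product's API.]

```python
from datetime import datetime, timedelta

# Idle-time thresholds, following the 30-to-60-day window mentioned above.
HOT_TO_WARM = timedelta(days=30)   # NVMe -> HDD after 30 idle days
WARM_TO_COLD = timedelta(days=60)  # HDD -> tape after 60 idle days

def target_tier(last_access: datetime, now: datetime) -> str:
    """Return the tier a dataset belongs in, based on time since last access."""
    idle = now - last_access
    if idle >= WARM_TO_COLD:
        return "cold-tape"
    if idle >= HOT_TO_WARM:
        return "warm-hdd"
    return "hot-nvme"

now = datetime(2025, 1, 1)
print(target_tier(datetime(2024, 12, 20), now))  # hot-nvme  (12 idle days)
print(target_tier(datetime(2024, 11, 20), now))  # warm-hdd  (42 idle days)
print(target_tier(datetime(2024, 10, 1), now))   # cold-tape (92 idle days)
```

In practice, a tiering engine runs a policy like this on a schedule and moves data between tiers transparently, so applications see one namespace.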

In 2024, tape shipments reached a record 176.5 exabytes of compressed capacity — up 15.4% year-over-year. Much of this growth is driven by hyperscalers and AI companies.

Tape isn’t obsolete. It has evolved into the foundation of long-term data preservation.

Segment 5 – Challenges and Future Outlook

TOM:
There are challenges:

  • Cultural bias against tape among younger IT professionals
  • Upfront infrastructure investment
  • Restore time considerations

However, innovation continues. LTO-10 is shipping, and the roadmap extends through LTO-14 with higher capacities.

We’re also seeing better integration with cloud-native workflows. Some cloud providers use tape behind the scenes for deep archive tiers.

Looking ahead, AI models will grow larger, regulatory scrutiny will increase, and data retention needs will expand. All of these trends favor low-cost archival storage like tape.

We’re also seeing the rise of “data vaults” — immutable archives for compliance and audit. Tape’s write-once capability makes it ideal for this use case.

Segment 6 – Closing Thoughts

TOM:
Don’t dismiss tape as obsolete. In the age of AI, where data volumes are exploding and costs matter, tape is more relevant than ever. It’s not about replacing fast storage — it’s about building a smart, tiered system.

PETE:
For organizations starting this journey?

TOM:
Audit your data. Understand access patterns and retention requirements. Then design an automated tiered architecture that balances performance, cost, and sustainability.
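[Editor's note: the audit step Tom recommends can be sketched as a simple scan that buckets files by time since last access, to show how much data is actually cold. The bucket boundaries are illustrative assumptions, and file access times may be unreliable on filesystems mounted with `noatime`.]

```python
import os
import time
from collections import Counter

def audit_access(root: str) -> Counter:
    """Walk a directory tree and sum file sizes into idle-age buckets."""
    buckets: Counter = Counter()
    now = time.time()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or unreadable; skip it
            idle_days = (now - st.st_atime) / 86_400
            if idle_days >= 60:
                buckets["cold (60+ days)"] += st.st_size
            elif idle_days >= 30:
                buckets["warm (30-60 days)"] += st.st_size
            else:
                buckets["hot (<30 days)"] += st.st_size
    return buckets
```

A report like this, run against training-data volumes, gives the access-pattern evidence needed to size each tier before committing to hardware.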

Host Outro

Let’s recap the key takeaways:

  1. AI training generates massive data volumes — 10 to 20 petabytes or more per model when including preprocessing and checkpoints.
  2. Tape storage provides a 3x to 5x cost advantage over disk and near-zero idle energy consumption.
  3. 2024 saw record tape shipments of 176.5 exabytes of compressed capacity — up 15.4% year-over-year.
  4. Tiered storage is essential: NVMe for hot data, HDDs for warm data, tape for cold archives.
  5. Tape’s write-once nature supports compliance and immutable AI data vaults.

Tape isn’t legacy technology. It’s the economic and sustainable foundation for long-term AI data preservation.

Thank you to Tom Coughlin from Coughlin Associates for joining us.

Subscribe wherever you get your podcasts, and join us next time as we continue exploring the business case for LTO tape storage in the age of AI.
