Data Packages

Transform Data Chaos into Reusable Assets

Flexible, interoperable data packages that compound value over time

Give teams the power to define their own processes, use their own data types, and innovate as quickly as they want. Gain organization-wide data findability while building reliable, trustworthy, version-controlled data products.

What's Data Chaos?

Quilt Open Source

ccle-test-3/SRR8788981

@ 2f07364e26

METADATA

study: "CCLE"

cancer_type: "Lung Cancer"

organism: "Homo sapiens"

data_type: "WGS"

SRR8788981.runinfo_ftp.tsv 1.6 kB

SRX5578768_SRR8788981_1.fastq.gz 66.5 kB

SRX5578768_SRR8788981_2.fastq.gz 66.2 kB

Resilience

90%

faster data lookup

What Are Data Packages?

Data packages are intelligent manifests that combine pointers to data (like objects in Amazon S3) with rich context about that data, including lineage, metadata, and revision history.
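To make the idea of a manifest concrete, here is a minimal sketch in plain Python. It is an illustration only, not Quilt's actual manifest format: the `Entry` and `Manifest` classes, their field names, and the `s3://example-bucket` location are all hypothetical. The package name, metadata, and file names are taken from the example shown on this page.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Entry:
    """One file in the package: a name plus a pointer to where the bytes live."""
    logical_key: str   # file name as seen inside the package
    physical_key: str  # pointer to the object (hypothetical S3 URI)


@dataclass
class Manifest:
    """Hypothetical manifest: pointers to data plus context about that data."""
    name: str
    metadata: dict = field(default_factory=dict)
    entries: list = field(default_factory=list)

    def add(self, logical_key: str, physical_key: str) -> None:
        self.entries.append(Entry(logical_key, physical_key))

    def to_json(self) -> str:
        # asdict() recurses into the list of Entry dataclasses.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


manifest = Manifest(
    name="ccle-test-3/SRR8788981",
    metadata={
        "study": "CCLE",
        "cancer_type": "Lung Cancer",
        "organism": "Homo sapiens",
        "data_type": "WGS",
    },
)

for filename in (
    "SRR8788981.runinfo_ftp.tsv",
    "SRX5578768_SRR8788981_1.fastq.gz",
    "SRX5578768_SRR8788981_2.fastq.gz",
):
    # The bucket and prefix are placeholders, not a real location.
    manifest.add(filename, f"s3://example-bucket/raw/{filename}")

print(manifest.to_json())
```

The point of the sketch is that the manifest holds no file contents, only pointers and context, so the data can stay where it already lives.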

If you can't find the data, you can't reproduce the analysis.

Data without context is not reusable. Traditional file storage separates data from its meaning, making collaboration and discovery nearly impossible.

Intelligent Manifests

Combine pointers to data with rich context and lineage

Self-Contained

Everything needed to understand and reproduce the analysis

Versioned

Track every change with cryptographic integrity

Discoverable

Find and access data through metadata and search
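As a rough illustration of metadata-driven discovery, the sketch below filters an in-memory catalog by key/value metadata. The `find_packages` helper and the `catalog` dictionary are hypothetical and stand in for a real search index; the package names and metadata values come from the examples on this page.

```python
def find_packages(catalog: dict, **criteria) -> list:
    """Return names of packages whose metadata matches every given key/value."""
    return [
        name
        for name, metadata in catalog.items()
        if all(metadata.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical catalog: package name -> metadata recorded with the package.
catalog = {
    "ccle-test-3/SRR8788981": {
        "study": "CCLE",
        "organism": "Homo sapiens",
        "data_type": "WGS",
    },
    "liver-rnaseq/cohort-2024": {
        "assay": "RNA-seq",
        "organism": "H. sapiens",
        "workflow": "nf-core/rnaseq",
    },
}

print(find_packages(catalog, study="CCLE"))
print(find_packages(catalog, assay="RNA-seq", workflow="nf-core/rnaseq"))
```

Exact-match filtering like this only works when metadata values are consistent. The two example packages spell the organism differently ("Homo sapiens" and "H. sapiens"), which is the kind of gap that shared metadata conventions are meant to close.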

Built for AI

Packages are where AI in life sciences starts

A model is only as good as the data it can find and trust. A Quilt package bundles your data with its metadata and a versioned, cryptographic history. That gives models and agents the context they need, and lets you trace any result back to the exact data that produced it.

Packages you can verify

Data, metadata, and an immutable version history in one addressable unit. Every output traces back to an exact dataset, so it stays reproducible and audit-ready for regulated work.
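One way to picture an immutable, addressable version is to derive its identifier from the package contents themselves. The sketch below is an assumption about how such a scheme can work, not a description of Quilt's actual hashing: `version_id` is a hypothetical helper, and the per-file hashes are computed from placeholder bytes rather than real sequencing data.

```python
import hashlib
import json


def version_id(metadata: dict, entries: list) -> str:
    """Hash a canonical form of (metadata, entries) into one identifier.

    entries is a list of (logical_key, sha256_hex) pairs. Sorting the pairs
    and the JSON keys makes the result independent of insertion order.
    """
    canonical = json.dumps(
        {"metadata": metadata, "entries": sorted(entries)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder file contents stand in for real data.
entries = [
    ("SRX5578768_SRR8788981_1.fastq.gz", hashlib.sha256(b"placeholder reads 1").hexdigest()),
    ("SRX5578768_SRR8788981_2.fastq.gz", hashlib.sha256(b"placeholder reads 2").hexdigest()),
]
metadata = {"study": "CCLE", "data_type": "WGS"}

v1 = version_id(metadata, entries)
v1_again = version_id(metadata, list(reversed(entries)))
v2 = version_id({**metadata, "data_type": "RNA-seq"}, entries)

print(v1 == v1_again)  # same content, same identifier
print(v1 != v2)        # any change yields a new identifier
```

Because the identifier depends on every file hash and every metadata value, a result that records it can be traced back to one exact state of the data.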

Qurator: search in plain English

Ask for data the way you'd ask a colleague. Qurator searches your governed catalog and returns the right packages, scoped to what each person is allowed to see. No ontology expertise needed.

Bring your own model (MCP)

Connect Claude, ChatGPT, or any MCP-compatible agent to your Quilt data with per-user OAuth. Models can read, visualize, and build on your data, and they only ever see what that user can.

From instrument to model, your data stays in your AWS account: governed, versioned, and ready to use.

What data packages give you

Data packages pair familiar data management with cloud-native storage, so your data stays reliable, easy to find, and ready to reuse.

Data Provenance

Immutable hash-based version history — audit-ready for GxP and 21 CFR Part 11

Open Source

Rich Visualizations

Document previews, dashboards, and in-browser charts anchored to package versions

Platform

Powerful Search

Qurator natural-language search across metadata and file contents

Platform

Team Collaboration

Web catalog, role-scoped permissions, and shareable package URIs

Platform

Package catalog

Browse your data like objects in S3, with context

Every package is a self-contained unit: data, a README, rich metadata, and previews. Explore the tree, read the docs, and see exactly what's inside before you pull a byte.

File tree, README, and key/value metadata in one view
In-browser previews for images, tables, and notebooks
Backed by your own S3, so the data never moves

liver-rnaseq / cohort-2024

# Liver RNA-seq, Cohort 2024

Bulk RNA-seq across 1,284 hepatocyte samples. Aligned with STAR, quantified with Salmon, QC via MultiQC.

assay: RNA-seq

organism: H. sapiens

samples: 1,284

size: 42.7 GB

workflow: nf-core/rnaseq

qc: passed

Explore the platform

Built for Your Infrastructure

Data packages work with your existing cloud infrastructure and analysis platforms, giving your data a vendor-neutral foundation.

Amazon Web Services

Native S3 integration with advanced AWS technology partnership

Integration

The Data Package Lifecycle

From creation to collaboration

Data packages follow a simple workflow. Start with the free Python SDK for basic packaging, or use the full platform for team collaboration.

Create

Bundle data with metadata (SDK) or use the web interface (Platform)

Version

Track changes with SHA-256 checksums

Share

Collaborate across teams and platforms

Discover

Access via SDK commands or rich web search (Platform)
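The Version step above relies on SHA-256 checksums. As a minimal sketch using only the Python standard library (not the SDK itself), the code below streams a file through SHA-256 and shows that editing the file changes its checksum. The `sha256_of_file` helper, the chunk size, and the sample file contents are illustrative assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1024 * 1024) -> str:
    """Stream a file through SHA-256 so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "samples.tsv"

    # First revision of a small, made-up sample sheet.
    sample.write_bytes(b"run\tSRR8788981\n")
    before = sha256_of_file(sample)

    # Second revision: one extra row with a made-up run ID.
    sample.write_bytes(b"run\tSRR8788981\nrun\tSRR0000000\n")
    after = sha256_of_file(sample)

changed = before != after
print(changed)
```

Comparing the stored checksum with a freshly computed one is enough to detect that a file differs from the recorded version, without comparing the files byte by byte.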

Real-World Applications

See how teams across biotech and life sciences use data packages to accelerate discovery and ensure reproducibility.

Genomics Research

Package sequencing data with sample metadata for reproducible analysis pipelines

Key Benefit: Reproducible Analysis

Example: FASTQ, VCF, BAM files + sample metadata

Outcomes teams see with Quilt

90%

faster data lookup

Resilience

3×

NGS analysis throughput

Tessera

Weeks → minutes

from instrument to AI-ready package

30+

biotech & pharma teams

incl. Allen Institute, Inari

Trusted by life-sciences organizations

Allen Institute
Inari
Flagship Pioneering
Cellarity
Resilience
Tessera

Data lookups that used to take our scientists days now take minutes, with a single, governed source of truth the whole team can trust.

90% faster data lookup Data Platform team, Resilience

The Data Revolution

From Expendable Resource to Reusable Asset

AI and machine learning make it possible to answer far broader questions than lab data was originally collected to address. To stay competitive in biotech, reusing data beyond its original scope is no longer a nice-to-have. It is an imperative.

Start Building Data Assets

Beyond the lab

Extend the rigor of your data beyond instruments, spreadsheets, and scattered hard drives.

Reusable assets

Versioned packages turn one-off datasets into durable, shareable assets your team can trust.

AI-ready by default

Governed, contextual data that your models and agents can actually use.
