Upgrade and Downgrade

What operators need to know to move a running T4 deployment from one v1.x release to another. For the underlying compatibility rules see the v1 Compatibility Contract.

What v1.x guarantees

A newer v1.x binary can open a data directory and an object-store prefix written by any older v1.x binary.
Newer v1.x can also add fields, flags, default values, exported types, and metrics — but never remove or change the meaning of existing ones without a major version bump.
Default values listed in the defaults table do not change across v1.x unless explicitly documented as tuning guidance.

What fails closed

These are the failure modes the binary refuses rather than silently corrupting state. They protect against an unsupervised downgrade onto data the older binary cannot interpret. None of them is recoverable without action; the binary exits with a clear error and the cluster’s other nodes (if any) keep serving.

Trigger	Behaviour
Checkpoint `format_version` newer than this binary understands	Startup refuses to restore. Operator must roll the binary forward.
WAL frame magic / version newer than this binary understands	Startup refuses to replay. Same remedy.
`manifest/latest` references a checkpoint missing referenced SSTs	Startup error. Recoverable from an earlier checkpoint via `t4 restore`.
Branch registry entry points at a checkpoint older than the safe `format_version`	GC refuses to reclaim. Inspect with `t4 inspect`.

Rolling upgrade (cluster)

For multi-node clusters, upgrade one node at a time. The leader stays available throughout; the follower being upgraded re-syncs from S3 on restart.

Pick the oldest node that is currently a follower (avoid the leader on the first pass).
Stop the t4 process. Followers resync automatically on restart, so the data directory does not need to be cleared.
Install the new binary and restart with the same flags as before.
Wait for the follower to catch up — t4 status shows current_revision >= leader_current_revision - tolerable_lag and the leader’s metrics show the follower reconnected.
Repeat for the remaining followers.
Restart the leader last. A follower wins the takeover election (~6 s; see Consistency). Promote it gracefully by signalling the leader with SIGTERM; the new leader is the highest-CommittedRev follower.

The cluster keeps accepting writes throughout, modulo the ~6 s window when the leader is being restarted.

Single-node upgrade

Stop the t4 process.
Install the new binary.
Start with the same flags. T4 replays any unsealed WAL entries from the local data directory; if the local directory is gone (ephemeral storage), it restores the latest checkpoint from S3.

Downgrade

A downgrade to an older v1.x release is best-effort, not guaranteed. The newer binary may have written checkpoint or WAL formats the older binary does not understand.

Check these before downgrading:

Has the cluster written a new checkpoint since the upgrade?
Terminal window
```
t4 inspect meta --data-dir /var/lib/t4
```
If the most recent checkpoint was produced by the newer binary, the older binary may refuse to start (see fail-closed table above).
Are any branch registry entries (branches/<id> in S3) newer than the older binary’s safe format_version? List them:
Terminal window
```
aws s3 ls s3://your-bucket/your-prefix/branches/
```
Active branches pin their ancestor checkpoint format, so downgrading past that version is unsafe even if the latest checkpoint is older.
Has any newly-added t4.Config field been used at runtime? See git diff of config.go between the two versions. Embedders relying on a newer field will get a compile error against the older library; the standalone binary silently ignores unknown CLI flags from a newer env-var set, which is a recipe for silent misconfiguration.

If all three answers are “no”, the downgrade is safe. If any answer is “yes”, roll forward to a release without the regression instead — moving forward with a fix is always safer than moving backward.

What to do when fail-closed fires

The binary exits with an error referencing the schema version it rejected. Steps:

Note the version it refused and the corresponding compatibility contract section.
Do not delete the offending object — it is needed by the newer binary you must roll back to.
Roll the binary forward to a version that supports that schema. The data directory and S3 prefix are unchanged by the failed start.