Methodology

How TrendArc decides — every rule, window, and threshold

Every insight TrendArc shows is produced by a deterministic rule running in your browser. No machine-learning black box, no server, no "our AI thinks." That means every claim can be written down and checked — so here it is, all of it.

Trends

Each metric's trend compares your most recent window (up to 90 days) against the window of equal length immediately before it. A metric must have data on at least 60% of days to be scored at all, and the change must exceed a per-metric threshold before we call it a trend — 5% for HRV and sleep duration, 3% for heart-rate metrics, 8% for sleep stages, 10% for steps (ring step counts are noisy). Below the threshold we say it's steady, because it is.
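Here is a minimal sketch of the trend rule. It is written in Python for readability; TrendArc itself runs in the browser, so this is not its code. The metric keys (`hrv`, `resting_hr`, `deep_sleep`, ...) and the function name are hypothetical. The sketch also makes two assumptions the page does not spell out: the 60% coverage rule applies to each window separately, and the window shrinks to half the history when fewer than 180 days exist.

```python
from statistics import mean

# Thresholds from the text above; the metric keys are hypothetical names.
THRESHOLDS = {
    "hrv": 0.05,
    "sleep_duration": 0.05,
    "resting_hr": 0.03,   # heart-rate metrics
    "deep_sleep": 0.08,   # sleep stages
    "steps": 0.10,        # noisy ring step counts
}
MIN_COVERAGE = 0.60


def trend_verdict(daily, metric, max_window=90):
    """daily: one value per calendar day, oldest first, None where missing.

    Returns "up", "down", "steady", or None when there isn't enough data.
    """
    window = min(max_window, len(daily) // 2)  # assumption: shrink for short histories
    if window == 0:
        return None
    recent = daily[-window:]
    prior = daily[-2 * window:-window]
    # Assumption: the 60% coverage rule is applied to each window separately.
    for w in (recent, prior):
        if sum(v is not None for v in w) / window < MIN_COVERAGE:
            return None
    recent_mean = mean(v for v in recent if v is not None)
    prior_mean = mean(v for v in prior if v is not None)
    change = (recent_mean - prior_mean) / prior_mean  # relative change
    if abs(change) <= THRESHOLDS[metric]:
        return "steady"  # must *exceed* the threshold to count as a trend
    return "up" if change > 0 else "down"
```

For example, a 2% HRV rise comes back `"steady"`, while a 2% resting-heart-rate change would not, because heart-rate metrics use the tighter 3% threshold only when the move is larger than 3%.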

Baseline bands

The shaded band on trend charts is your trailing 60-day mean ± one standard deviation — your personal normal range, not a population number. A day outside the band is unusual for you.
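A sketch of the band, assuming one value per day with `None` for gaps. It uses the sample standard deviation, since the page doesn't say whether the sample or population version is used. The window ends at the most recent day passed in, and the function names are hypothetical.

```python
from statistics import mean, stdev


def baseline_band(values, window=60):
    """Trailing mean ± one standard deviation over the last `window` days.

    Returns (low, high), or None if fewer than two days have data.
    """
    recent = [v for v in values[-window:] if v is not None]
    if len(recent) < 2:
        return None  # stdev needs at least two points
    m, s = mean(recent), stdev(recent)
    return (m - s, m + s)


def is_unusual(value, band):
    """True when a day falls outside your personal normal range."""
    low, high = band
    return value < low or value > high
```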

Trajectory ("if your last 30 days became your new normal")

The Whoop-style direction-of-travel verdict compares your last 30 days against your longer baseline, per metric, weighted so recovery metrics (HRV, resting heart rate) count more than steps. The composite percentage you see is the weighted mean of those per-metric changes, and it needs 90+ days of history before it fires.
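A sketch of the composite, using hypothetical weights. The page says recovery metrics count more than steps but doesn't publish the numbers. The sketch also makes two further assumptions: resting heart rate is sign-flipped (lower is better) so every term points the same way, and the baseline is all history before the last 30 days.

```python
from statistics import mean

# Hypothetical weights: recovery metrics outweigh steps, exact values unpublished.
WEIGHTS = {"hrv": 2.0, "resting_hr": 2.0, "sleep_duration": 1.0, "steps": 0.5}
# Assumption: for resting HR a drop is an improvement, so its change is flipped.
LOWER_IS_BETTER = {"resting_hr"}
MIN_HISTORY_DAYS = 90


def trajectory(history):
    """history: metric -> list of daily floats, oldest first, no gaps.

    Returns the weighted mean % change of the last 30 days vs. the earlier
    baseline, or None if no metric has 90+ days of history.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for metric, values in history.items():
        if metric not in WEIGHTS or len(values) < MIN_HISTORY_DAYS:
            continue
        recent = mean(values[-30:])
        baseline = mean(values[:-30])
        pct = 100.0 * (recent - baseline) / baseline
        if metric in LOWER_IS_BETTER:
            pct = -pct
        weighted_sum += pct * WEIGHTS[metric]
        weight_total += WEIGHTS[metric]
    if weight_total == 0:
        return None
    return weighted_sum / weight_total
```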

The Receipts

The cause-and-effect ledger is the most statistically defended thing in the app, because unverified correlations are how health apps lose trust. A behavior — a night you tagged ("drank", "magnesium"), or one we derive from the data itself (weekend mornings, your top-10% step days) — only earns a receipt if it survives all six gates:

Gate	Rule	Why
Minimum evidence	≥ 8 behavior days with data	A pattern, not an anecdote
Local baseline	Each behavior day is compared against non-behavior days within ±45 days, using a linear fit	A training block or bad quarter can't masquerade as a behavior effect
Both sides	≥ 4 baseline days before AND after each behavior day	A habit you adopted mid-trend is statistically indistinguishable from the trend — so we refuse to score it
Day-type matching	Weekend behavior days are only compared to weekend baselines (and weekdays to weekdays)	Friday-night behaviors don't get credit for Saturday sleep-ins
Effect size	|Cohen's d| ≥ 0.3 on the paired differences	Statistically real but tiny effects aren't worth your attention
Significance	Paired |t| = |d|·√n ≥ 4 (roughly p < 0.005), strict enough to survive the ~10 comparisons each behavior gets	In simulation, effect-size-only gating produced a false receipt on 90% of pure-noise tags; with this gate it's under 1%
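The six gates can be sketched as one scoring function. This is an illustration, not TrendArc's implementation, and it rests on three assumptions: the input covers consecutive calendar days, the expected value comes from an ordinary least-squares line through the matched baseline days, and a behavior day lacking four baseline days on either side is dropped (rather than the whole tag being refused). Cohen's d here is the mean of the paired differences divided by their standard deviation, so t = d·√n exactly as in the table. The function and constant names are hypothetical.

```python
import math
from datetime import date
from statistics import mean, stdev

WINDOW = 45    # ±45 days of local baseline
MIN_SIDE = 4   # baseline days required before AND after
MIN_N = 8      # behavior days with data
MIN_D = 0.3    # |Cohen's d|
MIN_T = 4.0    # |t| = |d|·√n


def predict_linear(xs, ys, x0):
    """Ordinary least-squares line through (xs, ys), evaluated at x0."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my + slope * (x0 - mx)


def score_receipt(days: list[date], values: list, tagged: set):
    """days: consecutive calendar days; values: float or None per day;
    tagged: indices of behavior days. Returns (n, d, t) or None."""
    weekend = [day.weekday() >= 5 for day in days]
    diffs = []
    for i in sorted(tagged):
        if values[i] is None:
            continue
        lo, hi = max(0, i - WINDOW), min(len(days), i + WINDOW + 1)
        base = [j for j in range(lo, hi)
                if j not in tagged and values[j] is not None
                and weekend[j] == weekend[i]]          # day-type matching
        before = sum(1 for j in base if j < i)
        if before < MIN_SIDE or len(base) - before < MIN_SIDE:
            continue  # "both sides" gate (assumption: drop this day only)
        expected = predict_linear(base, [values[j] for j in base], i)
        diffs.append(values[i] - expected)              # paired difference
    n = len(diffs)
    if n < MIN_N:
        return None                                     # minimum evidence
    sd = stdev(diffs)
    if sd == 0:
        return None
    d = mean(diffs) / sd
    t = d * math.sqrt(n)
    if abs(d) < MIN_D or abs(t) < MIN_T:
        return None                                     # effect size, significance
    return n, d, t
```

One consequence of combining the two last gates: at the minimum n of 8, passing |t| ≥ 4 already requires |d| ≥ 4/√8 ≈ 1.41, so the d ≥ 0.3 floor only becomes the binding gate for behaviors with well over a hundred days of evidence.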

Every receipt shows its evidence (n and signal strength) right on the line. "Strong signal" means |d| ≥ 0.8. We publish our false-positive rate because we measure it: the test suite includes tags with deliberately zero effect, and the engine must stay silent on them to ship.
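The zero-effect check can be mimicked with a small simulation. This sketch feeds pure Gaussian noise through the effect-size and significance gates and counts a tag as a false receipt if any of its comparisons passes. The tag count, the 10 comparisons per tag, and the 20 behavior days per comparison are hypothetical choices. Its rates therefore illustrate the gap between the two gating schemes rather than reproduce the figures quoted above.

```python
import math
import random


def gates(diffs, min_d=0.3, min_t=4.0):
    """Return (passes effect-size gate, passes significance gate)."""
    n = len(diffs)
    m = sum(diffs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in diffs) / (n - 1))
    d = m / sd
    return abs(d) >= min_d, abs(d) * math.sqrt(n) >= min_t


def false_positive_rates(tags=1000, comparisons=10, n=20, seed=1):
    """Fraction of zero-effect tags that earn a receipt under each scheme."""
    rng = random.Random(seed)
    d_only_hits = gated_hits = 0
    for _ in range(tags):
        results = [gates([rng.gauss(0.0, 1.0) for _ in range(n)])
                   for _ in range(comparisons)]
        d_only_hits += any(effect for effect, _ in results)
        gated_hits += any(effect and sig for effect, sig in results)
    return d_only_hits / tags, gated_hits / tags
```

A receipt engine held to the same standard would be expected to keep the second number near zero on tags like these.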

A receipt is a measured association in your data, honestly gated — it is still not a controlled experiment, and it is not medical advice.

Sleep archetype

Your archetype is a rule-based label computed from three numbers in your history: bedtime consistency, weekday/weekend sleep gap, and sleep duration. The label is a name for numbers you can see; the three stats are printed on the same slide.
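Here is a sketch of how three printed stats can map to a label by rule. The cutoffs and label names are hypothetical, since the page doesn't list them. The gap is assumed to be weekend minus weekday mean sleep duration. Bedtimes are measured from noon, so 23:30 and 00:30 land an hour apart rather than 23 hours apart.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Night:
    bedtime_min: int   # minutes after midnight: 23:30 -> 1410, 00:45 -> 45
    duration_h: float
    weekend: bool      # weekend night (how the app defines this is assumed)


def sleep_stats(nights):
    """Return (bedtime spread in minutes, weekend-weekday gap in hours, mean hours)."""
    # Minutes after noon keeps late-evening and after-midnight bedtimes contiguous.
    bed = [(n.bedtime_min - 720) % 1440 for n in nights]
    wkend = [n.duration_h for n in nights if n.weekend]
    wkday = [n.duration_h for n in nights if not n.weekend]
    if not wkend or not wkday:
        raise ValueError("need both weekday and weekend nights")
    return pstdev(bed), mean(wkend) - mean(wkday), mean(n.duration_h for n in nights)


def archetype(consistency, gap, duration):
    # Hypothetical cutoffs and labels, for illustration only.
    if duration < 6.5:
        return "short sleeper"
    if gap > 1.0:
        return "weekend catch-up"
    if consistency < 30:
        return "metronome"
    return "flexible"
```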

Correlations

The correlation matrix uses Pearson's r between daily metric pairs and follows the same philosophy as the receipts: a relationship is shown only if it has 30+ overlapping days, |r| ≥ 0.3, and passes a significance test (p < 0.01), with its n printed on the line. Pairs that fail any gate aren't shown faintly; they aren't shown at all. Correlations remain exploratory; they never claim cause and effect the way receipts do.
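A sketch of the three correlation gates. Pearson's r is computed directly. For the significance gate, the sketch swaps in the Fisher z-transform approximation of the p-value. This is an assumption: an exact t-test on r gives slightly different p-values for small n. The function names and the dict-of-days input shape are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def gated_correlation(a, b, min_n=30, min_r=0.3, alpha=0.01):
    """a, b: dict mapping day -> value for two metrics.

    Returns (r, n) when every gate passes, otherwise None (not shown at all).
    """
    days = sorted(a.keys() & b.keys())  # overlapping days only
    n = len(days)
    if n < min_n:
        return None
    r = pearson_r([a[d] for d in days], [b[d] for d in days])
    if r is None or abs(r) < min_r:
        return None
    # Fisher z approximation; clamp so atanh stays finite at |r| = 1.
    r_c = max(-0.999999, min(0.999999, r))
    z = math.atanh(r_c) * math.sqrt(n - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    if p >= alpha:
        return None
    return r, n
```

At exactly 30 days the p < 0.01 gate is the stricter one: |r| = 0.3 alone would not reach significance, so a pair that short needs a noticeably stronger r to appear.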

Anomalies, streaks, superlatives

Anomalies are days more than two standard deviations from your trailing mean. Streaks count consecutive days past fixed thresholds (8k steps, 7h sleep, HRV ≥ 40ms). Superlatives are simple maxima/minima over your whole history, joined to the day that preceded them — that's how "your best sleep, the morning after a 14k-step day" gets written.
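Minimal sketches of the three rules follow. They make three assumptions the page doesn't settle:

- The 30-day trailing window for anomalies is a hypothetical choice; no window length is given.
- Streaks count days at or above the threshold and report the longest run.
- The superlative helper (a hypothetical name) pairs the best day with the previous day's value of another metric.

```python
from statistics import mean, stdev


def anomalies(values, window=30):
    """Indices of days more than 2 SD from the mean of the prior `window` days."""
    out = []
    for i in range(window, len(values)):
        prior = values[i - window:i]
        m, s = mean(prior), stdev(prior)
        if s > 0 and abs(values[i] - m) > 2 * s:
            out.append(i)
    return out


def longest_streak(values, threshold):
    """Longest run of consecutive days at or above `threshold`."""
    best = run = 0
    for v in values:
        run = run + 1 if v >= threshold else 0
        best = max(best, run)
    return best


def best_day_with_prior(values, prior_metric):
    """Index of the maximum of `values`, joined to the preceding day's other metric."""
    i = max(range(len(values)), key=values.__getitem__)
    return i, (prior_metric[i - 1] if i > 0 else None)
```

For example, `best_day_with_prior(sleep_hours, steps)` returning `(i, 14000)` is the raw material for "your best sleep, the morning after a 14k-step day."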

What we deliberately don't do

No population comparisons — every baseline is yours.
No claims from under 30 days of data; the Story's premium slides need 90+.
No correlations shown without gates, ever.
No data leaves your browser — the analysis you're reading about runs locally.

See every rule fire on real data

The demo dataset is 14 months of realistic ring data — trends, receipts, trajectory, and archetype, all live.
