Methodology

The Hybrid Comp Pipeline Explained

How Cardboard Assets sources per-grade comp prices: hand-verified anchors, sales inferred from eBay disappearance detection, and the trust hierarchy that ranks them. Also covers the daily cron, confidence scoring, and the path to Marketplace Insights data.

Every grading-economics verdict on the platform depends on three comp prices per card per grader, one for each grade tier: top grade, grade 9, and below 9. These three numbers (per grader) determine whether the verdict is GRADE, WATCH, or SKIP. This page explains where they come from and how confident you should be in them.

Pipeline overview

The pipeline runs nightly via cron at 03:00 ET. Four stages in sequence:

  1. Sync: pull every active eBay listing for in-scope cards via the eBay Browse API. Write to ebay_active_listings (current snapshot) and ebay_listing_history (per-listing lifecycle).
  2. Infer: for every listing in history that is no longer active (last_seen_at more than 24 hours ago), classify it as inferred_sold (with a confidence score) or inferred_other. High-confidence inferred sales aggregate into prices_graded with source inferred_sold_hybrid.
  3. Export: re-emit all site JSONs with the fresh comp data. Astro reads these at build time.
  4. Build: npm run build regenerates the static dist. A Charizard regression guard runs at the end: if Charizard 1st Ed's best_net_profit drifts from $18,260.80, the cron halts with exit code 12.
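The regression guard in stage 4 can be pictured as a simple comparison against the anchor value. The following is a minimal sketch, not the platform's actual guard: the function name, the half-cent tolerance, and the shape of the `card` dictionary are all assumptions made for illustration. Only the expected value and exit code 12 come from the text.

```python
import math

EXPECTED_BEST_NET_PROFIT = 18260.80  # anchor value quoted in the text
EXIT_REGRESSION_GUARD = 12           # exit code quoted in the text


def guard_exit_code(best_net_profit: float, tolerance: float = 0.005) -> int:
    """Return 0 if the anchor value is intact, 12 if it drifted.

    `tolerance` (half a cent) is an assumption to absorb float rounding.
    """
    if math.isclose(best_net_profit, EXPECTED_BEST_NET_PROFIT,
                    rel_tol=0.0, abs_tol=tolerance):
        return 0
    return EXIT_REGRESSION_GUARD


# Hypothetical export record; the real JSON layout is not shown in this page.
card = {"name": "Charizard 1st Ed", "best_net_profit": 18260.80}
print(guard_exit_code(card["best_net_profit"]))
```

A wrapper script would pass the returned code to its own exit call, so that any drift in the anchor stops the run before a bad build is published.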

Trust hierarchy

Comp prices come from one of five sources, in trust order:

Source | What it means | Trust
manual_verified | Hand-curated against external auctions, PSA Pop Report, verified sales | Highest
ebay_sold | eBay Marketplace Insights API (Path A, gated on application approval) | Highest (when active)
inferred_sold_hybrid | eBay disappearance-detected with confidence 0.4+ | Medium-High
psa_pop | PSA Pop Report cumulative comp (when available) | Medium
manual_seed | Original platform seed values from initial bootstrap | BETA (labeled "verifying")

The aggregation step always prefers higher-trust sources. A manual_verified row is never overwritten by automated sources: Charizard 1st Ed PSA 10 stays at the hand-curated $850,000 forever, regardless of what eBay disappearance detection turns up.
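One way to express the "prefer higher trust" rule is a rank lookup consulted before every write. This is a sketch under assumptions: the in-memory `prices` dictionary, the `upsert_comp` name, and the row shape are hypothetical stand-ins for however prices_graded is actually stored. The source names and their order come from the trust hierarchy above.

```python
# Trust order from the hierarchy above, highest first.
TRUST_ORDER = [
    "manual_verified",
    "ebay_sold",
    "inferred_sold_hybrid",
    "psa_pop",
    "manual_seed",
]
TRUST_RANK = {source: rank for rank, source in enumerate(TRUST_ORDER)}


def upsert_comp(prices: dict, key: tuple, source: str, price: float) -> bool:
    """Write a comp unless a higher-trust row already holds the key.

    `prices` maps (card, grader, grade) -> {"source": ..., "price": ...}.
    Returns True if the row was written, False if it was skipped.
    """
    existing = prices.get(key)
    if existing is not None and TRUST_RANK[existing["source"]] < TRUST_RANK[source]:
        return False  # existing row is more trusted; leave it alone
    prices[key] = {"source": source, "price": price}
    return True


prices = {}
key = ("Charizard 1st Ed", "PSA", "10")
upsert_comp(prices, key, "manual_verified", 850000.0)
upsert_comp(prices, key, "inferred_sold_hybrid", 500000.0)  # skipped
print(prices[key])
```

Because manual_verified holds the top rank, no automated source can displace it, while a lower-trust row such as manual_seed is replaced as soon as better data arrives.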

Disappearance-detection model

The hybrid pipeline's core trick: watch active listings until they disappear, then classify why.
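The "watch until they disappear" step reduces to tracking when each listing was last seen and flagging those that have been absent for more than 24 hours. A minimal sketch, assuming an in-memory dictionary in place of ebay_listing_history; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DISAPPEARED_AFTER = timedelta(hours=24)  # threshold quoted in the text


def record_sync(history: dict, active_ids: list, now: datetime) -> None:
    """Stamp every listing seen in this sync with the current time."""
    for listing_id in active_ids:
        history[listing_id] = now


def find_disappeared(history: dict, now: datetime) -> list:
    """Return IDs of listings last seen more than 24 hours ago.

    `history` maps listing_id -> last_seen_at (timezone-aware datetime).
    """
    return sorted(
        listing_id
        for listing_id, last_seen_at in history.items()
        if now - last_seen_at > DISAPPEARED_AFTER
    )


history = {}
day1 = datetime(2026, 5, 7, 7, 0, tzinfo=timezone.utc)
record_sync(history, ["a", "b"], day1)
record_sync(history, ["a"], day1 + timedelta(days=1))   # "b" is missing
print(find_disappeared(history, day1 + timedelta(days=2)))  # ['b']
```

The flagged IDs are only candidates: at this point nothing says whether "b" sold, expired, or was pulled.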

What disappearance can mean

A listing leaving active eBay search has four possible explanations:

  1. It sold. What we want.
  2. It was canceled. Seller pulled the listing.
  3. It expired. 30-day listing ended without sale.
  4. It was relisted at a different price. Same item, new listing ID.

The pipeline can't ask eBay which happened (the Browse API doesn't expose sale state). Instead it uses signal heuristics:

Confidence scoring

Each disappeared listing gets a confidence score (0-1) for "this was a sale":

Confidence band | Signal pattern | Classification
High (0.7-1.0) | Observed ≥ 7 days, price stable or declining, ≥ 5 observations, seller feedback > 100 | inferred_sold (aggregates into prices_graded)
Medium (0.4 to < 0.7) | 3-7 days observed, stable price, some seller credibility | inferred_sold (lower-weight aggregation)
Low (< 0.4) | < 3 days observed, price moved up before disappearance, zero-feedback seller, single observation | inferred_other (no aggregation)

The pipeline over-flags inferred_other rather than risk false-positive sales. Better fewer comps than wrong comps.
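The band rules can be read as a small decision procedure. The sketch below is one interpretation, not the pipeline's scoring code: it assigns a band rather than a numeric score, and it assumes that any single low-confidence signal is enough to classify a listing as inferred_other, which matches the stated bias toward over-flagging. The `ListingSignals` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ListingSignals:
    days_observed: float
    observations: int
    price_rose: bool       # price moved up before disappearance
    seller_feedback: int


def classify(s: ListingSignals) -> tuple:
    """Return (band, classification) following the signal patterns above."""
    # Assumption: any one low-confidence signal is disqualifying.
    if (s.days_observed < 3 or s.price_rose
            or s.seller_feedback == 0 or s.observations <= 1):
        return ("low", "inferred_other")
    # High requires every listed condition to hold.
    if (s.days_observed >= 7 and s.observations >= 5
            and s.seller_feedback > 100):
        return ("high", "inferred_sold")
    return ("medium", "inferred_sold")


print(classify(ListingSignals(10, 8, False, 250)))  # ('high', 'inferred_sold')
print(classify(ListingSignals(4, 3, False, 20)))    # ('medium', 'inferred_sold')
print(classify(ListingSignals(1, 1, False, 500)))   # ('low', 'inferred_other')
```

Checking the low-confidence signals first means a long-observed listing whose price jumped just before it vanished is still excluded, which is the conservative choice.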

Freshness chip

Every card surfaces its comp freshness on the optimizer page:

  • Verified · Manually checked: manual_verified source. Hand-confirmed, trust the number.
  • Fresh · < 30 days: high-confidence inferred sales within the last 30 days.
  • Aging · 30-90 days: inferred sales 30-90 days old.
  • Insufficient · pipeline accumulating: fewer than 3 qualifying sales; verdict deferred (the page shows an empty-state hero).

The chip's color and copy are the single most important UI signal on the optimizer. Don't trust a SKIP verdict on an Insufficient card; the data just isn't there yet.
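The chip selection can be sketched as a function of the comp source and the ages of the inferred sales behind it. Two details here are assumptions, since the page does not specify them: sales older than 90 days are treated as non-qualifying, and the Fresh/Aging split is decided by the newest qualifying sale. The function name is hypothetical.

```python
def freshness_chip(source: str, sale_ages_days: list) -> str:
    """Pick the freshness chip label for a card's comp data."""
    if source == "manual_verified":
        return "Verified"
    # Assumption: sales older than 90 days no longer qualify.
    qualifying = [age for age in sale_ages_days if age <= 90]
    if len(qualifying) < 3:
        return "Insufficient"
    # Assumption: the newest qualifying sale decides Fresh vs. Aging.
    return "Fresh" if min(qualifying) < 30 else "Aging"


print(freshness_chip("manual_verified", []))                # Verified
print(freshness_chip("inferred_sold_hybrid", [5, 40, 60]))  # Fresh
print(freshness_chip("inferred_sold_hybrid", [35, 40, 60])) # Aging
print(freshness_chip("inferred_sold_hybrid", [5, 10]))      # Insufficient
```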

Cron schedule + monitoring

Daily at 03:00 ET (08:00 UTC during EST, 07:00 UTC during EDT). The wrapper at factory/scripts/cron_comp_pipeline.sh coordinates all four stages. Status output writes to factory/data/comp_pipeline_last_run.json.

Exit codes:

  • 0: clean run
  • 2: eBay credentials missing or 401
  • 3: 5+ consecutive 5xx responses
  • 4: title-match heuristic broke (0 listings accepted across samples)
  • 5: 429 rate limit
  • 6: > 5% of cards returned zero results across all tiers
  • 7: over-confident inference (single-sale confidence ≥ 0.99)
  • 10: export failed
  • 11: build failed
  • 12: Charizard regression guard tripped
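For monitoring scripts, the exit codes map naturally onto an enumeration. This is an illustrative sketch: the member names and the JSON field names (`exit_code`, `reason`, `ok`) are hypothetical, and the actual layout of comp_pipeline_last_run.json is not documented on this page. Only the numeric codes and their meanings come from the list above.

```python
import json
from enum import IntEnum


class ExitCode(IntEnum):
    CLEAN = 0
    CREDENTIALS = 2              # eBay credentials missing or 401
    SERVER_ERRORS = 3            # 5+ consecutive 5xx responses
    TITLE_MATCH_BROKEN = 4       # 0 listings accepted across samples
    RATE_LIMITED = 5             # 429
    ZERO_RESULTS = 6             # > 5% of cards returned zero results
    OVERCONFIDENT_INFERENCE = 7  # single-sale confidence >= 0.99
    EXPORT_FAILED = 10
    BUILD_FAILED = 11
    REGRESSION_GUARD = 12


def status_payload(code: int) -> str:
    """Serialize a run status; the field names are hypothetical."""
    exit_code = ExitCode(code)  # raises ValueError for unknown codes
    return json.dumps({
        "exit_code": int(exit_code),
        "reason": exit_code.name.lower(),
        "ok": exit_code is ExitCode.CLEAN,
    })


print(status_payload(12))
```

Rejecting unknown codes with a `ValueError` keeps an unexpected wrapper failure from being silently recorded as a known condition.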

Path A: eBay Marketplace Insights

The hybrid pipeline is a workaround. eBay's Marketplace Insights API returns real sold-listing data. It's a Limited Release that requires per-application approval. The platform's path: run Path B (Browse API + disappearance detection) for 30+ days with clean usage history, then apply for Marketplace Insights.

When Path A lands, comp prices upgrade from inferred_sold_hybrid to ebay_sold: real sold data, not inferred. The aggregation logic is already coded; it just needs the API access. See factory/data/ebay_browse_integration_spec_2026-05-07.md for the full plan.

Frequently asked questions

Where do the per-grade comp prices come from?
Primarily from three sources, in trust order: (1) manual_verified: hand-curated against external references like SCI auction comps and PSA Pop Report (highest trust, ~16 cards today including Charizard 1st Ed); (2) inferred_sold_hybrid: eBay disappearance-detected sales with confidence scoring (grows nightly); (3) manual_seed: original platform seed values, labeled BETA "verifying". Every card surfaces its comp source in the freshness chip.
How does eBay disappearance detection work?
The nightly cron writes every active eBay listing for in-scope cards to ebay_listing_history. Listings that don't appear in the next sync (24+ hours later) flip to disappeared status. The inference pass classifies each disappeared listing as inferred_sold (with confidence 0-1) or inferred_other based on: days observed, price stability, observation count, and seller credibility. High-confidence inferred sales aggregate into prices_graded.
Why not just scrape eBay sold listings directly?
TCGplayer is eBay-owned. Scraping eBay's public sold-listings page would jeopardize eBay Partner Network eligibility (a real revenue line) and could violate eBay's terms of service. The disappearance-detection approach uses only the official eBay Browse API (active listings only); the inference does the rest. The real sold-data fix is eBay's Marketplace Insights API, which the platform plans to apply for once Browse usage has 30+ days of clean history.
How confident is the inferred-sold classification?
Each inferred sale carries a confidence score from 0 to 1. High confidence (0.7+) requires: observed ≥ 7 days, stable or declining price, ≥ 5 observations, seller feedback > 100. Medium (0.4-0.7) is 3-7 days observed with some seller credibility. Low (< 0.4) is classified as inferred_other (cancellation, expiration, relist) instead of inferred_sold. The pipeline over-flags 'other' rather than over-claiming 'sold.'
What happens to a manual_verified row when the pipeline writes inferred data?
Nothing. The aggregation step always checks for existing manual_verified rows on the (card, grader, grade) tuple and skips writing inferred data if any exist. Charizard 1st Ed PSA 10 ($850K) is permanently anchored at the manual_verified value; automated sources can't overwrite it.