The Hybrid Comp Pipeline Explained
How Cardboard Assets sources per-grade comp prices: hand-verified anchors, eBay disappearance-detection inferred sales, and the trust hierarchy. Daily cron, confidence scoring, and the path to Marketplace Insights data.
Every grading-economics verdict on the platform depends on three numbers per card per grader: the comp price at the top grade, at grade 9, and below grade 9. These per-grader numbers determine whether the verdict is GRADE, WATCH, or SKIP. This page explains where they come from and how confident you should be in them.
Pipeline overview
The pipeline runs nightly via cron at 03:00 ET. Four stages in sequence:
- Sync: pull every active eBay listing for in-scope cards via the eBay Browse API. Write to ebay_active_listings (current snapshot) and ebay_listing_history (per-listing lifecycle).
- Infer: for every listing in history that is no longer active (last_seen_at more than 24 hours ago), classify as inferred_sold (with confidence) or inferred_other. High-confidence inferred sales aggregate into prices_graded with source inferred_sold_hybrid.
- Export: re-emit all site JSONs with the fresh comp data. Astro reads these at build time.
- Build: npm run build regenerates the static dist. The Charizard regression guard runs at the end; if Charizard 1st Ed's best_net_profit drifts from $18,260.80, the cron halts with exit code 12.
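The regression guard at the end of the Build stage can be sketched as below. This is a minimal illustration, not the platform's actual script: the function name `charizard_guard` and the idea of passing the exported value as an argument are assumptions. Only the pinned value ($18,260.80) and exit code 12 come from the stage description.

```python
from decimal import Decimal

# Pinned value stated for Charizard 1st Ed's best_net_profit.
EXPECTED_BEST_NET_PROFIT = Decimal("18260.80")
EXIT_OK = 0
EXIT_REGRESSION = 12  # exit code the cron uses when the guard trips


def charizard_guard(best_net_profit: Decimal) -> int:
    """Hypothetical guard: return 0 if the value matches the pin, else 12."""
    if best_net_profit != EXPECTED_BEST_NET_PROFIT:
        return EXIT_REGRESSION
    return EXIT_OK
```

Using `Decimal` rather than `float` keeps the dollar comparison exact, so any drift at all trips the guard.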
Trust hierarchy
Comp prices come from one of five sources, in trust order:
| Source | What it means | Trust |
|---|---|---|
| manual_verified | Hand-curated against external auctions, PSA Pop Report, verified sales | Highest |
| ebay_sold | eBay Marketplace Insights API (Path A, gated on application approval) | Highest (when active) |
| inferred_sold_hybrid | eBay disappearance-detected with confidence 0.4+ | Medium-High |
| psa_pop | PSA Pop Report cumulative comp (when available) | Medium |
| manual_seed | Original platform seed values from initial bootstrap | BETA, labeled "verifying" |
The aggregation step always prefers higher-trust sources. A manual_verified
row is never overwritten by automated sources: Charizard 1st Ed PSA 10 stays at the
hand-curated $850,000 forever, regardless of what eBay disappearance detection turns up.
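The "prefer higher trust" rule can be sketched as a rank comparison. This is an illustration under assumptions, not the platform's aggregation code: the `TRUST_RANK` mapping and `should_overwrite` function are hypothetical names, the ranks simply follow the row order of the table, and letting an equal-rank source refresh an existing row is an assumption.

```python
from typing import Optional

# Row order of the trust table: lower rank means higher trust (assumed encoding).
TRUST_RANK = {
    "manual_verified": 0,
    "ebay_sold": 1,
    "inferred_sold_hybrid": 2,
    "psa_pop": 3,
    "manual_seed": 4,
}


def should_overwrite(existing_source: Optional[str], incoming_source: str) -> bool:
    """Return True if an incoming comp may replace the stored one."""
    if existing_source is None:
        return True  # nothing stored yet for this (card, grader, grade)
    # Equal rank is allowed so a source can refresh its own earlier value.
    return TRUST_RANK[incoming_source] <= TRUST_RANK[existing_source]
```

Under this sketch, an inferred_sold_hybrid comp replaces a manual_seed value but never a manual_verified one.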
Disappearance-detection model
The hybrid pipeline's core trick: watch active listings until they disappear, then classify why.
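Detecting a disappearance reduces to comparing each listing's last sighting against the 24-hour threshold. The sketch below assumes a simplified in-memory view of ebay_listing_history (a dict of listing ID to last_seen_at); the function name and data shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DISAPPEAR_AFTER = timedelta(hours=24)  # threshold used by the Infer stage


def disappeared_listings(history, now):
    """Return IDs whose last_seen_at is more than 24 hours before `now`.

    `history` maps listing ID -> last_seen_at (timezone-aware datetime).
    """
    return [
        listing_id
        for listing_id, last_seen_at in history.items()
        if now - last_seen_at > DISAPPEAR_AFTER
    ]


now = datetime(2026, 5, 8, 7, 0, tzinfo=timezone.utc)
history = {
    "still-active": now - timedelta(hours=1),
    "gone": now - timedelta(hours=25),
}
gone_ids = disappeared_listings(history, now)
```

Each ID returned this way still needs classification, since a disappearance alone does not say whether the listing sold.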
What disappearance can mean
A listing leaving active eBay search has four possible explanations:
- It sold. What we want.
- It was canceled. Seller pulled the listing.
- It expired. 30-day listing ended without sale.
- It was relisted at a different price. Same item, new listing ID.
The pipeline can't ask eBay which happened (the Browse API doesn't expose sale state). Instead, it relies on signal heuristics.
Confidence scoring
Each disappeared listing gets a confidence score (0-1) for "this was a sale":
| Confidence band | Signal pattern | Classification |
|---|---|---|
| High (0.7-1.0) | Observed ≥ 7 days, price stable or declining, ≥ 5 observations, seller feedback > 100 | inferred_sold (aggregates into prices_graded) |
| Medium (0.4-0.7) | 3-7 days observed, stable price, some seller credibility | inferred_sold (lower-weight aggregation) |
| Low (0.0-0.4) | < 3 days observed, price moved up before disappearance, zero-feedback seller, single observation | inferred_other (no aggregation) |
The pipeline over-flags inferred_other rather than risk false-positive
sales. Better fewer comps than wrong comps.
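The band table can be read as a rule-based classifier. The sketch below maps signal patterns directly to a band and classification. It does not reproduce the pipeline's numeric score, which this page does not specify. The `ListingSignals` fields and the rule that any single low-band signal forces inferred_other are assumptions, chosen to match the stated bias toward under-claiming sales.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ListingSignals:
    days_observed: float
    observations: int
    price_increased: bool  # True if the price moved up before disappearance
    seller_feedback: int


def classify(s: ListingSignals) -> Tuple[str, str]:
    """Return (band, classification) for a disappeared listing."""
    # Conservative assumption: any low-band signal wins.
    if (s.days_observed < 3 or s.price_increased
            or s.seller_feedback == 0 or s.observations <= 1):
        return ("low", "inferred_other")
    # High band requires every high-band signal from the table.
    if s.days_observed >= 7 and s.observations >= 5 and s.seller_feedback > 100:
        return ("high", "inferred_sold")
    return ("medium", "inferred_sold")
```

For example, a listing observed for 10 days across 8 syncs with a stable price and a 250-feedback seller lands in the high band, while the same listing with a late price increase is classified inferred_other.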
Freshness chip
Every card surfaces its comp freshness on the optimizer page:
- Verified · Manually checked: manual_verified source. Hand-confirmed, trust the number.
- Fresh · < 30 days: high-confidence inferred sales within the last 30 days.
- Aging · 30-90 days: inferred sales 30-90 days old.
- Insufficient · pipeline accumulating: fewer than 3 qualifying sales; verdict deferred (the page shows an empty-state hero).
The chip's color and copy are the single most important UI signal on the optimizer. Don't trust a SKIP verdict on an Insufficient card; the data just isn't there yet.
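The chip selection can be sketched as a small decision function. This is a hypothetical reading of the four chip states, not the site's code. Two assumptions are made: age is measured from the most recent qualifying sale, and data older than 90 days falls back to Insufficient (this page does not say what happens past 90 days).

```python
from typing import Optional


def freshness_chip(source: str, qualifying_sales: int,
                   newest_sale_age_days: Optional[int]) -> str:
    """Return the chip label for a card's comp data."""
    if source == "manual_verified":
        return "Verified"
    if qualifying_sales < 3 or newest_sale_age_days is None:
        return "Insufficient"  # verdict deferred
    if newest_sale_age_days < 30:
        return "Fresh"
    if newest_sale_age_days <= 90:
        return "Aging"
    return "Insufficient"  # assumption: stale data is treated as missing
```

Checking the source first means a hand-verified card shows Verified even when the pipeline has no inferred sales for it.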
Cron schedule + monitoring
Daily at 03:00 ET (08:00 UTC during EST, 07:00 UTC during EDT). The wrapper at
factory/scripts/cron_comp_pipeline.sh coordinates all four stages.
Status output writes to factory/data/comp_pipeline_last_run.json.
Exit codes:
- 0: clean run
- 2: eBay credentials missing or 401
- 3: 5+ consecutive 5xx responses
- 4: title-match heuristic broke (0 listings accepted across samples)
- 5: 429 rate limit
- 6: > 5% of cards returned zero results across all tiers
- 7: over-confident inference (single-sale confidence ≥ 0.99)
- 10: export failed
- 11: build failed
- 12: Charizard regression guard tripped
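For monitoring scripts, the exit codes can be captured in an enum so alerts show a name rather than a bare number. The member names and the `describe_exit` helper below are hypothetical; only the numeric values and their meanings come from the exit-code list.

```python
from enum import IntEnum


class PipelineExit(IntEnum):
    OK = 0
    AUTH_FAILED = 2               # credentials missing or 401
    SERVER_ERRORS = 3             # 5+ consecutive 5xx responses
    TITLE_MATCH_BROKEN = 4        # 0 listings accepted across samples
    RATE_LIMITED = 5              # 429
    ZERO_RESULTS = 6              # > 5% of cards returned nothing
    OVERCONFIDENT_INFERENCE = 7   # single-sale confidence >= 0.99
    EXPORT_FAILED = 10
    BUILD_FAILED = 11
    REGRESSION_GUARD = 12


def describe_exit(code: int) -> str:
    """Map a raw exit code to a readable name; unknown codes stay visible."""
    try:
        return PipelineExit(code).name
    except ValueError:
        return "UNKNOWN"
```

Codes 1, 8, and 9 are not listed on this page, so the helper reports them as UNKNOWN rather than guessing a meaning.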
Path A: eBay Marketplace Insights
The hybrid pipeline is a workaround. eBay's Marketplace Insights API returns real sold-listing data. It's a Limited Release that requires per-application approval. The platform's path: run Path B (Browse API + disappearance detection) for 30+ days with clean usage history, then apply for Marketplace Insights.
When Path A lands, comp prices upgrade from inferred_sold_hybrid to
ebay_sold: real sold data, not inferred. The aggregation logic is already coded;
it just needs the API access. See factory/data/ebay_browse_integration_spec_2026-05-07.md
for the full plan.
Frequently asked questions
- Where do the per-grade comp prices come from?
- In practice, from three of the five sources in the trust hierarchy, in trust order: (1) manual_verified, hand-curated against external references like SCI auction comps and PSA Pop Report (highest trust, ~16 cards today including Charizard 1st Ed); (2) inferred_sold_hybrid, eBay disappearance-detected sales with confidence scoring (grows nightly); (3) manual_seed, original platform seed values labeled BETA "verifying". Every card surfaces its comp source in the freshness chip.
- How does eBay disappearance detection work?
- The nightly cron writes every active eBay listing for in-scope cards to ebay_listing_history. Listings that don't appear in the next sync (24+ hours later) flip to disappeared status. The inference pass classifies each disappeared listing as inferred_sold (with confidence 0-1) or inferred_other based on: days observed, price stability, observation count, and seller credibility. High-confidence inferred sales aggregate into prices_graded.
- Why not just scrape eBay sold listings directly?
- TCGplayer is eBay-owned. Scraping eBay's public sold-listings page would jeopardize eBay Partner Network eligibility (a real revenue line) and could violate eBay's terms of service. The disappearance-detection approach uses only the official eBay Browse API (active listings only); the inference does the rest. The real sold-data fix is eBay's Marketplace Insights API, which the platform plans to apply for once Browse usage has 30+ days of clean history.
- How confident is the inferred-sold classification?
- Each inferred sale carries a confidence score 0-1. High confidence (0.7+) requires: observed ≥ 7 days, stable or declining price, ≥ 5 observations, seller feedback > 100. Medium (0.4-0.7) is 3-7 days observed with some seller credibility. Low (< 0.4) gets classified inferred_other (cancellation, expiration, relist) instead of inferred_sold. The pipeline over-flags 'other' rather than over-claiming 'sold.'
- What happens to a manual_verified row when the pipeline writes inferred data?
- Nothing. The aggregation step always checks for existing manual_verified rows on the (card, grader, grade) tuple and skips writing inferred data if any exist. Charizard 1st Ed PSA 10 ($850K) is permanently anchored at the manual_verified value; automated sources can't overwrite it.