The marketplace's SEO team manages over 8 billion URLs with individual datasets measured in terabytes. Yet critical SEO signals were trapped in silos across multiple platforms, requiring separate logins and manual extraction. Senior analysts spent their time wrangling spreadsheets instead of optimizing strategy. When leadership asked fundamental questions like "How many pages do we actually need?", the team couldn't answer. They called it the "infinity problem": infinite pages, no prioritization framework, and no way to separate signal from noise at scale.
Standard ETL pipelines timed out against terabyte-scale datasets. Google Search Console sampling stripped away the granular signals needed for optimization. Web log data auto-archived every six months, erasing seasonal context critical for a marketplace business driven by holiday peaks. Internal link and sitemap relationships weren't captured at all, leaving blind spots across billions of interconnected pages. This wasn't just fragmentation. It was a computational wall.
Mammoth Growth architected the SEO Command Center: a unified BigQuery platform that consolidates GSC, Botify crawl data, server logs, and internal link data. A medallion architecture keeps queries performant at terabyte scale, and reusable LookML components were built for cross-team adoption.
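At a high level, the consolidation step amounts to joining per-URL records from each source into a single unified ("gold"-layer) row. The sketch below is a minimal Python illustration under assumed, hypothetical schemas; field names like `gsc_clicks` and `bot_hits` are inventions for the example, not the marketplace's actual data model:

```python
# Minimal sketch of medallion-style consolidation: per-URL rows from
# several cleaned ("silver") sources are merged into one unified
# ("gold") record per URL. All field names and data are hypothetical.

def build_gold_layer(gsc, crawl, logs):
    """Merge per-URL dicts from each source into unified records."""
    urls = set(gsc) | set(crawl) | set(logs)
    gold = {}
    for url in sorted(urls):
        gold[url] = {
            "gsc_clicks": gsc.get(url, {}).get("clicks", 0),
            "crawl_depth": crawl.get(url, {}).get("depth"),
            "bot_hits": logs.get(url, {}).get("bot_hits", 0),
        }
    return gold

# Hypothetical sample rows from each source.
gsc = {"/p/1": {"clicks": 120}}
crawl = {"/p/1": {"depth": 3}, "/p/2": {"depth": 7}}
logs = {"/p/2": {"bot_hits": 40}}

gold = build_gold_layer(gsc, crawl, logs)
```

In the real platform this join would run as scheduled BigQuery SQL over terabyte-scale tables rather than in-memory dicts; the point is only that once every source keys on URL, cross-source questions become a single query.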
The platform revealed that 84% of bot crawls targeted pages with zero visits and zero search volume, enabling crawl budget reallocation to revenue-driving pages. Of roughly 1 billion pages, only 10 million drive SEO value, giving leadership a concrete optimization target. Quality score analysis spanning 15 months of multi-source data was completed in under 3 weeks, work that previously would have taken months.
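The crawl-waste finding reduces to a simple classification over the unified data: what share of bot hits lands on pages with zero visits and zero search volume? A minimal sketch, assuming hypothetical per-page fields (`bot_hits`, `visits`, `search_volume`):

```python
def crawl_waste_share(pages):
    """Fraction of bot crawl hits spent on pages with zero visits and
    zero search volume. `pages` is a list of per-URL dicts with
    hypothetical fields: bot_hits, visits, search_volume."""
    total = sum(p["bot_hits"] for p in pages)
    wasted = sum(p["bot_hits"] for p in pages
                 if p["visits"] == 0 and p["search_volume"] == 0)
    return wasted / total if total else 0.0

# Toy data: most crawl budget goes to a zero-value page.
pages = [
    {"bot_hits": 84, "visits": 0, "search_volume": 0},    # zero-value page
    {"bot_hits": 16, "visits": 500, "search_volume": 90},  # revenue-driving page
]
print(crawl_waste_share(pages))  # → 0.84
```

The hard part at the marketplace's scale was never this arithmetic; it was getting crawl hits, visits, and search volume into one queryable place.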
Executive-level answers, on demand. The SEO team now confidently reports which pages matter, where crawl budget is wasted, and how performance shifts year over year.
Self-serve insights beyond SEO. Reusable LookML components and standardized data models enable cross-functional teams to run their own analyses.
AI-ready architecture. The platform is designed to incorporate new data sources and support AI-driven automation and optimization agents, positioning the marketplace to lead in AI-powered SEO by 2026.
From SEO project to org-wide data foundation. What started as infrastructure for one team became the analytical backbone for the company.
Results
Crawl waste identified: 84% of bot crawls hitting zero-value pages
SEO scope reduction: 99% (from ~1B to ~10M pages)
Analysis acceleration: 10x (under 3 weeks vs. months)
Cross-dataset analysis: enabled for the first time