
What breaks when your catalog has a million products
If you’re running a large catalog, category structure stops being a simple organizational task and turns into a technical SEO problem. This came up recently in a discussion around cleaning up canonical categories for a catalog with over a million products. The core question was straightforward:
How do I figure out what my canonical category should be, and what should be a breadcrumb?
The role of canonical categories
In Miva, a canonical category is structural. It defines where a product lives in the hierarchy and is directly tied to things like:
- Breadcrumbs
- Internal linking
- Crawl paths for search engines
- Building authority for your pages, links, and categories
So when you assign a canonical category, you’re not just organizing products. You’re defining how both users and search engines move through your site. Before technical SEO was so critical to a site’s success, back when all you had to focus on was keywords, you might have assigned a product’s canonical category as either its highest-level category or the deepest category in which it can be found:
Home > Car Parts
Vs.
Home > Car Parts > Toyota > Exterior Accessories > Windshield Wipers
Choosing that canonical category requires a little more strategy now. It’s entirely possible that neither the broadest nor the most specific category is the answer. The breadcrumb should point to the deepest category that is still a valuable page.
If your Windshield Wipers category only has three products, it’s a low-value page and not worth the crawl budget it takes to reach. But if the Exterior Accessories category is more robust, has more products, and is generally a higher-value page, then that’s where your breadcrumb should land.
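Applied mechanically, that rule means walking a product’s category path from the most specific level back toward the root and stopping at the first category that is worth a crawl. The sketch below uses product count as the only measure of “valuable,” with a hypothetical cutoff of 10 products; the article doesn’t name a number, and a real decision would also weigh search demand and how meaningful the grouping is. The `Category` type and `breadcrumb_target` function are illustrative names, and apart from Windshield Wipers’ three products, the counts in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    product_count: int  # products assigned to this category (hypothetical field)


# Hypothetical cutoff: the article gives no number, so tune this to your catalog.
MIN_PRODUCTS = 10


def breadcrumb_target(path: list[Category], min_products: int = MIN_PRODUCTS) -> list[Category]:
    """Trim a root-to-leaf category path so it ends at the deepest category
    that still meets the minimum product count."""
    # Walk from the most specific category back toward the root.
    for depth in range(len(path), 0, -1):
        if path[depth - 1].product_count >= min_products:
            return path[:depth]
    # Nothing qualifies: fall back to the top-level category alone.
    return path[:1]


if __name__ == "__main__":
    # Made-up counts, except the three-product Windshield Wipers category.
    path = [
        Category("Car Parts", 120_000),
        Category("Toyota", 8_400),
        Category("Exterior Accessories", 310),
        Category("Windshield Wipers", 3),
    ]
    trail = breadcrumb_target(path)
    print(" > ".join(["Home"] + [c.name for c in trail]))
    # prints "Home > Car Parts > Toyota > Exterior Accessories"
```

With the threshold at 10, the three-product Windshield Wipers category is skipped and the trail ends at Exterior Accessories. Raising or lowering the cutoff is the lever that decides how deep breadcrumbs go across the whole catalog.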
Breadcrumbs are not just UI
Breadcrumbs are often treated as a visual feature. They help shoppers understand where they are and navigate their way back up the chain. But from an SEO standpoint, they’re internal links that carry serious weight.
When a breadcrumb exists, it’s effectively saying “This product is from this category, and this category is important.”
Search engine crawlers will follow those links. They pass authority through them. They use them to understand the site structure. But that only happens if the destination actually exists.
The problem with large product catalogs is that this can lead to a LOT of category creation and maintenance.
You can’t point to breadcrumb pages that don’t exist
You may be tempted to assign specific, detailed canonical categories without actually creating those categories in your system. With a catalog that large, building them all out would mean thousands of subcategories (a ton of bloat), so you flag each product with a very detailed, nuanced canonical category but never create the category itself.
Unfortunately, that won’t work. If the category doesn’t exist as a real, indexable page, the breadcrumb link has nowhere meaningful to go. At best, it’s a dead end. At worst, it actively works against you. No page means no value, and a wasted crawl.
You can’t create the pages and just not index them
Let’s say you do create all the necessary subcategories:
Home > Car Parts > Toyota > Exterior Accessories > Windshield and Rear Window Accessories > Wiper Blades and Arms
But instead of indexing all of them, you set the lower-level pages to noindex.
On the surface, that feels like a reasonable compromise. You get structure and live links without having to manage thousands of category pages that need to be optimized, filled in with on-page SEO, populated with multiple products, etc.
The problem is what happens when a crawler hits that structure. When a search engine bot lands on a product page, it follows the breadcrumb links. If those breadcrumbs point to noindex pages, the crawler will:
- Follow the link
- Load the page
- See the noindex directive
- Drop it from consideration
Once, that might not sound like a big deal. But scale it up: if this happens across tens of thousands of products, you’re forcing crawlers to repeatedly spend time on pages that you’ve explicitly told them not to index.
That does two things:
- Wastes crawl budget
- Breaks the flow of link authority
You’re essentially building a structure that leads search engines into dead ends, over and over again.
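One rough way to size this from your own data is to count how many breadcrumb links on product pages point at noindex categories. The sketch below assumes you can export each product’s breadcrumb URLs and the set of category URLs carrying a noindex directive; the function name and URLs are hypothetical. It counts links, not actual crawler fetches: real crawlers deduplicate and schedule revisits on their own terms, so treat the number as a measure of how many paths lead into dead ends, not as a crawl log.

```python
from collections import Counter


def noindex_breadcrumb_links(
    breadcrumbs: dict[str, list[str]], noindex_urls: set[str]
) -> tuple[int, Counter]:
    """Count breadcrumb links from product pages that land on noindex categories.

    breadcrumbs maps each product URL to the category URLs in its breadcrumb
    trail; noindex_urls is the set of category URLs carrying a noindex directive.
    Returns the total number of such links and a per-category tally.
    """
    per_category: Counter = Counter()
    for trail in breadcrumbs.values():
        for url in trail:
            if url in noindex_urls:
                per_category[url] += 1
    return sum(per_category.values()), per_category


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    wiper_trail = ["/car-parts", "/car-parts/toyota", "/car-parts/toyota/exterior",
                   "/car-parts/toyota/exterior/wipers"]
    breadcrumbs = {
        "/wiper-blade-a": wiper_trail,
        "/wiper-blade-b": wiper_trail,
        "/mud-flap": wiper_trail[:3],
    }
    total, per_category = noindex_breadcrumb_links(
        breadcrumbs, {"/car-parts/toyota/exterior/wipers"}
    )
    print(total)  # 2
    for url, count in per_category.most_common():
        print(f"{count:>6}  {url}")
```

Sorting the tally with `most_common()` surfaces the noindex categories that the most products link into, which are the first candidates to either build out as real pages or drop from the breadcrumb.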
When deeper categories actually make sense
Now that you know what you can’t do, what can you do? There’s nothing wrong with building a deeper category structure. In fact, for large catalogs, it’s often necessary.
But there’s a practical filter:
If a category is important enough to include in a breadcrumb, it should be important enough to exist as a real page.
That means:
- It has enough products to be useful
- It represents a meaningful grouping
- It has some potential to capture search traffic, even if it’s long-tail
Not every subcategory meets that bar.
A better approach for large catalogs
When you’re dealing with thousands, hundreds of thousands, or millions of products, you can’t treat every possible category as equal. If you do, you’ll burn through the limited crawl budget that search engines allot to your site, waste it on unimportant pages, and lose both yourself and your shoppers in a maze of subcategories.
Instead:
- Build out core categories that actually matter
- Let breadcrumbs stop at a meaningful level
- Avoid linking to low-value or thin subcategories
Look again at our example from before:
Home > Car Parts > Toyota > Exterior Accessories > Windshield and Rear Window Accessories > Wiper Blades and Arms
Vs.
Home > Car Parts > Toyota > Exterior Accessories > Windshield Wipers
Or even
Home > Car Parts > Toyota > Exterior Accessories
If “Wiper Blades and Arms” doesn’t justify its own page, don’t force it into the structure.
The underlying issue: scale vs manageability
The root concern here wasn’t SEO theory. It was operational. Managing thousands of category pages is difficult:
- Content needs to be created
- Pages need to be optimized
- Structure needs to stay consistent
That’s a real constraint. But trying to avoid that work with “invisible” or non-functional categories creates bigger problems down the line. Instead, scale your breadcrumbs back to the deepest landing page that you’d pay to have people view, because that is essentially what you’re doing.
The takeaway
If you’re restructuring categories at scale, the rules are pretty simple:
- Don’t assign canonical categories that don’t exist
- Don’t breadcrumb to noindex pages
- Don’t create deep structures you’re not willing to support
And most importantly, if a category is important enough to be part of your site structure, it needs to function as a real page.
Otherwise, you’re not building structure. You’re creating noise that both users and search engines have to work around.
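These rules are easy to check in bulk before a restructure goes live. The sketch below assumes you’ve exported categories and products into the hypothetical `CategoryPage` and `Product` structures shown (they are not Miva’s export format), and it flags canonical categories that don’t exist, breadcrumbs that link to missing or noindex categories, and breadcrumbs that link to thin categories. The thin-category check, with a hypothetical 10-product cutoff, is only a stand-in for “structures you’re not willing to support”; that judgment is ultimately yours.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff for a "thin" category; the article doesn't give a number.
MIN_PRODUCTS = 10


@dataclass
class CategoryPage:
    code: str
    indexable: bool = True   # False if the page carries a noindex directive
    product_count: int = 0


@dataclass
class Product:
    sku: str
    canonical: str                                       # canonical category code
    breadcrumb: list[str] = field(default_factory=list)  # category codes, root first


def audit(products: list[Product], categories: dict[str, CategoryPage],
          min_products: int = MIN_PRODUCTS) -> list[str]:
    """Return one message per rule violation found in the (hypothetical) export."""
    problems: list[str] = []
    for p in products:
        # Rule 1: the canonical category must exist as a real category.
        if p.canonical not in categories:
            problems.append(f"{p.sku}: canonical category '{p.canonical}' does not exist")
        for code in p.breadcrumb:
            cat = categories.get(code)
            if cat is None:
                problems.append(f"{p.sku}: breadcrumb links to missing category '{code}'")
            # Rule 2: no breadcrumb should lead to a noindex page.
            elif not cat.indexable:
                problems.append(f"{p.sku}: breadcrumb links to noindex category '{code}'")
            # Rule 3 (proxy): flag thin categories you may not want to support.
            elif cat.product_count < min_products:
                problems.append(
                    f"{p.sku}: breadcrumb links to thin category '{code}' "
                    f"({cat.product_count} products)"
                )
    return problems


if __name__ == "__main__":
    # Made-up data for illustration.
    categories = {c.code: c for c in [
        CategoryPage("car-parts", True, 120_000),
        CategoryPage("toyota", True, 8_400),
        CategoryPage("toyota-exterior", True, 310),
        CategoryPage("toyota-wipers", False, 3),
    ]}
    products = [
        Product("WB-100", "toyota-exterior", ["car-parts", "toyota", "toyota-exterior"]),
        Product("WB-200", "toyota-wipers",
                ["car-parts", "toyota", "toyota-exterior", "toyota-wipers"]),
        Product("WB-300", "toyota-wiper-arms", ["car-parts", "toyota", "toyota-exterior"]),
    ]
    for message in audit(products, categories):
        print(message)
```

On the made-up data, the audit flags WB-200, whose breadcrumb ends at a noindex category, and WB-300, whose canonical category was never created. WB-100, whose breadcrumb stops at Exterior Accessories, passes.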