Automate the boring: How we maintain our catalog.

By m@ | 11/30/2025

I love digging up obscure facts about random towns or finding the weird history behind a local landmark—that’s the fun part of the job. But managing thousands of files? That’s the part that keeps you stuck at a desk, and I don't even have a desk.

We built the Image-Cataloger v3.0 so we can spend less time sorting and more time exploring. If you’re just starting out in tech support or engineering, here is a look at how we take a raw folder of chaos and turn it into a clean, searchable history.

1. Intake: Filtering the Noise

First, we have to make sure we aren't wasting time on stuff we already have. When a file hits the intake folder, the script runs an MD5 hash check. Think of this like a digital fingerprint—if the fingerprint matches something already in our catalog, we delete the new file immediately. No duplicates allowed.
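Here is a minimal sketch of that fingerprint check, not the actual Image-Cataloger code. The function names, the intake folder argument, and the in-memory set of known hashes are all assumptions for illustration; a real catalog would load its known hashes from wherever it stores them.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def filter_duplicates(intake_dir: Path, known_hashes: set[str]) -> list[Path]:
    """Delete intake files whose hash is already known; return the unique ones.

    `known_hashes` is a hypothetical stand-in for the catalog's hash index
    and is updated in place as new files are accepted.
    """
    unique = []
    for path in sorted(intake_dir.iterdir()):
        if not path.is_file():
            continue
        fingerprint = md5_of_file(path)
        if fingerprint in known_hashes:
            path.unlink()  # duplicate: remove the new copy right away
        else:
            known_hashes.add(fingerprint)
            unique.append(path)
    return unique
```

MD5 is no longer considered safe for security purposes, but for spotting byte-identical files it is fast and does the job.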

If it’s unique, we rename it with a standardized SKU based on the date and convert it to a friendly format (like PNG or SVG) so it’s ready for the next step.
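As a rough sketch of the renaming step: the SKU layout below (a prefix, the date, and a four-digit sequence number) is purely hypothetical, since the real format isn't spelled out here. The format conversion is left out because it would need an imaging library.

```python
from datetime import date
from pathlib import Path


def make_sku(day: date, sequence: int, prefix: str = "STK") -> str:
    """Build a date-based SKU. The PREFIX-YYYYMMDD-NNNN layout is an assumption."""
    return f"{prefix}-{day:%Y%m%d}-{sequence:04d}"


def rename_to_sku(path: Path, day: date, sequence: int) -> Path:
    """Rename a file to its SKU, keeping the (lowercased) extension."""
    target = path.with_name(make_sku(day, sequence) + path.suffix.lower())
    if target.exists():
        # Refuse to overwrite an existing catalog file.
        raise FileExistsError(target)
    return path.rename(target)
```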

2. AI Enrichment: The Research Assistant

Next, we hand the file over to Vertex AI (Gemini). We aren't just asking it to "look" at the image; we're asking it to learn about it.

Take that "Monte Carlo" European Oval image for example:

Source: 3342511302714782.png

We send that to the AI with a specific list of questions (our Taxonomy). We ask it to tag the visual traits—like shape, color, and style. Simultaneously, we run a description prompt that acts as an OCR (Optical Character Recognition) scanner to read any text inside the image.
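To give a feel for what a taxonomy-driven prompt can look like, here is a small sketch. The taxonomy values, the prompt wording, and the function names are all made up for illustration, and the actual call to the model is left out. The only detail taken from our pipeline is the tag style, such as shape_oval.

```python
import json

# Hypothetical taxonomy: each question maps to the answers we accept.
TAXONOMY = {
    "shape": ["oval", "circle", "rectangle", "die_cut"],
    "color": ["black_white", "monochrome", "multicolor"],
    "style": ["vintage", "modern", "hand_drawn"],
}


def build_tagging_prompt(taxonomy: dict[str, list[str]]) -> str:
    """Turn the taxonomy into a prompt that asks for one JSON object back."""
    lines = [
        "Classify the attached image.",
        "Answer with a single JSON object using exactly these keys and allowed values:",
    ]
    for key, options in taxonomy.items():
        lines.append(f"- {key}: one of {', '.join(options)}")
    return "\n".join(lines)


def parse_tags(response_text: str, taxonomy: dict[str, list[str]]) -> list[str]:
    """Convert the model's JSON reply into tags, dropping off-taxonomy answers."""
    answers = json.loads(response_text)
    tags = []
    for key, options in taxonomy.items():
        value = answers.get(key)
        if value in options:
            tags.append(f"{key}_{value}")
    return tags
```

Checking the reply against the taxonomy means a made-up answer from the model never turns into a tag.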

In the short term, we plan to produce audio descriptions for all of our stickers. So during intake we also ask the model to describe the image in a format that drops easily into an audio description script. That way, someone with low vision can understand a product that is entirely visual in nature.

3. Procedural Logic: The Safety Net

Here is where the system gets smart. AI is great, but sometimes it misses context.

Let's go back to that MC (Monaco) sticker. Sometimes, the AI gets so focused on reading the letters "MC" that it forgets to classify the shape properly.

To fix this, we wrote a "Reverse Lookup" rule. The script checks the description text the AI wrote. If it finds "Internal Metadata: UN Code MC" there and also sees the tag shape_oval, it has a lightbulb moment: "Hey, this must be part of the UN_OVAL Collection!" It then automatically slaps a project tag on it so our downstream scripts can publish it without us lifting a finger.
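A rule like this can be sketched in a few lines. The regular expression, the function name, and the exact project tag string (project_UN_OVAL) are assumptions for illustration; only the "Internal Metadata: UN Code" text and the shape_oval tag come from the description above.

```python
import re

# Matches the metadata line the description prompt is assumed to emit,
# e.g. "Internal Metadata: UN Code MC".
UN_CODE_PATTERN = re.compile(r"Internal Metadata: UN Code ([A-Z]{1,3})\b")

# Hypothetical name for the project tag that downstream scripts look for.
PROJECT_TAG = "project_UN_OVAL"


def apply_reverse_lookup(description: str, tags: list[str]) -> list[str]:
    """Add the project tag when the description has a UN code and the shape is oval."""
    result = list(tags)  # copy so the caller's list is left untouched
    has_un_code = UN_CODE_PATTERN.search(description) is not None
    if has_un_code and "shape_oval" in result and PROJECT_TAG not in result:
        result.append(PROJECT_TAG)
    return result
```

Because the rule is plain procedural code, it gives the same answer for the same input every time, which is the whole point of a safety net.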

This brings us back to our team mantra: Let AI do the boring shit while we enjoy our jobs. It means our sticker makers can just drop their images into a shared folder and do absolutely no further cataloging or description work. Like a wise philosopher once said, it's on to the next one.

The resulting infographic was created by asking Gemini to build a slide based on the code I cut and pasted into the prompt, and dare I say it's pretty accurate.

4. Storage: The "Good Enough for Now" Solution

Currently, the script generates individual JSON manifest files for every image it processes. The plan is to have publisher scripts upload these to our Print on Demand (POD) provider and then delete the manifest to keep things tidy.
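For a sense of what one of these manifests might hold, here is a sketch. The field names and the one-file-per-SKU naming are assumptions, not our actual schema.

```python
import json
from pathlib import Path


def write_manifest(
    out_dir: Path,
    sku: str,
    source_file: str,
    md5: str,
    tags: list[str],
    description: str,
) -> Path:
    """Write one JSON manifest per image. All field names are hypothetical."""
    manifest = {
        "sku": sku,
        "source_file": source_file,
        "md5": md5,
        "tags": tags,
        "description": description,
    }
    path = out_dir / f"{sku}.json"
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return path
```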

We know full well we'd be a lot better off throwing this up into a proper database to make us agnostic to whichever ecommerce or fulfillment provider we're using that week. But while we work on building that backend, we’re simply archiving the manifests to be imported later. It works, and it keeps the van moving.

5. The Searchable Brain (NotebookLM)

Finally, as of November 30, 2025, we update a single—and quickly growing—Markdown file for the catalog.

Why a text file? Because for some silly reason, NotebookLM doesn't let you import .json files natively yet. Until we build a better connector, we just dump the data into Markdown so our designers can use the AI to search through our catalog and figure out what kind of things to make next. It’s a hack, but it keeps the creative flow going.
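The Markdown dump can be as simple as appending one small section per image. This is a sketch under assumptions: the manifest keys (sku, source_file, tags, description) and the heading layout are hypothetical, picked so the file stays readable for both people and NotebookLM.

```python
from pathlib import Path


def manifest_to_markdown(manifest: dict) -> str:
    """Render one manifest as a Markdown section. Keys are hypothetical."""
    tags = ", ".join(manifest.get("tags", []))
    return (
        f"## {manifest['sku']}\n\n"
        f"- Source file: {manifest['source_file']}\n"
        f"- Tags: {tags}\n\n"
        f"{manifest['description']}\n\n"
    )


def append_to_catalog(catalog_path: Path, manifest: dict) -> None:
    """Append the rendered section to the single, growing catalog file."""
    with catalog_path.open("a", encoding="utf-8") as handle:
        handle.write(manifest_to_markdown(manifest))
```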

6. Cost Analysis

Over the last year, we've categorized and sorted more images than we care to count just building and iterating on the script. In doing so, we've calculated that it costs us roughly 3 bucks to catalog 100k images. As part of this exercise, we asked Gemini to look at the code along with the above image of the Monte Carlo Oval and return a graphic calculating how much this automation costs the bottom line.
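For a rough sense of scale, roughly 3 dollars per 100k images works out to about $0.00003 per image, or three thousandths of a cent. The snippet below just redoes that arithmetic. The helper name is made up, and it assumes cost grows linearly with image count, which is a simplification.

```python
# Figures quoted in this post: roughly 3 dollars per 100k images.
COST_USD = 3.00
IMAGES = 100_000

cost_per_image = COST_USD / IMAGES  # about 0.00003 dollars per image


def estimated_cost(image_count: int) -> float:
    """Estimate cataloging cost in dollars, assuming cost scales linearly."""
    return image_count * cost_per_image


if __name__ == "__main__":
    for count in (1_000, 10_000, 100_000):
        print(f"{count:>7} images: about ${estimated_cost(count):.2f}")
```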

Conclusions

In an effort to reduce the kind of administrative overhead that drains a creative person's life, we decided to simply automate the noise. Instead of filling out a form describing a product, guessing its print size, writing a technical description that varies in style from designer to designer, and then subjectively assigning tags, we established a single drop zone for any sticker one of our designers dreams up.

Now, a file just needs to be copied to one location. The system catalogs it, indexes it, stores it, assigns an asset tracking SKU, and makes it available for product creation within a few minutes of the file landing in the shared folder. In keeping with our stated goal: we built this so the machine handles the boring jobs, while our people do the fun ones.
