Python
Flask
AI
HuggingFace
Digital Asset Management
Azure
Side Project

Building SmartDAM: An AI-Powered Digital Asset Manager for Food Photography

Mouhssine Lakhili
March 27, 2026 · 8 min read

How I built SmartDAM — a Flask app that auto-analyzes food images via HuggingFace, generates multilingual tags, supports Azure Blob Storage, and delivers real-time search.

Article cover: Case study of SmartDAM, an AI-powered digital asset manager built by Mouhssine Lakhili.

Why I Built SmartDAM

Managing food photography for a product demo should be simple. In practice it never is.

You end up with hundreds of images spread across folders, cloud buckets, and Slack threads. Nobody knows which photo shows a burger with lettuce, which one is portrait-oriented, and which one was already used in the last campaign.

I built SmartDAM to solve exactly that: a digital asset manager tailored for food photography that uses AI to automatically analyze, tag, and make images discoverable — without any manual labeling.

The result is a Flask web application that lets you import images, auto-classify them via the HuggingFace Inference API, store them locally or in Azure Blob Storage, and find anything in milliseconds through real-time search and smart filters.

The Core Problem: Manual Tagging at Scale

The obvious solution to image discovery is tags. The painful part is keeping them consistent and complete across hundreds of assets.

Human taggers are slow, inconsistent, and expensive to scale. The tags they produce reflect their vocabulary, not yours. And the moment a new category of images appears — say, "outdoor brunch scenes" — you need to re-tag your entire library.

My answer: skip manual tagging entirely. Let a vision model see every image and produce structured labels automatically.

Architecture Overview

SmartDAM is intentionally simple. The stack is:

Layer              Technology
Web framework      Python / Flask
Database           SQLite via direct queries
Image processing   Pillow
AI classification  HuggingFace Inference API
Storage            Local filesystem or Azure Blob Storage
Frontend           Vanilla JS + Bootstrap 5 + Jinja2 templates

No ORM. No React. No Webpack. The whole point was to ship something fast and keep every dependency purposeful.

Image Import Pipeline

When a user uploads one or more images, the app runs them through a four-step pipeline:

1. Validation

Before touching storage, every file goes through Pillow to confirm it is a real, readable image. Corrupted uploads, zero-byte files, and disguised non-images are rejected immediately with a per-file status message.

from PIL import Image

def validate_image(file_path: str) -> bool:
    try:
        with Image.open(file_path) as img:
            img.verify()
        return True
    except Exception:
        return False

This matters more than it sounds. Without it, a bad upload silently pollutes your database with broken records.

2. Thumbnail Generation

A server-side thumbnail (320×240) is generated for every image at import time. Gallery pages load thumbnails, not full-resolution files. This keeps the UI fast regardless of how large the originals are.
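A thumbnail step along these lines would do the job — a minimal sketch, where the function name and JPEG quality are my own choices, not the project's:

```python
from PIL import Image

THUMB_SIZE = (320, 240)  # the gallery thumbnail size used in SmartDAM

def make_thumbnail(src_path: str, dest_path: str) -> None:
    """Generate a bounded-size thumbnail at import time.

    Image.thumbnail resizes in place, preserves aspect ratio,
    and never upscales beyond the original dimensions.
    """
    with Image.open(src_path) as img:
        img.thumbnail(THUMB_SIZE)
        img.convert("RGB").save(dest_path, "JPEG", quality=85)
```

Because `thumbnail()` keeps the aspect ratio, a 4:3 original lands exactly at 320×240 while other ratios fit inside that box.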

3. Storage

Images can go to the local filesystem or to an Azure Blob Storage container — the app reads STORAGE_BACKEND from the environment and routes accordingly. The abstraction is a small StorageBackend class with save() and url_for() methods, so swapping backends requires zero changes to the import pipeline.
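The backend abstraction could look roughly like this — a sketch of the local variant only, with the Azure branch stubbed out; class and method names beyond save() and url_for() are illustrative:

```python
import os
import shutil
import uuid

class StorageBackend:
    """The minimal interface the import pipeline depends on."""
    def save(self, local_path: str) -> str: ...
    def url_for(self, key: str) -> str: ...

class LocalStorage(StorageBackend):
    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def save(self, local_path: str) -> str:
        # Random key avoids collisions between identically named uploads
        key = f"{uuid.uuid4().hex}{os.path.splitext(local_path)[1]}"
        shutil.copy(local_path, os.path.join(self.root, key))
        return key

    def url_for(self, key: str) -> str:
        return f"/media/{key}"

def get_backend() -> StorageBackend:
    # STORAGE_BACKEND=azure would return an Azure Blob variant instead
    if os.environ.get("STORAGE_BACKEND") == "azure":
        raise NotImplementedError("Azure variant omitted in this sketch")
    return LocalStorage(os.environ.get("MEDIA_ROOT", "media"))
```

The pipeline only ever calls `save()` and `url_for()`, so the Azure implementation slots in without touching any import code.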

4. AI Analysis

The last step calls the HuggingFace Inference API for image classification and object detection. This is where the intelligence lives.

import requests

HF_API_URL = "https://api-inference.huggingface.co/models/google/vit-base-patch16-224"

def classify_image(image_bytes: bytes, api_key: str) -> list[dict]:
    headers = {"Authorization": f"Bearer {api_key}"}
    response = requests.post(HF_API_URL, headers=headers, data=image_bytes)
    response.raise_for_status()
    return response.json()  # [{"label": "...", "score": 0.95}, ...]

The API returns a ranked list of labels. SmartDAM takes the top results, translates them to French (since the target users are French-speaking), and stores them as the image's tags. If the API is unavailable, the app falls back gracefully — the image is still saved and can be re-analyzed later with a single click.
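Picking the top results might look like the sketch below; the top-k count and confidence floor are illustrative values, and the French translation step is left out:

```python
def labels_to_tags(predictions: list[dict],
                   top_k: int = 5,
                   min_score: float = 0.10) -> list[str]:
    """Keep the highest-scoring labels above a confidence floor.

    `predictions` is the HuggingFace classification response:
    a list of {"label": str, "score": float} dicts.
    """
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [p["label"] for p in ranked[:top_k] if p["score"] >= min_score]
```

The score floor matters: ViT-style models always return a full ranked list, so without a cutoff you would store low-confidence noise as tags.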

Person Detection

A separate model run checks whether any people appear in the image. This populates the has_person boolean in the database, enabling a dedicated "people" filter in the gallery.
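Assuming the detection call returns the usual HuggingFace object-detection shape (a list of label/score/box dicts, as produced by e.g. facebook/detr-resnet-50), the check reduces to a few lines; the threshold is my assumption:

```python
def detect_person(detections: list[dict], min_score: float = 0.5) -> bool:
    """True if any detected object is labelled 'person' with enough confidence.

    `detections` is an object-detection response:
    [{"label": str, "score": float, "box": {...}}, ...]
    """
    return any(d["label"] == "person" and d["score"] >= min_score
               for d in detections)
```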

Database Design

SQLite keeps things simple. The core table:

CREATE TABLE images (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    filename    TEXT NOT NULL,
    original_filename TEXT,
    storage_url TEXT NOT NULL,
    thumbnail_url TEXT,
    tags        TEXT,           -- JSON array of strings
    food_category TEXT,
    environment TEXT,           -- indoor / outdoor
    orientation TEXT,           -- portrait / landscape / square
    has_person  INTEGER DEFAULT 0,
    is_favorite INTEGER DEFAULT 0,
    analyzed_at DATETIME,
    created_at  DATETIME DEFAULT CURRENT_TIMESTAMP
);

Tags are stored as a JSON array in a TEXT column. For the scale SmartDAM targets (hundreds to low thousands of images), this is perfectly fine and avoids the complexity of a join table. Full-text search runs on the in-memory tag string, not a separate index.
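The round-trip is just json.dumps on write and json.loads on read — a minimal sketch against an in-memory database, with helper names of my own choosing:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, tags TEXT)")

def save_tags(image_id: int, tags: list[str]) -> None:
    # ensure_ascii=False keeps accented French tags readable in the DB
    conn.execute("INSERT OR REPLACE INTO images (id, tags) VALUES (?, ?)",
                 (image_id, json.dumps(tags, ensure_ascii=False)))

def load_tags(image_id: int) -> list[str]:
    row = conn.execute("SELECT tags FROM images WHERE id = ?",
                       (image_id,)).fetchone()
    return json.loads(row[0]) if row and row[0] else []
```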

Real-Time Search and Filtering

The gallery search is driven by a debounced fetch with a 400ms delay to avoid hammering the server on every keystroke. On input, a fetch call hits /api/images?q=...&filters=..., and the JS re-renders only the cards that match.

let debounceTimer;

searchInput.addEventListener("input", () => {
  clearTimeout(debounceTimer);
  debounceTimer = setTimeout(() => {
    fetchAndRender({ q: searchInput.value, ...activeFilters });
  }, 400);
});

The filters stack: you can search for "salad" while also filtering by landscape orientation, outdoor environment, and no people. Each filter is an independent AND condition.

Search term highlighting is done on the client. When results come back, any matching substring in the card title or tag list gets wrapped in a <mark> element without touching the server response.

Frequent Tags Bar

A sticky row above the gallery shows the top 10 most-used tags as clickable chips. Clicking a chip sets it as the active search query instantly — useful for navigating by category without typing.
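With tags in a JSON column, SQLite's built-in json_each() table-valued function can compute the top-10 list in one query — a self-contained sketch with sample data of my own:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO images (tags) VALUES (?)",
    [(json.dumps(t),) for t in
     [["salade", "burger"], ["burger"], ["burger", "frites"]]])

# json_each expands each JSON array into one row per tag,
# so a plain GROUP BY gives per-tag usage counts.
TOP_TAGS_SQL = """
SELECT value AS tag, COUNT(*) AS uses
FROM images, json_each(images.tags)
GROUP BY value
ORDER BY uses DESC
LIMIT 10
"""
top = conn.execute(TOP_TAGS_SQL).fetchall()
```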

XSS Safety

User-supplied data (original filenames, tags translated from an external API) renders in the gallery cards. All output goes through Jinja2's auto-escaping, and the client-side highlight function uses textContent instead of innerHTML when inserting matched fragments. No dangerouslySetInnerHTML equivalent anywhere.

Light / Dark Theme

Theme preference is stored in localStorage and applied before the first paint via a small inline script in <head>. This prevents the flash-of-wrong-theme that plagues CSS-variable-only implementations.

<script>
  const theme = localStorage.getItem("theme") || "light";
  document.documentElement.setAttribute("data-theme", theme);
</script>

Bootstrap 5's data-bs-theme attribute does the rest.

Favorites

Each image can be marked as a favorite. The state is an is_favorite boolean on the database row, toggled by a PATCH /api/images/:id/favorite endpoint. The gallery has a "Favorites only" filter that uses it.

Lessons Learned

1. Validate before storing

The pipeline order matters. Validate → thumbnail → store → analyze. If you store first and validate second, you end up cleaning up broken files from your storage backend on every bad upload.

2. Graceful AI degradation is non-negotiable

The HuggingFace API has rate limits and occasional downtime. Every import path needs to work without it. Storing images without tags and surfacing a "re-analyze" button later is far better than blocking uploads on an external API call.
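Both lessons show up in the shape of the import function. A sketch of the orchestration, with stand-in step implementations so it runs on its own (the real ones are the pipeline stages above):

```python
# Stand-in step implementations; the real ones live in the pipeline.
def validate(path): return path.endswith((".jpg", ".png"))
def make_thumb(path): return path + ".thumb.jpg"
def store(path): return "stored/" + path
def analyze(key): raise RuntimeError("API down")  # simulate a HF outage

def import_image(path: str) -> dict:
    """Pipeline order: validate -> thumbnail -> store -> analyze."""
    if not validate(path):            # 1. reject before anything is written
        return {"status": "rejected", "path": path}
    thumb = make_thumb(path)          # 2. derived asset, cheap to regenerate
    key = store(path)                 # 3. only valid files reach storage
    try:
        tags = analyze(key)           # 4. external API: allowed to fail
    except Exception:
        tags = []                     # saved untagged; re-analyzable later
    return {"status": "imported", "key": key,
            "thumbnail": thumb, "tags": tags}
```

The try/except around the analysis step is the whole of lesson 2: an API outage degrades the result (no tags yet) instead of failing the upload.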

3. Keep the database schema simple until you need complexity

A JSON column for tags avoids a join table and three extra queries per gallery load. SQLite's json_each() function can query into it if you ever need to. Start simple.

4. Debounce is not optional

Without the 400ms debounce, a fast typist triggers 10–15 fetch requests per second. The gallery would flicker and the server would be doing redundant work constantly. Debounce costs two lines of code and saves significant pain.

5. Thumbnails at import time, not at render time

Generating thumbnails on-demand on the first gallery load is tempting (no extra storage) but it blocks the first request and creates thundering-herd problems if many images are requested simultaneously. Generate once at import and serve statically.

What's Next

SmartDAM is currently a single-user app. The obvious next steps are:

  • Authentication — user accounts so multiple people can share the same library without stepping on each other.
  • Batch re-analysis — re-run the HuggingFace models on all unanalyzed images in a background worker.
  • Better model selection — the current ViT-base model is good for general classification but a fine-tuned food-specific model would produce more useful tags.
  • Production deployment — a Dockerfile and a basic CI pipeline would let this run in any cloud environment in minutes.

Conclusion

SmartDAM took about a week of focused evenings to reach a usable state. The combination of Flask's simplicity, Pillow's reliability, and the HuggingFace Inference API's breadth makes for a surprisingly capable stack for a visual AI tool.

The source code is on GitHub at github.com/LMouhssine/SmartDAM. If you build something on top of it or run into issues, open an issue — I read them.
