Tagging System

Tags are a lightweight way to label any document with a bag of strings, then filter, group, and report on those labels later. Unlike features — which grant capabilities — tags describe attributes: where a property is located, what specialty a clinic offers, what risk tier a customer falls into. Craft Easy's tag system supports hierarchical tags with a materialized path, per-tenant category namespaces, and three query operators that compose naturally with ordinary CRUD filters.

The model

craft_easy.models.tag.Tag is a tenant-scoped document. Every tenant has its own tag hierarchy; tags do not leak between tenants.

| Field | Type | Purpose |
| --- | --- | --- |
| `name` | string, indexed | Display name |
| `category` | string, indexed | Namespace, e.g. `"geography"`, `"specialty"`, `"zone_type"` |
| `description` | string | Optional description |
| `icon` | string | Optional icon identifier for UI |
| `color` | string | Optional color for UI chips |
| `sort_order` | int | Display order within a category |
| `parent_id` | ObjectId \| null | Parent tag, for hierarchical categories |
| `parent_name` | string, readonly | Cached parent name, kept in sync via cascade |
| `path` | string, readonly | Materialized path using `/` separator |
| `depth` | int, readonly | Depth in the category tree, `0` = root |

Cascade on parent_id is on_delete="null" — deleting a parent orphans its children (their parent_id becomes null and they become new roots) rather than deleting them. That is the safer default for tags, which are often referenced from documents; hard-deleting a subtree could leave dangling references in tagged documents.

Indexes live on (category, tenant_id), parent_id, and path.

Categories

A category is a namespace within a tenant. Each category is an independent tree: the geography tags and the specialty tags do not know about each other, even though they share the same collection. Categories exist to keep different tag dimensions visually separate in admin UIs and to scope queries.

geography               specialty               risk_tier
├── Sweden              ├── Ophthalmology       ├── High
│   ├── Stockholm       ├── Cardiology          ├── Medium
│   │   └── Södermalm   └── Neurology           └── Low
│   └── Göteborg
└── Norway
    └── Oslo

Convention: use snake_case for categories and keep them flat (no hierarchy of categories). Within a category, trees can be arbitrarily deep.

Hierarchy with materialized path

Like tenants and org nodes, tags carry a materialized path. The path uses / as a separator and contains the IDs of every ancestor, so that Södermalm under Stockholm under Sweden has a path like /65ab.../65c2....

Two things follow from this:

Ancestor lookup requires no extra query. Split the path on /, parse each segment, done. No recursive lookup.
Descendant lookup is a single regex. Match on path starting with the ancestor's own prefix (its path plus its own ID, since a path lists ancestors only). Combined with the path index, this finds every descendant in one query regardless of tree depth.
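
Both lookups can be sketched in a few lines of plain Python. This is an illustration only, not the contents of core/tags/hierarchy.py: the function names are hypothetical, the IDs are shortened placeholders, and it assumes a root tag has an empty path and that a tag's path lists its ancestors but not the tag itself, as in the Södermalm example above.

```python
import re


def ancestor_ids(path: str) -> list[str]:
    """Return the ancestor IDs encoded in a materialized path, root first."""
    return [segment for segment in path.split("/") if segment]


def descendant_pattern(path: str, tag_id: str) -> str:
    """Build an anchored regex that matches the path of every descendant.

    Assumption: a descendant's path starts with the tag's own path
    followed by the tag's own ID.
    """
    own_prefix = f"{path.rstrip('/')}/{tag_id}"
    # Require a segment boundary so "sto2" does not also match "sto22".
    return f"^{re.escape(own_prefix)}(/|$)"


# Placeholder IDs for Sweden ("swe1") > Stockholm ("sto2")
stockholm_descendants = descendant_pattern("/swe1", "sto2")
```

The resulting string could be passed to MongoDB as `{"path": {"$regex": stockholm_descendants}}`; because it is anchored at the start, it remains a prefix match.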

core/tags/hierarchy.py provides the helpers:

from craft_easy.core.tags.hierarchy import resolve_tag_hierarchy, get_descendant_slugs

# Expand ["stockholm"] to include every descendant ("stockholm", "sodermalm", ...)
expanded = await resolve_tag_hierarchy(["stockholm"], tenant_id=caller.tenant_id)

# Get just the descendants of one tag
descendants = await get_descendant_slugs(tag, tenant_id=caller.tenant_id)

These functions exist because tag filters against documents are usually inclusive of descendants: filtering clinics tagged stockholm should also return clinics tagged sodermalm, without the caller having to enumerate every sub-area manually.

Routes

Standard CRUD plus a handful of tree helpers:

GET    /tags                       # list (category optional)
POST   /tags                       # create
GET    /tags/{tag_id}              # fetch
PATCH  /tags/{tag_id}              # update (recomputes path if parent_id changes)
DELETE /tags/{tag_id}              # soft delete (children orphaned)

GET    /tags/by-category/{category}      # flat list, sorted by sort_order
GET    /tags/tree/{category}             # nested tree
GET    /tags/stats/{model_name}          # usage statistics per tag
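
To show how the nested form returned by GET /tags/tree/{category} relates to the flat list, here is a minimal sketch that nests flat records by parent_id. The function name and the output shape (`id`, `name`, `children`) are assumptions made for illustration; the actual response format is not specified here.

```python
from collections import defaultdict
from typing import Any


def build_tree(tags: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Nest a flat list of tag dicts under their parents, ordered by sort_order."""
    children: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for tag in sorted(tags, key=lambda t: t["sort_order"]):
        children[tag["parent_id"]].append(tag)

    def attach(parent_id: Any) -> list[dict[str, Any]]:
        # Recursively collect the children of one parent (None = roots).
        return [
            {"id": tag["id"], "name": tag["name"], "children": attach(tag["id"])}
            for tag in children.get(parent_id, [])
        ]

    return attach(None)
```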

Create hooks (calculate_path_on_create) auto-populate path, depth, and parent_name on insert. Update hooks (calculate_path_on_update) recalculate the same fields when parent_id changes, and cascade to children if the parent's path itself moved.
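
The derived fields follow mechanically from the parent. A minimal sketch of that computation, assuming a root tag has an empty path and that path holds ancestor IDs only; `compute_path_fields` is a hypothetical helper, not the hook itself:

```python
from typing import Any, Optional


def compute_path_fields(parent: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Derive path, depth and parent_name for a tag from its parent record."""
    if parent is None:
        # Root tag: no ancestors, depth 0.
        return {"path": "", "depth": 0, "parent_name": None}
    return {
        # The parent's ancestors, then the parent itself.
        "path": f"{parent['path']}/{parent['id']}",
        "depth": parent["depth"] + 1,
        "parent_name": parent["name"],
    }


# Placeholder IDs: Sweden > Stockholm > Södermalm
sweden = {"id": "swe1", "name": "Sweden", **compute_path_fields(None)}
stockholm = {"id": "sto2", "name": "Stockholm", **compute_path_fields(sweden)}
sodermalm_fields = compute_path_fields(stockholm)
```

When a tag moves, the same computation would be repeated for each of its descendants, which is the cascade the update hook performs.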

Tagging a document

Add a tags: list[str] field to any tenant-scoped model:

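# Assumes Beanie's base classes; swap in your project's own if they differ.
from beanie import Document, PydanticObjectId


class Property(Document):
    name: str
    tenant_id: PydanticObjectId
    tags: list[str] = []  # tag slugs, e.g. ["stockholm", "residential"]

    class TenantConfig:
        tenant_scoped = True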

The field holds tag slugs (the lowercase, URL-safe form of the name). No foreign key: tags are looked up by slug at query time. That keeps reads fast and allows a document to carry an arbitrary number of tags without a join.
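
How a slug is derived from a name is not specified here. One plausible scheme that reproduces the slugs used on this page (`sodermalm`, `construction-zone`) is sketched below; `slugify` is a hypothetical helper, not a Craft Easy function.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase a name, strip diacritics, and join words with hyphens."""
    # NFKD splits "ö" into "o" plus a combining mark, which the ASCII
    # encode step then drops.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
```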

Filtering by tag

Three query-parameter operators, provided by core/tags/filtering.py:

| Parameter | MongoDB | Behavior |
| --- | --- | --- |
| `?tags__in=a,b,c` | `{"$in": [...]}` | Match documents tagged with any of `a`, `b`, `c` (OR) |
| `?tags__all=a,b,c` | `{"$all": [...]}` | Match documents tagged with all of `a`, `b`, `c` (AND) |
| `?tags__nin=a,b,c` | `{"$nin": [...]}` | Match documents not tagged with any of them |

Examples:

# Properties in Stockholm or Gothenburg
GET /properties?tags__in=stockholm,gothenburg

# Clinics that are both ophthalmology and pediatric
GET /clinics?tags__all=ophthalmology,pediatric

# Everything except construction zones
GET /properties?tags__nin=construction-zone

The three operators can be combined with any other filter (date ranges, name searches, ETag updates) because they are just ordinary query parameters processed by the CRUD layer.
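
The mapping from query parameters to a MongoDB filter is mechanical. The sketch below shows one way to do the translation; it is not the code in core/tags/filtering.py, and `build_tag_filter` is a hypothetical name.

```python
from typing import Any, Mapping

# Query-parameter suffix -> MongoDB operator, as in the table above.
_OPERATORS = {"tags__in": "$in", "tags__all": "$all", "tags__nin": "$nin"}


def build_tag_filter(params: Mapping[str, str]) -> dict[str, Any]:
    """Translate tags__in / tags__all / tags__nin into one filter on `tags`."""
    conditions: dict[str, list[str]] = {}
    for param, mongo_op in _OPERATORS.items():
        raw = params.get(param)
        if not raw:
            continue
        # Split the comma-separated value and drop empty entries.
        slugs = [slug.strip() for slug in raw.split(",") if slug.strip()]
        if slugs:
            conditions[mongo_op] = slugs
    return {"tags": conditions} if conditions else {}
```

Because all operators land on the same `tags` key, a request such as `?tags__in=stockholm&tags__nin=construction-zone` yields a single condition document that MongoDB evaluates as a conjunction.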

Including descendants automatically

resolve_tag_hierarchy can be wired into a route to expand tags__in to include children. That lets a caller filter by stockholm and transparently get everything under stockholm without enumerating sub-tags. Whether to auto-expand is a per-resource decision — for reports it usually makes sense, for exact-match admin lookups it usually does not.
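
The effect of the expansion can be illustrated in memory. This sketch is not the implementation of resolve_tag_hierarchy: it works on plain dicts, the `slug`, `id`, and `path` keys and the placeholder IDs are assumptions, and it relies on a path holding ancestor IDs only.

```python
from typing import Any


def expand_slugs(slugs: list[str], tags: list[dict[str, Any]]) -> list[str]:
    """Return the requested slugs plus the slugs of all their descendants."""
    by_slug = {tag["slug"]: tag for tag in tags}
    expanded = list(dict.fromkeys(slugs))  # de-duplicate, keep order
    seen = set(expanded)
    for slug in slugs:
        tag = by_slug.get(slug)
        if tag is None:
            continue  # unknown slugs pass through unchanged
        prefix = f"{tag['path']}/{tag['id']}"
        for other in tags:
            is_descendant = other["path"] == prefix or other["path"].startswith(prefix + "/")
            if is_descendant and other["slug"] not in seen:
                seen.add(other["slug"])
                expanded.append(other["slug"])
    return expanded


GEOGRAPHY = [
    {"slug": "sweden", "id": "swe1", "path": ""},
    {"slug": "stockholm", "id": "sto2", "path": "/swe1"},
    {"slug": "sodermalm", "id": "sod3", "path": "/swe1/sto2"},
    {"slug": "goteborg", "id": "got4", "path": "/swe1"},
]
```

A route that opts in would run the expansion on the parsed `tags__in` value before building the filter, so `stockholm` becomes `stockholm, sodermalm` while `goteborg` stays out.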

Statistics

GET /tags/stats/{model_name}?category=... returns tag usage counts for a given model:

GET /tags/stats/properties?category=zone_type

{
  "model": "properties",
  "category": "zone_type",
  "stats": [
    {"tag": "airport", "count": 47},
    {"tag": "residential", "count": 23},
    {"tag": "commercial", "count": 8}
  ]
}

Internally, core/tags/stats.get_tag_statistics(model, tenant_id, category) runs a single aggregation pipeline that unwinds the tags array, groups by tag value, and counts — all scoped to the caller's tenant. The result is sorted by count descending, so the most-used tag is first.
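
A pipeline of that shape might look like the sketch below. The stages are standard MongoDB aggregation operators, but the function name, the `category_slugs` parameter (the slugs belonging to the requested category, looked up beforehand), and the final `$project` are assumptions, not the contents of get_tag_statistics.

```python
from typing import Any, Optional, Sequence


def tag_stats_pipeline(
    tenant_id: Any,
    category_slugs: Optional[Sequence[str]] = None,
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that counts documents per tag."""
    pipeline: list[dict[str, Any]] = [
        {"$match": {"tenant_id": tenant_id}},  # scope to the caller's tenant
        {"$unwind": "$tags"},                  # one row per (document, tag)
    ]
    if category_slugs is not None:
        # Keep only tags that belong to the requested category.
        pipeline.append({"$match": {"tags": {"$in": list(category_slugs)}}})
    pipeline += [
        {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},              # most-used tag first
        {"$project": {"_id": 0, "tag": "$_id", "count": 1}},
    ]
    return pipeline
```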

Common use: a reports dashboard that draws a bar chart of tag usage per model, or a tag-management UI that shows how many documents would be affected by deleting a tag.

Design notes

Strings, not foreign keys. Storing tags: list[str] rather than tag_ids: list[ObjectId] means documents are denormalized — renaming a tag requires rewriting every document that references it by slug. The Tag model itself stores the canonical name and slug; the stats endpoint and the hierarchy helpers both work against slugs. If your workload renames tags frequently, add a background job to rewrite denormalized references on tag updates.
Per-tenant hierarchies. Every tenant gets a fresh tag tree. Slugs are not shared across tenants: two tenants can both have a tag called stockholm and they will never conflict, but that also means each tenant has to populate its own baseline set on provisioning.
Tags are not features. A feature is a grant of capability checked by a guard. A tag is a descriptor stored on a document. Do not reach for tags when you mean "can this user do X" — use a feature and put it on an agreement or an access group.
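
For the rename case mentioned under "Strings, not foreign keys", the rewrite job could be built around a single bulk update per tagged collection. The sketch below only constructs the filter and update documents; `rename_slug_update` is a hypothetical helper, and it assumes each document carries a given slug at most once, because the positional `$` operator rewrites only the first matching array element.

```python
from typing import Any


def rename_slug_update(
    tenant_id: Any, old_slug: str, new_slug: str
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (filter, update) documents that rewrite one slug in `tags`."""
    # Match only this tenant's documents that carry the old slug.
    query = {"tenant_id": tenant_id, "tags": old_slug}
    # "tags.$" targets the array element matched by the filter above.
    update = {"$set": {"tags.$": new_slug}}
    return query, update
```

With PyMongo, the pair would be passed to `collection.update_many(query, update)`. Documents that already carry the new slug would end up with a duplicate entry, so a real job might need a follow-up de-duplication pass.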

See Tenant Isolation for how tag queries pick up tenant scoping automatically, and Feature Guards for the contrast with capability-based access.