GDPR

Craft Easy has first-class GDPR support built into the core framework. Fields are tagged as PII at model definition time, and the framework uses those tags to power:

  • Data portability — export everything a user has, across every collection
  • Right to erasure — depersonalise user data on every tagged field
  • Consent tracking — append-only log of consent decisions
  • GDPR filtering in BI exports — exclude, anonymise, or hash PII on the way out

Enable GDPR endpoints in settings.py:

GDPR_ENABLED = True

Tagging fields as PII

Every model marks its PII fields with json_schema_extra={"gdpr": True, "gdpr_category": "..."}. The gdpr_category decides how a field is depersonalised:

from typing import Optional

from pydantic import Field

# BaseDocument is Craft Easy's shared Beanie document base class
class User(BaseDocument):
    name: str
    email: Optional[str] = Field(
        default=None,
        json_schema_extra={"gdpr": True, "gdpr_category": "email"},
    )
    phone: Optional[str] = Field(
        default=None,
        json_schema_extra={"gdpr": True, "gdpr_category": "phone"},
    )
    personal_id: Optional[str] = Field(
        default=None,
        json_schema_extra={"gdpr": True, "gdpr_category": "identity"},
    )
    free_notes: Optional[str] = Field(
        default=None,
        json_schema_extra={"gdpr": True, "gdpr_category": "free_text"},
    )

Supported categories

Category    Depersonalisation value
identity    "DEPERSONALIZED"
contact     "***"
email       "depersonalized@removed.invalid"
phone       "+00000000000"
address     "Address removed"
personal    "DEPERSONALIZED"
free_text   "[Content removed per GDPR]"

The default category (if only gdpr: True is set) is personal. Extend DEPERSONALIZATION_RULES in core/gdpr/service.py to add new categories.
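The mapping can be pictured as a plain dict keyed by category. This is a stdlib-only sketch, not the framework's code: the DEPERSONALIZATION_RULES values match the table above, but apply_rules is a hypothetical helper name.

```python
# Category → replacement value, mirroring the table above.
DEPERSONALIZATION_RULES = {
    "identity": "DEPERSONALIZED",
    "contact": "***",
    "email": "depersonalized@removed.invalid",
    "phone": "+00000000000",
    "address": "Address removed",
    "personal": "DEPERSONALIZED",
    "free_text": "[Content removed per GDPR]",
}

def apply_rules(doc: dict, gdpr_fields: dict) -> dict:
    """Overwrite each tagged field with its category's replacement.

    gdpr_fields maps field name -> gdpr_category. Unknown categories
    fall back to "personal", mirroring the documented default.
    """
    for field, category in gdpr_fields.items():
        if field in doc and doc[field] is not None:
            doc[field] = DEPERSONALIZATION_RULES.get(
                category, DEPERSONALIZATION_RULES["personal"]
            )
    return doc
```

Untagged fields pass through untouched, so the business record survives with only the PII rewritten.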

Discovering tagged fields

Every BaseDocument subclass exposes:

User.gdpr_fields()
# → {"email": "email", "phone": "phone", "personal_id": "identity", "free_notes": "free_text"}

The GET /gdpr/schema endpoint walks every registered model and returns the same information as JSON so an admin UI can show the user what is stored about them.
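Discovery boils down to scanning each field's json_schema_extra for the "gdpr" flag. The sketch below illustrates the idea with a minimal stand-in for Pydantic's field metadata; it is not the real gdpr_fields() implementation.

```python
from dataclasses import dataclass, field

@dataclass
class FieldInfo:
    """Minimal stand-in for Pydantic's per-field metadata."""
    json_schema_extra: dict = field(default_factory=dict)

def gdpr_fields(model_fields: dict) -> dict:
    """Return {field_name: gdpr_category} for every tagged field.

    Untagged fields are skipped; a tag without an explicit category
    falls back to "personal", the documented default.
    """
    out = {}
    for name, info in model_fields.items():
        extra = info.json_schema_extra or {}
        if extra.get("gdpr"):
            out[name] = extra.get("gdpr_category", "personal")
    return out
```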

Data portability

GET /gdpr/user-data/{user_id} returns every document tied to that user across every collection. The service iterates ALL_MODELS, finds documents with user_id == target, and collects them into a dict keyed by collection name.

curl "http://localhost:5001/gdpr/user-data/664a1234..." \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "user_id": "664a1234...",
  "collections": {
    "users": [{"id": "...", "name": "Alice", "email": "alice@example.com", ...}],
    "bookings": [{"id": "...", "user_id": "664a1234...", "location": "...", ...}],
    "payments": [{"id": "...", "user_id": "664a1234...", "amount": "199.00", ...}]
  }
}

Soft-deleted documents are excluded automatically.
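The collection walk can be sketched on in-memory lists standing in for the real Beanie registry and MongoDB queries; collect_user_data and the is_deleted flag are illustrative assumptions, not the service's actual names.

```python
def collect_user_data(user_id: str, collections: dict) -> dict:
    """Gather a user's documents across collections, keyed by collection
    name, skipping soft-deleted documents."""
    result = {"user_id": user_id, "collections": {}}
    for name, docs in collections.items():
        matched = [
            d for d in docs
            if d.get("user_id") == user_id and not d.get("is_deleted")
        ]
        if matched:  # omit empty collections from the payload
            result["collections"][name] = matched
    return result
```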

Downloadable export

For the actual data portability deliverable (what you hand the data subject), use the export endpoint:

# JSON
curl "http://localhost:5001/gdpr/export/664a1234...?format=json" \
  -H "Authorization: Bearer $TOKEN" \
  -o export.json

# CSV (multi-section, one block per collection)
curl "http://localhost:5001/gdpr/export/664a1234...?format=csv" \
  -H "Authorization: Bearer $TOKEN" \
  -o export.csv

The JSON export includes an export_date timestamp and is the recommended format — GDPR (Article 20) requires a "structured, commonly used and machine-readable format", and JSON satisfies that.
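The multi-section CSV layout can be sketched with the standard library. The section marker (a `# collection_name` line) and the column ordering here are assumptions about the shape of the output, not the exact format the endpoint emits.

```python
import csv
import io

def export_csv(collections: dict) -> str:
    """Write one header + row block per collection, blank-line separated."""
    buf = io.StringIO()
    for name, docs in collections.items():
        if not docs:
            continue
        buf.write(f"# {name}\n")  # assumed section marker
        writer = csv.DictWriter(buf, fieldnames=list(docs[0].keys()))
        writer.writeheader()
        writer.writerows(docs)
        buf.write("\n")
    return buf.getvalue()
```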

Right to erasure

POST /gdpr/depersonalize/{user_id} runs the depersonalisation workflow across every collection:

  1. Iterate ALL_MODELS.
  2. For every model with GDPR-tagged fields, find all documents for the user.
  3. Skip documents that already have is_depersonalized=True.
  4. Apply the DEPERSONALIZATION_RULES for each tagged field.
  5. Set is_depersonalized=True and depersonalized_at=now().
  6. Save the document.
  7. Write an audit entry with the operation depersonalize.

curl -X POST "http://localhost:5001/gdpr/depersonalize/664a1234..." \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "user_id": "664a1234...",
  "status": "depersonalized",
  "collections": {
    "users": {"documents_depersonalized": 1, "fields": ["email", "phone", "personal_id"]},
    "bookings": {"documents_depersonalized": 5, "fields": ["free_notes"]}
  }
}
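Steps 2–6 of the workflow above can be sketched on plain dicts. This is a stdlib illustration under assumed names, not the framework's implementation; `rules` stands in for DEPERSONALIZATION_RULES.

```python
from datetime import datetime, timezone

def depersonalize_docs(docs: list, gdpr_fields: dict, rules: dict) -> int:
    """Apply the rules to every not-yet-processed document; return count."""
    changed = 0
    for doc in docs:
        if doc.get("is_depersonalized"):
            continue  # step 3: skip already-processed documents
        for field, category in gdpr_fields.items():
            if field in doc:
                doc[field] = rules[category]  # step 4: overwrite PII
        doc["is_depersonalized"] = True  # step 5: bookkeeping flags
        doc["depersonalized_at"] = datetime.now(timezone.utc).isoformat()
        changed += 1
    return changed
```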

Depersonalisation is destructive. The original values are overwritten in place — not hidden, not encrypted, not archived. A depersonalised document keeps its links (foreign keys, booking IDs) so the business record survives, but every PII field is gone.

Single-document depersonalisation

If you only need to depersonalise a specific document — for example, a single support ticket that contains PII by mistake — use the per-document endpoint, which takes the collection name and the document ID:

curl -X POST "http://localhost:5001/gdpr/depersonalize/users/664a1234..." \
  -H "Authorization: Bearer $TOKEN"

Full erasure (depersonalise + disable + revoke consents)

For the full "right to be forgotten" flow, the erasure-execute endpoint cascades:

curl -X POST "http://localhost:5001/gdpr/erasure-execute/664a1234..." \
  -H "Authorization: Bearer $TOKEN"

This depersonalises every GDPR-tagged field, revokes all active consents for the user, disables the user account, and writes an audit entry per step. Use this when the legal process is complete and the user must be fully erased from the live system.

For a two-phase workflow where legal review has to happen first, call POST /gdpr/erasure-request/{user_id} to record the request, review it in the admin UI, and then call erasure-execute when approved.

Consent tracking

Consent decisions are stored as ConsentRecord documents in an append-only log. Every grant and every revocation is a new record — nothing is updated or deleted, so you always have the full history.

class ConsentRecord(BaseDocument):
    user_id: PydanticObjectId
    consent_type: str                   # "data_processing" | "marketing" | "analytics" | "third_party"
    is_granted: bool
    granted_at: datetime | None
    revoked_at: datetime | None
    source: str                         # "app" | "portal" | "admin"
    ip_address: str | None
    version: str | None                 # Version of the consent text

Record a consent decision:

curl -X POST "http://localhost:5001/gdpr/consent" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "user_id": "664a1234...",
    "consent_type": "marketing",
    "is_granted": true,
    "source": "app",
    "ip_address": "192.168.1.10",
    "version": "v1.0"
  }'

Check the current state per consent type:

curl "http://localhost:5001/gdpr/consents/664a1234.../status" \
  -H "Authorization: Bearer $TOKEN"

Response:

{
  "user_id": "664a1234...",
  "consents": {
    "data_processing": {
      "granted": true,
      "last_updated": "2026-04-05T12:00:00+00:00",
      "source": "app",
      "version": "v1.0"
    },
    "marketing": {
      "granted": false,
      "last_updated": "2026-04-04T15:30:00+00:00",
      "source": "portal",
      "version": "v1.0"
    }
  }
}

The response reflects the most recent record per consent type — that's the current state.
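Deriving that per-type state from the append-only log can be sketched as below, assuming each record carries some monotonically increasing ordering key (called created_at here); the record shape loosely mirrors ConsentRecord.

```python
def consent_status(records: list) -> dict:
    """Reduce an append-only log to the latest record per consent type."""
    latest = {}
    for rec in sorted(records, key=lambda r: r["created_at"]):
        latest[rec["consent_type"]] = rec  # later records win
    return {
        ctype: {"granted": rec["is_granted"], "source": rec["source"]}
        for ctype, rec in latest.items()
    }
```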

An authenticated user can withdraw their own consent:

curl -X POST "http://localhost:5001/gdpr/consents/withdraw" \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"consent_type": "marketing", "source": "app"}'

Use require_consent() as a guard at the top of any endpoint that processes personal data:

from craft_easy.core.gdpr.service import require_consent

@router.get("/marketing/newsletter-signup")
async def signup(user_id: str = Depends(current_user_id)):
    await require_consent(user_id, "marketing")
    # If we're here, the user has granted marketing consent
    ...

If the user does not have active consent, the guard raises HTTPException(403) with a structured body:

{
  "error": "consent_required",
  "consent_type": "marketing",
  "message": "Active consent for 'marketing' is required."
}
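The guard's behaviour can be sketched as a stdlib stand-in with a simplified signature (a consent map instead of a user lookup); ConsentRequired plays the role of FastAPI's HTTPException(403).

```python
class ConsentRequired(Exception):
    """403-style error carrying the structured body shown above."""
    def __init__(self, consent_type: str):
        self.status_code = 403
        self.detail = {
            "error": "consent_required",
            "consent_type": consent_type,
            "message": f"Active consent for '{consent_type}' is required.",
        }
        super().__init__(self.detail["message"])

def require_consent(current: dict, consent_type: str) -> None:
    """Raise unless the user's current state grants the consent type."""
    if not current.get(consent_type):
        raise ConsentRequired(consent_type)
```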

GDPR filtering in BI exports

The BI export pipeline honours GDPR tags when exporting to external warehouses. Each BIExportConfig has a gdpr_mode:

Mode       Effect on GDPR-tagged fields
exclude    Removed from the exported row
anonymize  Replaced with "***"
hash       Replaced with the first 16 chars of the SHA-256 hash — deterministic, so joins still work without leaking values

See BI Export for the full flow. The key point is that you never have to rewrite your warehouse queries to comply with GDPR — the filtering happens before rows leave MongoDB.
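The three modes applied to a single row can be sketched as follows; the 16-char SHA-256 prefix matches the hash behaviour described above, while apply_gdpr_mode itself is a hypothetical helper name.

```python
import hashlib

def apply_gdpr_mode(row: dict, tagged: set, mode: str) -> dict:
    """Filter one export row according to the configured gdpr_mode."""
    out = {}
    for key, value in row.items():
        if key not in tagged:
            out[key] = value
        elif mode == "exclude":
            continue  # drop the column entirely
        elif mode == "anonymize":
            out[key] = "***"
        elif mode == "hash":
            # deterministic prefix: identical inputs hash identically,
            # so joins across tables still line up
            out[key] = hashlib.sha256(str(value).encode()).hexdigest()[:16]
    return out
```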

BaseDocument flags

Two fields are inherited by every BaseDocument subclass and used by the GDPR subsystem:

Field              Purpose
is_depersonalized  Set to True after depersonalisation; used to skip already-processed documents
depersonalized_at  Timestamp of the depersonalisation

Both fields are declared with exclude=True in their Pydantic field definitions, so they never appear in API responses — they are internal bookkeeping.

Endpoint reference

Method  Path                                          Purpose
GET     /gdpr/schema                                  List every GDPR-tagged field across all models
GET     /gdpr/user-data/{user_id}                     Collect all user data for portability
GET     /gdpr/export/{user_id}?format=json|csv        Download data subject export
POST    /gdpr/depersonalize/{user_id}                 Depersonalise all user data
POST    /gdpr/depersonalize/{collection}/{item_id}    Depersonalise a single document
POST    /gdpr/erasure-request/{user_id}               Record an erasure request (awaits approval)
POST    /gdpr/erasure-execute/{user_id}               Execute full erasure (depersonalise + revoke + disable)
GET     /gdpr/consents/{user_id}                      List all consent records
GET     /gdpr/consents/{user_id}/status               Current consent state per type
GET     /gdpr/consent-log/{user_id}                   Full consent history
POST    /gdpr/consent                                 Record a consent decision
POST    /gdpr/consents/withdraw                       Withdraw consent (authenticated user)

All endpoints require authentication and produce Audit entries automatically.