Bank Parsers

The craft-easy-file-import package includes parsers for two Swedish/European bank formats (BgMax and SEPA camt.054) plus generic parsers for CSV, JSON, and positional flat files. Every parser returns strongly typed dataclasses that can be fed straight into the reconciliation or generic import pipelines.

BgMax — Swedish Bankgirot

BgMax is the fixed-width format Bankgirot uses to deliver incoming Swedish OCR and reference payments. It is a hierarchical format with numeric record type prefixes:

Record  Purpose
01      File header (file date, version)
05      Payment section header (bankgiro number, currency)
20      Payment detail (positive amount)
25      Deduction / reversal (negative amount)
30/40   Reference information (OCR, free-text)
50/60   Deposit summary
70      File footer (totals)
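The prefix-based hierarchy can be illustrated with plain string slicing. The sketch below (stdlib only, not the package's implementation; the sample lines are shaped like BgMax records but are not a valid file) groups 20/25 detail records under the preceding 05 section:

```python
# Minimal sketch of BgMax-style record dispatch (illustration only).
# Each line starts with a two-character record-type code.
SAMPLE = """\
01BGMAX               0120230101120000
05000991234567                    SEK
20000012345678901234567890000000015000
70000001000000015000"""

def classify(lines):
    """Group detail lines (20/25) under the current 05 section."""
    sections = []
    for line in lines:
        prefix = line[:2]
        if prefix == "05":
            sections.append({"header": line, "payments": []})
        elif prefix in ("20", "25") and sections:
            sections[-1]["payments"].append(line)
    return sections

sections = classify(SAMPLE.splitlines())
print(len(sections), len(sections[0]["payments"]))  # 1 1
```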

Parsing a BgMax file

from craft_easy_file_import.parsers.banking import BgMaxParser

parser = BgMaxParser()

with open("incoming.bgm", "r", encoding="latin-1") as f:
    result = parser.parse(f.read())

# Or directly from disk
result = parser.parse_file("incoming.bgm")

print(f"File date: {result.file_date}")
print(f"Bankgiro: {result.bankgiro_number}")
print(f"Total: {result.total_amount} SEK ({result.total_count} payments)")

for payment in result.payments:
    print(
        f"  {payment.amount} SEK  "
        f"OCR={payment.reference}  "
        f"from={payment.sender_name}  "
        f"channel={payment.payment_channel}"
    )

The BgMaxPayment dataclass

@dataclass
class BgMaxPayment:
    amount: Decimal                         # SEK, negative for type-25 deductions
    reference: str | None                   # OCR or free reference
    sender_account: str | None
    sender_name: str | None
    payment_date: datetime | None
    bank_reference: str | None
    bankgiro_number: str | None
    payment_channel: str | None             # see table below
    serial_number: int | None
    line_number: int

payment_channel maps the BgMax channel code to a string:

Code  Channel
0     unspecified
1     ocr
2     bankgiro_link
3     self_service
4     electronic
5     bankgiro_direct

The BgMaxResult

@dataclass
class BgMaxResult:
    payments: list[BgMaxPayment]
    file_date: datetime | None
    bankgiro_number: str | None
    total_amount: Decimal
    total_count: int
    deposit_amount: Decimal
    deposit_date: datetime | None
    errors: list[str]

Parsing never throws on malformed lines — any issue is collected into result.errors so you can log it and continue processing good payments. Always check result.errors after parsing.
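The collect-don't-raise pattern looks roughly like this (a standalone sketch of the idea, not the package's code):

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation

@dataclass
class Result:
    amounts: list[Decimal] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

def parse_amounts(lines: list[str]) -> Result:
    """Parse what we can; record failures instead of raising."""
    result = Result()
    for i, line in enumerate(lines, start=1):
        try:
            result.amounts.append(Decimal(line.strip()))
        except InvalidOperation:
            result.errors.append(f"line {i}: cannot parse {line!r}")
    return result

r = parse_amounts(["100.50", "garbage", "-25.00"])
print(len(r.amounts), len(r.errors))  # 2 1
```

The caller then logs `r.errors` and carries on with the rows that did parse, which is exactly how `result.errors` is meant to be used here.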

SEPA camt.054

SEPA camt.054 (ISO 20022 BankToCustomerDebitCreditNotification) is the XML format European banks use to notify customers of credits and debits. The parser handles multi-version namespaces (v2, v8, v9).

Parsing a SEPA file

from craft_easy_file_import.parsers.banking import SEPAParser

parser = SEPAParser()
result = parser.parse_file("camt.054.xml", encoding="utf-8")

print(f"Message ID: {result.message_id}")
print(f"Account: {result.account_iban}")
print(f"Total credit: {result.total_credit}  debit: {result.total_debit}")

for payment in result.payments:
    if payment.credit_debit == "CRDT":
        print(
            f"  +{payment.amount} {payment.currency}  "
            f"ref={payment.reference}  "
            f"from={payment.sender_name} ({payment.sender_iban})"
        )

The SEPAPayment dataclass

@dataclass
class SEPAPayment:
    amount: Decimal
    currency: str                           # e.g. "EUR"
    reference: str | None
    end_to_end_id: str | None               # <EndToEndId>
    sender_name: str | None
    sender_iban: str | None
    sender_bic: str | None
    payment_date: datetime | None
    booking_date: datetime | None
    bank_reference: str | None
    credit_debit: str                       # "CRDT" or "DBIT"
    remittance_info: str | None

For reconciliation you usually want to filter for credits only (credit_debit == "CRDT") — the reconciliation pipeline does this for you automatically.

Multi-version namespace handling

The parser reads the XML namespace from the root element and matches it against known namespaces. Supported versions: camt.054.001.02, camt.054.001.08, camt.054.001.09. If a newer version arrives, the parser will fall back to a best-effort element lookup; anything that can't be parsed is collected in result.errors.
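Reading the namespace off the root element is plain ElementTree work. A hedged sketch of the version check (the parser's actual internals may differ):

```python
import xml.etree.ElementTree as ET

# Versions the docs list as supported.
KNOWN = {"camt.054.001.02", "camt.054.001.08", "camt.054.001.09"}

def detect_version(xml_text: str) -> tuple[str, bool]:
    """Return (version, is_known) from the root element's namespace."""
    root = ET.fromstring(xml_text)
    # The root tag looks like
    # '{urn:iso:std:iso:20022:tech:xsd:camt.054.001.08}Document'.
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    version = namespace.rsplit(":", 1)[-1]
    return version, version in KNOWN

doc = '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.08"/>'
print(detect_version(doc))  # ('camt.054.001.08', True)
```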

Generic parsers

Alongside the bank-specific parsers, the package includes three general-purpose file parsers.

CSV

from craft_easy_file_import import CSVParser, CSVParserConfig
from craft_easy_file_import.parsers.csv_parser import CSVDialect, CSVCoercion

parser = CSVParser(CSVParserConfig(
    encoding="utf-8",
    dialect=CSVDialect(
        delimiter=";",
        quotechar='"',
        escapechar="\\",
        skipinitialspace=True,
    ),
    columns=None,                                    # infer from header
    coercions=[
        CSVCoercion(column="amount", type="float", float_separator=","),
        CSVCoercion(column="date", type="date", date_format="%Y-%m-%d"),
        CSVCoercion(column="quantity", type="integer"),
    ],
))

rows = parser.parse_file("data.csv")

coercions apply per-column type conversions after parsing. A failed coercion is logged but does not abort the file.
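A coercion such as `type="float"` with `float_separator=","` boils down to a small conversion step per cell. A standalone sketch of the idea (not the package's implementation):

```python
from datetime import datetime

def coerce_float(value: str, float_separator: str = ".") -> float:
    """Convert '1 234,56'-style values: strip spaces, normalise the
    decimal separator to '.', then parse."""
    return float(value.replace(" ", "").replace(float_separator, "."))

def coerce_date(value: str, date_format: str) -> datetime:
    return datetime.strptime(value, date_format)

print(coerce_float("1 234,56", float_separator=","))  # 1234.56
print(coerce_date("2024-03-01", "%Y-%m-%d").month)    # 3
```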

JSON / JSONL

from craft_easy_file_import import JSONParser, JSONParserConfig

# Top-level array
parser = JSONParser(JSONParserConfig())
rows = parser.parse_file("data.json")

# Nested array — use dot notation
parser = JSONParser(JSONParserConfig(records_path="data.transactions"))
rows = parser.parse_file("response.json")

# JSON Lines (one object per line)
parser = JSONParser(JSONParserConfig(jsonl=True))
rows = parser.parse_file("events.jsonl")
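The `records_path="data.transactions"` lookup is a dot-notation walk over the decoded JSON; roughly (illustrative, not the package's code):

```python
import json

def resolve_path(obj, path):
    """Walk nested dicts by dot-separated keys; falsy path = top level."""
    if not path:
        return obj
    for key in path.split("."):
        obj = obj[key]
    return obj

payload = json.loads('{"data": {"transactions": [{"id": 1}, {"id": 2}]}}')
rows = resolve_path(payload, "data.transactions")
print(len(rows))  # 2
```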

Positional flat file

The FlatFileParser powers the BgMax parser and can be used directly for any fixed-width format with a known schema. You describe each record type (prefix, length, field positions) and the parser extracts hierarchical records:

from craft_easy_file_import import FlatFileParser, FlatFileConfig, PostType, PostField

config = FlatFileConfig(
    encoding="utf-8",
    post_types=[
        PostType(
            prefix="01",
            length=80,
            fields=[
                PostField(name="record_id", start_position=1, end_position=3, type="string"),
                PostField(name="file_date", start_position=3, end_position=11, type="date", date_format="date"),
            ],
            child_prefixes=["05"],
        ),
        PostType(
            prefix="05",
            length=80,
            fields=[
                PostField(name="record_id", start_position=1, end_position=3, type="string"),
                PostField(name="amount", start_position=11, end_position=25, type="float", float_decimals=2),
            ],
        ),
    ],
)
parser = FlatFileParser(config)
records = parser.parse_file("file.dat")

for header in records:
    print(header.fields["file_date"])
    for child in header.children:      # "05" records under this "01"
        print("  ", child.fields["amount"])

PostField.type supports string, integer, float, and date. date_format accepts the shortcut strings "date" (%Y%m%d), "time" (%H%M%S), and "datetime" (%Y%m%d%H%M%S), or any custom strftime pattern.
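Positional extraction itself is just slicing. The sketch below assumes a 1-based start_position and an exclusive end_position, which matches the config above (record_id at positions 1-3 covering the two characters "01"); treat that convention as an assumption:

```python
# Sketch of fixed-width field extraction. Assumes 1-based
# start_position and exclusive end_position (consistent with the
# "01" record_id and YYYYMMDD file_date in the config above).
from datetime import datetime

def extract(line: str, start_position: int, end_position: int) -> str:
    return line[start_position - 1:end_position - 1]

line = "01" + "20240301" + " " * 70      # an 80-char "01"-style record
record_id = extract(line, 1, 3)
file_date = datetime.strptime(extract(line, 3, 11), "%Y%m%d")
print(record_id, file_date.date())  # 01 2024-03-01
```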

Writing your own parser

For any format not covered out of the box (proprietary banking formats, SAP IDoc, EDI, XLSX), write a thin adapter. A parser is just a class with a parse() method that returns a list of dicts:

class MyFormatParser:
    def __init__(self, config):
        self.config = config

    def parse(self, content: str) -> list[dict]:
        rows = []
        for line in content.splitlines():
            rows.append(self._parse_line(line))
        return rows

    def parse_file(self, file_path: str) -> list[dict]:
        with open(file_path, "r", encoding=self.config.encoding) as f:
            return self.parse(f.read())

    def _parse_line(self, line: str) -> dict:
        ...

Pass this to GenericImportPipeline(parser=MyFormatParser(...), on_row=...) and you're done. The pipeline expects only a parse(content) method — no base class, no registration, no plugin system.
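As a concrete end-to-end example, here is a complete toy parser for a hypothetical "key=value;key=value" line format that satisfies the duck-typed contract (the format is made up for illustration):

```python
# A complete toy parser: parse(content) -> list[dict] is the whole
# contract. The key=value;key=value format is invented for this demo.
class KeyValueLineParser:
    def parse(self, content: str) -> list[dict]:
        rows = []
        for line in content.splitlines():
            if not line.strip():
                continue  # skip blank lines
            rows.append(dict(pair.split("=", 1) for pair in line.split(";")))
        return rows

parser = KeyValueLineParser()
rows = parser.parse("amount=100;ref=OCR123\namount=250;ref=OCR456")
print(rows[0]["amount"], rows[1]["ref"])  # 100 OCR456
```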