Bank Parsers¶
The craft-easy-file-import package includes parsers for two Swedish/European bank formats (BgMax and SEPA camt.054) plus generic parsers for CSV, JSON, and positional flat files. Every parser returns strongly typed dataclasses that can be fed straight into the reconciliation or generic import pipelines.
BgMax — Swedish Bankgirot¶
BgMax is the fixed-width format Bankgirot uses to deliver incoming Swedish OCR and reference payments. It is a hierarchical format with numeric record type prefixes:
| Record | Purpose |
|---|---|
| 01 | File header (file date, version) |
| 05 | Payment section header (bankgiro number, currency) |
| 20 | Payment detail (positive amount) |
| 25 | Deduction / reversal (negative amount) |
| 30/40 | Reference information (OCR, free-text) |
| 50/60 | Deposit summary |
| 70 | File footer (totals) |
Parsing a BgMax file¶
```python
from craft_easy_file_import.parsers.banking import BgMaxParser

parser = BgMaxParser()

with open("incoming.bgm", "r", encoding="latin-1") as f:
    result = parser.parse(f.read())

# Or directly from disk
result = parser.parse_file("incoming.bgm")

print(f"File date: {result.file_date}")
print(f"Bankgiro: {result.bankgiro_number}")
print(f"Total: {result.total_amount} SEK ({result.total_count} payments)")

for payment in result.payments:
    print(
        f"  {payment.amount} SEK "
        f"OCR={payment.reference} "
        f"from={payment.sender_name} "
        f"channel={payment.payment_channel}"
    )
```
The BgMaxPayment dataclass¶
```python
@dataclass
class BgMaxPayment:
    amount: Decimal              # SEK, negative for type-25 deductions
    reference: str | None        # OCR or free reference
    sender_account: str | None
    sender_name: str | None
    payment_date: datetime | None
    bank_reference: str | None
    bankgiro_number: str | None
    payment_channel: str | None  # see table below
    serial_number: int | None
    line_number: int
```
payment_channel maps the BgMax channel code to a string:
| Code | Channel |
|---|---|
| 0 | unspecified |
| 1 | ocr |
| 2 | bankgiro_link |
| 3 | self_service |
| 4 | electronic |
| 5 | bankgiro_direct |
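The mapping in the table can be expressed as a plain dict. This is a sketch based on the table above, not the package's internal code:

```python
# Channel-code mapping taken from the table above (sketch, not the
# package's actual lookup).
BGMAX_CHANNELS = {
    0: "unspecified",
    1: "ocr",
    2: "bankgiro_link",
    3: "self_service",
    4: "electronic",
    5: "bankgiro_direct",
}

def channel_name(code: int) -> str:
    # In this sketch, unknown codes fall back to "unspecified".
    return BGMAX_CHANNELS.get(code, "unspecified")

print(channel_name(1))  # ocr
```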
The BgMaxResult¶
```python
@dataclass
class BgMaxResult:
    payments: list[BgMaxPayment]
    file_date: datetime | None
    bankgiro_number: str | None
    total_amount: Decimal
    total_count: int
    deposit_amount: Decimal
    deposit_date: datetime | None
    errors: list[str]
```
Parsing never raises on malformed lines; any issue is collected into result.errors so you can log it and continue processing the good payments. Always check result.errors after parsing.
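A typical handling pattern looks like the sketch below. The Result stub stands in for the real BgMaxResult, and logging is reduced to print, purely for illustration:

```python
from dataclasses import dataclass, field

# Stub mirroring the two BgMaxResult fields used below; the real class
# comes from craft_easy_file_import.parsers.banking.
@dataclass
class Result:
    payments: list = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

def process(result: Result) -> list:
    # Surface every malformed line, then continue with the payments
    # that did parse correctly.
    for err in result.errors:
        print(f"BgMax parse warning: {err}")
    return result.payments

good = process(Result(payments=["payment-1"], errors=["line 7: bad amount"]))
```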
SEPA camt.054¶
SEPA camt.054 (ISO 20022 BankToCustomerDebitCreditNotification) is the XML format European banks use to notify customers of credits and debits. The parser handles multi-version namespaces (v2, v8, v9).
Parsing a SEPA file¶
```python
from craft_easy_file_import.parsers.banking import SEPAParser

parser = SEPAParser()
result = parser.parse_file("camt.054.xml", encoding="utf-8")

print(f"Message ID: {result.message_id}")
print(f"Account: {result.account_iban}")
print(f"Total credit: {result.total_credit} debit: {result.total_debit}")

for payment in result.payments:
    if payment.credit_debit == "CRDT":
        print(
            f"  +{payment.amount} {payment.currency} "
            f"ref={payment.reference} "
            f"from={payment.sender_name} ({payment.sender_iban})"
        )
```
The SEPAPayment dataclass¶
```python
@dataclass
class SEPAPayment:
    amount: Decimal
    currency: str                # e.g. "EUR"
    reference: str | None
    end_to_end_id: str | None    # <EndToEndId>
    sender_name: str | None
    sender_iban: str | None
    sender_bic: str | None
    payment_date: datetime | None
    booking_date: datetime | None
    bank_reference: str | None
    credit_debit: str            # "CRDT" or "DBIT"
    remittance_info: str | None
```
For reconciliation you usually want to filter for credits only (credit_debit == "CRDT") — the reconciliation pipeline does this for you automatically.
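If you need the same filter outside the pipeline, it is a one-liner. Payments are shown here as plain dicts to keep the sketch self-contained:

```python
def credits_only(payments: list[dict]) -> list[dict]:
    # Keep only incoming money; "DBIT" entries are outgoing debits.
    return [p for p in payments if p["credit_debit"] == "CRDT"]

payments = [
    {"amount": "10.00", "credit_debit": "CRDT"},
    {"amount": "3.50", "credit_debit": "DBIT"},
]
print(credits_only(payments))  # [{'amount': '10.00', 'credit_debit': 'CRDT'}]
```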
Multi-version namespace handling¶
The parser reads the XML namespace from the root element and matches it against known namespaces. Supported versions: camt.054.001.02, camt.054.001.08, camt.054.001.09. If a newer version arrives, the parser will fall back to a best-effort element lookup; anything that can't be parsed is collected in result.errors.
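One way such detection can work, sketched with the standard library (this is not the package's actual code, just an illustration of the mechanism):

```python
import xml.etree.ElementTree as ET

# The three namespaces the docs list as supported.
KNOWN_NAMESPACES = {
    "urn:iso:std:iso:20022:tech:xsd:camt.054.001.02",
    "urn:iso:std:iso:20022:tech:xsd:camt.054.001.08",
    "urn:iso:std:iso:20022:tech:xsd:camt.054.001.09",
}

def detect_namespace(xml_text: str) -> str:
    # ElementTree renders a namespaced tag as "{namespace}localname",
    # so the namespace can be read straight off the root element.
    root = ET.fromstring(xml_text)
    if root.tag.startswith("{"):
        return root.tag[1 : root.tag.index("}")]
    return ""

ns = detect_namespace(
    '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.054.001.08"/>'
)
print(ns in KNOWN_NAMESPACES)  # True
```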
Generic parsers¶
Alongside the bank-specific parsers, the package includes three general-purpose file parsers.
CSV¶
```python
from craft_easy_file_import import CSVParser, CSVParserConfig
from craft_easy_file_import.parsers.csv_parser import CSVDialect, CSVCoercion

parser = CSVParser(CSVParserConfig(
    encoding="utf-8",
    dialect=CSVDialect(
        delimiter=";",
        quotechar='"',
        escapechar="\\",
        skipinitialspace=True,
    ),
    columns=None,  # infer from header
    coercions=[
        CSVCoercion(column="amount", type="float", float_separator=","),
        CSVCoercion(column="date", type="date", date_format="%Y-%m-%d"),
        CSVCoercion(column="quantity", type="integer"),
    ],
))

rows = parser.parse_file("data.csv")
```
coercions apply per-column type conversions after parsing. A failed coercion is logged but does not abort the file.
JSON / JSONL¶
```python
from craft_easy_file_import import JSONParser, JSONParserConfig

# Top-level array
parser = JSONParser(JSONParserConfig())
rows = parser.parse_file("data.json")

# Nested array — use dot notation
parser = JSONParser(JSONParserConfig(records_path="data.transactions"))
rows = parser.parse_file("response.json")

# JSON Lines (one object per line)
parser = JSONParser(JSONParserConfig(jsonl=True))
rows = parser.parse_file("events.jsonl")
```
Positional flat file¶
The FlatFileParser powers the BgMax parser and can be used directly for any fixed-width format with a known schema. You describe each record type (prefix, length, field positions) and the parser extracts hierarchical records:
```python
from craft_easy_file_import import FlatFileParser, FlatFileConfig, PostType, PostField

config = FlatFileConfig(
    encoding="utf-8",
    post_types=[
        PostType(
            prefix="01",
            length=80,
            fields=[
                PostField(name="record_id", start_position=1, end_position=3, type="string"),
                PostField(name="file_date", start_position=3, end_position=11, type="date", date_format="date"),
            ],
            child_prefixes=["05"],
        ),
        PostType(
            prefix="05",
            length=80,
            fields=[
                PostField(name="record_id", start_position=1, end_position=3, type="string"),
                PostField(name="amount", start_position=11, end_position=25, type="float", float_decimals=2),
            ],
        ),
    ],
)

parser = FlatFileParser(config)
records = parser.parse_file("file.dat")

for header in records:
    print(header.fields["file_date"])
    for child in header.children:  # "05" records under this "01"
        print("  ", child.fields["amount"])
```
PostField.type supports string, integer, float, date. date_format accepts shortcut strings ("date" → %Y%m%d, "time" → %H%M%S, "datetime" → %Y%m%d%H%M%S) or any custom strftime pattern.
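Two conventions in the config above are worth making concrete. The sketch below assumes positions are 1-based and end-exclusive (which matches record_id at 1..3 covering a two-character prefix) and assumes float_decimals means implied decimals in a digits-only field, a common fixed-width convention. Both are assumptions for illustration, not confirmed package behaviour:

```python
from decimal import Decimal

def extract_field(line: str, start_position: int, end_position: int) -> str:
    # Assumption: 1-based start, end-exclusive, so positions 1..3 yield
    # the two characters at indices 0 and 1.
    return line[start_position - 1 : end_position - 1].strip()

def implied_decimal(raw: str, float_decimals: int) -> Decimal:
    # Assumption: "00000000012345" with float_decimals=2 means 123.45.
    return Decimal(raw) / (10 ** float_decimals)

line = "05" + "20250101" + "00000000012345"   # prefix, date, amount field
print(extract_field(line, 1, 3))              # 05
print(extract_field(line, 3, 11))             # 20250101
print(implied_decimal(extract_field(line, 11, 25), 2))  # 123.45
```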
Writing your own parser¶
For any format not covered out of the box (proprietary banking formats, SAP IDoc, EDI, XLSX), write a thin adapter. A parser is just a class with a parse() method that returns a list of dicts:
```python
class MyFormatParser:
    def __init__(self, config):
        self.config = config

    def parse(self, content: str) -> list[dict]:
        rows = []
        for line in content.splitlines():
            rows.append(self._parse_line(line))
        return rows

    def parse_file(self, file_path: str) -> list[dict]:
        with open(file_path, "r", encoding=self.config.encoding) as f:
            return self.parse(f.read())

    def _parse_line(self, line: str) -> dict:
        ...
```
Pass this to GenericImportPipeline(parser=MyFormatParser(...), on_row=...) and you're done. The pipeline expects only a parse(content) method — no base class, no registration, no plugin system.
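As a concrete (toy) illustration of the contract, here is a complete adapter for pipe-delimited "reference|amount" lines. Everything in it is made up for the example; only the parse(content) -> list[dict] shape matters:

```python
# Toy adapter satisfying the pipeline contract: a parse() method that
# turns raw content into a list of dicts. Format and field names are
# invented for illustration.
class PipeParser:
    def parse(self, content: str) -> list[dict]:
        rows = []
        for line in content.splitlines():
            if not line.strip():
                continue  # skip blank lines
            ref, amount = line.split("|", 1)
            rows.append({"reference": ref, "amount": amount})
        return rows

rows = PipeParser().parse("INV-1|100.00\nINV-2|250.50")
print(rows[0])  # {'reference': 'INV-1', 'amount': '100.00'}
```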