Retrieve Invoice Data
/v1/retrieve
1 Credit
Extract structured invoice data from existing e-invoice documents. Returns JSON in the same format used by /v1/generate.
Supported Input Formats
The API extracts invoice data from these e-invoice formats:
- ZUGFeRD 2.x, Factur-X 1.x, and XRechnung with PDF wrapper. The XML attachment is extracted from the PDF.
- Standalone CII XML files (the basis for ZUGFeRD/Factur-X).
- UBL Invoice, XRechnung, and Peppol BIS Billing 3.0 XML files.
- Italian SDI e-invoicing format XML files.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| data_base64 | string | Yes | Base64-encoded PDF or XML file content |
| content_type | string | No | application/pdf, application/xml, or text/xml. Auto-detected if omitted. |
| include_source_xml | boolean | No | If true, includes raw XML in response (default: false) |
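
As an illustration of the parameters above, the following sketch builds a request body from raw file bytes. The helper name `build_retrieve_payload` and the extension-to-content-type mapping are assumptions made for this example, not part of the API; when the extension is unknown, `content_type` is left out so the server's auto-detection applies.

```python
import base64
from pathlib import Path

# Assumption: a simple extension lookup; the API itself auto-detects
# the content type when the field is omitted.
CONTENT_TYPES = {".pdf": "application/pdf", ".xml": "application/xml"}


def build_retrieve_payload(data: bytes, filename: str = "",
                           include_source_xml: bool = False) -> dict:
    """Build the JSON body for /v1/retrieve from raw file bytes."""
    payload = {
        "data_base64": base64.b64encode(data).decode("ascii"),
        "include_source_xml": include_source_xml,
    }
    content_type = CONTENT_TYPES.get(Path(filename).suffix.lower())
    if content_type is not None:
        payload["content_type"] = content_type
    return payload
```

The returned dictionary can be passed as the `json=` argument of an HTTP client such as `requests.post`.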
Response Format
The response contains extracted invoice data in the same structure as /v1/generate input, enabling roundtrip workflows.
| Field | Type | Description |
|---|---|---|
| valid | boolean | true if extraction succeeded |
| format | object | Detected format info: detected_format, profile, version |
| invoice | object | Extracted invoice data (same structure as /v1/generate input) |
| source_xml_base64 | string | Raw XML (only if include_source_xml=true) |
| errors | array | List of extraction errors (if any) |
| warnings | array | List of non-fatal warnings |
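
A minimal sketch of how a client might unpack these fields from an already-parsed response dictionary. The function name `summarize_retrieve_response` and the shape of the returned summary are hypothetical. The sketch assumes, based on the field name, that `source_xml_base64` holds base64-encoded bytes, and it keeps the decoded XML as bytes rather than guessing a text encoding.

```python
import base64


def summarize_retrieve_response(response: dict) -> dict:
    """Collect the documented response fields into one flat summary."""
    fmt = response.get("format") or {}
    summary = {
        "ok": bool(response.get("valid")),
        "detected_format": fmt.get("detected_format"),
        "profile": fmt.get("profile"),
        "version": fmt.get("version"),
        "invoice": response.get("invoice"),
        "source_xml": None,
        "errors": response.get("errors") or [],
        "warnings": response.get("warnings") or [],
    }
    # Only present when the request set include_source_xml=true
    raw = response.get("source_xml_base64")
    if raw:
        summary["source_xml"] = base64.b64decode(raw)
    return summary
```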
Example Request
```bash
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "data_base64": "JVBERi0xLjQKJeLjz9MK...",
    "content_type": "application/pdf",
    "include_source_xml": false
  }'
```
Example Response
```json
{
  "valid": true,
  "format": {
    "detected_format": "zugferd",
    "profile": "EN16931",
    "version": "2.3",
    "xml_type": "CII",
    "has_pdf": true
  },
  "invoice": {
    "number": "2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-14",
    "currency": "EUR",
    "seller": {
      "name": "Acme GmbH",
      "street": "Hauptstraße 1",
      "city": "Berlin",
      "postal_code": "10115",
      "country": "DE",
      "vat_id": "DE123456789"
    },
    "buyer": {
      "name": "Customer AG",
      "street": "Marienplatz 1",
      "city": "München",
      "postal_code": "80331",
      "country": "DE",
      "vat_id": "DE987654321"
    },
    "items": [
      {
        "description": "Consulting Services",
        "quantity": 10,
        "unit": "HUR",
        "unit_price": 150.00,
        "vat_rate": 19.0
      }
    ],
    "payment": {
      "iban": "DE89370400440532013000",
      "bic": "COBADEFFXXX"
    }
  },
  "transaction_id": "tx_abc123",
  "errors": [],
  "warnings": []
}
```
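For accounting integrations, the extracted `items` array can be turned into totals on the client side. The sketch below is illustrative only: it assumes `unit_price` is a net amount, that `vat_rate` is a percentage, and that rounding to two decimals happens once on the totals. None of this is confirmed by the API description, and the helper name `invoice_totals` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def invoice_totals(invoice: dict) -> dict:
    """Sum net, VAT and gross amounts over the invoice line items."""
    net = Decimal("0")
    vat = Decimal("0")
    for item in invoice.get("items", []):
        # Convert via str() so JSON floats do not carry binary noise
        line_net = Decimal(str(item["quantity"])) * Decimal(str(item["unit_price"]))
        net += line_net
        vat += line_net * Decimal(str(item["vat_rate"])) / Decimal("100")
    cent = Decimal("0.01")
    net = net.quantize(cent, rounding=ROUND_HALF_UP)
    vat = vat.quantize(cent, rounding=ROUND_HALF_UP)
    return {"net": net, "vat": vat, "gross": net + vat}


example = {"items": [{"quantity": 10, "unit_price": 150.00, "vat_rate": 19.0}]}
totals = invoice_totals(example)
print(totals["net"], totals["vat"], totals["gross"])  # 1500.00 285.00 1785.00
```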
Roundtrip Workflow
Because the /v1/retrieve response uses the same invoice structure as /v1/generate input, extracted data can be passed straight back to /v1/generate. Typical uses:
- Convert invoice format (e.g., ZUGFeRD to XRechnung)
- Apply a different PDF template to existing invoices
- Extract data for accounting system integration
- Validate and re-generate invoices with corrections
```python
import base64

import requests

# Step 1: Extract data from existing invoice
with open("original.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode()

retrieve_response = requests.post(
    "https://api.thelawin.dev/v1/retrieve",
    headers={"Authorization": "Bearer your_api_key"},
    json={"data_base64": pdf_base64, "content_type": "application/pdf"},
).json()

# Stop early if extraction failed; "invoice" is null in that case
if not retrieve_response["valid"]:
    raise RuntimeError(f"Extraction failed: {retrieve_response['errors']}")

# Step 2: Modify and regenerate with different format/template
invoice_data = retrieve_response["invoice"]
invoice_data["notes"] = "Regenerated invoice"

generate_response = requests.post(
    "https://api.thelawin.dev/v1/generate",
    headers={"Authorization": "Bearer your_api_key"},
    json={
        "format": "xrechnung",   # Convert to XRechnung
        "template": "compact",   # Different template
        "invoice": invoice_data,
    },
).json()

# Save new PDF
with open("converted.pdf", "wb") as f:
    f.write(base64.b64decode(generate_response["pdf_base64"]))
```
Error Handling
When extraction fails, the response contains details about what went wrong:
```json
{
  "valid": false,
  "format": {
    "detected_format": "unknown",
    "has_pdf": true
  },
  "invoice": null,
  "errors": [
    {
      "code": "NO_EMBEDDED_XML",
      "message": "PDF does not contain embedded e-invoice XML",
      "severity": "error"
    }
  ],
  "warnings": []
}
```
Common Error Codes
| Code | Description |
|---|---|
| NO_EMBEDDED_XML | PDF doesn't contain e-invoice XML attachment (plain PDF) |
| INVALID_XML | XML is malformed or not a valid invoice format |
| UNSUPPORTED_FORMAT | XML format is not recognized/supported |
| SCHEMA_ERROR | XML doesn't conform to expected schema |
| MISSING_FIELD | Required invoice field is missing in source |
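
One way a client could surface these codes is to raise an exception that carries them, so that callers can branch on the code rather than parse the message. This is a sketch under assumptions: `RetrieveError` and `require_invoice` are hypothetical names, not part of any SDK, and the error objects are assumed to have the `code` and `message` keys shown in the failure example.

```python
class RetrieveError(Exception):
    """Raised when /v1/retrieve reports that extraction failed."""

    def __init__(self, errors: list):
        self.errors = errors
        self.codes = [e.get("code") for e in errors]
        detail = "; ".join(f'{e.get("code")}: {e.get("message")}' for e in errors)
        super().__init__(detail or "extraction failed")


def require_invoice(response: dict) -> dict:
    """Return the extracted invoice, or raise RetrieveError with the error list."""
    if response.get("valid") and response.get("invoice") is not None:
        return response["invoice"]
    raise RetrieveError(response.get("errors") or [])
```

A caller might, for example, catch `RetrieveError` and check whether `"NO_EMBEDDED_XML"` is in `exc.codes` to tell the user that the uploaded file is a plain PDF.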