Retrieve Invoice Data

POST /v1/retrieve 1 Credit

Extract structured invoice data from existing e-invoice documents. Returns JSON in the same format used by /v1/generate.

Important: This is metadata extraction from valid e-invoice formats with embedded XML, not OCR. Plain PDFs without embedded e-invoice XML are not supported. The endpoint reads structured machine-readable data only.

Supported Input Formats

The API extracts invoice data from these e-invoice formats:

PDF
PDF/A-3 with embedded XML

ZUGFeRD 2.x, Factur-X 1.x, XRechnung with PDF wrapper. Extracts XML attachment from PDF.

CII XML
UN/CEFACT Cross-Industry Invoice

Standalone CII XML files (basis for ZUGFeRD/Factur-X).

UBL XML
OASIS UBL 2.1 Invoice

UBL Invoice, XRechnung, Peppol BIS Billing 3.0 XML files.

FatturaPA
Italian FatturaPA 1.2.1

Italian SDI e-invoicing format XML files.

Request Body

Parameter Type Required Description
data_base64 string Yes Base64-encoded PDF or XML file content
content_type string No application/pdf, application/xml, or text/xml. Auto-detected if omitted.
include_source_xml boolean No If true, includes raw XML in response (default: false)

Response Format

The response contains extracted invoice data in the same structure as /v1/generate input, enabling roundtrip workflows.

Field Type Description
valid boolean true if extraction succeeded
format object Detected format info: detected_format, profile, version
invoice object Extracted invoice data (same structure as /v1/generate input)
source_xml_base64 string Raw XML (only if include_source_xml=true)
errors array List of extraction errors (if any)
warnings array List of non-fatal warnings

Example Request

cURL
curl -X POST https://api.thelawin.dev/v1/retrieve \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "data_base64": "JVBERi0xLjQKJeLjz9MK...",
    "content_type": "application/pdf",
    "include_source_xml": false
  }'

Example Response

JSON Response
{
  "valid": true,
  "format": {
    "detected_format": "zugferd",
    "profile": "EN16931",
    "version": "2.3",
    "xml_type": "CII",
    "has_pdf": true
  },
  "invoice": {
    "number": "2026-001",
    "date": "2026-01-15",
    "due_date": "2026-02-14",
    "currency": "EUR",
    "seller": {
      "name": "Acme GmbH",
      "street": "Hauptstraße 1",
      "city": "Berlin",
      "postal_code": "10115",
      "country": "DE",
      "vat_id": "DE123456789"
    },
    "buyer": {
      "name": "Customer AG",
      "street": "Marienplatz 1",
      "city": "München",
      "postal_code": "80331",
      "country": "DE",
      "vat_id": "DE987654321"
    },
    "items": [
      {
        "description": "Consulting Services",
        "quantity": 10,
        "unit": "HUR",
        "unit_price": 150.00,
        "vat_rate": 19.0
      }
    ],
    "payment": {
      "iban": "DE89370400440532013000",
      "bic": "COBADEFFXXX"
    }
  },
  "transaction_id": "tx_abc123",
  "errors": [],
  "warnings": []
}

Roundtrip Workflow

The /v1/retrieve response uses the same invoice structure as /v1/generate input. This enables powerful roundtrip workflows:

Use Cases:
  • Convert invoice format (e.g., ZUGFeRD to XRechnung)
  • Apply a different PDF template to existing invoices
  • Extract data for accounting system integration
  • Validate and re-generate invoices with corrections
Roundtrip Example (Python)
import requests
import base64

# Step 1: Extract data from existing invoice
with open("original.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read()).decode()

retrieve_response = requests.post(
    "https://api.thelawin.dev/v1/retrieve",
    headers={"Authorization": "Bearer your_api_key"},
    json={"data_base64": pdf_base64, "content_type": "application/pdf"}
).json()

# Step 2: Modify and regenerate with different format/template
invoice_data = retrieve_response["invoice"]
invoice_data["notes"] = "Regenerated invoice"

generate_response = requests.post(
    "https://api.thelawin.dev/v1/generate",
    headers={"Authorization": "Bearer your_api_key"},
    json={
        "format": "xrechnung",  # Convert to XRechnung
        "template": "compact",   # Different template
        "invoice": invoice_data
    }
).json()

# Save new PDF
with open("converted.pdf", "wb") as f:
    f.write(base64.b64decode(generate_response["pdf_base64"]))

Error Handling

When extraction fails, the response contains details about what went wrong:

Error Response
{
  "valid": false,
  "format": {
    "detected_format": "unknown",
    "has_pdf": true
  },
  "invoice": null,
  "errors": [
    {
      "code": "NO_EMBEDDED_XML",
      "message": "PDF does not contain embedded e-invoice XML",
      "severity": "error"
    }
  ],
  "warnings": []
}

Common Error Codes

Code Description
NO_EMBEDDED_XML PDF doesn't contain e-invoice XML attachment (plain PDF)
INVALID_XML XML is malformed or not a valid invoice format
UNSUPPORTED_FORMAT XML format is not recognized/supported
SCHEMA_ERROR XML doesn't conform to expected schema
MISSING_FIELD Required invoice field is missing in source
Need OCR? This API extracts structured metadata from valid e-invoice formats. For plain PDFs without embedded XML, you'll need an OCR service. We recommend exploring dedicated invoice OCR APIs if that's your use case.