# MarkdownPage

> A visual guide to the MarkdownPage schema used for document extraction and processing.

The `MarkdownDocument` schema is the central output format of VLM Run's document processing: a standardized, machine-readable representation of complex documents. This reference describes the schema's structure, its components, and how to work with it in code.

## `MarkdownDocument` Data Model

The `MarkdownDocument` schema addresses the fundamental challenges in document processing:

1. **Structural Preservation**: Maintains document hierarchy and relationships
2. **Content Extraction**: Handles mixed content types (text, tables, figures, code)
3. **Spatial Understanding**: Preserves layout and positioning information
4. **Data Integrity**: Ensures accurate representation of structured elements
5. **Extensibility**: Supports custom annotations and metadata

### 1. `MarkdownPage`

A `MarkdownDocument` holds a list of `MarkdownPage` objects in its `pages` field, one per page of the source document.

<div className="mermaid" style={{ width: '100%', margin: '0 auto 2rem auto' }}>
  ```mermaid theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  classDiagram
      class MarkdownPage {
          PageMetadata metadata
          List~Table~ tables
          List~Figure~ figures
          String content
      }

      class PageMetadata {
          +String language
          +Integer page_number
      }

      class Table {
          TableMetadata metadata
          List~TableHeader~ headers
          List~TableRowDict~ data
          BoxCoords bbox
      }

      class Figure {
          Integer id
          String title
          String caption
          BoxCoords bbox
      }

      MarkdownPage "1" *-- "1" PageMetadata : has
      MarkdownPage "1" *-- "*" Table : has
      MarkdownPage "1" *-- "*" Figure : has

      %% Add rounded edges styling
      classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px,rx:10,ry:10
      classDef relationship fill:none,stroke:#666,stroke-width:1px
  ```
</div>

<Tip>
  Here's an alternative way to visualize the `MarkdownPage` schema:

  <Accordion title="Tabular Representation of `MarkdownPage`">
    | Component            | Field              | Type                 | Description                             |
    | -------------------- | ------------------ | -------------------- | --------------------------------------- |
    | **MarkdownDocument** |                    |                      |                                         |
    |                      | `pages`            | `List[MarkdownPage]` | Pages in the document                   |
    | **MarkdownPage**     |                    |                      |                                         |
    |                      | `metadata`         | `PageMetadata`       | Metadata of the page                    |
    |                      | `tables`           | `List[Table]`        | Tables in the page                      |
    |                      | `figures`          | `List[Figure]`       | Figures in the page                     |
    |                      | `content`          | `str`                | Content of the page                     |
    | **PageMetadata**     |                    |                      |                                         |
    |                      | `language`         | `str`                | Language of the document                |
    |                      | `page_number`      | `int`                | Page number of the document (0-indexed) |
    | **Table**            |                    |                      |                                         |
    |                      | `metadata.title`   | `str`                | Title of the table                      |
    |                      | `metadata.caption` | `str`                | Caption of the table                    |
    |                      | `metadata.notes`   | `str`                | Notes about the table                   |
    |                      | `headers.id`       | `str`                | Unique identifier for the header        |
    |                      | `headers.column`   | `int`                | Column index of the header              |
    |                      | `headers.name`     | `str`                | Name of the header                      |
    |                      | `headers.dtype`    | `str`                | Data type of the column values          |
    |                      | `data.*`           | `dict[str, Any]`     | Maps column header ids to values        |
    |                      | `bbox`             | `BoxCoords`          | Bounding box of the table               |
    | **Figure**           |                    |                      |                                         |
    |                      | `id`               | `int`                | Unique identifier for the figure        |
    |                      | `title`            | `str`                | Title of the figure                     |
    |                      | `caption`          | `str`                | Caption of the figure                   |
    |                      | `bbox`             | `BoxCoords`          | Bounding box of the figure              |

    ***
  </Accordion>
</Tip>
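Because the JSON response mirrors these fields, a `MarkdownDocument` can be walked with ordinary dictionary access. The sketch below assumes the parsed JSON payload is available as a Python `dict` (for example via `json.loads`); `summarize_pages` is a hypothetical helper for illustration, not part of the VLM Run SDK.

```python
from typing import Any


def summarize_pages(doc: dict[str, Any]) -> list[tuple[int, int, int]]:
    """Return (page_number, table_count, figure_count) for each page.

    Hypothetical helper: assumes `doc` is the parsed JSON form of a
    MarkdownDocument, i.e. a dict with a "pages" list.
    """
    summary = []
    for page in doc.get("pages", []):
        meta = page.get("metadata", {})
        summary.append(
            (
                meta.get("page_number", -1),  # -1 if the field is missing
                len(page.get("tables", [])),
                len(page.get("figures", [])),
            )
        )
    return summary


# Minimal payload using the field names from the table above
doc = {
    "pages": [
        {"metadata": {"page_number": 0}, "tables": [{}], "figures": [], "content": "..."},
        {"metadata": {"page_number": 1}, "tables": [], "figures": [{}, {}], "content": "..."},
    ]
}

for page_number, n_tables, n_figures in summarize_pages(doc):
    print(f"page {page_number}: {n_tables} table(s), {n_figures} figure(s)")
```

Using `.get()` with defaults keeps the walk tolerant of pages where optional lists such as `tables` or `figures` are absent.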

### 2. `MarkdownTable`

Tables are represented in the page `content` by a `<Table id="tb-{id}"/>` placeholder tag, while the table itself is stored in the page's `tables` list. This keeps the table's structured data (headers, typed columns, and rows) available without interrupting the flow of the surrounding text.

<div className="mermaid" style={{ width: '100%', margin: '0 auto 2rem auto' }}>
  ```mermaid theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  classDiagram
    class Table {
        TableMetadata metadata
        List~TableHeader~ headers
        List~TableRowDict~ data
        BoxCoords bbox
    }

    class TableMetadata {
        String title
        String caption
        String notes
    }

    class TableHeader {
        String id
        Integer column
        String name
        String dtype
    }

    class TableRowDict {
        String id
        Any value
    }

    class BoxCoords {
        List~float~ xywh
    }

    Table "1" *-- "1" TableMetadata: has
    Table "1" *-- "*" TableHeader: has
    Table "1" *-- "*" TableRowDict: has
    Table "1" *-- "1" BoxCoords: has

    %% Add rounded edges styling
    classDef default fill:#f9f9f9,stroke:#333,stroke-width:2px,rx:10,ry:10
    classDef relationship fill:none,stroke:#666,stroke-width:1px
  ```
</div>
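Since each row in `data` is keyed by header `id` rather than by column name, rendering a table usually means joining the rows back to `headers`. A minimal sketch, assuming the table is a plain `dict` shaped like the JSON response below and that `column` gives the display order; `table_to_markdown` is a hypothetical helper, not an SDK function.

```python
from typing import Any


def table_to_markdown(table: dict[str, Any]) -> str:
    """Render a Table dict as a GitHub-style markdown table.

    Assumes `headers` entries carry "id", "column" and "name", and that each
    row in `data` maps header ids to cell values (missing cells render empty).
    """
    # Order headers by their column index
    headers = sorted(table.get("headers", []), key=lambda h: h["column"])
    names = [h["name"] for h in headers]
    lines = [
        "| " + " | ".join(names) + " |",
        "| " + " | ".join("---" for _ in names) + " |",
    ]
    for row in table.get("data", []):
        # Look up each cell by header id, in column order
        cells = ["" if row.get(h["id"]) is None else str(row[h["id"]]) for h in headers]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


table = {
    "metadata": {"title": "Sample Data Table"},
    "headers": [
        {"id": "h2", "column": 1, "name": "Header 2", "dtype": "string"},
        {"id": "h1", "column": 0, "name": "Header 1", "dtype": "string"},
    ],
    "data": [
        {"h1": "Data 1", "h2": "Data 2"},
        {"h1": "Data 3", "h2": "Data 4"},
    ],
}

print(table_to_markdown(table))
```

Keying rows by header `id` means column names can repeat or change without breaking the row mapping; the sort on `column` restores the visual order.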

### 3. Charts and Figures

Charts and figures are represented in the page `content` by a `<Chart id="ch-{id}"/>` placeholder tag. The chart details are stored in the page's `figures` list; each entry records the figure's `id`, `title`, `caption`, and `bbox` (its bounding box on the page).

## Example Usage

Here's an example of how the `MarkdownPage` model is used to process a document:

<CodeGroup>
  ```python Python theme={"theme":{"light":"github-light","dark":"dark-plus"}}
  from pathlib import Path

  from vlmrun.client import VLMRun
  from vlmrun.client.types import PredictionResponse, MarkdownDocument

  # Initialize the client (replace the placeholder with your API key)
  client = VLMRun(api_key="<VLMRUN_API_KEY>")

  # Submit the document; batch=True processes it asynchronously
  response: PredictionResponse = client.document.generate(
      file=Path("document.pdf"),
      domain="document.markdown",
      batch=True,
  )

  # Wait for completion; wait() returns a PredictionResponse whose
  # `response` field holds the extracted MarkdownDocument payload
  prediction: PredictionResponse = client.predictions.wait(response.id, timeout=120)
  doc = MarkdownDocument.model_validate(prediction.response)
  print(doc.model_dump_json(indent=2))
  ```
</CodeGroup>

## Example JSON Response

Here's an abbreviated example of how `MarkdownPage` objects appear in a JSON response (`...` marks omitted entries, and `//` comments label the page index):

```json theme={"theme":{"light":"github-light","dark":"dark-plus"}}
{
  "pages": [
    {                                 // page 0
      "metadata": {
        "page_number": 0
      },
      "tables": [
        {
          "metadata": {
            "title": "Sample Data Table",
            "caption": "Table showing example data"
          },
          "content": "| Header 1 | Header 2 |\n|----------|----------|\n| Data 1   | Data 2   |\n| Data 3   | Data 4   |",
          "headers": [
            {
              "id": "h1",
              "column": 0,
              "name": "Header 1",
              "dtype": "string"
            },
            ...
          ],
          "data": [
            {
              "h1": "Data 1",
              "h2": "Data 2"
            },
            ...
          ]
        }
      ],
      "figures": [
        {
          "id": 0,
          "title": "Sample Bar Chart",
          "caption": "Example visualization",
          "content": "..."
        },
        ...
      ],
      "content": "..."
    },
    {                                 // page 1
      ...
    },
    {                                 // page 2
      ...
    },
    ...
  ]
}
```
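The page `content` interleaves text with `<Table id="tb-{id}"/>` and `<Chart id="ch-{id}"/>` placeholder tags. The sketch below locates those tags with a regular expression, under two assumptions this page does not confirm: the tags appear exactly in the form shown above, and the number in `ch-{id}` matches a figure's `id` field. `find_placeholders` and `figures_by_tag` are hypothetical helpers.

```python
import re
from typing import Any

# Matches <Table id="tb-3"/> or <Chart id="ch-0"/> (optional space before "/>")
PLACEHOLDER = re.compile(r'<(Table|Chart) id="(tb|ch)-(\w+)"\s*/>')


def find_placeholders(content: str) -> list[tuple[str, str]]:
    """Return (tag, id) pairs in the order they appear in the content."""
    return [(m.group(1), m.group(3)) for m in PLACEHOLDER.finditer(content)]


def figures_by_tag(page: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map each <Chart id="ch-N"/> tag to its figure.

    Assumption: N equals the figure's "id" field; chart tags without a
    matching figure are skipped.
    """
    figures = {str(f.get("id")): f for f in page.get("figures", [])}
    resolved = {}
    for tag, tag_id in find_placeholders(page.get("content", "")):
        if tag == "Chart" and tag_id in figures:
            resolved[f"ch-{tag_id}"] = figures[tag_id]
    return resolved


page = {
    "content": 'Intro text.\n<Table id="tb-0"/>\nMore text.\n<Chart id="ch-0"/>',
    "figures": [{"id": 0, "title": "Sample Bar Chart", "caption": "Example visualization"}],
}

print(find_placeholders(page["content"]))
print(figures_by_tag(page)["ch-0"]["title"])
```

Figure ids are compared as strings so that both integer ids (as in the example response) and string ids resolve against the tag text.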
