Reading Excel

Examples of reading Excel files with various range options

Sling supports reading Excel files (.xlsx) with flexible range selection options. You can extract data from specific sheets, cell ranges, row ranges, or column ranges.

Range Syntax

The sheet source option supports several range formats:

Format
Example
Description

No sheet

(empty)

Uses first sheet in workbook

Sheet only

Sheet1

Read entire sheet

Range only

!A:F

First sheet, columns A through F

Cell range

Sheet1!A1:F100

Standard cell range

Column range

Sheet1!A:F

All rows, columns A through F

Row range

Sheet1!5:20

Rows 5-20, auto-detect columns with data

Row start

Sheet1!5:

From row 5 to end, auto-detect columns

Partial range

Sheet1!A5:F

From A5 to column F, last row

circle-info

When no sheet name is provided (or when using ! prefix without a sheet name), Sling automatically uses the first sheet in the workbook.

Excel to Database

Using CLI Flags

sling.sh
# Read first sheet (no sheet option = uses first sheet)
$ sling run --src-stream 'file:///path/to/data.xlsx' \
  --tgt-conn MY_POSTGRES \
  --tgt-object 'public.excel_data' \
  --mode full-refresh

# Read first sheet with column range (! prefix without sheet name)
$ sling run --src-stream 'file:///path/to/data.xlsx' \
  --src-options '{ sheet: "!A:F" }' \
  --tgt-conn MY_POSTGRES \
  --tgt-object 'public.excel_data' \
  --mode full-refresh

# Read specific sheet
$ sling run --src-stream 'file:///path/to/data.xlsx' \
  --src-options '{ sheet: "Sales" }' \
  --tgt-conn MY_POSTGRES \
  --tgt-object 'public.sales_data' \
  --mode full-refresh

# Read specific cell range (A1 to F100)
$ sling run --src-stream 'file:///path/to/data.xlsx' \
  --src-options '{ sheet: "Sales!A1:F100" }' \
  --tgt-conn MY_POSTGRES \
  --tgt-object 'public.sales_data' \
  --mode full-refresh

# Read column range (all rows, columns A through F)
$ sling run --src-stream 'file:///path/to/data.xlsx' \
  --src-options '{ sheet: "Sales!A:F" }' \
  --tgt-conn MY_POSTGRES \
  --tgt-object 'public.sales_data' \
  --mode full-refresh

# Read row range (rows 5-20, auto-detect columns)
$ sling run --src-stream 'file:///path/to/data.xlsx' \
  --src-options '{ sheet: "Sales!5:20" }' \
  --tgt-conn MY_POSTGRES \
  --tgt-object 'public.sales_subset' \
  --mode full-refresh

# Read from row 5 to end (auto-detect columns and last row)
$ sling run --src-stream 'file:///path/to/data.xlsx' \
  --src-options '{ sheet: "Sales!5:" }' \
  --tgt-conn MY_POSTGRES \
  --tgt-object 'public.sales_from_row5' \
  --mode full-refresh

# Read partial range (from A5 to column F, extends to last row)
$ sling run --src-stream 'file:///path/to/data.xlsx' \
  --src-options '{ sheet: "Sales!A5:F" }' \
  --tgt-conn MY_POSTGRES \
  --tgt-object 'public.sales_data' \
  --mode full-refresh

Using Replication

Running with Sling: sling run -r /path/to/replication.yaml

Using Python

Excel to File (Parquet, CSV, JSON)

Convert Excel files to other formats like Parquet for efficient storage and querying.

Using CLI Flags

Using Replication

Using Python

Advanced Options

Password-Protected Excel Files

Custom Date Patterns

Type Casting

Environment Variables

Variable
Description
Default

SLING_SAMPLE_SIZE

Number of rows to sample for type inference

900

SLING_ROW_NUM_COLUMN

Add _sling_row_num column with Excel row number

false

SLING_STREAM_URL_COLUMN

Add _sling_stream_url column with file path

false

Tips

  1. Skip Header Rows: Use row ranges like 5:1000 or partial ranges like A5:F to skip title rows or headers that aren't your column names.

  2. Auto-Detect Columns: Row-only ranges (10:500) automatically detect the rightmost column with data, useful for sheets where column count varies.

  3. Memory Efficiency: Sling streams data row-by-row, so large Excel files are processed efficiently without loading the entire file into memory.

  4. Type Inference: Increase SLING_SAMPLE_SIZE for better type detection on large files with varied data.

  5. Multiple Sheets: Process multiple sheets by creating separate stream entries for each sheet in your replication config.

Last updated

Was this helpful?