Read xls and xlsx files.

While read_excel() auto detects the format from the file extension, read_xls() and read_xlsx() can be used to read files without extension.

read_excel(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max))

read_xls(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max))

read_xlsx(path, sheet = NULL, range = NULL, col_names = TRUE,
  col_types = NULL, na = "", trim_ws = TRUE, skip = 0, n_max = Inf,
  guess_max = min(1000, n_max))

Arguments

path

Path to the xls/xlsx file

sheet

Sheet to read. Either a string (the name of a sheet), or an integer (the position of the sheet). Ignored if the sheet is specified via range. If neither argument specifies the sheet, defaults to the first sheet.

range

A cell range to read from, as described in cell-specification. Includes typical Excel ranges like "B3:D87", possibly including the sheet name like "Budget!B2:G14", and more. Interpreted strictly, even if the range forces the inclusion of leading or trailing empty rows or columns. Takes precedence over skip, n_max and sheet.

col_names

TRUE to use the first row as column names, FALSE to get default names, or a character vector giving a name for each column. If user provides col_types as a vector, col_names can have one entry per column, i.e. have the same length as col_types, or one entry per unskipped column.

col_types

Either NULL to guess all from the spreadsheet or a character vector containing one entry per column from these options: "skip", "guess", "logical", "numeric", "date", "text" or "list". If exactly one col_type is specified, it will be recycled. The content of a cell in a skipped column is never read and that column will not appear in the data frame output. A list cell loads a column as a list of length 1 vectors, which are typed using the type guessing logic from col_types = NULL, but on a cell-by-cell basis.

na

Character vector of strings to use for missing values. By default, readxl treats blank cells as missing data.

trim_ws

Should leading and trailing whitespace be trimmed?

skip

Minimum number of rows to skip before reading anything, be it column names or data. Leading empty rows are automatically skipped, so this is a lower bound. Ignored if range is given.

n_max

Maximum number of data rows to read. Trailing empty rows are automatically skipped, so this is an upper bound on the number of rows in the returned tibble. Ignored if range is given.

guess_max

Maximum number of data rows to use for guessing column types.

Value

A tibble

See also

cell-specification for more details on targetting cells with the range argument

Examples

datasets <- readxl_example("datasets.xlsx") read_excel(datasets)
#> # A tibble: 150 x 5 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> <dbl> <dbl> <dbl> <dbl> <chr> #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3.0 1.4 0.2 setosa #> 3 4.7 3.2 1.3 0.2 setosa #> 4 4.6 3.1 1.5 0.2 setosa #> 5 5.0 3.6 1.4 0.2 setosa #> 6 5.4 3.9 1.7 0.4 setosa #> 7 4.6 3.4 1.4 0.3 setosa #> 8 5.0 3.4 1.5 0.2 setosa #> 9 4.4 2.9 1.4 0.2 setosa #> 10 4.9 3.1 1.5 0.1 setosa #> # ... with 140 more rows
# Specify sheet either by position or by name read_excel(datasets, 2)
#> # A tibble: 32 x 11 #> mpg cyl disp hp drat wt qsec vs am gear carb #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 #> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 #> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 #> # ... with 22 more rows
read_excel(datasets, "mtcars")
#> # A tibble: 32 x 11 #> mpg cyl disp hp drat wt qsec vs am gear carb #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 #> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 #> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 #> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 #> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 #> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 #> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 #> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 #> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 #> # ... with 22 more rows
# Skip rows and use default column names read_excel(datasets, skip = 148, col_names = FALSE)
#> # A tibble: 3 x 5 #> X__1 X__2 X__3 X__4 X__5 #> <dbl> <dbl> <dbl> <dbl> <chr> #> 1 6.5 3.0 5.2 2.0 virginica #> 2 6.2 3.4 5.4 2.3 virginica #> 3 5.9 3.0 5.1 1.8 virginica
# Recycle a single column type read_excel(datasets, col_types = "text")
#> # A tibble: 150 x 5 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> <chr> <chr> <chr> <chr> <chr> #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3 1.4 0.2 setosa #> 3 4.7 3.2 1.3 0.2 setosa #> 4 4.6 3.1 1.5 0.2 setosa #> 5 5 3.6 1.4 0.2 setosa #> 6 5.4 3.9 1.7 0.4 setosa #> 7 4.6 3.4 1.4 0.3 setosa #> 8 5 3.4 1.5 0.2 setosa #> 9 4.4 2.9 1.4 0.2 setosa #> 10 4.9 3.1 1.5 0.1 setosa #> # ... with 140 more rows
# Specify some col_types and guess others read_excel(datasets, col_types = c("text", "guess", "numeric", "guess", "guess"))
#> # A tibble: 150 x 5 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> <chr> <dbl> <dbl> <dbl> <chr> #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3.0 1.4 0.2 setosa #> 3 4.7 3.2 1.3 0.2 setosa #> 4 4.6 3.1 1.5 0.2 setosa #> 5 5 3.6 1.4 0.2 setosa #> 6 5.4 3.9 1.7 0.4 setosa #> 7 4.6 3.4 1.4 0.3 setosa #> 8 5 3.4 1.5 0.2 setosa #> 9 4.4 2.9 1.4 0.2 setosa #> 10 4.9 3.1 1.5 0.1 setosa #> # ... with 140 more rows
# Accomodate a column with disparate types via col_type = "list" df <- read_excel(readxl_example("clippy.xlsx"), col_types = c("text", "list")) df
#> # A tibble: 4 x 2 #> name value #> <chr> <list> #> 1 Name <chr [1]> #> 2 Species <chr [1]> #> 3 Approx date of death <dttm [1]> #> 4 Weight in grams <dbl [1]>
df$value
#> [[1]] #> [1] "Clippy" #> #> [[2]] #> [1] "paperclip" #> #> [[3]] #> [1] "2007-01-01 UTC" #> #> [[4]] #> [1] 0.9 #>
sapply(df$value, class)
#> [[1]] #> [1] "character" #> #> [[2]] #> [1] "character" #> #> [[3]] #> [1] "POSIXct" "POSIXt" #> #> [[4]] #> [1] "numeric" #>
# Limit the number of data rows read read_excel(datasets, n_max = 3)
#> # A tibble: 3 x 5 #> Sepal.Length Sepal.Width Petal.Length Petal.Width Species #> <dbl> <dbl> <dbl> <dbl> <chr> #> 1 5.1 3.5 1.4 0.2 setosa #> 2 4.9 3.0 1.4 0.2 setosa #> 3 4.7 3.2 1.3 0.2 setosa
# Read from an Excel range using A1 or R1C1 notation read_excel(datasets, range = "C1:E7")
#> # A tibble: 6 x 3 #> Petal.Length Petal.Width Species #> <dbl> <dbl> <chr> #> 1 1.4 0.2 setosa #> 2 1.4 0.2 setosa #> 3 1.3 0.2 setosa #> 4 1.5 0.2 setosa #> 5 1.4 0.2 setosa #> 6 1.7 0.4 setosa
read_excel(datasets, range = "R1C2:R2C5")
#> # A tibble: 1 x 4 #> Sepal.Width Petal.Length Petal.Width Species #> <dbl> <dbl> <dbl> <chr> #> 1 3.5 1.4 0.2 setosa
# Specify the sheet as part of the range read_excel(datasets, range = "mtcars!B1:D5")
#> # A tibble: 4 x 3 #> cyl disp hp #> <dbl> <dbl> <dbl> #> 1 6 160 110 #> 2 6 160 110 #> 3 4 108 93 #> 4 6 258 110
# Read only specific rows or columns read_excel(datasets, range = cell_rows(102:151), col_names = FALSE)
#> # A tibble: 50 x 5 #> X__1 X__2 X__3 X__4 X__5 #> <dbl> <dbl> <dbl> <dbl> <chr> #> 1 6.3 3.3 6.0 2.5 virginica #> 2 5.8 2.7 5.1 1.9 virginica #> 3 7.1 3.0 5.9 2.1 virginica #> 4 6.3 2.9 5.6 1.8 virginica #> 5 6.5 3.0 5.8 2.2 virginica #> 6 7.6 3.0 6.6 2.1 virginica #> 7 4.9 2.5 4.5 1.7 virginica #> 8 7.3 2.9 6.3 1.8 virginica #> 9 6.7 2.5 5.8 1.8 virginica #> 10 7.2 3.6 6.1 2.5 virginica #> # ... with 40 more rows
read_excel(datasets, range = cell_cols("B:D"))
#> # A tibble: 150 x 3 #> Sepal.Width Petal.Length Petal.Width #> <dbl> <dbl> <dbl> #> 1 3.5 1.4 0.2 #> 2 3.0 1.4 0.2 #> 3 3.2 1.3 0.2 #> 4 3.1 1.5 0.2 #> 5 3.6 1.4 0.2 #> 6 3.9 1.7 0.4 #> 7 3.4 1.4 0.3 #> 8 3.4 1.5 0.2 #> 9 2.9 1.4 0.2 #> 10 3.1 1.5 0.1 #> # ... with 140 more rows
# Get a preview of column names names(read_excel(readxl_example("datasets.xlsx"), n_max = 0))
#> [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"