excel_sheets() associate a larger set of file extensions with xlsx and are better able to guess the format of a file with a nonstandard or missing extension. This is about deciding whether to treat a file as xls or xlsx. (#342, #411, #457)
excel_format() is the newly-exported format-guessing function.
format_from_ext() is a low-level helper, also exported, that only consults file extension. In addition to the obvious interpretation of .xlsx, extensions such as .xltm are now associated with xlsx.
format_from_signature() is a low-level helper, also exported, that consults the file’s signature (a.k.a. magic number). It’s handy for files that lack an extension.
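As a rough illustration of what consulting a file's signature means, the base R sketch below reads the first bytes of a file and compares them with two well-known signatures: the ZIP local-file header (xlsx files are ZIP archives) and the OLE2 compound-document header (used by xls). The helper name sniff_format() is hypothetical and this is not readxl's implementation; in practice, call excel_format() or format_from_signature().

```r
# Hypothetical helper, NOT readxl's code: classify a file by its leading bytes.
sniff_format <- function(path) {
  sig <- readBin(path, what = "raw", n = 8)
  zip_sig <- as.raw(c(0x50, 0x4b, 0x03, 0x04))                          # "PK\003\004"
  ole_sig <- as.raw(c(0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1))  # OLE2 header
  if (length(sig) >= 4 && identical(sig[1:4], zip_sig)) return("xlsx")
  if (length(sig) >= 8 && identical(sig[1:8], ole_sig)) return("xls")
  NA_character_
}

# Write ZIP-like leading bytes to a temporary file that has no extension.
tmp <- tempfile()
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00)), tmp)
sniff_format(tmp)
```

Note that a ZIP signature only shows that the file is a ZIP container, so a sketch like this cannot tell an xlsx file from any other ZIP archive.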
Integer-y numbers larger than 2^31 are coerced properly to string (xls, #346)
Shared strings are only compared to NA strings after lookup, never on the basis of their index. (xlsx, #401)
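To see why the order of operations matters, here is a small base R sketch with made-up data. In xlsx, a string cell stores a zero-based index into a shared string table. If na contains "1", a cell whose index happens to be 1 must not become NA; only the looked-up string should be compared. All names and values below are hypothetical.

```r
# Hypothetical shared string table and cell contents (zero-based indices).
shared_strings <- c("1", "apple", "NA")
cell_index <- c(0, 1, 2, 1)
na <- c("NA", "1")

# Look up first, then compare the resulting strings to the NA strings.
values <- shared_strings[cell_index + 1]
values[values %in% na] <- NA_character_
values
```

The cells with index 1 resolve to "apple" and are kept, even though the index itself would match the NA string "1".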
Better checks and messaging around nonexistent files. (#392)
Empty cells, rows, columns (xlsx #248 and #240, xls #271): Cells with no content are no longer loaded, even if they appear in the file. Affects cells that have no data but that carry explicit formatting, detectable in Excel as seemingly empty cells with a format other than “General”. Such cells may still exist in the returned tibble, with value NA, depending on the sheet geometry.
col_names are processed relative to user-supplied col_types, if given. Specifically, col_names is considered valid if it has the same length as col_types, before or after removing skipped columns. (#81, #261)
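The length rule can be expressed as a small base R check. This is a sketch of the rule as stated above, not readxl's internal code, and col_names_ok() is a hypothetical name.

```r
# Hypothetical check: col_names is acceptable if its length matches col_types
# either before or after the "skip" columns are removed.
col_names_ok <- function(col_names, col_types) {
  n <- length(col_names)
  n == length(col_types) || n == sum(col_types != "skip")
}

types <- c("numeric", "skip", "text")
col_names_ok(c("a", "b", "c"), types)  # one name per column, skipped ones included
col_names_ok(c("a", "b"), types)       # one name per retained column
col_names_ok("a", types)               # matches neither length
```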
"logical" is a new accepted value for col_types. When col_types = NULL, it is the guessed type for cells Excel advertises as Boolean. When a column has no data, it is now filled with logical NA. (#277, #270)
"guess" is a new accepted value for col_types. Allows the user to specify some column types, while allowing others to be guessed. (#286)
na can now hold multiple NA values, e.g., read_excel("missing-values.xls", na = c("NA", "1")). (#13, #56, @jmarshallnz)
Coercions and cell data:
NA instead of their integer representation. Throws warning. (#277, #263)
read_excel() attempts to coerce the string to numeric and falls back to NA if unsuccessful. Throws warning. (#277, #217, #106)
NA (instead of the string "error"). (#277, #62)
"Unknown type: 517". (#274, #259)
Many 3rd party tools write xls and xlsx that comply with the spec, but that are quite different from files produced by Excel.
Namespace prefixes are now stripped from element names and attributes when parsing XML from xlsx. Workaround for the creative approach taken in some other s/w, coupled with rapidxml’s lack of namespace support. (#295, #268, #202, #80)
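The effect of prefix stripping can be pictured with a one-line base R sketch. readxl does this in its compiled XML-parsing code; strip_ns() is a hypothetical helper that only mimics the result on element or attribute names.

```r
# Hypothetical helper: drop a leading "prefix:" from a name, if there is one.
strip_ns <- function(x) sub("^[^:]*:", "", x)

strip_ns(c("x:sheetData", "row", "x:c", "r:id"))
```

Names without a prefix pass through unchanged, so files from Excel and from tools that prefix every element end up with the same element names.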
Excel mixes 0- and 1-indexing in reported row and column dimensions for xls and libxls expects that. Other s/w may index from 0 for both, preventing libxls from reading the last column. Patched to restore access to those cells. (#273, #180, #152, #99)
The Lotus 1-2-3 leap year bug is now accounted for, i.e. date-times prior to March 1, 1900 import correctly. Date-times on the non-existent leap day February 29, 1900 import as NA and throw a warning. (#264, #148, #292)
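The arithmetic behind this can be sketched in base R, assuming the 1900 date system in which serial 1 is January 1, 1900 and serial 60 is the fictitious February 29, 1900. The function below is a hypothetical illustration of that logic, not readxl's implementation.

```r
# Hypothetical converter for 1900-system serial dates.
# Serials below 60 sit one day closer to the usual "1899-12-30" origin because
# the non-existent leap day (serial 60) has not yet been counted.
excel_serial_to_date <- function(serial) {
  out <- rep(as.Date(NA), length(serial))
  before <- !is.na(serial) & serial < 60
  after <- !is.na(serial) & serial >= 61
  out[before] <- as.Date(serial[before], origin = "1899-12-31")
  out[after] <- as.Date(serial[after], origin = "1899-12-30")
  if (any(serial >= 60 & serial < 61, na.rm = TRUE)) {
    warning("serial 60 is the non-existent February 29, 1900; returning NA")
  }
  out
}

format(excel_serial_to_date(c(1, 59, 61)))
```

Passing 60 (or any value from 60 up to, but not including, 61) yields NA with a warning, mirroring the behavior described above.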