Reading multiple xlsx files each with multiple sheets - purrr


Reading multiple xlsx files each with multiple sheets - purrr



I have multiple excel files, each with different worksheets. I have tried to use readxl and map to import it to R. However, I was only able to do it using a for loop. The code below works fine but I would like to know if there is a clever way to do this. I keep thinking that I could have done it with map2 but I am missing something.


library(tidyverse)
library(readxl)
library(writexl)

### As a first step, I get all the files from my project folder and create an empty list for looping purposes

files <- list.files(pattern = ".xlsx")
data_xlsx <- list()

### I then use seq_along in all the files and map_df to read the each excel file

for (i in seq_along(files)) {
data_xlsx[[i]] <- files[i] %>%
excel_sheets() %>%
set_names() %>%
map_df(
~ read_xlsx(path = files[i], sheet = .x, range = "H3"),
.id = "sheet")
}

# I use the code below to get the files name into the list

data_xlsx <- set_names(data_xlsx, files)

# This final code is just to transform the list into a data frame with a column with the name of the files

data_xlsx_df <- map2_df(data_xlsx, files, ~update_list(.x, file = .y))



Created on 2018-07-01 by the reprex package (v0.2.0).




1 Answer
1



You can use nested map_df calls to replace the for loop. As far as I know map2 can only operate on two lists of length n and return a list of length n, I don't think it's a way to generate a length n * m list from two lists of length n and m.


map_df


map2


n


n


n * m


n


m


files <- list.files(pattern = ".xlsx")

data_xlsx_df <- map_df(set_names(files), function(file) {
file %>%
excel_sheets() %>%
set_names() %>%
map_df(
~ read_xlsx(path = file, sheet = .x, range = "H3"),
.id = "sheet")
}, .id = "file")






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

PySpark - SparkContext: Error initializing SparkContext File does not exist

django NoReverseMatch Exception

Python Tkinter Error, “Too Early to Create Image”