Import all Excel files from all subfolders in a directory - python

I'm new to Python and having some trouble looping over all the files in my directory.
I am trying to import data from all Excel files from all of the subfolders I have in one single directory. For example, I have a directory named "data" which has five different subfolders and each subfolder contains Excel files from which I want to extract data.
I guess my current code is not working because it just loops over the files directly inside the directory without considering the subfolders. How do I modify my current code to extract data from all the subfolders in my directory?
data_location = "data/"
for file in os.listdir(data_location):
    df_file = pd.read_excel(data_location + file)
    df_file.set_index(df_file.columns[0], inplace=True)
    selected_columns = df_file.loc["Country":"Score", :'Unnamed: 1']
    selected_columns.dropna(inplace=True)
    df_total = pd.concat([selected_columns, df_total], ignore_index=True)
Also, I've been trying to create a new variable from each file name as I import the files. For example, if there are 5 files (file1 to file5) in a directory, I want to create a new variable called "Source" whose values would be file1, file2, file3, file4 and file5. I want Python to fill in this value for the new variable as it imports each file in the loop. Could anyone please help me with this?

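To go through subdirectories recursively, try something like this:
import os
import pandas as pd

data_location = 'C:/path/to/data'
for subdir, dirs, files in os.walk(data_location):
    for file in files:
        # Join with subdir (the folder currently being walked), not data_location
        df_file = pd.read_excel(os.path.join(subdir, file))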
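
The second half of the question (a "Source" column holding the file name) is not covered above. A minimal sketch of one way to do both, assuming the workbooks end in .xlsx or .xls and that the file name without its extension is the wanted value. The names find_excel_files and load_all are made up for this illustration:

```python
import os


def find_excel_files(root):
    """Yield (path, source_name) for every Excel file under root, recursively."""
    for subdir, dirs, files in os.walk(root):
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() in (".xlsx", ".xls"):
                # subdir is the folder being visited, so the joined path is valid
                yield os.path.join(subdir, name), stem


def load_all(root):
    """Read every workbook under root and tag its rows with the file name."""
    import pandas as pd  # imported here so find_excel_files works without pandas

    frames = []
    for path, source in find_excel_files(root):
        df = pd.read_excel(path)
        df["Source"] = source  # same value repeated on every row of this file
        frames.append(df)
    # pd.concat raises ValueError on an empty list, so guard that case
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```

Collecting the frames in a list and concatenating once at the end also avoids needing df_total to exist before the loop starts. The row and column selection from the question would go between read_excel and the "Source" assignment.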

Related

Python rename files and save in different folder

I have around 1500 folders, each containing relevant data and some other irrelevant data.
Every folder is in my data directory. For example, one of those folders, 'folder_00', contains an irrelevant folder, CSV files (the actual data) and other CSV files.
Now I am trying to iterate over my data folder, enter each folder, count the data files, copy them to another path, and rename the copies in the other path by their folder name and file number.
For now I am just trying to count the main folders of the data (like data/folder_00, but not the folder inside folder_00) and count the data files inside each folder.
dir_count = 0
file_count = 0
for subdir, dirs, files in os.walk(src_path):
    for dir_name in dirs:
        print(os.path.join(src_path, dir_name))
        dir_count = dir_count + 1
        print(dir_count)
    for file_name in files:
        if file_name.startswith("relevant_keyword") and not file_name.startswith("irrelevant_keyword"):
            file_count += 1
            print(file_name)
            print(file_count)
I tried to debug that code but it still doesn't work the way I want it to.
It seems like the loop only accesses one folder and counts its data. It starts at 200 for some reason but counts correctly.
The for loop where I want to count the folders doesn't work correctly either. Something is really wrong here.
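
One thing worth noting is that os.walk also descends into the folders inside folder_00, so its dirs lists include those inner folders too. A rough sketch of counting only the top-level folders and the matching files directly inside each one, using os.scandir instead. The function name count_relevant_files is hypothetical, and "relevant_keyword" is the placeholder from the question:

```python
import os


def count_relevant_files(src_path, keyword="relevant_keyword"):
    """Map each top-level folder name to the number of matching files directly inside it."""
    counts = {}
    with os.scandir(src_path) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue  # skip loose files in src_path itself
            with os.scandir(entry.path) as children:
                # Only direct children are looked at, so nested folders are ignored
                counts[entry.name] = sum(
                    1 for child in children
                    if child.is_file() and child.name.startswith(keyword)
                )
    return counts
```

len(counts) is then the number of main folders, and each value is the per-folder file count that could be used when renaming the copies.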

Opening excels from different folders with python

Hi, I have a folder, and inside that folder I have n folders (400).
In each of those folders I have several documents, and one of them is an Excel file with a key name.
Is there any way to open those Excel files as df1, df2, ..., dfn?
Does anyone know how to write a for loop that opens each of those 400 folders?
Thanks!!
Assuming your Excel files have the extension '.xlsx':
I use os.walk(path) from the os package. os.walk traverses all the subfolders.
Put the path to the parent folder in the path_to_parentfolder variable.
import os
import pandas as pd

path_to_parentfolder = 'Parent_Folder/'
files = []
for r, d, f in os.walk(path_to_parentfolder):
    for file in f:
        if file.endswith('.xlsx'):  # Enter the extension for your file type
            files.append(os.path.join(r, file))
# Pass each path straight to read_excel; a text-mode open() cannot read a binary workbook
df_list = [pd.read_excel(file) for file in files]  # All your data is stored in the list
Read about os.walk in its docs
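
As an alternative to numbered names like df1, df2, a dictionary keyed by folder name keeps track of where each file came from. A small sketch using pathlib's rglob rather than os.walk. The function name workbooks_by_folder and the key parameter are assumptions for illustration, standing in for the "key name" mentioned in the question:

```python
from pathlib import Path


def workbooks_by_folder(parent, key=""):
    """Map each immediate parent folder name to the first matching .xlsx inside it."""
    found = {}
    for path in sorted(Path(parent).rglob("*.xlsx")):
        if key in path.name:
            # setdefault keeps the first match per folder and ignores later ones
            found.setdefault(path.parent.name, path)
    return found
```

Each path can then be handed to pandas, for example {name: pd.read_excel(p) for name, p in workbooks_by_folder('Parent_Folder', 'key').items()}. Folders with the same name at different depths would collide in this sketch.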

Python Zip Files Read

I am new to Python. I have a .zip file which has multiple subfolders, and each subfolder has multiple .txt files. I am trying to read all the .txt files, but I want to store the files of each folder in a separate variable, and I am not able to do so.
For example:
"test.zip" has three folders, "a", "b" and "c", each with multiple (>10,000) .txt files.
I want to read all files inside folder "a" and store them in a variable a_file, and the same for folders "b" and "c".
I tried the following code:
for file in os.listdir():
    if file.endswith('test.zip'):
        zfile = zipfile.ZipFile(file)
        fnames = [f.filename for f in zfile.infolist()]
        for subfile in fnames:
            if fnames == "a":  # name of a folder
                if subfile.endswith('.txt'):
                    lines = zfile.open(subfile).read()
                    print(lines)
But the code goes through the files from all folders and does not display any output, maybe because of the if condition, instead of reading one specific folder and storing its contents.
Thank you in advance for helping.
That happens because the zip file lists its members as follows:
a/a1.txt a/a2.txt b/b1.txt b/b2.txt
So you need to separate files from directory using split('/')
You could try this:
import os
from zipfile import ZipFile

for file in os.listdir():
    if file.endswith('test.zip'):
        zfile = ZipFile(file)
        fnames = [f.filename for f in zfile.filelist]
        for subfile in fnames:
            dir_name = subfile.split('/')[0]
            if dir_name == 'a':
                if subfile.endswith('.txt'):
                    lines = zfile.open(subfile).read()
                    print(lines)
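
To get one collection per folder (the a_file, b_file idea from the question) rather than printing folder "a" only, the same split on '/' can feed a dictionary. A minimal sketch, assuming every folder of interest sits at the top level of the archive. The function name read_txt_by_folder is made up here:

```python
from collections import defaultdict
from zipfile import ZipFile


def read_txt_by_folder(zip_path):
    """Return {top-level folder: {member name: bytes}} for the .txt members of a zip."""
    grouped = defaultdict(dict)
    with ZipFile(zip_path) as zfile:
        for name in zfile.namelist():
            if name.endswith("/") or not name.endswith(".txt"):
                continue  # skip directory entries and non-text members
            folder, sep, _ = name.partition("/")
            if sep:  # ignore .txt files sitting at the top level of the archive
                grouped[folder][name] = zfile.read(name)
    return dict(grouped)
```

With this, read_txt_by_folder('test.zip')['a'] plays the role of a_file. The contents are bytes, and with more than 10,000 files per folder everything is held in memory at once, so reading lazily may be preferable for large archives.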

Walking into sub directories not working

I'm trying to export all of my maps that are in my subdirectories.
I have the code to export, but I cannot figure out where to add the loop that will make it do this for all subdirectories. As of right now, it is exporting the maps in the directory, but not the subfolders.
import arcpy, os
arcpy.env.workspace = ws = r"C:\Users\162708\Desktop\Burn_Zones"
for subdir, dirs, files in os.walk(ws):
    for file in files:
        mxd_list = arcpy.ListFiles("*.mxd")
        for mxd in mxd_list:
            current_mxd = arcpy.mapping.MapDocument(os.path.join(ws, mxd))
            pdf_name = mxd[:-4] + ".pdf"
            arcpy.mapping.ExportToPDF(current_mxd, pdf_name)
        del mxd_list
What am I doing wrong that it isn't able to iterate through the subfolders?
Thank you!
Iterating over the result of os.walk gives you tuples of (path, dirs, files). The first element is the directory currently being visited, the one that contains files, which is why I tend to name it path. The current directory does not change automatically, so you need to incorporate it into the path you're giving to arcpy.ListFiles, like this:
arcpy.ListFiles(os.path.join(path, "*.mxd"))
You should also remove the loop for file in files. It seems like you're exporting the files per directory so why export the whole directory every time for each file?
Also you should change arcpy.mapping.MapDocument(os.path.join(ws, mxd)) to arcpy.mapping.MapDocument(os.path.join(path, mxd)) where path is again the first element from os.walk.
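
Since os.walk already hands back the file names of each directory, another option is to filter that list directly instead of calling arcpy.ListFiles. A rough sketch of the path handling only, with the arcpy call from the question left as a comment because it needs ArcGIS to run. The name plan_exports is hypothetical:

```python
import fnmatch
import os


def plan_exports(ws):
    """Yield (mxd_path, pdf_path) for every .mxd under ws, one pair per map document."""
    for path, dirs, files in os.walk(ws):
        for mxd in fnmatch.filter(files, "*.mxd"):
            mxd_path = os.path.join(path, mxd)  # path, not ws, so subfolders resolve
            pdf_path = os.path.splitext(mxd_path)[0] + ".pdf"
            # With arcpy available, the export from the question would go here:
            #   arcpy.mapping.ExportToPDF(arcpy.mapping.MapDocument(mxd_path), pdf_path)
            yield mxd_path, pdf_path
```

Building the PDF name from the full path places each PDF next to its map document rather than in the workspace root.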

Walking sub directories in Python and saving to same sub directory

First of all thanks for reading this. I am a little stuck with sub directory walking (then saving) in Python. My code below is able to walk through each sub directory in turn and process a file to search for certain strings, I then generate an xlsx file (using xlsxwriter) and post my search data to an Excel.
I have two problems...
The first problem I have is that I want to process a text file in each directory, but the text file name varies per sub directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt (would I use glob here?)
The second problem is that when I open/create an Excel I would like to save the file to the same sub directory where the .txt file has been found and processed. Currently my Excel is saving to the python script directory, and consequently gets overwritten each time a new sub directory is opened and processed. Would it be wiser to save the Excel at the end to the sub directory or can it be created with the current sub directory path from the start?
Here's my partially working code...
for root, subFolders, files in os.walk(dir_path):
    if 'Textfile.txt' in files:
        with open(os.path.join(root, 'Textfile.txt'), 'r') as f:
            #f = open(file, "r")
            searchlines = f.readlines()
            searchstringsFilter1 = ['Filter Used :']
            searchstringsFilter0 = ['Filter Used : 0']
            timestampline = None
            timestamp = None
            f.close()
        # Create a workbook and add a worksheet.
        workbook = xlsxwriter.Workbook('Excel.xlsx', {'strings_to_numbers': True})
        worksheetFilter = workbook.add_worksheet("Filter")
Thanks again for looking at this problem.
MikG
I will not solve your code completely, but here are hints:
the text file name varies per sub directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt
You can go through all the files in the directory and check each file's extension:
for filename in files:
    if filename.endswith('.txt'):
        pass  # do stuff with os.path.join(root, filename)
Also, when creating the workbook, can you pass a path? You have root, right? Why not use it?
You don't need glob because you already have the list of file names in the files variable. So, filter it to find all the text files:
import fnmatch
txt_files = fnmatch.filter(files, '*.txt')  # returns a list, also on Python 3
To save the file in the same subdirectory:
outfile = os.path.join(root, 'someoutfile.txt')
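
Putting both hints together, here is a minimal sketch that finds every .txt file in each subdirectory and writes its result into that same subdirectory. To stay dependency-free it writes a CSV with the standard csv module instead of an xlsx file. The function name summarize_filters and the "_summary.csv" suffix are assumptions for illustration:

```python
import csv
import os


def summarize_filters(dir_path, needle="Filter Used :"):
    """For each .txt under dir_path, write <name>_summary.csv beside it; return the paths written."""
    written = []
    for root, sub_folders, files in os.walk(dir_path):
        for filename in files:
            if not filename.endswith(".txt"):
                continue
            with open(os.path.join(root, filename), "r") as f:
                hits = [line.strip() for line in f if needle in line]
            # Building the output path from root keeps it in the same subdirectory
            out_path = os.path.join(root, os.path.splitext(filename)[0] + "_summary.csv")
            with open(out_path, "w", newline="") as out:
                csv.writer(out).writerows([hit] for hit in hits)
            written.append(out_path)
    return written
```

The same idea applies to the workbook in the question: passing os.path.join(root, 'Excel.xlsx') to xlsxwriter.Workbook when it is created should put each Excel file in its own subdirectory, so it no longer gets overwritten.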
