Hi, I have a folder, and inside that folder I have n folders (400).
In each of those folders I have several documents, and one of them is an Excel file with a key name.
Is there any possibility of opening those Excel files as df1, df2, ..., dfn?
Does anyone know how to do a for loop that opens each of those 400 folders?
Thanks!!
Assuming your Excel files have the extension '.xlsx'.
I use os.walk(path) from the os module; os.walk traverses all the subfolders.
Put the path to the parent folder in the path_to_parentfolder variable.
import os
import pandas as pd

path_to_parentfolder = 'Parent_Folder/'

files = []
for r, d, f in os.walk(path_to_parentfolder):
    for file in f:
        if file.endswith('.xlsx'):  # enter the extension for your file type
            files.append(os.path.join(r, file))

df_list = [pd.read_excel(file) for file in files]  # all your data is stored in the list
Read about os.walk in its docs
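If you would rather look each table up by name than juggle numbered variables like df1, df2, ..., dfn, a minimal sketch is a dict keyed by file name (the key format here is just an assumption), reusing the files list from above:

dfs = {os.path.basename(f): pd.read_excel(f) for f in files}
# e.g. dfs['report.xlsx'] instead of a numbered variable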
I have a script to loop through folders and merge the PDFs inside each one into a single file. I have a question that is above my pay grade: I am very new to programming and am betting on the benevolence of the internet for help.
Is there any way I can make the name of the saved PDF the same as the folder where the files are found, or even the name of one of the original files?
import PyPDF2
import os

Path = 'C:/Users/Folder location/'
folders = os.listdir(Path)

def pdf_merge(filelist):
    pdfMerger = PyPDF2.PdfFileMerger()
    for file in os.listdir(filelist):
        if file.endswith('.pdf'):
            pdfMerger.append(filelist + '/' + file)
    pdfOutput = open(Path + folder + '/.pdf', 'wb')
    pdfMerger.write(pdfOutput)
    pdfOutput.close()

for folder in folders:
    pdf_merge(Path + folder)
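One way to name the output after the folder (a sketch, not tested against your folder layout): pass the folder path into the function and use os.path.basename to pull out its last component for the file name.

def pdf_merge(folder_path):
    pdfMerger = PyPDF2.PdfFileMerger()
    folder_name = os.path.basename(os.path.normpath(folder_path))
    out_name = folder_name + '.pdf'
    for file in os.listdir(folder_path):
        # Skip a merged file left over from an earlier run
        if file.endswith('.pdf') and file != out_name:
            pdfMerger.append(os.path.join(folder_path, file))
    # Save the result inside the folder, named after the folder itself
    with open(os.path.join(folder_path, out_name), 'wb') as pdfOutput:
        pdfMerger.write(pdfOutput)

for folder in folders:
    pdf_merge(os.path.join(Path, folder))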
I'm new to Python and having some trouble looping over all the files in my directory.
I am trying to import data from all the Excel files in all of the subfolders of one directory. For example, I have a directory named "data" which has five subfolders, and each subfolder contains Excel files from which I want to extract data.
I guess my current code is not working because it just loops over the files in a single directory without considering the subfolders. How do I modify it to extract data from all the subfolders in my directory?
data_location = "data/"

for file in os.listdir(data_location):
    df_file = pd.read_excel(data_location + file)
    df_file.set_index(df_file.columns[0], inplace=True)
    selected_columns = df_file.loc["Country":"Score", :'Unnamed: 1']
    selected_columns.dropna(inplace=True)
    df_total = pd.concat([selected_columns, df_total], ignore_index=True)
Also, I've been trying to create a new variable from each file name as I import it. For example, if there are 5 files (file1–file5) in a directory, I want to create a new variable called "Source" whose values would be file1, file2, file3, file4, file5, appended as each file is imported in the loop. Could anyone please help me with this?
To go through subdirectories recursively, try something like this:

data_location = 'C:/path/to/data'

for subdir, dirs, files in os.walk(data_location):
    for file in files:
        df_file = pd.read_excel(os.path.join(subdir, file))

Note that the path passed to read_excel is built from subdir, not data_location, because the file lives in whichever subfolder os.walk is currently visiting.
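For the "Source" variable you asked about, a minimal sketch is to tag each DataFrame with its file name right after reading it (the splitext call just strips the extension; adjust to taste):

df_file = pd.read_excel(os.path.join(subdir, file))
df_file['Source'] = os.path.splitext(file)[0]  # e.g. 'file1' for 'file1.xlsx'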
I have multiple zip folders named zip_folder_1, zip_folder_2, ..., zip_folder_n.
All these zip folders are located in the same directory. Each one of these zip folders contains a csv file named "selected_file.csv".
I need to read each of the "selected_file.csv" files located in each of the zip folders and concatenate them into a single file.
Could someone give me a hint on the Python code required to solve this problem? I appreciate your help!
This should produce concatenated_data.csv in your working directory, and assumes that all files in my_data_dir are zip files with data in them.
import os
import zipfile
import numpy as np

def add_data_to_file(new_data, file_name):
    # Append if the output file already exists, otherwise create it
    if os.path.isfile(file_name):
        mode = 'ab'
    else:
        mode = 'wb'
    with open(file_name, mode) as f:
        np.savetxt(f, np.atleast_2d(new_data), delimiter=',')

my_data_dir = 'C:/my/zip/data/dir/'
data_files = os.listdir(my_data_dir)

for data_file in data_files:
    full_path = os.path.join(my_data_dir, data_file)
    with zipfile.ZipFile(full_path, 'r') as zip_file:
        with zip_file.open('selected_file.csv', 'r') as selected_file:
            data = np.loadtxt(selected_file, delimiter=',')
            add_data_to_file(data, 'concatenated_data.csv')
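If you are already using pandas, there is a shorter route: pandas.read_csv accepts a file object opened from inside a zip, so you can collect the pieces and concatenate once. A sketch, assuming every file in the directory is a zip that really contains selected_file.csv:

import os
import zipfile
import pandas as pd

my_data_dir = 'C:/my/zip/data/dir/'
pieces = []
for data_file in os.listdir(my_data_dir):
    with zipfile.ZipFile(os.path.join(my_data_dir, data_file)) as zf:
        with zf.open('selected_file.csv') as f:
            pieces.append(pd.read_csv(f))
pd.concat(pieces, ignore_index=True).to_csv('concatenated_data.csv', index=False)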
I am new to Python. I have a .zip file which has multiple sub-folders, and each sub-folder has multiple .txt files. I am trying to read all the .txt files, but I want to store the files per folder in a variable, and I am not able to do so.
For example:
"test.zip" has three folders, "a", "b" and "c", each with many (>10,000) .txt files.
I want to read all files inside folder "a" and store them in a variable a_file, and the same for folders "b" and "c".
I tried the following code:
for file in os.listdir():
    if file.endswith('test.zip'):
        zfile = zipfile.ZipFile(file)
        fnames = [f.filename for f in zfile.infolist()]
        for subfile in fnames:
            if fnames == "a":  # name of a folder
                if subfile.endswith('.txt'):
                    lines = zfile.open(subfile).read()
                    print(lines)
But the code is extracting all files from all the folders and not displaying any output, maybe because of the if condition, instead of reading folder by folder and storing the files.
Thank you in advance for helping!
That happens because the zip file lists its members with the folder as a path prefix:
a/a1.txt, a/a2.txt, b/b1.txt, b/b2.txt
So you need to separate the file name from the directory using split('/').
You could try this:

import os
from zipfile import ZipFile

for file in os.listdir():
    if file.endswith('test.zip'):
        zfile = ZipFile(file)
        fnames = [f.filename for f in zfile.filelist]
        for subfile in fnames:
            dir_name = subfile.split('/')[0]
            if dir_name == 'a':
                if subfile.endswith('.txt'):
                    lines = zfile.open(subfile).read()
                    print(lines)
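To actually store the contents per folder, as asked, one option is a dict keyed by the top-level folder name (a sketch; files_by_folder is a made-up name), so that files_by_folder['a'] plays the role of a_file:

from collections import defaultdict

files_by_folder = defaultdict(list)
with ZipFile('test.zip') as zfile:
    for subfile in zfile.namelist():
        if subfile.endswith('.txt'):
            dir_name = subfile.split('/')[0]
            files_by_folder[dir_name].append(zfile.open(subfile).read())

a_file = files_by_folder['a']  # all of folder a's contents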
First of all, thanks for reading this. I am a little stuck with sub-directory walking (then saving) in Python. My code below is able to walk through each sub-directory in turn and process a file, searching for certain strings; I then generate an xlsx file (using xlsxwriter) and post my search data to Excel.
I have two problems...
The first problem is that I want to process a text file in each directory, but the text file name varies per sub-directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt (would I use glob here?).
The second problem is that when I open/create an Excel file I would like to save it to the same sub-directory where the .txt file was found and processed. Currently my Excel file saves to the Python script's directory, and consequently gets overwritten each time a new sub-directory is processed. Would it be wiser to save the Excel file to the sub-directory at the end, or can it be created with the current sub-directory path from the start?
Here's my partially working code...
for root, subFolders, files in os.walk(dir_path):
    if 'Textfile.txt' in files:
        with open(os.path.join(root, 'Textfile.txt'), 'r') as f:
            searchlines = f.readlines()
        searchstringsFilter1 = ['Filter Used :']
        searchstringsFilter0 = ['Filter Used : 0']
        timestampline = None
        timestamp = None

        # Create a workbook and add a worksheet.
        workbook = xlsxwriter.Workbook('Excel.xlsx', {'strings_to_numbers': True})
        worksheetFilter = workbook.add_worksheet("Filter")
Thanks again for looking at this problem.
MikG
I will not solve your code completely, but here are hints:
the text file name varies per sub directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt
you can list all the files in the directory, then check each file's extension:

for filename in files:
    if filename.endswith('.txt'):
        # do stuff

Also, when creating the workbook, can you enter a path? You have root, right? Why not use it?
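Concretely, that hint looks something like this (a sketch using the root variable from your own loop):

# Save the workbook into the sub-directory currently being walked
workbook = xlsxwriter.Workbook(os.path.join(root, 'Excel.xlsx'),
                               {'strings_to_numbers': True})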
You don't want glob, because you already have a list of files in the files variable. So, filter it to find all the text files (the list() call matters in Python 3, where filter returns an iterator):

import fnmatch
txt_files = list(filter(lambda fn: fnmatch.fnmatch(fn, '*.txt'), files))
To save the file in the same subdirectory:
outfile = os.path.join(root, 'someoutfile.txt')
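For completeness, a sketch combining both ideas inside the walk, naming each workbook after the text file it came from (the naming scheme is just an assumption; dir_path is the variable from your question):

import fnmatch
import os
import xlsxwriter

for root, subFolders, files in os.walk(dir_path):
    for txt_name in fnmatch.filter(files, '*.txt'):
        out_name = os.path.splitext(txt_name)[0] + '.xlsx'
        # Create the workbook in the same sub-directory as the text file
        workbook = xlsxwriter.Workbook(os.path.join(root, out_name),
                                       {'strings_to_numbers': True})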