Very new to programming. Cobbled some code together. Looked through many different Q&As and haven't found what I am looking for.
I am using Python to extract many zip files that all contain one Excel file. Each Excel file is named incorrectly and I want to rename it using the name of the file it was unzipped from. I know how to extract them, but not rename them.
Here is what I have so far:
import os
import zipfile
dir_name = r"C:\Users\Nuffanael\Documents\Python Scripts\UnZIP Test"
extension = ".zip"
xlsx = ".xlsx"
os.chdir(dir_name)  # change directory from working dir to dir with files

for item in os.listdir(dir_name):  # loop through items in dir
    if item.endswith(extension):  # check for ".zip" extension
        file_name = os.path.abspath(item)  # get full path of file
        zip_ref = zipfile.ZipFile(file_name)  # create zipfile object
        zip_ref.extractall(dir_name)  # extract file to dir
        zip_ref.close()  # close file
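Since the question also asks about renaming: one possible approach (a sketch, not the only way) is to wrap the extract step in a small helper that pulls out the single .xlsx member and renames it after the zip it came from. The function name extract_and_rename is my own invention:

```python
import os
import zipfile

def extract_and_rename(zip_path, dest_dir):
    """Extract the .xlsx inside zip_path and rename it after the zip itself."""
    base = os.path.splitext(os.path.basename(zip_path))[0]  # zip name without ".zip"
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if member.endswith(".xlsx"):
                zf.extract(member, dest_dir)
                os.rename(os.path.join(dest_dir, member),
                          os.path.join(dest_dir, base + ".xlsx"))
```

In the loop above you would then call `extract_and_rename(os.path.abspath(item), dir_name)` instead of the `ZipFile`/`extractall` lines. This assumes each zip contains exactly one .xlsx at the top level, as described in the question.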
I am trying to write a Python 3.6 script that will add key/value pairs from a folder-tree dictionary to a CSV file. Files in the folder tree are the keys and their paths are the values.
There seems to be an error in how I am iterating through the dictionary because in the csv file I only get the key/value pairs from one of the folders, and not the entire folder tree. I just don't see where my error is. Here is my code:
import os
import csv
root_dir = '.'
for root, dirs, files in os.walk(root_dir, topdown='true'):
    folder_dict = {filename: root for filename in files}
    print(folder_dict)

with open('test.csv', 'w') as csvfile:
    for key in folder_dict:
        csvfile.write('%, %s\n' % (key, folder_dict[key]))
I get the dictionary but in the csv file there are only the key/value pairs for one item.
Because of the line folder_dict = {filename: root for filename in files}, you overwrite the dictionary on every iteration of the walk, so only the last folder's dictionary is left for the later write to the CSV.
You don't really need this interim data structure at all. Just write the CSV as you discover files to write. You weren't actually using the CSV module, so I added it to the solution.
import os
import csv
root_dir = '.'
with open('test.csv', 'w', newline='') as fileobj:
    csvfile = csv.writer(fileobj)
    for root, dirs, files in os.walk(root_dir, topdown=True):
        csvfile.writerows((filename, root) for filename in files)
Hi, I have a folder, and inside that folder I have n folders (400).
In each of those folders I have several documents, and one of them is an Excel file with a key name.
Is there any possibility of opening those Excel files as df1, df2, ..., dfn?
Does anyone know how to do a for loop that opens each of those 400 folders?
Thanks!!
Assuming your Excel files have the extension '.xlsx'.
I use os.walk(path) from the os package; os.walk traverses all the subfolders.
Put the path to the parent folder in the path_to_parentfolder variable.
import os
import pandas as pd
path_to_parentfolder = 'Parent_Folder/'

files = []
for r, d, f in os.walk(path_to_parentfolder):
    for file in f:
        if file.endswith('.xlsx'):  # enter the extension for your file type
            files.append(os.path.join(r, file))

df_list = [pd.read_excel(file) for file in files]  # all your data is stored in the list
Read about os.walk in its docs
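If you would rather not juggle numbered variables (df1, df2, ..., dfn), a dict keyed by folder name is easier to iterate over. A minimal sketch of the path-collection part; the helper name excel_paths is my own, and the pandas call is left as a comment since it needs real files:

```python
import os

def excel_paths(parent_folder):
    """Map each subfolder name to the list of .xlsx files found inside it."""
    found = {}
    for root, dirs, files in os.walk(parent_folder):
        for name in files:
            if name.endswith(".xlsx"):
                found.setdefault(os.path.basename(root), []).append(
                    os.path.join(root, name))
    return found

# dfs = {folder: pd.read_excel(paths[0])
#        for folder, paths in excel_paths("Parent_Folder/").items()}
```

You then get each DataFrame as `dfs["folder_name"]` instead of tracking 400 numbered variables.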
I'm using Python 3 and pandas. I have a folder of CSV files, each containing stats on a given date for all the regions of a country. I have created a second folder of CSV files, one per region, named after the regions listed in the files in the first folder. I want to append the appropriate row from each file in the first folder to the corresponding region file in the second folder.
This shows a portion of a CSV file from first folder
This shows the CSV files I created in the second folder
Here is the code I'm running after creating the new set of region named files in the second folder. I don't get any errors, but I don't get the results I'm looking for either, which is a CSV file for each region in the second folder containing the daily stats from each of the files in the first folder.
for csvname in os.listdir("NewTables"):
    if csvname.endswith(".csv"):
        df1 = pd.read_csv("NewTables/" + csvname)
        name1 = os.path.splitext(filename)[0]
        for file in os.listdir():
            if file.endswith(".csv"):
                df2 = pd.read_csv(file)
                D = df2[df2["denominazione_regione"] == name1]
                df1.append(D, ignore_index=True)
        df1.to_csv("NewTables/" + csvname)
Here are a few lines from a CSV file in the first folder:
data,stato,codice_regione,denominazione_regione,lat,long,ricoverati_con_sintomi,terapia_intensiva,totale_ospedalizzati,isolamento_domiciliare,totale_positivi,variazione_totale_positivi,nuovi_positivi,dimessi_guariti,deceduti,totale_casi,tamponi,note_it,note_en
2020-02-24T18:00:00,ITA,13,Abruzzo,42.35122196,13.39843823,0,0,0,0,0,0,0,0,0,0,5,,
2020-02-24T18:00:00,ITA,17,Basilicata,40.63947052,15.80514834,0,0,0,0,0,0,0,0,0,0,0,,
2020-02-24T18:00:00,ITA,04,P.A. Bolzano,46.49933453,11.35662422,0,0,0,0,0,0,0,0,0,0,1,,
I would not use pandas here because there is little data processing and mainly file processing. So I would stick to the csv module.
I would loop over the csv files in the first directory and process them one at a time. For each row I would just append it to the file with the relevant name in the second folder. I assume that the number of regions is reasonably small, so I keep the files in the second folder open, to save an open/close on every row.
The code could be:
import glob
import os.path
import csv
outfiles = {}  # cache the open files and the associated writer in 2nd folder

for csvname in glob.glob('*.csv'):  # loop over csv files from 1st folder
    with open(csvname) as fdin:
        rd = csv.DictReader(fdin)  # read the file as csv
        for row in rd:
            path = "NewTables/" + row['denominazione_regione'] + '.csv'
            newfile = not os.path.exists(path)  # a new file?
            if row['denominazione_regione'] not in outfiles:
                fdout = open(path, 'a', newline='')  # not in cache: open it
                wr = csv.DictWriter(fdout, rd.fieldnames)
                if newfile:
                    wr.writeheader()  # write header line only for new files
                outfiles[row['denominazione_regione']] = (wr, fdout)  # cache
            wr = outfiles[row['denominazione_regione']][0]
            wr.writerow(row)  # write the row in the relevant file

for file in outfiles.values():  # close every outfile
    file[1].close()
Filenames:
File1: new_data_20100101.csv
File2: samples_20100101.csv
The timestamp is always %Y%m%d in the filename, after a _ and before .csv.
I want to find the dates where there is both a data and a samples file, and then do something with those files:
My Code so far:
dataList, samplesList = [], []

for all_files in os.listdir():
    if "data_" in all_files:
        dataList.append(all_files.split('_')[2])
    if "samples_" in all_files:
        samplesList.append(all_files.split('_')[1])
that gives me the filenames cut down to the Timestamp and the extension .csv
Now I would like to try something like this
for day in dataList:
    if day in samplesList:
        # open day as csv ...
I get a list of days where both files have matching timestamps. How can I undo that split now, so I can go on working with the files? As it stands I would get an error telling me that, for instance, _20100101.csv does not exist, because the actual file is new_data_20100101.csv.
I'm kinda unsure how to use os.path.basename, so I would appreciate some advice on handling the file names.
thanks
You could instead use the glob module to get your list. This allows you to filter just your CSV files.
The following script creates two dictionaries with the key for each dictionary being the date portion of your filename and the value holding the whole filename. A list comprehension creates a list of tuples holding each matching pair:
import glob
import os
csv_files = glob.glob('*.csv')
data_files = {file.split('_')[2] : file for file in csv_files if 'data_' in file}
sample_files = {file.split('_')[1] : file for file in csv_files if 'samples_' in file}
matching_pairs = [(sample_files[date], file) for date, file in data_files.items() if date in sample_files]
for sample_file, data_file in sorted(matching_pairs):
    print('{} <-> {}'.format(sample_file, data_file))
For your two file example, this would display the following:
samples_20100101.csv <-> new_data_20100101.csv
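To then actually open each matched pair (the "open day as csv" step from the question), one possible sketch using the csv module; the helper name read_pair is my own invention:

```python
import csv

def read_pair(sample_file, data_file):
    """Read a matched samples/data CSV pair into two lists of rows."""
    with open(sample_file, newline='') as s, open(data_file, newline='') as d:
        return list(csv.reader(s)), list(csv.reader(d))

# for sample_file, data_file in sorted(matching_pairs):
#     sample_rows, data_rows = read_pair(sample_file, data_file)
```

Because the dictionaries already hold the full filenames as values, there is no need to undo the split: you never lost the original names.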