Appending a single row from multiple CSV files to another CSV - python
I'm using python 3 and pandas. I have a folder of multiple CSV files, where each file contains stats on a given date for all the regions of a country. In a second folder I have created one CSV file per region, each named after one of the regions listed in the files of the first folder. I want to append the appropriate row from each file in the first folder to the respective region file in the second folder.
This shows a portion of a CSV file from the first folder.
This shows the CSV files I created in the second folder.
Here is the code I'm running after creating the new set of region-named files in the second folder. I don't get any errors, but I don't get the results I'm looking for either: a CSV file for each region in the second folder containing the daily stats from every file in the first folder.
    for csvname in os.listdir("NewTables"):
        if csvname.endswith(".csv"):
            df1 = pd.read_csv("NewTables/" + csvname)
            name1 = os.path.splitext(filename)[0]
            for file in os.listdir():
                if file.endswith(".csv"):
                    df2 = pd.read_csv(file)
                    D = df2[df2["denominazione_regione"] == name1]
                    df1.append(D, ignore_index = True)
            df1.to_csv("NewTables/" + csvname)
Here are a few lines from a CSV file in the first folder:
data,stato,codice_regione,denominazione_regione,lat,long,ricoverati_con_sintomi,terapia_intensiva,totale_ospedalizzati,isolamento_domiciliare,totale_positivi,variazione_totale_positivi,nuovi_positivi,dimessi_guariti,deceduti,totale_casi,tamponi,note_it,note_en
2020-02-24T18:00:00,ITA,13,Abruzzo,42.35122196,13.39843823,0,0,0,0,0,0,0,0,0,0,5,,
2020-02-24T18:00:00,ITA,17,Basilicata,40.63947052,15.80514834,0,0,0,0,0,0,0,0,0,0,0,,
2020-02-24T18:00:00,ITA,04,P.A. Bolzano,46.49933453,11.35662422,0,0,0,0,0,0,0,0,0,0,1,,
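For what it's worth, the posted loop has two problems that would make it appear to do nothing: name1 is built from filename, which is presumably left over from an earlier session, instead of csvname; and df1.append(D, ...) returns a new frame that is never assigned back (append was also removed in pandas 2.x, so pd.concat is used below). A corrected sketch of the same pandas approach, with hypothetical folder arguments:

```python
import os

import pandas as pd

def update_region_files(daily_dir=".", region_dir="NewTables"):
    """Append each region's rows from every daily file to its region file."""
    for csvname in os.listdir(region_dir):
        if not csvname.endswith(".csv"):
            continue
        df1 = pd.read_csv(os.path.join(region_dir, csvname))
        # Build the region name from csvname, not an undefined `filename`.
        name1 = os.path.splitext(csvname)[0]
        for file in os.listdir(daily_dir):
            if file.endswith(".csv"):
                df2 = pd.read_csv(os.path.join(daily_dir, file))
                D = df2[df2["denominazione_regione"] == name1]
                # append() returns a new frame (and is gone in pandas 2.x),
                # so concatenate and assign back instead.
                df1 = pd.concat([df1, D], ignore_index=True)
        # index=False avoids accumulating unnamed index columns on each run.
        df1.to_csv(os.path.join(region_dir, csvname), index=False)
```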
I would not use pandas here, because there is little data processing and mainly file processing. So I would stick to the csv module.
I would loop over the csv files in the first directory and process them one at a time. For each row I would just append it to the file with the relevant name in the second folder. I assume that the number of regions is reasonably small, so I would keep the files in the second folder open to save open/close time on each row.
The code could be:
    import glob
    import os.path
    import csv

    outfiles = {}  # cache the open files and the associated writer in 2nd folder
    for csvname in glob.glob('*.csv'):  # loop over csv files from 1st folder
        with open(csvname) as fdin:
            rd = csv.DictReader(fdin)  # read the file as csv
            for row in rd:
                path = "NewTables/" + row['denominazione_regione'] + '.csv'
                newfile = not os.path.exists(path)  # a new file?
                if row['denominazione_regione'] not in outfiles:
                    fdout = open(path, 'a', newline='')  # not in cache: open it
                    wr = csv.DictWriter(fdout, rd.fieldnames)
                    if newfile:
                        wr.writeheader()  # write header line only for new files
                    outfiles[row['denominazione_regione']] = (wr, fdout)  # cache
                wr = outfiles[row['denominazione_regione']][0]
                wr.writerow(row)  # write the row in the relevant file

    for file in outfiles.values():  # close every outfile
        file[1].close()
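For comparison, if you do want pandas, the whole split can be done in a few lines with groupby. A minimal sketch, assuming the daily files match a glob pattern and the output directory may not exist yet (the function name is an illustration):

```python
import glob
import os

import pandas as pd

def split_by_region(src_glob="*.csv", out_dir="NewTables"):
    """Concatenate all daily files, then write one CSV per region."""
    daily = pd.concat(
        (pd.read_csv(f) for f in glob.glob(src_glob)),
        ignore_index=True,
    )
    os.makedirs(out_dir, exist_ok=True)
    # groupby splits the combined frame on the region column.
    for region, group in daily.groupby("denominazione_regione"):
        group.to_csv(os.path.join(out_dir, region + ".csv"), index=False)
```

This rewrites the region files from scratch each time instead of appending, which also makes the operation safe to re-run.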
Related
How do I put filenames into a csv file and trigger the cell next to each filename to enter a specific category related to the file itself?
I am trying to write a python script that will look in a certain directory for all of the files and put them in column 1 of a csv file, named "Name". I want the code to look at each filename for a certain keyword and enter a category name into the cell beside it. Example: for the filename Daily_CFS_2023_02_14, the keyword "CFS" would post the category "Daily Reports > CFS" in the adjacent cell. This is the code I am using so far, and it is listing the filenames in the csv, but I have no idea how to get it to write the category in the adjacent cell.

    import os
    import pandas as pd

    path = "/Users/tjjaglinski/Downloads/--DailyReports/"
    filename = []
    for (root, dirs, file) in os.walk(path):
        for f in file:
            if not f.startswith('.') and os.path.isfile(os.path.join(root, f)):
                filename.append(f)
                print(f)

    df = pd.DataFrame(list(zip(filename)), columns=['Name'])
    df.to_csv("DailyReport.csv")

I also need to get rid of the column that is numbering the rows.
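A sketch of one way to do this: keep a keyword-to-category mapping (the mapping and function names below are hypothetical; adjust them to your report types), apply it to the Name column, and pass index=False to to_csv to drop the row-numbering column:

```python
import os

import pandas as pd

# Hypothetical keyword-to-category mapping; extend it for your report types.
CATEGORIES = {"CFS": "Daily Reports > CFS", "INV": "Daily Reports > Inventory"}

def categorize(name):
    """Return the category for the first keyword found in the filename."""
    for keyword, category in CATEGORIES.items():
        if keyword in name:
            return category
    return "Uncategorized"

def build_report(path, out_csv="DailyReport.csv"):
    """List files under path and write a Name/Category csv."""
    names = []
    for root, dirs, files in os.walk(path):
        for f in files:
            if not f.startswith('.') and os.path.isfile(os.path.join(root, f)):
                names.append(f)
    df = pd.DataFrame({"Name": names})
    df["Category"] = df["Name"].apply(categorize)
    # index=False drops the column that numbers the rows.
    df.to_csv(out_csv, index=False)
    return df
```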
Writing and appending multiple csv data into a single csv from different directories using python
I had a single directory consisting of multiple csv files and I was able to write their data into a single csv. Now the scenario has changed, as given below: I have a main folder named 'X', which in turn consists of 4 subfolders, namely 'w', 'y', 'z' & 'b'. All these subfolders contain multiple CSV files. I need to read these subfolders one by one and append the csv file data present in them into a single csv, with the condition that only data from file names starting with MAAP should be written to the main csv. Currently I am able to read data from a single folder and append its content to the main csv, and there's no need to check for file names starting with MAAP because all the files are MAAP files. But the scenario has changed as mentioned above. How do I improve my code? Given below is the code that currently works for me for a single folder:

    import os
    import csv
    import shutil

    path = r'C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ'
    fileNames = os.listdir(path)
    ClearFile = r'C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ\Z_ALL_DATA.csv'
    f = open(ClearFile, "w+")
    f.close()
    output = r'C:\Users\hu170f\Documents\WORK\MAAP_FILE_DB_REQ\Z_ALL_DATA.csv'

    # Copy the first file:
    shutil.copyfile(os.path.join(path, fileNames[0]), output)

    # Append the remaining file contents, excluding each first line
    out = open(output, 'a')
    for file in fileNames[1:]:
        in_ = open(os.path.join(path, file), 'r')
        out.write(''.join(in_.readlines()[1:]))
    out.close()
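One way to extend this to subfolders is to walk the whole tree and copy only rows from files whose names start with MAAP, writing the header once. A minimal sketch, assuming all MAAP files share the same header (the function name is an illustration):

```python
import csv
import os

def merge_maap(main_dir, out_path):
    """Append rows from every MAAP*.csv under main_dir into one csv,
    keeping only the first file's header line."""
    wrote_header = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for root, dirs, files in os.walk(main_dir):
            for name in sorted(files):
                if name.startswith("MAAP") and name.endswith(".csv"):
                    with open(os.path.join(root, name), newline="") as fin:
                        reader = csv.reader(fin)
                        header = next(reader, None)  # skip each file's header
                        if header and not wrote_header:
                            writer.writerow(header)
                            wrote_header = True
                        writer.writerows(reader)
```

Called as merge_maap(r'C:\Users\hu170f\Documents\WORK\X', output), it visits 'w', 'y', 'z' and 'b' automatically, so no per-subfolder code is needed.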
How to create multiple csv file in a multiple folders
I have a function that generates multiple CSV files with the current date stamp, but I need to store each CSV file in a folder with a different timestamp. So I first need to create the folders, then store the generated csv files in each folder. This is my code for creating the folders for multiple days, but it doesn't create the CSV files under the created folders:

    import os
    import csv
    from datetime import datetime

    sequenc_date = ['20210311', '20210312', '20210313', '20210314',
                    '20210315', '20210316', '20210317']
    path = os.getcwd()
    print(path)
    data1 = ["csvfiles_data"]
    for folder in sequenc_date:
        if not os.path.isdir(folder):
            os.makedirs(os.path.join(path, folder))
        with open("output_file_{}-{}.csv".format(
                datetime.now().replace(hour=00, minute=00, second=00).strftime("%Y%m%d%H%M%S"),
                datetime.now().replace(hour=00, minute=00, second=00).strftime("%Y%m%d%H%M%S")), 'w') as f:
            writer = csv.writer(f)
            writer.writerows(data1)
You need to give the full path or the relative path to the CSV file; otherwise, python just assumes it to be in the current working directory.

    with open("./{}/output_file_{}-{}.csv".format(
            folder,
            datetime.now().replace(hour=00, minute=00, second=00).strftime("%Y%m%d%H%M%S"),
            datetime.now().replace(hour=00, minute=00, second=00).strftime("%Y%m%d%H%M%S")), 'w') as f:
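Putting the fix together, here is a minimal sketch with the path join made explicit (the helper name is hypothetical; note also that csv.writer.writerows expects a list of rows, so a bare list of strings like ["csvfiles_data"] would be split into single characters):

```python
import csv
import os
from datetime import datetime

def write_daily_files(base_dir, dates, rows):
    """Create one folder per date and write a stamped csv inside each."""
    stamp = datetime.now().replace(hour=0, minute=0, second=0).strftime("%Y%m%d%H%M%S")
    for folder in dates:
        folder_path = os.path.join(base_dir, folder)
        os.makedirs(folder_path, exist_ok=True)
        # Joining the folder into the path is what puts the file inside it.
        out_path = os.path.join(folder_path, "output_file_{}.csv".format(stamp))
        with open(out_path, "w", newline="") as f:
            csv.writer(f).writerows(rows)
```

Called as write_daily_files('.', sequenc_date, [["csvfiles_data"]]), it creates each date folder in the current directory with one stamped file inside.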
Parsing through each folder to pull in information in python
I have a directory with a folder for each customer. In each customer folder there is a csv file named surveys.csv. I want to open each customer folder, pull the data from the csv, and concatenate. I also want to create a column with that customer id, which is the name of the folder.

    import os

    rootdir = '../data/customer_data/'
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            csvfiles = glob.glob(os.path.join(mycsvdir, 'surveys.csv'))
            # loop through the files and read them in with pandas
            dataframes = []  # a list to hold all the individual pandas DataFrames
            for csvfile in csvfiles:
                df = pd.read_csv(csvfile)
                df['patient_id'] = os.path.dirname
                dataframes.append(df)

    # concatenate them all together
    result = pd.concat(dataframes, ignore_index=True)
    result.head()

This code is only giving me a dataframe with one customer's data. In the directory '../data/customer_data/' there should be about 25 folders with customer data. I want to concatenate all 25 of the surveys.csv files into one dataframe. Please help.
Put this line:

    dataframes = []

outside the outer for loop; it erases the list on every iteration. Another issue: in the line csvfiles = glob.glob(os.path.join(mycsvdir, 'surveys.csv')), use subdir to get the full path of the files. Also, csvfiles is only one file, so why do you use a loop to read it?
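Putting those fixes together, a minimal sketch (the function name is an illustration, not from the original post):

```python
import os

import pandas as pd

def load_surveys(rootdir):
    """Concatenate every customer's surveys.csv, tagging each row
    with the customer folder name."""
    dataframes = []  # built once, outside the walk loop
    for subdir, dirs, files in os.walk(rootdir):
        if "surveys.csv" in files:
            df = pd.read_csv(os.path.join(subdir, "surveys.csv"))
            # The customer id is the folder name, not the os.path.dirname
            # function object itself.
            df["patient_id"] = os.path.basename(subdir)
            dataframes.append(df)
    return pd.concat(dataframes, ignore_index=True)
```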
How to apply the same process to multiple csv files in pandas and save it in another directory?
I've been trying to create a code which runs through all the csv files inside a directory and applies the same operation to all of them. Afterwards it should save the new csv files in another directory. I've got two problems: first, the code only saves the last iteration, and second, how do I save the files with different names? Here's my code so far:

    from pathlib import Path
    import pandas as pd

    dir = r'C:\my\path\to\file'
    csv_files = [f for f in Path(dir).glob('*.csv')]  # list all csv
    for csv in csv_files:  # iterate list
        df = pd.read_csv(csv, encoding='ISO-8859-1', engine='python', delimiter=';')  # read csv
        df.drop(df.index[:-1], inplace=True)  # drop all but the last row
        df.to_csv("C:\new\path\to\file\variable name")  # save the file in a new dir

Rakesh's answer works perfectly for me. Thank you guys for your input! :)
In this case maybe the best thing is to save the new file with the same name or a common suffix, or in a new directory.

"First the code only saves the last iteration": this is because you are saving the files under the same name, so each iteration overwrites the file and only the last one survives.

"and second how do I save the files with different names?": you can use the same name for the new files and save them in a new directory, or use a suffix like mycsv_modified.csv.

Below I created an example that saves in a new directory (I tested this code in a non-Windows environment, using a jupyter notebook):

    from pathlib import Path
    import pandas as pd

    dir_b = r'/Users/rakeshkumar/bigquery'
    csv_files = [f for f in Path(dir_b).glob('*.csv')]  # list all csv

    # !mkdir -p processed  # I created the new directory in the notebook itself;
    #                        you can decide yourself about the new directory

    for csv in csv_files:  # iterate list
        df = pd.read_csv(csv, encoding='ISO-8859-1', engine='python', delimiter=';')  # read csv
        df.drop(df.index[:-1], inplace=True)  # drop all but the last row
        print(df)
        df.to_csv(dir_b + "/processed/" + csv.name)  # save the file in a new dir