I have a bunch of CSV files which I will be combining to a single CSV file named 'Combined'. For each CSV file, once the data is appended to the 'Combined' file, I want to insert a fresh column before column 1 in 'Combined' and insert the name of the CSV file from which data was copied in that iteration. Is there any way of doing this in Python?
This can be done as follows. First, open a CSV file for output. Then use Python's glob library to list all of the CSV files in a folder. For each row in each CSV file, prefix the filename as the first column entry and write the row to output.csv:
import glob
import csv

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)

    for filename in glob.glob('*.csv'):
        if filename == 'output.csv':
            continue  # skip the output file itself, which also matches *.csv

        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)

            for row in csv_input:
                row.insert(0, filename)
                csv_output.writerow(row)
So for example, if you had these two CSV files:
num.csv
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
letter.csv
a,b,c,d,e,f
a,b,c,d,e,f
a,b,c,d,e,f
a,b,c,d,e,f
It would create the following output.csv file:
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
num.csv,1,2,3,4,5
num.csv,1,2,3,4,5
num.csv,1,2,3,4,5
This assumes you are using Python 3.x.
I can read a text file with names and print them in ascending order to the console. I simply want to write the sorted names to a column in a CSV file. Can't I take the printed output and send it to the CSV?
Thanks!
import csv

with open('/users/h/documents/pyprojects/boy-names.txt', 'r') as file:
    for file in sorted(file):
        print(file, end='')

# the following isn't working.
with open('/users/h/documents/pyprojects/boy-names.csv', 'w', newline='') as csvFile:
    names = ['Column1']
    writer = csv.writer(names)
    print(file)
You can do something like this:
import csv

with open('boy-names.txt', 'rt') as file, open('boy-names.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(['Column1'])

    for boy_name in sorted(file.readlines()):
        boy_name = boy_name.rstrip('\n')
        print(boy_name)
        csv_writer.writerow([boy_name])
This is covered in the csv module documentation.
The only tricky part is converting the lines from the file to a list of 1-element lists.
import csv

with open('/users/h/documents/pyprojects/boy-names.txt', 'r') as file:
    names = [[k.strip()] for k in sorted(file.readlines())]

with open('/users/h/documents/pyprojects/boy-names.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    writer.writerow(['Column1'])
    writer.writerows(names)
So, names will contain (for example):
[['Able'],['Baker'],['Charlie'],['Delta']]
The csv writer expects to write a row or a set of rows, and each row has to be a list (or tuple). That's why I built names the way I did: with writerows, the outer list contains the set of rows to be written, each element of the outer list is one row, and since I want each row to contain a single item, each element is a one-element list.
If I had created this:
['Able','Baker','Charlie','Delta']
then writerows would have treated each string as a sequence, resulting in a CSV file like this:
A,b,l,e
B,a,k,e,r
C,h,a,r,l,i,e
D,e,l,t,a
which is amusing but not very useful. And I know that because I did it while writing this answer.
I have a folder with several CSV files (5k+). To work with them, it would be ideal for them all to have the same variable names and number of columns, but this is not the case.
To proceed with the cleaning, I would like to sort them into subfolders based on their columns: for example, if two or more CSVs have the same columns and variable names, create a subfolder containing them.
So far I have found how to combine all the files, but I don't know where to put the condition for the matching-column subfolders.
import glob
import pandas as pd

extension = 'csv'
all_filenames = glob.glob('*.{}'.format(extension))
col_combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
To merge all the CSV files in a folder that share the same header, the following approach could be used:
import csv
import glob

csv_files = {}  # (header as tuple) : csv.writer()
header_type_count = 1

for filename in glob.glob('*.csv'):
    with open(filename, newline='') as f_input:
        csv_input = csv.reader(f_input)
        header = tuple(next(csv_input))

        try:
            csv_files[header].writerows(csv_input)
        except KeyError:
            # Note: these output files stay open until the script exits.
            f_output = open(f'header_v{header_type_count:02}.csv', 'w', newline='')
            header_type_count += 1
            csv_output = csv.writer(f_output)
            csv_files[header] = csv_output
            csv_output.writerow(header)
            csv_output.writerows(csv_input)
This works by keeping track of all of the different header types and allows them to be concatenated on the fly. For each new header type found, it opens a new output CSV file (e.g. header_v01.csv).
csv_files maps header types to open csv.writer() objects to allow extra rows to be written.
This approach avoids needing to hold all the data in memory at the same time.
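If the goal is to group the files into subfolders rather than merge them, the same header-as-dictionary-key idea works. A minimal sketch (the columns_vNN subfolder naming here is my own invention, adjust as needed):
import csv
import glob
import os
import shutil

header_folders = {}  # (header as tuple) : subfolder name
folder_count = 1

for filename in glob.glob('*.csv'):
    # Read just the header row to decide which group the file belongs to.
    with open(filename, newline='') as f_input:
        header = tuple(next(csv.reader(f_input)))

    if header not in header_folders:
        subfolder = f'columns_v{folder_count:02}'  # hypothetical naming scheme
        os.makedirs(subfolder, exist_ok=True)
        header_folders[header] = subfolder
        folder_count += 1

    shutil.move(filename, os.path.join(header_folders[header], filename))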
Suppose I have an input.csv file with 10 rows. Now I have to write only the odd rows to another CSV file, output.csv. How can I do this with the csv module in Python?
I tried using writer.writerow, but it writes only the last odd row. That is, it overwrites the rows from input.csv into just one row in output.csv.
How do I resolve this?
def filter_rows(s):
    r = 0
    data = []
    with open('csv/github_issues_preproc1.csv', 'rb') as f, open('csv/preproc1f.csv', 'wb') as f_out:
        reader = csv.reader(f)
        writer = csv.writer(f_out)
        for row in reader:
            r = r + 1
            if r == s:
                data.append(row)
                writer.writerows(data)
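The usual fix is to create the writer once, keep the write call inside the loop, and pick rows by their position. A minimal sketch in Python 3, using the file names from your snippet:
import csv

with open('csv/github_issues_preproc1.csv', newline='') as f, \
        open('csv/preproc1f.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    for row_number, row in enumerate(csv.reader(f), start=1):
        if row_number % 2 == 1:  # odd rows: 1, 3, 5, ...
            writer.writerow(row)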
I have a txt file with several header lines, each marked with a '#'.
After those, the file has three columns, each with its own header, that I want to copy into a CSV file so that each column gets its own column in the spreadsheet.
Currently all I am able to get is a file that has all three columns in one section of the CSV.
import csv
infile = r'path\seawater_nh.txt'
outfile = r'path\emissivity_new.csv'
print "definitions successful"
in_txt = csv.reader(open(infile, 'rb'), delimiter = '\t')
out_csv = csv.writer(open(outfile, 'wb'))
out_csv.writerows(in_txt)
In the absence of your sample input and output files, I'm guessing here, but perhaps change how your files are read and written (note: depending on the OS, you may need to change how the lines are read).
import csv

infile = r'path\seawater_nh.txt'
outfile = r'path\emissivity_new.csv'

with open(infile, "r") as in_text:
    # Pass the file object (not the filename) to csv.reader.
    in_reader = csv.reader(in_text, delimiter='\t')
    # newline='' belongs in open(), not csv.writer().
    with open(outfile, "w", newline='') as out_csv:
        out_writer = csv.writer(out_csv)
        for row in in_reader:
            out_writer.writerow(row)
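Since the question mentions several header lines marked with '#', you may also want to drop those before parsing. A sketch, assuming every such line starts with '#':
import csv

infile = r'path\seawater_nh.txt'
outfile = r'path\emissivity_new.csv'

with open(infile, "r") as in_text, open(outfile, "w", newline='') as out_csv:
    # Skip the '#' header lines; csv.reader accepts any iterable of strings.
    data_lines = (line for line in in_text if not line.startswith('#'))
    out_writer = csv.writer(out_csv)
    out_writer.writerows(csv.reader(data_lines, delimiter='\t'))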
I have CSV files in which Data is formatted as follows:
file1.csv
ID,NAME
001,Jhon
002,Doe
file2.csv
ID,SCHOOLS_ATTENDED
001,my Nice School
002,His lovely school
file3.csv
ID,SALARY
001,25
002,40
The ID field is a kind of primary key that will be used to fetch records.
What is the most efficient way to read the 3 to 4 files, match up the corresponding records, and store them in another CSV file with the headings (ID,NAME,SCHOOLS_ATTENDED,SALARY)?
The file sizes are in the hundreds of MB (100 to 200 MB).
Hundreds of megabytes aren't that much. Why not go for a simple approach using the csv module and collections.defaultdict:
import csv
from collections import defaultdict

result = defaultdict(dict)
fieldnames = {"ID"}

for csvfile in ("file1.csv", "file2.csv", "file3.csv"):
    with open(csvfile, newline="") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id = row.pop("ID")
            for key in row:
                fieldnames.add(key)  # wasteful, but I don't care enough
                result[id][key] = row[key]
The resulting defaultdict looks like this:
>>> result
defaultdict(<class 'dict'>,
    {'001': {'SALARY': '25', 'SCHOOLS_ATTENDED': 'my Nice School', 'NAME': 'Jhon'},
     '002': {'SALARY': '40', 'SCHOOLS_ATTENDED': 'His lovely school', 'NAME': 'Doe'}})
You could then combine that into a CSV file (not my prettiest work, but good enough for now):
with open("out.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, sorted(fieldnames))
    writer.writeheader()
    for item in result:
        result[item]["ID"] = item
        writer.writerow(result[item])
out.csv then contains
ID,NAME,SALARY,SCHOOLS_ATTENDED
001,Jhon,25,my Nice School
002,Doe,40,His lovely school
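For comparison, a pandas-based sketch would also work here, assuming the files fit in memory (hundreds of MB usually do). Reading ID as a string preserves leading zeros like 001:
from functools import reduce

import pandas as pd

frames = [pd.read_csv(f, dtype={'ID': str})
          for f in ('file1.csv', 'file2.csv', 'file3.csv')]
# Outer-merge all frames on the shared ID key, then write the result.
combined = reduce(lambda left, right: left.merge(right, on='ID', how='outer'), frames)
combined.to_csv('out.csv', index=False)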
Following is working code for combining multiple CSV files whose names contain a specific keyword into one final CSV file. I have set the default keyword to "file", but you can set it to an empty string if you want to combine all CSV files from a folder_path. The code takes the header from your first CSV file and uses it as the header of the final combined file; the headers of all the other CSV files are ignored.
import csv
import glob

def Combine_multiple_csv_files_thatContainsKeywordInTheirNames_into_one_csv_file(folder_path, keyword='file'):
    # Takes the header only from the first CSV; all other CSV headers are
    # skipped and only their data rows are appended to the final CSV.
    # fileNames includes folder_path too.
    fileNames = glob.glob(folder_path + "*" + keyword + "*" + ".csv")

    # csv.reader yields one list per row. When writing those rows back out,
    # writerows would otherwise introduce an extra newline per row, so the
    # output file is opened with newline='' to avoid doubled newlines.
    with open(folder_path + "Combined_csv.csv", "w", newline='') as fout:
        print('Combining multiple csv files into 1')
        csv_write_file = csv.writer(fout, delimiter=',')

        # First file: copy everything, including the header.
        with open(fileNames[0], mode='rt') as read_file:
            csv_read_file = csv.reader(read_file, delimiter=',')
            csv_write_file.writerows(csv_read_file)

        # Remaining files: skip the header, then copy the data rows.
        for num in range(1, len(fileNames)):
            with open(fileNames[num], mode='rt') as read_file:
                csv_read_file = csv.reader(read_file, delimiter=',')
                next(csv_read_file)  # ignore header
                csv_write_file.writerows(csv_read_file)
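For example, assuming the CSV files live in a hypothetical ./data/ folder (note that folder_path must end with a path separator, because it is concatenated directly into the glob pattern and the output path):
Combine_multiple_csv_files_thatContainsKeywordInTheirNames_into_one_csv_file('./data/', keyword='file')
# writes ./data/Combined_csv.csv, taking its header from the first matching file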