I'm trying to write a function that reads an existing .csv file and copies every 20 rows to a newly created csv file. It therefore needs a file counter, "file_01, file_02, file_03, ...", so that the first 20 rows are copied to file_01.csv, the next 20 to file_02.csv, and so on.
Currently I have this code, which hasn't worked for me so far.
import csv
import os.path
from itertools import islice

N = 20
new_filename = ""
filename = ""

with open(filename, "rb") as file:  # the a opens it in append mode
    reader = csv.reader(file)
    for i in range(N):
        line = next(file).strip()
        #print(line)

with open(new_filename, 'wb') as outfh:
    writer = csv.writer(outfh)
    writer.writerow(line)
    writer.writerows(islice(reader, 2))
I have attached a file for testing.
https://1drv.ms/u/s!AhdJmaLEPcR8htYqFooEoYUwDzdZbg
32.01,18.42,58.98,33.02,55.37,63.25,12.82,-32.42,33.99,179.53,
41.11,33.94,67.85,57.61,59.23,94.69,19.43,-19.15,21.71,-161.13,
49.80,54.12,72.78,100.74,56.97,128.84,26.95,-6.76,10.07,-142.62,
55.49,81.02,68.93,148.17,49.25,157.32,34.94,5.39,0.44,-123.32,
56.01,112.81,59.27,177.87,38.50,179.63,43.43,18.42,-5.81,-102.24,
50.79,142.87,48.06,-162.32,26.60,-161.21,52.38,34.37,-7.42,-79.64,
41.54,167.36,37.12,-145.93,15.01,-142.84,60.90,57.05,-4.47,-56.54,
30.28,-172.09,27.36,-130.24,5.11,-123.66,66.24,91.12,-0.76,-35.44,
18.64,-153.20,19.52,-114.09,-1.54,-102.96,64.77,131.32,5.12,-21.68,
7.92,-134.07,14.24,-96.93,-3.79,-80.91,57.10,162.35,12.51,-9.21,
-0.34,-113.74,11.80,-78.73,-2.49,-58.46,46.75,-175.86,20.81,2.87,
-4.81,-91.85,11.78,-60.28,0.59,-39.26,35.75,-158.12,29.79,15.71,
-4.76,-68.67,13.79,-43.84,6.82,-24.69,25.27,-141.56,39.05,30.71,
-1.33,-46.42,18.44,-30.23,14.53,-11.95,16.21,-124.45,47.91,50.25,
4.14,-29.61,24.89,-18.02,23.01,0.10,9.59,-106.05,54.46,77.07,
11.04,-15.39,32.33,-6.66,31.92,12.48,6.24,-86.34,55.72,110.53,
18.69,-2.32,40.46,4.57,41.11,26.87,6.07,-65.68,50.25,142.78,
26.94,10.56,49.18,16.67,49.92,45.39,8.06,-46.86,40.13,168.29,
35.80,24.58,58.45,31.99,56.83,70.92,12.96,-31.90,28.10,-171.07,
44.90,41.72,67.41,55.89,59.21,103.94,19.63,-18.67,15.97,-152.40,
-5.41,-77.62,11.40,-63.21,4.80,-29.06,31.33,-151.44,43.00,37.25,
-2.88,-54.38,13.08,-46.00,12.16,-15.86,21.21,-134.62,51.25,59.16,
1.69,-35.73,17.44,-32.01,20.37,-3.78,13.06,-117.10,56.18,88.98,
8.15,-20.80,23.70,-19.66,29.11,8.29,7.74,-98.22,54.91,123.30,
15.52,-7.45,31.04,-8.22,38.22,21.78,5.76,-77.99,47.34,153.31,
23.53,5.38,39.07,2.98,47.29,38.71,6.58,-57.45,36.18,176.74,
32.16,18.76,47.71,14.88,55.08,61.71,9.76,-40.52,23.99,-163.75,
41.27,34.36,56.93,29.53,59.23,92.75,15.53,-26.40,12.16,-145.27,
49.92,54.65,66.04,51.59,57.34,126.97,22.59,-13.65,2.14,-126.20,
55.50,81.56,72.21,90.19,49.88,155.84,30.32,-1.48,-4.71,-105.49,
55.92,113.45,70.26,139.40,39.23,178.48,38.55,10.92,-7.09,-83.11,
50.58,143.40,61.40,172.50,27.38,-162.27,47.25,24.86,-4.77,-60.15,
41.30,167.74,50.34,-166.33,15.74,-143.93,56.21,43.14,-0.54,-38.22,
30.03,-171.78,39.24,-149.48,5.71,-124.87,63.77,70.19,4.75,-24.15,
18.40,-152.91,29.17,-133.78,-1.18,-104.31,66.51,108.81,11.86,-11.51,
7.69,-133.71,20.84,-117.74,-3.72,-82.28,61.95,146.15,20.05,0.65,
-0.52,-113.33,14.97,-100.79,-2.58,-59.75,52.78,172.46,28.91,13.29,
-4.91,-91.36,11.92,-82.84,0.34,-40.12,41.93,-167.91,38.21,27.90,
These are some of the problems with your current solution:
You created a csv.reader object, but then you did not use it.
You read each line, but then you did not store the lines anywhere.
You are not keeping track of 20 rows, which was your requirement.
You created the output file in a separate with block, which no longer has access to the read lines or the csv.reader object.
Here's a working solution:
import csv

inp_file = "input.csv"
out_file_pattern = "file_{:{fill}2}.csv"
max_rows = 20

with open(inp_file, "r", newline="") as inp_f:
    reader = csv.reader(inp_f)
    all_rows = []
    cur_file = 1
    for row in reader:
        all_rows.append(row)
        if len(all_rows) == max_rows:
            with open(out_file_pattern.format(cur_file, fill="0"), "w", newline="") as out_f:
                writer = csv.writer(out_f)
                writer.writerows(all_rows)
            all_rows = []
            cur_file += 1
The flow is as follows:
Read each row of the CSV using a csv.reader.
Store each row in an all_rows list.
Once that list reaches 20 rows, open a file and write all the rows to it using the csv.writer's writerows method.
Use a cur_file counter to format the filename.
Every time 20 rows are dumped to a file, empty out the list and increment the file counter.
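One edge case worth noting: if the total number of rows is not an exact multiple of 20, the loop never writes the final partial batch. Here is a minimal self-contained sketch of the same loop with a flush added after it (the 45-row inline sample input is hypothetical):

```python
import csv

inp_file = "input.csv"
out_file_pattern = "file_{:02}.csv"
max_rows = 20

# hypothetical sample input: 45 rows -> files of 20, 20 and 5 rows
with open(inp_file, "w", newline="") as f:
    csv.writer(f).writerows([[i, i * 2] for i in range(45)])

with open(inp_file, "r", newline="") as inp_f:
    reader = csv.reader(inp_f)
    all_rows = []
    cur_file = 1
    for row in reader:
        all_rows.append(row)
        if len(all_rows) == max_rows:
            with open(out_file_pattern.format(cur_file), "w", newline="") as out_f:
                csv.writer(out_f).writerows(all_rows)
            all_rows = []
            cur_file += 1
    if all_rows:  # flush the final batch of fewer than 20 rows
        with open(out_file_pattern.format(cur_file), "w", newline="") as out_f:
            csv.writer(out_f).writerows(all_rows)
```

With 45 input rows this produces file_01.csv, file_02.csv and a 5-row file_03.csv that the original loop would have dropped.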
This solution counts the blank lines as part of the 20 rows. Your test file actually has 19 rows of CSV data and 1 blank line per block. If you need to skip the blank lines, just add a simple check at the top of the loop:
if not row:
    continue
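To see where that check sits, here is a tiny standalone sketch (the inline data is hypothetical): csv.reader yields an empty list for a blank line, which the `if not row` check filters out.

```python
import csv
import io

# hypothetical inline data standing in for the input file; note the blank line
data = "1,2\n\n3,4\n"

all_rows = []
for row in csv.reader(io.StringIO(data)):
    if not row:  # a blank line comes back as [], so skip it
        continue
    all_rows.append(row)

print(all_rows)  # [['1', '2'], ['3', '4']]
```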
Also, as I mentioned in a comment, I assume that the input file is an actual CSV file, meaning a plain text file with CSV-formatted data. If the input is actually an Excel file, then solutions like this won't work, because you'll need special libraries to read Excel files, even if the contents visually look like CSV and even if you rename the file to .csv.
Without using any special CSV libraries (e.g. csv; you could use one, I just don't know how, and I don't think it is necessary for this case), you could:
excel_csv_fp = open(r"<file_name>", "r", encoding="utf-8")  # check the proper encoding for your file
csv_data = excel_csv_fp.readlines()
excel_csv_fp.close()

file_counter = 0
new_file_name = ""
new_fp = None
for line in csv_data:
    if line.strip() == "":  # a blank line separates the blocks
        if new_fp is not None:
            new_fp.close()
            new_fp = None
    else:
        if new_fp is None:  # no file open yet, so start a new one
            file_counter += 1
            new_file_name = "file_" + "{:02d}".format(file_counter)  # 1 turns into 01 and 10 stays 10
            new_fp = open("<some_path>/" + new_file_name + ".csv", "w", encoding="utf-8")  # makes a new CSV file to write to
        new_fp.write(line)  # write each line of the current block
if new_fp is not None:
    new_fp.close()
If you have any questions on any of the code (how it works, why I choose what etc.), just ask in the comments and I'll try to reply as soon as possible.
I have this code that reads through my csv files (p01_results, p02_results, ...) to remove some unwanted rows based on their row numbers, and it works. Right now I am trying to add two columns, participantID and session. For participantID, I tried to read the name of the csv file, save the ID number (01, 02, ...), and fill the column with it. For session, I tried to fill every 18 rows with 1s, 2s, 3s and 4s.
I tried to incorporate this code into mine, but it didn't work:
test4 = ['test4', 4, 7, 10]
with open('data.csv', 'r') as ifile, open('adjusted.csv', 'w') as ofile:
    for line, new in zip(ifile, test4):
        new_line = line.rstrip('\n') + ',' + str(new) + '\n'
        ofile.write(new_line)
import os

base_directory = 'C:\\Users\\yosal\\Desktop\\results'
for dir_path, dir_name_list, file_name_list in os.walk(base_directory):
    for file_name in file_name_list:
        # If this is not a CSV file
        if not file_name.endswith('results.csv'):
            # Skip it
            continue
        file_path = os.path.join(dir_path, file_name)
        with open(file_path, 'r') as ifile:
            line_list = ifile.readlines()
        with open(file_path, 'w') as ofile:
            # only write these rows to the new file
            ofile.writelines(line_list[0])
            ofile.writelines(line_list[2:20])
            ofile.writelines(line_list[21:39])
            ofile.writelines(line_list[40:58])
            ofile.writelines(line_list[59:77])
Try reading the CSV into a list. Then loop through each element of the list (each element being a row in the CSV) and append the delimiter plus the desired string. Then write a new CSV, either named differently or replacing the old one, using your list as the input.
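A minimal sketch of that idea, assuming a hypothetical two-row data.csv and ",42" as the delimiter-plus-string to append:

```python
# hypothetical sample input
with open('data.csv', 'w') as f:
    f.write('a,1\nb,2\n')

# read the CSV into a list
with open('data.csv', 'r') as ifile:
    rows = [line.rstrip('\n') for line in ifile]

# add the delimiter plus the desired string to each row
new_rows = [row + ',42' for row in rows]

# write the result back (here replacing the old file)
with open('data.csv', 'w') as ofile:
    ofile.write('\n'.join(new_rows) + '\n')

print(open('data.csv').read())  # a,1,42 / b,2,42
```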
I tried adding a column to my csv file using pandas, so you can try something like this. First install pandas by running "pip install pandas".
import pandas as pd

df = pd.read_csv('data.csv')       # read the csv file
df.set_index('S/N', inplace=True)  # you can set the index to any column that
                                   # already exists in your csv; in my case
                                   # it is the "S/N" column
df["test"] = ["values", "you want", "to add"]
df.to_csv('data.csv')
Took me some time, but I did it.
import os

base_directory = 'C:\\Users\\yosal\\Desktop\\results'
for dir_path, dir_name_list, file_name_list in os.walk(base_directory):
    for file_name in file_name_list:
        # If this is not a CSV file
        if not file_name.endswith('results.csv'):
            # Skip it
            continue
        file_path = os.path.join(dir_path, file_name)
        # the participant ID sits right after the base directory in the path
        participant = file_path[len(base_directory) + 2:len(base_directory) + 4]
        with open(file_path, 'r') as ifile:
            line_list = ifile.readlines()
        with open(file_path, 'w') as ofile:
            ofile.write(line_list[0].rstrip() + ",participant,session\n")
            for x in range(2, 20):
                ofile.write(line_list[x].rstrip() + "," + participant + ",1\n")
            for y in range(21, 39):
                ofile.write(line_list[y].rstrip() + "," + participant + ",2\n")
            for h in range(40, 58):
                ofile.write(line_list[h].rstrip() + "," + participant + ",3\n")
            for z in range(59, 77):
                ofile.write(line_list[z].rstrip() + "," + participant + ",4\n")
I am trying to create an archive to store a list of available books in the system. I want my program to ask the user to input csv file, read a list of books from that file, check the year of publication and delete the row if the book is older than 7 years.
I want to keep everything in a single file.
So far, instead of deleting certain rows, writerow deletes everything in the file. Could someone help me to understand how to fix it?
import csv
import os
import time

archive = os.listdir()

def get_user_files(self):
    while True:
        for position, file_name in enumerate(archive):
            print(position, "-", file_name)
        userInput = input("\n\n ")
        if (int(userInput) < 0) or (int(userInput) > len(archive)):
            print("Invalid Input. Try again. \n")
        else:
            print("Loading successful!")
            break
    global cvs_list
    cvs_list = archive[int(userInput)]  # Store file
    archive.remove(cvs_list)  # Remove from the list
    with open(cvs_list, 'r') as in_file, open(cvs_list, 'w') as out_file:
        reader = csv.reader(in_file)
        writer = csv.writer(out_file)
        for row in reader:
            next(reader)  # skip headers
            if int(row[2]) < 2011:
                writer.writerow(row)
Edit:
with open(cvs_list, 'r') as in_file:
    csv_in = csv.reader(in_file, quoting=csv.QUOTE_ALL)
    filtered_list = []
    row1 = next(csv_in)
    filtered_list.append(row1)
    for row in csv_in:
        if int(row[2]) >= 2011:
            row.append(filtered_list)
    with open(cvs_list, 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(filtered_list)
It's generally not advised to read and write to the same open file handle for reasons like this. Instead, read the entire file to a data structure, and in a separate with block, write your new data. This also makes it easier to write to a different file (perhaps with a timestamp attached), which can be handy when you (like everyone) inevitably screw something up and need to try your new code on your old data- you have a backup.
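A sketch of that pattern (the helper name and timestamp format are my own, not from the question):

```python
import csv
import time

def write_filtered(rows, base='archive'):
    # write to a fresh, timestamped file instead of overwriting the source,
    # so the original data survives if the filtering code turns out wrong
    out_path = '{}_{}.csv'.format(base, time.strftime('%Y%m%d-%H%M%S'))
    with open(out_path, 'w', newline='') as f:
        csv.writer(f).writerows(rows)
    return out_path

# hypothetical usage: rows would be your already-filtered data
backup = write_filtered([['name', 'year'], ['new_book', '2015']])
```

If the new code turns out to be buggy, the original file is untouched and the timestamped output can simply be deleted.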
import csv

def filter_dates(csv_filepath):
    with open(csv_filepath, 'r') as in_file:
        csv_in = list(csv.reader(in_file))

    # create accumulator for new list to add only valid values
    filtered_list = []
    filtered_list.append(csv_in[0])  # append header to new list

    # filter the list making sure no errors appear BEFORE writing to the file
    for row in csv_in[1:]:  # skip header (first row)
        if int(row[2]) >= 2011:  # if entry is NEWER THAN OR EQUAL TO 7 years old, we add it to filtered_list
            filtered_list.append(row)

    # now filtered_list contains only entries where index position 2 contains valid years
    with open(csv_filepath, 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(filtered_list)
Here is a fully independent example:
csv_in = [
    ['name', 'sbin', 'year'],
    ['moby_dick', 'sbin', '1851'],
    ['new_book', 'sbin', '2011'],
    ['newest_book', 'sbin', '2018'],
]

filtered_list = []
filtered_list.append(csv_in[0])  # this is where the header is added
for row in csv_in[1:]:  # skip header (first row)
    if int(row[2]) >= 2011:
        filtered_list.append(row)
print(filtered_list)
A couple of notes:
It's generally good to store this kind of data in memory before you open the file for writing (or overwriting, in this case), so that any error while reading and filtering the file happens before we try to modify the output.
The easiest way to overwrite a file is to first read it, commit the contents to memory (csv_in, the list defined in the first with block in this case), and then finally, once the data (filtered_list) is ready for 'shipping', commit it to a file.
Never ever use the global declaration in Python; it's never worth it and causes a lot of headaches down the line.
You have:
with open(cvs_list, 'r') as in_file:
    csv_in = csv.reader(in_file, quoting=csv.QUOTE_ALL)
    filtered_list = []
    row1 = next(csv_in)
    filtered_list.append(row1)
    for row in csv_in:
        if int(row[2]) >= 2011:
            row.append(filtered_list)
    # WRONG! you are opening the same file for output
    # in the upper block
    with open(cvs_list, 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(filtered_list)
It is far better to read and write at the same time and then copy the tmp file onto the source.
Like this:
# NOT TESTED!
with open(cvs_list, 'r') as in_file, open(tmp_file, 'w') as out_file:
    csv_in = csv.reader(in_file, quoting=csv.QUOTE_ALL)
    writer = csv.writer(out_file)
    writer.writerow(next(csv_in))  # copy the header
    writer.writerows(row for row in csv_in if int(row[2]) >= 2011)
Then at the end of that with block you can copy the temp file on top of the source file:
from shutil import move
move(tmp_file, cvs_list)
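Putting the two pieces together into a runnable sketch (the file names and sample rows here are hypothetical):

```python
import csv
from shutil import move

cvs_list = 'books.csv'
tmp_file = 'books.tmp'

# hypothetical sample data: one header plus two books
with open(cvs_list, 'w', newline='') as f:
    csv.writer(f).writerows([
        ['name', 'author', 'year'],
        ['old_book', 'x', '2001'],
        ['new_book', 'y', '2015'],
    ])

with open(cvs_list, 'r', newline='') as in_file, open(tmp_file, 'w', newline='') as out_file:
    csv_in = csv.reader(in_file)
    writer = csv.writer(out_file)
    writer.writerow(next(csv_in))  # copy the header through unchanged
    writer.writerows(row for row in csv_in if int(row[2]) >= 2011)

move(tmp_file, cvs_list)  # the temp file replaces the source only once it is complete
```

Because the move happens after the with block closes both files, an exception during filtering leaves the original file untouched.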
I have a CSV file that is being constantly appended. It has multiple headers and the only common thing among the headers is that the first column is always "NAME".
How do I split the single CSV file into separate CSV files, one for each header row?
here is a sample file:
"NAME","AGE","SEX","WEIGHT","CITY"
"Bob",20,"M",120,"New York"
"Peter",33,"M",220,"Toronto"
"Mary",43,"F",130,"Miami"
"NAME","COUNTRY","SPORT","NUMBER","SPORT","NUMBER"
"Larry","USA","Football",14,"Baseball",22
"Jenny","UK","Rugby",5,"Field Hockey",11
"Jacques","Canada","Hockey",19,"Volleyball",4
"NAME","DRINK","QTY"
"Jesse","Beer",6
"Wendel","Juice",1
"Angela","Milk",3
If the size of the csv files is not huge -- so all can be in memory at once -- just use read() to read the file into a string and then use a regex on this string:
import re

with open(ur_csv) as f:
    data = f.read()

chunks = re.finditer(r'(^"NAME".*?)(?=^"NAME"|\Z)', data, re.S | re.M)
for i, chunk in enumerate(chunks, 1):
    with open('/path/{}.csv'.format(i), 'w') as fout:
        fout.write(chunk.group(1))
If the size of the file is a concern, you can use mmap to create something that looks like a big string but is not all in memory at the same time.
Then use the mmap string with a regex to separate the csv chunks like so:
import mmap
import re

with open(ur_csv, 'rb') as f:  # mmap requires the file opened in binary mode
    mf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # a bytes pattern is needed to match against the mmap object
    chunks = re.finditer(rb'(^"NAME".*?)(?=^"NAME"|\Z)', mf, re.S | re.M)
    for i, chunk in enumerate(chunks, 1):
        with open('/path/{}.csv'.format(i), 'wb') as fout:
            fout.write(chunk.group(1))
In either case, this will write all the chunks in files named 1.csv, 2.csv etc.
Copy the input to a new output file each time you see a header line. Something like this (not checked for errors):
partNum = 1
outHandle = None
for line in open("yourfile.csv", "r").readlines():
    if line.startswith('"NAME"'):
        if outHandle is not None:
            outHandle.close()
        outHandle = open("part%d.csv" % (partNum,), "w")
        partNum += 1
    outHandle.write(line)
outHandle.close()
The above will break if the input does not begin with a header line or if the input is empty.
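A hedged variant of the same loop that tolerates both of those cases, an empty input and data appearing before the first header line (the function name is made up for this sketch):

```python
def split_on_header(in_path, header_prefix='"NAME"'):
    # copy lines to part1.csv, part2.csv, ... starting a new part at each
    # header line; data before the first header also starts a part
    part_num = 0
    out_handle = None
    with open(in_path, 'r') as f:
        for line in f:
            if line.startswith(header_prefix) or out_handle is None:
                if out_handle is not None:
                    out_handle.close()
                part_num += 1
                out_handle = open('part%d.csv' % part_num, 'w')
            out_handle.write(line)
    if out_handle is not None:
        out_handle.close()
    return part_num  # 0 for an empty input
```

An empty file simply produces no parts, and a file whose first line is data still opens part1.csv before writing.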
You can use the Python csv package to read your source file and write multiple csv files based on the rule that if element 0 in your row == "NAME", spawn off a new file. Something like this...
import csv

outfile_name = "out_%d.csv"
out_num = 0
with open('nameslist.csv', 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    csv_buffer = []
    for row in csvreader:
        if row and row[0] == "NAME" and csv_buffer:
            # a new header starts, so flush the previous chunk first
            out_num += 1
            with open(outfile_name % out_num, 'w', newline='') as csvout:
                csv.writer(csvout).writerows(csv_buffer)
            csv_buffer = []
        csv_buffer.append(row)
    if csv_buffer:  # flush the final chunk
        out_num += 1
        with open(outfile_name % out_num, 'w', newline='') as csvout:
            csv.writer(csvout).writerows(csv_buffer)
P.S. I haven't actually tested this but that's the general concept
Given the other answers, the only modification that I would suggest would be to open the file using csv.DictReader. The pseudocode would be like this, assuming that the first line in the file is the first header.
Note that this assumes there is no blank line or other indicator between the entries, so that a 'NAME' header occurs right after data. If there were a blank line between appended files, you could use that as an indicator to read infile.fieldnames on the next row. If you need to handle the inputs as a list, then the previous answers are better.
ifile = open(filename, 'rb')
infile = csv.DictReader(ifile)
infields = infile.fieldnames
filenum = 1
ofile = open('outfile' + str(filenum), 'wb')
outfields = infields  # This allows you to change the header fields
outfile = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore')
outfile.writerow(dict((fn, fn) for fn in outfields))
for row in infile:
    if row['NAME'] != 'NAME':
        # process this row here and do whatever is needed
        pass
    else:
        ofile.close()
        # build infields again from this row
        infields = [row["NAME"], ...]  # This assumes you know the names & order;
        # a dict cannot be pulled as a list and keep the order that you want.
        filenum += 1
        ofile = open('outfile' + str(filenum), 'wb')
        outfields = infields  # This allows you to change the header fields
        outfile = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore')
        outfile.writerow(dict((fn, fn) for fn in outfields))
# This is the end of the loop. All data has been read and processed.
ofile.close()
ifile.close()
If the exact order of the new header does not matter, except for NAME in the first entry, then you can build the new list as follows:
infields = [row['NAME']]
for k in row.keys():
    if k != 'NAME':
        infields.append(row[k])
This will create the new header with NAME in entry 0, but the others will not be in any particular order.