Python: add column to CSV file based on existing column - python

I already have written what I need for identifying and parsing the value I am seeking, I need help writing a column to the csv file (or a new csv file) with the parsed value. Here's some pseudocode / somewhat realistic Python code for what I am trying to do:
# Given a CSV file, this function creates a new CSV file with all values parsed
def handleCSVfile(csvfile):
with open(csvfile, 'rb') as file:
reader = csv.reader(file, delimiter=',', lineterminator='\n')
for row in reader:
for field in row:
if isWhatIWant(field):
parsedValue = parse(field)
# write new column to row containing parsed value
I've already written the isWhatIWant and parse functions. If I need to write a completely new csv file, then I am not sure how to have both open simultaneously and read and write from one into the other.

I'd do it like this. I'm guessing that isWhatIWant() is something that is supposed to replace a field in-place.
import csv
def handleCSVfile(infilename, outfilename):
with open(infilename, 'rb') as infile:
with open(outfilename, 'wb') as outfile:
reader = csv.reader(infile, lineterminator='\n')
writer = csv.writer(outfile, lineterminator='\n')
for row in reader:
for field_index, field in enumerate(row):
if isWhatIWant(field):
row[field_index] = parse(field)
writer.writerow(row)
This sort of pattern occurs a lot and results in really long lines. It can sometimes be helpful to break out the logic from opening and files into a different function, like this:
import csv
def load_save_csvfile(infilename, outfilename):
with open(infilename, 'rb') as infile:
with open(outfilename, 'wb') as outfile:
reader = csv.reader(infile, lineterminator='\n')
writer = csv.writer(outfile, lineterminator='\n')
read_write_csvfile(reader, writer)
def read_write_csvfile(reader, writer)
for row in reader:
for field_index, field in enumerate(row):
if isWhatIWant(field):
row[field_index] = parse(field)
writer.writerow(row)
This modularizes the code, making it easier for you to change the way the files and formats are handled from the logic independently from each other.
Additional hints:
Don't name variables file as that is a built-in function. Shadowing those names will bite you when you least expect it.
delimiter=',' is the default so you don't need to specify it explicitly.

Related

Delete rows from csv file using function in Python

def usunPsa(self, ImiePsa):
with open('schronisko.csv', 'rb') as input, open('schronisko.csv', 'wb') as output:
writer = csv.writer(output)
for row in csv.reader(input):
if row[0] == ImiePsa:
writer.writerow(row)
with open(self.plik, 'r') as f:
print(f.read())
Dsac;Chart;2;2020-11-04
Dsac;Chart;3;2020-11-04
Dsac;Chart;4;2020-11-04
Lala;Chart;4;2020-11-04
Sda;Chart;4;2020-11-04
Sda;X;4;2020-11-04
Sda;Y;4;2020-11-04
pawel;Y;4;2020-11-04`
If I use usunPsa("pawel") every line gets removed.
Following code earse my whole csv file instead only one line with given ImiePsa,
What may be the problem there?
I found the problem. row[0] in your code returns the entire row, that means the lines are not parsed correctly. After a bit of reading, I found that csv.reader has a parammeter called delimiter to sepcify the delimiter between columns.
Adding that parameter solves your problem, but not all problems though.
The code that worked for me (just in case you still want to use your original code)
import csv
def usunPsa(ImiePsa):
with open('asd.csv', 'rb') as input, open('schronisko.csv', 'wb') as output:
writer = csv.writer(output)
for row in csv.reader(input, delimiter=';'):
if row[0] == ImiePsa:
writer.writerow(row)
usunPsa("pawel")
Notice that I changed the output filename. If you want to keep the filename the same however, you have to use Hamza Malik's answer.
Just read the csv file in memory as a list, then edit that list, and then write it back to the csv file.
lines = list()
members= input("Please enter a member's name to be deleted.")
with open('mycsv.csv', 'r') as readFile:
reader = csv.reader(readFile)
for row in reader:
lines.append(row)
for field in row:
if field == members:
lines.remove(row)
with open('mycsv.csv', 'w') as writeFile:
writer = csv.writer(writeFile)
writer.writerows(lines)

Appending a row of data to line 2 in a large CSV File

I'm sure this is a really easy question but I can't seem to find any information on it.
I have a very large CSV file which I need to insert a row directly after the header which helps with another code that reads the csv and joins it to a parcel shapefile.
I have the code to append the row of data that I want, but it will only go to the last line. I cannot figure out how to get the code to insert my row immediately after the header row. Here is my code:
import os
import csv
insert_row = '"AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,00'
os.chdir(r"D:\PROPERTY\PINELLAS\Data_20201001_t")
with open("owner_mail.csv", 'r') as csv_file, open("owner_mail.csv", 'a', newline = "") as new_file:
csv_reader = csv.reader(csv_file)
csv_writer = csv.writer(new_file)
csv_writer.writerow(insert_row)
So that's it. I just need the insert_row line of data to be in row position number 2 instead of at the end of the file. Thank you.
You can't just insert a row in the middle of a file unless replacing data of exactly the same length. You have to read the entire file, edit it, and re-write it.
Something like this should work:
import csv
# This must be an iterable not a string
insert_row = "AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,00
with open("owner_mail.csv", 'r') as csv_file, open("owner_mail_updated.csv", 'w', newline = "") as new_file:
csv_reader = csv.reader(csv_file)
csv_writer = csv.writer(new_file)
header = next(csv_reader)
csv_writer.writerow(header)
csv_writer.writerow(insert_row)
for line in csv_reader:
csv_writer.writerow(line)
If the CSV file is not too large to fit entirely in memory than you can read all the lines at once, edit them, and write them back out to the same file. It's riskier if there is a problem. Safer to write to a new file, then delete original and rename if no errors:
import csv
# This must be an iterable not a string
insert_row = "AAAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,00
with open("owner_mail.csv", 'r') as csv_file:
rows = list(csv.reader(csv_file))
rows.insert(1,insert_row) # insert after header row
with open("owner_mail.csv", 'w') as csv_file:
w = csv.writer(csv_file)
w.writerows(rows)
Please try this:
import os
import csv
insert_row = '"AAAAAAAAAAAAAAAAAA","**********","**********","**********","**********","**********","**********","**","**********","**********","****","**********",999999,9999,00'
with open("owner_mail.csv", 'r') as csv_file, open("owner_mail.csv", 'w') as new_file:
csv_reader = csv.reader(csv_file)
reader = list(csv_reader)
reader.insert(1,insert_row)
csv_writer = csv.writer(new_file)
csv_writer.writerows(reader)

Need help in extracting data from csv and writing to a text file

I have a csv with two columns of data. I want to extract data from one column and write to a text file with single-quote on each element and separated by a comma. For example, I have this..
taxable_entity_id,id
45efc167-9254-406c-b5a8-6aef91a73dd9,331999
5ae97680-f489-4182-9dcb-eb07a73fab15,103507
00018d93-ae71-4367-a0da-f252cea4dfa2,32991
I want all the taxable_entity_ids in a text file like this
'45efc167-9254-406c-b5a8-6aef91a73dd9','5ae97680-f489-4182-9dcb-eb07a73fab15','00018d93-ae71-4367-a0da-f252cea4dfa2'
without any space between two elements, separated by a comma.
Edit:
This is what i tried..
import csv
with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
reader = csv.DictReader(csv_File)
with open("te_id.csv", 'w') as text_file:
writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_MINIMAL)
for row in reader:
writer.writerow(row["taxable_entity_id"])
# print(row["taxable_entity_id"])
text_file.close()
csv_File.close()
and this is what I got..
4,5,e,f,c,1,6,7,-,9,2,5,4,-,4,0,6,c,-,b,5,a,8,-,6,a,e,f,9,1,a,7,3,d,d,9
5,a,e,9,7,6,8,0,-,f,4,8,9,-,4,1,8,2,-,9,d,c,b,-,e,b,0,7,a,7,3,f,a,b,1,5
0,0,0,1,8,d,9,3,-,a,e,7,1,-,4,3,6,7,-,a,0,d,a,-,f,2,5,2,c,e,a,4,d,f,a,2
You were close. Simply as you want one single line in the output file, you should write it at once by using a comprehension:
import csv
with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
reader = csv.DictReader(csv_File)
with open("te_id.csv", 'w') as text_file:
# use QUOTE_ALL to force the quoting
writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_ALL)
writer.writerow((row["taxable_entity_id"] for row in reader))
And do not use close as you have (correctly) used with.
try that
import pandas as pd
df = pd.read_csv('nameoffile.csv',delimiter = ',')
X = df[0].values
f = open('newfile.txt','w')
for i in X:
f.write(X[i] + ',')
f.close()
It's seems a little odd that you basically want a one row csv file for the taxable_entity_ids, but certain possible. You also don't need to explicitly close() the open files because the with context manager will do it for you automatically.
You also need to open the CSV file with newline='' as shown in all the examples in the csv module's documentation.
Lastly, if you want the all the fields to be quoted you need to use quoting=csv.QUOTE_ALL instead of quoting=csv.QUOTE_MINIMAL.
import csv
inp_filename = "Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv"
outp_filename = "te_id.csv"
with open(outp_filename, 'w', newline='') as text_file, \
open(inp_filename, 'r', newline='') as csv_File:
reader = csv.DictReader(csv_File)
writer = csv.writer(text_file, quotechar="'", quoting=csv.QUOTE_ALL)
taxable_entity_ids = (row["taxable_entity_id"] for row in reader)
writer.writerow(taxable_entity_ids)
print('done')

Viewing a CSV in Python

How would I go about correcting this code, so that I can view the contents of the CSV?
import csv
def csv_to_list("jo.csv", delimiter=','):
with open("jo.csv", 'r') as csv_con:
reader = csv.reader(csv_con, delimiter=delimiter)
return list(reader)
I don't know what you are trying to do but the proper usage of csv.reader is:
import csv
with open("jo.csv", 'r') as csv_con:
reader = csv.reader(csv_con, delimiter=delimiter)
for row in reader:
# Process rows here
print(', '.join(row))
One of the goals of csv.reader is not to load the whole file in the reader but to access it row by row.

How to skip the headers when processing a csv file using Python?

I am using below referred code to edit a csv using Python. Functions called in the code form upper part of the code.
Problem: I want the below referred code to start editing the csv from 2nd row, I want it to exclude 1st row which contains headers. Right now it is applying the functions on 1st row only and my header row is getting changed.
in_file = open("tmob_notcleaned.csv", "rb")
reader = csv.reader(in_file)
out_file = open("tmob_cleaned.csv", "wb")
writer = csv.writer(out_file)
row = 1
for row in reader:
row[13] = handle_color(row[10])[1].replace(" - ","").strip()
row[10] = handle_color(row[10])[0].replace("-","").replace("(","").replace(")","").strip()
row[14] = handle_gb(row[10])[1].replace("-","").replace(" ","").replace("GB","").strip()
row[10] = handle_gb(row[10])[0].strip()
row[9] = handle_oem(row[10])[1].replace("Blackberry","RIM").replace("TMobile","T-Mobile").strip()
row[15] = handle_addon(row[10])[1].strip()
row[10] = handle_addon(row[10])[0].replace(" by","").replace("FREE","").strip()
writer.writerow(row)
in_file.close()
out_file.close()
I tried to solve this problem by initializing row variable to 1 but it didn't work.
Please help me in solving this issue.
Your reader variable is an iterable, by looping over it you retrieve the rows.
To make it skip one item before your loop, simply call next(reader, None) and ignore the return value.
You can also simplify your code a little; use the opened files as context managers to have them closed automatically:
with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
reader = csv.reader(infile)
next(reader, None) # skip the headers
writer = csv.writer(outfile)
for row in reader:
# process each row
writer.writerow(row)
# no need to close, the files are closed automatically when you get to this point.
If you wanted to write the header to the output file unprocessed, that's easy too, pass the output of next() to writer.writerow():
headers = next(reader, None) # returns the headers or `None` if the input is empty
if headers:
writer.writerow(headers)
Another way of solving this is to use the DictReader class, which "skips" the header row and uses it to allowed named indexing.
Given "foo.csv" as follows:
FirstColumn,SecondColumn
asdf,1234
qwer,5678
Use DictReader like this:
import csv
with open('foo.csv') as f:
reader = csv.DictReader(f, delimiter=',')
for row in reader:
print(row['FirstColumn']) # Access by column header instead of column number
print(row['SecondColumn'])
Doing row=1 won't change anything, because you'll just overwrite that with the results of the loop.
You want to do next(reader) to skip one row.
Simply iterate one time with next()
with open(filename) as file:
csvreaded = csv.reader(file)
header = next(csvreaded)
for row in csvreaded:
empty_list.append(row) #your csv list without header
or use [1:] at the end of reader object
with open(filename) as file:
csvreaded = csv.reader(file)
header = next(csvreaded)
for row in csvreaded[1:]:
empty_list.append(row) #your csv list without header
Inspired by Martijn Pieters' response.
In case you only need to delete the header from the csv file, you can work more efficiently if you write using the standard Python file I/O library, avoiding writing with the CSV Python library:
with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
next(infile) # skip the headers
outfile.write(infile.read())

Categories