Copy specific rows from csv to csv in Python 2.7 - python

I have been trying to copy specific rows, including the header, from an original CSV file to a new one. However, when I run my code it copies a total mess and creates a huge document.
This is one of the options I have tried so far, which seems to be the closest to the solution:
import csv
with open('D:/test.csv', 'r') as f, open('D:/out.csv', 'w') as f_out:
    reader = csv.DictReader(f)
    writer = csv.writer(f_out)
    for row in reader:
        if row["ICLEVEL"] == "1":
            writer.writerow(row)
The thing is that I have to copy only those rows where the value of "ICLEVEL" (a header name) is equal to "1".
Note: test.csv is a very large file and I cannot hardcode all the header names in the writer.
Any demonstration of a Pythonic way of doing this is greatly appreciated. Thanks.

writer.writerow expects a sequence (a tuple or list). You can use DictWriter which expects a dict.
import csv
with open('D:/test.csv', 'r') as f, open('D:/out.csv', 'w') as f_out:
    reader = csv.DictReader(f)
    writer = csv.DictWriter(f_out, fieldnames=reader.fieldnames)
    writer.writeheader()  # For writing header
    for row in reader:
        if row['ICLEVEL'] == '1':
            writer.writerow(row)
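
To see why the original attempt produced a mess, it may help to compare the two writers on a single row. This is only a sketch: it assumes Python 3.7 or later (where dicts keep insertion order), uses a made-up row with a hypothetical UNITID column, and writes to in-memory buffers instead of the D:/ files.

```python
import csv
import io

# Hypothetical sample row, shaped like what csv.DictReader yields.
row = {"UNITID": "100654", "ICLEVEL": "1"}

# csv.writer iterates over whatever it is given; iterating a dict yields its keys.
plain_out = io.StringIO()
csv.writer(plain_out, lineterminator="\n").writerow(row)

# csv.DictWriter looks up each field name in the dict and writes the values.
dict_out = io.StringIO()
dict_writer = csv.DictWriter(dict_out, fieldnames=list(row), lineterminator="\n")
dict_writer.writerow(row)

print(repr(plain_out.getvalue()))  # the header names, not the data
print(repr(dict_out.getvalue()))   # the actual values
```

With the plain writer, every matching row is written as a copy of the header names, which is consistent with the "total mess" described in the question.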

Your row is a dictionary. CSV writer cannot write dictionaries. Select the values from the dictionary and write just them:
writer.writerow(reader.fieldnames)
for row in reader:
    if row["ICLEVEL"] == "1":
        values = [row[field] for field in reader.fieldnames]
        writer.writerow(values)

I would actually use Pandas, not a CSV reader:
import pandas as pd
df=pd.read_csv("D:/test.csv")
newdf = df[df["ICLEVEL"]==1]
newdf.to_csv("D:/out.csv",index=False)
The code is much more compact.

Related

Create multiple files from unique values of a column using inbuilt libraries of python

I started learning Python and was wondering if there is a way to create multiple files from the unique values of a column. I know there are hundreds of ways of getting it done with pandas, but I am looking to do it with the built-in libraries. I couldn't find a single example where it's done that way.
Here is the sample csv file data:
uniquevalue|count
a|123
b|345
c|567
d|789
a|123
b|345
c|567
Sample output file:
a.csv
uniquevalue|count
a|123
a|123
b.csv
b|345
b|345
I am struggling with looping over the unique values in a column and then printing them out. Can someone explain the logic of how to do it? That would be much appreciated. Thanks.
import csv
from collections import defaultdict
header = []
data = defaultdict(list)
DELIMITER = "|"
with open("inputfile.csv", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=DELIMITER)
    for i, row in enumerate(reader):
        if i == 0:
            header = row
        else:
            key = row[0]
            data[key].append(row)
for key, value in data.items():
    filename = f"{key}.csv"
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f, delimiter=DELIMITER)
        rows = [header] + value
        writer.writerows(rows)
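import csv
header = ('uniquevalue', 'count')
seen = set()
with open('sample.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    next(reader)  # skip the header row of the input file
    for row in reader:
        # Start a fresh file the first time a key is seen; append afterwards.
        mode = 'a' if row[0] in seen else 'w'
        with open(f"{row[0]}.csv", mode, newline='') as inner:
            # csv.writer has no fieldnames argument, so the header is written by hand.
            writer = csv.writer(inner, delimiter='|')
            if mode == 'w':
                writer.writerow(header)
                seen.add(row[0])
            writer.writerow(row)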
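The task can also be done without the csv module. The lines of the file are read with read_file.read().splitlines()[1:], which strips the newline characters and skips the header line of the CSV file. A set provides the unique lines of the input; each unique line is counted in the full list to find how many duplicates it has, and is used to create the output files. Note that the output files contain no header row, and that this works for the sample because every line with the same key is identical: lines that share a key but differ in count would overwrite each other, since each file is opened in 'w' mode.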
with open("unique_sample.csv", "r") as read_file:
    items = read_file.read().splitlines()[1:]
for line in set(items):
    with open(line[:line.index('|')] + '.csv', 'w') as output:
        output.write((line + '\n') * items.count(line))
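
The approaches above either hold every row in memory or reopen an output file for each row. A possible middle ground, sketched below with only the standard library, streams the input once and keeps one open writer per key. The function name split_by_first_column and the temporary demo file are assumptions made for illustration, not part of the original answers.

```python
import csv
import os
import tempfile
from contextlib import ExitStack


def split_by_first_column(input_path, output_dir, delimiter="|"):
    """Write one <key>.csv per distinct first-column value, each with the header."""
    writers = {}
    # ExitStack closes every output file it was handed when the block ends.
    with ExitStack() as stack, open(input_path, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader)
        for row in reader:
            if not row:  # skip blank lines
                continue
            key = row[0]
            if key not in writers:
                out_path = os.path.join(output_dir, f"{key}.csv")
                out = stack.enter_context(open(out_path, "w", newline=""))
                writers[key] = csv.writer(out, delimiter=delimiter)
                writers[key].writerow(header)
            writers[key].writerow(row)
    return sorted(writers)


# Demo on a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.csv")
    with open(src_path, "w", newline="") as f:
        f.write("uniquevalue|count\na|123\nb|345\na|123\n")
    keys = split_by_first_column(src_path, tmp)
    print(keys)
```

This keeps one file handle open per distinct key, so it suits columns with a modest number of unique values; with very many keys the operating system's open-file limit could become a problem.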

Python - Reading the contents of csv in python and appending it

import csv
with open("somecities.csv") as f:
    reader = csv.DictReader(f)
    data = [r for r in reader]
Contents of somecities.csv:
Country,Capital,CountryPop,AreaSqKm
Canada,Ottawa,35151728,9984670
USA,Washington DC,323127513,9833520
Japan,Tokyo,126740000,377972
Luxembourg,Luxembourg City,576249,2586
I'm new to Python and am trying to read and append to a CSV file. I've spent some time experimenting with responses to similar questions with no luck, which is why I believe the code above to be pretty useless.
What I am essentially trying to achieve is to store each row from the CSV in memory using a dictionary, with the country names as keys and the values being tuples containing the other information in the table, in the order it appears in the CSV file.
From there I am trying to add three more countries to the CSV (Country, Capital, CountryPop, AreaSqKm) and view the updated CSV. How should I go about doing all of this?
The desired additions to the updated csv are:
Brazil, Brasília, 211224219, 8358140
China, Beijing, 1403500365, 9388211
Belgium, Brussels, 11250000, 30528
EDIT:
Import csv
with open("somecities.csv", "r") as csvinput:
    with open(" somecities_update.csv", "w") as csvresult:
        writer = csv.writer(csvresult, lineterminator='\n')
        reader = csv.reader(csvinput)
        all = []
        headers = next(reader)
        for row in reader:
            all.append(row)
        # Now we write to the new file
        writer.write(headers)
        for record in all:
            writer.write(record)
        #row.append(Brazil, Brasília, 211224219, 8358140)
        #row.append(China, Beijing, 1403500365, 9388211)
        #row.append(Belgium, Brussels, 11250000, 30528)
So assuming you can use pandas for this I would go about it this way:
import pandas as pd
df1 = pd.read_csv('your_initial_file.csv', index_col='Country')
df2 = pd.read_csv('your_second_file.csv', index_col='Country')
dfs = [df1,df2]
final_df = pd.concat(dfs)
DictReader will only represent each row as a dictionary, e.g.:
{
    "Country": "Canada",
    ...,
    "AreaSqKm": "9984670"
}
If you want to store the whole CSV as a dictionary you'll have to create your own:
import csv
all_data = {}
with open("somecities.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Key = country, values = tuple containing the rest of the data.
        all_data[row["Country"]] = (row["Capital"], row["CountryPop"], row["AreaSqKm"])
# Add the new cities to the dictionary here...
# To write the data to a new CSV
with open("newcities.csv", "w") as f:
    writer = csv.writer(f)
    for key, values in all_data.items():
        writer.writerow([key] + list(values))
As others have said, though, the pandas library could be a good choice. Check out its read_csv and to_csv functions.
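
The "add the new cities" step left open in the dictionary approach could look roughly like the sketch below. It assumes Python 3.7 or later (so the dict keeps insertion order), uses a shortened two-row copy of somecities.csv held in memory, and also writes the header row, which the answer's code leaves out.

```python
import csv
import io

# Shortened in-memory stand-in for somecities.csv.
source = io.StringIO(
    "Country,Capital,CountryPop,AreaSqKm\n"
    "Canada,Ottawa,35151728,9984670\n"
    "Japan,Tokyo,126740000,377972\n"
)

reader = csv.DictReader(source)
fieldnames = reader.fieldnames  # reading this consumes the header line

# Key = country, value = tuple of the remaining columns in file order.
all_data = {
    row["Country"]: tuple(row[name] for name in fieldnames[1:])
    for row in reader
}

# The three additions requested in the question.
all_data["Brazil"] = ("Brasília", "211224219", "8358140")
all_data["China"] = ("Beijing", "1403500365", "9388211")
all_data["Belgium"] = ("Brussels", "11250000", "30528")

# Write the header first, then one line per country.
result = io.StringIO()
writer = csv.writer(result, lineterminator="\n")
writer.writerow(fieldnames)
for country, values in all_data.items():
    writer.writerow([country, *values])

print(result.getvalue())
```

Swapping the StringIO objects for open("somecities.csv", newline="") and open("newcities.csv", "w", newline="") would apply the same steps to real files.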
Just another idea: collect the rows in a list and append the new values to it, as below (not tested):
import csv
with open("somecities.csv", "r") as csvinput:
    with open("result.csv", "w") as csvresult:
        writer = csv.writer(csvresult, lineterminator='\n')
        reader = csv.reader(csvinput)
        all_rows = [next(reader)]  # header row
        all_rows.extend(reader)    # existing data rows
        # Each new row must be a list of strings, not bare names.
        all_rows.append(["Brazil", "Brasília", "211224219", "8358140"])
        all_rows.append(["China", "Beijing", "1403500365", "9388211"])
        writer.writerows(all_rows)
The simplest form I can see, tested in Python 3.6:
Opening a file in 'a' mode allows you to append to the end of the file instead of overwriting the existing content. Try that.
>>> with open("somecities.csv", "a") as fd:
...     fd.write("Brazil, Brasília, 211224219, 8358140\n")
OR
#!/usr/bin/python3.6
import csv
fields = ['Brazil', 'Brasília', '211224219', '8358140']
# newline='' lets the csv module control the line endings.
with open(r'somecities.csv', 'a', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(fields)
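
One thing to watch with append mode: if the existing file does not end with a newline, the first appended row is glued onto the last existing line. The sketch below guards against that. The helper name append_rows is hypothetical, and the code assumes the file is UTF-8 encoded and that "\n" line endings are acceptable.

```python
import csv
import os
import tempfile


def append_rows(path, rows):
    """Append rows to a CSV file, adding a line break first if one is missing."""
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as f:
            f.seek(-1, os.SEEK_END)  # look at the last byte only
            needs_newline = f.read(1) != b"\n"
    with open(path, "a", newline="", encoding="utf-8") as f:
        if needs_newline:
            f.write("\n")
        csv.writer(f, lineterminator="\n").writerows(rows)


# Demo on a throwaway file whose last line has no trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "somecities.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write("Country,Capital\nCanada,Ottawa")
    append_rows(path, [["Brazil", "Brasília"]])
    with open(path, newline="", encoding="utf-8") as f:
        content = f.read()

print(content)
```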

Viewing a CSV in Python

How would I go about correcting this code, so that I can view the contents of the CSV?
import csv
def csv_to_list("jo.csv", delimiter=','):
    with open("jo.csv", 'r') as csv_con:
        reader = csv.reader(csv_con, delimiter=delimiter)
        return list(reader)
I don't know exactly what you are trying to do, but the proper usage of csv.reader is:
import csv
with open("jo.csv", 'r') as csv_con:
    reader = csv.reader(csv_con, delimiter=',')
    for row in reader:
        # Process rows here
        print(', '.join(row))
One of the goals of csv.reader is to let you access the file row by row rather than loading the whole file at once.
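
If a list of all rows really is wanted, the function from the question can be kept with one change: a parameter has to be a name, not a string literal, which is why the original definition is a syntax error. A minimal sketch follows; the parameter name path and the temporary demo file are assumptions made for illustration.

```python
import csv
import os
import tempfile


def csv_to_list(path, delimiter=","):
    """Return every row of the CSV file at `path` as a list of lists."""
    with open(path, newline="") as csv_con:
        return list(csv.reader(csv_con, delimiter=delimiter))


# Demo with a throwaway file standing in for jo.csv.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "jo.csv")
    with open(demo_path, "w", newline="") as f:
        f.write("name,age\nJo,30\n")
    rows = csv_to_list(demo_path)

for row in rows:
    print(", ".join(row))
```

The file name is then supplied at the call site, for example csv_to_list("jo.csv").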

Python to insert quotes to column in CSV

I have no knowledge of Python.
What I want to be able to do is create a script that will edit a CSV file so that every field in column 3 is wrapped in quotes. I haven't been able to find much help. Is this quick and easy to do? Thanks.
column1,column2,column3
1111111,2222222,333333
This is a fairly crude solution, very specific to your request (assuming your source file is called "csvfile.csv" and is in C:\Temp).
import csv
newrow = []
csvFileRead = open('c:/temp/csvfile.csv', 'rb')
csvFileNew = open('c:/temp/csvfilenew.csv', 'wb')
# Open the CSV
csvReader = csv.reader(csvFileRead, delimiter = ',')
# Append the rows to variable newrow
for row in csvReader:
    newrow.append(row)
# Add quotes around the third list item
for row in newrow:
    row[2] = "'"+str(row[2])+"'"
csvFileRead.close()
# Create a new CSV file
csvWriter = csv.writer(csvFileNew, delimiter = ',')
# Append the csv with rows from newrow variable
for row in newrow:
    csvWriter.writerow(row)
csvFileNew.close()
There are MUCH more elegant ways of doing what you want, but I've tried to break it down into basic chunks to show how each bit works.
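I would start by looking at the csv module.
import csv
filename = 'file.csv'
# Open for reading: 'wb' would truncate the file before anything is read.
# ('rb' is the Python 2 mode; on Python 3 use open(filename, newline='').)
with open(filename, 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        row[2] = "'%s'" % row[2]
And then write the modified rows back to the CSV file.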
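
A complete read, modify and write pass might look like the sketch below. It is written for Python 3, runs on in-memory buffers holding the sample from the question, and copies the header row unchanged, which is an assumption about what is wanted.

```python
import csv
import io

# In-memory stand-ins for the input and output files.
source = io.StringIO("column1,column2,column3\n1111111,2222222,333333\n")
result = io.StringIO()

reader = csv.reader(source)
writer = csv.writer(result, lineterminator="\n")

writer.writerow(next(reader))  # copy the header row as it is
for row in reader:
    row[2] = "'%s'" % row[2]   # wrap the third field in single quotes
    writer.writerow(row)

print(result.getvalue())
```

Single quotes pass through the writer untouched because they are not the csv module's default quote character. Wrapping the field in double quotes by hand would behave differently: the writer would escape them by doubling each one.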

Python: add column to CSV file based on existing column

I already have written what I need for identifying and parsing the value I am seeking, I need help writing a column to the csv file (or a new csv file) with the parsed value. Here's some pseudocode / somewhat realistic Python code for what I am trying to do:
# Given a CSV file, this function creates a new CSV file with all values parsed
def handleCSVfile(csvfile):
    with open(csvfile, 'rb') as file:
        reader = csv.reader(file, delimiter=',', lineterminator='\n')
        for row in reader:
            for field in row:
                if isWhatIWant(field):
                    parsedValue = parse(field)
                    # write new column to row containing parsed value
I've already written the isWhatIWant and parse functions. If I need to write a completely new csv file, then I am not sure how to have both open simultaneously and read and write from one into the other.
I'd do it like this. I'm guessing that isWhatIWant() is something that is supposed to replace a field in-place.
import csv
def handleCSVfile(infilename, outfilename):
    with open(infilename, 'rb') as infile:
        with open(outfilename, 'wb') as outfile:
            reader = csv.reader(infile, lineterminator='\n')
            writer = csv.writer(outfile, lineterminator='\n')
            for row in reader:
                for field_index, field in enumerate(row):
                    if isWhatIWant(field):
                        row[field_index] = parse(field)
                writer.writerow(row)
This sort of pattern occurs a lot and results in really long lines. It can sometimes be helpful to separate the row-processing logic from the opening of the files by moving it into a different function, like this:
import csv
def load_save_csvfile(infilename, outfilename):
    with open(infilename, 'rb') as infile:
        with open(outfilename, 'wb') as outfile:
            reader = csv.reader(infile, lineterminator='\n')
            writer = csv.writer(outfile, lineterminator='\n')
            read_write_csvfile(reader, writer)
def read_write_csvfile(reader, writer):
    for row in reader:
        for field_index, field in enumerate(row):
            if isWhatIWant(field):
                row[field_index] = parse(field)
        writer.writerow(row)
This modularizes the code, making it easier to change how the files and formats are handled and how the rows are processed independently of each other.
Additional hints:
Don't name variables file as that is a built-in function. Shadowing those names will bite you when you least expect it.
delimiter=',' is the default so you don't need to specify it explicitly.
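
The answer above replaces the matching field in place, whereas the question asks for a new column. One way to do that is sketched below in Python 3. The bodies of is_what_i_want and parse are hypothetical stand-ins for the asker's isWhatIWant and parse functions, and using the first match per row (or an empty cell when there is none) is an assumption.

```python
import csv
import io


def is_what_i_want(field):
    # Hypothetical stand-in: treat fields such as "id:42" as the ones to parse.
    return field.startswith("id:")


def parse(field):
    # Hypothetical stand-in: keep the part after the "id:" prefix.
    return field[len("id:"):]


def add_parsed_column(reader, writer):
    """Copy each row and append one extra column holding the parsed value."""
    for row in reader:
        parsed = [parse(field) for field in row if is_what_i_want(field)]
        # Use the first match, or an empty cell when nothing in the row matched.
        writer.writerow(row + [parsed[0] if parsed else ""])


# In-memory demo; real files would be opened with newline="" on Python 3.
source = io.StringIO("alice,id:42\nbob,none\n")
result = io.StringIO()
add_parsed_column(csv.reader(source), csv.writer(result, lineterminator="\n"))

print(result.getvalue())
```

Because add_parsed_column only sees a reader and a writer, it fits the same split between file handling and row logic described above.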
