What I'm trying to do is read a CSV file, find all values in the SN column greater than 20, and make a new file containing only the rows with SN > 20.
I know that I need to:
Read the original file
Open a new file
Iterate over the rows of the original file
What I've been able to do so far is find the rows that have a value of SN > 20:
import csv
import os

os.chdir("C:\Users\Robert\Documents\qwe")
with open("gdweights_feh_robert_cmr.csv", 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    zerovar = 0
    for row in reader:
        if zerovar == 0:
            zerovar = zerovar + 1
        else:
            sn = row[11]
            zerovar = zerovar + 1
            x = float(sn)
            if x > 20:
                print x
So my question is: how do I take the rows with SN > 20 and write them to a new file?
Save the data in a list, then write the list to a file.
import csv
import os

os.chdir(r"C:\Users\Robert\Documents\qwe")
output_ary = []

with open("gdweights_feh_robert_cmr.csv", 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    zerovar = 0
    for row in reader:
        if zerovar == 0:
            zerovar = zerovar + 1   # skip the header row
        else:
            sn = row[11]
            zerovar = zerovar + 1
            x = float(sn)
            if x > 20:
                print x
                output_ary.append(row)

with open("output.csv", 'w') as f2:
    for row in output_ary:
        for item in row:
            f2.write(item + ",")
        f2.write("\n")   # end each row on its own line
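As a side note, the manual writing loop above could also be replaced with csv.writer, which takes care of the delimiters and line endings for you. A minimal sketch, assuming the same output_ary list built above:

import csv

with open("output.csv", 'wb') as f2:      # 'wb' for the csv module on Python 2
    writer = csv.writer(f2, delimiter=',')
    writer.writerows(output_ary)          # one CSV row per filtered input row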
In the code, the reading / looping through the rows is quite complex. It could be cleaned up (and run faster in Python) with something like the following:
with open('gdweights_feh_robert_cmr.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    next(reader)  # skip the header row
    output_ary = [row for row in reader if float(row[11]) > 20]
Using a list comprehension ([row for row in reader if ...]) is optimised in Python, so it will perform more efficiently. AND... you avoid building an intermediate list of all rows before filtering, which reduces the memory required; that is also very handy if the CSV file is large.
You can then proceed to write out the output_ary as suggested in the other answers.
Hope this helps!
I'm really new to Python and my task is to rewrite a CSV with Python. I managed to program a working script for my task already. Now I would like to get only every 10th row of the CSV as output.
Is there an easy way to do this?
I already tried Jason Reeks' answer.
Now it works, thank you!
import csv
import sys

userInputFileName = sys.argv[1]
outPutFileSkipped = userInputFileName.split('.')[0] + '-Skipped.csv'
cnt = 0

with open(outPutFileSkipped, 'w', newline='') as outputCSV:
    csv_reader_object_skipped = csv.reader((x.replace('\0', '') for x in open(userInputFileName)), delimiter=',')
    csv_writer_object_skipped = csv.writer(outputCSV, delimiter=',')
    for row, line in enumerate(csv_reader_object_skipped):
        if row % 10 == 0:
            print(line)
            csv_writer_object_skipped.writerow(line)
            cnt += 1   # count the rows actually written

print('Successfully formatted ' + str(cnt) + ' rows!')
Here's a native way to do it without pandas:
import csv

with open('file.csv', 'r') as f:
    reader = csv.reader(f)
    for row, line in enumerate(reader):
        # Depending on your reference point you may want to + 1 to row
        # to get every 10th row.
        if row % 10 == 0:
            print(line)
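If you also want to save those rows instead of just printing them, a csv.writer can sit in the same loop. A small sketch; the output file name is just an example:

import csv

with open('file.csv', 'r', newline='') as f, open('every_10th.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for row, line in enumerate(csv.reader(f)):
        if row % 10 == 0:
            writer.writerow(line)   # keep rows 0, 10, 20, ...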
There's an easy way with Pandas:
import pandas as pd
df = pd.DataFrame({"a": range(100), "b": range(100, 200)})
df.loc[::10]
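If the goal is a new file rather than just the sampled frame, the slice can be written straight back out. A small sketch; the file names are only examples:

import pandas as pd

df = pd.read_csv("input.csv")                          # read the original file
df.iloc[::10].to_csv("every_10th.csv", index=False)    # keep every 10th row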
Hey, I'm working on a project where I take some text, translate it, and store it back in the same CSV file. The next open column is at index 10, or column K. I've been trying to write the data but I just can't get it to work.
Reading works fine. I tried to do all of this in a single loop but I couldn't get it to work. Sorry for any formatting errors!
from googletrans import Translator
import csv

translater = Translator()
f = open("#ElNuevoDia.csv", "r+")
csv_f = csv.reader(f)
csv_wf = csv.writer(f)

tmp = {}
x = 0
for row in csv_f:
    tmp[x] = translater.translate(row[4], dest="en")
    #print(tmp[x].text)
    #print("\n")
    #print(tmp[x].text)
    x = x + 1
x = 0
f.close()

csv_wf = csv.writer(f)
for row in csv_wf:
    csv_wf[10].writerow(tmp[x].text)
f.close()
You should update each row while reading and then write it back (as you mentioned in the comment, the writer is not iterable). Something like this (based on part of your code):
tmp = []
for row in csv_f:
    row[10] = translater.translate(row[4], dest="en")
    tmp.append(row)
f.close()

# reopen the file for writing; the original handle is already closed
f = open("#ElNuevoDia.csv", "w", newline="")
csv_wf = csv.writer(f)
for row in tmp:
    csv_wf.writerow(row)
f.close()
Edit 1:
To get just the translated text, you can do this:
row[10] = translater.translate(row[4], dest="en").text
and you can write it back in one step:
csv_wf.writerows(tmp)
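Putting both edits together, a complete sketch of the read-translate-write flow could look like this; the file name and column indices come from the question, and it assumes every row already has a column at index 10 to overwrite:

from googletrans import Translator
import csv

translater = Translator()

# read everything and translate column 4 into column 10
rows = []
with open("#ElNuevoDia.csv", "r", newline="") as f:
    for row in csv.reader(f):
        row[10] = translater.translate(row[4], dest="en").text   # assumes index 10 exists
        rows.append(row)

# write the updated rows back to the same file
with open("#ElNuevoDia.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)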
Hi, I am trying to iterate over each row in a CSV file with Python and create a new CSV file for each row. My thought process is to open the file, loop through each row, and for each row create a file named n_file.csv (where 'n' is the iteration), so here is my code:
import csv

csvfile = open('sample.csv', 'rb')
csvFileArray = []
for row in csv.reader(csvfile, delimiter='.'):
    csvFileArray.append(row)
    print(row)
    n = 0
    n += 1
    file = open(str(n) + "_file.csv", 'w+')
    file.write(str(row))
    print(n)  # returns 1 every time
Unfortunately this is not iterating properly (it only ever creates a file named 1_file.csv and overwrites it each time). How can I fix this?
for row in csv.reader(csvfile, delimiter='.'):
    csvFileArray.append(row)
    print(row)
    n = 0   # << you reset n to 0 on every loop iteration!
    n += 1
so it is better written as:
for idx, row in enumerate(csv.reader(csvfile, delimiter='.')):
    csvFileArray.append(row)
    print(row)
    file = open(str(idx) + "_file.csv", 'w+')   # enumerate provides the counter you want
    file.write(str(row))
You set n to 0 each time, because you declared it inside the loop. Declare it before the for statement.
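For example, a minimal sketch of that fix, keeping the rest of the loop as in the question:

n = 0   # declared once, before the loop
for row in csv.reader(csvfile, delimiter='.'):
    n += 1
    with open(str(n) + "_file.csv", 'w') as out:   # 1_file.csv, 2_file.csv, ...
        out.write(str(row))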
Try this:
import csv

with open('sample.csv', 'rb') as csvfile:
    for i, row in enumerate(csv.reader(csvfile, delimiter='.')):
        with open("{}_file.csv".format(i), "w") as file:
            file.write(str(row))
I have just recently started learning the csv module. Suppose we have this CSV file:
John,Jeff,Judy,
21,19,32,
178,182,169,
85,74,57,
And we want to read this file and create a dictionary containing names (as keys) and totals of each column (as values). So in this case we would end up with:
d = {"John" : 284, "Jeff" : 275, "Judy" : 258}
So I wrote this code which apparently works well, but I am not satisfied with it and was wondering if anyone knows of a better or more efficient/elegant way of doing this, because there are just too many lines in there :D (Or maybe a way we could generalize it a bit - i.e. for when we don't know how many fields there are.)
d = {}
import csv

with open("file.csv") as f:
    readObject = csv.reader(f)
    totals0 = 0
    totals1 = 0
    totals2 = 0
    totals3 = 0
    currentRowTotal = 0
    for row in readObject:
        currentRowTotal += 1
        if currentRowTotal == 1:
            continue
        totals0 += int(row[0])
        totals1 += int(row[1])
        totals2 += int(row[2])
        if row[3] == "":
            totals3 += 0
f.close()

with open(filename) as f:
    readObject = csv.reader(f)
    currentRow = 0
    for row in readObject:
        while currentRow <= 0:
            d.update({row[0]: totals0})
            d.update({row[1]: totals1})
            d.update({row[2]: totals2})
            d.update({row[3]: totals3})
            currentRow += 1
    return(d)
f.close()
Thanks very much for any answer :)
Not sure if you can use pandas, but you can get your dict as follows:
import pandas as pd
df = pd.read_csv('data.csv')
print(dict(df.sum()))
Gives:
{'Jeff': 275, 'Judy': 258, 'John': 284}
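One caveat: because every line in the sample data ends with a trailing comma, pandas may read an extra unnamed, all-empty column; if that happens it can be dropped before summing. A small sketch:

import pandas as pd

df = pd.read_csv('data.csv')
df = df.dropna(axis=1, how='all')   # drop the empty column produced by the trailing commas
print(dict(df.sum()))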
Use the top row to figure out what the column headings are. Initialize a dictionary of totals based on the headings.
import csv

with open("file.csv") as f:
    reader = csv.reader(f)

    titles = next(reader)
    while titles[-1] == '':
        titles.pop()
    num_titles = len(titles)

    totals = {title: 0 for title in titles}

    for row in reader:
        for i in range(num_titles):
            totals[titles[i]] += int(row[i])

print(totals)
Let me add that you don't have to close the file after the with block. The whole point of with is that it takes care of closing the file.
Also, let me mention that the data you posted appears to have four columns:
John,Jeff,Judy,
21,19,32,
178,182,169,
85,74,57,
That's why I did this:
while titles[-1] == '':
    titles.pop()
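As an alternative sketch (not part of the answer above), csv.DictReader does the header handling for you; the empty trailing field just has to be skipped:

import csv

totals = {}
with open("file.csv") as f:
    for row in csv.DictReader(f):
        for name, value in row.items():
            if name:   # skip the empty trailing column
                totals[name] = totals.get(name, 0) + int(value)

print(totals)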
It's a little dirty, but try this (operating without the empty last column):
#!/usr/bin/python
import csv
import numpy
from functools import reduce   # needed on Python 3; reduce is a builtin on Python 2

with open("file.csv") as f:
    reader = csv.reader(f)
    headers = next(reader)
    sums = reduce(numpy.add, [list(map(int, x)) for x in reader], [0] * len(headers))

for name, total in zip(headers, sums):
    print("{}'s total is {}".format(name, total))
Based on Michasel's solution, I would try it with less code, fewer variables, and no dependency on NumPy:
import csv

with open("so.csv") as f:
    reader = csv.reader(f)
    titles = next(reader)
    # assumes there is no empty trailing column, as in the answer above
    sum_result = reduce(lambda x, y: [int(a) + int(b) for a, b in zip(x, y)], list(reader))

print dict(zip(titles, sum_result))
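For the sample data the reduce runs pairwise: [21, 19, 32] plus [178, 182, 169] gives [199, 201, 201], and adding [85, 74, 57] gives [284, 275, 258], which zips with the titles into {'John': 284, 'Jeff': 275, 'Judy': 258}. (On Python 3 you would also need from functools import reduce and parentheses around the print argument.)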
I have two CSV files: data.csv and data2.csv.
I would first of all like to strip the two data files down to the data I am interested in. I have figured this part out for data.csv. I would then like to compare them row by row, making sure that if a row is missing it gets added.
Next I want to look at column 2. If there is a value there, then I want to write to column 3; if there is data in column 3, then write to column 4, etc.
My current program looks like so. I need some guidance.
Oh and I am using Python V3.4
#!/usr/bin/python
__author__ = 'krisarmstrong'

import csv

searched = ['aircheck', 'linkrunner at', 'onetouch at']

def find_group(row):
    """Return the group index of a row
    0 if the row contains searched[0]
    1 if the row contains searched[1]
    etc
    -1 if not found
    """
    for col in row:
        col = col.lower()
        for j, s in enumerate(searched):
            if s in col:
                return j
    return -1

inFile = open('data.csv')
reader = csv.reader(inFile)
inFile2 = open('data2.csv')
reader2 = csv.reader(inFile2)
outFile = open('data3.csv', "w")
writer = csv.writer(outFile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
header = next(reader)
header2 = next(reader2)

"""Build a list of items to sort. If row 12 contains 'LinkRunner AT' (group 1),
one stores a triple (1, 12, row).
When the triples are sorted later, all rows in group 0 will come first, then
all rows in group 1, etc.
"""
stored = []
writer.writerow([header[0], header[3]])

for i, row in enumerate(reader):
    g = find_group(row)
    if g >= 0:
        stored.append((g, i, row))

stored.sort()
for g, i, row in stored:
    writer.writerow([row[0], row[3]])

inFile.close()
outFile.close()
Perhaps try:
import csv

col1, col2 = [], []
with open('some.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        col1.append(row[0])
        col2.append(row[1])

for i in range(len(col1)):
    if col1[i] == '':
        pass   # thing to do if there is nothing for col1
    if col2[i] == '':
        pass   # thing to do if there is nothing for col2
This is a start at "making sure that if a row is missing to add it".
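Building on that start, here is a hedged sketch of the "if a row is missing, add it" part; the file names come from the question, and it assumes whole rows can be compared for equality:

import csv

with open('data.csv', 'r', newline='') as f1, open('data2.csv', 'r', newline='') as f2:
    rows1 = [tuple(row) for row in csv.reader(f1)]
    rows2 = [tuple(row) for row in csv.reader(f2)]

# rows present in data2.csv but missing from data.csv
seen = set(rows1)
missing = [row for row in rows2 if row not in seen]

with open('data3.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    writer.writerows(rows1 + missing)   # original rows first, then the missing ones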