I'm having a problem rewriting a CSV file. I had a CSV file with 20 columns and rewrote it to only 5. Now I need to remove a few unnecessary points, those where SN < 20. It works, except that it doesn't separate the rows: everything ends up in row 1. I'm guessing the problem comes from this line:
output_ary.append(row)
but I don't know what else to write there. Here is a part of the code:
import csv
import os
import matplotlib.pyplot as plt
os.chdir("C:\Users\Robert\Documents\qwe")
r = csv.reader(open("gdweights_feh_robert_cmr.csv",'rU'))
w = csv.writer(open("gdweight.csv",'wb',buffering=0))
zerovar2 = 0
for row in r:
    if zerovar2==0:
        zerovar2 = zerovar2 + 1
    else:
        sn = float(row[11])
        rweight = float(row[17])
        tarweight = float(row[18])
        fehadop = float(row[25])
        weight = rweight*tarweight*fehadop
        w.writerow([sn,rweight,tarweight,fehadop,weight])
output_ary = []
with open("gdweight.csv",'rU') as f:
    reader = csv.reader(f, delimiter= ',')
    zerovar = 0
    for row in reader:
        if zerovar==0:
            zerovar = zerovar + 1
        else:
            sn = row [0]
            zerovar = zerovar + 1
            x = float(sn)
            if x > 20:
                output_ary.append(row)
with open("ouput1.csv",'w') as f2:
    for row in output_ary:
        for item in row:
            f2.write(item + ",")
with open("ouput1.csv",'w') as f2:
    for row in output_ary:
        for item in row:
            f2.write(item + ",")
        f2.write("\n") # this is what you're missing
Simply rewrite the last part as:
with open("ouput1.csv",'w') as f2:
    for row in output_ary:
        f2.write(",".join([str(e) for e in row]) + '\n')
Now here are a couple of additional comments:
You can use enumerate instead of a counter:
for i_row, row in enumerate(r):
    ...
You can also use a csv writer:
with open("output.txt", "w") as f:
    csv_w = csv.writer(f)
    for i_row, row in enumerate(output):
        if i_row == 0:
            continue
        csv_w.writerow(row)
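The csv writer suggestion above can be put together into one pass, sketched below. This is only an illustration under assumptions: the function name filter_rows, the sample values, and the presence of a header row are made up, and in-memory buffers stand in for the real files. The point it shows is that writerow ends every row with a line terminator, so the rows no longer run together.

```python
import csv
import io

def filter_rows(src, dst, min_sn=20.0, sn_column=0):
    """Copy the header, then every row whose SN value exceeds min_sn."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)  # assumption: the first row is a header
    if header is not None:
        writer.writerow(header)
    for row in reader:
        if float(row[sn_column]) > min_sn:
            # writerow adds the line terminator, so each row gets its own line
            writer.writerow(row)

# In-memory stand-ins for the input and output files (values are made up).
sample = io.StringIO("sn,rweight\n25.0,1.5\n10.0,2.0\n30.5,0.5\n")
result = io.StringIO()
filter_rows(sample, result)
print(result.getvalue())
```

With real files on Python 3, open both with newline='' and pass the file objects in place of the buffers.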
I have a Python script that reads 1000 lines from a CSV file.
Why is iterating the csv.reader object in a list comprehension 4 orders of magnitude slower than the equivalent explicit for loop?
Here are the code snippets and their times (via time.time()) on my machine:
with open("file.csv", 'r', newline = "\n") as f:
    reader = csv.reader(f, delimiter = ",")
    data = [r for i, r in enumerate(reader) if i < 1000]
# Time: 26.459498167037964 s
with open("file.csv", 'r', newline = "\n") as f:
    reader = csv.reader(f, delimiter = ",")
    data = []
    for i, r in enumerate(reader):
        if i >= 1000:
            break
        data.append(r)
# Time: 0.005597114562988281 s
EDIT:
As per the answers, the list comprehension version reads the whole file and only selects the elements that satisfy the condition i < 1000, while the explicit loop stops as it reaches i == 1000.
For future readers of this question, the most elegant solution was written by @decenze in the comments:
import csv
from itertools import islice
with open("file.csv", 'r', newline = "\n") as f:
    reader = csv.reader(f, delimiter = ",")
    data = list(islice(reader, 0, 1000))
As @Barmar said, your list comprehension iterates over every line of the CSV file and does not stop at index 1000. To stop at index 1000 you can use islice:
with open("file.csv", 'r', newline = "\n") as f:
    reader = csv.reader(f, delimiter = ",")
    data = [r for r in islice(reader, 0, 1000)]
Here is code equivalent to your list comprehension, to make it clearer why it takes so long:
with open("file.csv", 'r', newline = "\n") as f:
    reader = csv.reader(f, delimiter = ",")
    data = []
    for i, r in enumerate(reader):
        if i < 1000:
            data.append(r)
        # else iterate till the end of file
Here is the answer
In this code,
with open("file.csv", 'r', newline = "\n") as f:
    reader = csv.reader(f, delimiter = ",")
    data = []
    for i, r in enumerate(reader):
        if i >= 1000:
            break
        data.append(r)
the for loop breaks as soon as i >= 1000.
But in this code there is no break keyword (since you used a list comprehension):
with open("file.csv", 'r', newline = "\n") as f:
    reader = csv.reader(f, delimiter = ",")
    data = [r for i, r in enumerate(reader) if i < 1000]
Why is this slow? Because data = [r ...] does not stop once i >= 1000; it always reads until enumerate(reader) is exhausted. This differs from the previous code, where the for loop breaks once i >= 1000.
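You might be tempted to stop a list comprehension by raising StopIteration from a helper function, but that does not work: the exception propagates out of the comprehension and data is never assigned. Another method is itertools.takewhile, which stops pulling rows as soon as the condition fails, for example
import csv
from itertools import takewhile
with open("file.csv", 'r', newline = "\n") as f:
    reader = csv.reader(f, delimiter = ",")
    pairs = takewhile(lambda pair: pair[0] < 1000, enumerate(reader))
    data = [r for i, r in pairs]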
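The difference between filtering and slicing can be checked without timing anything, by counting how many rows each approach actually pulls from the reader. The sketch below is illustrative only: the counting helper and the generated in-memory data are assumptions, not part of the original question.

```python
import csv
import io
from itertools import islice

def counting(iterable, counter):
    """Yield items unchanged while counting how many were pulled."""
    for item in iterable:
        counter[0] += 1
        yield item

# 10,000 made-up rows held in memory instead of a real file.
text = "".join(f"{n},{n * 2}\n" for n in range(10_000))

# The filtering comprehension keeps 1000 rows but still reads every row.
pulled_filter = [0]
reader = counting(csv.reader(io.StringIO(text)), pulled_filter)
filtered = [r for i, r in enumerate(reader) if i < 1000]

# islice stops asking for rows once it has produced 1000 of them.
pulled_slice = [0]
reader = counting(csv.reader(io.StringIO(text)), pulled_slice)
sliced = list(islice(reader, 1000))

print(len(filtered), pulled_filter[0])
print(len(sliced), pulled_slice[0])
```

Both lists hold the same rows; only the amount of input consumed differs, which is where the time goes on a large file.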
import pandas as pd
import numpy as np
import csv
with open('predictions.csv', 'r') as f:
    csvreader = csv.reader(f)
    next(csvreader)
    for r in csvreader:
        df = r[:-1]
        a = float(str(df[:][0]))
        b = float(str(df[:][1]))
        if a > b :
            print(1, 0)
        else:
            print(0, 1)
0 1
0 1
0 1
0 1
0 1
0 1
I printed my results from the predictions.csv file, and now I'm trying to save them, shown as 0 and 1, to a CSV file with 2 columns. How can I save the results to a 2-column CSV file exactly as shown?
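You don't have to save the printed output. You can write each result to a CSV file directly in the loop where you print it, like this:
with open('predictions.csv', 'r') as f, open('new_file.csv', mode='a', newline='') as file:
    csvreader = csv.reader(f)
    csv_writer = csv.writer(file)
    next(csvreader)  # skip the header row
    for r in csvreader:
        df = r[:-1]
        a = float(df[0])
        b = float(df[1])
        if a > b:
            print(1, 0)
            csv_writer.writerow(['1', '0'])
        else:
            print(0, 1)
            csv_writer.writerow(['0', '1'])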
%%capture cap --no-stderr
# Jupyter cell 1: capture everything the loop prints into cap
with open('predictions.csv', 'r') as f:
    csvreader = csv.reader(f)
    next(csvreader)
    for r in csvreader:
        df = r[:-1]
        a = float(str(df[:][0]))
        b = float(str(df[:][1]))
        if a > b :
            print(1,',',0)
        else:
            print(0,',',1)
# Jupyter cell 2: write the captured text to the output file
with open('output.csv', 'w') as writer:
    writer.write(cap.stdout)
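Another way to get the two columns, sketched here under assumptions, is to separate the comparison from the writing: a small generator turns each prediction row into a (1, 0) or (0, 1) pair and writerows saves them all. The helper name one_hot_rows, the column layout, and the sample values are hypothetical, and in-memory buffers stand in for predictions.csv and the output file.

```python
import csv
import io

def one_hot_rows(rows):
    """Yield (1, 0) when the first value is larger, otherwise (0, 1)."""
    for row in rows:
        a, b = float(row[0]), float(row[1])
        yield (1, 0) if a > b else (0, 1)

# Made-up data: a header row, two score columns, and one trailing column.
source = io.StringIO("p0,p1,label\n0.2,0.8,x\n0.9,0.1,y\n")
reader = csv.reader(source)
next(reader)  # skip the header row

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(one_hot_rows(reader))
print(out.getvalue())
```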
Hi everyone, I received data in an Excel (xls) spreadsheet that is formatted as in the first table, illustrated above.
I am attempting to rearrange this data into the format shown in the table just below. Any help would be greatly appreciated.
Thanks much.
First, save it to a .csv file
import csv
curr = []
with open('file.csv') as infile, open('path/to/output', 'w') as fout:
    outfile = csv.writer(fout)
    for area, pop10, pop20, pop50 in csv.reader(infile):
        if curr and curr[0] != area:
            outfile.writerow(curr)
            curr = [area, pop10, pop20, pop50]
            continue
        if pop10: curr[1] = pop10
        if pop20: curr[2] = pop20
        if pop50: curr[3] = pop50
You can do this pretty succinctly using Pandas:
import pandas as pd
dataframe = pd.read_excel("in.xlsx")
merged = dataframe.groupby("AREA").sum()
merged.to_excel("out.xlsx")
so, if the csv has 11 columns where 'AREA' is the second column, would the code be:
def CompressRow(in_csv,out_file):
    curr = []
    with open(in_csv) as infile, open(out_file, 'w') as fout:
        outfile = csv.writer(fout)
        for a,b,c,d,e,f,g,h,i,j,k in csv.reader(infile):
            if curr and curr[1] != b:
                outfile.writerow(curr)
                curr = [a,b,c,d,e,f,g,h,i,j,k]
                continue
            if a: curr[0] = a
            if c: curr[2] = c
            if d: curr[3] = d
            if e: curr[4] = e
            if f: curr[5] = f
            if g: curr[6] = g
            if h: curr[7] = h
            if i: curr[8] = i
            if j: curr[9] = j
            if k: curr[10] = k
#execute CompressRow(in_csv,out_file)
I tried executing it and it gives me
if a: curr[0]=a
IndexError: list assignment index out of range
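The IndexError comes from the very first row: curr is still an empty list, so the test "curr and ..." is false and execution falls through to curr[0] = a, which assigns into a list with no elements. The same loop also never writes the last group, because no key change follows it. A minimal sketch of a version that handles both cases is below; the function name compress_rows, the key_index parameter, and the sample rows are assumptions for illustration.

```python
import csv
import io

def compress_rows(rows, key_index=0):
    """Merge consecutive rows sharing the key column, keeping non-empty cells."""
    current = None
    for row in rows:
        # Start a new group on the first row or whenever the key changes.
        if current is None or current[key_index] != row[key_index]:
            if current is not None:
                yield current
            current = list(row)
            continue
        # Same key: copy over every cell that actually holds a value.
        for i, cell in enumerate(row):
            if cell:
                current[i] = cell
    if current is not None:
        yield current  # the last group is not followed by a key change

# Made-up rows; use key_index=1 when the key is the second column.
sample = io.StringIO("A,1,,\nA,,2,\nA,,,3\nB,4,,\nB,,5,6\n")
merged = list(compress_rows(csv.reader(sample)))
print(merged)
```

Looping over the cells by index also removes the need to name all eleven columns by hand.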
We have to compute the average of the 5th column in an Excel sheet, saved as a CSV file. The first row of the file holds the column names, so those cells are strings. I can't seem to write code that loops over every row[4] and accumulates the values into one variable, 'sum'. Here is my code.
import csv
import os
sum = x_length = 0
with open('2.5_week.csv', newline='') as f:
    rows = csv.reader(f)
    for row in rows:
        if row[4] is int:
            sum = sum + float(row[4])
            x_length = x_length + 1
x_average = sum/len(x_length)
print(x_average)
I'm using python 3.4.x
This example should help get you towards the goal you are trying to solve with your program:
import csv
import random
import statistics

def main():
    make_csv()
    read_csv_1()
    read_csv_2()

def make_csv():
    with open('2.5_week.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        for index in range(1000):
            row = (random.random() * index,
                   random.randint(index, index * 2),
                   random.randrange(1 + index * 3),
                   random.random() + index,
                   random.randint(index, index + 10),
                   random.randrange(1 + index ** 2))
            writer.writerow(row)

def read_csv_1():
    with open('2.5_week.csv', 'r', newline='') as file:
        table = pivot_table(csv.reader(file))
    print(statistics.mean(map(float, table[4])))

def pivot_table(table):
    iterator = iter(table)
    pivot = tuple([cell] for cell in next(iterator))
    for row in iterator:
        for column, cell in zip(pivot, row):
            column.append(cell)
    return pivot

def read_csv_2():
    with open('2.5_week.csv', 'r', newline='') as file:
        print(statistics.mean(float(row[4]) for row in csv.reader(file)))

if __name__ == '__main__':
    main()
import csv
total = 0
count = 0
with open("data.csv") as source:
    rdr = csv.reader(source, delimiter=',')
    next(rdr, None)  # skip the header
    for row in rdr:
        try:
            total += float(row[4])
            count += 1
        except ValueError:
            pass  # skip cells that are not numeric
ave = round(total / count, 2)
print(ave)
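The two approaches above (statistics.mean, and skipping cells that fail float conversion) can be combined. The following is only a sketch: the helper name column_mean and the sample data are made up, and an in-memory buffer stands in for the CSV file. Because the header cell fails the float conversion, it is skipped like any other non-numeric cell.

```python
import csv
import io
import statistics

def column_mean(rows, index):
    """Average the numeric cells in one column, skipping everything else."""
    values = []
    for row in rows:
        try:
            values.append(float(row[index]))
        except (ValueError, IndexError):
            continue  # header text, blank cells, or rows that are too short
    return statistics.mean(values)

# Made-up data: a header row, two numeric cells, one non-numeric cell.
sample = io.StringIO("a,b,c,d,mag\n1,2,3,4,2.5\n1,2,3,4,3.5\n1,2,3,4,n/a\n")
average = column_mean(csv.reader(sample), 4)
print(average)
```

Note that statistics.mean raises StatisticsError when no numeric cell is found, which is arguably more informative than a division by zero.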
I am very new to Python and I am trying to do a very simple merge of every two lines in a csv file. Basically I want it like this:
Input:
[H1],1,2,3,4
[H2],5,6,7,8
[H1],a,b,c,d
[H2],e,f,g,h
Output:
[H1],1,2,3,4,[H2],5,6,7,8
[H1],a,b,c,d,[H2],e,f,g,h
This is a brief example, but the csv file has up to 167 columns with the two lines combined. This is what I have:
import csv
f = open("sample.csv",'rU').read().split("\n")
reader = csv.reader(f)
for row in reader:
    if row[0].startswith("[H1]"):
        i=[]
        while i<167: n = row.append([",".join(row[i]+row[i+1])])
        print n
However when I run it I get the following error:
print n
NameError: name 'n' is not defined
Any help is appreciated, thanks.
Input i.csv:
1,2,3
foo,bar,baz
4,5,6
qux,quux,quuux
Python code:
import csv
with open("i.csv") as f:
    reader = csv.reader(f)
    i = 0
    for row in reader:
        if i % 2 == 0:
            newRow = row
        else:
            newRow = newRow + row
            print(newRow)
        i = i + 1
Output:
['1', '2', '3', 'foo', 'bar', 'baz']
['4', '5', '6', 'qux', 'quux', 'quuux']
import csv
f = open("sample.csv",'rU').read().split("\n")
reader = csv.reader(f)
i = 0
for row in reader:
    if i % 2 == 0:
        line = row
    else:
        line = line + row
        print ", ".join(line)
    i += 1
Writing while i<167: n = row.append([",".join(row[i]+row[i+1])]) is like writing:
while i<167:
    n = row.append([",".join(row[i]+row[i+1])])
So n is only assigned inside the loop body. Here i = [], and in Python 2 the comparison [] < 167 is False, so the body never runs and n is never bound. Your print n then raises NameError.
You could add n = None right before the while:
n = None
while i<167: n = row.append([",".join(row[i]+row[i+1])])
print n
Or move print n into the loop block:
while i<167:
    n = row.append([",".join(row[i]+row[i+1])])
    print n
Note that either change avoids the NameError, but whenever the loop body does run you will print a lot of lines containing None, because append returns None: https://docs.python.org/2/tutorial/datastructures.html
Here is a way to combine the line pairs:
import csv
from itertools import izip

def main():
    with open('sample.csv', 'rb') as input_file:
        reader = csv.reader(input_file)
        for even_row, odd_row in izip(reader, reader):
            combined_row = even_row + odd_row
            print combined_row

if __name__ == '__main__':
    main()
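The izip version above is Python 2 code. On Python 3 the built-in zip is already lazy, so the same pairing trick works without itertools. The sketch below assumes Python 3; the helper name merge_pairs is hypothetical, and an in-memory buffer holding the sample input from the question stands in for sample.csv.

```python
import csv
import io

def merge_pairs(rows):
    """Join each row with the row that follows it."""
    iterator = iter(rows)
    # Passing the same iterator twice makes zip pull two rows per step.
    for first, second in zip(iterator, iterator):
        yield first + second

sample = io.StringIO(
    "[H1],1,2,3,4\n[H2],5,6,7,8\n[H1],a,b,c,d\n[H2],e,f,g,h\n"
)
lines = [",".join(merged) for merged in merge_pairs(csv.reader(sample))]
for line in lines:
    print(line)
```

One design consequence to be aware of: if the file has an odd number of rows, zip silently drops the unpaired last row.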