range(x,y) problems, CSV file updating problems - python

When I use different values in the range() loop, I get different output in the file, and I don't think it should behave like that. This is the initial content of data.csv:
a1,b1,c1
a2,b2,c2
a3,b3,c3
a4,b4,c4
a5,b5,c5
a6,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
This is the script (run on initial content):
import csv
csv_f = open("data.csv","r+", newline='')
csv_w = csv.writer(csv_f)
for x in range(1,4):
    csv_w.writerow(["e"+str(x)]+["f"+str(x)]+["g"+str(x)])
csv_f.flush()
csv_f.close()
Output:
e1,f1,g1
e2,f2,g2
e3,f3,g3
a4,b4,c4
a5,b5,c5
a6,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
Result as expected.
This is the modified script (only the range values changed), run on the initial content:
import csv
csv_f = open("data.csv","r+", newline='')
csv_w = csv.writer(csv_f)
for x in range(12,15):
    csv_w.writerow(["e"+str(x)]+["f"+str(x)]+["g"+str(x)])
csv_f.flush()
csv_f.close()
Output:
e12,f12,g12
e13,f13,g13
e14,f14,g14
a5,b5,c5
a6,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
Output is not as expected: a row is lost.
If you change the range values, the output changes in ways you would not expect.
It definitely shouldn't do this. What is the reason? I know there are better ways to open CSV files, but I would like to know what the problem is here. Another example below, also run on the initial content:
import csv
csv_f = open("data.csv","r+", newline='')
csv_w = csv.writer(csv_f)
for x in range(13,17):
    csv_w.writerow(["e"+str(x)]+["f"+str(x)]+["g"+str(x)])
csv_f.flush()
csv_f.close()
Output:
e13,f13,g13
e14,f14,g14
e15,f15,g15
e16,f16,g16
,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
Again the output is not as expected.

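You open the file with open("data.csv","r+", newline=''). Mode "r+" opens the existing file for reading and writing without truncating it, and the position starts at the very beginning.
The file is just a stream of chars. You can imagine it as one string: a1,b1,c1\r\na2,b2,c2\r\na3,b3,c3\r\na4,b4,c4
Now you start writing lines at the start of that stream. First, e.g., e12,f12,g12\r\n
So it writes the e to the first position, replacing the a,
1 replacing 1,
2 replacing ,
and so on. Writing overwrites characters in place; it never inserts text or shifts the rest of the file along. A row like a1,b1,c1\r\n is 10 chars, but a row like e12,f12,g12\r\n is 13, so with double-digit numbers you write more chars than the rows you meant to replace, and the extra chars overwrite the beginning of the following rows.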
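To check the character arithmetic, here is a small sketch that re-runs the question's script against a temporary copy of the file. It assumes every initial row ends with \r\n (the csv module's default line terminator, and the same assumption as the string above); the names INITIAL and run_overwrite are made up for illustration.

```python
import csv
import os
import tempfile

# Assumed initial content: ten rows, each ending in "\r\n".
INITIAL = "".join(f"a{i},b{i},c{i}\r\n" for i in range(1, 11))


def run_overwrite(start, stop):
    """Re-run the question's script for range(start, stop) on a fresh copy of INITIAL."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="", encoding="utf-8") as f:
            f.write(INITIAL)
        # "r+" keeps the old content and starts writing at position 0.
        with open(path, "r+", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)  # default lineterminator is "\r\n"
            for x in range(start, stop):
                writer.writerow([f"e{x}", f"f{x}", f"g{x}"])
        with open(path, newline="", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


for start, stop in [(1, 4), (12, 15), (13, 17)]:
    print(repr(run_overwrite(start, stop)[:60]))
```

Under that assumption, range(12,15) writes 3 × 13 = 39 chars over the first 40, leaving the lone \n that ended a4,b4,c4 (a viewer may show it as an empty line or hide it). range(13,17) writes 4 × 13 = 52 chars, two more than the first five rows, which eats the a6 and leaves ,b6,c6, as in the question. The file length never changes.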
Alternative solution:
So to achieve the behavior that I think you wanted, you could read in the old lines:
import csv
rows = []
with open('data.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        rows.append(row)
Then you could replace the rows that you want:
In your case you wanted to replace the first four rows:
offset = 12
for x in range(0, 4):
    rows[x] = ["e"+str(x+offset)]+["f"+str(x+offset)]+["g"+str(x+offset)]
The offset lets you choose the actual numbers that go into the content (12-15 or whatever).
In the end you just write the rows back:
with open('data.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(rows)
This leads to the following output:
e12,f12,g12
e13,f13,g13
e14,f14,g14
e15,f15,g15
a5,b5,c5
a6,b6,c6
a7,b7,c7
a8,b8,c8
a9,b9,c9
a10,b10,c10
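The read, replace, and write steps above can be wrapped in one function. This is a sketch that assumes the file fits comfortably in memory; replace_leading_rows and demo are hypothetical names.

```python
import csv
import os
import tempfile


def replace_leading_rows(path, new_rows):
    """Overwrite the first len(new_rows) rows of the CSV at `path`, keeping the rest.

    The file is read completely and then rewritten, so the new rows may be
    longer or shorter than the rows they replace.
    """
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    rows[:len(new_rows)] = new_rows
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def demo():
    """Run the example from this answer on a temporary copy of data.csv."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows([f"a{i}", f"b{i}", f"c{i}"] for i in range(1, 11))
        replace_leading_rows(path, [[f"e{x}", f"f{x}", f"g{x}"] for x in range(12, 16)])
        with open(path, newline="") as f:
            return list(csv.reader(f))
    finally:
        os.remove(path)
```

Because the file is opened with "w" for the second step, it is truncated and written from scratch, which is what avoids the in-place overwriting problem.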

Related

Converting CSV into Array in Python

I have a CSV file like the one below. It is a small CSV file, and I have uploaded it here.
I am trying to convert the CSV values into an array.
My expected output is like:
My solution
import csv
results = []
with open("Solutions10.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # change contents to floats
    for row in reader:  # each row is a list
        results.append(row)
but I am getting a
ValueError: could not convert string to float: ' [1'
There is a problem with your CSV: it isn't really CSV (comma-separated values). To parse it you need some cleaning:
import re
filepath = "Solutions10.csv"  # the file from the question
# if you expect only integers
pattern = re.compile(r'\d+')
# if you expect floats (uncomment below)
# pattern = re.compile(r'\d+(?:\.\d+)?')
result = []
with open(filepath) as csvfile:
    for row in csvfile:
        result.append([
            int(val.group(0))
            # float(val.group(0))
            for val in pattern.finditer(row)
        ])
print(result)
You can also solve this with substrings if it's easier for you and you know the format exactly.
Note: I also see there is an "eval" suggestion. Please be careful with it, as you can get into a lot of trouble if you use it on unknown or untrusted files.
You can do this:
with open("Solutions10.csv") as csvfile:
    result = [eval(k) for k in csvfile.readlines()]
Edit: Karl is cranky and wants you to do this:
with open("Solutions10.csv") as csvfile:
    result = []
    for line in csvfile:
        line = line.replace("[", "").replace("]", "").strip()
        if line:  # skip blank lines
            result.append([int(k) for k in line.split(",")])
But you're the programmer, so you can do what you want. If you trust your input file, eval is fine.
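If every line really is a Python-style list literal such as [1, 2, 3] (an assumption, since the file itself isn't shown), the standard library's ast.literal_eval parses it like eval does but only accepts literals, so a hostile file cannot run code. read_list_literals is a hypothetical helper name.

```python
import ast


def read_list_literals(lines):
    """Parse each non-blank line as a Python literal such as "[1, 2, 3]"."""
    result = []
    for line in lines:
        line = line.strip()
        if line:
            # literal_eval only accepts literals (lists, numbers, strings, ...),
            # so a line like "__import__('os').getcwd()" raises ValueError.
            result.append(ast.literal_eval(line))
    return result


# Usage with the question's file:
# with open("Solutions10.csv") as csvfile:
#     result = read_list_literals(csvfile)
```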

Why are the digits in my numbers printing separately rather than together?

This is an example of my code. It is not the whole code, just the part where I am having trouble. Does anyone understand why it prints like this rather than the full numbers, like 104.0 and 96.0? They are strings, but it will not let me convert them to floats because of the period in some of the numbers.
with open('file.csv','w') as file:
    with open('file2.csv', 'r') as file2:
        reader = csv.DictReader(file2)
        file.write(','.join(row))
        file.write('\n')
        for num,row in enumerate(reader):
            outrow = []
            for x in row['numbers']:
                print(x)
When I execute this, it prints out the values I am looking for but separately like this:
1
0
4
.
0
9
6
.
0
N
a
N
1
3
6
.
0
N
a
N
6
2
.
0
The 'NaN' values are ones I am changing, but I have to use the rest of the numbers. I cannot insert them into a list because they will end up separated, right?
Seems like you want something like:
with open('file.csv','w') as file:
    with open('file2.csv', 'r') as file2:
        reader = csv.DictReader(file2)
        file.write(','.join(row))
        file.write('\n')
        for num,row in enumerate(reader):
            number = row['numbers']
            print(number)
for x in row['numbers'] means "iterate over every individual character in the numbers cell/value".
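A DictReader cell is one string, so float() can convert the whole cell directly; the decimal point is not a problem, and float() also accepts "NaN". A minimal sketch, assuming the column is called numbers as in the question and that NaN cells should simply be left out (column_as_floats is a hypothetical name):

```python
import csv
import math


def column_as_floats(lines, column="numbers"):
    """Return one column of a CSV as floats, leaving out 'NaN' cells."""
    values = []
    for row in csv.DictReader(lines):
        cell = row[column]    # the whole cell, e.g. "104.0", not one character
        number = float(cell)  # float() accepts "104.0" and also "NaN"
        if not math.isnan(number):
            values.append(number)
    return values


# Iterating over the cell itself is what produced one character per line:
print(list("104.0"))  # ['1', '0', '4', '.', '0']
```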
Also, what are you doing here?
file.write(','.join(row))
file.write('\n')
You don't have a row variable/object at that point (at least not visible in your example). Are you trying to write the header? Presumably it's working, so you defined row before, maybe like, row = ['col1', 'numbers']
If so, maybe take this general approach:
import csv
# Do your reading and processing in one step
rows = []
with open('input.csv', newline='') as f:
    reader = csv.DictReader(f)
    my_field_names = reader.fieldnames  # header columns, available even with no data rows
    for row in reader:
        # do some work on row, like...
        if row['numbers'] == 'NaN':
            row['numbers'] = '-1'  # whatever you do with NaN
        rows.append(row)
# Do your writing in another step
with open('output.csv', 'w', newline='') as f:
    # Use the provided writer, in addition to reader
    writer = csv.DictWriter(f, fieldnames=my_field_names)
    writer.writeheader()
    writer.writerows(rows)
At the very least, use the provided writer and DictWriter classes; they will make your life much easier.
I mocked up this sample CSV:
input.csv
id,numbers
id_1,100.4
id2,NaN
id3,23
and the program above produced this:
output.csv
id,numbers
id_1,100.4
id2,-1
id3,23

Trying to get CSV data into source-target format for a bibliometrics graph (Gephi)

I'm trying to turn a CSV whose rows describe the research groups (from 1 to n groups) that worked on a particular publication into a CSV listing every combination of two groups that have collaborated.
The csv that I have is like that: (each row corresponds to a particular publication)
group1;group2;group3
group1;group8
group8;group2;group1
I need to convert it into Gephi edges format, which uses a csv source-target format:
group1;group2
group1;group3
group2;group3
group1;group8
group8;group2
group8;group1
group2;group1
(do not need all permutations, just combinations, as it's an undirected graph)
I first did it with just one of the rows and got the general idea of how to do it:
from itertools import combinations
b = "group1;group2;group3"
b_split = b.split(";")
print(list(combinations(b_split, 2)))
Result: [('group1', 'group2'), ('group1', 'group3'), ('group2', 'group3')]
But when I try to open the whole csv, it seems the split function doesn't work well.
with open('grups.csv','rb') as origin_file:
    reader = csv.reader(origin_file, delimiter=";")
    a = list(reader)
    for row in a:
        c = list(combinations(row,2))
with open('output.csv','wb') as result_file:
    for each in c:
        wr = csv.writer(result_file)
        wr.writerow(each)
But the result I get in the file is just the last line.
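Got it working like this:
import csv
from itertools import combinations
# Python 3: open CSV files in text mode with newline=''
with open('grups.csv', newline='') as origin_file:
    reader = csv.reader(origin_file, delimiter=";")
    a = list(reader)
with open('output_grups.csv', 'w', newline='') as result_file:
    # create the writer once, then write every row's pairs into the same file
    wr = csv.writer(result_file, delimiter=';', dialect='excel')
    for row in a:
        for each in combinations(row, 2):
            wr.writerow(each)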
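The same idea as a reusable sketch: combinations() yields each unordered pair once per row, which is what an undirected edge list needs, and writing happens in a single pass so no row's pairs are lost. The function names are hypothetical, blank fields are skipped as an extra assumption, and no header row is written, matching the format shown in the question.

```python
import csv
import io
from itertools import combinations


def collaboration_edges(lines):
    """Yield every unordered pair of groups that appear on the same row."""
    for row in csv.reader(lines, delimiter=";"):
        groups = [g.strip() for g in row if g.strip()]  # ignore blank lines and empty fields
        yield from combinations(groups, 2)


def edges_csv(lines):
    """Return the pairs as ';'-separated text, one edge per line, no header."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    writer.writerows(collaboration_edges(lines))
    return out.getvalue()


# Usage with files:
# with open('grups.csv', newline='') as src, open('output_grups.csv', 'w', newline='') as dst:
#     dst.write(edges_csv(src))
```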

Appending data to csv file

I am trying to append 2 data sets to my csv file. Below is my code. The code runs but my data gets appended below a set of data in the first column (i.e. col[0]). I would however like to append my data sets in separate columns at the end of file. Could I please get advice on how I might be able to do this? Thanks.
import csv
Trial = open ('Trial_test.csv', 'rt', newline = '')
reader = csv.reader(Trial)
Trial_New = open ('Trial_test.csv', 'a', newline = '')
writer = csv.writer(Trial_New, delimiter = ',')
Cortex = []
Liver = []
for col in reader:
    Cortex_Diff = float(col[14])
    Liver_Diff = float(col[17])
    Cortex.append(Cortex_Diff)
    Liver.append(Liver_Diff)
Avg_diff_Cortex = sum(Cortex)/len(Cortex)
Data1 = str(Avg_diff_Cortex)
Avg_diff_Liver = sum(Liver)/len(Liver)
Data2 = str(Avg_diff_Liver)
writer.writerows(Data1 + Data2)
Trial.close()
Trial_New.close()
I think I see what you are trying to do. I won't try to rewrite your function entirely for you, but here's a tip: assuming you are dealing with a manageable size of dataset, try reading your entire CSV into memory as a list of lists (or list of tuples), then perform your calculations on the values on this object, then write the python object back out to the new CSV in a separate block of code. You may find this article or this one of use. Naturally the official documentation should be helpful too.
Also, I would suggest using different files for input and output to make your life easier.
For example:
import csv
data = []
with open('Trial_test.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)  # default delimiter is ','
    for row in reader:
        data.append(row)
# now do your calculations on the 'data' object.
with open('Trial_test_new.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in data:
        writer.writerow(row)
Something like that, anyway!
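One reading of "separate columns at the end of file" is a final row holding the two averages side by side. A sketch under that assumption, also assuming no header row and reusing columns 14 and 17 from the question (with_average_row is a hypothetical name). Note that writerows(Data1 + Data2) in the question concatenates two strings and then writes each character as its own row; writerow() with a list of two values writes one row with two columns.

```python
import csv
import io
from statistics import mean


def with_average_row(lines, cols=(14, 17)):
    """Return the CSV text of `lines` plus one extra row of column averages."""
    rows = list(csv.reader(lines))
    averages = [mean(float(row[c]) for row in rows) for c in cols]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)
    writer.writerow(averages)  # one row, one column per average
    return out.getvalue()


# Usage with separate input and output files:
# with open('Trial_test.csv', newline='') as src, open('Trial_test_new.csv', 'w', newline='') as dst:
#     dst.write(with_average_row(src))
```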

Python: redirect text parsing output to CSV file

I want to write a simple script which will parse a text file of mine.
Pattern is the following:
0.061024 seconds for Process 0 to send.
0.060062 seconds for Process 1 to receive.
This goes on in a loop.
The python file looks like this:
import fileinput, csv
data = []
for line in fileinput.input():
    time, sep, status = line.partition("seconds")
    if sep:
        print(time.strip())
with open('result.csv', 'w') as f:
    w = csv.writer(f)
    w.writerow('send receive'.split())
    w.writerows(data)
This gives me the desired output in bash and also creates the two columns, send and receive. How do I fill them with the values printed by
print(time.strip())
I would like to have this output in a CSV file in two columns.
How should I do it?
You can use the writer function that comes with the csv module:
import csv
with open('file.csv', 'w', newline='') as csvfile:
    # comma-separated, so the file opens as two columns
    cwriter = csv.writer(csvfile, delimiter=',',
                         quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for var in list_of_values:
        cwriter.writerow(var)
This assumes that you have all the rows as separate lists within list_of_values, as in:
list_of_values = [['col1', 'col2'],['col1', 'col2']]
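For the question's input, list_of_values still has to be built from the parsed lines. A sketch, assuming each "to send." line is followed by its matching "to receive." line (parse_times and times_csv are hypothetical names):

```python
import csv
import io


def parse_times(lines):
    """Pair each 'to send.' time with the 'to receive.' time that follows it."""
    rows = []
    pending_send = None
    for line in lines:
        time, sep, status = line.partition("seconds")
        if not sep:
            continue  # not a timing line
        status = status.strip()
        if status.endswith("to send."):
            pending_send = time.strip()
        elif status.endswith("to receive.") and pending_send is not None:
            rows.append([pending_send, time.strip()])
            pending_send = None
    return rows


def times_csv(rows):
    """Return CSV text with a send/receive header followed by the rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["send", "receive"])
    writer.writerows(rows)
    return out.getvalue()


# Usage, mirroring the question's script:
# import fileinput
# with open('result.csv', 'w', newline='') as f:
#     f.write(times_csv(parse_times(fileinput.input())))
```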
