I am new to Python. I am trying to write numbers to a CSV file. The first number should be the first element of a row, the second number the second element, and then a new row should start. However, instead of adding the second element to the same row, my code starts a new row.
For instance what I want is:
a1,b1
a2,b2
But what I get is:
a1
b1
a2
b2
I use a loop to continuously write values into a CSV file:
n = Ratio # calculated in each loop
with open('ex1.csv', 'ab') as f:
    writer = csv.writer(f)
    writer.writerow([n])
...
m = Ratio2 # calculated in each loop
with open('ex1.csv', 'ab') as f:
    writer = csv.writer(f)
    writer.writerow([m])
I would like the results to be in the format
n1,m1
n2,m2
Example for writing to a file and then reading it back and printing it:
import csv
with open('ex1.csv', 'w') as f: # open file BEFORE you loop
    writer = csv.writer(f) # declare your writer on the file
    for rows in range(0,4): # do one loop per row
        myRow = [] # remember all column values, clear list here
        for colVal in range(0,10): # compute 10 columns
            m = colVal * rows # heavy computing (your m or n)
            myRow.append(m) # store column in row-list
        writer.writerow(myRow) # write list containing all columns
with open('ex1.csv', 'r') as r: # read it back in
    print(r.readlines()) # and print it
Output:
['0,0,0,0,0,0,0,0,0,0\r\n', '0,1,2,3,4,5,6,7,8,9\r\n', '0,2,4,6,8,10,12,14,16,18\r\n', '0,3,6,9,12,15,18,21,24,27\r\n']
which translates to a file of
0,0,0,0,0,0,0,0,0,0
0,1,2,3,4,5,6,7,8,9
0,2,4,6,8,10,12,14,16,18
0,3,6,9,12,15,18,21,24,27
You can also collect each row's list in another list (append a copy such as myRow[:] if you reuse the same list object) and call writer.writerows([ [1,2,3,4],[4,5,6,7] ]) to write all your rows in one go.
See: https://docs.python.org/2/library/csv.html#writer-objects or https://docs.python.org/3/library/csv.html#writer-objects
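Applied to the two values from the question, the same idea might look like the sketch below. It assumes Python 3, and compute_ratios is a hypothetical stand-in for whatever calculation produces n and m in each pass of the loop. The point is that both values are placed in one list and written with a single writerow call.

```python
import csv


def compute_ratios(i):
    # Hypothetical stand-in for the real per-iteration calculation of n and m.
    return i * 0.5, i * 2


def write_pairs(path, count):
    # Open the file once, before the loop; newline="" lets the csv module
    # control the line endings itself (Python 3).
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i in range(count):
            n, m = compute_ratios(i)
            writer.writerow([n, m])  # both values go into ONE row


if __name__ == "__main__":
    write_pairs("ex1.csv", 3)
```

If the file must survive between runs, opening it with "a" instead of "w" appends to it, but n and m still have to be written together in one row.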
Related
I have a large CSV file that cannot be loaded into memory all at once. I also want to add some columns to the side of the CSV, one column at a time, because that does not use much memory. I use Python and pandas. How can I do that?
Here's my code:
def toCsv(filepath,lists):
    i = 0
    with open(filepath,'r+') as f:
        reader = csv.reader(f)
        writer = csv.writer(f)
        for row in reader:
            print lists
            row.append(lists[i])
            writer.writerows(row)
            i = i+1
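One possible approach, sketched under the assumption of Python 3 and the standard csv module rather than pandas: reading and writing through the same file handle at once is unreliable, and writerows expects a list of rows where writerow takes a single row. The sketch therefore streams from the source file into a second file, one row at a time. The name add_column and its parameters are hypothetical.

```python
import csv


def add_column(src_path, dst_path, values):
    # Stream row by row so only one row is held in memory at a time.
    # `values` may be any iterable (including a generator) with one item per row.
    with open(src_path, newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        # zip stops at the shorter input, so rows without a value are dropped.
        for row, value in zip(reader, values):
            writer.writerow(row + [value])  # writerow: exactly one row
```

Once the new file is complete, it can replace the original, for example with os.replace.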
I'm relatively new to Python. I'm trying to find a way to create a script that looks at a CSV file called "data_old" from a previous month, and compares it with the data in a more recent month called "data_new", then finally outputs that data into a new CSV "data_compare".
The files each month are consistently laid out and look like this (example)
Month 1
Company, StaffNumber, NeedToPass, Passed, %age meeting requirement
xxxxxxxx, 100, 80, 30, 30%
Month 3
Company, StaffNumber, NeedToPass, Passed, %meeting requirement
xxxxxxxx, 101, 81, 54, 60%
I'm trying to get the output file to compare the data from all rows and show me "Percentage improved" instead of "Percentage meeting requirement". Nothing I try seems to work.
As the numbers change all the time the only common data will be the company name.
I need a simple, explanatory way with comments... as I'd like to understand the logic so I can modify it and add functions.
Much appreciated.
Here is a Python code example that might do what you want. This script assumes that the two input CSV files have the same number of lines. The function test uses zip, which stops as soon as one of the lists is exhausted. If your files have different numbers of lines, you have to loop over both manually. But I think it is a good starting point.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
def parse_csv(filename, sort_row=0, as_dict=False, delimiter=","):
    r = list()
    with open(filename, "rb") as f:
        # make csv reader object
        reader = csv.reader(f, delimiter=delimiter)
        if as_dict:
            # make dict if desired
            header = [h.strip() for h in reader.next()]
        for row in reader:
            if as_dict:
                # make dict if desired
                r.append(dict(zip(header, row)))
            else:
                # strip each item in the row and append it to the return list
                r.append([h.strip() for h in row])
    # sort the list by the first item (company name in this example)
    r.sort(key=lambda x: x[sort_row])
    return r
def write_csv(filename, fieldnames, rows, delimiter=","):
    with open(filename, "w") as f:
        # make csv writer object
        writer = csv.writer(f, delimiter=delimiter)
        # write the first header line
        writer.writerow(fieldnames)
        for row in rows:
            # write each row
            writer.writerow(row)
def test():
    data_old = parse_csv("m1.csv")
    data_new = parse_csv("m2.csv")
    #write_csv("data_compare.csv", data_old[:1][0], data_old[1:])
    result = list()
    # loop over the items (skipping the first header row)
    for o, n in zip(data_old[1:], data_new[1:]):
        # calculate the improvement (or whatever needs to be calculated)
        value = float(n[4].replace("%", "")) - float(o[4].replace("%", ""))
        # create the row
        result.append([o[0], "%s%%" % value, o[4], n[4]])
        #result.append(["%s%%" % value])
    header = ["Company", "Percentage improved", "old", "new"]
    #header = ["Company", "Percentage improved"]
    write_csv("data_compare.csv", header, result)

if __name__ == '__main__':
    test()
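Since the question states that the company name is the only reliable common field, pairing rows by position with zip can go wrong when the two files differ. A minimal sketch of matching by company name instead is shown here. It assumes Python 3 and the column layout from the question (name in column 0, percentage in column 4). The names load_percentages and improvement are hypothetical.

```python
import csv


def load_percentages(path):
    # Map company name -> percentage (as a float) taken from the fifth column.
    result = {}
    with open(path, newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        next(reader)  # skip the header row
        for row in reader:
            if row:  # ignore blank lines
                result[row[0].strip()] = float(row[4].strip().rstrip("%"))
    return result


def improvement(old, new):
    # Only companies present in BOTH months can be compared.
    common = sorted(old.keys() & new.keys())
    return {name: new[name] - old[name] for name in common}
```

Companies that appear in only one of the two months are skipped here. Whether to report them separately is a design choice left open.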
I need help sorting a list from a text file. I'm reading a .txt and then adding some data, then sorting it by population change %, then lastly, writing that to a new text file.
The only thing that's giving me trouble now is the sort function. I think the for statement syntax is what's giving me issues -- I'm unsure where in the code I would add the sort statement and how I would apply it to the output of the for loop statement.
The population change data I am trying to sort by is the [1] item in the list.
#Read file into script
NCFile = open(r"C:\filelocation\NC2010.txt")
#Save a write file
PopulationChange = open(r"C:\filelocation\Sorted_Population_Change_Output.txt", "w")
#Read everything into lines, except for first(header) row
lines = NCFile.readlines()[1:]
#Pull relevant data and create population change variable
for aLine in lines:
    dataRow = aLine.split(",")
    countyName = dataRow[1]
    population2000 = float(dataRow[6])
    population2010 = float(dataRow[8])
    popChange = ((population2010-population2000)/population2000)*100
    outputRow = countyName + ", %.2f" %popChange + "%\n"
    PopulationChange.write(outputRow)
NCFile.close()
PopulationChange.close()
You can fix your issue with a couple of minor changes. Split the line as you read it in and loop over the sorted lines:
lines = [aLine.split(',') for aLine in NCFile][1:]
#Pull relevant data and create population change variable
for dataRow in sorted(lines, key=lambda row: row[1]):
    population2000 = float(dataRow[6])
    population2010 = float(dataRow[8])
    ...
However, if this is a CSV file you might want to look into the csv module. In particular, DictReader reads each row as a dictionary keyed by the header row. I'm making up the field names below, but you should get the idea. Notice that I sort the data by 'countyName' as it is read in:
from csv import DictReader, DictWriter
with open(r"C:\filelocation\NC2010.txt") as NCFile:
    reader = DictReader(NCFile)
    data = sorted(reader, key=lambda row: row['countyName'])
for row in data:
    population2000 = float(row['population2000'])
    population2010 = float(row['population2010'])
    popChange = ((population2010-population2000)/population2000)*100
    row['popChange'] = "{0:.2f}".format(popChange)
with open(r"C:\filelocation\Sorted_Population_Change_Output.txt", "w") as PopulationChange:
    # extrasaction='ignore' skips the input columns that are not listed in fieldnames
    writer = DictWriter(PopulationChange, fieldnames=['countyName', 'popChange'], extrasaction='ignore')
    writer.writeheader()
    writer.writerows(data)
This will give you a two-column CSV of ['countyName', 'popChange']. You would need to replace these with the correct field names.
You need to read all of the lines in the file before you can sort them. I've created a list called change to hold tuples of the population change and the county name. This list is sorted and then saved.
with open("NC2010.txt") as NCFile:
    lines = NCFile.readlines()[1:]
change = []
for line in lines:
    row = line.split(",")
    county_name = row[1]
    population_2000 = float(row[6])
    population_2010 = float(row[8])
    pop_change = ((population_2010 / population_2000) - 1) * 100
    change.append((pop_change, county_name))
change.sort()
output_rows = ["{0}, {1:.2f}\n".format(name, pop_change)
               for pop_change, name in change]
with open("Sorted_Population_Change_Output.txt", "w") as PopulationChange:
    PopulationChange.writelines(output_rows)
I used a list comprehension to generate the output rows, which swaps each pair back into the desired order, i.e. county name first.
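As an alternative to putting the population change first in each tuple, the rows can keep the name first and be sorted with an explicit key. The sketch below assumes Python 3. The names and numbers in rows are made up purely for illustration, and sort_by_change is a hypothetical helper.

```python
from operator import itemgetter


def sort_by_change(rows, descending=False):
    # rows: iterable of (county_name, pop_change) tuples.
    # Sorting on index 1 keeps the name first, so no swap is needed afterwards.
    return sorted(rows, key=itemgetter(1), reverse=descending)


# Illustrative values only, not real census data.
rows = [("A", 12.5), ("B", -3.0), ("C", 4.25)]
for name, pop_change in sort_by_change(rows):
    print("{0}, {1:.2f}%".format(name, pop_change))
```

Passing descending=True lists the largest change first.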
I imported my CSV File and made the data into an array. Now I was wondering, what can I do so that I'm able to print a specific value in the array? For instance if I wanted the value in the 2nd row, 2nd column.
Also how would I go about adding the two values together? Thanks.
import csv
import numpy as np
f = open("Test.csv")
csv_f = csv.reader(f)
for row in csv_f:
    print(np.array(row))
f.close()
For a simple file there is no need to use the csv module.
This code reads the CSV file and prints the value of the cell in the second row and second column. I am assuming that fields are separated by commas and that no field contains a quoted comma.
with open("Test.csv") as fo:
    table = [row.split(",") for row in fo.read().replace("\r", "").split("\n")]
print(table[1][1])
So, I grabbed a dataset ("Company Funding Records") from here. Then, I just rewrote a little...
#!/usr/bin/python
import csv
#import numpy as np
csvaslist = []
f = open("TechCrunchcontinentalUSA.csv")
csv_f = csv.reader(f)
for row in csv_f:
    # print(np.array(row))
    csvaslist.append(row)
f.close()
# Now your data is in a list of lists. Everything past this point is just playing
# Add together a couple of arbitrary values...
print(int(csvaslist[2][7]) + int(csvaslist[11][7]))
# Add using a conditional...
print("\nNow let's see what Facebook has received...")
fbsum = 0
for sublist in csvaslist:
    if sublist[0] == "facebook":
        print(sublist)
        fbsum += int(sublist[7])
print("Facebook has received", fbsum)
I've commented lines at a couple points to show what's being used and what was unneeded. Notice at the end that referring to a particular datapoint is simply a matter of referencing what is, effectively, original_csv_file[line_number][field_on_that_line], and then recasting as int, float, whatever you need. This is because the csv file has been changed to a list of lists.
To get specific values within your array/file, and add together:
import csv
with open("Test.csv") as f:
    csv_f = list(csv.reader(f))
# prints the value in the second row, second column of your file
print(csv_f[1][1])
# prints the sum of two specific values (in this example, the value in the second row, second column and the value in the first row, first column)
total = int(csv_f[1][1]) + int(csv_f[0][0])
print(total)
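The same lookup and addition can be wrapped in two small helpers, sketched here for Python 3. The names read_table and add_cells are hypothetical, and float is used instead of int on the assumption that the cells may hold decimal numbers.

```python
import csv


def read_table(path):
    # Load the whole CSV file into a list of lists (rows of string cells).
    with open(path, newline="") as f:
        return list(csv.reader(f))


def add_cells(table, first, second):
    # first and second are zero-based (row, column) pairs.
    (r1, c1), (r2, c2) = first, second
    # Cells are strings, so convert them before adding.
    return float(table[r1][c1]) + float(table[r2][c2])
```

For example, add_cells(table, (1, 1), (0, 0)) adds the value in the second row, second column to the value in the first row, first column.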
I have a for loop that prints 4 details:
deats = soup.find_all('p')
for n in deats:
    print n.text
The output is 4 printed lines.
Instead of printing, what I'd like to do is have each 'n' written to a different column in a .csv. Obviously, when I use a regular .write() it puts it in the same column. In other words, how would I make it write each iteration of the loop to the next column?
You would build the CSV row in a loop (or with a list comprehension). I will show the explicit loop for ease of reading; you can change it to a single list comprehension yourself.
row = []
for n in deats:
    row.append(n.text)
Now row is ready to be written to the .csv file with csv.writer().
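A list-comprehension version, including the write step, might look like the following sketch. It assumes Python 3. The SimpleNamespace objects are stand-ins for the elements returned by soup.find_all('p'), used only so that the example runs without BeautifulSoup, and write_texts_as_row is a hypothetical name.

```python
import csv
from types import SimpleNamespace


def write_texts_as_row(path, elements):
    # One list item per element means one column per element in the CSV row.
    row = [el.text for el in elements]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(row)


# Stand-ins for the real parsed <p> elements; only .text is needed.
deats = [SimpleNamespace(text=t) for t in ("one", "two", "three", "four")]

if __name__ == "__main__":
    write_texts_as_row("output.csv", deats)
```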
Hi, try it like this:
import csv
csv_output = csv.writer(open("output.csv", "wb")) # output.csv is the output file name!
csv_output.writerow(["Col1","Col2","Col3","Col4"]) # Setting first row with all column titles
temp = []
deats = soup.find_all('p')
for n in deats:
    temp.append(str(n.text))
csv_output.writerow(temp)
You use the csv module for this:
import csv
with open('output.csv', 'wb') as csvfile:
    opwriter = csv.writer(csvfile, delimiter=',')
    opwriter.writerow([n.text for n in deats])
extra_stuff = pie,cake,eat,too
some_file.write(",".join(n.text for n in deats)+"," + ",".join(str(s) for s in extra_stuff))
Is that all you are looking for?