I have a large CSV file that cannot be loaded into memory all at once. I also want to add some columns to the CSV, and I would like to add them one column at a time so that it does not use much memory. I use Python and pandas. How can I do that?
Here's my code.
def toCsv(filepath,lists):
    i = 0
    with open(filepath,'r+') as f:
        reader = csv.reader(f)
        writer = csv.writer(f)
        for row in reader:
            print lists
            row.append(lists[i])
            writer.writerows(row)
            i = i+1
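One possible approach, as a minimal sketch rather than a definitive answer: stream the rows from the input file into a separate output file and append the new value to each row on the way through. This assumes Python 3, a header row in the input, and that the new values arrive one per data row. The function name add_column and its parameters are hypothetical.

```python
import csv


def add_column(src_path, dst_path, header, values):
    """Stream src_path to dst_path, appending one value per data row.

    `values` can be any iterable (for example a generator), so neither the
    file nor the new column has to be held in memory at once.
    Assumes the input file has a header row.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        # Copy the header row and add the new column name.
        writer.writerow(next(reader) + [header])
        # zip stops at the shorter input, so surplus rows or values are dropped.
        for row, value in zip(reader, values):
            writer.writerow(row + [value])
```

Writing to a second file avoids reading and writing through the same file handle, which is what the r+ version above attempts. To add several columns, the function can be called once per column, feeding each output file in as the next input.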
I am new to Python. I have a CSV file with more than 1000 rows. I want to merge particular rows and move them into another column. Can anyone help?
This is the source CSV file I have:
I want to move the emails under the members column, separated by commas, like this image:
To read csv files in Python, you can use the csv module. This code does the merging you're looking for.
import csv
output = [] # this will store a list of new rows
with open('test.csv', newline='') as f:
    reader = csv.reader(f)
    # read the first line of the input as the headers
    header = next(reader)
    output.append(header)
    # we will build up groups and their emails
    emails = []
    group = []
    for row in reader:
        if len(row) > 1 and row[1]: # "UserGroup" is given
            if group:
                group[-1] = ','.join(emails)
            group = row
            output.append(group)
            emails = []
        else: # it isn't, assume this is an email
            emails.append(row[0])
    # fill in the emails of the last group, if there was one
    if group:
        group[-1] = ','.join(emails)
# now write a new file
with open('new.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(output)
I have a Python script that appends 4 strings to the end of my CSV file. The first column is the user's email address, and I want to search the CSV to see whether that user's email address is already in the file. If it is, I want to overwrite that whole row with my 4 new strings; if not, I want to keep appending to the end. I already have it searching the first column for the email, and if it is there it gives me the row.
with open('Mycsvfile.csv', 'rb') as f:
    reader = csv.reader(f)
    indexLoop = []
    for i, row in enumerate(reader):
        if userEmail in row[0]:
            indexLoop.append(i)
f.close()
with open("Mycsvfile.csv", 'ab') as file222:
    writer = csv.writer(file222, delimiter=',')
    lines = (userEmail, userDate, userPayment, userStatus)
    writer.writerow(lines)
file222.close()
I want to do something like this: if the email is in a row, get the row index and use it to overwrite the whole row with my new data. If it isn't there, just append to the bottom of the file.
Example:
with open('Mycsvfile.csv', 'rb') as f:
    reader = csv.reader(f)
    new_rows = []
    indexLoop = []
    for i, row in enumerate(reader):
        if userEmail in row[0]:
            indexLoop.append(i)
            new_row = row + indexLoop(userEmail, userDate, userPayment, userStatus)
            new_rows.append(new_row)
        else:
            print "userEmail doesn't exist"
            #(i'd insert my standard append statement here.
f.close
#now open csv file and writerows(new_row)
For this, you're better off using pandas rather than the csv module. That way you can read the whole file into memory, modify it, and then write it back to a file.
Be aware, though, that modifying a DataFrame in place, one row at a time, is slow, so if you have a lot of data to add, you're better off transforming the DataFrame into a dictionary and back.
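import pandas as pd
file_path = r"/Users/tob/email.csv"
columns = ["email", "foo", "bar", "baz"]
df = pd.read_csv(file_path, header=None, names=columns, index_col="email")
# {email: {"foo": ..., "bar": ..., "baz": ...}}
data = df.to_dict('index')
for email, foo, bar, baz in information:
    # overwrites the row if the email already exists, adds it otherwise
    row = {"foo": foo, "bar": bar, "baz": baz}
    data[email] = row
# orient='index' turns each email back into a row rather than a column
df = pd.DataFrame.from_dict(data, orient='index')
# header=False because the input file has no header row
df.to_csv(file_path, header=False, columns=columns[1:])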
Where information is whatever your script returned.
First, you don't need to call close() when using with; Python does it for you.
If you have the index you can do:
with open("myFile.csv", "r+") as f:
    # gives you a list of the lines
    contents = f.readlines()
    # delete the old line and insert the new one
    # (value must end with a newline)
    contents.pop(index)
    contents.insert(index, value)
    # go back to the start, write all lines, and drop any leftover bytes
    f.seek(0)
    f.writelines(contents)
    f.truncate()
But I would recommend doing all the operations in one function, because it doesn't make much sense to open the file, iterate over its lines, close it, reopen it and update it.
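A minimal sketch of what such a single function might look like, assuming Python 3, a file small enough to hold in memory, and that the email is always the first column. The name upsert_row is hypothetical, not something from the question.

```python
import csv


def upsert_row(path, new_row):
    """Replace the row whose first column equals new_row[0], else append it."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    for i, row in enumerate(rows):
        # Exact match on the first column; `in` would also match substrings.
        if row and row[0] == new_row[0]:
            rows[i] = list(new_row)
            break
    else:
        # The loop finished without a match, so this email is new.
        rows.append(list(new_row))

    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

With the variables from the question, the call would be upsert_row('Mycsvfile.csv', (userEmail, userDate, userPayment, userStatus)). The comparison uses == instead of in, so an address such as bob@example.com does not accidentally match notbob@example.com.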
list3 = []
with open('**directory**') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        list3.append(row)
I'm completely new to data analysis using Python, and require some assistance.
The file I'm accessing contains data from 5 people (CSV file). There are 3 columns - participant number, pre-task Score, and post-task Score.
I'm essentially trying to access this file (using csv.DictReader) and manipulate the data. By this, I mean I want to calculate the difference between the post-task score and pre-task score, for each participant, and print this to the screen.
However, I'm not sure how to do this. I can print each row to the screen, and I can save each row in a list (as I've done above) - but I'm clueless as to how I am to manipulate/deal with this data. I'm wondering if there is something better than the module I'm currently using.
Calculating the difference between the second and third columns in a CSV file can be accomplished as follows:
import csv
with open('file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    # skip the header row, remove this next line if there is no header
    next(reader, None)
    for row in reader:
        difference = float(row[2]) - float(row[1])
        print(difference)
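Since the question uses csv.DictReader, the same calculation can also be sketched with column names instead of positions. The header names participant, pre and post below are assumptions (the real names in the file are not given), and score_differences is a hypothetical helper name. This assumes Python 3.

```python
import csv


def score_differences(path, id_col="participant", pre_col="pre", post_col="post"):
    """Return a list of (participant, post - pre) tuples, one per data row.

    The default column names are placeholders; pass the real header names.
    """
    results = []
    with open(path, newline="") as csvfile:
        # DictReader uses the first row as the keys of each row dictionary.
        for row in csv.DictReader(csvfile):
            diff = float(row[post_col]) - float(row[pre_col])
            results.append((row[id_col], diff))
    return results
```

Looping over score_differences('file.csv') and printing each pair gives one line per participant. Looking fields up by name keeps working if the column order in the file changes.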
I am trying to append 2 data sets to my csv file. Below is my code. The code runs but my data gets appended below a set of data in the first column (i.e. col[0]). I would however like to append my data sets in separate columns at the end of file. Could I please get advice on how I might be able to do this? Thanks.
import csv
Trial = open ('Trial_test.csv', 'rt', newline = '')
reader = csv.reader(Trial)
Trial_New = open ('Trial_test.csv', 'a', newline = '')
writer = csv.writer(Trial_New, delimiter = ',')
Cortex = []
Liver = []
for col in reader:
    Cortex_Diff = float(col[14])
    Liver_Diff = float(col[17])
    Cortex.append(Cortex_Diff)
    Liver.append(Liver_Diff)
Avg_diff_Cortex = sum(Cortex)/len(Cortex)
Data1 = str(Avg_diff_Cortex)
Avg_diff_Liver = sum(Liver)/len(Liver)
Data2 = str(Avg_diff_Liver)
writer.writerows(Data1 + Data2)
Trial.close()
Trial_New.close()
I think I see what you are trying to do. I won't try to rewrite your function entirely for you, but here's a tip: assuming you are dealing with a manageable size of dataset, try reading your entire CSV into memory as a list of lists (or list of tuples), then perform your calculations on the values on this object, then write the python object back out to the new CSV in a separate block of code. You may find this article or this one of use. Naturally the official documentation should be helpful too.
Also, I would suggest using different files for input and output to make your life easier.
For example:
import csv
data = []
with open('Trial_test.csv', newline='') as csvfile:
    # the delimiter matches the comma used in the question's writer
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        data.append(row)
# now do your calculations on the 'data' object.
with open('Trial_test_new.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for row in data:
        writer.writerow(row)
Something like that, anyway!
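To make the calculation step concrete, here is a rough sketch under stated assumptions: Python 3, a comma-separated file without a header row, at least one data row, and the (guessed) goal of putting each average into its own new column on every row. The function name append_column_averages is hypothetical. As a side note, writerows expects a list of rows, so passing it a single string such as Data1 + Data2 writes one character per row, which would explain the output landing in the first column.

```python
import csv


def append_column_averages(src_path, dst_path, col_indexes):
    """Copy src_path to dst_path, adding one new column per averaged column."""
    with open(src_path, newline="") as f:
        data = list(csv.reader(f))

    # Average each requested column over all rows (assumes data is not empty).
    averages = [
        sum(float(row[i]) for row in data) / len(data) for i in col_indexes
    ]

    with open(dst_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in data:
            # writerow takes one list of fields; the averages become new columns.
            writer.writerow(row + averages)
    return averages
```

For the file in the question, the call would be append_column_averages('Trial_test.csv', 'Trial_test_new.csv', [14, 17]).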
I have no knowledge of Python.
What I want to be able to do is create a script that edits a CSV file so that every field in column 3 is wrapped in quotes. I haven't been able to find much help. Is this quick and easy to do? Thanks.
column1,column2,column3
1111111,2222222,333333
This is a fairly crude solution, very specific to your request (assuming your source file is called "csvfile.csv" and is in C:\Temp).
import csv
newrow = []
csvFileRead = open('c:/temp/csvfile.csv', newline='')
csvFileNew = open('c:/temp/csvfilenew.csv', 'w', newline='')
# Open the CSV
csvReader = csv.reader(csvFileRead, delimiter = ',')
# Append the rows to variable newrow
for row in csvReader:
    newrow.append(row)
# Add quotes around the third list item
for row in newrow:
    row[2] = "'"+str(row[2])+"'"
csvFileRead.close()
# Create a new CSV file
csvWriter = csv.writer(csvFileNew, delimiter = ',')
# Append the csv with rows from newrow variable
for row in newrow:
    csvWriter.writerow(row)
csvFileNew.close()
There are MUCH more elegant ways of doing what you want, but I've tried to break it down into basic chunks to show how each bit works.
I would start by looking at the csv module.
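import csv
filename = 'file.csv'
rows = []
# open for reading; 'wb' would truncate the file before it could be read
with open(filename, newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        row[2] = "'%s'" % row[2]
        rows.append(row)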
And then write the rows back to the CSV file.
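The write-back step could be sketched as follows, assuming Python 3. The helper name quote_third_column is hypothetical, and going through a temporary file is a design choice of this sketch rather than something the answer above prescribes.

```python
import csv
import os
import tempfile


def quote_third_column(path):
    """Rewrite the CSV at `path` with single quotes around the third field."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so a failure cannot corrupt the original.
    with open(path, newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Rows with fewer than three fields are copied unchanged.
            if len(row) > 2:
                row[2] = "'%s'" % row[2]
            writer.writerow(row)
    # Both files are closed here; swap the finished file into place.
    os.replace(dst.name, path)
```

Like the snippet above, this also quotes the third field of the header row. Skipping the header would need an extra next(reader) step before the loop.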