I want to delete rows from a csv file as they are processed.
My file:
Sr,Name1,Name2,Name3
1,Zname1,Zname2,Zname3
2,Yname1,Yname2,Yname3
3,Xname1,Xname2,Xname3
I want to read row by row and delete the row which has been processed.
So the file will be now:
2,Yname1,Yname2,Yname3
3,Xname1,Xname2,Xname3
The solutions which are provided on other questions are:
read the file
use next() or any other way to skip the row and write the remaining rows in an updated file
I want to delete the row from the original file that was passed to the .reader() method.
My code:
import csv

with open("file.txt", "r") as file:
    reader = csv.reader(file)
    for row in reader:
        # process the row
        # delete the row
I have not been able to figure out how to delete/remove the row.
I want the change to be made in the original file.txt because I will be running the program many times. Each time it runs, file.txt will already be present, and the program will start from where it left off the last time.
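One way to sketch this, under the assumption that the header line should be kept while processed data rows are removed (the file name and the "process" step are placeholders):

```python
import csv

def process_one_row(path):
    """Read the first data row, then rewrite the file without it, so the
    next run resumes where this one stopped. Keeps the header line.
    A sketch only: rewriting the whole file is fine for small files."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    if len(rows) <= 1:                # nothing left but the header
        return None
    header, first, rest = rows[0], rows[1], rows[2:]
    # ... process `first` here ...
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([header] + rest)
    return first
```

Calling this repeatedly returns one unprocessed row per call and shrinks the file each time, which matches the resume-where-you-stopped behaviour described above.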
Just read the csv file into memory as a list, edit that list, and then write it back to the csv file.
import csv

lines = []
member = input("Please enter a member's name to be deleted: ")
with open('mycsv.csv', 'r', newline='') as readFile:
    reader = csv.reader(readFile)
    for row in reader:
        # keep only the rows that do not contain the member's name
        if member not in row:
            lines.append(row)
with open('mycsv.csv', 'w', newline='') as writeFile:
    writer = csv.writer(writeFile)
    writer.writerows(lines)
You can delete a column like this:
We can use the pandas pop() method to remove a column from a CSV file by passing the column name as an argument.
Import pandas.
Read the CSV file.
Use the pop() function to remove the column from the DataFrame.
Print the data.
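The steps above can be sketched like this; the sample frame stands in for a CSV read with pd.read_csv("your_file.csv"), which is an assumed file name:

```python
import pandas as pd

# Minimal sketch of the pop() approach; the inline data replaces a real file.
df = pd.DataFrame({"Sr": [1, 2], "Name1": ["a", "b"], "Name3": ["x", "y"]})
removed = df.pop("Name3")   # drops the column in place and returns it as a Series
# df.to_csv("your_file.csv", index=False)   # write the result back if desired
```

Note that DataFrame.pop works on column labels only; dropping rows needs a different call such as drop().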
You probably can find inspiration here: How to delete a specific line in a file?.
And don't forget to open the file with write permission.
This can also be done conveniently with pandas, which is designed for working with tabular data.
You will have to import pandas.
import pandas

df = pandas.read_csv("file_name.txt")
df.at[0, "Name3"] = new_value   # set_value() was removed in newer pandas versions
df.to_csv("file_name.txt", index=False)
This code edits the cell in the 0th row and Name3 column. The 0th row is the first row below the header, so Zname3 will be changed to the new value. You can similarly delete a row or overwrite a cell.
I have not tried this code but it is supposed to work in the required manner.
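For deleting a whole row rather than editing a cell, DataFrame.drop can be used; a hedged sketch, with sample data mirroring the question's file in place of a real read_csv call:

```python
import pandas as pd

# Sketch: drop the first data row by its index label.
df = pd.DataFrame({"Sr": [1, 2, 3], "Name3": ["Zname3", "Yname3", "Xname3"]})
df = df.drop(index=0)          # removes the row labelled 0 (the first data row)
# df.to_csv("file_name.txt", index=False)   # write back if desired
```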
I want to search for a column and delete it from a csv file using python. I cannot use dataframes, as I need to work with large files and can't load them into RAM. How can I do it?
example csv file-
Home,Contact,Adress
abc,123,xyz
I need to find and delete the Contact column, for example. I thought of using csv.reader, but I can't figure out how to do it.
Check this:
import csv

col = 'Contact'
with open('your_csv.csv', newline='') as f:
    with open('new_csv.csv', 'w', newline='') as g:
        # creating csv reader and writer
        reader = csv.reader(f)
        writer = csv.writer(g)
        # getting the 'col' index in the header; we delete that field below
        header = next(reader)
        col_index = header.index(col)
        # write the header without the removed column
        del header[col_index]
        writer.writerow(header)
        for line in reader:
            del line[col_index]
            writer.writerow(line)
Explanation for using newline='' is here.
If your application still prefers to work with pandas, I'd suggest playing with pandas' chunking tactic. See the example below:
import pandas

iterator = pandas.read_csv('/tmp/abc.csv', chunksize=10**5)
df_new = pandas.DataFrame(columns=['your_remaining_columns'])
for df in iterator:
    del df['col_b']
    df_new = pandas.concat([df_new, df])
print(df_new.shape[0])
print(df_new.columns)
I was able to process a 50 GB csv file with complex data (non-UTF-8 encoding, cells containing commas, with deduplication and bad-row filtering) using this approach before.
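If even the concatenated result is too big for RAM, each processed chunk can be appended straight to the output file instead of being accumulated; a sketch, where the file and column names are placeholders:

```python
import pandas as pd

def drop_column_streaming(src, dst, col, chunksize=10**5):
    """Copy `src` to `dst` without column `col`, one chunk at a time, so the
    full file never has to fit in memory. All names here are placeholders."""
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        chunk = chunk.drop(columns=[col])
        # write the header only with the first chunk, then append
        chunk.to_csv(dst, mode="w" if first else "a", header=first, index=False)
        first = False
```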
Environment: Python 3.7 on Windows
Goal: Write out a set to a .csv file, with each set entry on a new line.
Problem: Each set entry is not on a new line... when I open the CSV file in Excel, every set entry is in a separate column, rather than a separate row.
Question: What do I need to do to get each set entry written on a new line?
import csv
test_set = {'http://www.apple.com', 'http://www.amazon.com', 'http://www.microsoft.com', 'https://www.ibm.com'}
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows([test_set])
You passed writer.writerows() a list with a single element, and so it wrote a single row.
You need to convert your set to a series of rows; each row a list with the row contents. You could use a generator expression to produce the rows:
writer.writerows([value] for value in test_set)
However, you are not really producing a CSV here. With a single column, you may as well just write the set contents directly to a file with newlines in between. The print() function can be co-opted for this task:
with open('output.csv', 'w') as f:
    print(*test_set, sep='\n', file=f)
As per the title, I'm attempting to write a python script to read a csv file, filter it to find the rows I need, and output the filtered rows into a separate csv file.
So far I am able to read the csv files with:
import csv

with open('list.csv') as f:
    csv_f = csv.reader(f)
and I am storing 3 of the rows in a tuple and using it to compare it to another list to see if there is a match. If there is a match I want the row containing the tuple to output to a new csv file.
I have successfully been able to read the files, match the tuples with another list and output which have been matched as text. The problem is I do not know how to then output the rows that match the tuple into a new csv file.
I was thinking to assign a row number to each tuple but that did not go anywhere either.
I want to know the best way to effectively output the rows I need.
Using the csv module, this could be a more elegant solution:
with open('input.csv', 'r') as inp, open('output.csv', 'w', newline='') as outp:
    csv_f = csv.reader(inp)
    csv_o = csv.writer(outp)   # a writer, not a reader, for the output file
    for line in csv_f:
        if 'something' in line:   # replace with your matching condition
            csv_o.writerow(line)
Open both files. Iterate through the lines in the file that you read from, and when your condition evaluates to True, write the line to the output file.
with open('list.csv', 'r') as rf:
    with open('output.csv', 'w') as wf:
        # Read lines
        for read_line in rf:
            if <your condition>:
                # Write to the file
                wf.write(read_line)
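A concrete sketch of the filtering the question describes, using the csv module end to end; which columns form the matching tuple is an assumption (the first three here), and the file names are placeholders:

```python
import csv

def filter_rows(src, dst, wanted):
    """Write to `dst` only the rows of `src` whose first three fields,
    taken as a tuple, appear in the set `wanted`."""
    with open(src, newline="") as f, open(dst, "w", newline="") as g:
        writer = csv.writer(g)
        for row in csv.reader(f):
            if tuple(row[:3]) in wanted:
                writer.writerow(row)
```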
I have a set of csv files and another csv file, GroundTruth2010_edited_copy.csv, which contains information I'd like to append to the end of the rows of the Set of files. The files contain information describing geologic samples. For all the files, including GroundTruth2010_edited_copy.csv, each row has an identifying 'rockid' that identifies the sample and the remainder of the row describes various parameters of the sample. I want to append corresponding information from GroundTruth2010_edited_copy.csv to the Set of csv files. That is, if the rows have the same 'rockid,' I want to combine them into a new row in a new csv file. Hence, there is a new csv file for each original csv file in the Set. Here is my code.
import os
import csv

# read in ground truth data
csvfilename = 'GroundTruth/GroundTruth2010_edited_copy.csv'
with open(csvfilename) as csvfile:
    rocreader = csv.reader(csvfile)
    path = os.getcwd()
    filenames = os.listdir(path)
    for filename in filenames:
        if filename.endswith('.csv'):
            # read csv files
            r = csv.reader(open(filename))
            new_data = []
            for row in r:
                rockid = row[-1]
                for krow in rocreader:
                    entry = krow[0]
                    newentry = entry[:5] + entry[6:]  # remove extra '0' from middle of entry
                    if newentry == rockid:
                        print('Ok!')
                        # append ground truth data
                        new_data.append([row, krow[1], krow[2], krow[3], krow[4]])
            # write csv files
            newfilename = "".join(filename.split(".csv")) + "_GT.csv"
            with open(newfilename, "w") as f:
                writer = csv.writer(f)
                writer.writerows(new_data)
The code runs and makes my new csv files, but they are all empty. The problem seems to be that my second 'if' statement is never true: the console never prints 'Ok!' I've tried troubleshooting for a bit and been rather frustrated. Perhaps the most frustrating thing is that after the program finishes, if I enter
rockid == newentry
in the console, it returns True, so it seems to me I should get at least one 'Ok!' for the final iteration. Can anyone help me find what's wrong?
Also, since my if statement is never true, there may also be a problem with the way I append to new_data.
You only open rocreader once, so when you try to use it later in the loop, you'll only get rows from it the first time through; in the rest of the loop's runs, you're reading 0 rows (and of course getting no matches). To read it over and over, you would have to open and close it once for each time you need to use it.
But instead of re-scanning the Ground Truth file from disk (slow!) for every row of each of the other CSVs, you should read it once into a dictionary, so you can look up IDs in one step.
with open(csvfilename) as csvfile:
    rocreader = csv.reader(csvfile)
    # key each ground-truth row by its corrected id (extra '0' removed)
    rocindex = dict((row[0][:5] + row[0][6:], row) for row in rocreader)
Then for any sample rockid, you can just check like this:
if rockid in rocindex:
    truth = rocindex[rockid]
    # Merge it with the row whose id is `rockid`
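Putting the whole thing together, here is a sketch of the dictionary-based merge; the column positions and the id fix-up follow the question's code and are assumptions:

```python
import csv

def merge_with_ground_truth(truth_path, sample_path, out_path):
    """Index the ground-truth rows by their corrected id once, then look
    each sample row up in constant time instead of re-scanning the file."""
    with open(truth_path, newline="") as f:
        # key: id with the extra character at index 5 removed, as in the question
        rocindex = {row[0][:5] + row[0][6:]: row for row in csv.reader(f)}
    new_data = []
    with open(sample_path, newline="") as f:
        for row in csv.reader(f):
            rockid = row[-1]
            if rockid in rocindex:
                krow = rocindex[rockid]
                new_data.append(row + krow[1:5])   # append ground-truth fields
    with open(out_path, "w", newline="") as f:
        csv.writer(f).writerows(new_data)
```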
Hi
I have some code to read a csv file:
import csv
d = csv.reader(open('C:/Documents and Settings/242481/My Documents/file.csv'))
for row in d:
    print row
This code prints all the rows in the csv file.
Is there any way I can read one row at a time, so that each time I execute the print line I get the next row?
Thanks in advance,
Aadith
It should work the way you have it, but maybe the EOL characters are not what is expected on the system you're on. Try opening the file with 'rU': open('file.csv', 'rU')
To verify that it's printing one row at a time, you could print a blank line between rows:
for row in d:
    print row
    print
or pause it:
for row in d:
    print row
    raw_input('continue-> ')
For your other code, it should be something like:
def value():
    infile = open("C:/Documents and Settings/242481/My Documents/file.csv", "rU")
    data = [row for row in infile]
    infile.close()
    return data
Always close your open files. It's good practice, even though not always strictly necessary. Also, 'file' is a Python built-in name; although you can reuse such names any way you wish, doing so can lead to hard-to-find bugs later on.
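As a side note, the "one row at a time" part of the question can also be answered directly: a csv.reader is an iterator, so next() pulls exactly one row per call. A sketch in Python 3 syntax, with StringIO standing in for a real file:

```python
import csv
import io

# A csv.reader is an iterator: each next() call yields exactly one row.
data = io.StringIO("a,b\nc,d\ne,f\n")
reader = csv.reader(data)
row1 = next(reader)   # first row
row2 = next(reader)   # the next row on the following call
```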