I want to search for a column and delete it from a CSV file using Python. I can't use dataframes because I need to work with large files that won't fit in RAM. How do I do it?
example csv file-
Home,Contact,Address
abc,123,xyz
I need to find and delete the Contact column, for example. I thought of using csv.reader but can't figure out how to do it.
Check this:
import csv

col = 'Contact'
with open('your_csv.csv', newline='') as f, open('new_csv.csv', 'w', newline='') as g:
    reader = csv.reader(f)
    writer = csv.writer(g)
    # find the index of 'col' in the header and write the header back without it
    header = next(reader)
    col_index = header.index(col)
    del header[col_index]
    writer.writerow(header)
    # copy the remaining rows to the new csv file, dropping that column from each
    for line in reader:
        del line[col_index]
        writer.writerow(line)
The csv module documentation explains why newline='' is used when opening the files.
If your application still prefers to work with pandas, I'd suggest playing with pandas' chunking tactic. See the example below:
import pandas

iterator = pandas.read_csv('/tmp/abc.csv', chunksize=10**5)
df_new = pandas.DataFrame(columns=['your_remaining_columns'])

for df in iterator:
    del df['col_b']
    df_new = pandas.concat([df_new, df])

print(df_new.shape[0])
print(df_new.columns)
I was able to process a 50 GB CSV file with messy data (non-UTF-8 encoding, cells containing commas, deduplication, filtering out bad rows) using this approach before.
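Note that concatenating every chunk back into df_new still builds the whole table in memory. If the goal is to keep memory usage flat, a variant (just a sketch, with placeholder file and column names) writes each processed chunk straight to the output file instead:
import pandas as pd

reader = pd.read_csv('/tmp/abc.csv', chunksize=10**5)
for i, chunk in enumerate(reader):
    chunk = chunk.drop(columns=['col_b'])   # 'col_b' is a placeholder column name
    # write the header only with the first chunk, then append
    chunk.to_csv('/tmp/abc_out.csv', mode='w' if i == 0 else 'a',
                 header=(i == 0), index=False)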
Related
I want to delete rows from a csv file as they are processed.
My file:
Sr,Name1,Name2,Name3
1,Zname1,Zname2,Zname3
2,Yname1,Yname2,Yname3
3,Xname1,Xname2,Xname3
I want to read row by row and delete the row which has been processed.
So the file will be now:
2,Yname1,Yname2,Yname3
3,Xname1,Xname2,Xname3
The solutions provided in other questions are:
read the file
use next() or any other way to skip the row and write the remaining rows in an updated file
I want to delete the row from the original file that was passed to the .reader() method.
My code:
with open("file.txt", "r") as file
reader = csv.reader(file)
for row in reader:
#process the row
#delete the row
I have not been able to figure out how to delete/remove the row.
I want the change to be in the original file.txt because I will be running the program many times and so each time it runs, file.txt will already be present and the program will start from where it ended the last time.
Just read the csv file into memory as a list, edit that list, and then write it back to the csv file.
import csv

lines = []
members = input("Please enter a member's name to be deleted: ")

with open('mycsv.csv', 'r', newline='') as readFile:
    reader = csv.reader(readFile)
    for row in reader:
        # keep the row only if none of its fields match the name to delete
        if members not in row:
            lines.append(row)

with open('mycsv.csv', 'w', newline='') as writeFile:
    writer = csv.writer(writeFile)
    writer.writerows(lines)
You can delete a column like this: use the pandas pop() method, which removes a column by naming it as an argument (it removes columns, not rows). The steps, sketched in code below, are:
Import pandas.
Read the CSV file.
Use pop() to remove the unwanted column.
Print the data (or write it back out).
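A minimal sketch of those steps, assuming the column to drop is 'Contact' and using placeholder file names:
import pandas as pd

df = pd.read_csv('your_csv.csv')
df.pop('Contact')                        # removes the 'Contact' column in place and returns it
print(df)
df.to_csv('new_csv.csv', index=False)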
You probably can find inspiration here: How to delete a specific line in a file?.
And don't forget to open the file with write permission.
The pandas package is built for this kind of tabular work, so the simplest route is to use it rather than plain Python. You will have to import pandas.
import pandas

df = pandas.read_csv("file_name.txt")
df.at[0, "Name3"] = new_value   # set_value() has been removed from pandas; .at is the current way to set a single cell
df.to_csv("file_name.txt", index=False)
This code edits the cell in the 0th row and the Name3 column. The 0th row is the first row below the header, so Zname3 will be changed to the new value. You can similarly delete a row or a cell.
I have not tried this code, but it should work in the required manner.
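For the row-deletion part of the question, a hedged sketch along the same read / modify / write-back lines (the file name is the question's placeholder):
import pandas

df = pandas.read_csv("file_name.txt")
df = df.drop(index=0)                     # drop the first data row (the one below the header)
df.to_csv("file_name.txt", index=False)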
I have code that creates a CSV file. When I first open the file, everything is in one column, so I have to do the usual: go to the Data tab and split the text into columns. The data is then split into columns.
I work with Office 365, and recently I was told that if I change the commas to semicolons, then when I open the newly created CSV file, Excel will automatically open it already separated into columns.
I'm asking for some advice here, since having to do this process for every created CSV file is really time consuming.
I'm looking for a way to alter my code so it does this automatically, maybe by splitting columns with semicolons instead of commas, just to see if that works.
import csv

with open('created.csv', 'w', newline='') as f:
    writer = csv.writer(f)
If you want to transform an existing file, you can do it like this:
with open('created.csv', 'r', encoding='utf-8') as f_in, open("outfile.csv", 'w') as f_out:
    for line in f_in:
        # naive replacement: this breaks if a quoted field itself contains a comma
        line = line.split(",")
        line = ";".join(line)
        f_out.write(line)
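If the file can contain quoted fields with embedded commas, a safer variant (a sketch, assuming the input is comma-delimited UTF-8) lets the csv module do the parsing:
import csv

with open('created.csv', newline='', encoding='utf-8') as f_in, \
     open('outfile.csv', 'w', newline='', encoding='utf-8') as f_out:
    reader = csv.reader(f_in)                  # handles quoted fields correctly
    writer = csv.writer(f_out, delimiter=';')
    writer.writerows(reader)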
In case you already have a dataframe, you can do it like @jezrael said in the comments:
df.to_csv('created.csv', sep=';')
As mentioned in the comments, you are already using the csv module to write your file. You just have to change this line in your code:
writer = csv.writer(f)
to
writer = csv.writer(f, delimiter=';')
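A minimal end-to-end sketch of that change, with made-up example rows:
import csv

rows = [['Home', 'Contact', 'Address'], ['abc', '123', 'xyz']]
with open('created.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=';')   # semicolon-delimited output for Excel
    writer.writerows(rows)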
As for me, if I open a CSV delimited with "," I have to do the splitting you described in your question, but if I open a CSV delimited with ";" it already lands in the right columns.
This is (for Windows users at least) dependent on your region settings, so it can differ for everyone depending on their language settings.
You can check them here, and change them if you want:
https://www.itsupportguides.com/knowledge-base/office-2013/excel-20132016-how-to-change-csv-delimiter-character/
I am doing a data migration. The old application's data is exported as one CSV file. We cannot import this CSV file directly into the new application; I need to create a new CSV template that matches the new application and import some of the data into it. I would appreciate code that handles this requirement.
I'm not exactly sure what template you want to go to. I'm going to assume that you either want to change the number/order of columns or the delimiter.
The simplest thing is to read it line by line and write it:
import csv
with open("Old.csv", 'r') as readfp, open("new.csv", 'w') as writefp:
csvReader = csv.reader(readfp)
csvWriter = csv.writer(writefp, delimiter=',')
for line in csvReader:
#line is a list of strings so you can reorder it as you wish. I'll skip the third column as an example.
csvWriter.writerow(line[:2]+line[3:])
If you have pandas installed, this is even simpler:
import pandas as pd
df = pd.read_csv("Old.csv")
df = df.drop(columns=["name_of_bad_col1", "name_of_bad_col2"])   # drop() returns a new DataFrame
df.to_csv("new.csv", index=False)
If you are going the pandas route, make sure to check out the documentation (read_csv, to_csv).
This:
import csv
with open('original.csv', 'r', newline='') as inp, open('new.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[2] != "0":
            writer.writerow(row)

os.remove('original.csv')
os.rename('new.csv', 'original.csv')
lets me delete certain rows of a CSV.
Is there a more pythonic way to delete some rows of a CSV file, in-place? (instead of creating a file, deleting the original, renaming, etc.)
There isn't a more Pythonic way: you can't delete stuff in the middle of a file. Write out a new file with the stuff you want, and then rename it.
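A sketch of that write-then-rename pattern, using a temporary file in the same directory so the final rename is atomic on most platforms (the row filter is the one from the question, and the CSV is assumed to live in the working directory):
import csv
import os
import tempfile

# create the temp file next to the original so os.replace() stays on one filesystem
fd, tmp_path = tempfile.mkstemp(dir='.', suffix='.csv')
with open('original.csv', newline='') as inp, os.fdopen(fd, 'w', newline='') as out:
    writer = csv.writer(out)
    for row in csv.reader(inp):
        if row[2] != "0":
            writer.writerow(row)
os.replace(tmp_path, 'original.csv')   # swap the filtered file in over the original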
I noticed that your code does not import the os module, even though you're using it. Regardless, here's a method of doing what you need it to do without using that module.
This will open the file in read mode first to get the data, then in write mode to overwrite it. Note that you need to pass the csv.reader(f) object to list(), or else data will just hold a reader tied to the file and you won't be able to read anything from it once the file is closed; list() actually copies the rows for you.
import csv

with open("original.csv", newline='') as f:
    data = list(csv.reader(f))

with open("original.csv", "w", newline='') as f:
    writer = csv.writer(f)
    for row in data:
        if row[2] != "0":
            writer.writerow(row)
I'm using Python's csv module to do some reading and writing of csv files.
I've got the reading fine and appending to the csv fine, but I want to be able to overwrite a specific row in the csv.
For reference, here's my reading and then writing code to append:
# reading
b = open("bottles.csv", "r", newline='')
bottles = csv.reader(b)
bottle_list = []
bottle_list.extend(bottles)
b.close()

# appending
b = open('bottles.csv', 'a', newline='')
writer = csv.writer(b)
writer.writerow([bottle, emptyButtonCount, 100, img])
b.close()
And I'm using basically the same for the overwrite mode (which isn't correct; it just overwrites the whole CSV file):
b = open('bottles.csv', 'w', newline='')
writer = csv.writer(b)
writer.writerow([bottle, btlnum, 100, img])
b.close()
In the second case, how do I tell Python I need a specific row overwritten? I've scoured Google and other Stack Overflow posts to no avail. I assume my limited programming knowledge is to blame rather than Google.
I will add to Steven's answer:
import csv

bottle_list = []

# Read all data from the csv file.
with open('a.csv', newline='') as b:
    bottles = csv.reader(b)
    bottle_list.extend(bottles)

# Data to override, in the format {line_num_to_override: data_to_write}.
line_to_override = {1: ['e', 'c', 'd']}

# Write the data back to the csv file, replacing the lines listed in line_to_override.
with open('a.csv', 'w', newline='') as b:
    writer = csv.writer(b)
    for line, row in enumerate(bottle_list):
        data = line_to_override.get(line, row)
        writer.writerow(data)
You cannot overwrite a single row in the CSV file. You'll have to write all the rows you want to a new file and then rename it back to the original file name.
Your pattern of usage may fit a database better than a CSV file. Look into the sqlite3 module for a lightweight database.
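For instance, here is a hedged sketch of the same "overwrite one row" idea with sqlite3; the database file, table, columns, and values are made up for illustration:
import sqlite3

conn = sqlite3.connect('bottles.db')
conn.execute("""CREATE TABLE IF NOT EXISTS bottles
                (name TEXT PRIMARY KEY, count INTEGER, pct INTEGER, img TEXT)""")
# INSERT OR REPLACE overwrites the existing row for 'bottle_a', if there is one
conn.execute("INSERT OR REPLACE INTO bottles VALUES (?, ?, ?, ?)",
             ("bottle_a", 3, 100, "bottle_a.png"))
conn.commit()
conn.close()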