Removing blank spaces from a CSV file without creating a new file - python

I have blank spaces in a csv sheet that I want to get rid of.
After searching for hours I realized that this is the code for it:
import csv

input = open('file.txt', 'rb')
output = open('new_file.txt', 'wb')
writer = csv.writer(output)
for row in csv.reader(input):
    # keep only rows that have at least one non-blank field
    if any(field.strip() for field in row):
        writer.writerow(row)
input.close()
output.close()
My question is: How do I remove the blank spaces without having to create a new file?

You can first extract the valid rows and then overwrite the file, provided the file is small enough for all of its rows to fit in memory:
with open('file.txt', 'rb') as inp:
    valid_rows = [row for row in csv.reader(inp) if any(field.strip() for field in row)]
with open('file.txt', 'wb') as out:
    csv.writer(out).writerows(valid_rows)
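For Python 3, where the csv module expects files opened in text mode with newline='', the same read-then-overwrite idea might look like the sketch below. The function name remove_blank_rows is made up for illustration, and it still assumes the whole file fits in memory.

```python
import csv


def remove_blank_rows(path):
    """Rewrite the CSV at `path` in place, dropping rows whose fields are all blank.

    Hypothetical helper (not from the original answer); assumes Python 3
    and a file small enough to hold in memory. Returns the number of rows kept.
    """
    # Read every row first; the file is closed before it is reopened for writing.
    with open(path, newline='') as inp:
        valid_rows = [row for row in csv.reader(inp)
                      if any(field.strip() for field in row)]
    # Opening in 'w' mode truncates the file, so only the kept rows remain.
    with open(path, 'w', newline='') as out:
        csv.writer(out).writerows(valid_rows)
    return len(valid_rows)
```

Calling remove_blank_rows('file.txt') would then rewrite file.txt without creating a second file on disk.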

Delete rows from csv file using function in Python

def usunPsa(self, ImiePsa):
    with open('schronisko.csv', 'rb') as input, open('schronisko.csv', 'wb') as output:
        writer = csv.writer(output)
        for row in csv.reader(input):
            if row[0] == ImiePsa:
                writer.writerow(row)
    with open(self.plik, 'r') as f:
        print(f.read())
Dsac;Chart;2;2020-11-04
Dsac;Chart;3;2020-11-04
Dsac;Chart;4;2020-11-04
Lala;Chart;4;2020-11-04
Sda;Chart;4;2020-11-04
Sda;X;4;2020-11-04
Sda;Y;4;2020-11-04
pawel;Y;4;2020-11-04
If I use usunPsa("pawel"), every line gets removed.
This code erases my whole csv file instead of only the line with the given ImiePsa.
What may be the problem there?
I found the problem. row[0] in your code returns the entire row, which means the lines are not parsed correctly. After a bit of reading, I found that csv.reader has a parameter called delimiter to specify the delimiter between columns.
Adding that parameter solves this problem, though not every problem with the code.
The code that worked for me (just in case you still want to use your original code):
import csv

def usunPsa(ImiePsa):
    with open('asd.csv', 'rb') as input, open('schronisko.csv', 'wb') as output:
        # write with the same delimiter that the input uses
        writer = csv.writer(output, delimiter=';')
        for row in csv.reader(input, delimiter=';'):
            # keep every row except the one being deleted
            if row[0] != ImiePsa:
                writer.writerow(row)

usunPsa("pawel")
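Notice that I changed the output filename: opening schronisko.csv in 'wb' mode truncates it immediately, so reading and writing the same file in one with statement leaves nothing to read. If you want to keep the same filename, you have to use Hamza Malik's answer.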
Just read the csv file in memory as a list, then edit that list, and then write it back to the csv file.
import csv

lines = list()
members = input("Please enter a member's name to be deleted.")
with open('mycsv.csv', 'r') as readFile:
    reader = csv.reader(readFile)
    for row in reader:
        lines.append(row)
        for field in row:
            if field == members:
                lines.remove(row)
                break  # the row is already removed; stop checking its other fields
with open('mycsv.csv', 'w') as writeFile:
    writer = csv.writer(writeFile)
    writer.writerows(lines)
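Putting the two answers together for the semicolon-delimited file in the question, a Python 3 sketch might look like this. The function name delete_rows and its parameters are hypothetical; it reads all rows into memory, keeps those whose first column differs from the given name, and only then reopens the same file for writing.

```python
import csv


def delete_rows(path, name, delimiter=';'):
    """Remove every row whose first column equals `name`, rewriting `path` in place.

    Hypothetical helper for illustration; assumes the file fits in memory.
    Returns the number of rows removed.
    """
    with open(path, newline='') as src:
        rows = list(csv.reader(src, delimiter=delimiter))
    # Keep empty rows and rows whose first field is not the name to delete.
    kept = [row for row in rows if not row or row[0] != name]
    # Reopen for writing only after reading has finished, since 'w' truncates the file.
    with open(path, 'w', newline='') as dst:
        csv.writer(dst, delimiter=delimiter).writerows(kept)
    return len(rows) - len(kept)
```

With the sample data above, delete_rows('schronisko.csv', 'pawel') would drop only the last line and leave the other seven in place.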

Writing to a temporary csv file in Python to read from it for sorting and then writing to another file produces empty results

I need to add a couple of lists in Python as columns to an existing CSV file. I want to use a temporary file for the intermediate CSV because I want to sort the resulting data by its first 2 columns and then write it to a new final CSV file. I don't want to keep the unsorted CSV file, which is why I am trying to use tempfile.NamedTemporaryFile for that step. The final CSV file ends up empty, but the code reports no other errors. I changed how the with blocks are indented but was unable to fix it. I tested with a regular file on disk, which works fine. I need help understanding what I am doing wrong. Here is my code:
# Open the existing csv in read mode and new temporary csv in write mode
with open(csvfile.name, 'r') as read_f, \
        tempfile.NamedTemporaryFile(suffix='.csv', prefix=('inter'), mode='w', delete=False) as write_f:
    csv_reader = csv.reader(read_f)
    csv_writer = csv.writer(write_f)
    i = 0
    for row in csv_reader:
        # Append the new list values to that row/list
        row.append(company_list[i])
        row.append(highest_percentage[i])
        # Add the updated row / list to the output file
        csv_writer.writerow(row)
        i += 1
    with open(write_f.name) as data:
        stuff = csv.reader(data)
        sortedlist = sorted(stuff, key=operator.itemgetter(0, 1))
    #now write the sorted result into final CSV file
    with open(fileout, 'w', newline='') as f:
        fileWriter = csv.writer(f)
        for row in sortedlist:
            fileWriter.writerow(row)
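You should insert a write_f.seek(0, 0) just before the line that opens the temporary file. At that point the rows may still be sitting in write_f's buffer, and the seek flushes them to disk:
write_f.seek(0, 0)
with open(write_f.name) as data: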
I found out what was causing the IndexError and consequently the empty final CSV. I resolved it with the help of this question: "CSV file written with Python has blank lines between each row". Here's my changed code that worked as desired:
with open(csvfile.name, 'r') as read_f, \
        tempfile.NamedTemporaryFile(suffix='.csv', prefix=('inter'), newline='', mode='w+', delete=False) as write_f:
    csv_reader = csv.reader(read_f)
    csv_writer = csv.writer(write_f)
    i = 0
    for row in csv_reader:
        # Append the new list values to that row/list
        row.append(company_list[i])
        row.append(highest_percentage[i])
        # Add the updated row / list to the output file
        csv_writer.writerow(row)
        i += 1
with open(write_f.name) as read_stuff, \
        open(fileout, 'w', newline='') as write_stuff:
    read_data = csv.reader(read_stuff)
    write_data = csv.writer(write_stuff)
    sortedlist = sorted(read_data, key=operator.itemgetter(0, 1))
    for row in sortedlist:
        write_data.writerow(row)
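As a self-contained illustration of that flow, here is a sketch that writes the intermediate rows to a NamedTemporaryFile, lets the with block close (and so flush) it, and only then reopens it by name to sort. The function name append_and_sort and its parameters are invented for this example; closing the file before reopening it is one way to make sure the rows are on disk, and the temporary file is deleted once it has been read.

```python
import csv
import operator
import os
import tempfile


def append_and_sort(src_path, dst_path, col_a, col_b):
    """Append two columns to each row of src_path, sort by the first two
    columns, and write the result to dst_path.

    Hypothetical helper; col_a and col_b should hold one value per input row.
    """
    with open(src_path, newline='') as read_f, \
            tempfile.NamedTemporaryFile(mode='w', newline='', suffix='.csv',
                                        delete=False) as write_f:
        writer = csv.writer(write_f)
        for row, a, b in zip(csv.reader(read_f), col_a, col_b):
            writer.writerow(row + [a, b])
    # Leaving the with block closed write_f, so every row is now on disk.
    try:
        with open(write_f.name, newline='') as tmp:
            rows = sorted(csv.reader(tmp), key=operator.itemgetter(0, 1))
    finally:
        os.remove(write_f.name)  # delete=False means we clean up ourselves
    with open(dst_path, 'w', newline='') as out:
        csv.writer(out).writerows(rows)
    return rows
```

Passing newline='' for every file that the csv module touches avoids the blank rows that caused the IndexError during sorting.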

Formatting csv file with python

I have a csv file with the following structure:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"
I need it to look like this:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
I received this .csv file from someone else, so I do not know how the conversion was done. I am trying unsuccessfully with the code below:
input_fd = open("/home/gustavo/Downloads/Redes/Despesas/csvfile.csv", 'r')
output_fd = open('dados_2018_1.csv', 'w')
for line in input_fd.readlines():
    line.replace("\"","")
    output_fd.write(line)
input_fd.close()
output_fd.close()
Is it possible to make this change or will I have to do the conversion from an xml file to a csv, and make this change during the conversion?
First: tell the reader to use delimiter=";" and quoting=csv.QUOTE_NONE. This properly splits your second line, which is a single quoted string containing the delimiters you want it split on. We then strip the quotation marks from each field (otherwise our output would contain quoted strings like '"txNomeParlamentar"', etc).
import csv

with open('file.txt') as f:
    reader = csv.reader(f, delimiter=";", quoting=csv.QUOTE_NONE)
    data = [list(map(lambda s: s.replace('"', ''), row)) for row in reader]
Then: we write the file back out, with delimiter=";" and quoting=csv.QUOTE_ALL to ensure each item is set in quotes:
with open('out.txt', 'w', newline='') as o:
    writer = csv.writer(o, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerows(data)
Input:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"
Output:
"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"
A couple of things. First, you do NOT have a csv file in the strict sense, because in a csv file the delimiter is a comma by definition. I'm assuming (1) you want the values in your data file to remain separated by semicolons [why not fix it and make it commas?] and (2) you want each value to be in quotation marks.
If so, I think this will work:
# data reader
in_file = 'data.txt'
out_file = 'fixed.txt'
output = open(out_file, 'w')
with open(in_file, 'r') as source:
    for line in source:
        # split by semicolon
        data = line.strip().split(';')
        # remove all quotes found
        data = [t.replace('"','') for t in data]
        for item in data[:-1]:
            output.write(''.join(['"', item, '"',';']))
        # write the last item separately, without the trailing ';'
        output.write(''.join(['"', data[-1], '"']))
        output.write('\n')
output.close()
If your target user is python, you should consider replacing the semicolons with commas (correct csv format) and forgoing the quotes. Everything python reads from csv is taken in as string anyhow.
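If you do go that route, a sketch of the conversion might look like the following. The function name to_comma_csv is hypothetical; it reads the semicolon-delimited, over-quoted lines, strips the quotation marks, and writes plain comma-separated rows, letting csv.writer add quotes only where a field needs them.

```python
import csv


def to_comma_csv(src_path, dst_path):
    """Convert a semicolon-delimited file with stray quotes into comma-separated CSV.

    Hypothetical helper; assumes no field legitimately contains '"' or ';'.
    """
    with open(src_path, newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        # QUOTE_NONE makes the reader split on every ';', even inside quotes.
        reader = csv.reader(src, delimiter=';', quoting=csv.QUOTE_NONE)
        # The default writer uses commas and quotes only when necessary.
        writer = csv.writer(dst)
        for row in reader:
            writer.writerow([field.replace('"', '') for field in row])
```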
Using csv module.
Ex:
import csv

with open(filename) as csvfile:
    reader = csv.reader(csvfile, delimiter=";")
    headers = next(reader)    #Read Headers
    # strip the newline first, then the surrounding quotes
    data = [row.strip().strip('"').split(";") for row in csvfile]    #Format data
with open(filename, "w", newline="") as csvfile_out:
    writer = csv.writer(csvfile_out, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerow(headers)    #Write Headers
    writer.writerows(data)      #Write data
You could use the csv module to do it if you massage the input data a little first.
import csv

#input_csv = '/home/gustavo/Downloads/Redes/Despesas/csvfile.csv'
input_csv = 'gustavo_input.csv'
output_csv = 'dados_2018_1.csv'
with open(input_csv, 'r', newline='') as input_fd, \
        open(output_csv, 'w', newline='') as output_fd:
    reader = csv.DictReader(input_fd, delimiter=';')
    writer = csv.DictWriter(output_fd, delimiter=';',
                            fieldnames=reader.fieldnames,
                            quoting=csv.QUOTE_ALL)
    writer.writeheader()  # the DictReader consumed the header row, so write it back
    first_field = reader.fieldnames[0]
    for row in reader:
        fields = row[first_field].split(';')
        newrow = dict(zip(reader.fieldnames, fields))
        writer.writerow(newrow)
print('done')

Python search csv file from input text file

I'm new to Python and I'm struggling with this code. I have 2 files: the 1st is a text file containing email addresses (one per line), and the 2nd is a csv file with 5-6 columns. The script should take the search input from file 1 and search in file 2, and the output should be stored in another csv file (only the first 3 columns); see the example below. I have also copied the script that I was working on. If there is a better or more efficient script, please let me know. Thank you, I appreciate your help.
File1 (output.txt)
rrr#company.com
eee#company.com
ccc#company.com
File2 (final.csv)
Sam,Smith,sss#company.com,admin
Eric,Smith,eee#company.com,finance
Joe,Doe,jjj#company.com,telcom
Chase,Li,ccc#company.com,IT
output (out_name_email.csv)
Eric,Smith,eee#company.com
Chase,Li,ccc#company.com
Here is the script
import csv
outputfile = 'C:\\Python27\\scripts\\out_name_email.csv'
inputfile = 'C:\\Python27\\scripts\\output.txt'
datafile = 'C:\\Python27\\scripts\\final.csv'
names=[]
with open(inputfile) as f:
    for line in f:
        names.append(line)
with open(datafile, 'rb') as fd, open(outputfile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fd, delimiter=",")
    headers = next(reader)
    for row in fd:
        for name in names:
            if name in line:
                writer.writerow(row)
Load the emails into a set for O(1) lookup:
with open(inputfile) as fin:
    emails = set(line.strip() for line in fin)
Then loop over the rows once and check whether each row's email is in emails - no need to loop over each possible match for each row:
# ...
for row in reader:
    if row[2] in emails:  # the email address is the third column
        writer.writerow(row)
If you're not doing anything else, then you can make it:
writer.writerows(row for row in reader if row[2] in emails)
A couple of notes: in your original code you're not using the csv.reader object reader - you're looping over fd, and you appear to have some naming issues with names, line and row...
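Pulling those pieces together, a complete Python 3 sketch might look like this. The function name filter_by_email and the email_col parameter are hypothetical; the sketch assumes the email address sits in the third column, as in the sample data, and that the data file has no header row.

```python
import csv


def filter_by_email(email_path, data_path, out_path, email_col=2):
    """Copy rows of data_path whose email appears in email_path to out_path,
    keeping only the first three columns.

    Hypothetical helper for illustration. Returns the number of rows written.
    """
    # A set gives constant-time membership tests; strip() drops the newlines.
    with open(email_path) as fin:
        emails = {line.strip() for line in fin if line.strip()}
    matched = 0
    with open(data_path, newline='') as fd, \
            open(out_path, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fd):
            # Skip short or empty rows instead of raising IndexError.
            if len(row) > email_col and row[email_col] in emails:
                writer.writerow(row[:3])
                matched += 1
    return matched
```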

How to skip the headers when processing a csv file using Python?

I am using the code below to edit a csv using Python. The functions it calls are defined earlier in the script.
Problem: I want the code below to start editing the csv from the 2nd row and leave the 1st row, which contains the headers, untouched. Right now it applies the functions to the 1st row too, and my header row is getting changed.
in_file = open("tmob_notcleaned.csv", "rb")
reader = csv.reader(in_file)
out_file = open("tmob_cleaned.csv", "wb")
writer = csv.writer(out_file)
row = 1
for row in reader:
    row[13] = handle_color(row[10])[1].replace(" - ","").strip()
    row[10] = handle_color(row[10])[0].replace("-","").replace("(","").replace(")","").strip()
    row[14] = handle_gb(row[10])[1].replace("-","").replace(" ","").replace("GB","").strip()
    row[10] = handle_gb(row[10])[0].strip()
    row[9] = handle_oem(row[10])[1].replace("Blackberry","RIM").replace("TMobile","T-Mobile").strip()
    row[15] = handle_addon(row[10])[1].strip()
    row[10] = handle_addon(row[10])[0].replace(" by","").replace("FREE","").strip()
    writer.writerow(row)
in_file.close()
out_file.close()
I tried to solve this problem by initializing row variable to 1 but it didn't work.
Please help me in solving this issue.
Your reader variable is an iterable; by looping over it you retrieve the rows.
To make it skip one item before your loop, simply call next(reader, None) and ignore the return value.
You can also simplify your code a little; use the opened files as context managers to have them closed automatically:
with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
    reader = csv.reader(infile)
    next(reader, None)  # skip the headers
    writer = csv.writer(outfile)
    for row in reader:
        # process each row
        writer.writerow(row)
# no need to close, the files are closed automatically when you get to this point.
If you wanted to write the header to the output file unprocessed, that's easy too; pass the output of next() to writer.writerow():
headers = next(reader, None)  # returns the headers or `None` if the input is empty
if headers:
    writer.writerow(headers)
Another way of solving this is to use the DictReader class, which "skips" the header row and uses it to allow named indexing.
Given "foo.csv" as follows:
FirstColumn,SecondColumn
asdf,1234
qwer,5678
Use DictReader like this:
import csv

with open('foo.csv') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print(row['FirstColumn'])  # Access by column header instead of column number
        print(row['SecondColumn'])
Doing row=1 won't change anything, because you'll just overwrite that with the results of the loop.
You want to do next(reader) to skip one row.
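A tiny demonstration of that difference, using an in-memory file so it can be run as is (the sample data and the variable names are made up for illustration):

```python
import csv
import io

# Stand-in for a real file; any open text file behaves the same way.
sample = io.StringIO("FirstColumn,SecondColumn\nasdf,1234\nqwer,5678\n")
reader = csv.reader(sample)

row = 1  # has no effect: the for loop below rebinds `row` on every iteration
header = next(reader, None)  # consumes the first line, so the loop starts at row 2

data_rows = []
for row in reader:
    data_rows.append(row)

print(header)     # ['FirstColumn', 'SecondColumn']
print(data_rows)  # [['asdf', '1234'], ['qwer', '5678']]
```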
Simply iterate one time with next()
with open(filename) as file:
    csvreaded = csv.reader(file)
    header = next(csvreaded)
    for row in csvreaded:
        empty_list.append(row) #your csv list without header
or convert the reader to a list and use [1:] to slice off the header row
with open(filename) as file:
    csvreaded = csv.reader(file)
    # a csv.reader cannot be sliced directly, so build a list first
    for row in list(csvreaded)[1:]:
        empty_list.append(row) #your csv list without header
Inspired by Martijn Pieters' response.
In case you only need to delete the header from the csv file, you can work more efficiently by copying the data with plain file I/O instead of going through the csv module:
with open("tmob_notcleaned.csv", "rb") as infile, open("tmob_cleaned.csv", "wb") as outfile:
    next(infile)  # skip the headers
    outfile.write(infile.read())
