I created a program to build a CSV file containing every number from 0 to 1000000:
import csv

nums = list(range(0, 1000000))
with open('codes.csv', 'w') as f:
    writer = csv.writer(f)
    for val in nums:
        writer.writerow([val])
Then I wrote another program that removes a number, taken as input, from the file:
import csv
import os

while True:
    members = input("Please enter a number to be deleted: ")
    lines = list()
    with open('codes.csv', 'r') as readFile:
        reader = csv.reader(readFile)
        for row in reader:
            if all(field != members for field in row):
                lines.append(row)
            else:
                print('Removed')
    os.remove('codes.csv')
    with open('codes.csv', 'w') as writeFile:
        writer = csv.writer(writeFile)
        writer.writerows(lines)
The above code works fine on every device except my PC. The first program creates the CSV file with an empty row between every number, and in the second program the number of empty rows multiplies and the file size grows accordingly.
What is wrong with my device, then?
Thanks in advance
I think you shouldn't use a CSV file for single-column data. Use a JSON file instead.
Also, the code you've written for checking which values not to remove is unnecessary. Instead, you could write a list of numbers to the file, read it back into a variable, remove the number you want with the list.remove() method, and then write the list back to the file.
Here's how I would have done it:
import json

with open("codes.json", "w") as f:  # Write the numbers to the file
    f.write(json.dumps(list(range(0, 1000000))))

nums = None
with open("codes.json", "r") as f:  # Read the list in the file into nums
    nums = json.load(f)

to_remove = int(input("Number to remove: "))
nums.remove(to_remove)  # Removes the number you want

with open("codes.json", "w") as f:  # Dump the list back to the file
    f.write(json.dumps(nums))
It seems your devices have different Python versions.
There is a difference between the built-in Python 2 open() and Python 3 open(): Python 3 defaults to universal newlines mode, while in Python 2 newline handling depends on the mode argument passed to open().
The csv module docs provide several examples where open() is called with the newline argument explicitly set to the empty string, newline='':
import csv

with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
Try to do the same. Without an explicit newline='', your writerow calls probably add one more newline character on your platform.
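To see where the extra blank rows come from: csv.writer already terminates each row with '\r\n', and opening the file in text mode without newline='' lets Windows translate the '\n' again, producing '\r\r\n'. A minimal sketch using io.StringIO (which accepts the same newline argument as open()) shows the raw characters the writer emits when translation is disabled:

```python
import csv
import io

# io.StringIO(newline='') behaves like open(..., 'w', newline=''):
# no newline translation is applied to what csv.writer writes.
buf = io.StringIO(newline='')
writer = csv.writer(buf)
writer.writerows([[1], [2], [3]])

# csv.writer's default lineterminator is '\r\n', so with newline=''
# there is exactly one row terminator per row -- no blank rows.
print(repr(buf.getvalue()))  # '1\r\n2\r\n3\r\n'
```

Without newline='', each of those '\n' characters would be translated to os.linesep on write, which on Windows doubles up the carriage returns.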
CSV stands for Comma-Separated Values, and your file has records separated by extra blank lines.
To remove the empty lines, add newline="" when opening the file for writing.
Since this format holds tabular data, you cannot simply delete an element or the table structure will break. You need to insert an empty string (or "NaN") in place of the deleted element instead.
I reduced the number of entries and arranged them as a table for clarity.
import csv

def write_csv(file, seq):
    with open(file, 'w', newline='') as f:
        writer = csv.writer(f)
        for val in seq:
            writer.writerow([v for v in val])

nums = ((j*10 + i for i in range(0, 10)) for j in range(0, 10))
write_csv('codes.csv', nums)

nums_new = []
members = input("Please enter a number, from 0 to 100, to be deleted: ")
with open('codes.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        rows_new = []
        for elem in row:
            if elem == members:
                elem = ""
            rows_new.append(elem)
        nums_new.append(rows_new)

write_csv('codesdel.csv', nums_new)
When I run this function on a file with 10 lines, it prints out the lines but returns a length of 0. If I reverse the order, it prints out the length but not the file content. I suppose this is a scope-related issue, but I'm not sure how to fix it.
def read_samples(name):
    with open('../data/samples/' + name + '.csv', encoding='utf-8', newline='') as file:
        data = csv.reader(file)
        for row in data:
            print(row)
        lines = len(list(data))
        print(lines)
You are getting 0 because you have already looped over data, so it is now empty. It is an iterator, which is consumed as you read it.
def read_samples(name):
    with open('../data/samples/' + name + '.csv', encoding='utf-8', newline='') as file:
        data = csv.reader(file)
        x = 0
        for row in data:
            x += 1
            print(row)
        lines = x
        print(lines)
The file reader remembers its place in the file, like a bookmark.
Printing each line and getting the length both move the bookmark all the way to the end of the file.
Add file.seek(0) between the loop and getting the length; the seek has to go on the underlying file object, since the csv reader has no seek of its own. This moves the bookmark back to the start of the file.
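Here's a minimal sketch of that idea, with io.StringIO standing in for the opened file:

```python
import csv
import io

f = io.StringIO("a,b\nc,d\n")  # stands in for the open file
data = csv.reader(f)

for row in data:
    print(row)          # consumes the reader, moving the "bookmark" to EOF

f.seek(0)               # rewind the underlying file object, not the reader
print(len(list(data)))  # 2 -- the reader reads the file again from the start
```

The csv reader just pulls lines from whatever file object it wraps, so after the seek it happily produces the rows a second time.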
Try
def read_samples(name):
    with open('../data/samples/' + name + '.csv', encoding='utf-8', newline='') as file:
        my_list = []
        data = csv.reader(file)
        for row in data:
            print(row)
            my_list.append(row)
        lines = len(my_list)
        print(lines)
The csv.reader() function returns an iterator (iterators are "consumed" as you use them), so you can only iterate over data once. You can convert it to a list before doing any operations on it, e.g. data = list(data).
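A quick illustration of the one-shot behaviour, using io.StringIO in place of a file:

```python
import csv
import io

data = csv.reader(io.StringIO("a,b\nc,d\n"))
rows = list(data)   # the first (and only) pass materialises every row
print(len(rows))    # 2
print(list(data))   # [] -- the reader is already exhausted
```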
I have this CSV log file which is huge (GBs) and has no header row:
1,<timestamp>,BEGIN
1,<timestamp>,fetched from db
1,<timestamp>,some processing
2,<timestamp>,BEGIN
2,<timestamp>,fetched from db
1,<timestamp>,returned success
3,<timestamp>,BEGIN
4,<timestamp>,BEGIN
1,<timestamp>,END
3,<timestamp>,some work
2,<timestamp>,some processing
4,<timestamp>,waiting for
2,<timestamp>,ERROR
3,<timestamp>,attempting other work
4,<timestamp>,ERROR
3,<timestamp>,attempting other work
Each line is a trace log, and the first field is the RequestID.
I need to scan the file and write the logs only for requests which resulted in 'ERROR' to another file.
import csv

def readFile(filename):
    with open(filename, 'r') as fn:
        reader = csv.reader(fn)
        for line in reversed(list(reader)):
            yield line

def wrt2File():
    rows = readFile('log.csv')
    with open('error.csv', 'w') as fn:
        writer = csv.writer(fn)
        errReqIds = []
        for row in rows:
            if 'ERROR' in row:
                errReqIds.append(row[0])
            if row[0] in errReqIds:
                writer.writerow(row)

wrt2File()
How can I improve my code so that the readFile operation doesn't hold the whole file in memory, and make the code reusable? I don't want to use pandas if a better alternative is available.
This doesn't really need the csv module at all, since each line only has to be split on its first comma. Might I suggest something along the following lines:
def extract(filename):
    previous = dict()
    current = set()
    with open(filename) as inputfile:
        for line in inputfile:
            id, rest = line.split(',', 1)
            if 'ERROR' in line:
                if id in previous:
                    # Flush the lines buffered for this id before its first error
                    for kept in previous[id]:
                        yield kept
                    del previous[id]
                yield line
                current.add(id)
            elif id in current:
                yield line
            else:
                # Buffer lines for ids which have not (yet) hit an error
                previous.setdefault(id, []).append(line)
            # Maybe do something here to remove really old entries from previous

def main():
    import sys
    for filename in sys.argv[1:]:
        for line in extract(filename):
            print(line, end='')

if __name__ == '__main__':
    main()
This simply prints to standard output. You could refactor it to accept an output file name as an option and use write on that filehandle if you like.
Since your file is huge, you need a solution which avoids loading the entire file into memory. The following can do that job:
def find_errors(filename):
    with open(filename) as f:
        return {l[0:3] for l in f if 'ERROR' in l}

def wrt2File():
    error_ids = find_errors('log.csv')
    with open('error.csv', 'w') as fw, open('log.csv') as fr:
        [fw.write(l) for l in fr if l[0:3] in error_ids]
Note that I assumed the id is the first 3 characters of each line; change the slice if needed.
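If the ids can vary in width, a slightly more robust variant of the same idea is to split each line on its first comma instead of slicing a fixed number of characters. This is just a sketch; find_error_ids is a hypothetical name, and it operates on any iterable of lines (a file object would work the same way):

```python
def find_error_ids(lines):
    # Everything before the first comma is the request id, whatever its width.
    return {l.split(',', 1)[0] for l in lines if 'ERROR' in l}

log = ["1,t1,BEGIN\n", "12,t2,ERROR\n", "1,t3,END\n"]
print(find_error_ids(log))  # {'12'}
```

The same split can then be used in the second pass when filtering lines, so the id comparison no longer depends on a fixed field width.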
Here's something that should be fairly fast, mostly because it reads the entire file into memory to process it. You haven't defined what you mean by "efficient", so I assumed it was speed, and that your computer has enough memory to do this, since that's what the code in your question does.
import csv
from itertools import groupby
from operator import itemgetter

REQUEST_ID = 0  # Column
RESULT = 2  # Column
ERROR_RESULT = 'ERROR'

keyfunc = itemgetter(REQUEST_ID)

def wrt2File(inp_filename, out_filename):
    # Read log data into memory and sort by request id column.
    with open(inp_filename, 'r', newline='') as inp:
        rows = list(csv.reader(inp))
    rows.sort(key=keyfunc)
    with open(out_filename, 'w', newline='') as outp:
        csv_writer = csv.writer(outp)
        for k, g in groupby(rows, key=keyfunc):
            g = list(g)
            # If any of the lines in the group have the error indicator,
            # write them all to the error csv.
            has_error = False
            for row in g:
                if row[RESULT] == ERROR_RESULT:
                    has_error = True
                    break
            if has_error:
                csv_writer.writerows(g)

wrt2File('log.csv', 'error.csv')
Update:
Since I now know you don't want to read it all into memory, here's an alternative. It reads the entire file twice: the first pass just determines which request ids had errors logged against them, and that information is used in the second pass to decide which lines to write to the error csv. Your OS should do a certain amount of file buffering and data caching, so hopefully it's an acceptable trade-off.
It's important to note that rows for request ids with errors won't be grouped together in the output file, since this approach doesn't sort them.
import csv

REQUEST_ID = 0  # Column
RESULT = 2  # Column
ERROR_RESULT = 'ERROR'

def wrt2File(inp_filename, out_filename):
    # First pass:
    # Read entire log file and determine which request ids had errors.
    error_requests = set()  # Used to filter rows in second pass.
    with open(inp_filename, 'r', newline='') as inp:
        for row in csv.reader(inp):
            if row[RESULT] == ERROR_RESULT:
                error_requests.add(row[REQUEST_ID])
    # Second pass:
    # Read log file again and write rows associated with request ids
    # which had errors to the output csv.
    with open(inp_filename, 'r', newline='') as inp:
        with open(out_filename, 'w', newline='') as outp:
            csv_writer = csv.writer(outp)
            for row in csv.reader(inp):
                if row[REQUEST_ID] in error_requests:
                    csv_writer.writerow(row)

wrt2File('log.csv', 'error.csv')
print('done')
This is my code. I am able to print each line, but when a blank line appears it prints ; because of the CSV file format, so I want to skip blank lines:
import csv
import time

ifile = open("C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv", "rb")
empty_lines = 0  # initialise the counter before the loop
for line in csv.reader(ifile):
    if not line:
        empty_lines += 1
        continue
    print line
If you want to skip all whitespace lines, you should test the raw line with line.isspace().
Since you may want to do something more complicated than just printing the non-blank lines to the console (no need to use the csv module for that), here is an example that involves a DictReader:
#!/usr/bin/env python
# Tested with Python 2.7

# I prefer this style of importing - hides the csv module
# in case you do from this_file.py import * inside of __init__.py
import csv as _csv

# Real comments are more complicated ...
def is_comment(line):
    return line.startswith('#')

# Kind of silly wrapper
def is_whitespace(line):
    return line.isspace()

def iter_filtered(in_file, *filters):
    for line in in_file:
        if not any(fltr(line) for fltr in filters):
            yield line

# A disadvantage of this approach is that it requires storing rows in RAM.
# However, the largest CSV files I worked with were all under 100 Mb.
def read_and_filter_csv(csv_path, *filters):
    with open(csv_path, 'rb') as fin:
        iter_clean_lines = iter_filtered(fin, *filters)
        reader = _csv.DictReader(iter_clean_lines, delimiter=';')
        return [row for row in reader]

# Stores all processed lines in RAM
def main_v1(csv_path):
    for row in read_and_filter_csv(csv_path, is_comment, is_whitespace):
        print(row)  # Or do something else with it

# Simpler, less refactored version; does not use with
def main_v2(csv_path):
    try:
        fin = open(csv_path, 'rb')
        reader = _csv.DictReader((line for line in fin
                                  if not line.startswith('#')
                                  and not line.isspace()),
                                 delimiter=';')
        for row in reader:
            print(row)  # Or do something else with it
    finally:
        fin.close()

if __name__ == '__main__':
    csv_path = "C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv"
    main_v1(csv_path)
    print('\n' * 3)
    main_v2(csv_path)
Instead of
if not line:
This should work:
if not ''.join(line).strip():
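The reason the plain if not line: test fails here is that a "blank" line in a semicolon-delimited file parses to a list of empty strings, which is a non-empty (truthy) list. A small demonstration:

```python
import csv
import io

rows = list(csv.reader(io.StringIO("a;b\n;;\n"), delimiter=';'))
blank = rows[1]  # the ';;' line
print(blank)                          # ['', '', ''] -- truthy, so `not blank` is False
print(bool(''.join(blank).strip()))   # False -- joining and stripping exposes the emptiness
```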
My suggestion would be to just use the csv reader, which can split the file into rows. That way you can simply check whether the row is empty and, if so, continue.
import csv

with open('some.csv', 'r') as csvfile:
    # the delimiter depends on how your CSV separates values
    csvReader = csv.reader(csvfile, delimiter='\t')
    for row in csvReader:
        # check if row is empty
        if not row:
            continue
You can always check the number of comma-separated values. This seems much more productive and efficient.
When reading the lines iteratively, each line is a list of comma-separated values, so you get a list object. If there are no elements (a blank line), we can skip it.
with open(filename) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    for row in csv_reader:
        if len(row) == 0:
            continue
You can strip leading and trailing whitespace, and if the length is zero after that the line is empty.
import csv

with open('userlist.csv') as f:
    reader = csv.reader(f)
    user_header = next(reader)  # Add this line if there is a header
    user_list = []  # Create a new user list for input
    for row in reader:
        if any(row):  # Pick up the non-blank rows of the list
            print(row)  # Just for verification
            user_list.append(row)  # Compose the rest of the data into the list
This example just prints the data in array form while skipping the empty lines:
import csv

file = open("data.csv", "r")
data = csv.reader(file)
for line in data:
    if line:
        print line
file.close()
I find it much clearer than the other provided examples.
import csv

ifile = csv.reader(open('C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv', 'rb'), delimiter=';')
for line in ifile:
    if set(line).pop() == '':
        pass
    else:
        for cell_value in line:
            print cell_value
I'm writing a program that reads names and statistics related to those names from a file. Each line of the file is another person and their stats. For each person, I'd like to make their last name a key in a dictionary, with everything else linked to that key. The program first stores data from the file in an array, and then I'm trying to get those array elements into the dictionary, but I'm not sure how to do that. I'm also not sure whether each iteration of the for loop will overwrite the previous contents of the dictionary. Here's the code I'm using to attempt this:
f = open("people.in", "r")
tmp = None
people
l = f.readline()
while l:
    tmp = l.split(',')
    print tmp
    people = {tmp[2] : tmp[0])
    l = f.readline()

people['Smith']
The error I'm currently getting says the syntax is incorrect, but I have no idea how else to transfer the array elements into the dictionary.
Use key assignment:
people = {}
for line in f:
    tmp = line.rstrip('\n').split(',')
    people[tmp[2]] = tmp[0]
This loops over the file object directly, no need for .readline() calls here, and removes the newline.
You appear to have CSV data; you could also use the csv module here:
import csv

people = {}
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        people[row[2]] = row[0]
or even a dict comprehension:
import csv

with open("people.in", "rb") as f:
    reader = csv.reader(f)
    people = {r[2]: r[0] for r in reader}
Here the csv module takes care of the splitting and removing newlines.
The syntax error stems from trying to close the opening { with a ) instead of }:
people = {tmp[2] : tmp[0]) # should be }
If you need to collect multiple entries per row[2] value, collect these in a list; a collections.defaultdict instance makes that easier:
import csv
from collections import defaultdict

people = defaultdict(list)
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        people[row[2]].append(row[0])
In response to Generalkidd's comment above about multiple people with the same last name, here is an addition to Martijn Pieters's solution, posted as an answer for better formatting:
import csv

people = {}
with open("people.in", "rb") as f:
    reader = csv.reader(f)
    for row in reader:
        if row[2] not in people:
            people[row[2]] = list()
        people[row[2]].append(row[0])