Hi, I am trying to read a CSV file using the following code. I want to read from the nth line to the mth line of the given CSV file. For example, I want to read from the 10th line to the 100th line, and after that from the 500th line to the 1000th line. I pass those parameters using the start and end variables.
The problem is that it always starts from the beginning regardless of the start and end variables. I have tried and tried to find a solution but failed. Can anyone help me figure out the issue here? Thanks a lot! (There are some duplicate questions, but no one seems to have given a solution.)
import csv
import os

with open('file.csv', 'r') as csvfile:
    start = 10
    end = 100
    csvfile.seek(start)
    r = csv.reader(csvfile)
    r.next()
    for i in range(start, end):
        try:
            url = r.next()[2]
            print url
        except IndexError, e:
            print str(e),
        except ValueError, b:
            print b
csvfile.close()
Use the csv module.
import csv

n = 3
m = 5
read = 0
with open("so.csv") as csvfile:
    reader = csv.reader(csvfile)
    for record in reader:
        read += 1
        if read >= n and read <= m:
            print(record)
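If you'd rather not keep a manual counter, a minimal sketch using itertools.islice does the same slicing (assuming 1-based, inclusive n and m):

import csv
from itertools import islice

n = 3
m = 5
with open("so.csv") as csvfile:
    reader = csv.reader(csvfile)
    # islice skips the first n-1 records and stops after the mth
    for record in islice(reader, n - 1, m):
        print(record)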
You can iterate through the CSV file's lines and pick out the ones you want like this:
import csv

def read_lines(file_name, start, end):
    with open(file_name, 'rb') as csvfile:
        csvreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
        for row in csvreader:
            if csvreader.line_num >= start and csvreader.line_num <= end:
                print ', '.join(row)
            else:
                continue

read_lines('test.csv', 10, 12)
Update:
From the documentation: csvreader.line_num is the number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines.
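To make that distinction concrete, here is a small sketch (made-up data) where a quoted field contains a newline, so one record spans two source lines:

import csv
from io import StringIO  # Python 3

data = 'a,b\n"x\ny",z\nc,d\n'  # the second record spans two physical lines
reader = csv.reader(StringIO(data))
for record in reader:
    print(reader.line_num, record)
# Prints 1, 3, 4 for the three records: line_num counts source lines,
# not records returned.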
Related
I created a program to write a CSV file where every number from 0 to 1000000 is on its own row:
import csv

nums = list(range(0, 1000000))
with open('codes.csv', 'w') as f:
    writer = csv.writer(f)
    for val in nums:
        writer.writerow([val])
Then I wrote another program to remove a number, taken as input, from the file:
import csv
import os

while True:
    members = input("Please enter a number to be deleted: ")
    lines = list()
    with open('codes.csv', 'r') as readFile:
        reader = csv.reader(readFile)
        for row in reader:
            if all(field != members for field in row):
                lines.append(row)
            else:
                print('Removed')
    os.remove('codes.csv')
    with open('codes.csv', 'w') as writeFile:
        writer = csv.writer(writeFile)
        writer.writerows(lines)
The above code works fine on every other device, but not on my PC: the first program creates the CSV file with empty rows between the numbers, and in the second program the number of empty rows multiplies and the file size multiplies as well.
What is wrong with my device then?
Thanks in advance
I think you shouldn't use a CSV file for single-column data. Use a JSON file instead.
Also, the code you've written to check which values to keep is unnecessary. Instead, you could write a list of numbers to the file, read it back into a variable, remove the number you want using the list.remove() method,
and then write it back to the file.
Here's how I would've done it:
import json

with open("codes.json", "w") as f:  # Write the numbers to the file
    f.write(json.dumps(list(range(0, 1000000))))

nums = None
with open("codes.json", "r") as f:  # Read the list in the file into nums
    nums = json.load(f)

to_remove = int(input("Number to remove: "))
nums.remove(to_remove)  # Removes the number you want

with open("codes.json", "w") as f:  # Dump the list back to the file
    f.write(json.dumps(nums))
It seems like you have different Python versions.
There is a difference between the built-in Python 2 open() and Python 3 open(): Python 3 defaults to universal newlines mode, while in Python 2 newline handling depends on the mode argument of the open() function.
The csv module docs provide a few examples where open() is called with the newline argument explicitly set to the empty string, newline='':
import csv

with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
Try to do the same. Without an explicit newline='', each of your writerow calls probably adds one more newline character.
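Applied to the first program from the question, the fix would look like this (a sketch; the only change is the newline='' argument):

import csv

# newline='' suppresses the extra \r that produces blank rows on Windows
with open('codes.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for val in range(0, 1000000):
        writer.writerow([val])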
CSV stands for Comma-Separated Values, yet here each record holds only a single value.
To remove the empty lines, add newline="" when opening the file for writing.
Since this format is tabular data, you cannot simply delete an element or the table will fall out of alignment. You have to insert an empty string or "NaN" in place of the deleted element.
I reduced the number of entries and arranged them as a table for clarity.
import csv

def write_csv(file, seq):
    with open(file, 'w', newline='') as f:
        writer = csv.writer(f)
        for val in seq:
            writer.writerow([v for v in val])

nums = ((j*10 + i for i in range(0, 10)) for j in range(0, 10))
write_csv('codes.csv', nums)

nums_new = []
members = input("Please enter a number, from 0 to 100, to be deleted: ")
with open('codes.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        rows_new = []
        for elem in row:
            if elem == members:
                elem = ""
            rows_new.append(elem)
        nums_new.append(rows_new)

write_csv('codesdel.csv', nums_new)
This is my code. I am able to print each line, but when a blank line appears it prints ; because of the CSV file format, so I want to skip blank lines:
import csv
import time

ifile = open("C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv", "rb")
empty_lines = 0  # counter for skipped blank lines
for line in csv.reader(ifile):
    if not line:
        empty_lines += 1
        continue
    print line
If you want to skip all whitespace lines, you should use the str.isspace() test.
Since you may want to do something more complicated than just printing the non-blank lines to the console (no need to use the csv module for that), here is an example that involves a DictReader:
#!/usr/bin/env python
# Tested with Python 2.7

# I prefer this style of importing - hides the csv module
# in case you do from this_file.py import * inside of __init__.py
import csv as _csv


# Real comments are more complicated ...
def is_comment(line):
    return line.startswith('#')


# Kind of silly wrapper
def is_whitespace(line):
    return line.isspace()


def iter_filtered(in_file, *filters):
    for line in in_file:
        if not any(fltr(line) for fltr in filters):
            yield line


# A disadvantage of this approach is that it requires storing rows in RAM.
# However, the largest CSV files I worked with were all under 100 MB.
def read_and_filter_csv(csv_path, *filters):
    with open(csv_path, 'rb') as fin:
        iter_clean_lines = iter_filtered(fin, *filters)
        reader = _csv.DictReader(iter_clean_lines, delimiter=';')
        return [row for row in reader]


# Stores all processed lines in RAM
def main_v1(csv_path):
    for row in read_and_filter_csv(csv_path, is_comment, is_whitespace):
        print(row)  # Or do something else with it


# Simpler, less refactored version, does not use with
def main_v2(csv_path):
    try:
        fin = open(csv_path, 'rb')
        reader = _csv.DictReader((line for line in fin
                                  if not line.startswith('#')
                                  and not line.isspace()),
                                 delimiter=';')
        for row in reader:
            print(row)  # Or do something else with it
    finally:
        fin.close()


if __name__ == '__main__':
    csv_path = "C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv"
    main_v1(csv_path)
    print('\n' * 3)
    main_v2(csv_path)
Instead of
if not line:
This should work:
if not ''.join(line).strip():
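In context, a minimal sketch (the file name is just a placeholder):

import csv

with open('data.csv') as f:
    for line in csv.reader(f):
        # skip rows whose fields are all empty or whitespace
        if not ''.join(line).strip():
            continue
        print(line)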
My suggestion would be to just use the csv reader, which can split the file into rows. That way you can check whether a row is empty, and if so, continue:
import csv

with open('some.csv', 'r') as csvfile:
    # the delimiter depends on how your CSV separates values
    csvReader = csv.reader(csvfile, delimiter='\t')
    for row in csvReader:
        # check if row is empty
        if not row:
            continue
You can always check the number of comma-separated values. This seems much more productive and efficient.
When reading the lines iteratively, since each line is a list of comma-separated values, you get a list object; so if it has no elements (a blank line), we can skip it.
import csv

with open(filename) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=",")
    for row in csv_reader:
        if len(row) == 0:
            continue
You can strip leading and trailing whitespace, and if the length is zero after that the line is empty.
import csv

with open('userlist.csv') as f:
    reader = csv.reader(f)
    user_header = next(reader)  # Add this line if there is a header
    user_list = []  # Create a new user list for input
    for row in reader:
        if any(row):  # Pick up the non-blank rows of the list
            print(row)  # Just for verification
            user_list.append(row)  # Compose all the rest of the data into the list
This example just prints the data in array form while skipping the empty lines:
import csv

file = open("data.csv", "r")
data = csv.reader(file)
for line in data:
    if line:
        print line
file.close()
I find it much clearer than the other provided examples.
import csv

ifile = csv.reader(open('C:\Users\BKA4ABT\Desktop\Test_Specification\RDBI.csv', 'rb'), delimiter=';')
for line in ifile:
    if set(line).pop() == '':
        pass
    else:
        for cell_value in line:
            print cell_value
I have a simple text file which contains numbers in ASCII text separated by spaces, as in this example:
150604849
319865.301865 5810822.964432 -96.425797 -1610
319734.172256 5810916.074753 -52.490280 -122
319730.912949 5810918.098465 -61.864395 -171
319688.240891 5810889.851608 -0.339890 -1790
*<continues like this for millions of lines>*
Basically I want to copy the first line as-is; then, for all following lines, I want to offset the first value (x), offset the second value (y), leave the third value unchanged, and offset and halve the last number.
I've cobbled together the following code as a Python learning experience (apologies if it is crude, truly I mean no offence) and it works OK. However, the input file I'm using it on is several GB in size, and I'm wondering if there are ways to speed up the execution. Currently a 740 MB file takes 2 minutes 21 seconds.
import glob

# offset values
offsetx = -306000
offsety = -5806000

files = glob.glob('*.pts')
for file in files:
    currentFile = open(file, "r")
    out = open(file[:-4] + "_RGB_moved.pts", "w")
    firstline = str(currentFile.readline())
    out.write(str(firstline.split()[0]))
    while 1:
        lines = currentFile.readlines(100000)
        if not lines:
            break
        for line in lines:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0]) + offsetx),
                        str(float(words[1]) + offsety),
                        str(float(words[2])),
                        str((int(words[3]) + 2050) / 2)]
            out.write(" ".join(newwords))
Many thanks
Don't use .readlines(). Use the file directly as an iterator:
for file in files:
    with open(file, "r") as currentfile, open(file[:-4] + "_RGB_moved.pts", "w") as out:
        firstline = next(currentfile)
        out.write(firstline.split(None, 1)[0])
        for line in currentfile:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0]) + offsetx), str(float(words[1]) + offsety),
                        words[2], str((int(words[3]) + 2050) / 2)]
            out.write(" ".join(newwords))
I also added a few Python best-practices, and you don't need to turn words[2] into a float, then back to a string again.
You could also look into using the csv module, it can handle splitting and rejoining lines in C code:
import csv

for file in files:
    with open(file, "rb") as currentfile, open(file[:-4] + "_RGB_moved.pts", "wb") as out:
        reader = csv.reader(currentfile, delimiter=' ', quoting=csv.QUOTE_NONE)
        writer = csv.writer(out, delimiter=' ', quoting=csv.QUOTE_NONE)
        writer.writerow([next(reader)[0]])
        for row in reader:
            newrow = [str(float(row[0]) + offsetx), str(float(row[1]) + offsety),
                      row[2], str((int(row[3]) + 2050) / 2)]
            writer.writerow(newrow)
Use the CSV package. It may be more optimized than your script and will simplify your code.
I've got a text file that is tab-delimited, and I'm trying to figure out how to search for a value in a specific column of this file.
I think I need to use the csv module but have been unsuccessful so far. Can someone point me in the right direction?
Thanks!
**Update**
Thanks for everyone's updates. I know I could probably use awk for this, but simply for practice I am trying to finish it in Python.
I am now getting the following error:
if row.split(' ')[int(searchcolumn)] == searchquery:
IndexError: list index out of range
And here is the snippet of my code:
# open the directory and find all the files
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f = open(file, 'r')
        lines = f.readlines()
        for line in lines:
            # the first 4 lines of the file are crap, skip them
            if linescounter > startfromline:
                with open(file) as infile:
                    for row in infile:
                        if row.split(' ')[int(searchcolumn)] == searchquery:
                            rfile = open(resultsfile, 'a')
                            rfile.writelines(line)
                            rfile.write("\r\n")
                            print "Writing line -> " + line
                            resultscounter += 1
            linescounter += 1
        f.close()
I am taking both searchcolumn and searchquery as raw_input from the user. I'm guessing the reason I am now getting the list index out of range error is that the file isn't being parsed correctly?
Thanks again.
You can also use the sniffer (example taken from http://docs.python.org/library/csv.html)
csvfile = open("example.csv", "rb")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
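Combined with the question's column search, a sketch might look like this (searchcolumn and searchquery stand in for the user's raw_input values):

import csv

searchcolumn = 3          # placeholder for int(raw_input(...))
searchquery = 'myvalue'   # placeholder for raw_input(...)

with open("example.csv", "rb") as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    for row in csv.reader(csvfile, dialect):
        # guard against short rows before indexing the column
        if len(row) > searchcolumn and row[searchcolumn] == searchquery:
            print(row)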
Yes, you'll want to use the csv module, and you'll want to set delimiter to '\t':
spamReader = csv.reader(open('spam.csv', 'rb'), delimiter='\t')
After that you should be able to iterate:
for row in spamReader:
    print row[n]
This prints all rows in filename with 'myvalue' in the fourth tab-delimited column:
with open(filename) as infile:
    for row in infile:
        if row.split('\t')[3] == 'myvalue':
            print row
Replace 3, 'myvalue', and print as appropriate.