Splitting an unevenly spaced column in Python

I tried to use the program below:
import os
HOME = os.getcwd()
STORE_INFO_FILE = os.path.join(HOME, 'storeInfo')
def searchStr(STORE_INFO_FILE, storeId):
    with open(STORE_INFO_FILE, 'r') as storeInfoFile:
        for storeLine in storeInfoFile:
            ## print storeLine.split(r'\s+')[0]
            if storeLine.split()[0] == storeId:
                print storeLine
searchStr(STORE_INFO_FILE, 'Star001')
An example line in the file:
Star001 Sunnyvale 9.00 USD Los_angeles/America sunnvaleStarb#startb.com
But it gives the error below:
./searchStore.py
Traceback (most recent call last):
  File "./searchStore.py", line 21, in <module>
    searchStr(STORE_INFO_FILE, 'Star001')
  File "./searchStore.py", line 17, in searchStr
    if storeLine.split()[0] == storeId:
IndexError: list index out of range
I tried the split function on the command line, and there I was able to print the result.

It looks like you have an empty or blank line in your file:
>>> 'abc def hij\n'.split()
['abc', 'def', 'hij']
>>> ' \n'.split() # a blank line containing white space
[]
>>> '\n'.split() # an empty line
[]
The last 2 cases show that an empty list can be returned by split(). Trying to index that list raises an exception:
>>> '\n'.split()[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
You can fix the problem by checking for empty and blank lines. Try this code:
def searchStr(store_info_file, store_id):
    with open(store_info_file) as f:
        for line in f:
            if line.strip() and (line.split()[0] == store_id):
                print line
Adding the line.strip() check makes the loop ignore empty lines and lines containing only whitespace.
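As a rough Python 3 sketch of the same check, assuming the file layout from the question (whitespace-separated fields, with the store ID first). The name find_store_lines and the choice to return the matches instead of printing them are illustrative and not from the original post:

```python
def find_store_lines(path, store_id):
    """Return the lines whose first whitespace-separated field equals store_id."""
    matches = []
    with open(path) as f:
        for line in f:
            fields = line.split()  # [] for empty or whitespace-only lines
            # Test that the list is non-empty before indexing it.
            if fields and fields[0] == store_id:
                matches.append(line.rstrip("\n"))
    return matches
```

Splitting once and testing the resulting list avoids calling both strip() and split() on every line.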

The code fails when the split method returns an empty list.
Store the result of split() and check that it is not empty before indexing it.
For example:
storeLineWords = storeLine.split()
if len(storeLineWords) > 0 and storeLineWords[0] == storeId:
    print storeLine

Related

Python: I'm getting an error saying that the list index is out of range, when it isn't [duplicate]

I'm writing a simple script that is trying to extract the first element from the second column of a .txt input file.
import sys
if (len(sys.argv) > 1):
    f = open(sys.argv[1], "r");
    print "file opened";
    line = [];
    for line in f:
        line = line.strip("\n ' '")
        line = line.split(",")
        print line[1]
    f.close();
My input file looks like this:
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.567,-0.042,-0.893,0.333''
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.590,-0.036,-0.905,0.273''
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.611,-0.046,-0.948,0.204''
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.631,-0.074,-0.978,0.170''
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.654,-0.100,-1.006,0.171''
I want my delimiter to be a comma. When I print the length of the line out, I'm getting 5 elements (as expected). However, whenever I try to index the list to extract the data (i.e., when I call print line[1]), I keep getting the following error:
file opened
Traceback (most recent call last):
  File "stats.py", line 13, in <module>
    print line[1]
IndexError: list index out of range
I don't understand why it's out of range when clearly it isn't.

I would guess you have a blank line somewhere in your file. If the script runs through all the data and only then raises the exception, the blank line is at the end of your file.
To verify whether this is the case, insert
print len(line), line
before your
print line[1]
You can always use this construct to test for blank lines and only process non-blank lines:
for line in f:
    line = line.strip()
    if line:
        # process/print line further, for example:
        print line.split(",")[1]

When you are working with a list and trying to get the value at a particular index, it is always safer to check first that the index is in range:
if len(list_of_elements) > index:
    print list_of_elements[index]
See:
>>> list_of_elements = [1, 2, 3, 4]
>>> len(list_of_elements)
4
>>> list_of_elements[1]
2
>>> list_of_elements[4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>
Now you have to find out why your list did not contain as many elements as you expected.
Solution:
import sys
if len(sys.argv) > 1:
    f = open(sys.argv[1], "r")
    print "file opened"
    for line in f:
        line = line.strip()
        # Ensure that you are not working on an empty line
        if line:
            data = line.split(",")
            # Ensure that the index is not out of range
            if len(data) > 1:
                print data[1]
    f.close()
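The length check above can be wrapped in a small helper. This is only a sketch in Python 3; the name get_field and its default parameter are made up for illustration:

```python
def get_field(line, index, sep=",", default=None):
    """Return the field at `index` after splitting on `sep`, or `default`."""
    fields = line.strip().split(sep)
    # Only index the list when the position actually exists.
    if 0 <= index < len(fields):
        return fields[index]
    return default
```

Calling get_field(line, 1) on a blank line then gives None instead of raising IndexError, so the caller decides how to handle short lines.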

You probably have empty line(s) after your data. I ran your test code on a file without them and it worked as expected.
$ python t.py t.txt
file opened
389182.567
389182.590
389182.611
389182.631
389182.654
If you don't want to remove them, simply check for empty lines:
for line in f:
    if line.strip():  # strip() removes all leading and trailing whitespace, such as '\n' or ' ', by default
        line = line.strip("\n ' '")
        line = line.split(",")
        print line[1]

It can be useful to catch the exception and print the offending lines:
for line in f:
    line = line.strip("\n ' '")
    line = line.split(",")
    try:
        print line[1]
    except IndexError as e:
        print e
        print "line =", line
        raise  # if you don't wish to continue
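A variation on the same idea in Python 3, sketched here with enumerate so that the error also reports which line is at fault. The function name second_fields and the choice of ValueError are assumptions made for the example:

```python
def second_fields(lines, sep=","):
    """Yield the second field of each line; fail with the line number on bad input."""
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.strip().split(sep)
        try:
            yield fields[1]
        except IndexError:
            # Re-raise with enough context to locate the bad line in the file.
            raise ValueError("line %d has no second field: %r" % (lineno, raw))
```

Because it accepts any iterable of strings, it can be fed an open file object directly, for example list(second_fields(f)).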

When isn't a list a list?

The following code produces a list (<class 'list'> in Python), but everything I do to access that list fails:
indexing the list fails,
enumerating the list fails.
For example, if I just print(s), I get
['0.5211', '3.1324']
but if I access the indices, I get
Traceback (most recent call last):
  File "parse-epoch.py", line 11, in <module>
    print("losses.add({}, {})".format(s[0], s[1]))
IndexError: list index out of range
Why can't I access the elements of the list?
import re
with open('epoch.txt', 'r') as f:
    content = f.readlines()
content = [x.strip() for x in content]
for line in content:
    s = re.findall(r"\d+\.\d+", line)
    #print(s)
    print("losses.add({}, {})".format(s[0], s[1]))

You should check again what print(s) outputs for every line, not just the first one. Your issue is likely a line where s does not contain 2 values, for example a blank line, for which re.findall returns an empty list. If those values do not exist, you cannot index them.
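One way to guard against such lines, as a minimal sketch. The helper name extract_pairs is hypothetical, and it assumes that lines with fewer than two decimal numbers should simply be skipped:

```python
import re

NUMBER = re.compile(r"\d+\.\d+")

def extract_pairs(lines):
    """Return (first, second) for each line that holds at least two decimals."""
    pairs = []
    for line in lines:
        found = NUMBER.findall(line)
        if len(found) >= 2:  # skips blank lines and lines with fewer matches
            pairs.append((found[0], found[1]))
    return pairs
```

Each returned pair can then be passed to "losses.add({}, {})".format(*pair) without risking an IndexError.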

Python - Error while trying to split line of text

I am having an issue while trying to split a line of text I get from a .txt file. It is quite a big file, so I will paste only 2 lines with the original text:
1307;Własność: udział 1/1<>GMINA TARNOWIEC<><> 211<>30-200 ZipCode;KS1J/00080000/2;861;Własność: udział 1/1<>GMINA TARNOWIEC<><> 211<>30-200 ZipCode;KS1J/00080990/2;
1306;Własność: udział 1/1<>Jan Nowak<>im. rodz.: Tomasz_ Maria<>Somewhere 2<>30-200 ZipCode;KW22222;861;Własność: udział 1/1<>GMINA TARNOWIEC<><>Tarnowiec 211<>30-200 ZipCode;KS1W/00080000/1;
The data I get from this file will be used to create reports, and _ and <> will be used for further formatting. I want to split each line on ;
The problem is that I get an error with both methods of splitting I tried.
First, the basic .split(';'):
dane = open('dane_protokoly.txt', 'r')
for line in dane:
    a,b,c,d,e,f,g = line.split(';')
    print(a)
    print(b)
    print(c)
    print(d)
    print(e)
    print(f)
    print(g)
I get an error after the first pass through the loop has printed its values:
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\Nowy folder\costam.py", line 36, in <module>
    a,b,c,d,e,f,g = line.split(';')
ValueError: not enough values to unpack (expected 7, got 1)
The same happens when I create lists from this file (a list looks like: ['1307', 'Własność: udział 1/1<>GMINA TARNOWIEC<><> 211<>30-200 ZipCode', 'KS1J/00080000/2', '861', 'Własność: udział 1/1<>GMINA TARNOWIEC<><> 211<>30-200 ZipCode', 'KS1J/00080990/2', '']):
dane = plik('dane_protokoly.txt')
for line in dane:
    a = line[0]
    b = line[1]
    c = line[2]
    d = line[3]
    e = line[4]
    f = line[5]
    g = line[6]
    print(str(a))
    print(str(b))
    print(str(c))
    print(str(d))
    print(str(e))
    print(str(f))
Here too the error appears after the first line has been printed properly:
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\Nowy folder\costam.py", line 22, in <module>
    b = line[1]
IndexError: list index out of range
Any idea why I am getting such errors?

Sometimes line.split(';') does not return the 7 values needed to unpack into (a, b, c, ...), so it is better to iterate over the result like this:
lst = line.split(';')
for item in lst:
    print(item)
There is a blank line between the records, and that is what causes the problem.
Unpacking into a fixed number of names, as in your code, is bad practice for input like this.
You can change your code like this:
for line in open('dane_protokoly.txt').read().split('\n'):
    lst = line.split(';')
    for item in lst:
        print(item)
This version does not fail on the blank lines in between.

As Rahul K P mentioned, the problem is the "empty" lines between your lines of data. You should skip them when splitting your data.
Maybe use this as a starting point:
with open(r"dane_protokoly.txt", "r") as data_file:
    for line in data_file:
        # skip rows which contain only a newline character
        if len(line) > 1:
            data_row = line.strip().split(";")
            print(data_row)
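Building on that starting point, a sketch that also checks the number of fields before a row is used. The name parse_records, the expected_fields parameter, and the decision to silently drop rows with a different field count are all assumptions for illustration:

```python
def parse_records(lines, expected_fields=7):
    """Split non-blank lines on ';' and keep rows with the expected field count."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank separator line between records
        fields = line.split(";")
        # The sample lines end with ';', so the last field is an empty string.
        if len(fields) == expected_fields:
            records.append(fields)
    return records
```

With rows validated this way, unpacking such as a, b, c, d, e, f, g = row no longer raises ValueError.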

Your second strategy didn't work because line[0] is essentially the whole line as it includes no spaces and the default is splitting at spaces.
Therefore there is no line[1] or line[2]... and therefore you get a list index out of range error.
I hope this helps. And I hope it solves your problem.

index out of range raises in random function [duplicate]

This question already has answers here:
f.read coming up empty
(5 answers)
Closed 7 years ago.
I have this function, which simply reads the lines of a text file:
def select_word(model):
    lines = model.read().splitlines()
    selectedline = random.choice(lines)
    return [selectedline.split(":")[0], selectedline.split(":")[1]]
When I call this function just once, there is no problem. But when I call it more than once:
print select_word(a)
print select_word(a)
print select_word(a)
print select_word(a)
print select_word(a)
I get this error:
Traceback (most recent call last):
  File "wordselect.py", line 58, in <module>
    print select_word("noun")
  File "wordselect.py", line 19, in select_word
    selectedline = random.choice(lines)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/random.py", line 275, in choice
    return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
What is the problem with that function?

import random
def select_word(model):
    with open(model, 'r') as f:
        lines = f.read().splitlines()
    selectedline = random.choice(lines)
    return [selectedline.split(":")[0], selectedline.split(":")[1]]
result = select_word('example.txt')
print result
I did this and didn't get a problem.
Just make sure that the file you are opening contains something like:
Line: 1
Line: 2

random.choice raises IndexError if you pass it an empty sequence. This happens when you call .read() on a file object the second time (you can only do it once, subsequent calls will return an empty string).
To fix the function, you could read the file once and then pass the lines to the function, e.g.:
lines = list(model)
def select_word(lines):
    selectedline = random.choice(lines)
    return selectedline.split(":", 1)

File handles operate like generators. Once you have read a file, you have reached the end of the stream, and further reads return nothing. Rewind the file by adding this as the first statement of the function (its 2nd line):
model.seek(0)  # bring the cursor back to the start of the file
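The effect can be reproduced without a real file. In this Python 3 sketch, io.StringIO stands in for the open file from the question, and the sample contents are invented for the demonstration:

```python
import io
import random

def select_word(model):
    """Pick a random 'key:value' line from an open file-like object."""
    model.seek(0)  # rewind so repeated calls see the whole file again
    lines = [ln for ln in model.read().splitlines() if ln.strip()]
    return random.choice(lines).split(":", 1)

demo = io.StringIO("cat:noun\nrun:verb\n")
first = demo.read()   # consumes the whole stream
second = demo.read()  # nothing left to read, so this is an empty string
```

Without the seek(0) call, the second call to select_word would see an empty list of lines and random.choice would raise IndexError, as in the question.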
