Python - Error while trying to split line of text

I am having an issue while trying to split a line of text I get from a .txt file. It is quite a big file, so I will paste only 2 lines of the original text:
1307;Własność: udział 1/1<>GMINA TARNOWIEC<><> 211<>30-200 ZipCode;KS1J/00080000/2;861;Własność: udział 1/1<>GMINA TARNOWIEC<><> 211<>30-200 ZipCode;KS1J/00080990/2;
1306;Własność: udział 1/1<>Jan Nowak<>im. rodz.: Tomasz_ Maria<>Somewhere 2<>30-200 ZipCode;KW22222;861;Własność: udział 1/1<>GMINA TARNOWIEC<><>Tarnowiec 211<>30-200 ZipCode;KS1W/00080000/1;
Data from this file will be used to create reports, and _ and <> will be used for further formatting. I want to split each line on ;
The problem is that I get an error with both methods of splitting that I tried.
First, the basic .split(';'):
dane = open('dane_protokoly.txt', 'r')
for line in dane:
    a,b,c,d,e,f,g = line.split(';')
    print(a)
    print(b)
    print(c)
    print(d)
    print(e)
    print(f)
    print(g)
I get an error after the first iteration of the loop has printed:
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\Nowy folder\costam.py", line 36, in <module>
    a,b,c,d,e,f,g = line.split(';')
ValueError: not enough values to unpack (expected 7, got 1)
The same happens when I create lists from this file (each list looks like: ['1307', 'Własność: udział 1/1<>GMINA TARNOWIEC<><> 211<>30-200 ZipCode', 'KS1J/00080000/2', '861', 'Własność: udział 1/1<>GMINA TARNOWIEC<><> 211<>30-200 ZipCode', 'KS1J/00080990/2', '']):
dane = plik('dane_protokoly.txt')
for line in dane:
    a = line[0]
    b = line[1]
    c = line[2]
    d = line[3]
    e = line[4]
    f = line[5]
    g = line[6]
    print(str(a))
    print(str(b))
    print(str(c))
    print(str(d))
    print(str(e))
    print(str(f))
The error I get, again after the first line has printed properly:
Traceback (most recent call last):
  File "C:\Users\Admin\Desktop\Nowy folder\costam.py", line 22, in <module>
    b = line[1]
IndexError: list index out of range
Any idea why I am getting these errors?

Sometimes line.split(';') does not return 7 values to unpack into (a, b, c, ...), so it is better to iterate over the result like this:
lst = line.split(';')
for item in lst:
    print(item)
There is also a blank line between your data lines, and that is what causes the problem. Unpacking into a fixed number of names, as you did, is fragile.
Change your code like this:
for line in open('dane_protokoly.txt').read().split('\n'):
    lst = line.split(';')
    for item in lst:
        print(item)
This version does not fail on the blank lines in between.

As Rahul K P mentioned, the problems are the "empty" lines in between your lines with the data. You should skip them when trying to split your data.
Maybe use this as a starting point:
with open(r"dane_protokoly.txt", "r") as data_file:
    for line in data_file:
        # skip rows which only contain a newline special char
        if len(line) > 1:
            data_row = line.strip().split(";")
            print(data_row)
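The two fixes above (skipping blank lines and not assuming a fixed number of fields) can be combined. The following is only a minimal Python 3 sketch, not the asker's actual code: the function name parse_rows and the expected_fields parameter are hypothetical, the sample rows are shortened stand-ins for the real data, and it assumes every data line ends with a trailing ; as in the sample, so that splitting yields 7 items.

```python
import io


def parse_rows(lines, expected_fields=7):
    """Yield the list of fields for each non-blank line.

    parse_rows and expected_fields are illustrative names, not part of
    the original question.
    """
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip empty or whitespace-only lines
        fields = line.split(';')
        if len(fields) != expected_fields:
            # Report the offending line instead of failing on unpacking later.
            raise ValueError(
                f"line {number}: expected {expected_fields} fields, got {len(fields)}"
            )
        yield fields


# Shortened stand-in for the file contents, with a blank line in between.
sample = io.StringIO("1307;A;KS1J/1;861;B;KS1J/2;\n\n1306;C;KW2;861;D;KS1W/1;\n")
for row in parse_rows(sample):
    print(row[0], row[2])
```

With a real file, the same function could be fed an open file object, for example from open('dane_protokoly.txt', encoding='utf-8'); the encoding here is an assumption about the file, not something stated in the question.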

Your second strategy probably fails in a similar way. We cannot see the code of plik, but if it splits on whitespace by default, then a line without spaces, or a blank line, produces a list with at most one item.
In that case there is no line[1] or line[2], and you get a list index out of range error.
I hope this helps and solves your problem.

Related

Python: I'm getting an error saying that the list index is out of range, when it isn't [duplicate]

I'm writing a simple script that is trying to extract the first element from the second column of a .txt input file.
import sys
if (len(sys.argv) > 1):
    f = open(sys.argv[1], "r");
    print "file opened";
line = [];
for line in f:
    line = line.strip("\n ' '")
    line = line.split(",")
    print line[1]
f.close();
My input file looks like this:
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.567,-0.042,-0.893,0.333''
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.590,-0.036,-0.905,0.273''
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.611,-0.046,-0.948,0.204''
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.631,-0.074,-0.978,0.170''
Client 192.168.1.13 said ``ACC: d0bb38f18da536aff7b455264eba2f1e35dd976f,389182.654,-0.100,-1.006,0.171''
I want my delimiter to be a comma. When I print the length of the line out, I'm getting 5 elements (as expected). However, whenever I try to index the list to extract the data (i.e., when I call print line[1]), I keep getting the following error:
file opened
Traceback (most recent call last):
File "stats.py", line 13, in <module>
print line[1]
IndexError: list index out of range
I don't understand why it's out of range when clearly it isn't.
I would guess you have a blank line somewhere in your file. If the script runs through the data and only then raises the exception, the blank line is at the end of your file.
Please insert
print len(line), line
before your
print line[1]
as a check to verify whether this is the case.
You can always use this construct to test for blank lines and only process/print non-blank lines:
for line in f:
    line = line.strip()
    if line:
        # process/print line further
When you are working with a list and want the value at a particular index, it is always safe to first check that the index is in range:
if len(list_of_elements) > index:
    print list_of_elements[index]
See:
>>> list_of_elements = [1, 2, 3, 4]
>>> len(list_of_elements)
4
>>> list_of_elements[1]
2
>>> list_of_elements[4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>>
Now you have to find out why your list did not contain as many elements as you expected.
Solution:
import sys
if (len(sys.argv) > 1):
    f = open(sys.argv[1], "r")
    print "file opened"
    for line in f:
        line = line.strip().strip('\n')
        # Ensure that you are not working on empty line
        if line:
            data = line.split(",")
            # Ensure that index is not out of range
            if len(data) > 1: print data[1]
    f.close()
You probably have empty line(s) after your data. I ran your test code without them and it worked as expected.
$ python t.py t.txt
file opened
389182.567
389182.590
389182.611
389182.631
389182.654
If you don't want to remove them, then simply check for empty lines:
for line in f:
    if line.strip():  # strip removes all leading and trailing whitespace such as '\n' or ' ' by default
        line = line.strip("\n ' '")
        line = line.split(",")
        print line[1]
It can be useful to catch the exception and print the offending lines:
for line in f:
    line = line.strip("\n ' '")
    line = line.split(",")
    try:
        print line[1]
    except IndexError as e:
        print e
        print "line =", line
        raise  # if you don't wish to continue
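The answers above use Python 2 print statements. As a rough Python 3 sketch of the same idea, the lines that are too short can be collected together with their line numbers instead of stopping at the first one. The function name second_column and the shortened sample lines are assumptions made for illustration only.

```python
def second_column(lines, delimiter=","):
    """Return (values, skipped): the second field of each usable line, and
    a list of (line_number, raw_line) pairs for lines with fewer than 2 fields.

    second_column is a hypothetical helper, not code from the question.
    """
    values, skipped = [], []
    for number, raw in enumerate(lines, start=1):
        fields = raw.strip().split(delimiter)
        if len(fields) > 1:
            values.append(fields[1])
        else:
            # A blank line splits into [''], which has no index 1.
            skipped.append((number, raw))
    return values, skipped


# Shortened stand-in for the input file, with a blank line in the middle.
lines = ["id,389182.567,-0.042\n", "\n", "id,389182.590,-0.036\n"]
values, skipped = second_column(lines)
print(values)   # ['389182.567', '389182.590']
print(skipped)  # [(2, '\n')]
```

Keeping the line numbers makes it easy to see whether the short lines are stray blanks at the end of the file or malformed records in the middle.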

When isn't a list a list?

The following code returns a list, e.g. <class 'list'> in Python. Everything I do to access that list fails:
indexing the list fails,
enumerating the list fails.
For example, if I just print(s), I get
['0.5211', '3.1324']
but if I access the indices, I get
Traceback (most recent call last):
  File "parse-epoch.py", line 11, in <module>
    print("losses.add({}, {})".format(s[0], s[1]))
IndexError: list index out of range
Why can't I access the elements of the list?
import re
with open('epoch.txt', 'r') as f:
    content = f.readlines()
content = [x.strip() for x in content]
for line in content:
    s = re.findall(r"\d+\.\d+", line)
    #print(s)
    print("losses.add({}, {})".format(s[0], s[1]))
You should check what print(s) outputs again. Your issue is likely with a line where s does not contain a list with 2 values. If those values do not exist, then you cannot use them.
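One way to act on this advice is to check how many matches re.findall returned before indexing into the result. This is only a sketch: the function name format_losses and the sample lines are made up, since the real contents of epoch.txt are not shown in the question.

```python
import re

# Same pattern as in the question, written as a raw string.
NUMBER = re.compile(r"\d+\.\d+")


def format_losses(lines):
    """Build one 'losses.add(a, b)' string per line that has two decimals.

    format_losses is a hypothetical helper used only for illustration.
    """
    out = []
    for line in lines:
        found = NUMBER.findall(line)
        if len(found) < 2:
            continue  # findall returned [] or a single value; nothing to index
        out.append("losses.add({}, {})".format(found[0], found[1]))
    return out


print(format_losses(["loss 0.5211 val 3.1324", "epoch done", "loss 0.4"]))
# ['losses.add(0.5211, 3.1324)']
```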

Splitting and adding words from a file into a list, 'str' object cannot be interpreted as an integer error

I don't know the cause of this error. I am trying to take the words within a file, read the lines, split them, and then add those words to a list and sort them. It is simple enough, but I get an error which states "'str' object cannot be interpreted as an integer", and I would appreciate some assistance.
I haven't tried many other methods, as I was sure this one would work and I don't have a good idea of how to work around it. The file I'm using contains this:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
Here is the code that I am using:
#userin = input("Enter file name: ")
try:
    l = [] # empty list
    relettter = open('romeo.txt', 'r')
    rd = relettter.readlines()
    # loops through each line and reads file
    for line in rd:
        #add line to list
        f = line.split(' ', '/n')
        l.append(f)
    k = set(l.sort())
    print(k)
except Exception as e:
    print(e)
The result should be a sorted list of the words present in the poem.
Your giant try/except block prevents you from seeing the source of the error. Removing that:
$ python romeo.py
Traceback (most recent call last):
  File "romeo.py", line 9, in <module>
    f = line.split(' ', '/n')
TypeError: 'str' object cannot be interpreted as an integer
You are passing '/n' as the second argument to the split() method, but that argument is maxsplit, which must be an integer. Your line
f = line.split(' ', '/n')
does not work because split() accepts only one separator string, e.g.:
f = line.split(' ')
Note also that a newline is written '\n', not '/n'.
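A short Python 3 illustration of the two arguments of str.split may help here. The sample string is taken from the poem in the question; everything else is just a demonstration, not the asker's code.

```python
line = "But soft what light\n"

# The second positional argument is maxsplit: the maximum number of splits.
print(line.split(' ', 1))   # ['But', 'soft what light\n']

# With no arguments, split() separates on any whitespace, including '\n',
# and drops empty strings, so no extra cleanup of the newline is needed.
print(line.split())         # ['But', 'soft', 'what', 'light']

# Passing a string where maxsplit is expected raises a TypeError.
try:
    line.split(' ', '/n')
except TypeError as exc:
    print(exc)
```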
The error is caused by f = line.split(' ', '/n'); do this instead: f = line.split('\n')[0].split(' '). Also, on the next statement I think you'll want extend, not append:
try:
    l = [] # empty list
    relettter = open('romeo.txt', 'r')
    rd = relettter.readlines()
    # loops through each line and reads file
    for line in rd:
        #add line to list
        f = line.split('\n')[0].split(' ') ##<-first error
        l.extend(f) ##<- next problem
    k = sorted(set(l)) # de-duplicate first, then sort; a set has no order
    print(k)
except Exception as e:
    print(e)
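Though, a much better implementation:
l = [] # empty list
with open('romeo.txt') as file:
    for line in file:
        # rstrip('\n') is safe even if the last line has no trailing newline
        f = line.rstrip('\n').split(' ')
        l.extend(f)
k = sorted(set(l)) # de-duplicate first, then sort; a set has no order
print(k)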
You should probably be using with in this case. It essentially manages your otherwise unmanaged resource. Here is a great explanation of it: What is the python keyword "with" used for?.
As for your problem:
with open(fname, "r") as f:
    words = []
    for line in f:
        line = line.replace('\n', ' ')
        # split() with no argument drops the empty strings left by extra spaces
        for word in line.split():
            words.append(word)
This reads the text line by line and splits each line into words. The words are then added to the list.
If you're looking for a shorter version:
with open(fname, "r") as f:
    words = [word for word in [line.replace('\n', '').split(' ') for line in f]]
This gives a list of words per line; you can then flatten it to get all your words.
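The flattening step can be folded into the comprehension itself by using two for clauses. The sketch below is one possible way to do it in Python 3, with a hypothetical function name unique_sorted_words and an in-memory stand-in for romeo.txt.

```python
import io


def unique_sorted_words(lines):
    """Return the distinct words from an iterable of lines, sorted.

    unique_sorted_words is an illustrative name, not from the question.
    """
    # The outer loop walks the lines, the inner loop walks each line's words,
    # so the result is already a flat list rather than a list of lists.
    words = [word for line in lines for word in line.split()]
    return sorted(set(words))


poem = io.StringIO("But soft what light\nIt is the east\n")
print(unique_sorted_words(poem))
# ['But', 'It', 'east', 'is', 'light', 'soft', 'the', 'what']
```

Note that the default sort is case-sensitive, so capitalized words come first; passing key=str.lower to sorted would be one way to change that.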

Splitting uneven spaced column in Python

I tried to use the below program
import os
HOME= os.getcwd()
STORE_INFO_FILE = os.path.join(HOME,'storeInfo')
def searchStr(STORE_INFO_FILE, storeId):
    with open (STORE_INFO_FILE, 'r') as storeInfoFile:
        for storeLine in storeInfoFile:
            ## print storeLine.split(r'\s+')[0]
            if storeLine.split()[0] == storeId:
                print storeLine
searchStr(STORE_INFO_FILE, 'Star001')
An example line in the file:
Star001 Sunnyvale 9.00 USD Los_angeles/America sunnvaleStarb#startb.com
But it gives the error below:
./searchStore.py
Traceback (most recent call last):
  File "./searchStore.py", line 21, in <module>
    searchStr(STORE_INFO_FILE, 'Star001')
  File "./searchStore.py", line 17, in searchStr
    if storeLine.split()[0] == storeId:
IndexError: list index out of range
I have tried printing the result of split() on the command line, and there it worked.
It looks like you have an empty or blank line in your file:
>>> 'abc def hij\n'.split()
['abc', 'def', 'hij']
>>> ' \n'.split() # a blank line containing white space
[]
>>> '\n'.split() # an empty line
[]
The last 2 cases show that an empty list can be returned by split(). Trying to index that list raises an exception:
>>> '\n'.split()[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
You can fix the problem by checking for empty and blank lines. Try this code:
def searchStr(store_info_file, store_id):
    with open(store_info_file) as f:
        for line in f:
            if line.strip() and (line.split()[0] == store_id):
                print line
Adding line.strip() allows you to ignore empty lines and lines containing only whitespace.
The code fails when split() returns an empty list.
You can change the code that calls split() and add a check for that case.
For example:
storeLineWords = storeLine.split()
if len(storeLineWords) > 0 and storeLineWords[0] == storeId:
    print storeLine
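Both suggestions come down to splitting once and checking that the result is non-empty before indexing. A small Python 3 sketch of that check follows; the function name find_store and the sample lines are hypothetical, and the lines are passed in as a list so the example runs without the storeInfo file.

```python
def find_store(lines, store_id):
    """Return the lines whose first whitespace-separated field is store_id.

    find_store is an illustrative helper, not the asker's searchStr.
    """
    matches = []
    for line in lines:
        fields = line.split()  # [] for empty or whitespace-only lines
        if fields and fields[0] == store_id:
            matches.append(line.rstrip("\n"))
    return matches


sample = [
    "Star001   Sunnyvale   9.00 USD\n",
    "\n",
    "   \n",
    "Star002 Fremont 8.50 USD\n",
]
print(find_store(sample, "Star001"))
# ['Star001   Sunnyvale   9.00 USD']
```

Because split() with no argument treats any run of whitespace as one separator, the unevenly spaced columns need no special handling.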

