Here is info from the .txt file I am trying to access:
Movies: Drama
Possession, 2002
The Big Chill, 1983
Crimson Tide, 1995
Here is my code:
fp = open("Movies.txt", "r")
lines = fp.readlines()
for line in lines:
    values = line.split(", ")
    year = int(values[1])
    if year < 1990:
        print(values[0])
I get an error message "IndexError: list index out of range". Please explain why or how I can fix this. Thank you!
Assuming your .txt file includes the "Movies: Drama" line, as you listed, it's because the first line of the text file has no comma in it. Therefore splitting that first line on a comma only results in 1 element (element 0), NOT 2, and therefore there is no values[1] for the first line.
It's not unusual for data files to have a header line that doesn't contain actual data. Libraries such as pandas typically handle this for you, but open() and readlines() make no distinction between header and data lines.
The easiest thing to do is just slice your list variable (lines) so you don't include the first line in your loop:
fp = open("Movies.txt", "r")
lines = fp.readlines()
for line in lines[1:]:
    values = line.split(", ")
    year = int(values[1])
    if year < 1990:
        print(values[0])
Note the "lines[1:]" modification. This way you only loop starting from the second line (the first line is lines[0]) and go to the end.
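A variation on the same idea, as a minimal sketch: a file object is an iterator, so next() can consume the header before the loop starts. The function name old_movies and the io.StringIO stand-in for Movies.txt are illustrative assumptions, not part of the original code.

```python
import io

def old_movies(fp, cutoff=1990):
    """Return titles released before `cutoff`, skipping the first (header) line."""
    next(fp, None)  # consume the header; the None default avoids StopIteration on an empty file
    titles = []
    for line in fp:
        values = line.strip().split(", ")
        if int(values[1]) < cutoff:
            titles.append(values[0])
    return titles

# io.StringIO stands in for open("Movies.txt", "r") so the sketch runs on its own.
sample = io.StringIO(
    "Movies: Drama\nPossession, 2002\nThe Big Chill, 1983\nCrimson Tide, 1995\n"
)
print(old_movies(sample))  # ['The Big Chill']
```

With a real file you would pass the object from a with open("Movies.txt") as fp: block, which also closes the file for you.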
The first line of the text file does not have a ", ", so when you split on it, you get a list of size 1. When you access the 2nd element with values[1], you are indexing past the end of the list, hence the IndexError. You need to check the line before making an assumption about the size of the list. Some options:
Check the length of values and continue if it's too short.
Check that ', ' is in the line before splitting on it.
Use a regex which will ensure the ', ' is there as well as can ensure that the contents after the comma represent a number.
Preemptively strip off the first line in lines if you know that it's the header.
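The regex option from the list above could look roughly like this. It is only a sketch: the pattern assumes every data line is a title, then ", ", then a four-digit year, and the names MOVIE_LINE and parse_movie are made up for illustration.

```python
import re

# Title, then ", ", then a four-digit year; anything else counts as a non-data line.
MOVIE_LINE = re.compile(r"^(?P<title>.+), (?P<year>\d{4})$")

def parse_movie(line):
    """Return (title, year) for a data line, or None for headers and malformed lines."""
    match = MOVIE_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("title"), int(match.group("year"))

lines = ["Movies: Drama\n", "Possession, 2002\n", "The Big Chill, 1983\n", "Crimson Tide, 1995\n"]
for line in lines:
    parsed = parse_movie(line)
    if parsed is not None and parsed[1] < 1990:
        print(parsed[0])  # prints: The Big Chill
```

Because the year is validated by the pattern, int() cannot raise ValueError here, and the header line is skipped without any special casing.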
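The first line of your txt file has no ", " in it, so values[1] does not exist for that line.
One simple change is to wrap the loop body in try/except and skip the lines that fail to parse:
fp = open("Movies.txt", "r")
lines = fp.readlines()
for line in lines:
    try:  # <---- Here
        values = line.split(", ")
        year = int(values[1])
        if year < 1990:
            print(values[0])
    except (IndexError, ValueError):  # <---- And here: skip lines without a valid year
        pass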
Related
I have a CSV file that has errors. The most common one is a too early linebreak.
But now I don't know how best to remove it. If I read the file line by line
with open("test.csv", "r") as reader:
    test = reader.read().splitlines()
the wrong structure is already in my variable. Is this still the right approach? Do I loop over test and create a copy, or can I manipulate the test variable directly while iterating over it?
I can identify the corrupt lines by the semicolon: some rows end with a ; and others start with it. So maybe counting would be an alternative way to solve it?
EDIT:
I replaced reader.read().splitlines() with reader.readlines() so I could handle the rows which end with a ;
for line in lines:
    if "Foobar" in line:
        line = line.replace("Foobar", "")
    if ";\n" in line:
        line = line.replace(";\n", ";")
The only thing that remains are rows that begin with a ;, since for those I need to go back one entry in the list.
Example:
Col_a;Col_b;Col_c;Col_d
2021;Foobar;Bla
;Blub
Blub belongs in the row above.
Here's a simple Python script to merge lines until you have the desired number of fields.
import sys
sep = ';'
fields = 4
collected = []
for line in sys.stdin:
    new = line.rstrip('\n').split(sep)
    if collected:
        # Continuation line: glue its first piece onto the last field so far
        collected[-1] += new[0]
        collected.extend(new[1:])
    else:
        collected = new
    if len(collected) < fields:
        continue
    print(sep.join(collected))
    collected = []
This simply reads from standard input and prints to standard output. If the last line is incomplete, it will be lost.
The separator and the number of fields can be edited in the variables at the top; exposing these as command-line parameters is left as an exercise.
If you wanted to keep the newlines, it would not be too hard to strip the newline from the last field only, and use csv.writer to write the fields back out as properly quoted CSV.
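The newline-preserving variant could be sketched like this. It is not the original script: the function name merge_rows is made up, and the input is a hard-coded list of lines (taken from the example in the question) instead of standard input so that the sketch runs on its own.

```python
import csv
import io

def merge_rows(lines, sep=";", fields=4):
    """Merge physical lines into rows of `fields` fields, keeping embedded newlines."""
    rows, collected = [], []
    for line in lines:
        new = line.split(sep)  # the trailing newline stays on the last piece
        if collected:
            collected[-1] += new[0]
            collected.extend(new[1:])
        else:
            collected = new
        if len(collected) < fields:
            continue
        collected[-1] = collected[-1].rstrip("\n")  # strip only the final field
        rows.append(collected)
        collected = []
    return rows

broken = ["Col_a;Col_b;Col_c;Col_d\n", "2021;Foobar;Bla\n", ";Blub\n"]
rows = merge_rows(broken)

# csv.writer quotes the field that still contains a newline.
out = io.StringIO()
csv.writer(out, delimiter=";", lineterminator="\n").writerows(rows)
print(out.getvalue())
```

Here the third field of the data row keeps its line break ("Bla\n") and is written inside quotes, so a CSV reader will treat the two physical lines as one record.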
This is how I deal with this. This function fixes the line if there are more columns than needed or if there is a line break in the middle.
Parameters of the function are:
message - content of the file - reader.read() in your case
columns - number of expected columns
filename - filename (I use it for logging)
def pre_parse(message, columns, filename):
    parsed_message = []
    i = 0
    temp_line = ''  # holds an incomplete line until the rest of it arrives
    for line in message.splitlines():
        split = line.split(',')
        if len(split) == columns:
            parsed_message.append(line)
        elif len(split) > columns:
            print(f'Line {i} has been truncated in file {filename} - too many columns')
            split = split[:columns]
            line = ','.join(split)
            parsed_message.append(line)
        elif len(split) < columns and temp_line == '':
            temp_line = line.replace('\n', '')
            print(temp_line)
        elif temp_line != '':
            line = temp_line + line
            if line.count(',') == columns - 1:
                print(f'Line {i} has been fixed in file {filename} - extra line feed')
                parsed_message.append(line)
                temp_line = ''
            else:
                temp_line = line.replace('\n', '')
        i += 1
    return parsed_message
Make sure you use the proper split character and the proper line feed character.
I am trying to find the characters in one specific line of my code. Say the line is 4.
My file consists of:
1. randomname
2. randomname
3.
4. 34
5. 12202018
My code consists of:
with open('/Users/eviemcmahan/PycharmProjects/eCYBERMISSION/eviemcmahan', 'r') as my_file:
    data = my_file.readline(4)
    characters = 0
    for data in my_file:
        words = data.split(" ")
        for i in words:
            characters += len(i)
    print(characters)
I am not getting an error, I am just getting the number "34"
I would appreciate any help on how to get a correct amount of characters for line 4.
my_file.readline(4) does not read the 4th line; it reads the next line, but at most the first 4 characters of it. To read a specific line you can, for example, read all the lines into a list, after which it is easy to get the line you want. You could also read line by line and stop when you reach the line you want.
Going with the first approach and using the count method of strings, it is straightforward to count any character on a specific line. For example:
line_number = 3  # Zero-based, so 3 is the fourth line
with open('test.txt', 'r') as my_file:
    lines = my_file.readlines()  # List containing all the lines as elements of the list
print(lines[line_number].count('0'))  # 0
print(lines[line_number].count('4'))  # 2
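The second approach (read line by line and stop at the wanted line) might look like the sketch below. The helper name read_line is made up, and io.StringIO stands in for the real file; the sample assumes the "1.", "2." prefixes are literally in the file, as the counts in the example above suggest.

```python
import io
from itertools import islice

def read_line(fp, line_number):
    """Return line `line_number` (1-based) without its newline, or None if the file is too short."""
    line = next(islice(fp, line_number - 1, line_number), None)
    return None if line is None else line.rstrip("\n")

# Stand-in for the real file so the sketch is self-contained.
sample = io.StringIO("1. randomname\n2. randomname\n3.\n4. 34\n5. 12202018\n")
line = read_line(sample, 4)
print(line)       # 4. 34
print(len(line))  # 5
```

islice stops reading as soon as the requested line has been consumed, so the rest of the file is never loaded. If the numbering is not part of the file, len(line) simply counts the characters of the bare content.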
Please see the attached image showing the format of the text file. I need to extract the dimensions of the data matrix indicated by the first line in the file, here 49 * 70 * 1 for the case shown in the image. Note that the length of the name "gd_fac" can vary. How can I extract these numbers as integers? I am using Python 3.6.
Specification is not very clear. I am assuming that the information you want will always be in the first line, and always be in parentheses. After that:
with open(filename) as infile:
    line = infile.readline()
string = line[line.find('(')+1:line.find(')')]
lst = [int(part) for part in string.split('x')]
This will create the list lst = [49, 70, 1].
What is happening here:
First I open the file (you will need to replace filename with the name of your file, as a string). The with ... as ... structure ensures that the file is closed after use. Then I read the first line. After that, I select only the part of that line that falls after the open paren ( and before the close paren ). Finally, I break the string into parts, with the character x as the separator, and convert each part to an integer. This creates a list of the values in the first line of the file that fall between the parentheses and are separated by x.
Since you have mentioned that the length of 'gd_fac' can vary, a regular expression is a good fit.
import re
with open("a.txt") as fh:
    for line in fh:
        if '(' in line and ')' in line:
            dimension = re.findall(r'.*\((.*)\)', line)[0]
            break
print(dimension)
Output:
49x70x1
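Since the question asks for integers, the regex idea can be taken one step further. This is a sketch under an assumption: the image is not available here, so the header is taken to look like "gd_fac (49x70x1)", that is, any name followed by x-separated numbers in parentheses. The function name parse_dimensions is made up.

```python
import re

def parse_dimensions(header_line):
    """Extract x-separated integers from the first parenthesised group of the line."""
    match = re.search(r"\(\s*(\d+(?:\s*x\s*\d+)*)\s*\)", header_line)
    if match is None:
        raise ValueError(f"no dimensions found in {header_line!r}")
    # int() tolerates surrounding whitespace, so "49 x 70" also works
    return [int(part) for part in match.group(1).split("x")]

# Hypothetical header lines; the name before the parentheses may have any length.
print(parse_dimensions("gd_fac (49x70x1)"))         # [49, 70, 1]
print(parse_dimensions("a_much_longer_name (3x4)"))  # [3, 4]
```

Raising ValueError when nothing matches makes a malformed header fail loudly instead of returning a partial result.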
This looks for "gd_fac" in each line; if it is there, it removes all the unneeded characters and leaves just what you want. Note that it relies on the name being exactly "gd_fac".
with open('test.txt', 'r') as infile:
    for line in infile:
        if "gd_fac" in line:
            line = line.replace("gd_fac", "")
            line = line.replace("x", "*")
            line = line.replace("(", "")
            line = line.replace(")", "")
            print(line.strip())
            break
OUTPUT: "49*70*1"
My program is supposed to take input from the user and read a file with the name input. Read file gets saved into a dictionary called portfolio and from there all I have to do is sort each line in the portfolio into keys and values.
Here's my code.
portfolio = {}
portfolio = file_read() #Reads the file through a function
if file_empty(portfolio) == True or None: #nevermind this, it works
    print "The file was not found."
else:
    print "The file has successfully been loaded"
    for line in portfolio:
        elements = line.strip().split(",") #separate lists by comma
        print elements[0] #using this to check
        print elements[1] #if it works at all
All this does is print the first letter in the first line, which is S. And apparently elements[1] is supposed to be the second letter but index is out of range, please enlighten me what might be wrong.
Thank you.
It looks like file_read() is reading the file into a string.
Then for line in portfolio: is iterating through each character in that string.
Then elements = line.strip().split(",") will give you a list containing one character, so trying to get elements[1] is past the bounds of the list.
If you want to read the whole contents of the file into a string called portfolio, you can iterate through each line in the string using
for line in portfolio.split('\n'):
    ...
But the more usual way of iterating through lines in a file would be
with open(filename, 'r') as inputfile:
    for line in inputfile:
        ...
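The difference between iterating over a string and iterating over its lines can be seen in a small sketch. The file contents below are invented for illustration (the question only reveals that the first character is "S"), and Python 3 print is used.

```python
# Hypothetical file contents, already read into one string.
text = "SEB,10,9.5,bank\nVolvo,20,7.25,cars\n"

# Iterating over a string yields single characters, so split(",") gives one-element lists.
first_char_elements = [ch.strip().split(",") for ch in text][0]
print(first_char_elements)  # ['S']

# Splitting into lines first gives the intended fields.
rows = [line.split(",") for line in text.splitlines() if line]
print(rows[0])  # ['SEB', '10', '9.5', 'bank']
```

In the first loop elements[1] does not exist, which is the IndexError from the question; in the second, each row has all of its fields.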
Got it to work with this code:
for line in minfil:
    line = line.strip()
    elements = line.split(",")
    portfolio[str(elements[0])] = [(int(elements[1]), float(elements[2]), str(elements[3]))]
I recently asked a question about converting a list of values from a txt file into a dictionary. The file looks like this:
P883, Michael Smith, 1991
L672, Jane Collins, 1992(added)(empty line here)
L322, Randy Green, 1992
H732, Justin Wood, 1995(/added)
^key ^name ^year of birth
===============
This question has been answered, and I used the following code (the accepted answer), which works perfectly:
def load(filename):
    students = {}
    infile = open(filename)
    for line in infile:
        line = line.strip()
        parts = [p.strip() for p in line.split(",")]
        students[parts[0]] = (parts[1], parts[2])
    return students
However, when there is an empty line among the values in the txt file (see the added parts), it doesn't work anymore and gives an error saying that the list index is out of range.
Check for empty lines inside your for-loop and skip them:
for line in infile:
    line = line.strip()
    if not line:
        continue
    parts = [p.strip() for p in line.split(",")]
    students[parts[0]] = (parts[1], parts[2])
Check for an empty line either by counting the elements of parts (if there are fewer than three elements in parts, the line was empty or at least invalid) or by checking the trimmed value of line against the empty string. (Sorry, I can't code Python, so no code sample here...)
Remember: You should always check the size of a dynamically created array before indexing it.
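The length check described above can be sketched as follows. The function name load_students is made up, and it takes an iterable of lines instead of a filename so the example runs without a file; the sample rows come from the question.

```python
def load_students(lines):
    """Build {key: (name, year)} from comma-separated lines, skipping short or empty ones."""
    students = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:  # an empty line splits into a single empty string
            continue
        students[parts[0]] = (parts[1], parts[2])
    return students

sample = [
    "P883, Michael Smith, 1991\n",
    "L672, Jane Collins, 1992\n",
    "\n",
    "L322, Randy Green, 1992\n",
    "H732, Justin Wood, 1995\n",
]
students = load_students(sample)
print(students["L322"])  # ('Randy Green', '1992')
```

Checking the length also skips lines that are not empty but are missing a field, which a plain emptiness test would let through.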
lines = [line.strip().split(', ') for line in file if line.strip()]
result = {parts[0]: parts[1:] for parts in lines}
It's really straightforward to check a line for emptiness or length 0:
for line in infile:
    line = line.strip()
    if line:
        do_something()
    # or
    if len(line) > 0:
        do_something()