As mentioned, I am a beginner and am trying to do short exercises. Unfortunately, my online tutor is unable or unwilling to help me with this (they keep suggesting other ways of doing it).
My task is to check whether the first word of a line is 'From ', in which case I need to print the next word (the email address).
For example, the file has a series of lines like the following:
From stephen.marquard#uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster#collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
From louis#media.berkeley.edu Fri Jan 4 18:10:48 2008
Return-Path: <postmaster#collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.97])
The code should produce the following output:
stephen.marquard#uct.ac.za
louis#media.berkeley.edu
I have written the following code to do this:
fname = "mbox-short.txt"
f = open(fname,'r')
lines = f.readlines()
i = 0
count = len(lines)
while i < count :
    test = lines[i].split()
    if test[0] == "From " :
        print(test[1])
    i += 1
I keep getting the following error:
Traceback (most recent call last):
File "C:\Users\38775\Desktop\py4e\Project 2\email.py", line 10, in <module>
if test[0] == "From " :
IndexError: list index out of range
I just want to understand why this is happening and how I can correct it. Please don't spend time suggesting alternatives.
Thanks
An IndexError indicates that you're trying to access some part of a list which doesn't exist (for example, trying to find the 5th value of [1, 2, 3]).
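A short demonstration of the two things going on in the question's loop (the sample line is copied from the question): a blank line splits into an empty list, and split() throws away whitespace, so no word can ever equal "From " with a trailing space.

```python
# A blank line read from the file is just "\n"; split() turns it into [].
blank = "\n"
print(blank.split())

# split() also discards whitespace, so no word can equal "From " (trailing space).
line = "From stephen.marquard#uct.ac.za Sat Jan 5 09:14:16 2008\n"
words = line.split()
print(words[0] == "From ")  # False
print(words[0] == "From")   # True

# Indexing the empty list is what raises the IndexError.
try:
    blank.split()[0]
except IndexError as exc:
    print("IndexError:", exc)
```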
It would be good to know a small part of the contents of your file or an example input and your desired output so we can figure out exactly what's going wrong.
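So here is a modification of what you have that works:
fname = "mbox-short.txt"
f = open(fname,'r')
lines = f.readlines()
i = 0
count = len(lines)
while i < count :
    test = lines[i].strip().split()
    # A blank line splits into [], so check that the list is non-empty first
    if test and test[0] == "From":
        print(test[1])
    i += 1
After strip(), a line such as "From stephen.marquard#uct.ac.za Sat Jan 5 09:14:16 2008" has lost its trailing newline, and split() then breaks it on whitespace into a list like
['From', 'stephen.marquard#uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008'].
Now if test[0] == "From" (if the first word is From), you can print test[1], which is the second word (the email).
Your comparison was the mistake: split() removes all the whitespace, so no item in the list can ever equal "From " with a trailing space. The `if test` check is there because a blank line splits into an empty list, and test[0] on an empty list raises the IndexError.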
Hope this helps!
Thanks everyone!
Turns out the issue was that there were some blank lines in the file, so I needed to nest an if statement to skip them and keep the loop moving.
Thanks!
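A minimal sketch of that nested-if fix, written as a function over a list of lines so it runs without mbox-short.txt. The name from_addresses is hypothetical, and the sample lines are copied from the question with one blank line added to mimic the real file.

```python
def from_addresses(lines):
    """Return the second word of every line whose first word is 'From'."""
    addresses = []
    for line in lines:
        words = line.split()
        if not words:
            continue  # blank line: there is no words[0] to look at
        if words[0] == "From" and len(words) > 1:
            addresses.append(words[1])
    return addresses


# Sample lines from the question, plus a blank line like the ones in the real file.
sample = [
    "From stephen.marquard#uct.ac.za Sat Jan 5 09:14:16 2008\n",
    "Return-Path: <postmaster#collab.sakaiproject.org>\n",
    "\n",
    "From louis#media.berkeley.edu Fri Jan 4 18:10:48 2008\n",
]
for address in from_addresses(sample):
    print(address)
```

With the real file, from_addresses(open("mbox-short.txt")) should behave the same way, since iterating a file object yields its lines.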
I don't really know how to word the question, but I have this file with a number and a decimal next to it, like so (the file name is num.txt):
33 0.239
78 0.298
85 1.993
96 0.985
107 1.323
108 1.000
I have this list of numbers, and I want to find those numbers in the file, take the decimals next to them, and append them to a list:
['78','85','108']
Here is my code so far:
chosen_number = ['78','85','108']
memory_list = []
for line in open('path/to/num.txt'):
    checker = line[0:2]
    if not checker in chosen_number: continue
    dec = line.split()[-1]
    memory_list.append(float(dec))
The error I get is that it is not in a list, and only the 3-digit numbers are accounted for. I don't really understand why this is happening and would like some tips on how to fix it. Thanks.
As for the error, there is no actual error. The only problem is that it ignores the two-digit numbers and only gets the three-digit numbers. I want it to get both the 2- and 3-digit numbers. For example, the script skips past 78 and 85 and goes to the line with '108'.
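Your checker only looks at the first two characters of each line (line[0:2]), so a three-digit number such as 108 is cut down to '10' and never matches. The code below works.
N.B. I have used startswith because the number might appear elsewhere in the line.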
chosen_number = ['78','85','108']
memory_list = []
with open('path/to/num.txt') as f:
    for line in f:
        if any(line.startswith(i) for i in chosen_number):
            memory_list.append(float(line.split()[1]))
print(memory_list)
Output:
[0.298, 1.993, 1.0]
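To see why the question's line[0:2] misses 108, it helps to print that slice next to the first whitespace-separated field (sample lines copied from num.txt):

```python
lines = ["78 0.298\n", "85 1.993\n", "108 1.000\n"]
for line in lines:
    # line[0:2] is always exactly two characters; split()[0] is the whole number.
    print(repr(line[0:2]), repr(line.split()[0]))
# '78' '78'
# '85' '85'
# '10' '108'
```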
The following should work:
chosen_number = ['78','85','108']
memory_list = []
with open('num.txt') as f_input:
    for line in f_input:
        v1, v2 = line.split()
        if v1 in chosen_number:
            memory_list.append(float(v2))
print(memory_list)
Giving you:
[0.298, 1.993, 1.0]
Also, it is better to use a with statement when dealing with files so that the file is automatically closed afterwards.
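One caveat with the v1, v2 = line.split() unpacking: a blank or malformed line (for example an empty last line) raises ValueError. A minimal sketch that skips such lines, using io.StringIO in place of num.txt so it runs on its own; pick_decimals is a hypothetical name, and converting chosen to a set is just a lookup convenience.

```python
import io


def pick_decimals(f, chosen):
    """Return the second field (as float) of lines whose first field is in chosen."""
    wanted = set(chosen)
    result = []
    for line in f:
        fields = line.split()
        if len(fields) != 2:
            continue  # blank or malformed line: nothing to unpack
        key, value = fields
        if key in wanted:
            result.append(float(value))
    return result


# io.StringIO stands in for num.txt; note the blank line at the end.
data = io.StringIO("33 0.239\n78 0.298\n85 1.993\n96 0.985\n107 1.323\n108 1.000\n\n")
result = pick_decimals(data, ['78', '85', '108'])
print(result)  # [0.298, 1.993, 1.0]
```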
Try to use this code:
chosen_number = ['78 ', '85 ', '108 ']
memory_list = []
for line in open("num.txt"):
    for num in chosen_number:
        if num in line:
            dec = line.split()[-1]
            memory_list.append(float(dec))
In chosen_number, I declared each number with a trailing space ('85 '). Otherwise '85' would also be found inside 0.985 and the if condition would be true, because the check is a substring test on strings. I hope I'm being clear enough.
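The substring test is easy to check in the interpreter. The last line shows a limit of the trailing-space trick, using a made-up line that is not in num.txt:

```python
line = "96 0.985\n"
print('85' in line)   # True: '85' occurs inside 0.985
print('85 ' in line)  # False: the trailing space rules that out

# The space only guards the end of the number, not the start
# (hypothetical line, not in num.txt):
print('78 ' in "178 0.500\n")  # True
```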
I am trying to read from a file and return solutions based on the problem that the user inputs. I have saved the text file in the same location, so that is not an issue. At the moment, the program just crashes when I run it and type a problem, e.g. "screen".
Code
file = open("solutions.txt", 'r')
advice = []
read = file.readlines()
file.close()
print (read)
for i in file:
    indword = i.strip()
    advice.append (indword)
lst = ("screen","unresponsive","frozen","audio")
favcol = input("What is your problem? ")
probs = []
for col in lst:
    if col in lst:
        probs.append(col)
for line in probs:
    for solution in advice:
        if line in solution:
            print(solution)
The text file called "solutions.txt" holds the following info:
screen: Take the phone to a repair shop where they can replace the damaged screen.
unresponsive: Try to restart the phone by holding the power button for at least 4 seconds.
frozen: Try to restart the phone by holding the power button for at least 4 seconds.
audio: If the audio or sound doesnt work, go to the nearest repair shop to fix it.
Your question reminds me a lot of my own learning, so I will try to give an answer that expands on yours, with lots of print statements so you can see carefully how it works. It's not the most efficient or robust approach, but hopefully it is of some use in moving forward.
print("LOADING RAW DATA")
solution_dictionary = {}
with open('solutions.txt', 'r') as infile:
    for line in infile:
        if not line.strip():
            continue  # skip blank lines, which have no 'key: solution' pair
        # Split at the first colon only, in case a solution contains one
        dict_key, solution = line.split(':', 1)
        print("Dictionary 'key' is: ", dict_key)
        print("Corresponding solution is: ", solution)
        solution_dictionary[dict_key] = solution.strip()
print('\n')
print('Final dictionary is:', '\n')
print(solution_dictionary)
print('\n')
print('FINISHED LOADING RAW DATA')
solved = False
while not solved:  # Will keep looping as long as solved == False
    issue = input('What is your problem? ')
    solution = solution_dictionary.get(issue)
    # If we can find the 'issue' in the dictionary then 'solution' will have
    # some kind of value (considered 'True'), otherwise 'None' is returned,
    # which is considered 'False'.
    if solution:
        print(solution)
        solved = True
    else:
        print("Sorry, no answer found. Valid issues are 'frozen', "
              "'screen', 'audio' or 'unresponsive'")
        want_to_exit = input('Want to exit? Y or N? ')
        if want_to_exit == 'Y':
            solved = True
Other points:
- Don't use 'file' as a variable name anywhere. In Python 2 it's a built-in, and shadowing it can cause weird behaviour that you'll struggle to debug https://docs.python.org/2/library/functions.html
- If you get an error, don't just say it "crashes"; provide the traceback, e.g.:
a = "hello" + 2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-6f5e94f8cf44> in <module>()
----> 1 a = "hello" + 2
TypeError: cannot concatenate 'str' and 'int' objects
- Your question title will get you down-votes unless you are specific about the problem. "Help me do something" is unlikely to get a positive response because the problem is ambiguous, there's no sign of Googling the errors (and why the results didn't work), and it's unlikely to help anyone else in the future.
Best of luck :)
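When I change the line "for i in file:" to "for i in read:", everything works well. The file object has already been closed by file.close(), so iterating over it raises a ValueError, while read still holds the lines you loaded.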
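A small reproduction of what `for i in file:` runs into, with io.StringIO standing in for solutions.txt (its one line is shortened from the question's file): the list returned by readlines() keeps the data, but iterating the closed file object raises ValueError.

```python
import io

# io.StringIO stands in for solutions.txt so this runs on its own.
f = io.StringIO("screen: Take the phone to a repair shop.\n")
read = f.readlines()
f.close()

print(read)  # the lines are still available in the list

error = None
try:
    for i in f:  # iterating the closed file object fails
        pass
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```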
To output only the line starting with "screen" just forget the probs variable and change the last for statement to
for line in advice:
    if line.startswith( favcol ) :
        print(line)
        break
For the startswith() function refer to https://docs.python.org/2/library/stdtypes.html#str.startswith
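startswith() only matches when the user types the keyword at the very beginning. If the user may type a whole sentence, one option is to look for each keyword anywhere in the input, which is presumably what `if col in lst` in the question was meant to do (an assumption). A minimal sketch; find_advice is a hypothetical name and the two advice lines are copied from solutions.txt:

```python
def find_advice(problem, advice):
    """Return the advice lines whose keyword (text before ':') appears in problem."""
    problem = problem.lower()
    matches = []
    for line in advice:
        keyword = line.split(":", 1)[0].strip().lower()
        # Skip lines with no keyword (e.g. blank lines) so '' never matches.
        if keyword and keyword in problem:
            matches.append(line)
    return matches


# Two lines copied from solutions.txt in the question.
advice = [
    "screen: Take the phone to a repair shop where they can replace the damaged screen.",
    "audio: If the audio or sound doesnt work, go to the nearest repair shop to fix it.",
]
for line in find_advice("My screen is cracked", advice):
    print(line)
```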
Also: roganjosh's advice is helpful, particularly "don't use Python built-in names (e.g. file) as variable names". I have spent hours debugging bugs caused by things like "file = ..." or "dict = ...".
#!/usr/bin/python3
import sys
def main():
    for line in sys.stdin:
        line = line.split()
        x = -1
        for word in line:
            if word[-1]==word[0] or word[x-1]==word[1]:
                print(word)
main()
It also prints dots at the end of the sentences, why?
And words like 'cat' and 'moon' should be excluded, but it prints these words too.
Can someone point me in the right direction please?
I think your problem is that the second and second-to-last characters of 'cat' are the same.
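Printing both halves of the question's condition for a few words makes this visible. Because the two tests are joined with `or`, any word that passes either one is printed ('level' is an extra example, not from the question):

```python
for word in ["cat", "moon", "level"]:
    # The question's condition is: first == last OR second == second-to-last.
    print(word, word[0] == word[-1], word[1] == word[-2])
# cat False True
# moon False True
# level True True
```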
import sys
def main():
    for line in sys.stdin:
        line = line.split()
        x = -1
        for word in line:
            if (word[-1]==word[0] and len(word)<=2) or (word[x-1]==word[1] and len(word)<=4):
                print(word)
main()
or something like that, depending on your preference.
This should get rid of that pesky cat, although moon stays.
It will also match words that mix upper- and lower-case letters, so sadly not only will moon print but also Moon, MOon, mooN and moOn.
Edit: Forgot to test for one-character words (a, I, etc.)
import sys
def main():
    for line in sys.stdin:
        line = line.split()
        for word in line:
            uword = word.lower()
            if len(uword) > 1:
                if uword[0:1]==uword[-1] or (uword[1:2]==uword[-2] and len(uword) > 3):
                    print(word)
main()
I got it, guys: I had understood the question wrong. This prints the right words, the ones I had worked out beforehand, and that cleared things up for me. This is the right code, but it still gives "sys.excepthook is missing". I run this code together with another script that replaces each space with a newline, so every word ends up on its own line:
cat cdb.sentences | python3 newline.py | python3 word.py | head -n 5
import sys
def main():
    for line in sys.stdin:
        line = line.split()
        for word in line:
            letterword = lw = word.lower()
            if len(lw) > 1:
                if lw[0:1]==lw[-1] and (lw[1:2]==lw[-2]):
                    print(word)
main()
import sys
def main():
    for line in sys.stdin:
        line = line.rstrip()
        text = ""
        for word in line:
            if word in ' ':
                text=text + '\n'
            else:
                text=text + word
        print(text)
main()
It should output the first 5 words whose first and last letters match and whose second and second-to-last letters match, with a blank line between each of them. First I want to solve that hook error.
Thx
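A minimal sketch of the rule described above as a standalone function, so it can be tested without the shell pipeline. Two choices here are assumptions: the comparison is case-insensitive, and surrounding punctuation (such as a sentence-final '.') is stripped first. The name matches and the sample sentence are made up for illustration.

```python
import string


def matches(word):
    """True if first == last and second == second-to-last letter (case-insensitive)."""
    # Assumption: punctuation around the word (e.g. a trailing '.') should be ignored.
    w = word.strip(string.punctuation).lower()
    if len(w) < 2:
        return False
    return w[0] == w[-1] and w[1] == w[-2]


words = "Anna saw a level radar and the moon. Cat".split()
selected = [w for w in words if matches(w)]
print(selected)
```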
You are not helping yourself by answering your own question with what is essentially a completely different question in an answer.
You should have closed your original question off by accepting one of the answers, if one of them helped (which it looks like they did), and then asked a new question.
However, the answer to your 2nd question/answer can be found here:
http://python.developermemo.com/7757_12807216/ and it is a brilliant answer
Synopsis:
The reason this is happening is that you're piping a nonzero amount of output from your Python script to something which never reads from standard input. You can get the same result by piping to any command which doesn't read standard input, such as
python testscript.py | cd .
Or for a simpler example, consider a script printer.py containing nothing more than
print('abcde')
Then
python printer.py | python printer.py
will produce the same error.
The following, however, will trap the sys.excepthook error:
import sys
import logging
import traceback
def log_uncaught_exceptions(exception_type, exception, tb):
    logging.critical(''.join(traceback.format_tb(tb)))
    logging.critical('{0}: {1}'.format(exception_type, exception))
sys.excepthook = log_uncaught_exceptions
print("abcdfe")
I'm sure this is a basic question, but I have spent about an hour on it already and can't quite figure it out. I'm parsing smartctl output, and here is a sample of the data I'm working with:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-2.6.32-39-pve] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA MD04ACA500
Serial Number: Y9MYK6M4BS9K
LU WWN Device Id: 5 000039 5ebe01bc8
Firmware Version: FP2A
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Jul 2 11:24:08 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
What I'm trying to achieve is pulling out the device model (for some devices it's a single word; for others, such as this one, it's two words), the serial number, the time, and a couple of other fields. I assume it would be easiest to capture all the data after the colon, but how do I get rid of the varying amounts of whitespace?
Here is the relevant code I currently came up with:
deviceModel = ""
serialNumber = ""
lines = infoMessage.split("\n")
for line in lines:
    parts = line.split()
    if str(parts):
        if parts[0] == "Device Model: ":
            deviceModel = parts[1]
        elif parts[0] == "Serial Number: ":
            serialNumber = parts[1]
vprint(3, "Device model: %s" %deviceModel)
vprint(3, "Serial number: %s" %serialNumber)
The error I keep getting is:
File "./tester.py", line 152, in parseOutput
if parts[0] == "Device Model: ":
IndexError: list index out of range
I get what the error is saying (kinda), but I'm not sure what else the range could be, or if I'm even attempting this in the right way. Looking for guidance to get me going in the right direction. Any help is greatly appreciated.
Thanks!
The IndexError occurs when split() returns an empty list and you then access its first element, parts[0]. That happens on a line with nothing to split (an empty line).
No need for regular expressions:
deviceModel = ""
serialNumber = ""
lines = infoMessage.split("\n")
for line in lines:
    if line.startswith("Device Model:"):
        deviceModel = line.split(":")[1].strip()
    elif line.startswith("Serial Number:"):
        serialNumber = line.split(":")[1].strip()
print("Device model: %s" %deviceModel)
print("Serial number: %s" %serialNumber)
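Since several fields are wanted (model, serial number, time and more), one option is to parse every "Key: value" line into a dict in a single pass. A minimal sketch, assuming the key always comes before the first colon; str.partition splits only at that first colon, so the colons inside the time survive. parse_info is a hypothetical name, and the sample is trimmed from the question with the spacing after each colon assumed to vary.

```python
def parse_info(text):
    """Map each 'Key: value' line of smartctl-style output to a dict entry."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a colon
        info[key.strip()] = value.strip()
    return info


sample = """=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MD04ACA500
Serial Number:    Y9MYK6M4BS9K

Local Time is:    Thu Jul 2 11:24:08 2015 EDT
"""
info = parse_info(sample)
print(info["Device Model"])
print(info["Local Time is"])
```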
I guess your problem is the empty line in the middle, because:
>>> '\n'.split()
[]
You can do something like,
>>> f = open('a.txt')
>>> lines = f.readlines()
>>> deviceModel = [line for line in lines if 'Device Model' in line][0].split(':')[1].strip()
# 'TOSHIBA MD04ACA500'
>>> serialNumber = [line for line in lines if 'Serial Number' in line][0].split(':')[1].strip()
# 'Y9MYK6M4BS9K'
Try using regular expressions:
import re
r = re.compile(r"^[^:]*:\s+(.*)$")
m = r.match("Device Model: TOSHIBA MD04ACA500")
print(m.group(1))  # Prints "TOSHIBA MD04ACA500"
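One detail when looping over all the lines: pattern.match() returns None on a blank line or on a line without "key: value", so m.group() needs a guard. A minimal sketch; the extra capture group for the key is an addition to the pattern above, and the three sample lines are taken from the question's output:

```python
import re

# Raw string so that \s reaches the regex engine unchanged.
pattern = re.compile(r"^([^:]*):\s+(.*)$")

results = []
for line in ["Device Model: TOSHIBA MD04ACA500", "", "=== START OF INFORMATION SECTION ==="]:
    m = pattern.match(line)
    if m is None:
        continue  # blank line, or no "key: value" on this line
    results.append((m.group(1), m.group(2)))
print(results)
```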
line.split() with no arguments splits the line on runs of whitespace (in any Python version), so
>>> line = "Device Model: TOSHIBA MD04ACA500"
>>> line.split()
['Device', 'Model:', 'TOSHIBA', 'MD04ACA500']
You can also try line.startswith() to find the lines you want https://docs.python.org/2/library/stdtypes.html#str.startswith
The way I would debug this is by printing out parts at every iteration. Try that and show us what the list is when it fails.
Edit: Your problem is most likely what #jonrsharpe said. parts is probably an empty list when it gets to an empty line, and str(parts) returns the string '[]', which is non-empty and therefore truthy. Try testing that.
I think it would be far easier to use regular expressions here.
import re
for line in lines:
    # Splits the string into at most two parts
    # at the first colon which is followed by one or more spaces
    parts = re.split(r':\s+', line, maxsplit=1)
    # re.split always returns at least one item, so check for two
    if len(parts) == 2:
        if parts[0] == "Device Model":
            deviceModel = parts[1]
        elif parts[0] == "Serial Number":
            serialNumber = parts[1]
Mind you, if you only care about the two fields, startswith might be better.
When you split the blank line, parts is an empty list.
You try to accommodate that by checking for an empty list, but you turn the empty list into a string, which makes your conditional True.
>>> s = []
>>> bool(s)
False
>>> str(s)
'[]'
>>> bool(str(s))
True
>>>
Change if str(parts): to if parts:.
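Many would say that using a try/except block is idiomatic Python:
for line in lines:
    parts = line.split()
    try:
        # split() drops the spaces, so "Device Model:" becomes two items
        if parts[0] == "Device" and parts[1] == "Model:":
            deviceModel = " ".join(parts[2:])
        elif parts[0] == "Serial" and parts[1] == "Number:":
            serialNumber = parts[2]
    except IndexError:
        # Blank lines give an empty list, so parts[0] raises IndexError
        pass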
I have recently been learning some Python and how to apply it to my work. I have written a couple of scripts successfully, but I am having an issue I just cannot figure out.
I am opening a file with ~4000 lines, with two tab-separated columns per line. When reading the input file, I get an IndexError saying that the list index is out of range. However, while I get the error every time, it doesn't happen on the same line every time (it throws the error on a different line each time!). So, for some reason, it generally works but then (seemingly) randomly fails.
As I literally only started learning Python last week, I am stumped. I have looked around for the same problem, but not found anything similar. Furthermore I don't know if this is a problem that is language specific or IPython specific. Any help would be greatly appreciated!
input = open("count.txt", "r")
changelist = []
listtosort = []
second = str()
output = open("output.txt", "w")
for each in input:
    splits = each.split("\t")
    changelist = list(splits[0])
    second = int(splits[1])
    print second
    if changelist[7] == ";":
        changelist.insert(6, "000")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)
    elif changelist[8] == ";":
        changelist.insert(6, "00")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)
    elif changelist[9] == ";":
        changelist.insert(6, "0")
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)
    else:
        #output.write(str("".join(changelist)))
        va = "".join(changelist)
        var = va + ("\t") + str(second)
        listtosort.append(var)
        output.write(var)
output.close()
The error
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/home/a/Desktop/sharedfolder/ipytest/individ.ins.count.test/<ipython-input-87-32f9b0a1951b> in <module>()
57 splits = each.split("\t")
58 changelist = list(splits[0])
---> 59 second = int(splits[1])
60
61 print second
IndexError: list index out of range
Input:
ID=cds0;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C 50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36
Desired output:
ID=cds0000;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C 50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36
The reason you're getting the IndexError is that your input file is apparently not entirely tab-delimited. That's why there is nothing at splits[1] when you attempt to access it.
Your code could use some refactoring. First of all, you're repeating yourself with the if checks; that's unnecessary. The code below just pads cds0 to 7 characters, which is probably not exactly what you want. I threw it together to demonstrate how you could make your code a little more Pythonic and DRY. I can't guarantee it'll work with your dataset, but I'm hoping it helps you understand how to do things differently.
to_sort = []
# We can open two files using the with statement. This will also handle
# closing the files for us when we exit the block.
with open("count.txt", "r") as inp, open("output.txt", "w") as out:
    for each in inp:
        # Split at ';', so you won't have to worry about whether or not
        # the file is tab-delimited
        changed = each.split(";")
        # Get the value you want. This is called unpacking.
        # The value before '=' will always be 'ID', so we don't really care about it.
        # _ is generally used as a variable name when the value is discarded.
        _, value = changed[0].split("=")
        # Pad the value with zeros on the right to 7 characters (cds0 -> cds0000).
        # This replaces the current value in the list.
        changed[0] = "ID={:0<7}".format(value)
        # Join the changed list with the original separator
        # and append it to the sort list.
        to_sort.append(";".join(changed))
    # Write the results to the file all at once. Your test data already
    # provided the newlines, so you can just write them out as they are.
    out.writelines(to_sort)
# Do what else you need to do. Maybe to_sort.sort()?
You'll notice that this reduces your code to eight lines (not counting comments), handles your sample input, does not repeat itself, and is pretty easy to understand.
Please read PEP 8 and the Zen of Python, and go through the official tutorial.
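One thing worth checking about the `{:0<7}` format spec: `<` left-aligns, so the zeros go on the right. That gives cds0000 for cds0, but a two-digit ID such as cds12 (hypothetical, not in the sample) would become cds1200, whereas the question's insert-at-position-6 code produces cds0012. If left-padding the number is what's wanted, here is a sketch assuming every ID is "cds" followed by digits; pad_id is a hypothetical name.

```python
def pad_id(value, width=4):
    """Left-pad the digits after 'cds' with zeros, e.g. cds12 -> cds0012."""
    # Assumption: every ID value is the literal prefix "cds" followed by digits.
    prefix, digits = value[:3], value[3:]
    return prefix + digits.zfill(width)


print("{:0<7}".format("cds12"))  # cds1200 (zeros appended on the right)
print(pad_id("cds12"))           # cds0012
print(pad_id("cds0"))            # cds0000
print(pad_id("cds1000"))         # cds1000
```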
This happens when there is a line in count.txt which doesn't contain the tab character. So when you split by tab character there will not be any splits[1]. Hence the error "Index out of range".
To find out which line is causing the error, just add a print(each) right after the split on line 57. The line printed just before the error message is the culprit. If your input file keeps changing, you will get different locations. Change your script to handle such malformed lines.
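Following that suggestion, here is a minimal sketch that reports the line number of every line without a tab and skips it, instead of crashing. read_counts is a hypothetical name and the sample lines are shortened from the question's input; the second one uses a space instead of a tab to stand in for a malformed line.

```python
def read_counts(lines):
    """Yield (name, count) pairs, skipping lines that have no tab-separated count."""
    for lineno, line in enumerate(lines, start=1):
        splits = line.rstrip("\n").split("\t")
        if len(splits) < 2:
            print(f"line {lineno}: no tab found: {line!r}")
            continue
        yield splits[0], int(splits[1])


sample = [
    "ID=cds0;Name=NP_414542.1;Parent=gene0\t12\n",
    "ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36\n",
]
rows = list(read_counts(sample))
print(rows)
```

With the real data, list(read_counts(open("count.txt"))) should list every good row and print one message per malformed line.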