Iterating through characters in a python string and summing their values - python

So I have a very simple task. The Project Euler problem Names Scores gives us a file with a set of strings(which are names). Now you have to sort these names in the alphabetical order and then compute what is known as a name score for each of these names and sum them all up. The name score calculation is pretty simple. All you have to do is take a name and then sum up the values of the alphabets in the name and then multiply this sum with the position that the name has on the list. Obviously this seems a pretty simple question.
Being a python beginner, I wanted to try this out on python and being a beginner this was the code I wrote out. I did use list comprehensions as well along with a sum, but that gives me the same answer. Here is my code:
def name_score(s):
# print sum((ord(c)-96) for c in s)
s1 = 0;
for c in s:
s1 = s1 + (ord(c) - 96)
print s1
return s1
# print ord(c) - 96
myList = []
f = open('p022_names.txt')
for line in f:
myList.append(line.lower())
count = 0;
totalSum = 0;
for line in sorted(myList):
count = count + 1;
totalSum += (name_score(line) * count)
print totalSum
Now the file p022_names.txt contains only one line "colin". So the function name_score("colin") should return 53. Now try whatever I always end up getting the value -33. I am using PyDev on Eclipse. Now here is a curious anomaly. If I just used the list variable and populated it with the value myList = ["colin"] in the code, I get the correct answer. Honestly I don't know what is happening. Can anybody throw some light into what is happening here. There is a similar loop also in the program to calculate totalSum, but that doesn't seem to have an issue.
[EDIT] After the issue was pointed out, I am posting an updated revision of the code which works.
def name_score(s):
return sum((ord(c)-96) for c in s)
with open('p022_names.txt') as f:
myList = f.read().splitlines()
print sum((name_score(line.lower()) * (ind+1)) for ind,line in enumerate(sorted(myList)))

96 - 53 - 33 = 10
That happens because you have a newline character ("\n") in your file, thus your line is not "colin" but "colin\n".
To get rid of the newline character, multiple approaches could work. Here is an example:
Replace your line:
for line in f:
with:
for line in f.read().splitlines():

Could it be because you didn't close the file? As in f.close()?

Related

Creating a File and then Loop for an Incrementing Value

I am attempting to create a loop that creates a file named "/tmp/newfile.txt" and creates 29 lines of text. Line 1 should read: "I see 0 sheep". For each line, 1 sheep should be added and a new line created to reflect that until 29 sheep (and lines) are reached.
x = 0
myDoc = myDoc.readfiles("/tmp/newfile.txt", "r+")
myDoc.write("I see" + str(x) + "sheep")
for line in myDoc.readfiles():
x = x + 1
myDoc.append(x)
print(myDoc)
if x == 30
break;
First, what I tried to do is create a new file and put it into a variable (myDoc) that would open it. I specified w+ so that I would have the ability to read the file and write on it. I gave the changing number a variable 'x'.
The function I intended to, for each line in the file, write "I see x sheep". Afterward, add 1 to the current value of x and append it so it's added to the file. After this, print it so I can see the line(s). Once that value reached 30, cease the loop because 29 is the number of lines I need.
My errors have to do with indentation and nothing being printed at all. I am extremely new to this.
Welcome to StackOverflow!
There seem to be a couple of issues in the code:
Indentation / Syntax Errors - It seems that you are using Python, which follows strict indentation and whitespace rules. An indent is inserted when you enter a new local scope / new control flow / enter an if/elif/else statement or a while or for loop, to separate it from the current scope.
You'd need to remove the space on the left side on line 3 and line 6.
Also, on line 8 there should be a colon(:) after the if x==30.
The mode used (w+) isn't going to work as expected.
This mode overwrites a file if it already exists and allows you to read and write to that file. Instead, you would need the r+ mode.
There's a great explanation & flowchart in this answer explaining the various file modes - https://stackoverflow.com/a/30566011/13307211
The for loop can't iterate over myDoc.
The open function gives a file object (TextIOWrapper), which can't be iterated over. You could use the myDoc.readfiles() method, which returns a list of lines present in the file and loop over that - for line in myDoc.readfiles().
printing myDoc and using .append() with myDoc wouldn't work as expected. It's representing a file object, which doesn't have an append method. Also, I feel like there might have been some mixed logic here - were you trying to iterate over myDoc like an array and hence pushing value to it?
I'd suggest removing the append part as the past value of x isn't going to be needed for what you want to do.
After applying the above, you should end up with code that looks like this -
x = 0
myDoc = open("./newfile.txt", "r+")
for line in myDoc.readlines():
myDoc.write("I see" + str(x) + "sheep\n")
x = x + 1
if x == 30:
break
Now, this doesn't exactly do what you want it to do...
The first thing we should do is update the for loop - a for loop should be structured in a way where it has a start, an end, and an increment, or it should iterate over a range of values. Python has a neat range function that allows you to iterate between values.
for x in range(1, 10):
print(x)
the above would print values from 1 to 10, excluding 10.
updating our for loop, we can change the code to -
myDoc = open("./newfile.txt", "r+")
for x in range(1, 30):
myDoc.write("I see" + str(x) + "sheep")
we could also use a while loop here -
myDoc = open("./newfile.txt", "r+")
for x in range(1, 30):
myDoc.write("I see" + str(x) + "sheep")
this makes the file but without the lines and without the right formatting. "I see " + str(x) + " sheep" should fix the sentence, but to print the string on multiple lines instead of the same line, you would need to use the newline character(\n) and add it at the end of the string -
myDoc = open("./newfile.txt", "r+")
for x in range(1, 30):
myDoc.write("I see" + str(x) + "sheep\n")

crating a list from answers

I am new on python and currently reading Hands-on cryptography with python, while I was reading the caesar5.py script on the book a question came across my mind and I would appreciate anyone who can help me out with it.
the code says:
alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
str_in = raw_input("Enter ciphertext: ")
for shift in range(26):
n = len(str_in)
str_out = ""
for i in range(n):
c = str_in[i]
loc = alpha.find(c)
newloc = (loc + shift)%26
str_out += alpha[newloc]
print shift, str_out
and it prints the results in 26 rows, and I was wondering how can I print the results in one list?
instead of printing it like
0 KHOOR
1 LIPPS
.
.
.
25 JGNNQ
It just prints out [KHOOR, LIPPS,...,JGNNQ], something like this.
So essentially what your looking to do is create an array in python.
There are tutorials on this: https://www.w3schools.com/python/python_arrays.asp.
I'll try to write the code out quickly:
array = []
array.append(str_out)
Then at the end of the for loop:
print(array)

What causes this return() to create a SyntaxError?

I need this program to create a sheet as a list of strings of ' ' chars and distribute text strings (from a list) into it. I have already coded return statements in python 3 but this one keeps giving
return(riplns)
^
SyntaxError: invalid syntax
It's the return(riplns) on line 39. I want the function to create a number of random numbers (randint) inside a range built around another randint, coming from the function ripimg() that calls this one.
I see clearly where the program declares the list I want this return() to give me. I know its type. I see where I feed variables (of the int type) to it, through .append(). I know from internet research that SyntaxErrors on python's return() functions usually come from mistype but it doesn't seem the case.
#loads the asciified image ("/home/userX/Documents/Programmazione/Python projects/imgascii/myascify/ascimg4")
#creates a sheet "foglio1", same number of lines as the asciified image, and distributes text on it on a randomised line
#create the sheet foglio1
def create():
ref = open("/home/userX/Documents/Programmazione/Python projects/imgascii/myascify/ascimg4")
charcount = ""
field = []
for line in ref:
for c in line:
if c != '\n':
charcount += ' '
if c == '\n':
charcount += '*' #<--- YOU GONNA NEED TO MAKE THIS A SPACE IN A FOLLOWING FUNCTION IN THE WRITER.PY PROGRAM
for i in range(50):#<------- VALUE ADJUSTMENT FROM WRITER.PY GOES HERE(default : 50):
charcount += ' '
charcount += '\n'
break
for line in ref:
field.append(charcount)
return(field)
#turn text in a list of lines and trasforms the lines in a list of strings
def poemln():
txt = open("/home/gcg/Documents/Programmazione/Python projects/imgascii/writer/poem")
arrays = []
for line in txt:
arrays.append(line)
txt.close()
return(arrays)
#rander is to be called in ripimg()
def rander(rando, fldepth):
riplns = []
for i in range(fldepth):
riplns.append(randint((rando)-1,(rando)+1)
return(riplns) #<---- THIS RETURN GIVES SyntaxError upon execution
#opens a rip on the side of the image.
def ripimg():
upmost = randint(160, 168)
positions = []
fldepth = 52 #<-----value is manually input as in DISTRIB function.
positions = rander(upmost,fldepth)
return(positions)
I omitted the rest of the program, I believe these functions are enough to get the idea, please tell me if I need to add more.
You have incomplete set of previous line's parenthesis .
In this line:-
riplns.append(randint((rando)-1,(rando)+1)
You have to add one more brace at the end. This was causing error because python was reading things continuously and thought return statement to be a part of previous uncompleted line.

Replace space-char by newline-char within max-line-length in python

I am trying to write a python script,
which breaks a continuous string into lines,
when the max_line_length has been exceeded.
It shall not break words,
and searches therefore the last occurrence of a whitespace-char,
which will be replaced by a newline-char.
For some reason it does not break within the specified limit.
E.g. when defining the max_line_length = 80,
the text sometimes breaks at 82 or 83, etc.
Since quite some time I am trying to fix the problem,
however it feels like i am having the tunnel vision
and don't see the problem here:
#!/usr/bin/python
import sys
if len(sys.argv) < 3:
print('usage: $ python3 breaktext.py <max_line_length> <file>')
print('example: $ python3 breaktext.py 80 infile.txt')
exit()
filename = str(sys.argv[2])
with open(filename, 'r') as file:
text_str = file.read().replace('\n', '')
m = int(sys.argv[1]) # max_line_length
text_list = list(text_str) # convert string to list
l = 0; # line_number
i = m+1 # line_character_index
index = m+1 # total_list_index
while index < len(text_list):
while text_list[l * m + i] != ' ':
i -= 1
pass
text_list[l * m + i] = '\n'
l += 1
i = m+1
index += m+1
pass
text_str = ''.join(text_list)
print(text_str)
I guess we'll take this from the top.
text_str = file.read().replace('\n', '')
Here's one assumption about the input data I don't know if it's true. You're replacing all the newline characters with nothing; if there weren't spaces next to them, this means the code below will never break the lines in the same places.
text_list = list(text_str) # convert string to list
This splits the input file into single character strings. I guess you might have done so to make it mutable, such that you can replace individual characters, but it's a very expensive operation and loses all the features of a string. Python is a high level language that would allow you to split into e.g. words instead.
index = m+1 # total_list_index
while index < len(text_list):
#...
index += m+1
Let's consider what this means. We're not entering into the loop if index exceeds the text_list length. But index is advancing in steps of m+1. So we're splitting math.floor(len(text)/(max_line_length+1)) times. Unless every line is exactly max_line_length characters, not counting its space we replace with a newline, that's too few times. Too few times means too long lines, at least at the end.
l = 0; # line_number
i = m+1 # line_character_index
#loop:
while text_list[l * m + i] != ' ':
i -= 1
text_list[l * m + i] = '\n'
l += 1
i = m+1
This is making things difficult with index math. Quite clearly the one index we ever use is l * m + i. This moves in a quite odd way; it searches backwards for a space, then leaps forward as l increments and i resets. Whatever position it had reversed to is lost as all the leaps are in steps of m.
Let's apply m=5 to the string "Fee fie faw fum who did you see now". For the first iteration, 0 * 5 + 5+1 hits the second word, and i seeks back to the first space. The first line then is "Fee", as expected. The second search starts at 1*5 + 5+1, which is a space, and the second line becomes "fie faw", which already exceeds our limit of 5! The reason is that l * m isn't the beginning of the line; it's actually in the middle of "fie", a discrepancy which can only grow as you continue through the file. It grows whenever you split off a line that is shorter than m.
The solution involves remembering where you did your split. That could be as simple as replacing l * m with index, and updating it by index += i instead of m+1.
Another odd effect happens if you ever encounter a word that exceeds the maximum line length. Beyond meaning a line is longer than the limit, i will still search backwards until it finds a space; that space could then be in an earlier line altogether, producing extra short lines as well as too long ones. That's a result of handling the entire text as one array and not limiting which section we're looking at.
Personally I'd much rather use Python's built in methods, such as str.rindex, which can find a particular character in a given region within a string:
s = "Fee fie faw fum who did you see now"
maxlen = 5
start = 8
end = s.rindex(' ', start, start+maxlen)
print(s[start:end])
start = end + 1
We also, as PaulMcG pointed out, can go full "batteries included" and use the standard library textwrap module for the entire task.

Python/IPython strange non reproducible list index out of range error

I have recently been learning some Python and how to apply it to my work. I have written a couple of scripts successfully, but I am having an issue I just cannot figure out.
I am opening a file with ~4000 lines, two tab separated columns per line. When reading the input file, I get an index error saying that the list index is out of range. However, while I get the error every time, it doesn't happen on the same line every time (as in, it will throw the error on different lines everytime!). So, for some reason, it works generally but then (seemingly) randomly fails.
As I literally only started learning Python last week, I am stumped. I have looked around for the same problem, but not found anything similar. Furthermore I don't know if this is a problem that is language specific or IPython specific. Any help would be greatly appreciated!
input = open("count.txt", "r")
changelist = []
listtosort = []
second = str()
output = open("output.txt", "w")
for each in input:
splits = each.split("\t")
changelist = list(splits[0])
second = int(splits[1])
print second
if changelist[7] == ";":
changelist.insert(6, "000")
va = "".join(changelist)
var = va + ("\t") + str(second)
listtosort.append(var)
output.write(var)
elif changelist[8] == ";":
changelist.insert(6, "00")
va = "".join(changelist)
var = va + ("\t") + str(second)
listtosort.append(var)
output.write(var)
elif changelist[9] == ";":
changelist.insert(6, "0")
va = "".join(changelist)
var = va + ("\t") + str(second)
listtosort.append(var)
output.write(var)
else:
#output.write(str("".join(changelist)))
va = "".join(changelist)
var = va + ("\t") + str(second)
listtosort.append(var)
output.write(var)
output.close()
The error
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/home/a/Desktop/sharedfolder/ipytest/individ.ins.count.test/<ipython-input-87-32f9b0a1951b> in <module>()
57 splits = each.split("\t")
58 changelist = list(splits[0])
---> 59 second = int(splits[1])
60
61 print second
IndexError: list index out of range
Input:
ID=cds0;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C 50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36
Desired output:
ID=cds0000;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C 50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36
The reason you're getting the IndexError is that your input-file is apparently not entirely tab delimited. That's why there is nothing at splits[1] when you attempt to access it.
Your code could use some refactoring. First of all you're repeating yourself with the if-checks, it's unnecessary. This just pads the cds0 to 7 characters which is probably not what you want. I threw the following together to demonstrate how you could refactor your code to be a little more pythonic and dry. I can't guarantee it'll work with your dataset, but I'm hoping it might help you understand how to do things differently.
to_sort = []
# We can open two files using the with statement. This will also handle
# closing the files for us, when we exit the block.
with open("count.txt", "r") as inp, open("output.txt", "w") as out:
for each in inp:
# Split at ';'... So you won't have to worry about whether or not
# the file is tab delimited
changed = each.split(";")
# Get the value you want. This is called unpacking.
# The value before '=' will always be 'ID', so we don't really care about it.
# _ is generally used as a variable name when the value is discarded.
_, value = changed[0].split("=")
# 0-pad the desired value to 7 characters. Python string formatting
# makes this very easy. This will replace the current value in the list.
changed[0] = "ID={:0<7}".format(value)
# Join the changed-list with the original separator and
# and append it to the sort list.
to_sort.append(";".join(changed))
# Write the results to the file all at once. Your test data already
# provided the newlines, you can just write it out as it is.
output.writelines(to_sort)
# Do what else you need to do. Maybe to_list.sort()?
You'll notice that this code is reduces your code down to 8 lines but achieves the exact same thing, does not repeat itself and is pretty easy to understand.
Please read the PEP8, the Zen of python, and go through the official tutorial.
This happens when there is a line in count.txt which doesn't contain the tab character. So when you split by tab character there will not be any splits[1]. Hence the error "Index out of range".
To know which line is causing the error, just add a print(each) after splits in line 57. The line printed before the error message is your culprit. If your input file keeps changing, then you will get different locations. Change your script to handle such malformed lines.

Categories