Summing elements of string in Python - python

I am a newcomer to Python. I started reading a book about Python written by an MIT professor and got stuck on one of its exercises. I tried to solve it but could not.
Problem: Let s be a string that contains a sequence of decimal numbers
separated by commas, e.g., s = '1.23,2.4,3.123'. Write a program that prints
the sum of the numbers in s.
So I have to find the sum of 1.23, 2.4, and 3.123.
So far I have written the following code to solve this problem:
s = '1.23,2.4,3.123'
total = 0
for i in s:
    print(i)
    if i == ',':
Could someone please help me with how to go further?

All you need to do is first split your string on ',', which gives you a list of numeric strings:
>>> s.split(',')
['1.23', '2.4', '3.123']
Then you need to convert these strings to float objects so that you can sum them. There are two ways to do that.
The first is to use the map function:
>>> sum(map(float, s.split(',')))
6.753
The second is to use a generator expression inside the sum function:
>>> sum(float(i) for i in s.split(','))
6.753
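As a minimal sketch of the same idea wrapped in reusable functions: the names sum_numbers and sum_numbers_exact are made up for illustration, and the handling of blank fields and surrounding spaces is an assumption that goes beyond the exercise. The Decimal variant is only there to show one way of avoiding binary floating-point rounding in the total.

```python
from decimal import Decimal


def sum_numbers(s):
    """Sum comma-separated decimal numbers in s, skipping blank fields."""
    # float() tolerates surrounding spaces, so '1.23, 2.4' also works.
    return sum(float(part) for part in s.split(',') if part.strip())


def sum_numbers_exact(s):
    """Same idea, but with Decimal so the digits are added exactly."""
    return sum(Decimal(part.strip()) for part in s.split(',') if part.strip())


s = '1.23,2.4,3.123'
print(sum_numbers(s))
print(sum_numbers_exact(s))  # 6.753
```

The float version is what the exercise asks for; the Decimal version is worth considering only when the exact decimal digits matter.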

It is much simpler to use str.split(), as in:
s = '1.23,2.4,3.123'
total = 0
for i in s.split(','):
    total += float(i)

It's much less Pythonic, and more effort, to walk through the string and build up the numbers as you go, but it's probably more in keeping with the spirit of a beginner working things out, and more of a continuation of your original code:
s = '1.23,2.4,3.123'
total = 0
number_holder_string = ''
for character in s:
    if character == ',':  # found the end of a number
        number_holder_value = float(number_holder_string)
        total = total + number_holder_value
        number_holder_string = ''
    else:
        number_holder_string = number_holder_string + character
# the last number has no comma after it, so add it once the loop ends
if number_holder_string:
    total = total + float(number_holder_string)
print(total)
That way, number_holder_string goes:
''
'1'
'1.'
'1.2'
'1.23'
found a comma -> convert 1.23 from string to value and add it.
''
'2'
'2.'
etc.

Related

How to unify separate digits when iterating string

I'm trying to iterate over a string and collect all the numbers in a list, which I need for another task. I have multiple functions that recursively refer to each other, and the original input is a list of data. The problem is that when I print the string I get the right output, but if I iterate over it and print each element I get separate digits, so 1, 1 instead of 11, or 9, 3 instead of 93. Does anyone have a simple solution to this problem? I'm not very experienced in programming, so it may be a simple task, but I can't figure it out at the moment. Here's the code for the problematic part:
numbers = names.split('\t')[1].split(' ')[1]
print(numbers)
Some of the output:
8
44
46
86
Now if I use the following code:
numbers = names.split('\t')[1].split(' ')[1]
for i in numbers:
    print(i)
I get the following output:
8
4
4
4
6
8
6
Or when I convert to a list:
numbers = names.split('\t')[1].split(' ')[1]
print(list(numbers))
Output:
['8']
['4', '4']
['4', '6']
['8', '6']
The input names is structured in the following way: Andy Gray\t 2807 53
There are many more names, but they are all structured like this.
I split on \t to remove the names and then split again on ' ' to get the numbers. That leaves two numbers, and I take the second one, which is the number I want.
My only goal for now is to get the 'complete' numbers, as they appear when I print them. I need a list of those numbers as integers where every element is a complete number, e.g. [8, 44, 46, 86]. I can then iterate over the numbers and use them. Once I can do that I know what to do, but I'm stuck at this point for now. Any help would be appreciated.
Link to complete input and python code I am using, in case it makes things more clear:
Demo
str.rsplit() works like str.split(), but starts from the right end.
s = "Andy Gray\t 2807 53"
_, number = s.rsplit(maxsplit=1)
print(number)
If you know that all your input is structured the same way and the string is guaranteed to end with the two digits you are interested in, why not just do the following?
names_list = ['Andy gray\t2807 53', 'name surname\t2807 934']
for n in names_list:
    print(n[-2:])
On the other hand, if you're not sure the last number contains only 2 digits, there is no need to split on the tab at all:
import re
names_list = ['Andy gray\t2807 53', 'name surname\t2807 94']
for n in names_list:
    try:
        if re.compile(r'.*\d+$').match(n) and ' ' in n:
            print(n.split()[-1])
    except:
        pass
EDIT after reading the code added by OP
The code looks good, but the problem is that my input(names) is not a list of strings. I have a string like this: Guillaume van Steen 5855 5 Sven Silvis Cividjian 1539 88 Jan Willem Swarttouw 3911 66 which goes in further. This is why I split at the tab and whitespace to get the final number.
Maybe this code helps:
from pathlib import Path
file = Path('text.txt')
text = file.read_text()
Here is where the file is split into lines:
lines = text.split('\n')
Then you can use the function with a small added check:
def get_numbers(list_of_strings):
    numbers = list()
    for string in list_of_strings:
        # check that the line contains a "\t" and, if so, assume it is a valid line
        if '\t' in string:
            numbers.append(int(string.split()[-1]))
    return numbers
numbers = get_numbers(lines)
print(numbers)
You can split the string by whitespaces again and convert each value to int.
For example,
numbers = names.split('\t')[1].split(' ')[1]
list_numbers = [int(x) for x in numbers.split(' ')]
Then you will have your list of 'complete' digits
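The underlying issue is that iterating over a string yields one character at a time, so the number has to be converted as a whole before it goes into the list. A minimal sketch under stated assumptions: the function name last_numbers and the two-line sample are made up for illustration, and every data line is assumed to contain a tab and to end with the number of interest.

```python
def last_numbers(text):
    """Return the last whitespace-separated field of each data line as an int."""
    numbers = []
    for line in text.splitlines():
        # Assumption: every data line has a tab between the name and the numbers.
        if '\t' not in line:
            continue
        numbers.append(int(line.split()[-1]))
    return numbers


# Hypothetical sample in the 'name\t number number' layout described in the question.
sample = "Andy Gray\t 2807 53\nJan Willem Swarttouw\t 3911 66\n"

print(list('53'))            # ['5', '3']: iterating a string yields characters
print(last_numbers(sample))  # [53, 66]
```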

Python: How to move the position of an output variable using the split() method

This is my first SO post, so go easy! I have a script that counts how many matches occur in a string named postIdent for the substring ff. Based on this it then iterates over postIdent and extracts all of the data following it, like so:
substring = 'ff'
global occurences
occurences = postIdent.count(substring)
x = 0
while x <= occurences:
    for i in postIdent.split("ff"):
        rawData = i
        required_Id = rawData[-8:]
        x += 1
To explain further, if we take the string "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff", it is clear there are 3 instances of ff. I need to get the 8 preceding characters at every instance of the substring ff, so for the first instance this would be 909a9090.
With the rawData, I essentially need to offset the variable required_Id by -1 when I get the data out of the split() method, as I am currently getting the last 8 characters of the current string, not the string I have just split. Another way of doing it could be to pass the current required_Id to the next iteration, but I've not been able to do this.
The split method gets everything after the matching string ff.
Using the partition method can get me the data I need, but does not allow me to iterate over the string in the same way.
Get the last 8 characters of each piece with a slice in a list comprehension:
s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
print([x[-8:] for x in s.split('ff') if x])
# ['909a9090', '90434390', 'sdfs9000']
Not a difficult problem, but tricky for a beginner.
If you split the string on 'ff' then you appear to want the eight characters at the end of every substring but the last. The last eight characters of string s can be obtained using s[-8:]. All but the last element of a sequence x can similarly be obtained with the expression x[:-1].
Putting both those together, we get
subject = '090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff'
for x in subject.split('ff')[:-1]:
    print(x[-8:])
This should print
909a9090
90434390
sdfs9000
I wouldn't do this with split myself, I'd use str.find. This code isn't fancy but it's pretty easy to understand:
fullstr = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"
search = "ff"
found = None  # offset of the next match
last = 0      # where the next search starts
l = 8         # number of preceding characters to extract
print(fullstr)
while True:
    found = fullstr.find(search, last)
    if found == -1:
        break
    # max() keeps the slice start from going negative near the start of the string
    preceding = fullstr[max(found - l, 0):found]
    print("At position {} found preceding characters '{}'".format(found, preceding))
    last = found + len(search)
Overall I like Austin's answer more; it's a lot more elegant.
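For comparison, here is a regular-expression sketch, a different technique from the split and find approaches above. It assumes that each identifier is exactly 8 characters long; unlike the slice-based versions, it silently skips an 'ff' that has fewer than 8 characters in front of it.

```python
import re

s = "090fd0909a9090ff90493090434390ff90904210412419ghfsdfs9000ff"

# Capture exactly 8 characters that are immediately followed by 'ff'.
# The lookahead (?=ff) checks for the marker without consuming it.
ids = re.findall(r'(.{8})(?=ff)', s)
print(ids)  # ['909a9090', '90434390', 'sdfs9000']
```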

Replacing all numeric value to formatted string

What I am trying to do is:
Find out all the numeric values in a string.
input_string = "高露潔光感白輕悅薄荷牙膏100 79.80"
numbers = re.finditer(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?',input_string)
for number in numbers:
    print("{} start > {}, end > {}".format(number.group(), number.start(0), number.end(0)))
'''Output'''
>>100 start > 12, end > 15
>>79.80 start > 18, end > 23
And then I want to replace every integer and float value with a certain format:
INT_(number of digits) and FLT_(number of decimal places)
e.g. 100 -> INT_3 // 79.80 -> FLT_2
Thus, the expected output string looks like this:
"高露潔光感白輕悅薄荷牙膏INT_3 FLT_2"
But Python's string replace method is awkward for this and can't achieve what I want.
So I tried building the result by concatenating substrings:
string[:number.start(0)] + "INT_%s"%len(number.group()) +.....
which looks clumsy and, most importantly, I still can't make it work.
Can anyone give me some advice on this problem?
Use re.sub and a callback method inside where you can perform various manipulations on the match:
import re
def repl(match):
    chunks = match.group(1).split(".")
    if len(chunks) == 2:
        return "FLT_{}".format(len(chunks[1]))
    else:
        return "INT_{}".format(len(chunks[0]))
input_string = "高露潔光感白輕悅薄荷牙膏100 79.80"
result = re.sub(r'[-+]?([0-9]*\.?[0-9]+)(?:[eE][-+]?[0-9]+)?',repl,input_string)
print(result)
See the Python demo
Details:
The regex now has a capturing group around the number part (([0-9]*\.?[0-9]+)), which is analyzed inside the repl function.
Inside repl, the contents of Group 1 are split on . to see whether we have a float; if so, we return the length of the fractional part, otherwise the length of the integer.
You need to group the parts of your regex, possibly like this:
import re
def repl(m):
    if m.group(1) is None:  # int
        return "INT_%i" % len(m.group(2))
    else:  # float
        return "FLT_%i" % len(m.group(2))
input_string = "高露潔光感白輕悅薄荷牙膏100 79.80"
numbers = re.sub(r'[-+]?([0-9]*\.)?([0-9]+)([eE][-+]?[0-9]+)?',repl,input_string)
print(numbers)
group 0 is the whole string that was matched (it can be passed to float or int)
group 1 is any digits before the . plus the . itself, if present; otherwise it is None
group 2 is all digits after the . if there is one; otherwise it is just all the digits
group 3 is the exponent part if present, otherwise None
You can get a Python number from it with
def parse(m):
    s = m.group(0)
    # if there is a dot or an exponent part, it must be a float
    if m.group(1) is not None or m.group(3) is not None:
        return float(s)
    else:
        return int(s)
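A short usage sketch showing how parse could be driven by re.finditer on the sample string. The constant name NUMBER is made up for illustration; the pattern and parse are the ones from this answer.

```python
import re

# Same pattern as in the answer: group 1 = digits plus dot, group 3 = exponent.
NUMBER = re.compile(r'[-+]?([0-9]*\.)?([0-9]+)([eE][-+]?[0-9]+)?')


def parse(m):
    s = m.group(0)
    # A dot or an exponent part means the text denotes a float.
    if m.group(1) is not None or m.group(3) is not None:
        return float(s)
    return int(s)


input_string = "高露潔光感白輕悅薄荷牙膏100 79.80"
values = [parse(m) for m in NUMBER.finditer(input_string)]
print(values)  # [100, 79.8]
```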
You are probably looking for something like the code below (of course there are other ways to do it). This one just starts from what you were doing and shows how it can be done.
import re
input_string = u"高露潔光感白輕悅薄荷牙膏100 79.80"
numbers = re.finditer(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?',input_string)
s = input_string
for m in list(numbers)[::-1]:
    num = m.group(0)
    if '.' in num:
        s = "%sFLT_%s%s" % (s[:m.start(0)], str(len(num) - num.index('.') - 1), s[m.end(0):])
    else:
        s = "%sINT_%s%s" % (s[:m.start(0)], str(len(num)), s[m.end(0):])
print(s)
This may look a bit complicated because there are really several simple problems to solve.
For instance, your initial regex finds both ints and floats, but you wish to apply totally different replacements afterward. This would be much more straightforward if you were doing only one thing at a time. But since part of a float may look like an int, doing everything at once may not be such a bad idea; you just have to understand that it requires a secondary check to tell the two cases apart.
Another, more fundamental issue is that you can't really replace anything in a Python string. Python strings are immutable objects, hence you have to make a copy. This is fine anyway, because the format change may need to insert or remove characters, and an in-place replacement wouldn't be efficient.
The last thing to take into account is that the replacements must be made backward: if you change the beginning of the string, the positions of the later matches also change and the next replacement would not land in the right place. If we work backward, all is fine.
Of course I agree that using re.sub() is much simpler.

Can't convert 'list'object to str implicitly Python

I am trying to import the alphabet and split it so that each character is a separate element of a list rather than part of one string. Splitting it works, but when I try to use it to find how many characters are in an inputted word I get the error 'TypeError: Can't convert 'list' object to str implicitly'. Does anyone know how I would go about solving this? Any help appreciated. The code is below.
import string
alphabet = string.ascii_letters
print (alphabet)
splitalphabet = list(alphabet)
print (splitalphabet)
x = 1
j = year3wordlist[x].find(splitalphabet)
k = year3studentwordlist[x].find(splitalphabet)
print (j)
EDIT: Sorry, my explanation is kinda bad, I was in a rush. What I am wanting to do is count each individual letter of a word because I am coding a spelling bee program. For example, if the correct word is 'because', and the user who is taking part in the spelling bee has entered 'becuase', I want the program to count the characters and location of the characters of the correct word AND the user's inputted word and compare them to give the student a mark - possibly by using some kind of point system. The problem I have is that I can't simply say if it is right or wrong, I have to award 1 mark if the word is close to being right, which is what I am trying to do. What I have tried to do in the code above is split the alphabet and then use this to try and find which characters have been used in the inputted word (the one in year3studentwordlist) versus the correct word (year3wordlist).
There is a much simpler solution if you use the in keyword. You don't even need to split the alphabet in order to check if a given character is in it:
import string
year3wordlist = ['asdf123', 'dsfgsdfg435']
total_sum = 0
for word in year3wordlist:
    word_sum = 0
    for char in word:
        if char in string.ascii_letters:
            word_sum += 1
    total_sum += word_sum
# Number of characters that are ASCII letters:
# total_sum == 12
# Number of all characters in all words:
# sum([len(w) for w in year3wordlist]) == 18
EDIT:
Since the OP comments he is trying to create a spelling bee contest, let me try to answer more specifically. The distance between a correctly spelled word and a similar string can be measured in many different ways. One of the most common ways is called 'edit distance' or 'Levenshtein distance'. This represents the number of insertions, deletions or substitutions that would be needed to rewrite the input string into the 'correct' one.
You can find that distance implemented in the Python-Levenshtein package. You can install it via pip:
$ sudo pip install python-Levenshtein
And then use it like this:
from __future__ import division
import Levenshtein
correct = 'because'
student = 'becuase'
distance = Levenshtein.distance(correct, student) # distance == 2
mark = ( 1 - distance / len(correct)) * 10 # mark == 7.14
The last line is just a suggestion on how you could derive a grade from the distance between the student's input and the correct answer.
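If installing the package is not an option, the distance can also be computed with a few lines of plain Python. This is a minimal sketch of the textbook dynamic-programming approach, not the package's implementation; the function name edit_distance is made up for illustration, and the code assumes Python 3 division.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b, computed row by row."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


correct = 'because'
student = 'becuase'
distance = edit_distance(correct, student)   # 2
mark = (1 - distance / len(correct)) * 10
print(distance, round(mark, 2))              # 2 7.14
```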
I think what you need is join:
>>> "".join(splitalphabet)
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
join is a method of str, so you can call it on the separator string
''.join(splitalphabet)
or through the class
str.join('', splitalphabet)
To convert the list splitalphabet to a string so that you can use it with the find() method, you can use separator.join(iterable):
"".join(splitalphabet)
Using it in your code:
j = year3wordlist[x].find("".join(splitalphabet))
I don't know why half the answers are telling you how to put the split alphabet back together...
To count the number of characters in a word that appear in the splitalphabet, do it the functional way:
count = len([c for c in word if c in splitalphabet])
import string
# making letters a set makes "ch in letters" very fast
letters = set(string.ascii_letters)
def letters_in_word(word):
    return sum(ch in letters for ch in word)
Edit: it sounds like you should look at Levenshtein edit distance:
from Levenshtein import distance
distance("because", "becuase") # => 2
While join creates the string from the split list, you do not have to do that, since you can call find with the original string (alphabet). However, I do not think that is what you are trying to do. Note that the find call you are attempting looks for splitalphabet (actually alphabet) within year3wordlist[x], which will fail (result -1) for any ordinary word.
If what you are trying to do is get the index within the alphabet of every letter of a word in the word list, then for each letter of the word, determine its index within alphabet:
j = []
for c in word:
    j.append(alphabet.find(c))
print(j)
On the other hand, if you are trying to find the index within the word of each character of the alphabet, then you need to loop over splitalphabet and look up each individual character in the word. That is:
l = []
for c in splitalphabet:
    j = word.find(c)
    if j != -1:
        l.append((c, j))
print(l)
This gives a list of tuples showing the characters found and their indices.
I just saw that you talk about counting the number of letters. I am not sure what you mean by this as len(word) gives the number of characters in each word while len(set(word)) gives the number of unique characters. On the other hand, are you saying that your word might have non-ascii characters in it and you want to count the number of ascii characters in that word? I think that you need to be more specific in what you want to determine.
If what you are doing is attempting to determine if the characters are all alphabetic, then all you need to do is use the isalpha() method on the word. You can either say word.isalpha() and get True or False or check each character of word to be isalpha()
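The different counts mentioned above can be compared side by side. A small sketch using the words from the question and the earlier answer; note that str.isalpha() is Unicode-aware, so it also accepts letters outside string.ascii_letters.

```python
word = 'because'

print(len(word))       # 7 characters in total
print(len(set(word)))  # 6 unique characters: b, e, c, a, u, s
print(word.isalpha())  # True: every character is a letter

mixed = 'asdf123'
print(mixed.isalpha())                    # False: the digits are not letters
print(sum(ch.isalpha() for ch in mixed))  # 4 letters
```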

making two strings into one

Let's say I have 2 strings
AAABBBCCCCC
and
AAAABBBBCCCC
to make these strings as similar as possible, given that I can only remove characters I should
delete the last C from the first string
delete the last A and the last B from the second string,
so that they become
AAABBBCCCC
What would be an efficient algorithm to find out which characters to remove from each string?
I'm currently racking my brain over a solution that involves taking substrings of one string and looking for them in the other string.
Levenshtein distance can calculate how many changes you need to convert one string into another. With a small change to an implementation's source, you can get not only the distance but also the edits needed.
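Because only deletions are allowed here, the task can also be treated as finding a longest common subsequence (LCS): whatever is not part of the LCS is what has to be removed from each string. The following is a minimal sketch of the standard dynamic-programming LCS, a swapped-in technique rather than a modified Levenshtein implementation; the function name is made up for illustration.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of the strings a and b."""
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the characters kept.
    kept = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            kept.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # drop a[i] from the first string
        else:
            j += 1  # drop b[j] from the second string
    return ''.join(kept)


s1 = 'AAABBBCCCCC'
s2 = 'AAAABBBBCCCC'
print(longest_common_subsequence(s1, s2))  # AAABBBCCCC
```

The table needs len(s1) * len(s2) cells, which is fine for short strings but may be too much for very long ones.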
How about using difflib?
import difflib
s1 = 'AAABBBCCCCC'
s2 = 'AAAABBBBCCCC'
for difference in difflib.ndiff(s1, s2):
    if difference[0] == '+':
        note = 'remove this char from s2'
    elif difference[0] == '-':
        note = 'remove this char from s1'
    else:
        note = 'no change here'
    print('{} {}'.format(difference, note))
This will print out the differences between the two strings that you can then use to remove the differences. Here is the output:
A no change here
A no change here
A no change here
+ A remove this char from s2
+ B remove this char from s2
B no change here
B no change here
B no change here
C no change here
C no change here
C no change here
C no change here
- C remove this char from s1
Don't know if it's the fastest, but as code goes, it is at least short:
import difflib
''.join([c[-1] for c in difflib.Differ().compare('AAABBBCCCCC','AAAABBBBCCCC') if c[0] == ' '])
I think a regular expression can do this. It's a string distance problem.
Say we have two strings:
import re
str1 = 'abc'
str2 = 'aabbcc'
First, I take the shorter one and construct a regular expression like this:
regex = r'(\w*)' + r'(\w*)'.join(list(str1)) + r'(\w*)'
Then we can search:
matches = re.search(regex, str2)
I use round brackets to capture the sections I am interested in.
The groups in matches.groups() are the characters that separate the two strings. From them, I can figure out which characters should be removed.
That's my idea; I hope it helps.
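Put together as a runnable sketch, the idea looks like this. Two caveats, stated as assumptions: the pattern can only match when the shorter string is a subsequence of the longer one (which is not the case for the strings in the question, where both need deletions), and \w only covers word characters. re.escape is added so that characters with a special meaning in a regex are matched literally.

```python
import re

str1 = 'abc'
str2 = 'aabbcc'

# One capturing group before, between, and after the characters of str1.
regex = r'(\w*)' + r'(\w*)'.join(re.escape(ch) for ch in str1) + r'(\w*)'
match = re.search(regex, str2)

# Each group holds the extra characters of str2 at that position.
extra = match.groups() if match else None
print(extra)  # ('a', 'b', 'c', '')
```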
