I'm currently automating a browser game. In the game you can upgrade things, and the time an upgrade takes is given to me as a string like this: 2d 4h 20m 19s
I want to compare different upgrade times, so I'd like to convert the time into seconds so it's easier to compare.
My idea was to check which unit is present, get the index of that letter, and then read the digits in front of it, but I think that takes too many lines of code, especially if I have to do it more than once.
My idea would have been something like this:
if "d" in string:
    a = string.index("d")
    if a == 2:
        b = int(string[a-2]) * 10 + int(string[a-1])
        seconds = b * 86400
You can split the string up with .split() giving you a list: ['2d', '4h', '20m', '19s']
Now we can do each part separately.
We can also use a conversion dictionary to give us what number to use depending on the suffix:
mod = {"d": 60*60*24, "h": 60*60, "m": 60, "s": 1}
Then we just sum the list, multiplying the number of each with the mod from above:
sum(int(value[:-1]) * mod[value[-1]] for value in ds.split())
This is equivalent to:
total = 0
for value in ds.split():
    number = int(value[:-1])  # strip off the unit and convert the number to an int
    unit = value[-1]  # take the last character
    total += number * mod[unit]
where ds is the duration string input.
Below I've tried to split the process up into the basic steps:
In [1]: timestring = '2d 4h 20m 19s'
Out[1]: '2d 4h 20m 19s'
In [2]: items = timestring.split()
Out[2]: ['2d', '4h', '20m', '19s']
In [3]: splititems = [(int(i[:-1]), i[-1]) for i in items]
Out[3]: [(2, 'd'), (4, 'h'), (20, 'm'), (19, 's')]
In [4]: factors = {'h': 3600, 'm': 60, 's': 1, 'd': 24*3600}
Out[4]: {'h': 3600, 'm': 60, 's': 1, 'd': 86400}
In [5]: sum(a*factors[b] for a, b in splititems)
Out[5]: 188419
Like all code, this makes some basic assumptions:
Different units are separated by whitespace.
The unit is only one code-point (character).
Allowed units are days, hours, minutes and seconds.
Numbers are integers.
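The assumptions above can also be checked explicitly instead of being relied on silently. The following is a minimal sketch, not part of the original answers: the names SECONDS_PER_UNIT and duration_to_seconds are hypothetical, and raising ValueError for an unexpected part is just one possible choice.

```python
import re

# Seconds per unit; the single-letter suffixes are the ones used in the question.
SECONDS_PER_UNIT = {"d": 86400, "h": 3600, "m": 60, "s": 1}

# One integer followed by exactly one unit letter, e.g. "20m".
_PART = re.compile(r"(\d+)([dhms])")


def duration_to_seconds(text):
    """Convert a string such as '2d 4h 20m 19s' to a number of seconds."""
    total = 0
    for part in text.split():  # parts are separated by whitespace
        match = _PART.fullmatch(part)
        if match is None:
            # Unknown unit, non-integer number, or missing whitespace
            raise ValueError(f"unrecognised duration part: {part!r}")
        number, unit = match.groups()
        total += int(number) * SECONDS_PER_UNIT[unit]
    return total


print(duration_to_seconds("2d 4h 20m 19s"))  # 188419
```

Because the result is a plain int, two upgrade times can then be compared directly with < or >.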
There's a helpful total_seconds() method on timedeltas.
Given
import datetime as dt
names = ["weeks", "days", "minutes", "hours", "seconds"]
s = "2d 4h 20m 19s"
Code
Make a dict of remapped name-value pairs and pass to timedelta:
remap = {n[0]: n for n in names}
name_time = {remap[x[-1]]: int(x[:-1]) for x in s.split()}
td = dt.timedelta(**name_time)
td.total_seconds()
# 188419.0
Implement the function most_popular_character(my_string), which gets the string argument my_string and returns its most frequent letter. In case of a tie, break it by returning the letter of smaller ASCII value.
Note that lowercase and uppercase letters are considered different (e.g., ‘A’ < ‘a’). You may assume my_string consists of English letters only, and is not empty.
Example 1: most_popular_character("HelloWorld") returns 'l'
Example 2: most_popular_character("gggcccbb") returns 'c'
Explanation: cee and gee appear three times each (and bee twice), but cee precedes gee lexicographically.
Hints (you may ignore these):
Build a dictionary mapping letters to their frequency;
Find the largest frequency;
Find the smallest letter having that frequency.
def most_popular_character(my_string):
    char_count = {} # define dictionary
    for c in my_string:
        if c in char_count: #if c is in the dictionary:
            char_count[c] = 1
        else: # if c isn't in the dictionary - create it and put 1
            char_count[c] = 1
    sorted_chars = sorted(char_count) # sort the dictionary
    char_count = char_count.keys() # place the dictionary in a list
    max_per = 0
    for i in range(len(sorted_chars) - 1):
        if sorted_chars[i] >= sorted_chars[i+1]:
            max_per = sorted_chars[i]
            break
    return max_per
My function returns 0 right now. I think the problem is in the last for loop and if statement, but I can't figure out what it is.
If you have any suggestions on how to adjust the code, they would be much appreciated!
Your dictionary doesn't get off to a good start: you forgot to add 1 to the character count, and instead you reset it to 1 each time.
Have a look here to get the gist of getting the maximum value from a dict: https://datagy.io/python-get-dictionary-key-with-max-value/
def most_popular_character(my_string):
    # NOTE: you might want to convert the entire string to upper or lower case first, depending on the use
    # e.g. my_string = my_string.lower()
    char_count = {} # define dictionary
    for c in my_string:
        if c in char_count: #if c is in the dictionary:
            char_count[c] += 1 # add 1 to it
        else: # if c isn't in the dictionary - create it and put 1
            char_count[c] = 1
    # Never underestimate the power of print in debugging
    print(char_count)
    # max(char_count.values()) will give the highest value
    # But there may be more than 1 item with the highest count, so get them all
    max_keys = [key for key, value in char_count.items() if value == max(char_count.values())]
    # Choose the lowest by sorting them and pick the first item
    low_item = sorted(max_keys)[0]
    return low_item, max(char_count.values())
print(most_popular_character("HelloWorld"))
print(most_popular_character("gggcccbb"))
print(most_popular_character("gggHHHAAAAaaaccccbb 12 3"))
Result:
{'H': 1, 'e': 1, 'l': 3, 'o': 2, 'W': 1, 'r': 1, 'd': 1}
('l', 3)
{'g': 3, 'c': 3, 'b': 2}
('c', 3)
{'g': 3, 'H': 3, 'A': 4, 'a': 3, 'c': 4, 'b': 2, ' ': 2, '1': 1, '2': 1, '3': 1}
('A', 4)
So: l and 3, c and 3, A and 4
def most_popular_character(my_string):
    history_l = [l for l in my_string] #each letter in string
    char_dict = {} #creating dict
    for item in history_l: #for each letter in string
        char_dict[item] = history_l.count(item)
    return [max(char_dict.values()),min(char_dict.values())]
I didn't understand the last part about the minimum frequency, so I made this function return the maximum frequency and the minimum frequency as a list!
Use a Counter to count the characters, and use the max function to select the "biggest" character according to your two criteria.
>>> from collections import Counter
>>> def most_popular_character(my_string):
...     chars = Counter(my_string)
...     return max(chars, key=lambda c: (chars[c], -ord(c)))
...
>>> most_popular_character("HelloWorld")
'l'
>>> most_popular_character("gggcccbb")
'c'
Note that using max is more efficient than sorting the entire dictionary, because it only needs to iterate over the dictionary once and find the single largest item, as opposed to sorting every item relative to every other item.
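For comparison, the three hints from the question (build the frequency mapping, find the largest frequency, find the smallest letter with that frequency) can be followed step by step. This is only a sketch of that alternative, using Counter for the first step; it is not meant to replace the single max call above.

```python
from collections import Counter


def most_popular_character(my_string):
    # Step 1: map each letter to its frequency
    counts = Counter(my_string)
    # Step 2: find the largest frequency
    highest = max(counts.values())
    # Step 3: smallest letter (by code point) among those with that frequency
    return min(c for c, n in counts.items() if n == highest)


print(most_popular_character("HelloWorld"))  # l
print(most_popular_character("gggcccbb"))    # c
```

This makes two passes over the dictionary instead of one, which is still linear in the number of distinct letters.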
How would I do this?
Write a Python program that meets the following requirements:
It prompts the user for six alphanumeric characters (A-Z, 0-9) separated by spaces.
It sorts the user input in ascending order, letters first, then numbers.
It prints the list of sorted characters to the screen (separated by spaces).
It is well commented.
Example:
If the program's input is 8 G J 4 5 D, the output would be D G J 4 5 8
I wrote a program, but when the input contained only numbers, it gave me an error. Any help would be appreciated.
Sort with a key so that letters go before decimal characters:
>>> s = "8 G J 4 5 D"
>>> print(*sorted(s.split(), key=lambda c: (c.isdecimal(), c)))
D G J 4 5 8
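To see why this key works, it can help to print the tuples it produces. The sketch below names the key function sort_key (a hypothetical name, equivalent to the lambda above): letters get (False, c) and digits get (True, c), and since False sorts before True, all letters come first, with ties broken by the character itself.

```python
def sort_key(c):
    # (False, letter) sorts before (True, digit); ties are broken by c
    return (c.isdecimal(), c)


s = "8 G J 4 5 D"
for ch in s.split():
    print(ch, sort_key(ch))

result = sorted(s.split(), key=sort_key)
print(*result)  # D G J 4 5 8

# Input that contains only digits works the same way
digits_only = sorted("3 1 2 9 7 5".split(), key=sort_key)
print(*digits_only)  # 1 2 3 5 7 9
```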
If you want your code to be noticeably faster (about 1.5 times in the timings below), you can use a lookup table. I've built it with a one-liner, but it can also be split up. I'm printing it to show what this object looks like. For the digits I write the list out directly; for the letters I use a more compact notation.
decode = {character: index for index, character in enumerate([chr(i) for i in range(ord("A"), ord("Z") + 1)] + ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"])}
print(decode)
s = "8 G J 4 5 D"
print(*sorted(s.split(), key=lambda c: (c.isdecimal(), c)))
print(*sorted(s.split(), key = lambda c: decode[c]))
from timeit import repeat
loops = 500_000
count = 1
print(loops * min(repeat("sorted(s.split(), key=lambda c: (c.isdecimal(), c))", globals=globals(), repeat=loops, number=count)))
print(loops * min(repeat("sorted(s.split(), key = lambda c: decode[c])", globals=globals(), repeat=loops, number=count)))
Output:
{'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4, 'F': 5, 'G': 6, 'H': 7, 'I': 8, 'J': 9, 'K': 10, 'L': 11, 'M': 12, 'N': 13, 'O': 14, 'P': 15, 'Q': 16, 'R': 17, 'S': 18, 'T': 19, 'U': 20, 'V': 21, 'W': 22, 'X': 23, 'Y': 24, 'Z': 25, '0': 26, '1': 27, '2': 28, '3': 29, '4': 30, '5': 31, '6': 32, '7': 33, '8': 34, '9': 35}
D G J 4 5 8
D G J 4 5 8
3.6460114642977715
2.3174798116087914
More ideas?
# NOTE: This code is definitely not the most efficient.
import re # Import the regex standard library to check the input later
input_ = input("Enter six alphanumeric characters, separated by spaces: ") # Get the input
# Check if the input is six alphanumeric characters, separated by spaces.
# [A-Z0-9] in regex matches any character in the A-Z (capitalized) or 0-9 range
# ^ in regex matches the BEGINNING of the string
# $ in regex matches the END of the string
if not re.match(r"^[A-Z0-9] [A-Z0-9] [A-Z0-9] [A-Z0-9] [A-Z0-9] [A-Z0-9]$", input_): # If the string does not match the pattern (that specifies for the aforesaid criteria)
    # Exit the program, printing "Invalid input"
    print("Invalid input")
    exit()
numbers_sorted = [] # Create a list that will later be populated with all the numbers in the input, sorted.
letters_sorted = [] # Create a list that will later be populated with all the Letters in the input, sorted.
# Loop through each character in the input.
# .split() splits the input into a list at every space (e.g. "a b c d e f".split() becomes ["a", "b", "c", "d", "e", "f"])
for character in input_.split():
    if character.isalpha(): # If the character is alphabetic
        # Sort the character into the `letters_sorted` list
        # Loop through the length of the list, such that the code in the loop is executed n times,
        # where n is the length of `letters_sorted`,
        # and the `index` variable starts at 0, and increases per iteration.
        for index in range(len(letters_sorted)):
            # ord() returns the 'character code' of a character (e.g. ord('a') returns 97, ord('b') returns 98)
            # If the `character` (from the outer for loop) alphabetically precedes the
            # character at position `index` of the `letters_sorted` list,
            if ord(letters_sorted[index]) > ord(character):
                letters_sorted.insert(index, character) # Insert the `character` (from the outer loop) right before `index` of `letters_sorted`
                break # Break from the loop as the character has been sorted into the list
        else:
            # If the character has not been sorted into the list
            # (if the character alphabetically succeeds every other character currently in `letters_sorted`)
            letters_sorted.append(character) # Append the character to the very end of the list
    else: # Otherwise (in this case, if the character is numeric)
        # Sort the character into the `numbers_sorted` list
        # See the comments above for sorting alphabetic characters
        # The only difference is not using the ord() function as we can directly compare numbers using less-than or greater-than
        # (also, we are using the `numbers_sorted` list now)
        for index in range(len(numbers_sorted)):
            if numbers_sorted[index] > character:
                numbers_sorted.insert(index, character)
                break
        else:
            numbers_sorted.append(character)
# Now, the lists are 'in order'.
# Finally, combine the lists to achieve a final list that contains letters sorted, then numbers sorted.
final_list = letters_sorted + numbers_sorted
# (Very) finally, convert the list to a string, separating each list entry by a space character:
final_string = " ".join(final_list)
# (Very very) finally, print the string:
print(final_string)
EDIT: Please use any of the other answers above; all of them are much more concise than this one.
I would like to get the common elements of two given strings in a way that accounts for duplicates. That is, if a letter occurs 3 times in the first string and 2 times in the second, then it has to occur 2 times in the common string. The two strings may have different lengths. For example:
s1 = 'aebcdee'
s2 = 'aaeedfskm'
common = 'aeed'
I cannot use the intersection of two sets. What would be the easiest way to find the result 'common'? Thanks.
There are multiple ways to get the desired result. For me the simplest algorithm would be:
Define an empty dict, e.g. d = {}
Iterate through each character of the first string:
if the character is not present in the dictionary, add it with a count of 1;
else increment the count of that character in the dictionary.
Create a variable common = ""
Iterate through the characters of the second string; if the count of that character in the dictionary is greater than 0, decrement its count and add the character to common.
Do whatever you want with common.
The complete code for this problem:
s1 = 'aebcdee'
s2 = 'aaeedfskm'
d = {}
for c in s1:
    if c in d:
        d[c] += 1
    else:
        d[c] = 1
common = ""
for c in s2:
    if c in d and d[c] > 0:
        common += c
        d[c] -= 1
print(common)
You can use two arrays (length 26).
One array is for the 1st string and 2nd array is for the second string.
Initialize both the arrays to 0.
The 1st array's 0th index denotes the number of "a" in 1st string,
1st index denotes number of "b" in 1st string, similarly till - 25th index denotes number of "z" in 1st string.
Similarly, you can create an array for the second string and store the count of
each alphabet in their corresponding index.
s1 = 'aebcdee'
s2 = 'aaeedfs'
Below is the array example for the above s1 and s2 values
Now you can run through the 1st string
s1 = 'aebcdee'
and for each letter find
K = minimum of ( [ count(letter) in Array 1 ], [ count(letter) in Array 2 ] )
and print that letter K times.
Then set that letter's count to 0 in both arrays (because if you don't set it to zero, the algorithm would print the same letter again when it comes up later in the string).
Complexity - O( length(S1) + length(S2) ): each string is read once to fill its array, and S1 is read once more to print.
Note - You can also run the final pass over the shorter string.
In that case the final pass costs O( minimum [ length(S1), length(S2) ] ).
Please let me know if you want the implementation of this.
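A possible implementation of the two-array idea is sketched below. It assumes that both strings contain only the lowercase letters 'a' to 'z' (as in the example), and the function name common_letters is hypothetical.

```python
def common_letters(s1, s2):
    """Two-array approach; assumes only lowercase letters 'a'-'z'."""
    counts1 = [0] * 26  # counts1[0] is the number of 'a' in s1, and so on
    counts2 = [0] * 26  # the same for s2
    for ch in s1:
        counts1[ord(ch) - ord("a")] += 1
    for ch in s2:
        counts2[ord(ch) - ord("a")] += 1

    pieces = []
    for ch in s1:
        i = ord(ch) - ord("a")
        k = min(counts1[i], counts2[i])
        pieces.append(ch * k)  # emit the letter k times (possibly zero)
        # Reset both counts so the letter is not emitted again later
        counts1[i] = counts2[i] = 0
    return "".join(pieces)


print(common_letters("aebcdee", "aaeedfs"))  # aeed
```

The letters in the result appear in the order in which they first occur in s1.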
You can use collections.Counter to count each character in the two strings. For each character that exists in both strings, take the minimum of its two counts, and create the new string by joining the results.
from collections import Counter, defaultdict
from itertools import zip_longest
s1 = 'aebcdee'
s2 = 'aaeedfskm'
# Create a dictionary whose values are lists, so we can append counts for each char
res = defaultdict(list)
# get count of each char
cnt1 = Counter(s1) # -> {'e': 3, 'a': 1, 'b': 1, 'c': 1, 'd': 1}
cnt2 = Counter(s2) # -> {'a': 2, 'e': 2, 'd': 1, 'f': 1, 's': 1, 'k': 1, 'm': 1}
# To append the counts in one step, we can zip the chars of the two counters.
# Because the two counters may have different lengths, we use 'itertools.zip_longest'.
for a,b in zip_longest(cnt1 , cnt2):
    # list(zip_longest(cnt1 , cnt2)) -> [('a', 'a'), ('e', 'e'), ('b', 'd'),
    #                                    ('c', 'f'), ('d', 's'), (None, 'k'),
    #                                    (None, 'm')]
    # Because we may get 'None', check that 'a' and 'b' are not 'None' before appending
    if a: res[a].append(cnt1[a])
    if b: res[b].append(cnt2[b])
# res -> {'a': [1, 2], 'e': [3, 2], 'b': [1], 'd': [1, 1], 'c': [1], 'f': [1], 's': [1], 'k': [1], 'm': [1]}
# If the list of a char has more than one entry, the char occurs in both strings, so we repeat it in the result by the minimum of its counts.
out = ''.join(k* min(v) for k,v in res.items() if len(v)>1)
print(out)
# aeed
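We can use this approach for multiple strings, for example three strings. Note that with the condition len(v) > 1, a character is kept if it occurs in at least two of the strings (here 'd' occurs only in s1 and s2); use len(v) == 3 to require all three.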
s1 = 'aebcdee'
s2 = 'aaeedfskm'
s3 = 'aaeeezzxx'
res = defaultdict(list)
cnt1 = Counter(s1)
cnt2 = Counter(s2)
cnt3 = Counter(s3)
for a,b,c in zip_longest(cnt1 , cnt2, cnt3):
    if a: res[a].append(cnt1[a])
    if b: res[b].append(cnt2[b])
    if c: res[c].append(cnt3[c])
out = ''.join(k* min(v) for k,v in res.items() if len(v)>1)
print(out)
# aeed
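As a shorter alternative, Counter also supports multiset intersection with the & operator, which keeps the minimum count of every element present in both counters. The sketch below uses it for any number of strings; the helper name common_chars is hypothetical, and unlike the len(v) > 1 condition above it keeps only characters that occur in every string.

```python
from collections import Counter
from functools import reduce


def common_chars(*strings):
    """Characters shared by all given strings, repeated by their smallest count.

    Expects at least one string.
    """
    # a & b keeps min(a[x], b[x]) for every x that occurs in both counters
    shared = reduce(lambda a, b: a & b, map(Counter, strings))
    # elements() repeats each character as many times as its count
    return "".join(shared.elements())


print(common_chars("aebcdee", "aaeedfskm"))
print(common_chars("aebcdee", "aaeedfskm", "aaeeezzxx"))
```

For the two strings from the question the result contains one 'a', two 'e' and one 'd'; with the third string added, 'd' drops out because it does not occur in s3.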
s1="ckglter"
s2="ancjkle"
final_list=[]
if(len(s1)<len(s2)):
    for i in s1:
        if(i in s2):
            final_list.append(i)
else:
    for i in s2:
        if(i in s1):
            final_list.append(i)
print(final_list)
You can also do it like this: iterate through the shorter string with a for loop and append each character that also occurs in the other string to the empty list. Note that this does not account for how many times a character occurs in each string.
I have a few string items in a list, which are numbers with a billion or million abbreviation:
list = ["150M", "360M", "2.6B", "3.7B"]
I would like to convert those string items into integers counted in thousands (e.g. 150M -> 150,000, 3.7B -> 3,700,000). Thanks.
You should really show some attempt to solve the problem yourself, but here is a simple example:
multipliers = {'K':1000, 'M':1000000, 'B':1000000000}
def string_to_int(string):
    if string[-1].isdigit(): # check if no suffix
        return int(string)
    mult = multipliers[string[-1]] # look up suffix to get multiplier
    # convert number to float, multiply by multiplier, then make int
    return int(float(string[:-1]) * mult)
testvals = ["150M", "360M", "2.6B", "3.7B"]
print(list(map(string_to_int, testvals)))
You can use list comprehension with a dict mapping:
l = ["150M", "360M", "2.6B", "3.7B"]
m = {'K': 3, 'M': 6, 'B': 9, 'T': 12}
print([int(float(i[:-1]) * 10 ** m[i[-1]] / 1000) for i in l])
This outputs:
[150000, 360000, 2600000, 3700000]
Another solution, using re.sub:
import re
lst = ["150M", "360M", "2.6B", "3.7B"]
tbl = {'K':1, 'M':1_000, 'B':1_000_000}
new_lst = [int(i) for i in (re.sub(r'([\d\.]+)(K|M|B)', lambda v: str(int(float(v.groups()[0]) * tbl[v.groups()[1]])), i) for i in lst)]
print(new_lst)
Prints:
[150000, 360000, 2600000, 3700000]
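All of the answers above go through float before truncating with int. If you would rather not depend on binary floating-point rounding, one option is decimal.Decimal, which parses strings such as "2.6" exactly. This is only a sketch: the names THOUSANDS and to_thousands are hypothetical, and the multipliers are expressed directly in thousands, as the question asks.

```python
from decimal import Decimal

# Multipliers expressed in thousands, so "150M" becomes 150_000
THOUSANDS = {"K": 1, "M": 1_000, "B": 1_000_000}


def to_thousands(text):
    """Convert e.g. '2.6B' to an integer number of thousands."""
    suffix = text[-1].upper()
    if suffix not in THOUSANDS:
        raise ValueError(f"unknown suffix in {text!r}")
    # Decimal keeps the digits exactly as written, so no float rounding occurs
    return int(Decimal(text[:-1]) * THOUSANDS[suffix])


values = ["150M", "360M", "2.6B", "3.7B"]
print([to_thousands(v) for v in values])  # [150000, 360000, 2600000, 3700000]
```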
I have a dictionary with information about single positions: position_info, and info about features feature_info. I have to find in which features (can be multiple) the positions are located, so that I can annotate the positions. What I use now is:
feature_info = [[1, 10, 'a'],[15, 30, 'b'],[40, 60, 'c'],[55, 71, 'd'],[73, 84, 'e']]
position_info = {5:'some info', 16:'some other info', 75:'last info'}
for pos in position_info.keys():
    for info in feature_info:
        if info[0] <= pos < info[1]:
            print(pos, position_info[pos], info[2])
The problem is that feature_info contains 800k+ features, and position_info 150k positions, and this is quite slow. I can optimize it myself a little bit, but probably there are already methods that do it better than I can, but I have not found them.
EDIT
So for example this is one way I can think of to speed it up:
for info in feature_info:
    for pos in position_info.keys():
        if info[0] <= pos < info[1]:
            print(pos, position_info[pos], info[2])
        if pos > info[1]:
            break
If I order the positions, I can break when the position is larger than the end position of a feature (if I make sure the features are ordered too). However, there must be a better way to do this.
How can I implement this in the fastest way?
Comparison of the 3 answers
import timeit
setup = """
from bisect import bisect
import pandas as pd
import random
import numpy as np
position_info = {}
random_number = random.sample(range(9000), 8000)
random_feature_start = random.sample(range(90000), 5000)
random_feature_length = np.random.choice(1000, 5000, replace=True)
for i in random_number:
    position_info[i] = 'test'
feature_info = []
for index, i in enumerate(random_feature_start):
    feature_info.append([i, i+random_feature_length[index],'test'])
"""
p1 = """
sections = sorted(r for a, b, c in feature_info for r in (a,b))
for pos in position_info:
    feature_info[int(bisect(sections, pos) / 2)]
"""
p2 = """
# feature info to dataframe
feature_df = pd.DataFrame(feature_info)
# rename feature df columns
feature_df.rename(index=str, columns={0: "start", 1: "end",2:'name'}, inplace=True)
# positions to dataframe
position_df = pd.DataFrame.from_dict(position_info, orient='index')
position_df['key'] = position_df.index
# merge dataframes
feature_df['merge'] = 1
position_df['merge'] = 1
merge_df = feature_df.merge(position_df, on='merge')
merge_df.drop(['merge'], inplace=True, axis=1)
# filter where key between start and end
merge_df = merge_df.loc[(merge_df.key > merge_df.start) & (merge_df.key < merge_df.end)]
"""
p3 = """
feature_df = pd.DataFrame(feature_info)
position_df = pd.DataFrame(position_info, index=[0])
hits = position_df.apply(lambda col: (feature_df [0] <= col.name) & (col.name < feature_df [1])).values.nonzero()
for f, p in zip(*hits):
    position_info[position_df.columns[p]]
    feature_info[f]
"""
print('bisect:',timeit.timeit(p1, setup=setup, number = 3))
print('panda method 1:',timeit.timeit(p2, setup=setup, number = 3))
print('panda method 2:',timeit.timeit(p3, setup=setup, number = 3))
bisect: 0.08317881799985116
panda method 1: 29.6151025639997
panda method 2: 16.90901438500032
However, the bisect method only works if there are no overlapping features, e.g.
feature_info = [[1, 10, 'a'],[15, 30, 'b'],[40, 60, 'c'],[55, 71, 'd'],[2, 8, 'a_new']]
does not work, which does work with the pandas solution.
The fastest way is probably to use a fast library: pandas. Pandas vectorizes your operations to make them speedy.
import pandas as pd
feature_df = pd.DataFrame(feature_info)
position_df = pd.DataFrame(position_info, index=[0])
hits = position_df.apply(lambda col: (feature_df[0] <= col.name) & (col.name < feature_df[1])).values.nonzero()
# hits holds (feature row index, position column index) pairs
for f, p in zip(*hits):
    print(position_info[position_df.columns[p]], "between", feature_info[f])
The bisect library and function is amazing for things like this.
We basically create a sorted list of ranges that a feature will fall under. Let me know if you need additional logic for checking if a position doesn't fall within a feature range.
Since each feature contributes 2 values to the sorted list (its start, feature_info[n][0], and its end, feature_info[n][1]), we need to divide the bisect result (which is an index position) by 2, using integer division.
from bisect import bisect
feature_info = [[1, 10, 'a'],[15, 30, 'b'],[40, 60, 'c'],[55, 71, 'd'],[73, 84, 'e']]
position_info = {5:'some info', 16:'some other info', 75:'last info'}
sections = sorted(r for a, b, c in feature_info for r in (a,b))
for pos in position_info:
    print(pos, feature_info[bisect(sections, pos) // 2])
This will print the following (you should be able to get all the info you need from this, but I wanted to show the basic result):
5 [1, 10, 'a']
16 [15, 30, 'b']
75 [73, 84, 'e']
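The additional logic for a position that falls outside every feature could look like the sketch below. It bisects on the start values only and then checks the end of the candidate feature. It assumes the features do not overlap (for overlapping features it reports only the one with the latest start), and the helper name find_feature is hypothetical.

```python
from bisect import bisect_right

feature_info = [[1, 10, 'a'], [15, 30, 'b'], [40, 60, 'c'], [55, 71, 'd'], [73, 84, 'e']]
position_info = {5: 'some info', 12: 'in a gap', 16: 'some other info', 75: 'last info'}

# Sort the features by start and keep the start values in a separate list
features = sorted(feature_info)
starts = [start for start, end, name in features]


def find_feature(pos):
    """Return the feature with start <= pos < end, or None if there is none."""
    i = bisect_right(starts, pos) - 1  # last feature whose start is <= pos
    if i >= 0 and features[i][0] <= pos < features[i][1]:
        return features[i]
    return None


for pos in position_info:
    print(pos, position_info[pos], find_feature(pos))
```

Each lookup costs O(log n) after the one-off sort, so the overall cost stays close to that of the plain bisect version.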
Is a textual description OK?
Preprocessing:
sort the positions
convert list of features into a list of "boundaries" (as in, start indices and end indices for each feature) - these will be triples of (index, start/end, feature). Sort this list by index.
Algorithm (two nested for loops):
start with empty set of 'current features'
for each feature boundary:
for each position from the one after the last-visited position up to (but not including) the current boundary's index:
output that this position belongs to each of current features
if the current boundary is a start, add it to the current features
if the current boundary is an end, remove it from the current features
Note that:
the outer for loop will execute exactly once for each boundary,
the inner for loop will execute (in total) exactly once for each position.
This will be fast because you don't need to look at any position or any feature twice in both loops. After the two sorts, it will actually approach O(N+M) complexity if the features don't overlap often (so that the current_features set remains small).
I assumed that there are no duplicate positions; handling these would add a little more complexity but the general approach would still work.
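The description above could be turned into code roughly as follows. This is a sketch under a few assumptions: features are half-open ranges (start <= pos < end, as in the question), the function name annotate is hypothetical, and hits are collected in a list instead of being printed.

```python
def annotate(feature_info, position_info):
    """Return (position, feature name) pairs using a single sweep."""
    # Each feature contributes two boundaries: (index, is_end, feature number).
    # With is_end as 0/1, a start sorts before an end at the same index.
    boundaries = []
    for number, (start, end, _name) in enumerate(feature_info):
        boundaries.append((start, 0, number))
        boundaries.append((end, 1, number))
    boundaries.sort()

    positions = sorted(position_info)
    hits = []
    current = set()  # numbers of the features that are currently open
    i = 0            # index of the next unvisited position
    for index, is_end, number in boundaries:
        # Positions strictly before this boundary lie in all current features
        while i < len(positions) and positions[i] < index:
            for n in sorted(current):
                hits.append((positions[i], feature_info[n][2]))
            i += 1
        if is_end:
            current.discard(number)
        else:
            current.add(number)
    # Positions after the last boundary belong to no feature
    return hits


feature_info = [[1, 10, 'a'], [15, 30, 'b'], [40, 60, 'c'], [55, 71, 'd'], [73, 84, 'e']]
position_info = {5: 'some info', 16: 'some other info', 57: 'overlap', 75: 'last info'}
print(annotate(feature_info, position_info))
# [(5, 'a'), (16, 'b'), (57, 'c'), (57, 'd'), (75, 'e')]
```

Unlike the bisect approach, this reports every feature that contains a position, so overlapping features such as 'c' and 'd' are both found for position 57.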
This also uses pandas. First convert both to dataframes, then merge them, then filter for the rows where the position key lies between the feature's start and end columns.
import pandas as pd
feature_info = [[1, 10, 'a'],[15, 30, 'b'],[40, 60, 'c'],[55, 71, 'd'],[73, 84, 'e']]
position_info = {5:'some info', 16:'some other info', 75:'last info'}
# feature info to dataframe
feature_df = pd.DataFrame(feature_info)
# rename feature df columns
feature_df.rename(index=str, columns={0: "start", 1: "end",2:'name'}, inplace=True)
# positions to dataframe
position_df = pd.DataFrame.from_dict(position_info, orient='index')
position_df['key'] = position_df.index
# merge dataframes
feature_df['merge'] = 1
position_df['merge'] = 1
merge_df = feature_df.merge(position_df, on='merge')
merge_df.drop(['merge'], inplace=True, axis=1)
# filter where key between start and end
merge_df = merge_df.loc[(merge_df.key >= merge_df.start) & (merge_df.key < merge_df.end)]