I have a dictionary named dic in Python, where the keys are strings and the corresponding values are integers.
dic = {'a': 2, 'b': 3, 'c': 4, 'd': 5}
Note that 2+3+4+5=14.
Now I need to impose an order on this dictionary, say 'a', then 'b', then 'c', then 'd'. Then I randomly generate 3 numbers from the integers 1 to 14, say 2, 9, and 10. These 3 numbers would correspond to the keys 'a', 'c', and 'd' (since 'a' covers 1-2, 'b' covers 3-5, 'c' covers 6-9, and 'd' covers 10-14).
All I can think of is to use dic.keys() and dic.values() to create two corresponding lists, compute the prefix sums, and do the lookup by hand.
Is there a built-in Python function to do this?
You could consider using random.sample, like so:
>>> import random
>>> p = 2*['a'] + 3*['b'] + 4*['c'] + 5*['d']
>>> random.sample(p, 3)
['b', 'b', 'a']
From the docs, random.sample returns a k length list of unique elements chosen from the population sequence or set. It is used for random sampling without replacement. Therefore, the largest sample you can get from a population of size 14 is size 14, and a sample of size 14 is guaranteed to be a permutation of p.
Alternatively you can use your method of selecting an integer between 1 and 14 inclusive to make weighted random choices using p like this:
>>> k = random.choice(range(1, 15))
>>> p[k-1]
'b'
or, if you don't need the "index" of the selected element:
>>> random.choice(p)
'c'
However, note that by using random.choice repeatedly, you will be sampling with replacement (unless you have some mechanism for removing selected elements from the population). This may be what you want, though.
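If sampling with replacement is acceptable, a minimal sketch using random.choices (available since Python 3.6) avoids building the expanded list p. The names keys, weights, and picks are illustrative and not from the original post:

```python
import random

dic = {'a': 2, 'b': 3, 'c': 4, 'd': 5}

keys = list(dic)               # population: 'a', 'b', 'c', 'd'
weights = list(dic.values())   # relative weights: 2, 3, 4, 5

# Draw 3 keys with replacement; each key is chosen with
# probability proportional to its weight.
picks = random.choices(keys, weights=weights, k=3)
print(picks)  # output varies from run to run
```

Because the draws are independent, the same key can appear more than once, just as with repeated random.choice(p).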
To construct your population p dynamically using your dictionary, you can do something like this:
>>> sum((w*[k] for k, w in dic.items()), [])
['d', 'd', 'd', 'd', 'd', 'a', 'a', 'c', 'c', 'c', 'c', 'b', 'b', 'b']
Note that the letters will not necessarily be in the order shown above! In any case, you can sort them easily using Python's built-in sorted function.
>>> sum(sorted(w*[k] for k, w in dic.items()), [])
['a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'd', 'd', 'd', 'd', 'd']
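The prefix-sum idea from the question can also be written without expanding the population. The following is only a sketch under the assumption that the keys should be ordered alphabetically; key_for is a hypothetical helper name:

```python
import bisect
import itertools

dic = {'a': 2, 'b': 3, 'c': 4, 'd': 5}

keys = sorted(dic)  # impose the order 'a', 'b', 'c', 'd'
# Running totals of the weights in that order: [2, 5, 9, 14]
cumulative = list(itertools.accumulate(dic[k] for k in keys))

def key_for(n):
    """Map an integer n in 1..sum(weights) to the key whose range contains n."""
    return keys[bisect.bisect_left(cumulative, n)]

picked = [key_for(n) for n in (2, 9, 10)]
print(picked)  # ['a', 'c', 'd']
```

This keeps memory proportional to the number of keys rather than to the sum of the weights, which matters when the weights are large.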
Related
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
The common items are:
c = ['a', 'b', 'a']
The code:
for e in my_list:
    if e in my_list_2:
        c.append(e)
...
If my_list is long, this would be very inefficient. If I convert both lists into sets and then use the set intersection() method to get the common items, I will lose the duplicates in my_list.
How to deal with this efficiently?
A dict is already a hash map, so lookups are practically as efficient as with a set, and you may not need to do any extra work collecting the values. If it weren't, you could pack the values into a set and check against that instead.
However, a large improvement may be to write a generator for the values rather than creating a new intermediate list, so that you iterate over them where you actually want the values:
def foo(src_dict, check_list):
    for value in check_list:
        if value in src_dict:
            yield value
With the edit, you may find you're better off packing all the inputs into a set:
def foo(src_list, check_list):
    hashmap = set(src_list)
    for value in check_list:
        if value in hashmap:
            yield value
If you know a lot about the inputs, you can do better, but that's an unusual case. For example, if the lists are ordered you could bisect; or, if you have a huge list to verify or very few values to check against it, you may find some efficiency in exploiting the ordering and in deciding whether to build a set at all.
I am not sure about time efficiency, but, personally speaking, list comprehension would always be more of interest to me:
[x for x in my_list if x in my_list_2]
Output
['a', 'b', 'a']
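As a hedged variation on the comprehension above, converting my_list_2 to a set first makes each membership test a hash lookup instead of a list scan, while still keeping the duplicates and the order of my_list. The name lookup is illustrative:

```python
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']

# Build the set once; "x in lookup" is then O(1) on average.
lookup = set(my_list_2)

# Iterate over my_list itself, so duplicates and order are preserved.
c = [x for x in my_list if x in lookup]
print(c)  # ['a', 'b', 'a']
```

This assumes the items are hashable, which holds for strings.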
First, utilize the set.intersection() method to get the intersecting values in the list. Then, use a nested list comprehension to include the duplicates based on the original list's count on each value:
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = [x for x in set(my_list).intersection(set(my_list_2)) for _ in range(my_list.count(x))]
print(c)
The above may be slower than just
my_list = ['a', 'b', 'a', 'd', 'e', 'f']
my_list_2 = ['a', 'b', 'c']
c = []
for e in my_list:
if e in my_list_2:
c.append(e)
print(c)
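But when the lists are significantly larger, the code block using set.intersection() can be significantly faster, because the loop version scans my_list_2 on every iteration. Note that my_list.count(x) is itself a full pass over my_list for each common value, and that the result follows set iteration order rather than the order of my_list.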
Sorry for not reading the post carefully; it is no longer possible to delete this answer. However, here is an attempt at a solution:
c = lambda my_list, my_list_2: (my_list, my_list_2, list(set(my_list).intersection(set(my_list_2))))
print("(list_1,list_2,duplicate_items) ->", c(my_list, my_list_2))
Output:
(list_1,list_2,duplicate_items) -> (['a', 'b', 'a', 'd', 'e', 'f'], ['a', 'b', 'c'], ['b', 'a'])
Or it can be:
[i for i in my_list if i in my_list_2]
output:
['a', 'b', 'a']
I have a list lst and I want to iteratively transform each element in the list using a function f:
lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
def f(x):
    # do something
    return x
I can iteratively apply this transformation over the list:
new_lst = [f(x) for x in lst]
So far so good. Now we want to introduce sequencing for processing the elements in lst. The dict ensure_transformed maps each element to the list of elements that need to be transformed before it:
ensure_transformed = {'a' : ['b', 'e'],
'b' : ['e', 'f'],
'c' : ['a', 'd'],
'd' : [],
'e' : [],
'f' : []
}
The interpretation is: before transforming a, transform b and e; before transforming b, transform e and f; and so on.
(The meaning of 'd' : [] is that no elements need to be processed before d. g is absent from ensure_transformed implying g shouldn't be transformed.)
How can I transform lst using function f while ensuring sequencing from ensure_transformed?
Options on the table:
Use recursion to go through list while keeping track of elements that have already been transformed. This approach has become messy to manage.
Re-order the list using the dict and then iterate over the list. I haven't tried this approach yet, but it seems promising.
I'm open to other approaches.
As mentioned by @CristianCiupitu in the comments, this is now very easy to do with Python 3.9+ by using the graphlib module:
import graphlib
lst = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
def f(x): return x.upper() #for example
ensure_transformed = {'a' : ['b', 'e'],
'b' : ['e', 'f'],
'c' : ['a', 'd'],
'd' : [],
'e' : [],
'f' : []
}
ts = graphlib.TopologicalSorter(ensure_transformed)
processed = [f(x) for x in ts.static_order()]
print(processed)
#prints ['E', 'F', 'D', 'B', 'A', 'C']
The ensure_transformed dictionary describes a directed graph where {'a': ['b', 'e']} means that there is an edge b → a and an edge e → a.
Assuming that the graph has no cycles, your task is equivalent to applying topological sorting to it, for which you can find various algorithms.
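For Python versions without graphlib, one of those algorithms is a depth-first search. The following is only a sketch, not the approach from any library; topological_order and its internal names are hypothetical:

```python
ensure_transformed = {'a': ['b', 'e'],
                      'b': ['e', 'f'],
                      'c': ['a', 'd'],
                      'd': [],
                      'e': [],
                      'f': []}

def topological_order(dependencies):
    """Return the nodes so that every prerequisite comes before its dependent."""
    order = []
    state = {}  # node -> "visiting" while on the stack, "done" once emitted

    def visit(node):
        if state.get(node) == "done":
            return
        if state.get(node) == "visiting":
            raise ValueError(f"cycle detected at {node!r}")
        state[node] = "visiting"
        # Emit all prerequisites first, then the node itself.
        for prerequisite in dependencies.get(node, []):
            visit(prerequisite)
        state[node] = "done"
        order.append(node)

    for node in dependencies:
        visit(node)
    return order

order = topological_order(ensure_transformed)
print(order)
```

The exact order can differ from the one graphlib produces, since a graph usually has several valid topological orders. The recursion depth grows with the longest dependency chain, so very deep graphs would need an iterative variant.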
I'm trying to create a list of lists from a single list. I'm able to do this if the new sublists all have the same number of elements; however, this will not always be the case.
As said earlier, the function below works when the sublists have the same number of elements.
I've tried using regular expressions to determine if an element matches a pattern using
pattern2=re.compile(r'\d\d\d\d\d\d') because the first value in each of my new sublists will always be 6 digits, and it will be the only one that follows that format. However, I'm not sure of the syntax for getting it to stop at the next match and start another list.
def chunks(l,n):
    for i in range(0,len(l),n):
        yield l[i:i+n]
The code above works if the list of lists will contain the same number of elements
Below is what I expect.
OldList=[111111,a,b,c,d,222222,a,b,c,333333,a,d,e,f]
DesiredList=[[111111,a,b,c,d],[222222,a,b,c],[333333,a,d,e,f]]
Many thanks indeed.
Cheers
There is likely a much more efficient way to do this (with fewer loops), but here is one approach: find the indexes of the breakpoints, then slice the list from index to index, appending None to the end of the indexes list to capture the remaining items. If your 6-digit numbers are really strings, then you could eliminate the str() inside re.match().
import re
d = [111111,'a','b','c','d',222222,'a','b','c',333333,'a','d','e','f']
indexes = [i for i, x in enumerate(d) if re.match(r'\d{6}', str(x))]
groups = [d[s:e] for s, e in zip(indexes, indexes[1:] + [None])]
print(groups)
# [[111111, 'a', 'b', 'c', 'd'], [222222, 'a', 'b', 'c'], [333333, 'a', 'd', 'e', 'f']]
You can use a fold.
First, define a function to locate the start flag:
>>> def is_start_flag(v):
...     return len(v) == 6 and v.isdigit()
That will be useful if the flags are not exactly what you expected them to be, or to exclude some false positives, or even if you need a regex.
Then use functools.reduce:
>>> L = d = ['111111', 'a', 'b', 'c', 'd', '222222', 'a', 'b', 'c', '333333', 'a', 'd', 'e', 'f']
>>> import functools
>>> functools.reduce(lambda acc, x: acc+[[x]] if is_start_flag(x) else acc[:-1]+[acc[-1]+[x]], L, [])
[['111111', 'a', 'b', 'c', 'd'], ['222222', 'a', 'b', 'c'], ['333333', 'a', 'd', 'e', 'f']]
If the next element x is the start flag, then append a new list [x] to the accumulator. Otherwise, add the element to the current list, i.e. the last list of the accumulator.
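The same fold can be written as a plain loop that mutates the last group in place instead of rebuilding the accumulator at every step. This is a sketch only; split_on_flags is a hypothetical name, and the start-flag test assumes the flags are strings as in the example above:

```python
def is_start_flag(v):
    # Assumption: flags are strings made of exactly six digits.
    return isinstance(v, str) and len(v) == 6 and v.isdigit()

def split_on_flags(items, is_start):
    """Start a new group at every item for which is_start(item) is true."""
    groups = []
    for item in items:
        if is_start(item) or not groups:
            groups.append([item])      # open a new group
        else:
            groups[-1].append(item)    # extend the current group
    return groups

L = ['111111', 'a', 'b', 'c', 'd', '222222', 'a', 'b', 'c',
     '333333', 'a', 'd', 'e', 'f']
groups = split_on_flags(L, is_start_flag)
print(groups)
```

Unlike the reduce version, this does not fail if the input does not begin with a start flag: the leading items simply form the first group.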
I have a nested list of around 1 million records like:
l = [['a', 'b', 'c', ...], ['d', 'b', 'e', ...], ['f', 'z', 'g', ...],...]
I want to get the distinct values found at the second position (index 1) of the inner lists, so that my resulting list looks like:
resultant = ['b', 'z', ...]
I have tried nested loops but it's not fast. Any help will be appreciated!
Since you want the unique items, you can use collections.OrderedDict.fromkeys() to keep both the order and the uniqueness of the items (the keys are stored in a hash table), and use zip() to get the second items.
from collections import OrderedDict
list(OrderedDict.fromkeys(zip(*my_lists)[1]))
In Python 3.x, since zip() returns an iterator, you can do this:
colls = zip(*my_lists)
next(colls)  # skip the first column
list(OrderedDict.fromkeys(next(colls)))
Or use a generator expression within fromkeys():
list(OrderedDict.fromkeys(i[1] for i in my_lists))
Demo:
>>> lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
>>>
>>> list(OrderedDict().fromkeys(sub[1] for sub in lst))
['b', 'z']
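On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea works without the import. A minimal sketch, with unique_second as an illustrative name:

```python
lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_second = list(dict.fromkeys(sub[1] for sub in lst))
print(unique_second)  # ['b', 'z']
```

The generator expression makes a single pass over the outer list, so no intermediate list of second items is built.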
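You can unzip the list of lists and then pick the second tuple and pass it to set, as below (this is Python 2; in Python 3, zip() returns an iterator, so wrap it in list() before indexing):
This code takes 4.05311584473e-06 milliseconds on my laptop.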
list(set(zip(*lst)[1]))
Input :
lst = [['a', 'b', 'c'], ['d', 'b', 'e'], ['f', 'z', 'g']]
Output :
['b', 'z']
Would that work for you?
result = set([inner_list[1] for inner_list in l])
I can think of two options.
Set comprehension:
res = {x[1] for x in l}
I think numpy arrays work faster than list/set comprehensions, so converting this list to an array and then using array functions may be faster (this assumes all inner lists have the same length). Here:
import numpy as np
res = np.unique(np.array(l)[:, 1])
Let me explain: np.array(l) converts the list to a 2D array, then [:, 1] takes the second column (counting from 0), which consists of the second item of each sublist in the original l, and finally np.unique keeps only the unique values.
I'm new to programming in general, so looking to really expand my skills here. I'm trying to write a script that will grab a list of strings from an object, then order them based on a template of my design. Any items not in the template will be added to the end.
Here's how I'm doing it now, but could someone suggest a better/more efficient way?
originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
listFinal = []
for thing in listTemplate:
    if thing in originalList:
        listFinal.append(thing)
        originalList.pop(originalList.index(thing))
for thing in originalList:
    listFinal.append(thing)
    originalList.pop(originalList.index(thing))
Try this:
originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
order = { element:index for index, element in enumerate(listTemplate) }
sorted(originalList, key=lambda element: order.get(element, float('+inf')))
=> ['a', 'b', 'c', 'd', 'z']
This is how it works:
First, we build a dictionary indicating, for each element in listTemplate, its relative order with respect to the others. For example, a is 0, b is 1, and so on.
Then we sort originalList. If one of its elements is present in the order dictionary, we use its relative position for ordering. If it's not present, we return positive infinity, which guarantees that the elements not in listTemplate end up at the end, keeping their original relative order because sorted is stable.
The solution in the question, although correct, is not very pythonic. In particular, whenever you have to build a new list, try to use a list comprehension instead of explicit looping/appending. And it's not a good practice to "destroy" the input list (using pop() in this case).
You can create a dict from the listTemplate list; that way the expensive (O(N)) list.index operations can be reduced to O(1) lookups.
>>> lis1 = ['b', 'a', 'c', 'z', 'd']
>>> lis2 = ['a', 'b', 'c', 'd']
Use enumerate to create a dict with the items as keys (assuming the items are hashable) and their indexes as values.
>>> dic = { x:i for i,x in enumerate(lis2) }
Now dic looks like:
{'a': 0, 'c': 2, 'b': 1, 'd': 3}
Now, for each item in lis1, we need to look up its index in dic; if the key is not found, we return float('inf').
Function used as key:
def get_index(key):
    return dic.get(key, float('inf'))
Now sort the list:
>>> lis1.sort(key=get_index)
>>> lis1
['a', 'b', 'c', 'd', 'z']
For the final step, you can just use:
listFinal += originalList
and it will add these items to the end.
There is no need to create a new dictionary at all:
>>> len_lis1=len(lis1)
>>> lis1.sort(key = lambda x: lis2.index(x) if x in lis2 else len_lis1)
>>> lis1
['a', 'b', 'c', 'd', 'z']
Here is a way that has better computational complexity:
# add all elements of originalList not found in listTemplate to the back of listTemplate
s = set(listTemplate)
listTemplate.extend(el for el in originalList if el not in s)
# now sort
rank = {el:index for index,el in enumerate(listTemplate)}
listFinal = sorted(originalList, key=rank.get)
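Note that the snippet above extends listTemplate in place. If the template must stay untouched, a sketch along the same lines can rank unknown elements with len(template) instead; order_by_template is a hypothetical helper name:

```python
def order_by_template(items, template):
    """Return items sorted by their position in template; unknown items go last."""
    rank = {element: index for index, element in enumerate(template)}
    # Unknown elements all get the same rank, so the stable sort
    # keeps them in their original relative order.
    return sorted(items, key=lambda element: rank.get(element, len(template)))

originalList = ['b', 'a', 'c', 'z', 'd']
listTemplate = ['a', 'b', 'c', 'd']
listFinal = order_by_template(originalList, listTemplate)
print(listFinal)  # ['a', 'b', 'c', 'd', 'z']
```

Neither input list is modified, since sorted returns a new list and the template is only read.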