I want to generate a large number of key value pairs to put in my dictionary using a for loop. For example, the dictionary looks like this:
my_dict = dict()
my_dict["r0"] = "tag 0"
my_dict["r1"] = "tag 1"
my_dict["r2"] = "tag 2"
...
Note that both the key and the value follow a pattern, i.e., the number increases by 1. I cannot write this out 1M times by hand and would prefer an automatic way to initialize my dictionary.
The most efficient way to do this is probably with a dict comprehension:
mydict={'r%s'%n : 'tag %s'%n for n in range(10)}
Which is equivalent to:
mydict=dict()
for n in range(10):
    mydict.update({'r%s'%n:'tag %s'%n})
... but more efficient. Just change range(10) as necessary.
You could also use .format() formatting instead of percent (C-like) formatting in the dict:
mydict={'r{}'.format(n) : 'tag {}'.format(n) for n in range(10)}
If you are using Python 2, replace every range() call with xrange().
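On Python 3.6 or later, the same comprehension can also be written with f-strings. This is only a minimal sketch: the helper name build_tags is made up for illustration and does not come from the question or the answer.

```python
def build_tags(count):
    """Return {'r0': 'tag 0', ..., 'r<count-1>': 'tag <count-1>'}."""
    # build_tags is a hypothetical helper name used only for this sketch.
    return {f"r{n}": f"tag {n}" for n in range(count)}


my_dict = build_tags(1_000_000)
print(len(my_dict))   # 1000000
print(my_dict["r2"])  # tag 2
```

Wrapping the comprehension in a function just makes the size a parameter; the comprehension itself is unchanged.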
my_dict = dict()
for i in range(0, 1000000):
    key = "r{}".format(i)
    value = "tag {}".format(i)
    my_dict[key] = value
EDIT: As pointed out by others, if you are using Python 2, use xrange instead, since it is lazy (and so uses less memory). In Python 3, range behaves like xrange did in Python 2.
my_dict = dict()
for i in xrange(1000000):
    my_dict["r%s" % i] = "tag %s" % i
my_dict = dict()
for x in range(1000000):
    key="r"+str(x)
    val="tag " +str(x)
    my_dict[key]=val
A simple way is to do the following:
#using python format strings
keyf = "r{}"
valf = "tag {}"
#dictionary comprehension
a = {keyf.format(i) : valf.format(i) for i in range(5)}
# can modify range to handle 1,000,000 if you wanted
print(a)
{'r0': 'tag 0', 'r1': 'tag 1', 'r2': 'tag 2', 'r3': 'tag 3', 'r4': 'tag 4'}
If you wanted to quickly add these entries to another dictionary, you would use update, which is the dictionary equivalent of a list's extend.
b = {"x": 1, "y": 2}
b.update(a)
print(b)
{'x': 1, 'y': 2, 'r0': 'tag 0', 'r1': 'tag 1', 'r2': 'tag 2', 'r3': 'tag 3', 'r4': 'tag 4'}
You could also shorten the original comprehension by doing this:
a = {"r{}".format(i) : "tag {}".format(i) for i in range(5)}
You wouldn't even need to define keyf or valf.
Python can build dicts from lists:
$ python2 -c "print dict(map(lambda x: ('r' + str(x), 'tag ' + str(x)), range(10)))"
{'r4': 'tag 4', 'r5': 'tag 5', 'r6': 'tag 6', 'r7': 'tag 7', 'r0': 'tag 0', 'r1': 'tag 1', 'r2': 'tag 2', 'r3': 'tag 3', 'r8': 'tag 8', 'r9': 'tag 9'}
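The one-liner above uses Python 2 syntax (the print statement). A rough Python 3 equivalent, assuming nothing beyond the built-ins, passes an iterable of (key, value) pairs to dict(). A generator expression stands in for map and lambda here, which is a stylistic choice rather than part of the original answer.

```python
# dict() accepts any iterable of (key, value) pairs.
pairs = (("r" + str(x), "tag " + str(x)) for x in range(10))
my_dict = dict(pairs)

print(my_dict["r0"])  # tag 0
print(len(my_dict))   # 10
```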
I have developed a function to strip a dataset using polars. Now I want a test that checks whether the strip was successful. For this I want to use the following logic, but this code is written for pandas. How can I do the same thing using polars?
def test_strip():
    df = pd.DataFrame({
        'ID': [1, 1, 1, 1, 1],
        'Entity': ['Entity 1 ', 'Entity 2', 'Entity 3', 'Entity 4', 'Entity 5'],
        'Table': ['Table 1', ' Table 2', 'Table 3', 'Table 4', None],
        'Local': ['Local 1', 'Local 2 ', None, 'Local 4', 'Local 5'],
        'Global': ['Global 1', ' Global 2', 'Global 3', None, ' Global 5'],
        'mandatory': ['M', 'M', 'M', 'CM ', 'M']
    })
    job = first_job(
        config=test_config,
        copying_list=copying,
    )
    result = job.run(df)
    df_clean, *_ = result
    for column in df_clean.columns:
        for value in df_clean[column]:
            if isinstance(value, str) and (value.startswith(" ") or value.endswith(" ")):
                raise AssertionError(f"Strip failed for column '{column}'")
This should do it...
def test_strip(df):
    bad_rows=df.filter(
        pl.any([pl.col(x).str.contains("(^ )|( $)") for x in df.columns])
    )
    if bad_rows.shape[0]==0:
        return("all good")
    else:
        str_cols=', '.join(bad_rows.melt().filter(pl.col('value').str.contains("(^ )|( $)")).get_column('variable').unique().to_list())
        raise AssertionError(f"Strip failed for column(s): {str_cols}")
The meat and potatoes is the bad_rows assignment. It uses a list comprehension to apply a regex to every column; the regex matches a space right after the beginning-of-string anchor (^ ) or right before the end-of-string anchor ( $). The list is wrapped in pl.any so that a match in any column flags the row. If bad_rows has zero rows, everything worked and the function returns a message saying so. Otherwise it raises the error and tells you which columns were bad.
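To see what the pattern flags without involving polars at all, here is a small sketch that uses Python's standard re module. This is an assumption-laden stand-in: polars uses its own regex engine, and although this simple pattern should behave the same on single-line strings with no trailing newline, that equivalence is not something the answer itself states.

```python
import re

# Same pattern as in the answer: a space at the very start or the very end.
pattern = re.compile(r"(^ )|( $)")

# Sample values taken from the test DataFrame in the question.
samples = ["Entity 1 ", " Table 2", "Table 3", "CM ", "M"]
flagged = [s for s in samples if pattern.search(s)]

print(flagged)  # ['Entity 1 ', ' Table 2', 'CM ']
```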
This is my method. I am having trouble with returning the entire dictionary
def get_col(amount):
    letter = 0
    value = []
    values = {}
    for i in range(amount):
        letter = get_column_letter(i + 1)
        [value.append(row.value) for row in ws[letter]]
        values = dict(zip(letter, [value]))
        value = []
    return values
I want it to output it like this:
{'A': ['ID', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
{'B': ['Name', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
{'C': ['Math', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
But when the return is inside the 'for' loop, it only returns
{'A': ['ID', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
and when the return is outside the 'for' loop, it returns
{'C': ['Math', 'value is 1', 'value is 2', 'value is 3', 'value is 4', 'value is 5', 'value is 6']}
Any help would be appreciated. Thank you!
I am assuming you want all of the data in one dictionary:
values = dict(zip(letter, [value]))
Currently this part of your code overwrites the dictionary on every iteration. That is why you get only the "A" dict when you return before the for loop finishes, and only the "C" dict when you return after the loop finishes: by then the "A" and "B" dicts have been overwritten.
Put the return outside the for loop afterwards, and instead of
values = dict(zip(letter, [value]))
use
values[letter] = value
as this adds a new key/value pair to the existing dict on each iteration instead of replacing the dict.
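The difference can be sketched without a workbook. In the snippet below, the name collect_columns and the sheet list are hypothetical stand-ins for get_col and the openpyxl worksheet; only the assignment by key and the position of the return mirror the advice above.

```python
def collect_columns(columns):
    """Gather every column into one dict: {letter: [cell values]}."""
    values = {}
    for letter, cells in columns:
        # Assigning by key adds an entry instead of replacing the whole dict.
        values[letter] = list(cells)
    return values  # after the loop, so every column is included


# Hypothetical stand-in for the worksheet: (column letter, cell values) pairs.
sheet = [
    ("A", ["ID", "value is 1", "value is 2"]),
    ("B", ["Name", "value is 1", "value is 2"]),
    ("C", ["Math", "value is 1", "value is 2"]),
]
result = collect_columns(sheet)
print(list(result))  # ['A', 'B', 'C']
```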
ps. This is my first post, I hope it helps and is understandable.
edit: If you want a list of three dictionaries, as your desired output shows, do this:
def get_col(amount):
    letter = 0
    value = []
    values = []
    for i in range(amount):
        letter = get_column_letter(i + 1)
        [value.append(row.value) for row in ws[letter]]
        values.append(dict(zip(letter, [value])))
        value = []
    return values
Your desired output is not a single dictionary. It's a list of dictionaries.
In the for loop, you create a new dictionary at each iteration. When you return, you return either the first one you created or the last one, depending on whether the return is inside or outside the loop, respectively.
You need to return a list of the created dictionaries:
def get_col(amount):
    letter = 0
    value = []
    values = {}
    values_list = []
    for i in range(amount):
        letter = get_column_letter(i + 1)
        [value.append(row.value) for row in ws[letter]]
        values = dict(zip(letter, [value]))
        value = []
        values_list.append(values)
    return values_list
I'm trying to add values from List2 if the type is the same in List1. All the data is strings within lists. This isn't the exact data I'm using, just a representation. This is my first programme so please excuse any misunderstandings.
List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'], ['Type A =', 'Value 9', 'Value 10', 'Value 11'], ['Type A =', 'Value 12', 'Value 13']]
Desired result:
new_list =[['Type A =', 'Value 1', 'Value 2', 'Value 3', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 13'], ['Type B =', 'Value 4', 'Value 5']]
Current attempt:
newlist = []
for values in List1:
    for valuestoadd in List2:
        if values[0] == valuestoadd[0]:
            newlist = [List1 + [valuestoadd[1:]]]
        else:
            print("Types don't match")
return newlist
This would work if there weren't two Type A's in List2; as it is, they cause my code to create two instances of List1. If I were able to add the values at a specific index of the list, that would be great, but I can work around that.
It's probably easier to use a dictionary for this:
def merge(d1, d2):
    return {k: v + d2[k] if k in d2 else v for k, v in d1.items()}
d1 = {'A': [1, 2, 3], 'B': [4, 5, 6]}
d2 = {'A': [7, 8, 9], 'C': [0]}
print(merge(d1, d2))
If you must use a list, it's fairly easy to temporarily convert to a dictionary and back to a list:
from collections import defaultdict
def list_to_dict(xss):
    d = defaultdict(list)
    for xs in xss:
        d[xs[0]].extend(xs[1:])
    return d

def dict_to_list(d):
    return [[k, *v] for k, v in d.items()]
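As a sketch of how the two helpers might be combined on the data from the question: convert both lists, keep only the keys that occur in List1, and convert back. The merging line in the middle is an assumption about the intended use, not something this answer spells out.

```python
from collections import defaultdict


def list_to_dict(xss):
    d = defaultdict(list)
    for xs in xss:
        d[xs[0]].extend(xs[1:])
    return d


def dict_to_list(d):
    return [[k, *v] for k, v in d.items()]


List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'],
         ['Type A =', 'Value 9', 'Value 10', 'Value 11'],
         ['Type A =', 'Value 12', 'Value 13']]

d1, d2 = list_to_dict(List1), list_to_dict(List2)
# Keep only the types from List1; .get avoids creating empty entries in d2.
merged = {k: v + d2.get(k, []) for k, v in d1.items()}
new_list = dict_to_list(merged)
print(new_list)
```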
Rather than using List1 + [valuestoadd[1:]], you should be using newlist[0].append(valuestoadd[1:]) so that it doesn't ever create a new list and only appends to the old one. The [0] is necessary so that it appends to the first sublist rather than the whole list.
newlist = List1 #you're doing this already - might as well initialize the new list with this code
for values in List1:
    for valuestoadd in List2:
        if values[0] == valuestoadd[0]:
            newlist[0].append(valuestoadd[1:]) #adds the values on to the end of the first list
        else:
            print("Types don't match")
Output:
[['Type A =', 'Value 1', 'Value 2', 'Value 3', ['Value 9', 'Value 10', 'Value 11'], ['Value 12', 'Value 13']], ['Type B =', 'Value 4', 'Value 5']]
This does, sadly, append each group of values as a nested list. If you want them as individual values, you need to iterate through the lists you're adding and append each value to newlist[0] separately.
This could be achieved with another for loop, like so:
if values[0] == valuestoadd[0]:
    for subvalues in valuestoadd[1:]: #splits the list into subvalues
        newlist[0].append(subvalues) #appends those subvalues
Output:
[['Type A =', 'Value 1', 'Value 2', 'Value 3', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 13'], ['Type B =', 'Value 4', 'Value 5']]
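Two things in the loop above are worth a second look: newlist = List1 makes both names refer to the same list, so List1 is modified too, and newlist[0] always targets the first sublist regardless of which type matched. The following is a hedged variation, not the answer's own code, that copies the sublists first and extends whichever sublist actually matched.

```python
List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'],
         ['Type A =', 'Value 9', 'Value 10', 'Value 11'],
         ['Type A =', 'Value 12', 'Value 13']]

# Copy each sublist so that List1 itself is left untouched.
newlist = [sub[:] for sub in List1]

for target in newlist:
    for source in List2:
        if target[0] == source[0]:
            # extend adds the individual values rather than a nested list.
            target.extend(source[1:])

print(newlist)
```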
I agree with the other answers that it would be better to use a dictionary right away. But if, for some reason, you want to stick to the data structure you have, you could transform it into a dictionary and back:
type_dict = {}
for tlist in List1+List2:
    curr_type = tlist[0]
    type_dict[curr_type] = tlist[1:] if curr_type not in type_dict else type_dict[curr_type]+tlist[1:]
new_list = [[k] + type_dict[k] for k in type_dict]
In the creation of new_list, you can take the keys from a subset of type_dict only if you do not want to include all of them.
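For instance, to keep only the types that appear in List1 (which is what the desired result in the question shows), the keys can be taken from List1 instead of from type_dict. This is a sketch under that assumption; the variable name wanted is made up here, and setdefault is used as a shorter way to build the same dictionary.

```python
List1 = [['Type A =', 'Value 1', 'Value 2', 'Value 3'], ['Type B =', 'Value 4', 'Value 5']]
List2 = [['Type Z =', 'Value 6', 'Value 7', 'Value 8'],
         ['Type A =', 'Value 9', 'Value 10', 'Value 11'],
         ['Type A =', 'Value 12', 'Value 13']]

type_dict = {}
for tlist in List1 + List2:
    # setdefault creates an empty list the first time a type is seen.
    type_dict.setdefault(tlist[0], []).extend(tlist[1:])

# Hypothetical subset: only the types that occur in List1, in List1's order.
wanted = [tlist[0] for tlist in List1]
new_list = [[k] + type_dict[k] for k in wanted]
print(new_list)
```

Here 'Type Z =' still ends up in type_dict but is left out of new_list.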
I have two lists of lists.
I want to get the elements from second list of lists, based on a value from the first list of lists.
If I have simple lists, everything goes smoothly, but once I have a list of lists, I'm missing something at the end.
Here is the code working for two lists (N = names, and V = values):
N = ['name 1', 'name 2','name 3','name 4','name 5','name 6','name 7','name 8','name 9','name 10']
V = ['val 1', 'val 2','val 3','val 4','val 5','val 6','val 7','val 8','val 9','val 10']
bool_ls = []
NN = N
for i in NN:
    if i == 'name 5':
        i = 'y'
    else:
        i = 'n'
    bool_ls.append(i)
# GOOD INDEXES = GI
GI = [i for i, x in enumerate(bool_ls) if x == 'y']
# SELECT THE GOOD VALUES = "GV" FROM V
GV = [V[index] for index in GI]
If I define a function, it works well when applied to the two lists:
def GV(N,V,name):
    bool_ls = []
    NN = N
    for i in NN:
        if i == name:
            i = 'y'
        else:
            i = 'n'
        bool_ls.append(i)
    GI = [i for i, x in enumerate(bool_ls) if x == 'y']
    GV = [V[index] for index in GI]
    return GV
Once I try a "list of lists", I cannot get similar results. My code looks like this so far:
NN = [['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3'], ['name 1', 'name 2','name 3']]
VV = [['val 1', 'val 2', 'val 3'], ['val 1', 'val 2', 'val 3'], ['val 1', 'val 2', 'val 3'], ['val 1', 'val 2', 'val 3'], ['val 1', 'val 2', 'val 3']]
def GV(NN,VV,name):
    bool_ls = []
    NNN = NN
    for j in NNN:
        for i in j:
            if i == name:
                i = 'y'
            else:
                i = 'n'
            bool_ls.append(i)
    # here is where I'm lost
Help greatly appreciated! Thank you.
You can generate pairwise combinations from both lists using zip and then filter them in a list comprehension.
For the flat lists:
def GV(N, V, name):
    return [j for i, j in zip(N, V) if i==name]
For the nested lists, you'll add an extra nesting:
def GV(NN,VV,name):
    return [j for tup in zip(NN, VV) for i, j in zip(*tup) if i==name]
In case you want a list of lists, you can move the nesting into new lists inside the parent comprehension.
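One way to read that last remark is the sketch below, where the inner filter becomes its own comprehension so that each pair of sublists yields its own result list. The function name gv_nested and the sample data are assumptions made for illustration.

```python
def gv_nested(NN, VV, name):
    """Return one list of matching values per (names, values) pair."""
    return [
        [v for n, v in zip(names, vals) if n == name]
        for names, vals in zip(NN, VV)
    ]


NN = [['name 1', 'name 2'], ['name 2', 'name 3']]
VV = [['a', 'b'], ['c', 'd']]
print(gv_nested(NN, VV, 'name 2'))  # [['b'], ['c']]
```

Note that zip stops at the shorter input, so if NN has more sublists than VV, the extra ones are silently ignored.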
There's an easier way to do what your function is doing, but, to answer your question, you just need two loops (one for each level of lists): the first loop iterates over the list of lists, the second iterates over the inner lists and does the somewhat odd 'y' or 'n' thing to choose a value.
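A minimal sketch of the two-loop idea, assuming the names and values line up position by position. It skips the intermediate 'y'/'n' list and collects the matching values directly; the name gv_loops is made up for this example.

```python
def gv_loops(NN, VV, name):
    """Collect every value whose corresponding name equals `name`."""
    result = []
    for names, vals in zip(NN, VV):        # outer loop: pairs of sublists
        for n, v in zip(names, vals):      # inner loop: pairs of elements
            if n == name:
                result.append(v)
    return result


NN = [['name 1', 'name 2', 'name 3'], ['name 1', 'name 2', 'name 3']]
VV = [['val 1', 'val 2', 'val 3'], ['val 4', 'val 5', 'val 6']]
print(gv_loops(NN, VV, 'name 2'))  # ['val 2', 'val 5']
```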
I have a list of tuples and a dictionary of lists as follows.
# List of tuples
lot = [('Item 1', 43), ('Item 4', 82), ('Item 12', 33), ('Item 10', 21)]
# dict of lists
dol = {
    'item_category_one': ['Item 3', 'Item 4'],
    'item_category_two': ['Item 1'],
    'item_category_thr': ['Item 2', 'Item 21'],
}
Now I want to check whether any item in any list within dol appears in any of the tuples in lot. If it does, I want to add another element to the respective tuple.
Currently I am doing this as follows (which looks incredibly inefficient and ugly). I would like to know the most efficient and neat way of achieving this. What are the possibilities?
PS: I am also looking to preserve the order of lot while doing this.
merged = [x[0] for x in lot]
for x in dol:
    for item in dol[x]:
        if item in merged:
            for x in lot:
                if x[0] == item:
                    lot[lot.index(x)] += (True, )
First, build a set of all the values inside the dol structure (this is Python 2 code; in Python 3, use dol.values() instead of dol.itervalues()):
from itertools import chain
dol_values = set(chain.from_iterable(dol.itervalues()))
Now membership testing is efficient, and you can use a list comprehension:
[tup + (True,) if tup[0] in dol_values else tup for tup in lot]
Demo:
>>> from itertools import chain
>>> dol_values = set(chain.from_iterable(dol.itervalues()))
>>> dol_values
set(['Item 3', 'Item 2', 'Item 1', 'Item 21', 'Item 4'])
>>> [tup + (True,) if tup[0] in dol_values else tup for tup in lot]
[('Item 1', 43, True), ('Item 4', 82, True), ('Item 12', 33), ('Item 10', 21)]
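The demo above is a Python 2 session (itervalues and the set([...]) repr). A Python 3 sketch of the same approach, with the data from the question, might look like this; only dol.values() differs from the answer's code.

```python
from itertools import chain

lot = [('Item 1', 43), ('Item 4', 82), ('Item 12', 33), ('Item 10', 21)]
dol = {
    'item_category_one': ['Item 3', 'Item 4'],
    'item_category_two': ['Item 1'],
    'item_category_thr': ['Item 2', 'Item 21'],
}

# Flatten all the lists in dol into one set for fast membership tests.
dol_values = set(chain.from_iterable(dol.values()))

# Build a new list in the same order as lot; lot itself is not modified.
result = [tup + (True,) if tup[0] in dol_values else tup for tup in lot]
print(result)
# [('Item 1', 43, True), ('Item 4', 82, True), ('Item 12', 33), ('Item 10', 21)]
```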