Initialize a Python dictionary containing strings and lists - python

What's the best way to initialize (e.g. before a loop) a dictionary that will contain strings and a list in each item?
For instance:
dict = [("string1", [1,2]), ("string2", [5,6]),..]
So:
dict["string1"]
returns:
[1,2]

Please don't call a variable dict (or list, or set, etc.), because that shadows the built-in of the same name and prevents you from using it.
data = {
"string1": [1,2],
"string2": [5,6]
}
print(data["string1"])  # => [1, 2]
If you're a bit of a masochist, you can also try
data = dict([("string1",[1,2]), ("string2", [5,6])])
or (per cval's example)
data = dict(zip(strings, numbers))

Just pass the list to the dict() function and it will return a dictionary. And never use dict as a variable name:
>>> dic = [("string1", [1,2]), ("string2", [5,6])]
>>> dict(dic)
{'string2': [5, 6], 'string1': [1, 2]}

Python 2.7 and Python 3, dict comprehension:
data = {entry[0]: entry[1] for entry in data}
or unpack your structures:
data = {k: v for (k, v) in data}
Python 2.6 and before, generator expression:
data = dict((entry[0], entry[1]) for entry in data)
or again unpack:
data = dict((k, v) for (k, v) in data)
Since your data is already in the form of (key, value) sequences, you can generate the dict in one go by passing the structure to the dict type, without an explicit dict comprehension or generator expression:
dict(data)
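As a quick check that the approaches above are interchangeable, here is a minimal sketch. The sample values mirror the question; the variable names are just for illustration.

```python
# Sample data mirroring the question (names are illustrative only).
pairs = [("string1", [1, 2]), ("string2", [5, 6])]
strings = ["string1", "string2"]
numbers = [[1, 2], [5, 6]]

from_literal = {"string1": [1, 2], "string2": [5, 6]}  # dict literal
from_pairs = dict(pairs)                               # list of (key, value) tuples
from_zip = dict(zip(strings, numbers))                 # two parallel lists
from_comprehension = {k: v for k, v in pairs}          # dict comprehension

# All four constructions produce equal dictionaries.
print(from_literal == from_pairs == from_zip == from_comprehension)  # True
print(from_pairs["string1"])  # [1, 2]
```

Note that dict(pairs) stores the same list objects that are in pairs, not copies, so mutating one is visible through the other.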

strings = ["string1","string2"]
numbers = [[1, 2], [5, 6]]
data = {k:v for k,v in zip(strings, numbers)}
And I don't recommend using built-in names (such as dict) as variable names.

Related

python list of lists to dict when key appear many times

I know how to write something simple and slow with a loop, but I need it to run very fast at large scale.
input:
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
desired output:
d = {1 : ["txt1", "txt2"], 2 : "txt3"}
Is there something built into Python that makes dict() extend the value for a repeated key instead of replacing it?
dict(list(zip(lst[0], lst[1])))
One option is to use dict.setdefault:
out = {}
for k, v in zip(*lst):
    out.setdefault(k, []).append(v)
Output:
{1: ['txt1', 'txt2'], 2: ['txt3']}
If you want the element itself for singleton lists, one way is adding a condition that checks for it while you build an output dictionary:
out = {}
for k,v in zip(*lst):
    if k in out:
        if isinstance(out[k], list):
            out[k].append(v)
        else:
            out[k] = [out[k], v]
    else:
        out[k] = v
or if lst[0] is sorted (like it is in your sample), you could use itertools.groupby:
from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
    length = len([*v])
    if length > 1:
        out[k] = lst[1][pos:pos+length]
    else:
        out[k] = lst[1][pos]
    pos += length
Output:
{1: ['txt1', 'txt2'], 2: 'txt3'}
But as @timgeb notes, this is probably not what you want: afterwards you'll have to check the data type (list or not) each time you access a value in this dictionary, which is an unnecessary problem that you can avoid by keeping all values as lists.
If you're dealing with large datasets it may be useful to add a pandas solution.
>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s
1 txt1
1 txt2
2 txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}
Note that this also produces lists for single elements (e.g. ['txt3']) which I highly recommend. Having both lists and strings as possible values will result in bugs because both of those types are iterable. You'd need to remember to check the type each time you process a dict-value.
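To illustrate the kind of bug described above, here is a small sketch. The dictionaries and the count_items helper are made up for the example. Because a string is itself a sequence, code written for lists silently "works" on it and gives the wrong answer instead of raising an error.

```python
# Hypothetical data: one dict mixes lists and bare strings, the other does not.
mixed = {1: ["txt1", "txt2"], 2: "txt3"}      # singleton stored as a bare string
uniform = {1: ["txt1", "txt2"], 2: ["txt3"]}  # every value is a list


def count_items(d):
    """Count how many entries each key holds (assumes every value is a list)."""
    return {k: len(v) for k, v in d.items()}


print(count_items(mixed))    # {1: 2, 2: 4}  <- 4 is the length of the string "txt3"
print(count_items(uniform))  # {1: 2, 2: 1}
```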
You can use a defaultdict to group the strings by their corresponding key, then make a second pass through the list to extract the strings from singleton lists. Regardless of what you do, you'll need to access every element in both lists at least once, so some iteration structure is necessary (and even if you don't explicitly use iteration, whatever you use will almost definitely use iteration under the hood):
from collections import defaultdict
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    result[key].append(value)
for key in result:
    if len(result[key]) == 1:
        result[key] = result[key][0]
print(dict(result)) # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}

Function to iterate through multi level list nested with dicts does not work properly

I want to compare two lists and verify if they are the same. Though the lists might have items in a different order so just comparing list1 == list2 would not work.
Those lists can be nested into multi-level with dicts, strings, integers, and lists, so just using sorted(list1) == sorted(list2) would not work either.
I'm trying to create a function that iterates through a multi-level list and sort each list inside of it in ascending order.
The results I'm having so far only sorts the first level main list. All the other "sub lists" get sorted inside the function but when I print the final result they are unsorted the same way before using the function.
Function created so far:
def sort_multilevel_obj(items):
    if isinstance(items, dict):
        for v in items.values():
            if isinstance(v, list):
                v = sorted(v)
                v = sort_multilevel_obj(v)
    if isinstance(items, list):
        items = sorted(items)
        for i in items:
            i = sort_multilevel_obj(i)
    return items
Example of multi-level list:
mylist = [
'string1',
[
{
1:'one',
2:'two',
3:[4,2,'ddd'],
4:{'x':'xx'}
},
'substring'
],
{
'somekey':7,
'anotherkey':[
'ccccccc',
100,
4,
'blabla'
]
}
]
When I pass the list into the function the result I got is:
[{'z': ['ccccccc', 100, 4, 'afsafas'], 'f': 7}, [{1: 'one', 2: 'two', 3: [4, 2, 'ddd'], 4: {'x': 'xx'}}, 'substring'], 'string1']
The first list (string, list, dict) is sorted properly (to dict, list, string), but the list inside the dict (['ccccccc', 100, 4, 'afsafas']) should be returned as [4, 100, 'afsafas', 'ccccccc'], and this just doesn't work.
What am I doing wrong?
Try using v.sort() instead of v = sorted(v). v.sort() mutates the original list while sorted(v) returns a new object. See this answer for another explanation about the difference between v.sort() and sorted(v).
v = sorted(v) doesn't change the value in the list/dict itself because the variable v is only a reference to the original value in the dict/list. When you write v = <something>, you point that variable at another object and so leave the original value unchanged.
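The difference between rebinding a name and mutating the object can be seen in a short sketch. The sort_nested helper below is a hypothetical rewrite, not the asker's function: it builds and returns new containers instead of relying on in-place changes. It also assumes Python 3, where int and str cannot be compared directly, so it orders items by their repr(). That ordering is arbitrary but deterministic, and it differs from the Python 2 ordering shown in the question.

```python
container = {"nums": [3, 1, 2]}

# Rebinding: v is pointed at a new sorted list; the dict still holds the old one.
for v in container.values():
    v = sorted(v)
print(container)  # {'nums': [3, 1, 2]}

# Mutation: v.sort() changes the list object that the dict refers to.
for v in container.values():
    v.sort()
print(container)  # {'nums': [1, 2, 3]}


def sort_nested(obj):
    """Return a copy of obj with every nested list sorted (hypothetical helper).

    Assumption: items are ordered by repr(), because Python 3 cannot
    compare values of mixed types such as int and str.
    """
    if isinstance(obj, dict):
        # Rebuild the dict in a canonical key order so its repr is stable.
        items = sorted(obj.items(), key=lambda kv: repr(kv[0]))
        return {k: sort_nested(v) for k, v in items}
    if isinstance(obj, list):
        return sorted((sort_nested(x) for x in obj), key=repr)
    return obj


a = ['string1', [{3: [4, 2, 'ddd']}, 'substring'], {'key': [100, 4, 'blabla']}]
b = [{'key': ['blabla', 4, 100]}, ['substring', {3: ['ddd', 2, 4]}], 'string1']
print(sort_nested(a) == sort_nested(b))  # True
```

Returning new objects at every level avoids the rebinding trap entirely, at the cost of copying the structure.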

Weird behavior when using bool as dict key in python [duplicate]

I need a dictionary that has two keys with the same name, but different values. One way I tried to do this is by creating a class to wrap each key name of my dictionary, so that the keys would be different objects:
names = ["1", "1"]
values = [[1, 2, 3], [4, 5, 6]]
dict = {}
class Sets(object):
    def __init__(self,name):
        self.name = name
for i in range(len(names)):
    dict[Sets(names[i])] = values[i]
print dict
The result I was expecting was:
{"1": [1, 2, 3], "1": [4, 5, 6]}
But instead it was:
{"1": [4, 5, 6]}
[EDIT]
So I discovered that keys in a dictionary are meant to be unique; having two keys with the same name is an incorrect use of a dictionary. So I need to rethink my problem and use other methods available in Python.
What you are trying to do is not possible with dictionaries. In fact, it is contrary to the whole idea behind dictionaries.
Also, your Sets class won't help you, as it effectively gives each name a new (sort of random) hash code, making it difficult to retrieve items from the dictionary, other than checking all the items, which defeats the purpose of the dict. You can not do dict.get(Sets(some_name)), as this will create a new Sets object, having a different hash code than the one already in the dictionary!
What you can do instead is:
Just create a list of (name, value) pairs, or
pairs = zip(names, values) # or list(zip(...)) in Python 3
create a dictionary mapping names to lists of values.
dictionary = {}
for n, v in zip(names, values):
    dictionary.setdefault(n, []).append(v)
The first approach, using lists of tuples, will have linear lookup time (you basically have to check all the entries), but the second one, a dict mapping to lists, is as close as you can get to "multi-key-dicts" and should serve your purposes well. To access the values per key, do this:
for key, values in dictionary.items():  # iteritems() in Python 2
    for value in values:
        print(key, value)
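The hashing point made in the answer above can be checked with a short sketch. Sets is the class from the question; NamedKey is a hypothetical variant added only to show what happens once equality and hashing are based on the name.

```python
class Sets(object):
    """Class from the question: uses the default identity-based hash and equality."""
    def __init__(self, name):
        self.name = name


d = {Sets("1"): [1, 2, 3], Sets("1"): [4, 5, 6]}
print(len(d))            # 2 -> both entries are stored
print(d.get(Sets("1")))  # None -> a fresh Sets object matches neither key


class NamedKey(object):
    """Hypothetical variant: two keys are equal when their names are equal."""
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, NamedKey) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


e = {}
e[NamedKey("1")] = [1, 2, 3]
e[NamedKey("1")] = [4, 5, 6]
print(len(e))            # 1 -> the second assignment replaced the first
print(e[NamedKey("1")])  # [4, 5, 6]
```

Either way you lose something: identity-based keys cannot be looked up by name, and name-based keys collapse into one entry, which is why a dict of lists is the practical choice.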
Instead of wanting multiple keys with the same name, could you get away with having multiple values for each key?
names = [1]
values = [[1, 2, 3], [4, 5, 6]]
dict = {}
for i in names:
    dict[i] = values
for k,v in dict.items():
    for v in dict[k]:
        print("key: {} :: v: {}".format(k, v))
Output:
key: 1 :: v: [1, 2, 3]
key: 1 :: v: [4, 5, 6]
Then you would access each value like this (or in a loop):
print("Key 1 value 1: {}".format(dict[1][0]))
print("Key 1 value 2: {}".format(dict[1][1]))

Create OrderedDict from dict with values of list type (in the order of list's values)

It is a bit hard for me to explain it in words, so I'll show an example:
What I have (data is a dict instance):
data = {'a':[4,5,3], 'b':[1,0,2], 'c':[6,7,8]}
What I need (ordered_data is an OrderedDict instance):
ordered_data = {'b':[0,1,2], 'a':[3,4,5], 'c':[6,7,8]}
The order of keys should be changed with respect to order of items in nested lists
from collections import OrderedDict
tmp = {k:sorted(v) for k,v in data.items()}
ordered_data = OrderedDict((k,v) for k,v in sorted(tmp.items(), key=lambda i: i[1]))
First sort the values. If you don't need the original data, it's OK to do this in place, but I made a temporary variable.
key is a function that returns a key to be sorted on. In this case, the key is the second element of the item tuple (the list), and since lists are comparable, that's good enough.
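To show why "lists are comparable" is enough here, a minimal sketch follows. Python compares lists lexicographically (element by element), so sorting the items by their already-sorted value lists effectively orders the keys by the smallest number in each list. The variable name ordered is just for illustration.

```python
from collections import OrderedDict

data = {'a': [4, 5, 3], 'b': [1, 0, 2], 'c': [6, 7, 8]}

# Lists compare element by element, like words in a dictionary.
print([0, 1, 2] < [3, 4, 5])  # True
print([3, 4, 5] < [3, 4, 6])  # True

# Sort each value list, then order the items by those lists.
ordered = OrderedDict(
    sorted(((k, sorted(v)) for k, v in data.items()), key=lambda item: item[1])
)
print(list(ordered.items()))
# [('b', [0, 1, 2]), ('a', [3, 4, 5]), ('c', [6, 7, 8])]
```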
You can use OrderedDict by sorting your items and the values:
>>> from operator import itemgetter
>>> from collections import OrderedDict
>>> d = OrderedDict(sorted([(k, sorted(v)) for k, v in data.items()], key=itemgetter(1)))
>>> d
OrderedDict([('b', [0, 1, 2]), ('a', [3, 4, 5]), ('c', [6, 7, 8])])
Usually, you should not worry about the data order in the dictionary itself, and instead just order it when you retrieve the dictionary's contents (i.e., when you iterate over it):
data = {'a':[4,5,3], 'b':[1,0,2], 'c':[6,7,8]}
for datum in sorted(data.items(), key=lambda item: item[1]):
    ...

Python - tuple unpacking in dict comprehension

I'm trying to write a function that turns strings of the form 'A=5, b=7' into a dict {'A': 5, 'b': 7}. The following code snippets are what happen inside the main for loop - they turn a single part of the string into a single dict element.
This is fine:
s = 'A=5'
name, value = s.split('=')
d = {name: int(value)}
This is not:
s = 'A=5'
d = {name: int(value) for name, value in s.split('=')}
ValueError: need more than 1 value to unpack
Why can't I unpack the tuple when it's in a dict comprehension? If I get this working then I can easily make the whole function into a single compact dict comprehension.
In your code, s.split('=') will return the list: ['A', '5']. When iterating over that list, a single string gets returned each time (the first time it is 'A', the second time it is '5') so you can't unpack that single string into 2 variables.
You could try: for name,value in [s.split('=')]
More likely, you have an iterable of strings that you want to split -- then your dict comprehension becomes simple (2 lines):
splitstrs = (s.split('=') for s in list_of_strings)
d = {name: int(value) for name,value in splitstrs }
Of course, if you're obsessed with 1-liners, you can combine it, but I wouldn't.
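To make the unpacking failure described above concrete, here is a small sketch. It reproduces the error from the question and then applies the singleton-list fix; the unpack_error variable is only there to capture the exception for display.

```python
s = 'A=5'

# Iterating over ['A', '5'] yields one string at a time, and a
# one-character string cannot be unpacked into two names.
try:
    d = {name: int(value) for name, value in s.split('=')}
    unpack_error = None
except ValueError as exc:
    unpack_error = exc
print("unpacking failed:", unpack_error)

# Wrapping the split result in a list makes the loop yield the whole
# ['A', '5'] pair exactly once, which unpacks cleanly.
d = {name: int(value) for name, value in [s.split('=')]}
print(d)  # {'A': 5}
```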
Sure you could do this:
>>> s = 'A=5, b=7'
>>> {k: int(v) for k, v in (item.split('=') for item in s.split(','))}
{'A': 5, ' b': 7}
But in this case I would just use this more imperative code:
>>> d = {}
>>> for item in s.split(','):
...     k, v = item.split('=')
...     d[k] = int(v)
...
>>> d
{'A': 5, ' b': 7}
Some people tend to believe you'll go to hell for using eval, but...
s = 'A=5, b=7'
eval('dict(%s)' % s)
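Or, somewhat safer (thanks to mgilson for pointing it out), with the available builtins restricted. This is still not a real sandbox, so don't use it on untrusted input: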
s = 'A=5, b=7'
eval('dict(%s)' % s, {'__builtins__': None, 'dict': dict})
See mgilson's answer for why the error is happening. To achieve what you want, you could use:
d = {name: int(value) for name,value in (x.split('=',1) for x in s.split(','))}
To account for spaces, use .strip() as needed (ex.: x.strip().split('=',1)).
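Putting the split and .strip() advice together, a minimal sketch of the whole function might look like this. parse_pairs is a hypothetical name, and the sketch assumes well-formed input: every comma-separated item must contain an '=' followed by an integer, otherwise a ValueError is raised.

```python
def parse_pairs(text):
    """Turn a string like 'A=5, b=7' into {'A': 5, 'b': 7} (hypothetical helper).

    Assumes every comma-separated item has the form name=integer.
    """
    # Split on the first '=' only, so the value part is never split further.
    pairs = (item.split('=', 1) for item in text.split(','))
    # strip() removes the spaces left around names; int() tolerates
    # surrounding whitespace on its own.
    return {name.strip(): int(value) for name, value in pairs}


print(parse_pairs('A=5, b=7'))     # {'A': 5, 'b': 7}
print(parse_pairs('A = 5 ,b= 7'))  # {'A': 5, 'b': 7}
```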
How about this code:
import re
a="A=5, b=9"
b=dict((x, int(y)) for x, y in re.findall(r"([a-zA-Z]+)=(\d+)", a))
print(b)
Output:
{'A': 5, 'b': 9}
This version will work with other forms of input as well, for example
a="A=5 b=9 blabla: yyy=100"
will give you
{'A': 5, 'b': 9, 'yyy': 100}
>>> strs='A=5, b=7'
>>> {x.split('=')[0].strip():int(x.split('=')[1]) for x in strs.split(",")}
{'A': 5, 'b': 7}
For readability, you should use a normal for-in loop instead of a comprehension.
strs='A=5, b=7'
dic={}
for x in strs.split(','):
    name,val=x.split('=')
    dic[name.strip()]=int(val)
How about this?
>>> s
'a=5, b=3, c=4'
>>> {z.split('=')[0].strip(): int(z.split('=')[1]) for z in s.split(',')}
{'a': 5, 'c': 4, 'b': 3}
Since Python 3.8, you can use the walrus operator (:=) for this kind of operation. It lets you assign variables in the middle of an expression (in this case, it assigns the list created by .split('=') to kv).
s = 'A=5, b=7'
{(kv := item.split('='))[0]: int(kv[1]) for item in s.split(', ')}
# {'A': 5, 'b': 7}
One feature is that it leaks the assigned variable, kv, outside the scope it was defined in. If you want to avoid that, you can use a nested for-loop where the inner loop is over a singleton list (as suggested in mgilson's answer).
{k: int(v) for item in s.split(', ') for k,v in [item.split('=')]}
Since Python 3.9, loops over singleton lists in comprehensions are optimized to be as fast as simple assignments, i.e. for y in [expr] is as fast as y = expr.
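The scoping difference between the two variants can be seen in a short sketch (requires Python 3.8+). The two function names are made up for illustration. Inside a function, the name bound by := in a comprehension ends up in the function's own scope, so it is still available after the comprehension finishes.

```python
def parse_with_walrus(s):
    """Walrus variant: kv is bound in this function's scope, not the comprehension's."""
    result = {(kv := item.split('='))[0]: int(kv[1]) for item in s.split(', ')}
    # kv is still accessible here and holds the last split result.
    return result, kv


def parse_with_nested_for(s):
    """Nested-for variant: k and v stay local to the comprehension."""
    return {k: int(v) for item in s.split(', ') for k, v in [item.split('=')]}


result, leaked = parse_with_walrus('A=5, b=7')
print(result)  # {'A': 5, 'b': 7}
print(leaked)  # ['b', '7']
print(parse_with_nested_for('A=5, b=7') == result)  # True
```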
