Keeping the order of an OrderedDict - python

I have an OrderedDict that I'm passing to a function. Somewhere in the function the ordering changes, though I'm not sure why and am trying to debug it. Here is the function and its output:
def unnest_data(data):
    path_prefix = ''
    UNNESTED = OrderedDict()
    list_of_subdata = [(data, ''),] # (data, prefix)
    while list_of_subdata:
        for subdata, path_prefix in list_of_subdata:
            for key, value in subdata.items():
                path = (path_prefix + '.' + key).lstrip('.').replace('.[', '[')
                if not (isinstance(value, (list, dict))):
                    UNNESTED[path] = value
                elif isinstance(value, dict):
                    list_of_subdata.append((value, path))
                elif isinstance(value, list):
                    list_of_subdata.extend([(_, path) for _ in value])
            list_of_subdata.remove((subdata, path_prefix))
        if not list_of_subdata: break
    return UNNESTED
Then, if I call it:
from collections import OrderedDict
data = OrderedDict([('Item', OrderedDict([('[#ID]', '288917'), ('Main', OrderedDict([('Platform', 'iTunes'), ('PlatformID', '353736518')])), ('Genres', OrderedDict([('Genre', [OrderedDict([('[#FacebookID]', '6003161475030'), ('Value', 'Comedy')]), OrderedDict([('[#FacebookID]', '6003172932634'), ('Value', 'TV-Show')])])]))]))])
unnest_data(data)
I get an OrderedDict that doesn't match the ordering of my original one:
OrderedDict([('Item[#ID]', '288917'), ('Item.Genres.Genre[#FacebookID]', ['6003172932634', '6003161475030']), ('Item.Genres.Genre.Value', ['TV-Show', 'Comedy']), ('Item.Main.Platform', 'iTunes'), ('Item.Main.PlatformID', '353736518')])
Notice how it has "Genre" before "PlatformID", which is not the way it was sorted in the original dict. What seems to be my error here and how would I fix it?

It’s hard to say exactly what’s wrong without a complete working example. But based on the code you’ve shown, I suspect your problem isn’t with OrderedDict at all, but rather that you’re modifying list_of_subdata while iterating through it, which will result in items being unexpectedly skipped.
>>> a = [1, 2, 3, 4, 5, 6, 7]
>>> for x in a:
...     print(x)
...     a.remove(x)
...
1
3
5
7
Given your use, consider a deque instead of a list.
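A minimal sketch of the deque suggestion, assuming (as in the question's data) that every list contains only dicts. Each pending item is taken off the queue with popleft() instead of being removed from a list that is still being iterated. The traversal is breadth-first, so leaves at a shallower level come out before deeper ones; for the sample data that matches the original order, but it is not guaranteed to match for every input.

```python
from collections import OrderedDict, deque

def unnest_data(data):
    """Flatten nested dicts/lists of dicts into dotted paths, breadth-first."""
    unnested = OrderedDict()
    queue = deque([(data, '')])  # (subdata, path_prefix)
    while queue:
        # popleft() consumes the queue without mutating anything mid-iteration
        subdata, prefix = queue.popleft()
        for key, value in subdata.items():
            path = (prefix + '.' + key).lstrip('.').replace('.[', '[')
            if isinstance(value, dict):
                queue.append((value, path))
            elif isinstance(value, list):
                # assumption: list elements are dicts, as in the question
                queue.extend((item, path) for item in value)
            else:
                unnested[path] = value
    return unnested

sample = OrderedDict([('Item', OrderedDict([
    ('[#ID]', '288917'),
    ('Main', OrderedDict([('Platform', 'iTunes'), ('PlatformID', '353736518')])),
    ('Genres', OrderedDict([('Genre', [OrderedDict([('Value', 'Comedy')])])])),
]))])
print(list(unnest_data(sample)))
# ['Item[#ID]', 'Item.Main.Platform', 'Item.Main.PlatformID', 'Item.Genres.Genre.Value']
```

Note that list elements sharing a path still overwrite each other here, just as in the original function.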


get dictionary key by path (string)

I have this path that can change from time to time:
'#/path/to/key'
The parts of the path aren't defined, so this value is also fine
'#/this/is/a/longer/path'
I'm splitting this key at '/' so I get
['#', 'path', 'to', 'key']
and I need to get to the key in this path, let's say my dict is exp, so I need to get to here:
exp['path']['to']['key']
how could I possibly know how to get to this key?
Use the recursion, Luke ...
def deref_multi(data, keys):
    return deref_multi(data[keys[0]], keys[1:]) \
        if keys else data
last = deref_multi(exp, ['path','to','key'])
UPDATE: It's been 5+ years, time for an update, this time without using recursion (which may use slightly more resources than if Python does the looping internally). Use whichever is more understandable (and so maintainable) to you:
from functools import reduce
def deref_multi(data, keys):
    return reduce(lambda d, key: d[key], keys, data)
I suggest using python-benedict, a Python dict subclass with full keypath support and many utility methods.
You just need to cast your existing dict:
from benedict import benedict
exp = benedict(exp)
# now your keys can be dotted keypaths too
exp['path.to.key']
Here is the library and its documentation:
https://github.com/fabiocaccamo/python-benedict
Note: I am the author of this project
def get_key_by_path(dict_obj, path_string):
    path_list = path_string.split('/')[1:]
    obj_ptr = dict_obj
    for elem in path_list:
        obj_ptr = obj_ptr[elem]
    return obj_ptr
There have been some good answers here, but none of them account for paths that aren't correct or paths that at some point result in something that is not subscriptable. The code below will potentially allow you a little more leeway in handling such cases whereas other code so far will just throw an error or have unexpected behavior.
path = '#/path/to/key'
exp = {'path' : { 'to' : { 'key' : "Hello World"}}}
def getFromPath(dictionary, path):
    curr = dictionary
    path = path.split("/")[1:] # Gets rid of '#' as it's unnecessary
    while(len(path)):
        key = path.pop(0)
        curr = curr.get(key)
        if (type(curr) is not dict and len(path)):
            print("Path does not exist!")
            return None
    return curr
print(getFromPath(exp, path)) #Your value
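A variant of the same idea, sketched under the assumption that a caller-supplied default is preferable to printing a message. The names get_from_path, default and _MISSING are hypothetical and do not come from any answer here. A private sentinel lets the lookup distinguish a missing key from a key whose stored value is None.

```python
_MISSING = object()  # hypothetical sentinel marking "key not found"

def get_from_path(dictionary, path, default=None):
    """Follow a '#/a/b/c' style path; return default if any step fails."""
    curr = dictionary
    for key in path.split('/')[1:]:  # skip the leading '#'
        if not isinstance(curr, dict):
            return default  # reached a non-dict before the path ended
        curr = curr.get(key, _MISSING)
        if curr is _MISSING:
            return default  # key absent at this level
    return curr

exp = {'path': {'to': {'key': 'Hello World'}}}
print(get_from_path(exp, '#/path/to/key'))                         # Hello World
print(get_from_path(exp, '#/path/nope/key'))                       # None
print(get_from_path(exp, '#/path/to/key/deeper', default='n/a'))   # n/a
```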
>>> exp = {'path': {'to': {'key': 42}}}
>>> my_key = exp
>>> for i in '#/path/to/key'.split('/')[1:]:
...     my_key = my_key[i]
...
>>> print(my_key)
42
But I'm a bit curious about how you retrieved such a dict.
Assuming what you mean by this is that your array ['#', 'path', 'to', 'key'] holds keys leading into a nested dict, starting from index 1, you could iterate over each item in the list starting from the second and dig one level deeper on every iteration.
For example, in Python 3 you could do this.
def get_key_from_path(exp, path):
    """Returns the value at the key from <path> in <exp>.
    """
    cur = exp
    for part in path[1:]:
        cur = cur[part]  # descend from the current level, not from exp
    return cur
Using functools in place of recursion:
# Define:
from functools import partial, reduce
deref = partial(reduce, lambda d, k: d[k])
# Use:
exp = {'path': {'to': {'key': 42}}}
deref(('path', 'to', 'key'), exp)
3 year old question, I know... I just really like functools.

How to reduce on a list of tuples in python

I have an array and I want to count the occurrence of each item in the array.
I have managed to use a map function to produce a list of tuples.
def mapper(a):
    return (a, 1)
r = list(map(lambda a: mapper(a), arr))
# output example:
# (11817685, 1), (2014036792, 1), (2014047115, 1), (11817685, 1)
I'm expecting the reduce function can help me to group counts by the first number (id) in each tuple. For example:
(11817685, 2), (2014036792, 1), (2014047115, 1)
I tried
cnt = reduce(lambda a, b: a + b, r);
and some other ways, but none of them do the trick.
NOTE
Thanks for all the advice on other ways to solve the problems, but I'm just learning Python and how to implement a map-reduce here, and I have simplified my real business problem a lot to make it easy to understand, so please kindly show me a correct way of doing map-reduce.
You could use Counter:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
counter = Counter(arr)
print(list(zip(counter.keys(), counter.values())))
EDIT:
As pointed out by @ShadowRanger, Counter has an items() method:
from collections import Counter
arr = [11817685, 2014036792, 2014047115, 11817685]
print(list(Counter(arr).items()))
Instead of using a module, you can do the counting yourself with a plain dict:
track={}
for intr in arr:
    if intr not in track:
        track[intr]=1
    else:
        track[intr]+=1
There is a general pattern for this kind of list problem.
Suppose you have a list:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
And you want to convert it to a dict, with the first element of each tuple as the key and the second element as the value, something like:
{2008: [9], 2006: [5], 2007: [4]}
But there is a catch: some tuples, such as (2006,1) and (2006,5), share a key but have different values. You want those values collected under a single key, so the expected output is:
{2008: [9], 2006: [1, 5], 2007: [4]}
For this type of problem, first create a new dict, then follow this pattern:
if item[0] not in new_dict:
    new_dict[item[0]]=[item[1]]
else:
    new_dict[item[0]].append(item[1])
So we first check whether the key is already in the new dict. If it is not, we start a new list for it; if it is, we append the value to the existing list.
Full code:
a=[(2006,1),(2007,4),(2008,9),(2006,5)]
new_dict={}
for item in a:
    if item[0] not in new_dict:
        new_dict[item[0]]=[item[1]]
    else:
        new_dict[item[0]].append(item[1])
print(new_dict)
Output:
{2008: [9], 2006: [1, 5], 2007: [4]}
After writing my answer to a different question, I remembered this post and thought it would be helpful to write a similar answer here.
Here is a way to use reduce on your list to get the desired output.
from functools import reduce
arr = [11817685, 2014036792, 2014047115, 11817685]
def mapper(a):
    return (a, 1)
def reducer(x, y):
    if isinstance(x, dict):
        # x is the accumulated dict; fold the next (key, count) pair into it
        ykey, yval = y
        if ykey not in x:
            x[ykey] = yval
        else:
            x[ykey] += yval
        return x
    else:
        # first call: x and y are both (key, count) tuples
        xkey, xval = x
        ykey, yval = y
        a = {xkey: xval}
        if ykey in a:
            a[ykey] += yval
        else:
            a[ykey] = yval
        return a
mapred = reduce(reducer, map(mapper, arr))
print(list(mapred.items()))
Which prints:
[(11817685, 2), (2014036792, 1), (2014047115, 1)]
Please see the linked answer for a more detailed explanation.
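The type check in the reducer above exists because, without an initial value, reduce passes a tuple as the first argument on the first call. A shorter sketch, assuming you are free to pass an empty dict as the third argument to reduce, so the accumulator is a dict from the start (the parameter names counts and pair are illustrative):

```python
from functools import reduce

arr = [11817685, 2014036792, 2014047115, 11817685]

def mapper(a):
    # map step: emit a (key, 1) pair for every element
    return (a, 1)

def reducer(counts, pair):
    # reduce step: fold one (key, count) pair into the running dict
    key, value = pair
    counts[key] = counts.get(key, 0) + value
    return counts

# The {} initializer means counts is always a dict, so no isinstance check
counts = reduce(reducer, map(mapper, arr), {})
print(list(counts.items()))
# [(11817685, 2), (2014036792, 1), (2014047115, 1)]
```

With an initializer, reduce also returns the empty dict for an empty input instead of raising TypeError.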
If all you need is cnt, then a dict would probably be better than a list of tuples here (if you need this format, just use dict.items).
The collections module has a useful data structure for this, a defaultdict.
from collections import defaultdict
cnt = defaultdict(int)  # create a default dict where the default value is
                        # the result of calling int
for key in arr:
    cnt[key] += 1  # if key is not in cnt, it will put in the default
# cnt_list = list(cnt.items())

Extending python dictionary and changing key values

Assume I have a python dictionary with 2 keys.
dic = {0:'Hi!', 1:'Hello!'}
What I want to do is to extend this dictionary by duplicating itself, but change the key value.
For example, if I have a code
dic = {0:'Hi!', 1:'Hello'}
multiplier = 3
def DictionaryExtend(number_of_multiplier, dictionary):
    "Function code"
then the result should look like
>>> DictionaryExtend(multiplier, dic)
>>> dic
>>> dic = {0:'Hi!', 1:'Hello', 2:'Hi!', 3:'Hello', 4:'Hi!', 5:'Hello'}
In this case, I changed the key values by adding the multiplier at each duplication step. What's the efficient way of doing this?
Plus, I'm also planning to do the same job for a list variable. I mean, extend a list by duplicating itself and change some values like in the example above. Any suggestion for this would be helpful, too!
You can try itertools to repeat the values and OrderedDict to maintain input order.
import itertools as it
import collections as ct
def extend_dict(multiplier, dict_):
    """Return a dictionary of repeated values."""
    return dict(enumerate(it.chain(*it.repeat(dict_.values(), multiplier))))
d = ct.OrderedDict({0:'Hi!', 1:'Hello!'})
multiplier = 3
extend_dict(multiplier, d)
# {0: 'Hi!', 1: 'Hello!', 2: 'Hi!', 3: 'Hello!', 4: 'Hi!', 5: 'Hello!'}
Regarding handling other collection types, it is not clear what output is desired, but the following modification reproduces the latter and works for lists as well:
def extend_collection(multiplier, iterable):
    """Return a collection of repeated values."""
    repeat_values = lambda x: it.chain(*it.repeat(x, multiplier))
    try:
        iterable = iterable.values()
    except AttributeError:
        result = list(repeat_values(iterable))
    else:
        result = dict(enumerate(repeat_values(iterable)))
    return result
lst = ['Hi!', 'Hello!']
multiplier = 3
extend_collection(multiplier, lst)
# ['Hi!', 'Hello!', 'Hi!', 'Hello!', 'Hi!', 'Hello!']
It's not immediately clear why you might want to do this. If the keys are always consecutive integers then you probably just want a list.
Anyway, here's a snippet:
def dictExtender(multiplier, d):
    return dict(zip(range(multiplier * len(d)), list(d.values()) * multiplier))
I don't think you need to use inheritance to achieve that. It's also unclear what the keys should be in the resulting dictionary.
If the keys are always consecutive integers, then why not use a list?
origin = ['Hi', 'Hello']
extended = origin * 3
extended
>> ['Hi', 'Hello', 'Hi', 'Hello', 'Hi', 'Hello']
extended[4]
>> 'Hi'
If you want to perform a different operation with the keys, then simply:
mult_key = lambda key: [key,key+2,key+4] # just an example, this can be any custom implementation but beware of duplicate keys
dic = {0:'Hi', 1:'Hello'}
extended = { mkey:dic[key] for key in dic for mkey in mult_key(key) }
extended
>> {0:'Hi', 1:'Hello', 2:'Hi', 3:'Hello', 4:'Hi', 5:'Hello'}
You don't need to extend anything, you need to pick a better input format or a more appropriate type.
As others have mentioned, you need a list, not an extended dict or OrderedDict. Here's an example with lines.txt:
1:Hello!
0: Hi.
2: pylang
And here's a way to parse the lines in the correct order:
def extract_number_and_text(line):
    number, text = line.split(':')
    return (int(number), text.strip())
with open('lines.txt') as f:
    lines = f.readlines()
data = [extract_number_and_text(line) for line in lines]
print(data)
# [(1, 'Hello!'), (0, 'Hi.'), (2, 'pylang')]
sorted_text = [text for i,text in sorted(data)]
print(sorted_text)
# ['Hi.', 'Hello!', 'pylang']
print(sorted_text * 2)
# ['Hi.', 'Hello!', 'pylang', 'Hi.', 'Hello!', 'pylang']
print(list(enumerate(sorted_text * 2)))
# [(0, 'Hi.'), (1, 'Hello!'), (2, 'pylang'), (3, 'Hi.'), (4, 'Hello!'), (5, 'pylang')]

Python convert string to array assignment

In my application I am receiving a string 'abc[0]=123'
I want to convert this string to an array of items. I have tried eval(), but it didn't work for me. I know the array name abc, but the number of items will be different each time.
I can split the string, get the array index, and do the assignment myself. But I would like to know if there is any direct way to convert this string into an array insert.
I would greatly appreciate any suggestion.
Are you looking for something like this?
In [36]: s = "abc[0]=123"
In [37]: vars()[s[:3]] = []
In [38]: vars()[s[:3]].append(eval(s[s.find('=') + 1:]))
In [39]: abc
Out[39]: [123]
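But this is not a good way to create a variable: it writes into the namespace through vars() and calls eval() on a string you received from elsewhere.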
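A sketch of the split-and-assign approach the question mentions, without eval() or vars(). It assumes the input always has the form name[index]=value and that arrays can live in an ordinary dict keyed by name; apply_assignment and arrays are hypothetical names chosen for illustration.

```python
import re

def apply_assignment(arrays, text):
    """Parse 'name[index]=value' and store value at arrays[name][index]."""
    match = re.fullmatch(r'(\w+)\[(\d+)\]=(.*)', text)
    if match is None:
        raise ValueError('not an indexed assignment: %r' % text)
    name, index, value = match.group(1), int(match.group(2)), match.group(3)
    items = arrays.setdefault(name, [])
    # Pad with None so that indexes arriving out of order still work
    items.extend([None] * (index + 1 - len(items)))
    # Store digit-only values as int, everything else as the raw string
    items[index] = int(value) if value.isdigit() else value
    return arrays

arrays = {}
apply_assignment(arrays, 'abc[0]=123')
apply_assignment(arrays, 'abc[2]=456')
print(arrays)  # {'abc': [123, None, 456]}
```

Keeping the arrays in a dict means the incoming string can never overwrite an unrelated variable in your program.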
Here's a function for parsing urls according to php rules (i.e. using square brackets to create arrays or nested structures):
import re
from urllib.parse import parse_qsl
def parse_qs_as_php(qs):
    def sint(x):
        # convert numeric keys to int, leave everything else as a string
        try:
            return int(x)
        except ValueError:
            return x
    def nested(rest, base, val):
        # walk the bracketed keys, creating intermediate dicts as needed
        curr, rest = base, re.findall(r'\[(.*?)\]', rest)
        while rest:
            curr = curr.setdefault(
                sint(rest.pop(0) or len(curr)),
                {} if rest else val)
        return base
    def dtol(d):
        # turn dicts keyed 0..n-1 into lists
        if not hasattr(d, 'items'):
            return d
        if all(isinstance(k, int) for k in d) and sorted(d) == list(range(len(d))):
            return [d[x] for x in range(len(d))]
        return {k:dtol(v) for k, v in d.items()}
    r = {}
    for key, val in parse_qsl(qs):
        id, rest = re.match(r'^(\w+)(.*)$', key).groups()
        r[id] = nested(rest, r.get(id, {}), val) if rest else val
    return dtol(r)
Example:
qs = 'one=1&abc[0]=123&abc[1]=345&foo[bar][baz]=555'
print(parse_qs_as_php(qs))
# {'one': '1', 'abc': ['123', '345'], 'foo': {'bar': {'baz': '555'}}}
Your other application is doing it wrong. It should not be specifying index values in the parameter keys. The correct way to specify multiple values for a single key in a GET is to simply repeat the key:
http://my_url?abc=123&abc=456
The Python server side should correctly resolve this into a dictionary-like object: you don't say what framework you're running, but for instance Django uses a QueryDict which you can then access using request.GET.getlist('abc') which will return ['123', '456']. Other frameworks will be similar.
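Outside of a web framework, the standard library handles repeated keys the same way. A small sketch using urllib.parse with the URL from above; parse_qs collects every value for a key into a list.

```python
from urllib.parse import parse_qs, urlsplit

url = 'http://my_url?abc=123&abc=456'
# urlsplit isolates the query string; parse_qs groups repeated keys
params = parse_qs(urlsplit(url).query)
print(params['abc'])  # ['123', '456']
```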

searching and adding a python list

I have a TList, which is a list of lists. I would like to add new items to the list only if they are not already present. For instance, if item I is not present, add it to TList; otherwise skip it. Is there a more Pythonic way of doing this? Note: at first TList may be empty, and elements are added in this code. After adding Z, for example, TList = [ [A,B,C],[D,F,G],[H,I,J],[Z,aa,bb]]. The other elements are based on calculations on Z.
item = 'C' # for example this item will be given by user
TList = [ [A,B,C],[D,F,G],[H,I,J]]
if not TList:
    ## do something
# check if files not previously present in our TList and then add to our TList
elif item not in zip(*TList)[0]:
    ## do something
Since it would appear that the first entry in each sublist is a key of some sort, and the remaining entries are somehow derived from that key, a dictionary might be a more suitable data structure:
vals = {'A': ['B','C'], 'D':['F','G'], 'H':['I','J']}
if 'Z' in vals:
    print('found Z')
else:
    vals['Z'] = ['aa','bb']
@aix made a good suggestion to use a dict as your data structure; it seems to fit your use case well.
Consider wrapping up the value checking (i.e. 'Does it exist?') and the calculation of the derived values ('aa' and 'bb' in your example?).
class TList(object):
    def __init__(self):
        self.data = {}
    def __iter__(self):
        # iterating over the keys also makes `key in t` work
        return iter(self.data)
    def set(self, key):
        if key not in self:
            self.data[key] = self.do_something(key)
    def get(self, key):
        return self.data[key]
    def do_something(self, key):
        print('Calculating values')
        return ['aa', 'bb']
    def as_old_list(self):
        return [[k, v[0], v[1]] for k, v in self.data.items()]
t = TList()
## Add some values. If new, `do_something()` will be called
t.set('aval')
t.set('bval')
t.set('aval') ## Note, do_something() is not called
## Get a value
t.get('aval')
## 'in ' tests work
'aval' in t
## Give you back your old data structure
t.as_old_list()
If you need to keep the same data structure, something like this should work:
# create a set of already seen items (the first entry of each sublist)
seen = {sublist[0] for sublist in TList}
# now start adding new items
if item not in seen:
    seen.add(item)
    # add new sublist to TList
Here is a method using sets and set.union:
a = {1, 2, 3}
b = {4, 5, 6}
c = set()
master = [a,b,c]
if 2 in set.union(*master):
    pass  # Found it, do something
else:
    pass  # Not in set, do something else
If the reason for testing for membership is simply to avoid adding an entry twice, note that a set's add() method, as in a.add(12), adds an element only once, which eliminates the need to test. For example:
>>> a=set()
>>> a.add(1)
>>> a
set([1])
>>> a.add(1)
>>> a
set([1])
If you need the set elsewhere as a list you simply say "list(a)" to get "a" as a list, or "tuple(a)" to get it as a tuple.
