Unexpected output when looping through subdictionaries in Python

I have a nested dictionary in which tickers identify certain assets, and for each of these assets I would like to store characteristics in a subdictionary, creating them in a simple loop like the one below:
ticker = ["a","bb","ccc"]
ticker_dict = dict.fromkeys(ticker, {"Var":[]})
for key in ticker_dict:
    ticker_dict[key]["Var"] = len(key)
From the above code I would expect that, for each ticker/asset, it saves the "Var" variable as the length of its name, meaning the following:
{"a":{"Var":1},
"bb":{"Var":2},
"ccc":{"Var":3}}
But, in my view rather weirdly, the result is this
{"a":{"Var":3},
"bb":{"Var":3},
"ccc":{"Var":3}}
To provide further context, the real process is that I have four assets, for which I would like to store dataframes in their subdictionaries, as this makes it easy for me to access them later in loops etc. Somehow, though, the data from the last asset is simply copied over all assets, even though I explicitly loop through different keys.
What's going on?
PS: I'm not sure how to explain the problem without the sample code, so I might have missed a similar entry on this site. If so, any hints to it would be appreciated as well of course.

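In your code, {"Var":[]} is evaluated only once, so there is a single inner dictionary shared by all keys. Each assignment in the loop overwrites "Var" in that one dictionary, which is why every key shows the last value. Instead, you can use a dictionary comprehension: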
ticker_dict = {key:{"Var":[]} for key in ticker}
and it will work as expected.
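A minimal sketch of the difference, reusing the ticker list from the question (the names shared and separate are made up for illustration). It shows that dict.fromkeys stores the same inner dictionary object under every key, whereas the comprehension builds a fresh one per key:

```python
ticker = ["a", "bb", "ccc"]

# dict.fromkeys evaluates its second argument once, so every key
# refers to the very same inner dictionary object.
shared = dict.fromkeys(ticker, {"Var": []})
print(shared["a"] is shared["bb"])  # True

for key in shared:
    shared[key]["Var"] = len(key)   # each pass overwrites the one shared dict
print(shared)  # {'a': {'Var': 3}, 'bb': {'Var': 3}, 'ccc': {'Var': 3}}

# The comprehension evaluates {"Var": []} once per key.
separate = {key: {"Var": []} for key in ticker}
for key in separate:
    separate[key]["Var"] = len(key)
print(separate)  # {'a': {'Var': 1}, 'bb': {'Var': 2}, 'ccc': {'Var': 3}}
```

The same applies to any mutable default passed to dict.fromkeys, such as a list or a dataframe.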

Related

How to remove extraneous square brackets from a nested list inside a dictionary?

I have been working on a problem which involves sorting a large data set of shop orders, extracting shop and user information based on some parameters. Mostly this has involved creating dictionaries by iterating through a data set with a for loop and appending a new list, like this:
sshop = defaultdict(list)
for i in range(df_subset.shape[0]):
    orderid, sid, userid, time = df.iloc[i]
    sshop[sid].append(userid)
sData = dict(sshop)
#CREATES DICTIONARY OF UNIQUE SHOPS WITH USER DATA AS THE VALUE
shops = df_subset['shopid'].unique()
shops_dict = defaultdict(list)
for shop in shops:
    shops_dict[shop].append(sData[shop])
shops_dict = dict(shops_dict)
shops_dict looks like this at this point:
{10009: [[196962305]], 10051: [[2854032, 48600461]], 10061: [[168750452, 194819216, 130633421,
62464559]]}
To get to the final stages I have had to repeat lines of code similar to these a couple of times. What seems to happen every time I do this is that the values in the dictionaries gain a set of square brackets.
This is one of my final dictionaries:
{10159: [[[1577562540.0, 1577736960.0, 1577737080.0]], [[1577651880.0, 1577652000.0, 1577652960.0]]],
10208: [[[1577651040.0, 1577651580.0, 1577797080.0]]]}
I don't entirely understand why this is happening, aside from believing it has something to do with using defaultdict(list) and then converting that into a dictionary with dict().
These extra brackets, aside from being a little confusing, appear to be causing some problems when accessing the data with certain functions. I understand that there need to be two sets of square brackets in total: one set that encases all the values for a dictionary key and another inside of that for each of the specific sets of values within that key.
My first question would be: is it possible to remove a specific set of square brackets from a dictionary like that?
My second question would be: if not, is there a better way of creating new dictionaries out of the data from an older one without using defaultdict(list) and getting all those extra square brackets?
Any help much appreciated!
Thanks :)!

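In the second loop, use extend instead of append. append adds the whole list as a single element, which is where the extra brackets come from, while extend adds its items one by one.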
for shop in shops:
    shops_dict[shop].extend(sData[shop])
shops_dict = dict(shops_dict)
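A small sketch of the two behaviours side by side. The sData contents are a made-up stand-in for the asker's data, and appended, extended and flattened are illustrative names. The last step shows one way to strip a single level of brackets from a dictionary that already has them:

```python
from collections import defaultdict

# Hypothetical stand-in for the sData dictionary in the question.
sData = {10009: [196962305], 10051: [2854032, 48600461]}

appended = defaultdict(list)
extended = defaultdict(list)
for shop, users in sData.items():
    appended[shop].append(users)   # adds the whole list as ONE element
    extended[shop].extend(users)   # adds each user id individually

print(dict(appended))  # {10009: [[196962305]], 10051: [[2854032, 48600461]]}
print(dict(extended))  # {10009: [196962305], 10051: [2854032, 48600461]}

# Removing one level of nesting from an already-built dictionary.
flattened = {shop: [user for group in groups for user in group]
             for shop, groups in appended.items()}
print(flattened == dict(extended))  # True
```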

Python - wrap same object to make it unique

I have a dictionary that is built while iterating through objects. The same object can be accessed multiple times, and I'm using the object itself as the key.
So if the same object is accessed more than once, the key is no longer unique and my dictionary is no longer correct.
I still need to access it by object, though, because later on, if someone wants to access the contents, they can request them by the current object. That will be correct, because it will access the last active object at that time.
So I'm wondering whether it is possible to wrap the object somehow, so that it keeps its state and all its attributes, with the only difference being that this new kind of object is actually unique.
For example:
dct = {}
for obj in some_objects_lst:
    # Well this kind of wraps it, but it loses state, so if I would
    # instantiate I would lose all information that was in that obj.
    wrapped = type('Wrapped', (type(obj),), {})
    dct[wrapped] = ...  # add some content
If there are better alternatives than this, I would like to hear them too.
P.S. The objects being iterated would be in different contexts, so even if the object is the same, it would be treated differently.
Update
As requested, to give better example where the problem comes from:
I have this Excel report generator module. Using it, you can generate various Excel reports. For that you need to write a configuration using a Python dictionary.
Before a report is generated, it must do two things: first, get metadata (metadata here is the position each cell will have when the report is created), and second, parse the configuration to fill the cells with content.
One of the value types that can be used in this module is a formula (Excel formulas). The problem in my question is specifically with one of the ways a formula can be computed: formula values for a parent that are retrieved from its childs.
For example imagine this excel file structure:
  | A           | B           | C
  | Total       | Childs Name | Amount
1 | sum(childs) |             |
2 |             | child_1     | 10
3 |             | child_2     | 20
4 | sum(childs) |             |
...
Now in this example, the sum in cell 1A would need to be 10+20=30 if the sum used an expression to sum its childs column (in this case the C column). All of this works until the same object (I call them iterables) is repeated. When building metadata I need to store it so I can retrieve it later, and the key is the object being iterated itself. So when it is iterated again while parsing values, it will not see all the information, because some of it will have been overwritten by the same object.
For example imagine there are invoice objects, then there are partner objects which are related with invoices and there are some other arbitrary objects that given invoice and partner produce specific amounts.
So when extracting such information in excel, it goes like this:
invoice1 -> partner1 -> amount_obj1, amount_obj2
invoice2 -> partner1 -> amount_obj3, amount_obj4.
Notice that the partner in the example is the same. Here is the problem: I can't store it as a key, because when parsing values I will iterate over this object twice, while the metadata will actually hold only the values for amount_obj3 and amount_obj4.
P.S. I don't know if I explained it better, because there is lots of code and I don't want to put huge walls of code here.
Update2
I'll try to explain this problem from a more abstract angle, because it seems that being too specific just confuses everyone even more.
So, given a list of objects and an empty dictionary, the dictionary is built by iterating over the objects. The objects act as keys in the dictionary, and the values contain metadata used later on.
Now the same list can be iterated again for a different purpose. When that is done, it needs to access the dictionary values using the iterated object (the same objects that are keys in the dictionary). But the problem is that if the same object was used more than once, the dictionary will hold only the latest stored value for that key.
That means the object is not a unique key here. But the only thing I know when I need to retrieve the value is the object. However, because it is the same iteration, the index of the iteration will be the same both times the same object is accessed.
So I guess the unique key is then (index, object).

I'm not sure if I understand your problem, so here are two options. If it's the object content that matters, keep object copies as keys. Something crude like
import copy
new_obj = copy.deepcopy(obj)
dct[new_obj] = whatever_you_need_to_store(new_obj)
If the object doesn't change between the first time it's checked by your code and the next, the operation is just performed the second time with no effect (this assumes the objects define __eq__ and __hash__ based on their content; with the default identity-based hashing, every copy is a distinct key). Not optimal, but probably not a big problem. If it does change, though, you get separate records for the old and new ones. To save memory you will probably want to replace the copies with hashes, a __str__() method that writes the object data, or whatever. But that depends on what your object is; maybe hashing will take too much time for minuscule savings in memory. Run some tests and see what works.
If, on the other hand, it's important to keep the same value for the same object, whether the data within it have changed or not (say, object is a user session that can change its data between login and logoff), use object ids. Not the builtin id() function, because if the object gets GCed or deleted, some other object may get its id. Define an id attribute for your objects and make sure different objects cannot possibly get the same one.
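One possible way to get such an id attribute, sketched under assumptions: the class name Tracked, the attribute uid and the metadata contents are all hypothetical, and the counter is just one way to guarantee that an identifier is never handed out twice.

```python
import itertools


class Tracked:
    """Hypothetical example class; `uid` is never reused, unlike id()."""

    _next_uid = itertools.count(1)  # shared counter for all instances

    def __init__(self, name):
        self.name = name
        self.uid = next(Tracked._next_uid)


partner = Tracked("partner1")
other = Tracked("partner2")

metadata = {}
metadata[partner.uid] = {"row": 2}
partner.name = "renamed"          # changing the object's data ...
metadata[partner.uid]["row"] = 5  # ... still addresses the same record
metadata[other.uid] = {"row": 9}

print(metadata)  # {1: {'row': 5}, 2: {'row': 9}}
```

Because the dictionary is keyed by uid rather than by the object, the record survives changes to the object's data and cannot collide with an unrelated object created later.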

Properly looping over a dictionary / using dictionaries as databases

This looks like a CS 101 style homework but it actually isn't. I am trying to learn more python so I took up this personal project to write a small app that keeps my grade-book for me.
I have a class semester which holds a dictionary of section objects.
A section is a class that I am teaching in whichever semester object I am manipulating (I didn't want to call them classes for obvious reasons). I originally had sections as a list, not a dictionary, and when I wanted to add a roster of students to that semester I could do this:
for sec in working_semester.sections:
    sec.addRosterFromFile(filename)
Now I have changed the sections member of semester to a dictionary so I can look up a specific one to work with, but I am having trouble when I want to loop over all of them. For example, when I first set up a new semester I want to add all the sections, then loop over them and add students to each. If I try the same code to loop over the dictionary it gives me the key, but I was hoping to get the value.
I have also tried to iterate over the dictionary like this, which I got from an older Stack Overflow question:
for sec in iter(sorted(working_semester.sections.iteritems())):
    sec.addRosterFromFile(filename)
But iter(sorted(...)) returns a tuple (key, value), not the item, so the line inside the loop gives me an error that tuple does not have a function called addRosterFromFile.
Currently I have this fix in place where I loop through the keys and then use the key to access the value like this:
for key in working_semester.sections:
    working_semester.sections[key].addRosterFromFile(filename)
There has to be a way to loop over dictionary values, or is this not desirable? My understanding of dictionaries is that they are like lists but rather than grabbing an element by its position it has a specific key, which makes it easier to grab the one you want no matter what order they are in. Am I missing how dictionaries should be used?

Using iteritems is a good approach, you just need to unpack the key and value:
for key, value in iter(sorted(working_semester.sections.iteritems())):
    value.addRosterFromFile(filename)
If you really only need the value, you could use the aptly named itervalues:
for sec in sorted(working_semester.sections.itervalues()):
    sec.addRosterFromFile(filename)
(It's not clear from your example whether you really need sorted there. If you don't need to iterate over the sections in sorted order just leave sorted out.)
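The iteritems and itervalues methods exist only in Python 2; in Python 3 the equivalents are items() and values(). A rough Python 3 sketch follows, in which the Section class is a hypothetical stand-in for the asker's section objects. Note that sorting the (key, value) pairs orders by key as long as the keys are distinct, whereas sorting the values directly would require the section objects themselves to be comparable.

```python
class Section:
    """Hypothetical stand-in for the section class in the question."""

    def __init__(self, name):
        self.name = name
        self.roster_file = None

    def addRosterFromFile(self, filename):
        # Only records the filename here; the real method would read the file.
        self.roster_file = filename


sections = {"B": Section("B"), "A": Section("A")}

# Values only, in insertion order.
for sec in sections.values():
    sec.addRosterFromFile("roster.csv")

# Keys and values together, visited in sorted key order.
visited = []
for key, sec in sorted(sections.items()):
    visited.append(key)

print(visited)  # ['A', 'B']
```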

A list of lists of lists in Python

I'm working with the Flask framework in Python, and need to hand off a list of lists to a renderer.
I step through a loop and create a list, sort it, append it to another list, then call the render function with the masterlist, like so:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    candidateArray.sort()
    multiCandidateArray.append(candidateArray)
renderPage(multiCandidateArray)
My problem is that I need to clear the candidateArray and create a new one each time through the loop, but it looks like the candidateArray that I append to the multiCandidateArray is actually a pointer, not the values themselves.
When I do this:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    candidateArray.sort()
    multiCandidateArray.append(candidateArray)
    del candidateArray[:]  # the newly added line
renderPage(multiCandidateArray)
I end up with no values.
Is there a way to handle this situation that I'm missing?

I would probably go with something like:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    multiCandidateArray.append(sorted(candidateArray))
No need to del anything here, and sorted returns a new list, so even if findLowestPrices is for some reason returning references to the same list (which is unlikely), you'll still have unique lists in multiCandidateArray (although your unique lists could hold references to the same objects).
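A short illustration of that last point, using made-up data (candidates and ordered are hypothetical names): sorted builds a new outer list, but the copy is shallow, so the inner objects are shared.

```python
candidates = [[3, "c"], [1, "a"], [2, "b"]]

ordered = sorted(candidates)        # new list; candidates is left untouched
print(ordered is candidates)        # False
print(candidates[0])                # [3, 'c']

# The copy is shallow: both lists hold references to the same inner lists.
print(ordered[0] is candidates[1])  # True
ordered[0].append("changed")
print(candidates[1])                # [1, 'a', 'changed']

del candidates[:]                   # emptying the original ...
print(len(ordered))                 # 3  (... does not empty the sorted copy)
```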

Your code already creates a new one each time through the loop.
candidateArray = findLowestPrices(...)
This assigns a new list to the variable, candidateArray. It should work fine.
When you do this:
del candidateArray[:]
...you're deleting the contents of the same list you just appended to the master list.
Don't think about pointers or variables; just think about objects, and remember nothing in Python is ever implicitly copied. A list is an object. At the end of the loop, candidateArray names the same list object as multiCandidateArray[-1]. They're different names for the same thing. On the next run through the loop, candidateArray becomes a name for a new list as produced by findLowestPrices, and the list at the end of the master list is unaffected.
I've written about this before; the C way of thinking about variables as being predetermined blocks of memory just doesn't apply to Python at all. Names are moved onto values, rather than values being copied into some fixed number of buckets.
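The difference between rebinding a name and mutating the object it refers to can be seen in a few lines. This is only a sketch with made-up values; master and current are hypothetical names standing in for multiCandidateArray and candidateArray.

```python
master = []

current = [1, 2]
master.append(current)
print(master[-1] is current)  # True: two names for one list object

current = [3, 4]              # rebinding: current now names a new list
master.append(current)
print(master)                 # [[1, 2], [3, 4]]

del current[:]                # mutating: empties the list master[-1] also refers to
print(master)                 # [[1, 2], []]
```

The first appended list is unaffected by the rebinding, while the in-place deletion is visible through every name that refers to the second list.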
(Also, nitpicking, but Python code generally uses under_scores and doesn't bother with types in names unless it's really ambiguous. So you might have candidates and multi_candidates. Definitely don't call anything an "array", since there's an array module in the standard library that does something different and generally not too useful. :))

Dictionary into dictionary in python

Ok, this one should be simple. I have 3 dictionaries. They are all made, ordered, and filled to my satisfaction but I would like to put them all in an overarching dictionary so I can reference and manipulate them more easily and efficiently.
Layer0 = {}
Layer1 = {}
Layer2 = {}
Here they are when created. Afterwards I feebly tried different things based on SO questions:
Layers = {Layer0, Layer1, Layer2}
which raised a syntax error
Layers = {'Layer0', 'Layer1', 'Layer2'}
which raised another syntax error
(Layers is the Dictionary I'm trying to create that will have all the previously made dictionaries within it)
All the other examples I found on SO have been related to creating dictionaries within dictionaries in order to fill them (or filling them simultaneously) and since I already coded a large number of lines to make these dictionaries, I'd rather put them into a dictionary after the fact instead of re-writing code.
It would be best if the order of the dictionaries is preserved when put into Layers.
Does anyone know if this is possible and how I should do it?

Dictionary items have both a key and a value.
Layers = {'Layer0': Layer0, 'Layer1': Layer1, 'Layer2': Layer2}
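A minimal sketch of how this behaves, with made-up contents for the three layer dictionaries. The outer dictionary stores references to the existing dictionaries, so nothing has to be rebuilt or copied:

```python
# Hypothetical contents; in the question these are already filled.
Layer0 = {"z": 0}
Layer1 = {"z": 1}
Layer2 = {"z": 2}

Layers = {"Layer0": Layer0, "Layer1": Layer1, "Layer2": Layer2}

# The outer dictionary stores references, not copies.
Layers["Layer1"]["z"] = 10
print(Layer1)  # {'z': 10}

# Looping over all layers by name.
for name, layer in Layers.items():
    print(name, layer)
```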

Keep in mind that, in Python versions before 3.7, dictionaries don't have a guaranteed order, since a dictionary is a hash table (i.e. each key is placed in a slot determined by its hash value). Since Python 3.7, a plain dict preserves insertion order as part of the language; on older versions, collections.OrderedDict gives the same guarantee. In Python 2, .keys() and .values() return lists, which do have an order, but that order is whatever the hash table happens to produce.
So on those older versions, "It would be best if the order of the dictionaries are preserved when put into Layers" doesn't really mean anything for a plain dict. For example, if you rename your dictionaries from "Layer1, Layer2, Layer3" to "A, B, C," you may see that Layers.keys() prints in the order "A, C, B," regardless of the order you used when building the dictionary. All this shows is where the hashes of "B" and "C" happen to place them in the table, and it doesn't tell you anything about the structure of your dictionary.
Iterating directly over a dictionary yields its keys; on versions without an ordering guarantee, iterate over a sorted list of the keys if you need a predictable order.
As a side note, this hash function is what allows a dictionary to do crazy fast lookups. A good hash function will give you constant time [O(1)] lookup, meaning you can check if a given item is in your dictionary in the same amount of time whether the dictionary contains ten items or ten million. Pretty cool.
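On the ordering point, note that on Python 3.7 and later a plain dict keeps its keys in insertion order, and collections.OrderedDict provides the same behaviour on older versions. A small sketch, with layers and ordered_layers as illustrative names:

```python
from collections import OrderedDict

layers = {"A": {}, "C": {}, "B": {}}
print(list(layers))          # ['A', 'C', 'B'] on Python 3.7+ (insertion order)

ordered_layers = OrderedDict()
for name in ["A", "C", "B"]:
    ordered_layers[name] = {}
print(list(ordered_layers))  # ['A', 'C', 'B']

# Iterating a dictionary directly yields its keys.
for name in ordered_layers:
    print(name)

# Sorted traversal, independent of insertion order.
print(sorted(layers))        # ['A', 'B', 'C']
```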
