Python - wrap same object to make it unique - python

I have a dictionary that is built while iterating over objects. The same object can be visited multiple times, and I'm using the object itself as the key.
So if the same object is visited more than once, the key is no longer unique and my dictionary is no longer correct.
I still need to look entries up by object, because later on, whoever wants the contents can request them by the current object. That lookup would be correct, because it would use the last active object at that time.
So I'm wondering whether it is possible to wrap an object somehow, so that it keeps its state and all its attributes, the only difference being that the new wrapper object is actually unique.
For example:
dct = {}
for obj in some_objects_lst:
    # This kind of wraps it, but it loses state: if I instantiated
    # the new class, I would lose all information that was in obj.
    wrapped = type('Wrapped', (type(obj),), {})
    dct[wrapped] = ...  # add some content
If there are better alternatives than this, I would like to hear them too.
P.S. The objects being iterated would be in different contexts, so even if the object is the same, it would be treated differently.
Update
As requested, here is a better example of where the problem comes from:
I have an Excel report generator module. With it you can generate various Excel reports; each report is configured with a Python dictionary.
Before a report is generated, the module must do two things: first, collect metadata (here, the position each cell will have once the report is created), and second, parse the configuration to fill the cells with content.
One of the value types the module supports is a formula (Excel formulas). The problem in my question concerns one specific way a formula can be computed: a formula on a parent row that refers to values in its child rows.
For example imagine this excel file structure:
    | A            | B       | C
    | Total Childs | Name    | Amount
1   | sum(childs)  |         |
2   |              | child_1 | 10
3   |              | child_2 | 20
4   | sum(childs)  |         |
...
In this example the sum in cell 1A should be 10+20=30, assuming the formula sums the children's column (column C in this case). All of this works until the same object (I call these iterables) is repeated. While building the metadata I need to store it so that I can retrieve it later, and the key is the iterated object itself. So when that object is iterated again while parsing values, it does not see all the information, because some of it has been overwritten by a later occurrence of the same object.
For example, imagine there are invoice objects, partner objects related to the invoices, and some other arbitrary objects that, for a given invoice and partner, produce specific amounts.
So when extracting such information into Excel, it goes like this:
invoice1 -> partner1 -> amount_obj1, amount_obj2
invoice2 -> partner1 -> amount_obj3, amount_obj4
Notice that the partner is the same in both rows. This is the problem: I can't use it as a key, because when parsing values I will iterate over this object twice, while the metadata will only hold the values for amount_obj3 and amount_obj4.
P.S. I don't know if I explained it any better; there is a lot of code and I don't want to put huge walls of code here.
Update2
I'll try to explain the problem from a more abstract angle, because being too specific seems to confuse everyone even more.
Given a list of objects and an empty dictionary, the dictionary is built by iterating over the objects. The objects act as keys in the dictionary, and the values hold metadata that is used later on.
The same list can then be iterated again for a different purpose. During that second pass, the dictionary values need to be accessed using the iterated object (the same objects that are keys in the dictionary). The problem is that if the same object was used more than once, the dictionary holds only the latest value stored for that key.
This means the object is not a unique key here. But the object is the only thing I know when I need to retrieve the value. However, because it is the same iteration, the index within the iteration will be the same both times a given occurrence of the object is accessed.
So I guess the unique key is then (index, object).
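A minimal sketch of the (index, object) idea, assuming both passes iterate the list in the same order. The `Partner` class and the metadata strings are hypothetical stand-ins, not part of the real module:

```python
class Partner:
    """Hypothetical stand-in for the real objects; only identity matters here."""
    def __init__(self, name):
        self.name = name

partner1 = Partner("partner1")
partner2 = Partner("partner2")
# The same partner1 object appears at positions 0 and 2.
some_objects_lst = [partner1, partner2, partner1]

# First pass: build metadata keyed by (position, object).
metadata = {}
for index, obj in enumerate(some_objects_lst):
    metadata[(index, obj)] = "metadata for row {}".format(index)

# Second pass: same iteration order, so the same (index, obj) keys are rebuilt.
retrieved = [metadata[(index, obj)] for index, obj in enumerate(some_objects_lst)]
```

Because the position is part of the key, the two occurrences of `partner1` get separate entries instead of overwriting each other.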

I'm not sure I understand your problem, so here are two options. If it's the object's content that matters, use copies of the object as keys. Something crude like:
import copy

new_obj = copy.deepcopy(obj)
dct[new_obj] = whatever_you_need_to_store(new_obj)
If the object doesn't change between the first time your code sees it and the next, the operation is simply performed a second time with no effect. Not optimal, but probably not a big problem. If it does change, though, you get separate records for the old and new versions. To save memory you will probably want to replace the copies with hashes, the output of a __str__() method that writes out the object's data, or something similar. That depends on what your object is; hashing may take too much time for minuscule savings in memory. Run some tests and see what works.
If, on the other hand, it's important to keep the same value for the same object whether or not the data within it has changed (say, the object is a user session whose data can change between login and logoff), use object ids. Don't use the builtin id() function for this, because once an object is garbage-collected or deleted, another object may get the same id. Define an id attribute for your objects and make sure different objects cannot possibly get the same one.
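One way to sketch such an id attribute, assuming you control the class of the objects. The `Tracked` class and the `uid` attribute name are made up for illustration; a shared `itertools.count` hands out numbers that are never reused within a process:

```python
import itertools

class Tracked:
    """Hypothetical class: every instance gets a uid that is never reused."""
    _uid_counter = itertools.count(1)

    def __init__(self, payload):
        self.uid = next(Tracked._uid_counter)
        self.payload = payload

a = Tracked("first")
b = Tracked("second")

dct = {}
dct[a.uid] = "data for a"
dct[b.uid] = "data for b"

# Mutating the object does not change its key.
a.payload = "changed"
```

Unlike `id()`, the counter value stays taken even after the object it was issued to has been deleted.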

Related

Unexpected output when looping through subdictionaries in Python

I have a nested dictionary in which tickers identify certain assets, and for each of these assets I would like to store characteristics in a subdictionary, creating them in a simple loop like the one below:
ticker = ["a","bb","ccc"]
ticker_dict = dict.fromkeys(ticker, {"Var":[]})
for key in ticker_dict:
    ticker_dict[key]["Var"] = len(key)
From the above code I would expect that, for each ticker/asset, it saves the "Var" variable as the length of its name, meaning the following:
{"a":{"Var":1},
"bb":{"Var":2},
"ccc":{"Var":3}}
But, in my view rather weirdly, the result is this
{"a":{"Var":3},
"bb":{"Var":3},
"ccc":{"Var":3}}
To provide further context, the real process is that I have four assets, for which I would like to store dataframes in their subdictionaries, as this makes it easy for me to access them later in loops etc. Somehow, though, the data from the last asset is simply copied over all assets, even though I explicitly loop through different keys.
What's going on?
PS: I'm not sure how to explain the problem without the sample code, so I might have missed a similar entry on this site. If so, any hints to it would be appreciated as well of course.
In your code, {"Var":[]} is only evaluated once, causing there to be only 1 inner dictionary shared by all keys. Instead, you can use a dictionary comprehension:
ticker_dict = {key:{"Var":[]} for key in ticker}
and it will work as expected.
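The difference can be checked directly with the question's own data. This is a small sketch; the names `shared`, `separate` and `same_inner_dict` are only illustrative:

```python
ticker = ["a", "bb", "ccc"]

# dict.fromkeys evaluates the value once, so every key shares one inner dict.
shared = dict.fromkeys(ticker, {"Var": []})
for key in shared:
    shared[key]["Var"] = len(key)

# A comprehension evaluates {"Var": []} once per key.
separate = {key: {"Var": []} for key in ticker}
for key in separate:
    separate[key]["Var"] = len(key)

# True: all keys of `shared` point at the very same inner dictionary.
same_inner_dict = shared["a"] is shared["ccc"]
```

In `shared`, each assignment overwrites the value seen through every key, so only the last length (3) survives; in `separate`, each key keeps its own value.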

Saving a DataFrame with some extra information

I am trying to store some extra information with DataFrames directly in the same DataFrame, such as some parameters describing the data stored.
I added this information just as extra attributes to the DataFrame:
df.data_origin = 'my_origin'
print(df.data_origin)
But when it is saved and loaded, those extra attributes are lost:
df.to_pickle('pickle_test.pkl')
df2 = pd.read_pickle('pickle_test.pkl')
print(len(df2))
print(df2.data_origin)
...
465387
>>> AttributeError: 'DataFrame' object has no attribute 'data_origin'
The workaround I have found is to save the __dict__ of the DataFrame and then assign it to the __dict__ of an empty DataFrame:
with open('modified_dataframe.pkl', "wb") as pkl_out:
    pickle.dump(df.__dict__, pkl_out)
df2 = pd.DataFrame()
with open('modified_dataframe.pkl', "rb") as pkl_in:
    df2.__dict__ = pickle.load(pkl_in)
print(len(df2))
print(df2.data_origin)
...
465387
my_origin
It seems to work, but:
Is there a better way to do it?
Am I losing information? (apparently, all the data is there)
Here a different solution is discussed, but I would like to know if the approach of saving the dict of a class is valid to hold its entire information.
EDIT: Ok, I found the big drawback. This works fine to save single DataFrames in isolated files, but will not work if I have dictionaries, lists or similar with DataFrames in them.
I suggest creating a subclass of pandas.DataFrame and adding the attributes you want there. This may seem a bit spooky at first, but it lets you use the same class safely in different places. Other approaches might be more useful in specific cases, though.

How to resolve inconsistent annotation in Django?

I have the following lines of code:
CategoryContext = Somemodel.objects.values('title__categories__category').distinct()
CategoryContextSum = CategoryContext.annotate(Total_Revenue=Sum('revenue')).order_by('-Total_Revenue')
CategoryContextAvg = CategoryContext.annotate(Average_Revenue=Avg('revenue')).order_by('-Average_Revenue')
The avg query yields a queryset of objects where the category comes first, followed by the revenue. So basically:
<QuerySet [{'title__categories__category':'Category', 'Average_Revenue':Decimal('100')}, {'title__categories__category':'Category2', 'Average_Revenue':Decimal('120')}]>
The sum query on the other hand yields the revenue followed by the category, so basically:
<QuerySet [{'Total_Revenue':Decimal('100'), 'title__categories__category':'Category'}, {'Total_Revenue':Decimal('120'), 'title__categories__category':'Category2'}]>
Now I have tried flipping the queries around and changing the variable names so far, but I cannot seem to figure out why in the heck these 2 statements are behaving so differently. Does anybody know what could influence annotation behavior in Django?
Edit:
In case you are wondering why I need to understand this: I am passing the queryset to a method that turns it into data for generating a bar chart, and the first object in the dataset must be the identifier of the value. I could work around it by checking whether this is indeed the case and inverting otherwise, but it seems to me that this shouldn't be necessary.
This has little or nothing to do with annotate. Dictionaries in Python had no guaranteed ordering before Python 3.7 (insertion order was only an implementation detail of CPython 3.6), and keys can be ordered differently across different queryset results.
This is little or no problem, since you'll be accessing the required values by key and not by position (as with sequences):
for obj_dct in your_qs:
    print(obj_dct[some_key])
If your plot function takes dicts, no need to worry about ordering.
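If the chart code does need the identifier first, the rows can be converted by key so that the dictionary order stops mattering. A rough sketch: the `rows` list is a plain-Python stand-in for what the queryset yields, and `to_chart_pairs` is a hypothetical helper, not a Django function:

```python
from decimal import Decimal

# Stand-ins for the dict rows of the queryset; the key order
# deliberately differs between the two rows.
rows = [
    {"title__categories__category": "Category", "Total_Revenue": Decimal("100")},
    {"Total_Revenue": Decimal("120"), "title__categories__category": "Category2"},
]

def to_chart_pairs(rows, label_key, value_key):
    """Return (label, value) tuples, looking each field up by key."""
    return [(row[label_key], row[value_key]) for row in rows]

pairs = to_chart_pairs(rows, "title__categories__category", "Total_Revenue")
```

The label always ends up first in each tuple, whichever order the keys had in the source dictionaries.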

A list of lists of lists in Python

I'm working with the Flask framework in Python, and need to hand off a list of lists to a renderer.
I step through a loop and create a list, sort it, append it to another list, then call the render function with the masterlist, like so:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    candidateArray.sort()
    multiCandidateArray.append(candidateArray)
renderPage(multiCandidateArray)
My problem is that I need to clear the candidateArray and create a new one each time through the loop, but it looks like the candidateArray that I append to the multiCandidateArray is actually a pointer, not the values themselves.
When I do this:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    candidateArray.sort()
    multiCandidateArray.append(candidateArray)
    del candidateArray[:]
renderPage(multiCandidateArray)
I end up with no values.
Is there a way to handle this situation that I'm missing?
I would probably go with something like:
for itemID in itemsArray:
    avgQuantity = getJitaQuantity(itemID)
    lowestJitaSell = getJitaLowest(itemID)
    candidateArray = findLowestPrices(itemID, lowestJitaSell, candidateArray, avgQuantity)
    multiCandidateArray.append(sorted(candidateArray))
There is no need to del anything here, and sorted returns a new list, so even if findLowestPrices is for some reason returning references to the same list (which is unlikely), you'll still have unique lists in multiCandidateArray (although your unique lists could hold references to the same objects).
Your code already creates a new one each time through the loop.
candidateArray = findLowestPrices(...)
This assigns a new list to the variable, candidateArray. It should work fine.
When you do this:
del candidateArray[:]
...you're deleting the contents of the same list you just appended to the master list.
Don't think about pointers or variables; just think about objects, and remember nothing in Python is ever implicitly copied. A list is an object. At the end of the loop, candidateArray names the same list object as multiCandidateArray[-1]. They're different names for the same thing. On the next run through the loop, candidateArray becomes a name for a new list as produced by findLowestPrices, and the list at the end of the master list is unaffected.
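The naming behaviour described above can be sketched in a few lines. `find_lowest_prices` is a hypothetical stand-in that returns a new list on every call, as the real function presumably does:

```python
def find_lowest_prices(item_id):
    """Hypothetical stand-in: returns a brand-new list on every call."""
    return [item_id * 3, item_id * 2, item_id]

# Rebinding the name each time through the loop leaves earlier lists untouched.
multi_candidates = []
for item_id in (1, 2):
    candidates = find_lowest_prices(item_id)
    candidates.sort()
    multi_candidates.append(candidates)

# append() copied nothing: both names refer to the same list object.
shares_identity = candidates is multi_candidates[-1]

# Clearing the list through one name empties it under the other name too.
emptied = []
for item_id in (1, 2):
    candidates = find_lowest_prices(item_id)
    emptied.append(candidates)
    del candidates[:]
```

After the first loop `multi_candidates` holds two distinct sorted lists; after the second, `emptied` holds two empty lists, which matches the "no values" result in the question.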
I've written about this before; the C way of thinking about variables as being predetermined blocks of memory just doesn't apply to Python at all. Names are moved onto values, rather than values being copied into some fixed number of buckets.
(Also, nitpicking, but Python code generally uses under_scores and doesn't bother with types in names unless it's really ambiguous. So you might have candidates and multi_candidates. Definitely don't call anything an "array", since there's an array module in the standard library that does something different and generally not too useful. :))

Accessing Python object in tuple

I am using easyzone and dnspython to extract DNS records from a zone file. When extracting A records I am given back a string and an object in a tuple. I am new to Python coming from PHP and am not quite sure how to get at this object to get the value of it? I had no problems getting the string value in the tuple.
In this code snippet I iterate through the A records and write the values into a CSV:
# Write all A records
for a in z.names.items():
    c.writerow([domain, 'A', a.__getitem__(0), a])
a contains the following:
('www.121dentalcare.com.', <easyzone.easyzone.Name object at 0x1012dd190>)
How would I access this object within a which is in the 2nd half of this tuple??
You can use indices to get items from a tuple:
sometuple[1]
just as you can do with lists and strings (see sequence types).
The documentation of easyzone is a little on the thin side, but from looking at the source code it appears the easyzone.easyzone.Name objects have .name, .soa and .ttl attributes:
print(sometuple[1].name)
The .soa attribute is another custom class, with .mname, .rname, .serial, .refresh, .retry, .expire and .minttl properties.
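Applied to the loop from the question, the tuple can also be unpacked directly in the for statement. This is only a sketch: the `Name` class below is a hypothetical stand-in for easyzone's object (modelling just the `.name` attribute mentioned above), and a plain dict and list replace `z.names` and the CSV writer:

```python
class Name:
    """Hypothetical stand-in for easyzone's Name object; only .name is modelled."""
    def __init__(self, name):
        self.name = name

# Stand-in for z.names: a mapping from record name to Name object.
names = {"www.example.com.": Name("www.example.com.")}
domain = "example.com"

rows = []
# Unpack each (key, value) tuple instead of indexing into it.
for record_name, name_obj in names.items():
    rows.append([domain, "A", record_name, name_obj.name])
```

Each element of `rows` is what would be passed to `c.writerow(...)` in the original code.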
