I have a model like the one below:
class XXX(db.Model):
    f_list = db.ListProperty(int, indexed=True)  # Stores 50000 numbers
How do I access the 3rd item in f_list?
Use standard list indexing to access the 3rd item in the list:
some_obj.f_list[2]
However, the entire entity, including the full list, is loaded into memory when you fetch an instance of XXX.
There is no way around that with the model you have.
Even a projection query will return the entire list.
The only possibility would be to split the data across multiple sub-entities.
I'm using Scrapetube to get videos from a channel, and it brings a generator object. From the very simple documentation, I know it includes the parameter "videoId", but how can I know what other parameters I can get from there? Can I transform a generator object into, say, a dataframe?
Generators allow you to efficiently iterate over (potentially infinite) sequences.
In your case, you probably want to first convert the generator into a list to expose all items in the sequence.
Then you can inspect what the returned elements look like and extract the information you need.
You can then create a dataframe for instance from a list of dictionaries:
import pandas as pd
import scrapetube

result_gen = scrapetube.xxx()
result_list = list(result_gen)
# Inspect first element
print(result_list[0])
# Inspect attributes of the first element
print(dir(result_list[0]))
# Convert/extract information of interest into a dictionary
def to_dict(element):
    ...
result_df = pd.DataFrame([to_dict(element) for element in result_list])
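If the generator is long (or endless), it may be enough to peek at the first few items instead of converting everything to a list. The sketch below uses `itertools.islice` for that. `fake_videos` is a hypothetical stand-in for the generator the library returns, and the "title" field is made up for illustration; only "videoId" comes from the question.

```python
from itertools import islice


def fake_videos():
    # Hypothetical stand-in for the real generator; it never ends on purpose.
    n = 0
    while True:
        n += 1
        yield {"videoId": f"id{n}", "title": f"Video {n}"}


# Take only the first three items; the rest of the generator is never consumed.
first_three = list(islice(fake_videos(), 3))

# Collect every key that appears in the sampled items.
keys = sorted(set().union(*(item.keys() for item in first_three)))
print(keys)
```

If the elements turn out to be dictionaries like these, the list of them can be passed to `pd.DataFrame` directly, without a separate conversion function.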
I have a dictionary that is built while iterating through objects. The same object can be accessed multiple times, and I'm using the object itself as the key.
So if the same object is accessed more than once, the key is not unique and my dictionary is no longer correct.
I still need to access it by object, because later on, if someone wants to access the contents, they can request them by the current object. That will be correct, because it will access the last active object at that time.
So I'm wondering whether it is possible to wrap the object somehow, so that it keeps its state and all its attributes, with the only difference being that this new kind of object is actually unique.
For example:
dct = {}
for obj in some_objects_lst:
    # Well this kind of wraps it, but it loses state, so if I would
    # instantiate I would lose all information that was in that obj.
    wrapped = type('Wrapped', (type(obj),), {})
    dct[wrapped] = ...  # add some content
If there are better alternatives than this, I would like to hear them too.
P.S. The objects being iterated would be in different contexts, so even if the object is the same, it would be treated differently.
Update
As requested, to give better example where the problem comes from:
I have this excel reports generator module. Using it, you can generate various excel reports. For that you need to write configuration using python dictionary.
Before a report is generated, the module must do two things: first, get the metadata (metadata here is the position each cell will have when the report is created), and second, parse the configuration to fill the cells with content.
One of the value types that can be used in this module is a formula (Excel formulas). The problem in my question is specifically with one of the ways a formula can be computed: formula values that are retrieved for a parent from its childs.
For example imagine this excel file structure:
  | A            | B       | C
  | Total Childs | Name    | Amount
1 | sum(childs)  |         |
2 |              | child_1 | 10
3 |              | child_2 | 20
4 | sum(childs)  |         |
...
In this example, the sum in cell 1A would need to be 10+20=30 if the sum used an expression to add up its childs' column (in this case column C). All of this works until the same object (I call them iterables) is repeated. When building the metadata, I need to store it so I can retrieve it later, and the key is the object being iterated itself. So when that object is iterated again while parsing values, it will not see all the information, because some of it will have been overwritten by the same object.
For example imagine there are invoice objects, then there are partner objects which are related with invoices and there are some other arbitrary objects that given invoice and partner produce specific amounts.
So when extracting such information in excel, it goes like this:
invoice1 -> partner1 -> amount_obj1, amount_obj2
invoice2 -> partner1 -> amount_obj3, amount_obj4
Notice that the partner in the example is the same. Here is the problem: I can't store it as a key, because when parsing values I will iterate over this object twice, while the metadata will actually hold only the values for amount_obj3 and amount_obj4.
P.S Don't know if I explained it better, cause there is lots of code and I don't want to put huge walls of code here.
Update2
I'll try to explain this problem from more abstract angle, because it seems being too specific just confuses everyone even more.
So, given a list of objects and an empty dictionary, the dictionary is built by iterating over the objects. The objects act as keys in the dictionary, and the values are metadata used later on.
Now the same list can be iterated again for a different purpose. When that happens, it needs to access the dictionary values using the iterated object (the same objects that are keys in that dictionary). But if the same object was used more than once, the dictionary will hold only the latest stored value for that key.
That means the object is not a unique key here. The only thing I know when I need to retrieve the value is the object. But because it is the same iteration, the index of the iteration will be the same when accessing the same object both times.
So uniqueness I guess then is (index, object).
I'm not sure if I understand your problem, so here's two options. If it's object content that matters, keep object copies as a key. Something crude like
import copy

new_obj = copy.deepcopy(obj)
dct[new_obj] = whatever_you_need_to_store(new_obj)
If the object doesn't change between the first time it's checked by your code and the next, the operation is just performed the second time with no effect. Not optimal, but probably not a big problem. If it does change, though, you get separate records for old and new ones. For memory saving you will probably want to replace copies with hashes, __str__() method that writes object data or whatever. But that depends on what your object is; maybe hashing will take too much time for minuscule savings in memory. Run some tests and see what works.
If, on the other hand, it's important to keep the same value for the same object, whether the data within it have changed or not (say, object is a user session that can change its data between login and logoff), use object ids. Not the builtin id() function, because if the object gets GCed or deleted, some other object may get its id. Define an id attribute for your objects and make sure different objects cannot possibly get the same one.
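One way to combine an explicit id attribute with the asker's own (index, object) idea is sketched below. It is only an illustration: `Partner`, `uid`, and the stored strings are hypothetical names, and the counter-based id is just one possible way to guarantee that ids are never reused.

```python
import itertools

# Module-level counter: every new object gets an id that is never handed out again.
_uid_counter = itertools.count(1)


class Partner:
    """Hypothetical object type; uid is assigned once at creation."""

    def __init__(self, name):
        self.name = name
        self.uid = next(_uid_counter)


partner1 = Partner("partner1")
# The same object appears twice, in two different contexts.
some_objects_lst = [partner1, partner1]

# First pass: store metadata under (position, uid) so repeats do not overwrite each other.
metadata = {}
for index, obj in enumerate(some_objects_lst):
    metadata[(index, obj.uid)] = f"metadata for visit {index}"

# Second pass over the same list: the same (position, uid) pair finds the right entry.
looked_up = [metadata[(index, obj.uid)] for index, obj in enumerate(some_objects_lst)]
print(looked_up)
```

This only works if both passes walk the list in the same order, which is what Update2 describes.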
I have an hdf5 database with 3 keys (features, image_ids, index). The image_ids and index each have 1000 entries.
The problem is, while I can get the 10th image_ids via:
dbhdf5 ["image_ids"][10]
>>> u'image001.jpg'
I want to do the reverse, i.e. find the index by passing the image name. Something like:
dbhdf5 ["image_ids"="image001.jpg"]
or
dbhdf5 ["image_ids"]["image001.jpg"]
or
dbhdf5 ['index']['image001.jpg']
I've tried every variation I can think of, but can't seem to find a way to retrieve the index of an image, given its id. I get errors like 'Field name only allowed for compound types'
What you are trying is not possible. HDF5 works by storing arrays, that are accessed via numerical indices.
Supposing that you also manage the creation of the file, you can store your data in separate named arrays:
/index
    /image001.jpg
    /image002.jpg
    ...
/features
    /image001.jpg
    /image002.jpg
    ...
So you can access them via names:
dbhdf5['features']['image001.jpg']
If the files are generated by someone else, you have to store the keys yourself, for instance with a dict:
lookup = {}
for i, key in enumerate(dbhdf5['image_ids'][:]):
    lookup[key] = i
and access them via this indirection:
dbhdf5['index'][lookup['image001.jpg']]
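Depending on the library version and how the strings were stored, the names read from the file may come back as bytes rather than text, in which case a lookup by a plain string would fail. The sketch below normalizes the keys while building the dictionary. The `image_ids` list is a hypothetical stand-in for `dbhdf5['image_ids'][:]`, and `as_text` is a helper introduced only for this example.

```python
# Hypothetical stand-in for dbhdf5['image_ids'][:]; real data would come from the file.
image_ids = [b"image000.jpg", b"image001.jpg", "image002.jpg"]


def as_text(key):
    # Decode bytes keys so lookups can always use ordinary strings.
    return key.decode("utf-8") if isinstance(key, bytes) else key


# Map each image name to its position in the dataset.
lookup = {as_text(key): i for i, key in enumerate(image_ids)}
print(lookup["image001.jpg"])
```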
I am poking at XBRL documents trying to get my head around how to effectively extract and use the data. One thing I have been struggling with is making sure I use the context information correctly. Below is a snippet from one of the documents I am playing with (this is from Mattel's latest 10-K).
I want to be able to efficiently collect the context key-value pairs, as they are important to help align the 'real' data. Here is an example of a context element:
<context id="eol_PE6050----0910-K0010_STD_0_20091231_0">
  <entity>
    <identifier scheme="http://www.sec.gov/CIK">0000063276</identifier>
  </entity>
  <period>
    <instant>2009-12-31</instant>
  </period>
</context>
When I started this I thought that if there was a parent-child relationship I should be able to get the attributes, keys, values and text of all the children directly from applying a method (?) to the parent. But the children retain their independence though they can be found from the parent. What I mean is that if the children have attributes, keys, values and or text those constructs cannot be directly accessed from the parent instead you have to determine/identify the children and from the children access the data or metadata that is needed.
I am not fully certain why this block of code is a good starting point:
from lxml import etree
test_tree=etree.parse(r'c:\temp\test_xml\mat-20091231.xml')
tree_list=[p for p in test_tree.getiterator()]
so my tree_list is a list of the elements that were determined to exist in my xml file
Because there were only 664 items in my tree_list I made the very bad assumption that all of the elements within a parent were subsumed in the parent so I kept trying to access the entity, period and instant by referencing just those elements (not their children)
contextlist = []
for each in tree_list:
    if 'context' in each.tag:
        contextlist.append(each)
That is, I kept applying different methods to the items in the contextlist and got really frustrated. Finally, while writing out this question to get help figuring out which method would give me the entity and period, I just decided to try
children=[c for c in contextlist[0].iterchildren()]
so my list children has all of the children from the first item in my contextlist
One of the children is the entity element, the other is the period element
Now, it should be that each of those children have a child, the entity element has an identifier child element and the period element has an instant child element
This is getting much more complicated than it seemed this morning.
I have to know the details that are reported by the context elements to correctly evaluate and operate on the real data. It seems like I have to test each of the children of the context elements. Is there a faster, more efficient way to get those values? Rephrased, is there a way to take some element and create a data structure that contains all of its children, grandchildren, etc. without having to write a fair number of try/else statements?
Once I have them I can start building a data dictionary and assign data elements to particular entries based on the context. So getting the context elements efficiently and completely is critical to my task.
Using the element-tree interface (which lxml also supports), getiterator iterates over all the nodes in the subtree rooted at the current element.
So, [list(c.getiterator()) for c in contextlist] gives you the list of lists you want (or you may want to keep c in the resulting list to avoid having to zip it with contextlist later, i.e. directly make a list of tuples [(c, list(c.getiterator())) for c in contextlist], depending on your intended use).
Note in passing that a listcomp of the exact form [x for x in whatever] never makes much sense -- use list(whatever), instead, to turn whatever other iterable into a list.
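As a rough sketch of the same idea, the block below walks each context's subtree once and collects the text of every descendant into a dictionary keyed by the context id. It uses the standard library's `xml.etree.ElementTree` and its `iter()` method (the current spelling of `getiterator()`, which was removed from the standard library module in Python 3.9). The `<xbrl>` wrapper element and the helper names `local_name` and `context_details` are assumptions made for this example; a real filing also uses namespaces, which `local_name` strips.

```python
import xml.etree.ElementTree as ET

# The context element from the question, wrapped in a hypothetical root element.
SAMPLE = """<xbrl>
  <context id="eol_PE6050----0910-K0010_STD_0_20091231_0">
    <entity>
      <identifier scheme="http://www.sec.gov/CIK">0000063276</identifier>
    </entity>
    <period>
      <instant>2009-12-31</instant>
    </period>
  </context>
</xbrl>"""


def local_name(tag):
    # Drop a '{namespace}' prefix if the tag has one.
    return tag.rsplit('}', 1)[-1]


def context_details(context):
    # iter() visits the element itself and every descendant, at any depth.
    details = {}
    for node in context.iter():
        if node is context:
            continue
        text = (node.text or '').strip()
        if text:
            details[local_name(node.tag)] = text
    return details


root = ET.fromstring(SAMPLE)
contexts = {
    c.get('id'): context_details(c)
    for c in root.iter()
    if local_name(c.tag) == 'context'
}
print(contexts)
```

Elements that only contain other elements (entity, period) have no text of their own and are skipped, so no try/else checks per child are needed.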
I'm using python for my shopping cart class which has a list of items. When a customer wants to edit an item, I need to pass the JavaScript front-end some way to refer to the item so that it can call AJAX methods to manipulate it.
Basically, I need a simple way to point to a particular item that isn't its index, and isn't a reference to the object itself.
I can't use an index, because another item in the list might be added or removed while the identifier is "held" by the front end. If I were to pass the index forward, if an item got deleted from the list then that index wouldn't point to the right object.
One solution seems to be to use UUIDs, but that seems particularly heavyweight for a very small list. What's the simplest/best way to do this?
Instead of using a list, why not use a dictionary and use small integers as the keys? Adding and removing items from the dictionary will not change the indices into the dictionary. You will want to keep one value in the dictionary that lets you know what the next assigned index will be.
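A minimal sketch of that suggestion follows. The `Cart` class and its method names are hypothetical, and the counter for the next id is kept as a separate attribute here rather than as a value inside the dictionary, which is a small variation on what is described above.

```python
import itertools


class Cart:
    """Hypothetical cart that hands out small, stable integer ids."""

    def __init__(self):
        self._items = {}
        self._next_id = itertools.count(1)  # ids are never reused

    def add(self, item):
        item_id = next(self._next_id)
        self._items[item_id] = item
        return item_id

    def get(self, item_id):
        return self._items[item_id]

    def remove(self, item_id):
        return self._items.pop(item_id)


cart = Cart()
a = cart.add("apple")
b = cart.add("bread")
c = cart.add("cheese")

# Removing one item does not shift the ids of the others.
cart.remove(a)
print(cart.get(b))
```

The front end can hold on to `b` for as long as it likes; the id keeps pointing at the same item no matter what else is added or removed.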
A UUID seems perfect for this. Why don't you want to do that?
Do the items have any sort of product_id? Can the shopping cart have more than one of the same product_id, or does it store a quantity? What I'm getting at is: If product_id's in the cart are unique, you can just use that.