I have the following class to keep my records:
class List(ndb.Model):
    '''
    Index
    Key: sender
    '''
    sender = ndb.StringProperty()
    ...
    counter = ndb.IntegerProperty(default=0)
    ignore = ndb.BooleanProperty(default=False)
    added = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
    updated = ndb.DateTimeProperty(auto_now=True, indexed=False)
The following code is used to return all entities I need:
entries = List.query()
entries = entries.filter(List.counter > 5)
entries = entries.filter(List.ignore == False)
entries = entries.fetch()
How should I modify the code to get 10 random records from entries? I am planning to have a daily cron task to extract random records, so they should be really random. What is the best way to get these records (to minimize the number of read operations)?
I don't think that the following code is the best:
entries = random.sample(entries, 10)
After reading the comments, the only improvement I can see is to fetch keys only and, if possible, apply a limit.
I haven't tested it, but something like this:
list_query = List.query()
list_query = list_query.filter(List.counter > 5)
list_query = list_query.filter(List.ignore == False)
list_keys = list_query.fetch(keys_only=True) # maybe put a limit here.
list_keys = random.sample(list_keys, 10)
lists = [list_key.get() for list_key in list_keys]
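If you want to cut down the round trips for those ten gets, a small variation (my addition, not tested; it continues from the snippet above) is to batch them with ndb.get_multi. Keys-only queries are still billed as small operations, so the datastore read cost stays roughly the same, but the sampled entities come back in one batched call:

list_keys = list_query.fetch(keys_only=True)
sampled_keys = random.sample(list_keys, 10)
entries = ndb.get_multi(sampled_keys)  # one batched call instead of ten separate .get() calls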
I'm trying to set the inventory quantity of a product in Shopify using the Shopify Python API.
As I understand it, I need to set the 'inventory_level' of the 'inventory_item' that belongs to the product, but after a few days of searching and testing I still have no luck.
Where I'm at
I have my products showing up in my store with all the data except the inventory quantity.
I'm not sure how to proceed, as there's not a whole lot of documentation.
Here's my code for creating a product
def CreateShopifyProduct(data):
    # CREATE PRODUCT
    product = shopify.Product()
    # Add stuff to product, variant and inventoryItem here
    product.title = data['title']
    #product.status = ""
    #product.tags = data['tags']
    product.body_html = data['description']
    if 'category' in data:
        product.product_type = data['category']
    if 'vendor' in data:
        product.vendor = data['vendor']
    if 'image_url' in data:
        image_path = data['image_url']
        image = shopify.Image()
        image.src = image_path
        product.images = [image]
    else:
        try:
            image = GetLocalImageFiles(data['id'])
            product.images = [image]
        except:
            print("No local images found")
    success = product.save()  # returns false if the record is invalid

    # CREATE VARIANT
    variant = shopify.Variant()
    if 'ean' in data:
        variant.barcode = data['ean']
    variant.price = data['gross_price']
    variant.weight = data['weight']
    #variant.count = data['inventory']
    variant.inventory_management = 'shopify'
    product.variants = [variant]
    variant.product_id = product.id
    s = variant.save()
    success = product.save()  # returns false if the record is invalid

    # CREATE INVENTORYITEM
    inventoryItem = shopify.InventoryItem()
    #inventoryItem = variant.inventory_item
    inventoryItem.tracked = True
    inventoryItem.id = product.id
    variant.inventory_quantity = data['inventory']
    inventoryItem.inventory_quantity = data['inventory']
    variant.inventory_item = inventoryItem
    s = variant.save()
    success = product.save()
    #ii = inventoryItem.save()  # this returns 406
    #inv_level = shopify.InventoryLevel.find(inventory_item_ids=6792364982390, location_ids=61763518582)
    #quantity = inv_level[0].__dict__['attributes']['available']
    #shopify.InventoryLevel.set(location_id=61763518582, inventory_item_id=variant.inventory_item.id, available=data['inventory'])
    #shopify.InventoryLevel.connect(61763518582, variant.inventory_item.id)

    if product.errors:
        # something went wrong, see product.errors.full_messages() for example
        print("error")
        print(product.errors.full_messages())
If I try to set the InventoryLevel with
shopify.InventoryLevel.set(61527654518, inventoryItem.id, 42)
# or
shopify.InventoryLevel.set(location_id=61527654518, inventory_item_id=inventoryItem.id, available=17)
I receive a
pyactiveresource.connection.ResourceNotFound: Not Found: https://domain.myshopify.com/admin/api/2022-07/inventory_levels/set.json
You need three things to update an inventory level: a valid location ID, the inventory item ID, and the amount to set the inventory to at that location.
You should really play around at the command line and make sure you can quickly get the information you need before trying your updates. In other words, confirm you are getting a good location ID and inventory item ID, and that you know how much inventory is already in Shopify. Since you may have to calculate a delta change, these are the minimum steps most people take.
Note that once you get good at doing one item, you'll realize Shopify also accepts up to 100 at a time, making your updates a lot faster.
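As a rough sketch of those three pieces with the shopify Python library (this is not the original poster's code; the product ID and quantity below are placeholders):

import shopify

# 1. A valid location ID: pick one of the shop's locations.
location_id = shopify.Location.find()[0].id

# 2. The inventory item ID: it lives on the variant, not on the product itself.
product = shopify.Product.find(1234567890)             # placeholder product ID
inventory_item_id = product.variants[0].inventory_item_id

# 3. The amount: set the available quantity at that location.
shopify.InventoryLevel.set(
    location_id=location_id,
    inventory_item_id=inventory_item_id,
    available=17,                                       # placeholder quantity
)

Note that the ID to pass is the variant's inventory_item_id, not product.id as in the question's code, which is one likely reason the set call cannot find an inventory level to update.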
I have a working model for a chat application. The requirement is that, upon service restart, we build an in-memory mapper and fetch the first-page details for each DM / group from that mapper based on the ID.
The working model is as follows:
'''
RECEIVER_SENDER_MAPPER = {
    "61e7dbcf9edba13755a4eb07": {
        "61e7a5559edba13755a4ea65": [{}, {}, {}, {}, first page entries(25)],
        "61de751742fc165ec8b729c9": [{}, {}, {}, {}, first page entries(25)],
    },
    "61e7a5559edba13755a4ea65": {
        "61e7dbcf9edba13755a4eb07": [{}, {}, {}, {}, first page entries(25)],
        "61de751742fc165ec8b729c9": [{}, {}, {}, {}, first page entries(25)],
    },
}
'''
RECEIVER_SENDER_MAPPER = {}
def sync_to_inmem_from_db():
    global RECEIVER_SENDER_MAPPER
    message = db.messages.find_one()
    if message:
        # Fetch all users from db
        users = list(db.users.find({}, {"_id": 1, "username": 1}))
        prepared_data = {}
        counter = 0
        for user in users:
            counter += 1
            # find all message groups which the user is a part of
            user_channel_ids = list(db.message_groups.find(
                {"channel_members": {"$in": [user["_id"]]}},
                {"_id": 1, "channel_name": 1}))
            combined_list = user_channel_ids + users
            users_mapped_data = {}
            for x in combined_list:
                counter += 1
                if x["_id"] == user["_id"]:
                    continue
                find_query = {"receiver_type": "group", "receiver_id": x["_id"], "pid": "0"}
                if x.get("username"):
                    find_query = {"pid": "0", "receiver_type": "user"}
                messages = list(db.messages.find(find_query).sort("created_datetime", -1).limit(50))
                if messages:
                    users_mapped_data[x["_id"]] = messages
            prepared_data[user["_id"]] = users_mapped_data
        RECEIVER_SENDER_MAPPER = prepared_data

if not RECEIVER_SENDER_MAPPER:
    sync_to_inmem_from_db()
The value of the counter for 70 users and 48 message groups is 5484, and it takes close to 9 minutes to create the RECEIVER_SENDER_MAPPER.
I have to reduce this to at least a quarter of that.
One optimization I found: since group messages will be the same for all users of a particular group, I can just create a dictionary this way:
all_channels = list(db.message_groups.find())
channels_data = {channel["_id"] : list(db.messages.find({"receiver_id":channel["_id"]}).limit(50)) for channel in all_channels}
But here again, while looping over the users, I have to loop over the groups again to find whether the user is part of that group or not.
Any idea how to reduce the complexity of this? Thanks in advance.
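One way to avoid re-scanning the groups for every user is to invert message_groups into a user-to-channels map in a single pass. A rough sketch of that idea (untested, reusing the field names from the code above):

from collections import defaultdict

all_channels = list(db.message_groups.find({}, {"_id": 1, "channel_members": 1}))

# First page of messages per channel, fetched once per group instead of once per user.
channels_data = {
    channel["_id"]: list(db.messages.find(
        {"receiver_type": "group", "receiver_id": channel["_id"], "pid": "0"}
    ).sort("created_datetime", -1).limit(50))
    for channel in all_channels
}

# Membership lookup built in the same pass, so the per-user loop becomes a dict lookup.
user_to_channels = defaultdict(list)
for channel in all_channels:
    for member_id in channel.get("channel_members", []):
        user_to_channels[member_id].append(channel["_id"])

# For each user, the group part of the mapper is then just:
# {cid: channels_data[cid] for cid in user_to_channels[user["_id"]] if channels_data.get(cid)}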
I'm trying to understand why this pipeline writes no output to BigQuery.
What I'm trying to achieve is to calculate the USD index for the last 10 years, starting from different currency pairs observations.
All the data is in BigQuery and I need to organize it and sort it chronologically (if there is a better way to achieve this, I'm glad to read it, because I think this might not be the optimal approach).
The idea behind the Currencies() class is to start grouping (and keep) the last observation of each currency pair (e.g. EURUSD), update all currency pair values as they "arrive", sort them chronologically and finally get the open, high, low and close value of the USD index for that day.
This code works in my Jupyter notebook and in Cloud Shell using DirectRunner, but when I use DataflowRunner it does not write any output. To try to figure it out, I created the data with beam.Create() and wrote it to BigQuery (which worked), and also just read something from BQ and wrote it to another table (which also worked), so my best guess is that the problem is in the beam.CombineGlobally part, but I don't know what it is.
The code is as follows:
import logging
import collections
import apache_beam as beam
from datetime import datetime

SYMBOLS = ['usdjpy', 'usdcad', 'usdchf', 'eurusd', 'audusd', 'nzdusd', 'gbpusd']
TABLE_SCHEMA = "date:DATETIME,index:STRING,open:FLOAT,high:FLOAT,low:FLOAT,close:FLOAT"

class Currencies(beam.CombineFn):
    def create_accumulator(self):
        return {}

    def add_input(self, accumulator, inputs):
        logging.info(inputs)
        date, currency, bid = inputs.values()
        if '.' not in date:
            date = date + '.0'
        date = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S.%f')
        data = currency + ':' + str(bid)
        accumulator[date] = [data]
        return accumulator

    def merge_accumulators(self, accumulators):
        merged = {}
        for accum in accumulators:
            ordered_data = collections.OrderedDict(sorted(accum.items()))
            prev_date = None
            for date, date_data in ordered_data.items():
                if date not in merged:
                    merged[date] = {}
                if prev_date is None:
                    prev_date = date
                else:
                    prev_data = merged[prev_date]
                    merged[date].update(prev_data)
                    prev_date = date
                for data in date_data:
                    currency, bid = data.split(':')
                    bid = float(bid)
                    currency = currency.lower()
                    merged[date].update({
                        currency: bid
                    })
        return merged

    def calculate_index_value(self, data):
        return data['usdjpy'] * data['usdcad'] * data['usdchf'] / (data['eurusd'] * data['audusd'] * data['nzdusd'] * data['gbpusd'])

    def extract_output(self, accumulator):
        ordered = collections.OrderedDict(sorted(accumulator.items()))
        index = {}
        for dt, currencies in ordered.items():
            if not all([symbol in currencies.keys() for symbol in SYMBOLS]):
                continue
            date = str(dt.date())
            index_value = self.calculate_index_value(currencies)
            if date not in index:
                index[date] = {
                    'date': date,
                    'index': 'usd',
                    'open': index_value,
                    'high': index_value,
                    'low': index_value,
                    'close': index_value
                }
            else:
                max_value = max(index_value, index[date]['high'])
                min_value = min(index_value, index[date]['low'])
                close_value = index_value
                index[date].update({
                    'high': max_value,
                    'low': min_value,
                    'close': close_value
                })
        return index

def main():
    query = """
        select date, currency, bid from data_table
        where date(date) between '2022-01-13' and '2022-01-16'
        and currency like ('%USD%')
    """
    options = beam.options.pipeline_options.PipelineOptions(
        temp_location='gs://PROJECT/temp',
        project='PROJECT',
        runner='DataflowRunner',
        region='REGION',
        num_workers=1,
        max_num_workers=1,
        machine_type='n1-standard-1',
        save_main_session=True,
        staging_location='gs://PROJECT/stag'
    )
    with beam.Pipeline(options=options) as pipeline:
        inputs = (pipeline
                  | 'Read From BQ' >> beam.io.ReadFromBigQuery(query=query, use_standard_sql=True)
                  | 'Accumulate' >> beam.CombineGlobally(Currencies())
                  | 'Flat' >> beam.ParDo(lambda x: x.values())
                  | beam.io.Write(beam.io.WriteToBigQuery(
                      table='TABLE',
                      dataset='DATASET',
                      project='PROJECT',
                      schema=TABLE_SCHEMA))
                  )

if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    main()
The way I execute this is from the shell, using python3 -m first_script (is this the way I should run these batch jobs?).
What am I missing or doing wrong? This is my first attempt at using Dataflow, so I'm probably making every mistake in the book.
In case it helps someone: I faced a similar problem, but I had already used the same code for a different flow with a Pub/Sub input, where it worked flawlessly, whereas with a file-based input it simply did not. After a lot of experimenting I found that in the options I had to change the flag
options = PipelineOptions(streaming=True, ..
to
options = PipelineOptions(streaming=False,
since it is of course not a streaming source but a bounded source, a batch. After I changed this flag I found my rows in the BigQuery table. Once it had finished, it even stopped the pipeline, as you would expect from a batch operation. Hope this helps.
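For reference, a minimal sketch of that flag set explicitly through the standard options classes (False is also the default for bounded sources such as a BigQuery read):

from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
# Only unbounded sources such as Pub/Sub need streaming=True; a BigQuery
# query result is a bounded source and should run as a batch job.
options.view_as(StandardOptions).streaming = False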
Write code to print all the unique customers who visited in the last hour.
My try:
import datetime

def find_repeated_customer():
    file_obj = open(" my file path", "r")
    customer_last_visit = {}
    repeat_customer = set()
    for line in file_obj:
        timestamp, customer_id, page_id = line.split(" : ")
        last_visit = customer_last_visit.get(customer_id, None)
        if not last_visit:
            customer_last_visit[customer_id] = last_visit
        else:
            # assuming the time stamp looks like 2016-10-29 01:03:26.947000
            year, month, date = timestamp.split(" ")[0].split("-")
            current_visit = datetime.date(year, month, date)
            day_diff = current_visit - last_visit
            if day_diff >= 1:
                repeat_customer.add(customer_id)
                customer_last_visit[customer_id] = current_visit
I am completely failing to get my desired output. This way I can get customers who came back within the last day, but how do I get the unique users?
You can't do this kind of manipulation in one pass. You have to pass through the lines once to collect the customers, and only then can you check who came only once. In a second pass, you check whether the current customer is in the list of one-time customers and do something with them.
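A minimal sketch of that two-pass idea, assuming the `timestamp : customer_id : page_id` line format from the question and a made-up log file path:

from collections import Counter
from datetime import datetime, timedelta

LOG_PATH = "visits.log"  # hypothetical path
cutoff = datetime.now() - timedelta(hours=1)

# Pass 1: count visits per customer inside the last hour.
visits = Counter()
with open(LOG_PATH) as f:
    for line in f:
        timestamp, customer_id, page_id = [part.strip() for part in line.split(" : ")]
        visited_at = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S.%f")
        if visited_at >= cutoff:
            visits[customer_id] += 1

# Pass 2: customers seen exactly once in that window are the unique visitors.
for customer_id, count in visits.items():
    if count == 1:
        print(customer_id)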
I am trying to improve the efficiency of my current query against the App Engine datastore. Currently, I am using a synchronous method:
class Hospital(ndb.Model):
    name = ndb.StringProperty()
    buildings = ndb.KeyProperty(kind=Building, repeated=True)

class Building(ndb.Model):
    name = ndb.StringProperty()
    rooms = ndb.KeyProperty(kind=Room, repeated=True)

class Room(ndb.Model):
    name = ndb.StringProperty()
    beds = ndb.KeyProperty(kind=Bed, repeated=True)

class Bed(ndb.Model):
    name = ndb.StringProperty()
    .....
Currently I go through stupidly:
currhosp = ndb.Key(urlsafe=valid_hosp_key).get()
nbuilds = ndb.get_multi(currhosp.buildings)
for b in nbuilds:
    rms = ndb.get_multi(b.rooms)
    for r in rms:
        bds = ndb.get_multi(r.beds)
        for b in bds:
            # do something with the b (bed) object
            pass
I would like to transform this into a much faster query using get_multi_async
My difficulty is in figuring out how to do this.
Any ideas?
Best
Jon
Using the structures given above, it is possible (and confirmed) to solve this with a set of tasklets. It is a SIGNIFICANT speed-up over the iterative method.
@ndb.tasklet
def get_bed_info(bed_key):
    bed_info = {}
    bed = yield bed_key.get_async()
    # format and store bed information into bed_info
    raise ndb.Return(bed_info)

@ndb.tasklet
def get_room_info(room_key):
    room_info = {}
    room = yield room_key.get_async()
    beds = yield map(get_bed_info, room.beds)
    # store room info in room_info
    room_info["beds"] = beds
    raise ndb.Return(room_info)

@ndb.tasklet
def get_building_info(build_key):
    build_info = {}
    building = yield build_key.get_async()
    rooms = yield map(get_room_info, building.rooms)
    # store building info in build_info
    build_info["rooms"] = rooms
    raise ndb.Return(build_info)

@ndb.toplevel
def get_hospital_buildings(hospital_object):
    buildings = yield map(get_building_info, hospital_object.buildings)
    raise ndb.Return(buildings)
Now comes the main call from the hospital function, where you have the hospital object (hospital_obj).
hosp_info = {}
buildings = get_hospital_buildings(hospital_obj)
# store hospital info in hosp_info
hosp_info["buildings"] = buildings
return hosp_info
There you go! It is incredibly efficient and lets the scheduler gather all the information in the fastest possible manner within the GAE backbone.
You can do something with query.map(). See https://developers.google.com/appengine/docs/python/ndb/async#tasklets and https://developers.google.com/appengine/docs/python/ndb/queryclass#Query_map
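Roughly, a query.map() version could look like the sketch below. This only illustrates the API shape (a query mapped over with a tasklet callback so the nested gets overlap); you would still need to restrict the query to the buildings of one hospital:

@ndb.tasklet
def expand_building(building):
    # The nested gets become futures, so they run concurrently instead of serially.
    rooms = yield ndb.get_multi_async(building.rooms)
    raise ndb.Return((building, rooms))

buildings_with_rooms = Building.query().map(expand_building)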
It's impossible.
Your second query (ndb.get_multi(b.rooms)) depends on the result of your first query.
So pulling it asynchronously doesn't work, because at that point the (first) result of the first query has to be available anyway.
NDB already does something like that in the background (it buffers the next items of ndb.get_multi(currhosp.buildings) while you process the first result).
However, you could use denormalization, i.e. keeping a big table with one entry per Building-Room-Bed pair, and pull your results from that table.
If you have more reads than writes to this table, this will get you a massive speed improvement (1 DB read, instead of 3).
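A sketch of what such a denormalized entity could look like (the model and property names here are made up for illustration):

# Hypothetical flattened model: one entity per Building-Room-Bed combination,
# pointing back at its hospital so the whole tree comes back in a single query.
class BedEntry(ndb.Model):
    hospital = ndb.KeyProperty(kind=Hospital)
    building_name = ndb.StringProperty()
    room_name = ndb.StringProperty()
    bed_name = ndb.StringProperty()

# One datastore query instead of walking buildings -> rooms -> beds:
entries = BedEntry.query(BedEntry.hospital == ndb.Key(urlsafe=valid_hosp_key)).fetch()
for entry in entries:
    # do something with entry.bed_name, entry.room_name, ...
    pass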