I have an issue: I want to save data to the cache so that when I need the data, I don't have to fetch it from the database. But there is one piece of data I always need to get fresh from the database, no matter what: stock.
An illustration of what I need to save to the cache from the DB:
ProductDetails: {
    'product_name': ...,
    'categories': ...,
    etc.
}
But the product stock, which I also need, comes from the same DB. I tried using multiple loops like this:
import numpy

products = queryset()
cache_key = f'test_detail_{parameter}'
information = {}
details = []
if not cache.get(cache_key):
    for product in products:
        information[product.id] = {
            'product_name': product.name,
            'date_input': product.date,
            'weight': product.weight,
        }
    cache.set(cache_key, information, duration)
information = cache.get(cache_key)
for key, value in information.items():
    # 'stock' must be a string key; the stock is summed from the live queryset
    information[key]['stock'] = numpy.sum([x.stock for x in products if key == x.id])
    details.append(information[key])
return details
Is there a more efficient and effective method that uses only one queryset? Right now I use two: the first when I fetch the data to set the cache, and the second when I fetch the stock data.
Thanks!
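One way to cut this down to a single queryset per request is to cache only the static fields and fetch the volatile stock with one aggregate query. A sketch, assuming queryset() and the field names from your snippet, and that stock is a numeric field on the same model:

from django.core.cache import cache
from django.db.models import Sum

def get_details(parameter, duration):
    cache_key = f'test_detail_{parameter}'
    information = cache.get(cache_key)
    if information is None:
        # First hit: build the static details once and cache them.
        information = {
            p.id: {'product_name': p.name, 'date_input': p.date, 'weight': p.weight}
            for p in queryset()
        }
        cache.set(cache_key, information, duration)
    # Every hit: one aggregate query for the always-fresh stock numbers,
    # grouped by product id (replaces the numpy sum over the whole queryset).
    stocks = dict(
        queryset().values('id')
                  .annotate(total_stock=Sum('stock'))
                  .values_list('id', 'total_stock')
    )
    details = []
    for key, value in information.items():
        value['stock'] = stocks.get(key, 0)
        details.append(value)
    return details

This also turns the per-product Python loop into a single GROUP BY done by the database.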
I have a Flask app using the app factory pattern and SQLAlchemy, and I have a class that updates all locations in a database. The structure is as follows:
clouds.py
def update_location_data(location_id, data):
    # Goes through the MySQL database and updates data where not equal
    location = session.get(Location, location_id)
    ...
    for index in range(0, x):
        location.data[index] = new_data
    flag_modified(location, "data")
    session.add(location)
    session.commit()
def update_all_location_data(user_id):
    # Gets all locations from the tracked table using the user id.
    # This returns a list of locations, say `locations`.
    for location in locations:
        lat = location.lat
        long = location.long
        params = {"lat": lat, "lon": long, ...}
        data = requests.get(URL, params=params).json()
        update_location_data(location.location_id, data)
This takes around 3.5 seconds to run for only 4 items in the location table, and I'm worried it will grow significantly as the location table grows.
How would I go about optimizing the function call in the loop, or the whole thing, so that it updates the table concurrently?
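If most of those 3.5 seconds are spent waiting on the HTTP calls (likely, given only 4 rows), one option is to parallelize the fetches and keep the database writes serial. A sketch using the standard library; URL and the locations query are assumed to be the same as in your code:

from concurrent.futures import ThreadPoolExecutor, as_completed

import requests

def fetch_location_data(location):
    # Pure network I/O: safe to run in worker threads.
    params = {"lat": location.lat, "lon": location.long}
    return location.location_id, requests.get(URL, params=params).json()

def update_all_location_data(user_id):
    locations = ...  # your existing "tracked table" query
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(fetch_location_data, loc) for loc in locations]
        for future in as_completed(futures):
            location_id, data = future.result()
            # DB writes stay in the main thread: the shared SQLAlchemy
            # session is not thread-safe.
            update_location_data(location_id, data)

If the commits themselves turn out to be the bottleneck, the next step would be batching them into a single commit instead of one per location.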
I'm trying to query all the "Food" values in the "categories" attribute together with "review_count" attribute values that are at least 100. It's my first time working with table scans in DynamoDB through Python, and I need to use the table.scan function as well. This is what I have tried so far:
resp = table.scan(
    FilterExpression='(categories = cat1) AND (review_count >= 100)',
    ExpressionAttributeValues={
        ':cat1': 'Food',
    },
)
Any help would be greatly appreciated. Thanks
Assuming the table name is test:
A FilterExpression can't contain constants; it should only reference table attributes (categories, review_count) and placeholders (:cat1, :rc). So the literal 100 has to be replaced with a placeholder, :rc.
All placeholders must start with :, so cat1 should be :cat1.
table = dynamodb.Table('test')
response = table.scan(
    FilterExpression='categories = :cat1 AND review_count >= :rc',
    ExpressionAttributeValues={
        ':cat1': 'Food',
        ':rc': 100,
    },
)
data = response['Items']
An important point to note on Scan, from the documentation:
"A single Scan operation reads up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data and then applies any filtering to the results using FilterExpression."
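Because of that 1 MB cap, one scan call may return only part of the matching items. To read the whole table you typically loop, passing LastEvaluatedKey back in as ExclusiveStartKey; a sketch building on the code above:

scan_kwargs = {
    'FilterExpression': 'categories = :cat1 AND review_count >= :rc',
    'ExpressionAttributeValues': {':cat1': 'Food', ':rc': 100},
}
response = table.scan(**scan_kwargs)
data = response['Items']
# Keep scanning from where the previous page stopped until the table
# is exhausted (no LastEvaluatedKey in the response).
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'], **scan_kwargs)
    data.extend(response['Items'])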
I have two models, one with a list of clients and the other with a list of sales.
My intention is to add a sales rank value to the clients queryset.
all_clients = Contactos.objects.values("id", "Vendedor", "codigo", "Nombre", "NombrePcia", "Localidad", "FechaUltVenta")
sales = Ventas.objects.all()
Once loaded, I aggregate the sales per client by summing the subtotal values of their sales, then order the result by total sales.
sales_client = sales.values('cliente').annotate(fact_total=Sum('subtotal'))
client_rank = sales_client.order_by('-fact_total')
Then I set the rank of those clients, storing it under the "Rank" key in the same client_rank queryset.
a = 0
for rank in client_rank:
    a = a + 1
    rank['Rank'] = a
Everything is fine up to now. When I print the results in the template, I get the expected values in the client_rank queryset: client name + total sales per client + rank:
{'cliente': '684 DROGUERIA SUR', 'fact_total': Decimal('846470'), 'Rank': 1}
{'cliente': '699 KINE ESTETIC', 'fact_total': Decimal('418160'), 'Rank': 2}
etc....
The problem starts here
First, we should take into account that not all the clients in the all_clients queryset have actual sales in the sales queryset. So I must find which ones do have sales, assign them the "Rank" value, and assign a standard value to the ones that don't.
for subject in all_clients:
    subject_code = str(subject["codigo"])
    try:
        selected_subject = ranking_clientes.get(cliente__icontains=subject_code)
        subject['rank'] = selected_subject['Rank']
    except:
        subject['rank'] = "Some value"
The try always fails because selected_subject doesn't seem to have the "Rank" value. If I print selected_subject I get the following:
{'cliente': '904 BAHIA BLANCA BASKET', 'fact_total': Decimal('33890')}
Any clues on why I'm losing the "Rank" value? The original client_rank queryset still has that value included.
Thanks!
I presume that ranking_clientes is the same as client_rank.
The problem is that .get will always do a new query against the database. This means that any modifications you made to the dictionaries returned in the original query will not have been applied to the result of the get call.
You would need to iterate through your query to find the one you need:
selected_subject = next(client for client in ranking_clientes if subject_code in client['cliente'])
Note, this is pretty inefficient if you have a lot of clients. I would rethink your model structure. Alternatively, you could look into using a database function to return the rank directly as part of the original query.
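To illustrate that last suggestion: on Django 2.0+ with a database that supports window functions (PostgreSQL, MySQL 8+), the rank can be annotated directly onto the aggregated queryset, so it survives any later .get() because it is part of the row itself. A sketch, not tested against your models:

from django.db.models import F, Sum, Window
from django.db.models.functions import Rank

# Rank is computed by the database over the summed totals, so every row
# carries its 'Rank' value no matter how it is fetched later.
client_rank = (
    Ventas.objects
    .values('cliente')
    .annotate(fact_total=Sum('subtotal'))
    .annotate(Rank=Window(expression=Rank(), order_by=F('fact_total').desc()))
)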
My use case:
My code runs multiple Scrapy spiders on different US counties to collect property data on every property. It does this by looping through a list of PINs/parcels (100k to 200k) that are appended to the same URLs over and over, collecting sales data on each parcel or property and storing it in its respective county table, one row at a time. My use case involves refreshing these tables frequently (once a week or so) to track trends in the sales data. Out of 100k properties, it may be that only a few acquired new sales records, but I would not know which unless I went through all of them.
I began implementing this via the pipeline below, which gets the data into the table on the first run, when the table is a clean slate. However, when re-running to refresh the data, I'm obviously unable to insert rows that contain the same unique ID, and would need to update those rows instead. My unique ID for each data point is its parcel number.
My questions:
1. What is the optimal method to update a database table that requires a full refresh (all rows) frequently?
My guess so far, based on the research I've done, is to replace the old table with a new temporary table: it should be quicker to insert all the data into a new table than to query each item in the old table, check whether it has changed, and modify the row if it has. This can be accomplished by inserting all data into the temporary table first, then replacing the old table with it.
2. If that method is optimal, how would I go about implementing it? Should I use some kind of data migration module (pandas?)?
3. What would happen if I dropped the old table and the program was interrupted at that point, before the new table replaced it? (See the swap sketch below.)
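On question 3: MySQL's RENAME TABLE swaps several tables in one atomic statement, so the old table never has to be dropped before the new one is in place; an interruption leaves you with either the old table or the new one, never neither. A sketch with SQLAlchemy (the table names are illustrative, and engine is assumed to be the one from db_connect()):

from sqlalchemy import text

with engine.begin() as conn:
    # Atomic swap: move the live table aside and the freshly loaded
    # temporary table into its place in a single statement.
    conn.execute(text(
        "RENAME TABLE property_data TO property_data_old, "
        "property_data_new TO property_data"
    ))
    # Only after the swap has succeeded is the old copy dropped.
    conn.execute(text("DROP TABLE property_data_old"))

For reference, the current pipeline: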
class PierceDataPipeline(object):

    def __init__(self):
        """
        Initializes database connection and sessionmaker.
        Creates tables.
        """
        engine = db_connect()
        create_table(engine)
        self.Session = sessionmaker(bind=engine)

    def process_item(self, item, spider):
        """
        This method is called for every item pipeline component.
        """
        session = self.Session()
        propertyDataTable = PierceCountyPropertyData()
        # Copy every scraped field onto the model; the item keys match the
        # column names one-to-one.
        for field in (
            "parcel", "mailing_address", "owner_name", "county", "site_address",
            "property_type", "occupancy", "year_built", "adj_year_built",
            "units", "bedrooms", "baths", "siding_type", "stories",
            "lot_square_footage", "lot_acres", "current_balance_due",
            "tax_year_1", "tax_year_2", "tax_year_3",
            "tax_year_1_assessed", "tax_year_2_assessed", "tax_year_3_assessed",
            "sale1_price", "sale1_date", "sale2_date", "sale2_price",
        ):
            setattr(propertyDataTable, field, item[field])
        try:
            session.add(propertyDataTable)
            session.commit()
        except:
            session.rollback()
            raise
        finally:
            session.close()
        return item
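An alternative to the swap that keeps the pipeline writing row by row is a MySQL upsert: INSERT ... ON DUPLICATE KEY UPDATE inserts new parcels and updates existing ones in one statement, so re-runs never collide with the unique constraint. A sketch using SQLAlchemy's MySQL dialect, assuming parcel carries a UNIQUE (or primary key) constraint and that the item keys match the column names, as they do in the pipeline above:

from sqlalchemy.dialects.mysql import insert

def upsert_item(session, item):
    table = PierceCountyPropertyData.__table__
    stmt = insert(table).values(**dict(item))
    # On a duplicate parcel, overwrite every other column with the
    # incoming scraped values.
    stmt = stmt.on_duplicate_key_update(
        **{col.name: stmt.inserted[col.name]
           for col in table.columns
           if col.name != 'parcel'}
    )
    session.execute(stmt)
    session.commit()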
I have almost 100 million product names in the DB. I display 100 products in the UI at a time and, after scrolling, show the next 100, and so on. For this I have used a Django raw query, as my database (MySQL) doesn't support the distinct functionality.
Here 'fetch' is a callback function used in another file:
def fetch(query_string, *query_args):
    conn = connections['databaseName']
    with conn.cursor() as cursor:
        cursor.execute(query_string, query_args)
        record = dictfetchall(cursor)
    return record
Here is the main call in views.py, a sample raw-query snippet:
record = fetch("select productname from abc")
Here I apply a sorting criterion to the records:
record = fetch("select productname from abc order by name ASC")
I do the same for descending. As a result, it takes a long time to display the sorted products.
What I want is to query once, store the result in a Python object, and then apply the ascending or descending ordering to that. So the first load will take some time, but applying a sort criterion afterwards won't go back to the database every time a sort is triggered.
Overall, I want to increase performance when sorting the records.
I think what you are looking for is pagination. This is an essential technique when you want to display data in batches (pages).
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger

def listing(request):
    query_string = 'your query'
    query_args = []
    conn = connections['databaseName']
    with conn.cursor() as cursor:
        cursor.execute(query_string, query_args)
        all_records = dictfetchall(cursor)

    paginator = Paginator(all_records, 100)  # Show 100 records per page
    page = request.GET.get('page')
    try:
        records = paginator.page(page)
    except PageNotAnInteger:
        # If page is not an integer, deliver the first page.
        records = paginator.page(1)
    except EmptyPage:
        # If page is out of range (e.g. 9999), deliver the last page of results.
        records = paginator.page(paginator.num_pages)
    return records
Whenever you make a request, you should include the page you want to display (e.g. 1, 2, 3...) in the URL parameters.
Example GET http://localhost/products/?page=1
In terms of logic, your JavaScript should display the first page and keep a counter holding the next page to request; after the user scrolls, make an AJAX request to get the second page, increment the counter, and so on.
EDIT: As for the sorting matter, you can also use JavaScript to sort your data.
Below is my attempt, which got me the desired result.
I fetch the data from the database, and it is stored as a list of dictionaries.
Assuming the data stored in the list looks like this:
l = [{'productName': 'soap', 'id': 1}, {'productName': 'Laptop', 'id': 2}]
The code snippet below sorts depending on the key:
from operator import itemgetter
For ascending:
res = sorted(l, key=itemgetter('productName'))
For descending:
res = sorted(l, key=itemgetter('productName'), reverse=True)
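To make this survive across requests, so the database is queried once rather than once per sort, the fetched list can be kept in Django's cache and sorted in memory each time. A sketch reusing the fetch helper from the question; the cache key and timeout are illustrative:

from operator import itemgetter

from django.core.cache import cache

def get_sorted_products(descending=False):
    records = cache.get('all_products')
    if records is None:
        # Single expensive query; subsequent sorts never touch MySQL.
        records = fetch("select productname from abc")
        cache.set('all_products', records, 60 * 15)
    return sorted(records, key=itemgetter('productname'), reverse=descending)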