FQL query limits the results to 25 - python

Hi, I am trying to fetch the total number of movies I have liked, but the Graph API restricts the results to 25. I have tried using the until timestamp and also the limit parameter, but still only 25 movies are fetched. My code goes like this:
query = "https://graph.facebook.com/USER_NAME?limit=200&access_token=%s&fields=name,movies" % TOKEN
result = requests.get(query)
data = json.loads(result.text)
fd = open('Me', 'a')
for key in data:
    if key == 'movies':
        fd.write("KEY: MOVIES\n")
        count = len(data[key]['data'])
        fd.write("COUNT = " + str(count) + "\n")
        for i in data[key]['data']:
            fd.write(i['name'].encode('utf8'))
            fd.write("\n")
Please help me fix this. Thanks in advance!

Since the platform update on October 2nd, the Graph API returns only 25 likes at a time (see https://developers.facebook.com/roadmap/completed-changes/), and movies are likes. You can either implement result pagination or use the FQL table page_fan with the following FQL:
select page_id, name from page where page_id in (select page_id from page_fan where uid=me() and profile_section = 'movies')
You have to count the entries in your application; FB has no aggregation functionality.
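For the pagination route, the Graph API includes a paging.next URL in each response when more results remain. A minimal sketch of that loop (the paging.next field follows the Graph API response format of that era; the paginate helper and the usage URL are illustrative, not part of the question's code):

```python
def paginate(fetch_page, url):
    """Collect 'data' items across Graph-API-style pages by following
    the paging.next link until no next page remains. fetch_page is any
    callable taking a URL and returning the decoded JSON payload,
    e.g. lambda u: requests.get(u).json()."""
    items = []
    while url:
        payload = fetch_page(url)
        items.extend(payload.get('data', []))
        # 'paging.next' holds a fully-qualified URL for the next page, if any
        url = payload.get('paging', {}).get('next')
    return items

# Hypothetical usage against the Graph API:
# movies = paginate(lambda u: requests.get(u).json(),
#                   "https://graph.facebook.com/USER_NAME/movies?access_token=" + TOKEN)
# count = len(movies)
```

Counting is then just len() over the collected list, done in the application as noted above.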

Related

Elastic Search retrieve all records

I am using Elasticsearch as a database with millions of records. I am using the code below to retrieve the data, but it is not giving me the complete data.
response = requests.get("http://localhost:9200/cityindex/_search?q=*:*&size=10000")
This gives me only 10000 records. When I extend the size to the document count (which is 784234), it throws an error:
'Result window is too large, from + size must be less than or equal
to: [10000] but was [100000]. See the scroll API for a more efficient
way to request large data sets. This limit can be set by changing the
[index.max_result_window] index level setting.'
Context on what I want to do:
I want to extract all the data of a particular index and then do analysis on it (I am looking to get the whole data in JSON format). I am using Python for my project.
Can someone please help me with this?
You need to scroll over the pages ES returns to you and store them in a list.
You can use the elasticsearch Python client for this.
Example Python code:
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts="localhost", port=9200, timeout=30)

# Open a scroll context (search_type='scan' skips sorting for speed;
# it was removed in Elasticsearch 5+, where a plain scrolled search is used)
page = es.search(
    index='index_name',
    scroll='5m',
    search_type='scan',
    size=5000)
sid = page['_scroll_id']
scroll_size = page['hits']['total']
print(scroll_size)

records = []
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Number of results returned by the last scroll
    scroll_size = len(page['hits']['hits'])
    for rec in page['hits']['hits']:
        records.append(rec['_source'])
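Note that on Elasticsearch 5+, where search_type='scan' no longer exists, the first search response already contains hits, so they must be collected too. A sketch of the same loop as a reusable generator (the function name and index are placeholders; any client object exposing the same search/scroll response shape works, including an elasticsearch.Elasticsearch instance):

```python
def scroll_all(client, index, size=5000, scroll='2m'):
    """Yield every document's _source from the index by scrolling,
    including the hits of the initial search response."""
    page = client.search(index=index, scroll=scroll, size=size)
    sid = page['_scroll_id']
    hits = page['hits']['hits']
    while hits:
        for hit in hits:
            yield hit['_source']
        # Keep the scroll context alive and fetch the next batch
        page = client.scroll(scroll_id=sid, scroll=scroll)
        sid = page['_scroll_id']
        hits = page['hits']['hits']

# Hypothetical usage:
# es = Elasticsearch(hosts="localhost", port=9200, timeout=30)
# records = list(scroll_all(es, 'cityindex'))
```

The official client also ships elasticsearch.helpers.scan, which wraps this bookkeeping for you.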

How to use date filter correctly on aws dynamodb boto3

I want to retrieve items from a table in DynamoDB, then append this data below the last rows of the table in BigQuery.
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('table')

response = table.scan(FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query))
# first part
data = response['Items']
# second part
while response.get('LastEvaluatedKey'):
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])

df = pd.DataFrame(data)
df = df[['query', 'created_at', 'result_count', 'id', 'isfuzy']]
# load df to big query
.....
The date filter works correctly, but in the while loop (second part) the code retrieves all items. After the first part I have 100 rows, but after this code
while response.get('LastEvaluatedKey'):
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    data.extend(response['Items'])
I have 500,000 rows. I could use only the first part, but I know there is a 1 MB limit per scan page, which is why I use the second part. How can I get only the data in the given date range?
Your 1st scan API call has a FilterExpression set, which applies your data filter:
response = table.scan(FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query))
However, the 2nd scan API call doesn't have one set and thus is not filtering your data:
response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
Apply the FilterExpression to both calls:
while response.get('LastEvaluatedKey'):
    response = table.scan(
        ExclusiveStartKey=response['LastEvaluatedKey'],
        FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query)
    )
    data.extend(response['Items'])
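To make it hard to forget the filter on later pages, the loop can be wrapped in a small helper that forwards the same keyword arguments to every scan call (a sketch: scan_all is a hypothetical name, and table stands for a boto3 Table resource):

```python
def scan_all(table, **scan_kwargs):
    """Scan the whole table, forwarding the same kwargs (including any
    FilterExpression) to every page until LastEvaluatedKey is absent."""
    items = []
    response = table.scan(**scan_kwargs)
    items.extend(response['Items'])
    while 'LastEvaluatedKey' in response:
        response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'],
                              **scan_kwargs)
        items.extend(response['Items'])
    return items

# Hypothetical usage with the question's filter:
# data = scan_all(table,
#                 FilterExpression=Attr('created_at').gt(max_date_of_the_table_in_big_query))
```

Note that a scan FilterExpression is applied after each 1 MB page is read, so every page must carry it.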

Why does a Where clause cause more data to be returned?

I have a script that uses shareplum to get items from a very large and growing SharePoint (SP) list. Because of the size, I encountered the dreaded 5000 item limit set in SP. To get around that, I tried to page the data based on the 'ID' with a Where clause on the query.
# this is wrapped in a while.
# the idx is updated to the latest max if the results aren't empty.
df = pd.DataFrame(columns=cols)
idx = 0
query = {'Where': [('Gt', 'ID', str(idx))], 'OrderBy': ['ID']}
data = sp_list.GetListItems(view, query=query, row_limit=4750)
df = df.append(pd.DataFrame(data[0:]))
That seemed to work but, after I added the Where, it started returning rows not visible on the SP web list. For example, the minimum ID on the web is, say, 500 while shareplum returns rows starting at 1. It also seems to be pulling in rows that are filtered out on the web. For example, it includes column values not included on the web. If the Where is removed, it brings back the exact list viewed on the web.
What is it that I'm getting wrong here? I'm brand new to shareplum; I looked at the docs but they don't go into much detail and all the examples are rather trivial.
After further investigation, it seems shareplum will ignore any filters applied to the list to create the view when a query is provided to GetListItems. This is easily verified by removing the query param.
As a workaround, I'm now paging 'All Items' with a row_limit and query as below. This at least lets me get all the data and do any further filtering/grouping in python.
df = pd.DataFrame(columns=cols)
idx = 0
more = True
while more:
    query = {'Where': [('Gt', 'ID', str(idx))]}
    # Page 'All Items' based on 'ID' > idx
    data = sp_list.GetListItems('All Items', query=query, row_limit=4500)
    data_df = pd.DataFrame(data[0:])
    if not data_df.empty:
        df = df.append(data_df)
        ids = pd.to_numeric(data_df['ID'])
        idx = ids.max()
    else:
        more = False
Why shareplum behaves this way is still an open question.
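The workaround generalizes to a small generator that pages on the maximum ID seen so far. A sketch, where get_items is a placeholder for a wrapper around sp_list.GetListItems('All Items', ...) returning a list of row dicts:

```python
def page_by_id(get_items, row_limit=4500):
    """Yield batches of rows, each batch requested with 'ID' > the
    largest ID seen so far, until a batch comes back empty."""
    idx = 0
    while True:
        query = {'Where': [('Gt', 'ID', str(idx))]}
        data = get_items(query, row_limit)
        if not data:
            return
        yield data
        idx = max(int(row['ID']) for row in data)

# Hypothetical usage:
# def get_items(query, row_limit):
#     return sp_list.GetListItems('All Items', query=query, row_limit=row_limit)
# df = pd.concat(pd.DataFrame(batch) for batch in page_by_id(get_items))
```

Keeping the paging logic separate from the DataFrame assembly also makes it easy to test without a live SharePoint list.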

Django: queryset "loosing" a value

I have two models, one with a list of clients and the other with a list of sales.
My intention is to add a sales rank value to the clients queryset.
all_clients = Contactos.objects.values("id", "Vendedor", "codigo", 'Nombre', "NombrePcia", "Localidad", "FechaUltVenta")
sales = Ventas.objects.all()
Once loaded, I aggregate the sales per client by summing the subtotal values of their sales, then order the result by total sales.
sales_client = sales.values('cliente').annotate(fact_total=Sum('subtotal'))
client_rank = sales_client.order_by('-fact_total')
Then I set the rank of those clients and store the value under the "Rank" key in the same client_rank queryset.
a = 0
for rank in client_rank:
    a = a + 1
    rank['Rank'] = a
Everything fine up to now. When I print the results in the template I get the expected values in the "client_rank" queryset: "client name" + "total sales per client" + "Rank".
{'cliente': '684 DROGUERIA SUR', 'fact_total': Decimal('846470'), 'Rank': 1}
{'cliente': '699 KINE ESTETIC', 'fact_total': Decimal('418160'), 'Rank': 2}
etc....
The problem starts here
First, we should take into account that not all the clients in the "all_clients" queryset have actual sales in the "sales" queryset. So I must find which ones do have sales, assign them the "Rank" value, and assign a standard value to the ones who don't.
for subject in all_clients:
    subject_code = str(subject["codigo"])
    try:
        selected_subject = ranking_clientes.get(cliente__icontains=subject_code)
        subject['rank'] = selected_subject['Rank']
    except:
        subject['rank'] = "Some value"
The try always fails because "selected_subject" doesn't seem to have the "Rank" value. If I print "selected_subject" I get the following:
{'cliente': '904 BAHIA BLANCA BASKET', 'fact_total': Decimal('33890')}
Any clues on why I'm losing the "Rank" value? The original "client_rank" queryset still has that value included.
Thanks!
I presume that ranking_clientes is the same as client_rank.
The problem is that .get will always do a new query against the database. This means that any modifications you made to the dictionaries returned in the original query will not have been applied to the result of the get call.
You would need to iterate through your query to find the one you need:
selected_subject = next(client for client in ranking_clientes if subject_code in client['cliente'])
Note, this is pretty inefficient if you have a lot of clients. I would rethink your model structure. Alternatively, you could look into using a database function to return the rank directly as part of the original query.
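If the rank only needs to be resolved by client code, a dict lookup avoids both the extra queries and the per-client linear scan. A sketch using plain dicts and the field names from the question, assuming each 'cliente' string starts with the numeric client code (as in '684 DROGUERIA SUR'):

```python
def build_rank_lookup(ranked_clients):
    """Map the leading code of each 'cliente' string to its 1-based rank,
    preserving the iterable's ordering (here: ordered by total sales)."""
    return {row['cliente'].split()[0]: pos
            for pos, row in enumerate(ranked_clients, start=1)}

def assign_ranks(all_clients, rank_lookup, default="no sales"):
    # O(1) lookup per client instead of a query or a scan over the ranking
    for subject in all_clients:
        subject['rank'] = rank_lookup.get(str(subject['codigo']), default)
    return all_clients
```

With client_rank evaluated once, build_rank_lookup(client_rank) can replace the repeated .get() calls entirely.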

Django+python: Use Order by to Django RawQuery objects?

I have almost 100 million product names in the DB. I display 100 products in the UI at a time and, after scrolling, show the next 100, and so on. For this I have used a Django raw query, as my database (MySQL) doesn't support the distinct functionality I need.
Here 'fetch' is a callback function used in another file:
def fetch(query_string, *query_args):
    conn = connections['databaseName']
    with conn.cursor() as cursor:
        cursor.execute(query_string, query_args)
        record = dictfetchall(cursor)
    return record
Here is the main call in views.py, a sample raw query code snippet:
record = fetch("select productname from abc")
If I apply a sorting criterion to the records:
record = fetch("select productname from abc order by productname asc")
I do the same for descending. As a result, it takes a long time to display the sorted products.
What I want is to query once, store the result in a Python object, and then apply ascending or descending sorting to it. That way the first load takes some time, but subsequent sorts won't go to the database each time a sort is requested.
Overall, I want to increase performance when sorting the records.
I think what you are looking for is pagination. This is an essential technique when you want to display data in batches (pages).
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger

def listing(request):
    query_string = 'your query'
    query_args = []
    conn = connections['databaseName']
    with conn.cursor() as cursor:
        cursor.execute(query_string, query_args)
        all_records = dictfetchall(cursor)
    paginator = Paginator(all_records, 100)  # Show 100 records per page
    page = request.GET.get('page')
    try:
        records = paginator.page(page)
    except PageNotAnInteger:
        # If page is not an integer, deliver first page.
        records = paginator.page(1)
    except EmptyPage:
        # If page is out of range (e.g. 9999), deliver last page of results.
        records = paginator.page(paginator.num_pages)
    return records
Whenever you make a request you should include the page you want to display (e.g. 1, 2, 3...) in the URL parameters.
Example: GET http://localhost/products/?page=1
In terms of logic, your javascript should display the first page and have a counter that hold the next page you should request, after the user scrolls make an AJAX request to get the second page and increase the page counter, etc...
EDIT: As for the sorting matter, you can also use JavaScript to sort your data.
Below is my attempt, and I got the desired result.
I fetch the data from the database and store it as a list of dictionaries.
Assuming the data stored in the list looks like:
l = [{'productName': 'soap', 'id': 1}, {'productName': 'Laptop', 'id': 2}]
The code snippet below sorts the list by a given key:
from operator import itemgetter
# Ascending
res = sorted(l, key=itemgetter('productName'))
# Descending
res = sorted(l, key=itemgetter('productName'), reverse=True)
