I'm working with an API that gives me 61 items that I include in a Discord embed in a for loop.
As all of this is meant to go into a Discord bot using pagination from DiscordUtils, I need to make it create an embed for every 10 entries to avoid a too-long message (the 2000-character limit).
Currently, the data I loop over is here: https://api.nepmia.fr/spc/ (I recommend using a JSON-parsing extension for your browser or it will be a bit hard to read).
But what I want to create is something that will look like this: https://api.nepmia.fr/spc/formated/
That way I can put each range in a different embed and then use pagination.
I use TinyDB to generate the JSON files I showed before with this script:
import urllib.request, json
from shutil import copyfile
from termcolor import colored
from tinydb import TinyDB, Query

db = TinyDB("/home/nepmia/Myazu/db/db.json")

def api_get():
    print(colored("[Myazu]", "cyan"), colored("Fetching WynncraftAPI...", "white"))

    count = 0  # defined before the try so the finally block can always report it
    try:
        with urllib.request.urlopen("https://api.wynncraft.com/public_api.php?action=guildStats&command=Spectral%20Cabbage") as u1:
            api_1 = json.loads(u1.read().decode())

            if members := api_1.get("members"):
                print(colored("[Myazu]", "cyan"),
                      colored("Got expected answer, starting saving process.", "white"))

                for member in members:
                    nick = member.get("name")
                    ur2 = f"https://api.wynncraft.com/v2/player/{nick}/stats"
                    u2 = urllib.request.urlopen(ur2)
                    api_2 = json.loads(u2.read().decode())
                    data = api_2.get("data")

                    for item in data:
                        meta = item.get("meta")
                        playtime = meta.get("playtime")

                        print(colored("[Myazu]", "cyan"),
                              colored("Saving playtime for player", "white"),
                              colored(f"{nick}...", "green"))

                        db.insert({"username": nick, "playtime": playtime})
                        count += 1
            else:
                print(colored("[Myazu]", "cyan"),
                      colored("Unexpected answer from WynncraftAPI [ERROR 1]", "white"))
    except:
        print(colored("[Myazu]", "cyan"),
              colored("Unhandled error in saving process [ERROR 2]", "white"))
    finally:
        print(colored("[Myazu]", "cyan"),
              colored("Finished saving data for", "white"),
              colored(f"{count}", "green"),
              colored("players.", "white"))
But this will only create something like this: https://api.nepmia.fr/spc/
What I would like is something like this: https://api.nepmia.fr/spc/formated/
Thanks for your help!
PS: Sorry for your eyes, I'm still new to Python, so I know I don't do things very properly :s
To follow up on the comments: you shouldn't store items in your database in a format that is specific to how you want to return results from the database to a different API; among other reasons, it will make the data harder to query in other contexts.
If you want to paginate items from a database it's better to do that when you query it.
According to the docs, you can iterate over all documents in a TinyDB database just by iterating directly over the DB like:
for doc in db:
    ...
For any iterable you can use the enumerate function to associate an index to each item like:
for idx, doc in enumerate(db):
    ...
If you want the indices to start with 1 as in your examples you would just use idx + 1.
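Equivalently, enumerate also accepts a start argument; a minimal sketch:
for idx, doc in enumerate(db, start=1):
    ...  # idx is 1 for the first document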
Finally, to paginate the results, you need some function that can return items from an iterable in fixed-size batches, such as one of the many solutions on this question or elsewhere. E.g. given a function chunked(iter, size) you could do:
pages = enumerate(chunked(enumerate(db), 10))
Then list(pages) gives a list of tuples like [(page_num, [(player_num, player), ...]), ...].
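For completeness, here is one possible chunked helper (a minimal sketch using itertools.islice; any batching recipe would do):
from itertools import islice

def chunked(iterable, size):
    # Yield successive lists of at most `size` items from `iterable`
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch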
The only difference between that list of lists and what you want is that you seem to want a dictionary structure like
{'range1': {'1': {...}, '2': {...}, ...}, 'range2': {'11': {...}, ...}}
This is no different from a list of lists; the only difference is that you're using dictionary keys to give numerical indices to each item in a collection, rather than the indices being implicit in the list structure. There are many ways to go from a list of lists to this. The easiest, I think, is a (nested) dict comprehension:
{f'range{page_num + 1}': {str(player_num + 1): player for player_num, player in page}
for page_num, page in pages}
This will give output in exactly the format you want.
Thanks @Iguananaut for your valuable help.
In the end I made something similar to your solution, using a generator.
from math import ceil

import discord

def chunker(seq, size):
    # Yield successive slices of `size` items from the sequence
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def embed_creator(embeds):
    pages = []
    for i, chunk in enumerate(chunker(embeds, 10)):
        # One embed per chunk of up to 10 entries
        current_page = discord.Embed(
            title='**SPC** Last week online time',
            color=3903947)
        for elt in chunk:
            current_page.add_field(
                name=elt.get("username"),
                value=elt.get("play_output"),
                inline=False)
        current_page.set_footer(
            icon_url="https://cdn.discordapp.com/icons/513160124219523086/a_3dc65aae06b2cf7bddcb3c33d7a5ecef.gif?size=128",
            text=f"{i + 1} / {ceil(len(embeds) / 10)}"
        )
        pages.append(current_page)
    return pages
Using embed_creator I generate a list named pages that I can simply use with the DiscordUtils paginator.
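For reference, a minimal usage sketch. It assumes a discord.py commands.Bot named bot, a list entries holding the player dicts fed to embed_creator, and that DiscordUtils exposes an AutoEmbedPaginator as in its documentation; treat those names and the exact paginator API as assumptions:
import DiscordUtils

@bot.command()
async def playtime(ctx):
    # entries: hypothetical list of {"username": ..., "play_output": ...} dicts
    pages = embed_creator(entries)
    paginator = DiscordUtils.Pagination.AutoEmbedPaginator(ctx)  # assumed API
    await paginator.run(pages)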
Related
So I have some 50 document IDs. My Python list veno contains the document IDs, as shown below.
5ddfc565bd293f3dbf502789
5ddfc558bd293f3dbf50263b
5ddfc558bd293f3dbf50264f
5ddfc558bd293f3dbf50264d
5ddfc565bd293f3dbf502792
But when I try to delete those 50 documents I have a hard time. Let me explain: I need to run my Python script over and over again to delete all 50 documents. The first time I run the script it deletes some 10, the next time it deletes 18, and so on. My for loop is pretty simple, as shown below:
for i in veno:
    vv = i[0]
    db.Products2.delete_many({'_id': ObjectId(vv)})
If your list is just the ids, then you want:
for i in veno:
    db.Products2.delete_many({'_id': ObjectId(i)})
full example:
from pymongo import MongoClient
from bson import ObjectId
db = MongoClient()['testdatabase']
# Test data setup
veno = [str(db.testcollection.insert_one({'a': 1}).inserted_id) for _ in range(50)]
# Quick peek to see we have the data correct
for x in range(3): print(veno[x])
print(f'Document count before delete: {db.testcollection.count_documents({})}')
for i in veno:
    db.testcollection.delete_many({'_id': ObjectId(i)})
print(f'Document count after delete: {db.testcollection.count_documents({})}')
gives:
5ddffc5ac9a13622dbf3d88e
5ddffc5ac9a13622dbf3d88f
5ddffc5ac9a13622dbf3d890
Document count before delete: 50
Document count after delete: 0
I don't have a Mongo instance to test with, but what about this:
veno = [
    '5ddfc565bd293f3dbf502789',
    '5ddfc558bd293f3dbf50263b',
    '5ddfc558bd293f3dbf50264f',
    '5ddfc558bd293f3dbf50264d',
    '5ddfc565bd293f3dbf502792',
]

# Or for your case (whatever you have in **veno**)
veno = [vv[0] for vv in veno]

####

db.Products2.delete_many({'_id': {'$in': [ObjectId(vv) for vv in veno]}})
If this doesn't work, then maybe this:
db.Products2.remove({'_id': {'$in':[ObjectId(vv) for vv in veno]}})
From what I understand, delete_many's first argument is a filter, so it's designed so that you don't delete particular documents but rather all documents that satisfy a particular condition.
In the above case, the best approach is to delete all documents at once: delete every document whose _id is in ($in) the list [ObjectId(vv) for vv in veno].
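As a quick sanity check on the result, a minimal sketch (assuming veno is the list of id strings from the question; deleted_count comes from pymongo's DeleteResult):
from pymongo import MongoClient
from bson import ObjectId

db = MongoClient()['testdatabase']  # hypothetical database name

object_ids = [ObjectId(v) for v in veno]  # veno: list of 24-character hex id strings
result = db.Products2.delete_many({'_id': {'$in': object_ids}})
print(f'Deleted {result.deleted_count} documents')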
I am trying to accomplish something like this:
Batch PCollection in Beam/Dataflow
The answer in the above link is in Java, whereas the language I'm working with is Python. Thus, I require some help getting a similar construction.
Specifically I have this:
p = beam.Pipeline(options=pipeline_options)
lines = p | 'File reading' >> ReadFromText(known_args.input)
After this, I need to create another PCollection, but with a list of N rows of "lines", since my use case requires a group of rows. I cannot operate line by line.
I tried a ParDo function that uses count variables to associate a counter with every N rows, followed by a GroupBy and a Map. But these counters reset every 1000 records, so it's not the solution I am looking for. I read the example in the link, but I don't know how to do something like that in Python.
I also tried saving the counters in Datastore; however, the speed difference between Dataflow reads and Datastore writes is quite significant.
What is the correct way to do this? I don't know how else to approach it.
Regards.
Assuming the grouping order is not important, you can just group inside a DoFn.
class Group(beam.DoFn):
    def __init__(self, n):
        self._n = n
        self._buffer = []

    def process(self, element):
        # Collect elements until a full batch of n is reached
        self._buffer.append(element)
        if len(self._buffer) == self._n:
            yield list(self._buffer)
            self._buffer = []

    def finish_bundle(self):
        # Flush whatever is left over at the end of the bundle
        if len(self._buffer) != 0:
            yield list(self._buffer)
            self._buffer = []

lines = (p | 'File reading' >> ReadFromText(known_args.input)
           | 'Group' >> beam.ParDo(Group(known_args.N)))
...
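One caveat, added here as a hedged note: newer Beam Python SDK versions expect finish_bundle to emit WindowedValue objects rather than plain elements, so the flush step may need a small adjustment along these lines (GroupCompat is a hypothetical name; it reuses the Group DoFn above):
from apache_beam.transforms.window import GlobalWindow
from apache_beam.utils.windowed_value import WindowedValue

class GroupCompat(Group):
    # Same batching DoFn, but wraps the leftover batch so newer SDKs accept it
    def finish_bundle(self):
        if self._buffer:
            yield WindowedValue(list(self._buffer), 0, [GlobalWindow()])
            self._buffer = []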
I am using the elasticsearch-dsl Python library to connect to Elasticsearch and do aggregations.
I am using the following code:
search.aggs.bucket('per_date', 'terms', field='date')\
    .bucket('response_time_percentile', 'percentiles', field='total_time',
            percents=percentiles, hdr={"number_of_significant_value_digits": 1})

response = search.execute()
This works fine but returns only 10 results in response.aggregations.per_ts.buckets
I want all the results
I have tried one solution with size=0 as mentioned in this question
search.aggs.bucket('per_ts', 'terms', field='ts', size=0)\
    .bucket('response_time_percentile', 'percentiles', field='total_time',
            percents=percentiles, hdr={"number_of_significant_value_digits": 1})

response = search.execute()
But this results in an error:
TransportError(400, u'parsing_exception', u'[terms] failed to parse field [size]')
I had the same issue. I finally found this solution:
s = Search(using=client, index="jokes").query("match", jks_content=keywords).extra(size=0)
a = A('terms', field='jks_title.keyword', size=999999)
s.aggs.bucket('by_title', a)
response = s.execute()
After 2.x, size=0 for all bucket results won't work anymore; please refer to this thread. In my example here I just set the size to 999999. You can pick a large number according to your case.
It is recommended to explicitly set a reasonable value for size, a number between 1 and 2147483647.
Hope this helps.
This is a bit older, but I ran into the same issue. What I wanted was basically an iterator that I could use to go through all the aggregations I got back (I also have a lot of unique results).
The best thing I found is to create a Python generator like this:
def scan_aggregation_results():
    i = 0
    partitions = 20
    while i < partitions:
        s = Search(using=elastic, index='my_index').extra(size=0)
        agg = A('terms', field='my_field.keyword', size=999999,
                include={"partition": i, "num_partitions": partitions})
        s.aggs.bucket('my_agg', agg)
        result = s.execute()

        for item in result.aggregations.my_agg.buckets:
            yield item.key

        i = i + 1

# in other parts of the code just do
for item in scan_aggregation_results():
    print(item)  # or do whatever you want with it
The magic here is that Elasticsearch will automatically partition the results into 20 partitions, i.e. the number of partitions I define. I just have to set the size to something large enough to hold a single partition; in this case the result can be up to roughly 20 million items (20 * 999999). If, like me, you have far fewer items to return (say 20,000), then you will just get 1,000 results per query in your bucket, regardless of the much larger size you defined.
Using the generator construct outlined above, you can then create your own scanner, so to speak, iterating over all results individually, which is just what I wanted.
You should read the documentation.
So in your case, it should look like this:
search.aggs.bucket('per_date', 'terms', field='date')\
    .bucket('response_time_percentile', 'percentiles', field='total_time',
            percents=percentiles, hdr={"number_of_significant_value_digits": 1})[0:50]

response = search.execute()
We had our customer details spread over 4 legacy systems and have subsequently migrated all the data into 1 new system.
Our customers previously had different account numbers in each system and I need to check which account number has been used in the new system, which supports API calls.
I have a text file containing all the possible account numbers, structured like this:
30000001, 30000002, 30000003, 30000004
30010000, 30110000, 30120000, 30130000
34000000, 33000000, 32000000, 31000000
Where each row represents all the old account numbers for each customer.
I'm not sure if this is the best approach but I want to open the text file and create a nested list:
[['30000001', '30000002', '30000003', '30000004'], ['30010000', '30110000', '30120000', '30130000'], ['34000000', '33000000', '32000000', '31000000']]
Then I want to iterate over each list, but to save on API calls, as soon as I have verified the new account number in a particular list, I want to break out and move on to the next list.
import json
from urllib2 import urlopen

def read_file():
    lines = [line.rstrip('\n') for line in open('customers.txt', 'r')]
    return lines

def verify_account(*customers):
    verified_accounts = []
    for accounts in customers:
        for account in accounts:
            url = api_url + account
            response = json.load(urlopen(url))
            if response['account_status'] == 1:
                verified_accounts.append(account)
                break
    return verified_accounts
The main issue is that when I read from the file it returns the data like below, so I can't iterate over the individual accounts.
['30000001, 30000002, 30000003, 30000004', '30010000, 30110000, 30120000, 30130000', '34000000, 33000000, 32000000, 31000000']
Also, is there a more Pythonic way, using list comprehensions or similar, to iterate over and check the account numbers? There seems to be too much nesting for Python.
The final thing to mention is that there are over 255 customers to check; actually, there are almost 1000. Will I be able to pass more than 255 arguments into a function?
What about this? Just use str.split():
l = []
with open('customers.txt', 'r') as f:
    for i in f:
        l.append([s.strip() for s in i.split(',')])
Output:
[['30000001', '30000002', '30000003', '30000004'],
['30010000', '30110000', '30120000', '30130000'],
['34000000', '33000000', '32000000', '31000000']]
How about this?
with open('customers.txt', 'r') as f:
    final_list = [i.split(",") for i in f.read().replace(" ", "").splitlines()]
    print final_list
Output:
[['30000001', '30000002', '30000003', '30000004'],
['30010000', '30110000', '30120000', '30130000'],
['34000000', '33000000', '32000000', '31000000']]
Looking for a simple example of retrieving 500 items from DynamoDB while minimizing the number of queries. I know there's a "multiget" function that would let me break this up into chunks of 50 queries, but I'm not sure how to do this.
I'm starting with a list of 500 keys. I'm then thinking of writing a function that takes this list of keys, breaks it up into "chunks," retrieves the values, stitches them back together, and returns a dict of 500 key-value pairs.
Or is there a better way to do this?
As a corollary, how would I "sort" the items afterwards?
Depending on your schema, there are two ways of efficiently retrieving your 500 items.
1. Items are under the same hash_key, using a range_key:
   - Use the query method with the hash_key.
   - You may ask to sort the range_keys A-Z or Z-A.
2. Items are on "random" keys:
   - You said it: use the BatchGetItem method.
   - Good news: the limit is actually 100 per request or 1 MB max.
   - You will have to sort the results on the Python side.
On the practical side, since you use Python, I highly recommend the Boto library for low-level access, or the dynamodb-mapper library for higher-level access (disclaimer: I am one of the core devs of dynamodb-mapper).
Sadly, neither of these libraries provides an easy way to wrap the batch_get operation. By contrast, there is a generator for scan and for query which 'pretends' you get everything in a single query.
In order to get optimal results with the batch query, I recommend this workflow:
- Submit a batch with all of your 500 items.
- Store the results in your dicts.
- Re-submit with the UnprocessedKeys as many times as needed.
- Sort the results on the Python side.
Quick example
I assume you have created a table "MyTable" with a single hash_key
import boto

# Helper function. This is more or less the code
# I added to the develop branch
def resubmit(batch, prev):
    # Empty (re-use) the batch
    del batch[:]

    # The batch answer contains the list of
    # unprocessed keys grouped by tables
    if 'UnprocessedKeys' in prev:
        unprocessed = prev['UnprocessedKeys']
    else:
        return None

    # Load the unprocessed keys
    for table_name, table_req in unprocessed.iteritems():
        table_keys = table_req['Keys']
        table = batch.layer2.get_table(table_name)

        keys = []
        for key in table_keys:
            h = key['HashKeyElement']
            r = None
            if 'RangeKeyElement' in key:
                r = key['RangeKeyElement']
            keys.append((h, r))

        attributes_to_get = None
        if 'AttributesToGet' in table_req:
            attributes_to_get = table_req['AttributesToGet']

        batch.add_batch(table, keys, attributes_to_get=attributes_to_get)

    return batch.submit()

# Main
db = boto.connect_dynamodb()
table = db.get_table('MyTable')
batch = db.new_batch_list()

keys = range(100)  # Get items from 0 to 99
batch.add_batch(table, keys)
res = batch.submit()

while res:
    print res  # Do some useful work here
    res = resubmit(batch, res)

# The END
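The sorting step from the workflow above is plain Python; a minimal sketch, assuming you accumulated the returned items into a list of dicts and that the hash key attribute is named 'id' (a hypothetical name):
# items: list of item dicts collected from the batch responses (hypothetical)
items_sorted = sorted(items, key=lambda item: item['id'])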
EDIT:
I've added a resubmit() function to BatchList in the Boto develop branch. It greatly simplifies the workflow (sketched below):
- Add all of your requested keys to the BatchList.
- submit()
- resubmit() as long as it does not return None.
This should be available in the next release.
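A minimal sketch of that simplified workflow, assuming resubmit() behaves like the helper above (it is the develop-branch API described in this answer, so treat it as an assumption):
batch = db.new_batch_list()
batch.add_batch(table, keys)

res = batch.submit()
while res is not None:
    # ... do some useful work with res here ...
    res = batch.resubmit()  # assumed to return None once everything is processed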