What's the efficient way to access the BSC node mempool content? - python

I'm currently writing a program to monitor the mempool of a BSC node. Since my BSC node is billed by request count, I'm trying to find the best way to save time and cost.
Here are some plans I found:
Use a mempool explorer service such as https://www.blocknative.com/. This is obviously not the best plan since I've already paid 99 dollars for a quicknode service, and I found that some transactions are still missing from the list it provides.
Use the web3.py pending filter: new_transaction_filter = w3.eth.filter('pending'), then new_transaction_filter.get_new_entries() and w3.eth.get_transaction(entry) for each entry. This is also not efficient because it's quite time-consuming and costs lots of web3 requests.
Use pending_block = w3.eth.get_block(block_identifier='pending', full_transactions=True). The call only returns transactions with a mined block number, and obviously not the 'pending' ones.
Use w3.geth.txpool.content(). This can print out all the pending transactions in one shot, but when you keep calling it, duplicate records appear.
Can anyone give me a hint about the correct way to fetch the mempool?

I think option 2 is best. I've been trying to see if there's a way to apply a filter for only a specific address, but I've had no luck with that. I've tried option 3, which is too late, and option 4 only works on a geth node (I've only been using speedynode, so not the best).
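For what it's worth, here's a minimal sketch of option 2 with the address check done client-side after fetching each transaction (web3.py v6 method names assumed; the endpoint URL and target address are placeholders):

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("https://your-bsc-node-endpoint"))  # placeholder endpoint
TARGET = Web3.to_checksum_address("0x0000000000000000000000000000000000000000")  # placeholder address

new_transaction_filter = w3.eth.filter('pending')

while True:
    for tx_hash in new_transaction_filter.get_new_entries():
        try:
            tx = w3.eth.get_transaction(tx_hash)  # one extra request per hash
        except Exception:
            continue  # the tx may already have been dropped or mined
        if tx.get('to') == TARGET or tx.get('from') == TARGET:
            print(tx)

The filter itself only yields hashes, so the per-address check unavoidably costs one get_transaction request per hash.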

Subscribing to pending transactions with the websockets library may be better than option 2 since it would just listen and receive the hashes in real time rather than using new_transaction_filter.get_new_entries() in a loop.
However, I don't believe public nodes will be fast enough to receive them before the block gets mined.
Check out this link; it also works with BSC:
https://docs.infura.io/infura/tutorials/ethereum/subscribe-to-pending-transactions
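A minimal sketch of that subscription with the websockets library and raw JSON-RPC (the endpoint URL is a placeholder; your provider has to expose a websocket endpoint that supports eth_subscribe):

import asyncio
import json
import websockets

WS_URL = "wss://your-bsc-node-endpoint"  # placeholder websocket endpoint

async def listen_pending():
    async with websockets.connect(WS_URL) as ws:
        await ws.send(json.dumps({
            "jsonrpc": "2.0", "id": 1,
            "method": "eth_subscribe",
            "params": ["newPendingTransactions"],
        }))
        print(await ws.recv())  # subscription confirmation
        while True:
            msg = json.loads(await ws.recv())
            tx_hash = msg["params"]["result"]  # pending tx hash, pushed in real time
            print(tx_hash)

asyncio.run(listen_pending())

Each notification carries only the transaction hash, so you still pay one get_transaction call per hash you actually want to inspect.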

Related

How to implement "Transaction Controller" in locust

Is there a way to make Locust measure completed tasks per second when we have more than one request in a task, the way JMeter does with its "Transaction Controller"?
I want to measure performance by how many completed flows (each made up of several requests) run per second.
If it were me, I would reduce your work down to a single request or action per task. You can use a SequentialTaskSet or put TaskSets inside TaskSets to structure the work however you need it to be, and then Locust will do all the work for you.
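A minimal sketch of that structure, assuming a simple HTTP flow (the host and endpoints are placeholders):

from locust import HttpUser, SequentialTaskSet, task, between

class CheckoutFlow(SequentialTaskSet):
    # Each step is a single request, so Locust reports per-request stats for free.
    @task
    def view_item(self):
        self.client.get("/item/42")  # placeholder endpoint

    @task
    def add_to_cart(self):
        self.client.post("/cart", json={"item": 42})  # placeholder endpoint

class ShopUser(HttpUser):
    host = "https://example.com"  # placeholder host
    wait_time = between(1, 3)
    tasks = [CheckoutFlow]

With this layout each step shows up as its own entry in the stats table, executed in order per user.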
Alternatively, you can create your own User class with your code structured however you want and use EventHooks to manually tell Locust when things happen by firing a request event. You can fire it with request_type set to whatever label you want the work tagged with and name set to whatever you want the task to be called when it reports.
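A minimal sketch of that manual firing, using the Locust 2.x request event signature; the "TRANSACTION" label, the "checkout_flow" name, and the flow body are all hypothetical:

import time
from locust import User, task, between

class FlowUser(User):
    wait_time = between(1, 3)

    @task
    def whole_flow(self):
        start = time.perf_counter()
        exc = None
        try:
            pass  # run the several requests / actions that make up the flow here
        except Exception as e:
            exc = e
        self.environment.events.request.fire(
            request_type="TRANSACTION",  # label the flow is grouped under
            name="checkout_flow",        # hypothetical flow name
            response_time=(time.perf_counter() - start) * 1000,  # milliseconds
            response_length=0,
            exception=exc,
            context={},
        )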
In addition to #solowalkers answer you may want to check out locust-plugins TransactionManager.
See the transaction_example* files here https://github.com/SvenskaSpel/locust-plugins/tree/master/examples

Delaying 1 second per request, not enough for 3600 per hour

The Amazon API limit is apparently 1 req per second or 3600 per hour. So I implemented it like so:
while True:
    # sql stuff
    time.sleep(1)
    result = api.item_lookup(row[0], ResponseGroup='Images,ItemAttributes,Offers,OfferSummary',
                             IdType='EAN', SearchIndex='All')
    # sql stuff
Error:
amazonproduct.errors.TooManyRequests: RequestThrottled: AWS Access Key ID: ACCESS_KEY_REDACTED. You are submitting requests too quickly. Please retry your requests at a slower rate.
Any ideas why?
This code looks correct, and it looks like the 1 request/second limit is still in effect:
http://docs.aws.amazon.com/AWSECommerceService/latest/DG/TroubleshootingApplications.html#efficiency-guidelines
You want to make sure that no other process is using the same associate account. Depending on where and how you run the code, there may be an old version of the VM, or another instance of your application running, or maybe there is a version on the cloud and other one on your laptop, or if you are using a threaded web server, there may be multiple threads all running the same code.
If you still hit the query limit, you just want to retry, possibly with a TCP-like "additive increase/multiplicative decrease" back-off: start with extra_delay = 0; when a request fails, set extra_delay += 1 and sleep(1 + extra_delay), then retry; when it finally succeeds, set extra_delay = extra_delay * 0.9.
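A minimal sketch of that back-off wrapped around the lookup from the question (the api object and the TooManyRequests error class come from the question and its traceback; everything else is an assumption):

import time
from amazonproduct.errors import TooManyRequests

extra_delay = 0

def throttled_lookup(ean):
    # 'api' is the amazonproduct API instance from the question.
    global extra_delay
    while True:
        time.sleep(1 + extra_delay)
        try:
            result = api.item_lookup(ean,
                                     ResponseGroup='Images,ItemAttributes,Offers,OfferSummary',
                                     IdType='EAN', SearchIndex='All')
        except TooManyRequests:
            extra_delay += 1   # additive increase while we are being throttled
            continue
        extra_delay *= 0.9     # multiplicative decrease once requests succeed again
        return result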
Computer time is funny
This post is correct in saying "it varies in a non-deterministic manner" (https://stackoverflow.com/a/1133888/5044893). Depending on a whole host of factors, the time measured by a processor can be quite unreliable.
This is compounded by the fact that Amazon's API runs on a different clock than your program does. They are certainly not in sync, and there's likely some overlap between their "1 second" measurement and your program's. Amazon probably tries to average out this inconsistency, and they probably also allow a small margin of error, maybe +/- 5%. Even so, the discrepancy between your clock and theirs is probably what's triggering the RequestThrottled error.
Give yourself some buffer
Here are some thoughts to consider.
Do you really need to hit the Amazon API every single second? Would your program work with a 5-second interval? Even a 2-second interval halves your request rate and makes a lockout far less likely. Also, Amazon may be charging you for every service call, so spacing them out could save you money.
This is really a question of "optimization" now. If you use a constant variable to control your API call rate (say, SLEEP = 2), then you can adjust that rate easily. Fiddle with it, increase and decrease it, and see how your program performs.
Push, not pull
Sometimes, hitting an API every second means that you're polling for new data. Polling is notoriously wasteful, which is why the Amazon API has a rate limit.
Instead, could you switch to a queue-based approach? Amazon SQS can fire off events to your programs. This is especially easy if you host them with AWS Lambda.
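A minimal sketch of the receiving side, assuming an SQS-triggered Lambda function (the event shape is the standard SQS record format; process_new_data is a hypothetical handler):

def handler(event, context):
    # Invoked by an SQS event source mapping instead of polling the Amazon API every second.
    for record in event["Records"]:
        process_new_data(record["body"])  # hypothetical function doing the per-item work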

Scraping Edgar with Python regular expressions

I am working on the initial stage of a personal project: downloading 10-Q statements from EDGAR. Quick disclaimer: I am very new to programming and Python, so the code that I wrote is very basic, not even using custom functions and classes, just a very long script that I'm more comfortable editing. As a result, some solutions are quite rough (e.g. concatenating URLs using CIKs and other search options instead of doing requests with "browser" headers).
I keep running into a problem that those who have scraped EDGAR might be familiar with. Every now and then my script just stops running. It doesn't raise any exceptions (I created some that append txt reports with links that can't be opened and so forth). I suspect that either SEC servers have a certain limit of requests from an IP per some unit of time (if I wait some time after CTRL-C'ing the script and run it again, it generates more output compared to rapid re-activation), alternatively it could be TWC that identifies me as a bot and limits such requests.
If it's the SEC, what could potentially work? I tried learning how to work with Tor to potentially get a new IP every now and then, but I can't really find a basic tutorial that would work for my level of expertise. Maybe someone can recommend something good on the topic?
Maybe timers would work? Like forcing the script to sleep every hour or so (I'm still trying to figure out how to make such timers and reset them if an event occurs). The main challenge with this particular problem is that I can't let it run at night.
Thank you in advance for any advice. I've been fighting with this for days, and at this stage it could take me more than a month to get what I want (before I even start tackling 10-Ks).
It seems like delays are pretty useful - sitting at 3.5k downloads with no interruptions thanks to a simple:
import random
import time

time.sleep(random.randint(0, 1) + abs(random.normalvariate(0, 0.2)))
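Building on that, a minimal sketch of wrapping each download in that jittered delay plus a longer cool-off whenever the server starts refusing requests (the requests session, the User-Agent string, and the cool-off lengths are all assumptions):

import random
import time
import requests

def polite_get(url, session, max_tries=5):
    # Hypothetical helper: fetch one EDGAR URL with a jittered delay and back off
    # for progressively longer whenever the server refuses the request.
    for attempt in range(max_tries):
        time.sleep(random.randint(0, 1) + abs(random.normalvariate(0, 0.2)))
        resp = session.get(url, headers={"User-Agent": "your-name your-email@example.com"})  # placeholder contact info
        if resp.status_code == 200:
            return resp
        time.sleep(60 * (attempt + 1))  # assumed cool-off: 1, 2, 3... minutes
    raise RuntimeError("Gave up on %s after %d tries" % (url, max_tries))

session = requests.Session()
# resp = polite_get("https://www.sec.gov/...", session)  # path elided, as in the question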

GAE Backend fails to respond to start request

This is probably a truly basic thing that I'm simply having an odd time figuring out in a Python 2.5 app.
I have a process that will take roughly an hour to complete, so I made a backend. To that end, I have a backend.yaml that has something like the following:
- name: mybackend
  options: dynamic
  start: /path/to/script.py
(The script is just raw computation. There's no notion of an active web session anywhere.)
On toy data, this works just fine.
This used to be public, so I would navigate to the page, the script would start, and it would time out after about a minute (the HTTP request deadline plus the 30s shutdown grace period, I assume). I figured this was a browser issue, so I repeated the same thing with a cron job. No dice. Then I switched to using a push queue and adding a targeted task, since on paper it looks like it would wait for 10 minutes. Same thing.
All 3 time out after that minute, which means I'm not decoupling the request from the backend like I believe I am.
I'm assuming that I need to write a proper Handler for the backend to do work, but I don't exactly know how to write the Handler/webapp2Route. Do I handle _ah/start/ or make a new endpoint for the backend? How do I handle the subdomain? It still seems like the wrong thing to do (I'm sticking a long-process directly into a request of sorts), but I'm at a loss otherwise.
So the root cause ended up being doing the following in the script itself:
models = MyModel.all()
for model in models:
    # Magic happens
I was basically taking for granted that the query would automatically batch my Query.all() over many entities, but it was dying at the 1000th entry or so. I originally wrote that it was just raw computation because I completely ignored the fact that the reads can fail.
The actual solution for solving the problem we wanted ended up being "Use the map-reduce library", since we were trying to look at each model for analysis.
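For reference, a minimal sketch of manual cursor batching with the old db API, which sidesteps the ~1000-entity ceiling without pulling in the map-reduce library (assuming MyModel is a db.Model):

def process_all(batch_size=500):
    query = MyModel.all()
    cursor = None
    while True:
        if cursor:
            query.with_cursor(cursor)       # resume where the last batch ended
        batch = query.fetch(batch_size)
        if not batch:
            break
        for model in batch:
            pass  # Magic happens, one bounded batch at a time
        cursor = query.cursor()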

chatbot using twisted and wokkel

I am writing a chatbot using Twisted and wokkel, and everything seems to be working except that the bot periodically logs off. To temporarily fix that, I set presence to available on every connection initialized. Does anyone know how to prevent it from going offline? (I assume that if I keep sending available presence every minute or so the bot won't go offline, but that just seems too wasteful.) Suggestions, anyone? Here is the presence code:
class BotPresenceClientProtocol(PresenceClientProtocol):

    def connectionInitialized(self):
        PresenceClientProtocol.connectionInitialized(self)
        self.available(statuses={None: 'Here'})

    def subscribeReceived(self, entity):
        self.subscribed(entity)
        self.available(statuses={None: 'Here'})

    def unsubscribeReceived(self, entity):
        self.unsubscribed(entity)
Thanks in advance.
If you're using XMPP, as I assume is the case given your mention of wokkel, then per RFC 3921 (the applicable standard) you do need periodic exchanges of presence information. That is indeed a substantial overhead of XMPP, and solutions to it are being researched, but it's the state of the art as of now. Given the high likelihood that total silence from a client is due to that client simply going away, a periodic "I'm still here" reassurance appears to be a must. I'm not sure what direction those research efforts are taking to ameliorate the situation; maybe the client could commit to "being there for at least the next 15 minutes", but since most clients front a fickle human user who can't be stopped from changing their mind at any time and going away, I'm not sure that would be solid enough to be useful.
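If you do go the periodic-refresh route, here is a minimal sketch using Twisted's LoopingCall, started when the connection initializes (the 60-second interval is an arbitrary assumption; tune it to whatever your server tolerates):

from twisted.internet.task import LoopingCall
from wokkel.xmppim import PresenceClientProtocol

class BotPresenceClientProtocol(PresenceClientProtocol):

    def connectionInitialized(self):
        PresenceClientProtocol.connectionInitialized(self)
        # Re-announce availability on a timer so the server keeps seeing us.
        self._keepalive = LoopingCall(self.available, statuses={None: 'Here'})
        self._keepalive.start(60, now=True)  # assumed interval, in seconds

    def connectionLost(self, reason):
        if getattr(self, '_keepalive', None) and self._keepalive.running:
            self._keepalive.stop()
        PresenceClientProtocol.connectionLost(self, reason)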
