Python Outlook subfolder sniffer/alert

I'm very new to Python and have very little experience.
I'm wondering if you can help me with the following:
I would like to create a script that runs in the background and, by consuming win32com.client and Outlook events, fires an alert when a specific number of emails lands in a subfolder within a specific amount of time.
Let's say 5 emails in 15 minutes. Once that condition is met, I would like to stop the script for the next 15 minutes (even with time.sleep).
I've been thinking and came up with a rough idea like the one below:
Maybe a dict with a key (the email subject or anything else) and a value of received_time? Then the email count would be the len of that dict, and the condition could fire when, after looping through the items, the oldest received time is less than 15 minutes ago?
Can anyone suggest anything else?
I hooked up some events to check how that works, but I haven't come up with any working code yet.
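One possible shape for the counting logic, sketched under assumptions: instead of a dict keyed by subject (two mails with the same subject would collapse into one entry), keep a deque of arrival times as a sliding window, and implement the 15-minute pause as a cooldown that ignores arrivals rather than calling time.sleep, which would also stop the script from processing Outlook events. The class name BurstAlert and its parameters are hypothetical. For the event side, as I understand pywin32, you would wrap the subfolder's Items collection with win32com.client.DispatchWithEvents, define an OnItemAdd(self, item) handler that calls add(time.time()), and keep pythoncom.PumpWaitingMessages() running in a loop; treat that wiring as untested.

```python
from collections import deque


class BurstAlert:
    """Fire once when `threshold` mails arrive within `window` seconds,
    then ignore arrivals for `cooldown` seconds.

    Hypothetical helper: feed it one timestamp (epoch seconds) per new
    mail, in increasing order, e.g. time.time() from an ItemAdd handler.
    """

    def __init__(self, threshold=5, window=15 * 60, cooldown=15 * 60):
        self.threshold = threshold
        self.window = window
        self.cooldown = cooldown
        self.times = deque()  # arrival times inside the window, oldest first
        self.muted_until = float("-inf")

    def add(self, received_time):
        """Register one new mail; return True if it triggers the alert."""
        if received_time < self.muted_until:
            return False  # still cooling down after the last alert
        self.times.append(received_time)
        # Drop arrivals that have slid out of the time window.
        while self.times and received_time - self.times[0] > self.window:
            self.times.popleft()
        if len(self.times) >= self.threshold:
            self.times.clear()  # start counting afresh after the cooldown
            self.muted_until = received_time + self.cooldown
            return True
        return False
```

With the defaults, five mails one minute apart make the fifth call return True, and anything arriving during the following 15 minutes is ignored.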


What's the most efficient way to access the BSC node's mempool content?

I'm currently writing a program to monitor the mempool of a BSC node. As my BSC node is charged by request count, I'm trying to find the approach that saves the most time and cost.
Here are the options I found:
1. Use a mempool explorer service such as https://www.blocknative.com/. This is obviously not the best plan, since I've already paid $99 for the QuickNode service, and I found that some transactions are still missing from the list it provides.
2. Use a web3.py pending filter: new_transaction_filter = w3.eth.filter('pending'), then new_transaction_filter.get_new_entries(), and w3.eth.get_transaction(entry) for each entry. This is also not efficient, because it wastes a lot of time and costs lots of web3 requests.
3. Use pending_block = w3.eth.get_block(block_identifier='pending', full_transactions=True). The call only returns transactions that already have a mined block number, not the pending ones.
4. Use w3.geth.txpool.content(). This can print out all the pending transactions in one shot, but when you keep calling it, duplicate records appear.
Can anyone give me a hint on the correct way to fetch the mempool?
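For the txpool.content() approach, the duplicates are expected: each call returns the whole pool, and a transaction stays there until it is mined. A minimal client-side sketch, assuming the geth txpool_content layout ({"pending": {sender: {nonce: tx}}, "queued": {...}}, where each tx has "hash" and "to" fields), remembers hashes already seen and can optionally keep only transactions from or to a set of watched addresses. The function name and the watch parameter are hypothetical.

```python
def new_pending_hashes(txpool_content, seen, watch=None):
    """Return pending tx hashes not yet in `seen`, adding them to `seen`.

    txpool_content: result of txpool_content (plain dicts or web3.py
    AttributeDicts both work, since only .get/.items/.values are used).
    watch: optional iterable of addresses; if given, keep only txs whose
    sender or recipient is in it (compared case-insensitively).
    """
    watched = {a.lower() for a in watch} if watch else None
    fresh = []
    for sender, by_nonce in txpool_content.get("pending", {}).items():
        for tx in by_nonce.values():
            if watched is not None:
                to = (tx.get("to") or "").lower()  # "to" is None for contract creation
                if sender.lower() not in watched and to not in watched:
                    continue
            tx_hash = tx["hash"]
            if tx_hash not in seen:
                seen.add(tx_hash)
                fresh.append(tx_hash)
    return fresh
```

The seen set grows without bound, so in a long-running script you would want to prune it, for example by clearing hashes once their transactions are mined.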
I think option 2 is best. I've been trying to see if there's a way to apply the filter to a specific address only, but I've had no luck with that. I've tried option 3, which is too late, and option 4 only works on a geth node (I've only been using speedynode, so not the best).
Subscribing to pending transactions with the websockets library may be better than option 2 since it would just listen and receive the hashes in real time rather than using new_transaction_filter.get_new_entries() in a loop.
However, I don't believe public nodes will be fast enough to receive them before the block gets mined.
Check out this link. It also works with bsc:
https://docs.infura.io/infura/tutorials/ethereum/subscribe-to-pending-transactions
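A minimal sketch of the subscription approach, assuming the node exposes a WebSocket endpoint that supports the eth_subscribe JSON-RPC method with "newPendingTransactions" (the method used in the linked tutorial). After the initial reply carrying the subscription id, the node pushes eth_subscription notifications whose params.result is a transaction hash. It uses the third-party websockets package; the endpoint URL and the handle_hash callback are placeholders, and whether your provider supports or bills this differently is not something the sketch can tell you.

```python
import json


def subscribe_request(request_id=1):
    """JSON-RPC payload asking the node to push new pending tx hashes."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_subscribe",
        "params": ["newPendingTransactions"],
    })


def pending_hash_from(message):
    """Return the tx hash from an eth_subscription notification, else None."""
    data = json.loads(message)
    if data.get("method") != "eth_subscription":
        return None  # e.g. the first reply, which only carries the subscription id
    return data["params"]["result"]


async def listen(ws_url, handle_hash):
    """Stream pending tx hashes to handle_hash until the connection closes."""
    import websockets  # third-party: pip install websockets

    async with websockets.connect(ws_url) as ws:
        await ws.send(subscribe_request())
        async for message in ws:
            tx_hash = pending_hash_from(message)
            if tx_hash is not None:
                handle_hash(tx_hash)

# Usage (hypothetical endpoint):
# import asyncio; asyncio.run(listen("wss://your-node-endpoint", print))
```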

Scraping EDGAR with Python regular expressions

I am in the initial stage of a personal project: downloading 10-Q statements from EDGAR. Quick disclaimer: I am very new to programming and Python, so the code I wrote is very basic, not even using custom functions and classes, just a very long script that I'm more comfortable editing. As a result, some solutions are quite rough (e.g. concatenating URLs from CIKs and other search options instead of making requests with "browser" headers).
I keep running into a problem that those who have scraped EDGAR might be familiar with. Every now and then my script just stops running. It doesn't raise any exceptions (I created some that append links that can't be opened, and so forth, to txt reports). I suspect that either the SEC servers limit the number of requests from an IP per unit of time (if I wait a while after Ctrl-C'ing the script and run it again, it generates more output than if I restart it right away), or it could be TWC identifying me as a bot and limiting such requests.
If it's the SEC, what could potentially work? I tried learning how to work with Tor to get a new IP every now and then, but I can't find a basic tutorial that suits my level of expertise. Maybe someone can recommend something good on the topic?
Maybe timers would work? For example, forcing the script to sleep every hour or so (I'm still trying to figure out how to make such timers and reset them if an event occurs). The main challenge with this particular problem is that I can't let it run at night.
Thank you in advance for any advice. I've been fighting with this for days, and at this stage it could take me more than a month to get what I want (before I even start tackling 10-Ks).
It seems that delays are quite effective: I'm at 3.5k downloads with no interruptions thanks to a simple:
import random
import time
time.sleep(random.randint(0, 1) + abs(random.normalvariate(0, 0.2)))
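Building on the delay idea, here is a sketch under assumptions rather than a known fix. One common reason a script "just stops" without an exception is a network call with no timeout, which can block forever on a stalled connection. The sketch below sets a timeout, enforces a minimum gap between requests, and backs off exponentially (with jitter) on errors that are worth retrying. The SEC publishes fair-access guidance asking automated tools to send a descriptive User-Agent and limit their request rate (at the time of writing, 10 requests per second); check the current guidance. USER_AGENT, Throttle, backoff_delay and fetch are hypothetical names.

```python
import random
import time
import urllib.error
import urllib.request

# Hypothetical contact string: replace with your own name and email.
USER_AGENT = "Example Research Project admin@example.com"


class Throttle:
    """Ensure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval=0.5):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def backoff_delay(attempt, base=2.0, cap=300.0):
    """Exponential backoff with jitter: up to base * 2**attempt seconds, capped."""
    return min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)


def fetch(url, throttle, retries=5, timeout=30):
    """Download url politely; return the body as bytes, or None on failure."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    for attempt in range(retries):
        throttle.wait()
        try:
            # The timeout stops a stalled connection from hanging the script.
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as exc:
            if exc.code not in (429, 500, 502, 503, 504):
                return None  # e.g. 404: retrying will not help
            reason = exc
        except OSError as exc:  # URLError, timeouts, connection resets
            reason = exc
        delay = backoff_delay(attempt)
        print(f"{url}: {reason!r}; retrying in {delay:.1f}s")
        time.sleep(delay)
    return None
```

Create one Throttle before the download loop and pass it to every fetch call so that the spacing applies across all requests.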

Strategies for storing frequency of dynamic data

Sorry if the title is misleading.
I am trying to write a program that calculates how frequently emails are sent from different email IDs. We need to trigger alerts based on the number and frequency of mails sent. For example, if more than 25 mails were sent from a particular address in the past 60 minutes, a trigger needs to fire.
A different trigger applies to another directory, based on another rule. The basic rules concern how many mails were sent over the past 60 minutes, 180 minutes, 12 hours, and 24 hours. How can we come up with a strategy to calculate the frequency and store it without too much system/CPU/database overhead?
The actual application is a legacy CRM system. We have no access to the mail server to hack something inside Postfix or the MTA. Moreover, there are multiple domains involved, so any suggestion to do something on the mail server may not help.
We do, of course, have access to every attempt to send a mail, and can record them. My challenge is that during a large campaign, database writes would be frequent and real-time number crunching would be resource-intensive. I would like to avoid that and come up with an optimal solution.
The language would be Python, because the CRM is also written in it.
Try a client-side hack: record each email attempt in a log file. Then you can read that file to count how frequently emails are sent.
I think you can keep the data in memory in a dict for some time, say 5 or 10 minutes, and then send it to the DB, so the DB isn't loaded with frequent writes. If you also put a check in your code for a sudden surge in email from a particular domain, that might solve your problem.
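A sketch of the in-memory batching idea, under assumptions: counts per address accumulate in a Counter and are written to the database as one batch per interval. SQLite and the send_counts(address, bucket_start, n) table are hypothetical stand-ins for whatever DB the CRM uses. Note that the flush only happens on the next send after the interval elapses, so call flush() on shutdown as well; a query such as "mails in the past 60 minutes" then becomes a SUM over recent buckets, accurate only to the flush interval.

```python
import sqlite3
import time
from collections import Counter


class BatchedSendCounter:
    """Count sends per address in memory; write one batch per interval."""

    def __init__(self, conn, flush_every=300):
        self.conn = conn
        self.flush_every = flush_every  # seconds between DB writes
        self.counts = Counter()
        self.bucket_start = None
        conn.execute(
            "CREATE TABLE IF NOT EXISTS send_counts "
            "(address TEXT, bucket_start INTEGER, n INTEGER)"
        )

    def record(self, address, now=None):
        """Count one send; flush the previous bucket if its interval is over."""
        now = time.time() if now is None else now
        if self.bucket_start is None:
            self.bucket_start = now
        elif now - self.bucket_start >= self.flush_every:
            self.flush()
            self.bucket_start = now
        self.counts[address] += 1

    def flush(self):
        """Write the current bucket as one executemany call and reset it."""
        if self.counts:
            self.conn.executemany(
                "INSERT INTO send_counts (address, bucket_start, n) VALUES (?, ?, ?)",
                [(a, int(self.bucket_start), n) for a, n in self.counts.items()],
            )
            self.conn.commit()
        self.counts.clear()
```

For example, with counter = BatchedSendCounter(sqlite3.connect("sends.db")), the CRM would call counter.record(sender_address) on every send attempt.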
Let m be the largest threshold (maximum number of emails sent) of any of your tests. E.g., if you have a limit of 150 emails per 24 hours, then m = 150 (presumably the thresholds for shorter-period checks are all lower than this -- it wouldn't make sense to also have a limit of 200 emails per 12 hours). Your longest period is 24 hours, which means m can't be more than 25*24 = 600 -- since if it were, then some checks would be redundant with the 25-emails-per-hour limit.
Since this is quite a small number of emails per user, I'd suggest just keeping a per-user ring buffer that stores the dates and times of the last m <= 600 messages sent by that user. Every time the user sends a new email, check the send time of the 25th-most-recent message sent; if it's less than 1 hour ago, sound the alarm. Likewise check the other limits (e.g. if there is also a limit of 50 emails per 180 minutes, check that the 50th-most-recent message was sent more than 180 minutes ago). If all tests pass, then (over)write the next entry in the ring buffer with the current date/time. A date/time can be stored in 4 bytes, so this is at most about 2.4 KB (600 × 4 bytes) per user, with O(1) queries and updates. (Of course you could use a DB instead, but a ring buffer is trivial to implement in-memory, and the total size needed is probably small enough if you don't have millions of users.)
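A minimal in-memory sketch of the per-user ring buffer described above. The limits are (max_count, window_seconds) pairs, e.g. (25, 3600) for 25 mails per 60 minutes; the specific limits, the class name and the float epoch timestamps (rather than packed 4-byte values) are illustrative assumptions. Each buffer is a fixed-size list plus a write index, so finding the count-th most recent send is a single index computation, and, as in the answer, the new timestamp is written only when every check passes.

```python
import time


class SendRateMonitor:
    """Per-user ring buffers holding the last m send times (m = largest limit)."""

    def __init__(self, limits):
        self.limits = limits  # list of (max_count, window_seconds)
        self.m = max(count for count, _ in limits)
        self.buffers = {}  # user -> [timestamps, next write index, entries stored]

    def record(self, user, now=None):
        """Check one new send against every limit; return the violated limits."""
        now = time.time() if now is None else now
        state = self.buffers.setdefault(user, [[0.0] * self.m, 0, 0])
        buf, head, size = state
        violated = []
        for count, window in self.limits:
            if size >= count:
                # head is the next write slot, so head - count is the
                # count-th most recent send.
                if now - buf[(head - count) % self.m] < window:
                    violated.append((count, window))
        if not violated:  # as described: write only if all tests pass
            buf[head] = now
            state[1] = (head + 1) % self.m
            state[2] = min(size + 1, self.m)
        return violated
```

For example, SendRateMonitor([(25, 3600), (50, 3 * 3600)]).record("alice") returns an empty list until alice exceeds one of the two limits.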

Python chat server when I can't guarantee the process won't be reset?

I'm relatively new to Python, cgi, and passenger-wsgi, so please bear with me.
I set up a Python script that's not much more than
import time
startTime = time.time()
def main():
    return time.time() - startTime
...just so I know how long the Passenger server has been running. I set this up a few days ago, but it only shows about 12 minutes now.
Keeping track of state isn't important for any of the scripts I currently have, but I'm planning on writing a simple chat page. Keeping track of the various users online and the chat groups will be very important, and I wouldn't want everything to be reset every 12 minutes.
What can I do about this?
My only thought is to store any necessary variables inside an object, then serialize that object and store it in a file every time I change it so that if the server restarts again I still have everything. Is this normally how it's done?
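Serializing to a file on every change is a workable approach for a small chat. A sketch under assumptions: JSON for the state, and a write to a temporary file followed by os.replace, so a restart mid-write never leaves a half-written file (the rename is atomic on POSIX). The file name and the {"users": [], "rooms": {}} layout are hypothetical. If Passenger runs several processes at once, concurrent writers can still overwrite each other's changes; a database such as SQLite (in the standard library) handles that case better.

```python
import json
import os
import tempfile

STATE_FILE = "chat_state.json"  # hypothetical; keep it outside the web root


def save_state(state, path=STATE_FILE):
    """Write state to a temp file in the same directory, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind
        raise


def load_state(path=STATE_FILE):
    """Return the saved state, or a fresh empty one if nothing was saved yet."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {"users": [], "rooms": {}}
```

Each request handler would call load_state(), modify the dict, and call save_state() before returning.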

How to sleep a python script running as a cronjob?

I wrote a Python script to monitor a log file on a CentOS server for a specific value and send an email when it finds it. It runs as a cron job every 5 minutes.
My question is what is the best way to put this script to sleep after it has sent the first email. I don't want it to be sending emails every 5 mins, but it needs to wake up and check the log again after an hour or so. This is assuming the problem can be fixed in under an hour. The people who are receiving the email don't have shell access to disable the cron.
I thought about using sleep, but I'm not sure whether cron will run the script again while the previous instance is still active (sleeping).
cron will absolutely run the script again. You need to think this through a little more carefully than just "sleep" and "email every 10 minutes."
You need to write out your use cases.
System sends message and user does something.
System sends message and user does nothing. Why email the user again? What does 2 emails do that 1 email didn't do? Perhaps you should SMS or email someone else.
How does the user register that something was done? How will they cancel or stop this cycle of messages?
What if something is found in the log, an email is sent, and then (before the sleep finishes) the thing is found again in the log? Is that a second email? It is two incidents. Or is it one email with two incidents?
@Lennart, @S. Lott: I think the question was somewhat the other way around - the script runs as a cron job every five minutes, but after sending an error email it shouldn't send another for at least an hour (even if the error state persists).
The obvious answer, I think, is to save a self-log - for each problem detected, an id and a timestamp for the last time an email was sent. When a problem is detected, check the self-log; if the last email for this problem-id was less than an hour ago, don't send the email. Then your program can exit normally until called again by cron.
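A minimal sketch of that self-log, assuming a small JSON file that maps each problem id to the time its last email went out. The file path, COOLDOWN_SECONDS, and the two stubs find_problems and send_email are hypothetical placeholders for the real log scan and mailer. Each cron run checks, possibly sends, saves, and exits; nothing sleeps.

```python
import json
import time

COOLDOWN_SECONDS = 3600  # minimum gap between emails about the same problem
SELF_LOG = "alert_selflog.json"  # hypothetical path writable by the cron user


def load_self_log(path=SELF_LOG):
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_self_log(self_log, path=SELF_LOG):
    with open(path, "w") as f:
        json.dump(self_log, f)


def should_email(self_log, problem_id, now, cooldown=COOLDOWN_SECONDS):
    """True (and stamp self_log) if the last email for problem_id is old enough."""
    last_sent = self_log.get(problem_id)
    if last_sent is not None and now - last_sent < cooldown:
        return False
    self_log[problem_id] = now
    return True


def find_problems():
    """Hypothetical stub: scan the monitored log and return problem ids."""
    return []


def send_email(problem_id):
    """Hypothetical stub: send the alert, e.g. with smtplib."""
    print(f"ALERT: {problem_id}")


def main():
    self_log = load_self_log()
    changed = False
    for problem_id in find_problems():
        if should_email(self_log, problem_id, time.time()):
            send_email(problem_id)
            changed = True
    if changed:
        save_self_log(self_log)


if __name__ == "__main__":
    main()
```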
When your script sends an email, make it also create a text file "email_sent.txt". Then make it check for the existence of this text file before sending email. If it exists, don't send the email. If it does not exist, send the email and create the text file.
The text file serves as an indicator that the email has already been sent and does not need to be sent again.
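As described, the flag file never expires by itself, so someone would have to delete it. Since the question wants the check to resume after about an hour, one option (an assumption layered on the answer, not part of it) is to treat the file's modification time as the send time and ignore a flag older than MAX_AGE. The function names are hypothetical.

```python
import time
from pathlib import Path

FLAG = Path("email_sent.txt")  # the indicator file from the answer
MAX_AGE = 3600  # assumption: the flag goes stale after one hour


def email_already_sent(flag=FLAG, max_age=MAX_AGE):
    """True if the flag file exists and is younger than max_age seconds."""
    try:
        age = time.time() - flag.stat().st_mtime
    except FileNotFoundError:
        return False
    return age < max_age


def mark_email_sent(flag=FLAG):
    """Create the flag, or refresh its modification time if it exists."""
    flag.touch()
```

In the cron script: if a problem is found and not email_already_sent(), send the email and then call mark_email_sent().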
You are running it every five minutes. Why would you sleep it? Just exit. If you want to make sure it doesn't send email every five minutes, then make the program only send an email if there is anything to send.
If you sleep it for an hour, and run it every five minutes, after an hour you'll have 12 copies running (and twelve emails sent) so that's clearly not the way to go forward. :-)
Another way to go about this might be to run your script as a daemon and, instead of having cron run it every five minutes, put your logic in a loop. Something like this...
import time

while True:
    # check_my_logfile() looks for what you want. If it finds
    # what you're looking for, it sends an email and returns True.
    if check_my_logfile():
        # Then you can sleep for 10 minutes.
        time.sleep(600)
    else:
        # Otherwise, you can sleep for 5 minutes.
        time.sleep(300)
Since you are monitoring a log file, it might be worth looking into tools that already do log file monitoring. Logwatch is one, and there are log analysis tools that handle all of these things for you:
http://chuvakin.blogspot.com/2010/09/on-free-log-management-tools.html
This post is a good wrap-up of some options. They would handle yelling at people. There are also system monitoring tools such as OpenNMS or Nagios, which do these things too.
I agree with what other people have said above: cron ALWAYS runs the job at the specified time. There is a tool called at that lets you run jobs in the future, so you could schedule a job for 5 minutes from now and then, at runtime, decide when you need to run again and submit a job to at for whatever time that is (be it 5 minutes, 10 minutes, or an hour). You'd still need to keep state somewhere (as @infrared said) to track what got sent when, and whether you should care any further.
I'd still suggest using a system monitoring tool, which grows and scales easily and handles people being able to say 'I'm working on XX NOW, stop yelling at me', for instance.
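A sketch of the at-based rescheduling, assuming the at command and its atd daemon are installed and running. at reads the shell commands for the job from standard input, and a time spec such as "now + 60 minutes" is passed as separate arguments, just as in the shell. The one-hour/five-minute policy and the script path are hypothetical; with this approach the script reschedules itself, so the cron entry would be removed.

```python
import shlex
import subprocess
import sys


def at_command(minutes):
    """Argument list asking at(1) to run a job `minutes` from now."""
    return ["at", "now", "+", str(minutes), "minutes"]


def next_delay(email_sent):
    """Illustrative policy: wait an hour after alerting, otherwise 5 minutes."""
    return 60 if email_sent else 5


def schedule_next_run(minutes, script_path):
    """Queue the next run; at reads the job's shell commands from stdin."""
    job = f"{shlex.quote(sys.executable)} {shlex.quote(script_path)}\n"
    subprocess.run(at_command(minutes), input=job, text=True, check=True)

# At the end of each run (hypothetical path):
# schedule_next_run(next_delay(email_was_sent), "/path/to/monitor.py")
```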
Good luck!
