How can I read all messages in a topic? - python

When I create the consumer like this:
consumer = pulsar.Client(
    PULSAR_URL,
    authentication=AuthenticationOauth2(params)
).subscribe(
    topic=PULSAR_TOPIC,
    subscription_name=PULSAR_SUBSCRIPTION_NAME
)
I cannot read all messages from the beginning, or all unread messages; I can only read messages published after the consumer was created.
My question is how to configure the consumer so that it reads all previously unread messages.
Thanks

You can pass initial_position to the subscribe method to set where a consumer starts reading when its subscription is created on the topic. It can be either InitialPosition.Earliest or InitialPosition.Latest; the default is Latest. Note that this only takes effect when the subscription is first created: if PULSAR_SUBSCRIPTION_NAME already exists, the consumer resumes from that subscription's stored position instead.
So in your case, if you want to start at the oldest available message, you would want something like:
import pulsar
from pulsar import AuthenticationOauth2, InitialPosition

consumer = pulsar.Client(
    PULSAR_URL,
    authentication=AuthenticationOauth2(params)
).subscribe(
    topic=PULSAR_TOPIC,
    subscription_name=PULSAR_SUBSCRIPTION_NAME,
    initial_position=InitialPosition.Earliest
)
Hope this helps!
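If the goal is only to read everything currently stored in the topic, rather than to consume it through a subscription, the Python client also offers a Reader, which starts at a given message ID and does not acknowledge anything. A minimal sketch, assuming the pulsar-client package; open_backlog_reader and read_backlog are hypothetical helper names, not part of the library:

```python
def open_backlog_reader(client, topic):
    """Open a Reader positioned at the oldest message still stored in topic."""
    # Imported here so that read_backlog() can be exercised without pulsar-client installed.
    import pulsar

    return client.create_reader(topic, pulsar.MessageId.earliest)


def read_backlog(reader):
    """Return the payload (bytes) of every message currently available to reader.

    Stops as soon as the reader has caught up with the end of the topic;
    messages published afterwards are not waited for.
    """
    payloads = []
    while reader.has_message_available():
        msg = reader.read_next()
        payloads.append(msg.data())
    return payloads
```

Usage would look roughly like: client = pulsar.Client(PULSAR_URL, authentication=AuthenticationOauth2(params)), then reader = open_backlog_reader(client, PULSAR_TOPIC), payloads = read_backlog(reader), and finally reader.close(). Because a Reader keeps no durable cursor, it starts from the earliest message on every run; use the subscription approach above if you need "only what I have not processed yet".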


Bittorrent and sockets: how to handle multiple messages?

I'm writing a bittorrent client in python, and have been using a loop to continually read messages from the peer sockets using recv().
When I run my program I look in wireshark to see what bittorrent messages I'm getting. It's pretty easy to tell what kind of message you got from the first 5 bytes of the message, since the length and message ID are specified there.
I'm running into some problems when dealing with receiving data containing multiple messages.
I've tried tackling it by writing a method like this:
def handleMultiple(self, message, peer):
    total_length = len(message)
    parsed = 0
    while parsed < total_length:
        m_len, m_id = struct.unpack(">IB", message[parsed:parsed + 5])
        m_total = m_len + 4
        print(m_len, total_length, parsed, m_id, peer.made_handshake, peer.ip)
        self.handleMessage(message[parsed:m_total + parsed], peer)
        parsed += m_total
The function just breaks down the received bytes into its constituent messages and hands it off to the message handler that knows how to deal with individual messages.
The problem is that when I printed out the length prefix and message ID from a message I received using recv(), sometimes it looks like just garbage numbers.
This is really my first time experimenting with sockets, so I lack the intuition to know what I'm really getting when calling recv(). Should I just call recv() for the first 5 bytes of data, check that the length and ID are valid, and then call recv() for the rest of the message?
How should I go about handling multiple messages incoming at a time?
Edit:
I wanted to provide some images of the results I'm seeing to see if anyone can help identify the issue I'm having.
Here's a picture of the bittorrent messages I'm receiving:
Here's a corresponding logging output:
The columns are supposed to be message length + 4, total message length, message id, and the IP from the sender:
As you can see, the length prefixes of the first messages (the ones where multiple messages were sent to me at once) are far too large. The fifth message I got from 95.211.212.26 is a well-formed bitfield message.
Another thing I noticed is that the supposed message ID of each of the multi-message reads is 255. Also, given that the total length of a bitfield message for this torrent is 126, the total lengths (303, 328, 325) are plausible for a bitfield message followed by several have messages.
Alright so I've managed to figure out where I was going wrong. I was reading from the socket assuming that my message would be there in full. In reality, I was reading the initial snippet of the message, and at a later time I was reading the middle of the message. The 255 values I was seeing weren't message IDs but actually the middle of the peer's bitfield (0xff).
I changed my approach to append the bytes read from the socket to the peer's message buffer. Once the message buffer was at least as long as the expected message, I read the message and trimmed the buffer to exclude what I had just read. Now all of my messages' IDs look as I expect.
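The buffering fix described above can be sketched roughly as follows. This is a minimal illustration rather than the poster's code: the class name PeerMessageBuffer and the (message_id, payload) return shape are hypothetical choices, and it assumes the 68-byte handshake has already been read separately, since the handshake is not length-prefixed like the other peer messages.

```python
import struct


class PeerMessageBuffer:
    """Accumulates raw bytes from recv() and splits out complete peer-wire messages."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data):
        """Append newly received bytes and return every message that is now complete.

        Each message is returned as (message_id, payload). Keep-alive messages
        (length prefix 0, no ID) are returned as (None, b"").
        Incomplete trailing bytes stay in the buffer for the next call.
        """
        self._buffer.extend(data)
        messages = []
        while len(self._buffer) >= 4:
            # 4-byte big-endian length prefix, not counting the prefix itself
            (length,) = struct.unpack(">I", self._buffer[:4])
            if len(self._buffer) < 4 + length:
                break  # the rest of this message has not arrived yet
            body = bytes(self._buffer[4:4 + length])
            del self._buffer[:4 + length]  # trim what was just consumed
            if length == 0:
                messages.append((None, b""))
            else:
                messages.append((body[0], body[1:]))
        return messages
```

The key design point is that recv() boundaries are ignored entirely: bytes are only interpreted once the length prefix says a whole message is present, so a read that ends in the middle of a bitfield can no longer be mistaken for a new message (the 0xff "IDs" seen above).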

How to specify ">" in redis-py

I'm looking at this in the redis stream documentation, which says:
It is time to try reading something using the consumer group:
> XREADGROUP GROUP mygroup Alice COUNT 1 STREAMS mystream >
1) 1) "mystream"
2) 1) 1) 1526569495631-0
2) 1) "message"
2) "apple"
XREADGROUP replies are just like XREAD replies. Note however the GROUP <group-name> <consumer-name> provided above. It states that I want to read from the stream using the consumer group mygroup and I'm the consumer Alice. Every time a consumer performs an operation with a consumer group, it must specify its name uniquely identifying this consumer inside the group.
There is another very important detail in the command line above: after the mandatory STREAMS option, the ID requested for the key mystream is the special ID >. This special ID is only valid in the context of consumer groups, and it means: messages never delivered to other consumers so far.
I am trying to specify the ">" parameter in redis-py.
When I look at the documentation here, I don't see any parameter in streams that seems to let me do this. Specifically, I'm trying:
>>> r.xreadgroup(mygroupname,myconsumer,{mystream : ">"},1)
[] # oh no, empty. WHY?!
#
# even though
>>> r.xread({mystream: '1561950326849-0'}, count=1)
[[b'stuff-returned-successfully.']]
What am I missing? Why can't I specify a ">" to indicate unseen messages?
You had a mistaken assumption in this question: that you had /unseen/ messages. That command does work, but it returns nothing if every message has already been delivered to the group once.
Try
# make sure nothing in the stream counts as seen by resetting the group's last-delivered ID to 0
>>> r.xgroup_setid(mystream, mygroupname, 0)  # RESET ALL
Now
>>> r.xreadgroup(mygroupname, myconsumer, {mystream: ">"}, 1)
works fine.
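A minimal sketch of the whole pattern in redis-py, assuming the default RESP2 reply shape ([[stream, [(id, fields), ...]]]); read_unseen is a hypothetical helper name. It creates the group at ID 0 so that a new group sees the whole stream, reads with ">", and acknowledges what it read. Reading with ID "0" instead of ">" would return this consumer's pending entries (delivered but not yet acknowledged) rather than new ones.

```python
def read_unseen(r, stream, group, consumer, count=10):
    """Return (entry_id, fields) pairs never delivered to any consumer of group.

    r is assumed to be a redis.Redis client using the default RESP2 reply format.
    """
    try:
        # id="0": a newly created group starts at the very beginning of the stream.
        # mkstream=True: create the stream too if it does not exist yet.
        r.xgroup_create(stream, group, id="0", mkstream=True)
    except Exception as exc:  # redis-py raises redis.exceptions.ResponseError here
        if "BUSYGROUP" not in str(exc):
            raise  # only "group already exists" is expected and safe to ignore

    entries = []
    # ">" = only entries never delivered to any consumer in this group.
    reply = r.xreadgroup(group, consumer, {stream: ">"}, count=count) or []
    for _stream_name, messages in reply:
        for entry_id, fields in messages:
            entries.append((entry_id, fields))
            # Acknowledge so the entry leaves the pending entries list; real code
            # would usually do this only after the entry has been processed.
            r.xack(stream, group, entry_id)
    return entries
```

Once every entry has been delivered, a further call returns an empty list, which is exactly the behaviour seen in the question.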

python win32com: Delete multiple emails in Outlook

I need to delete multiple email messages in Outlook from python via win32com module.
I understand there is a VBA method MailItem.Delete() available to win32com via COM, and it works; but it is VERY VERY slow when deleting more than one email, since one has to delete emails sequentially, i.e. loop over the collection of MailItems.
Is there any way to delete a selected collection of MailItems at once, something like MailItemCollection.DeleteAll()?
Also, if the above is not possible, is it at all possible to delete many emails with a multi-threaded approach, i.e. divide the collection of MailItems into, let's say, 4 subsets and have 4 threads operate on those?
I figure since I can delete multiple emails in outlook via its GUI very fast, there has to be a way where I can do the same thing via COM API.
Not in OOM - MailItem.Delete or Items.Remove(Index) is all you get.
On the Extended MAPI level (C++ or Delphi, but not Python), you can delete multiple messages using IMAPIFolder.DeleteMessages (which takes a list of entry ids). Or you can use IMAPIFolder.EmptyFolder (deletes all messages in a folder).
If using Redemption (any language; I am its author) is an option, you can use RDOFolder2.EmptyFolder or RDOFolder.Items.RemoveMultiple. RDOFolder can be retrieved from RDOSession.GetRDOObjectFromOutlookObject if you pass Outlook's MAPIFolder object as a parameter.
On top of the great answer by @Dmitry, I'll add a remark which may be important for you: if you start deleting from Items while you iterate over it, strange things may happen.
For example on my system the following Python code:
for mail in folder.Items:
    mail.Delete()
as well as
for index, mail in enumerate(folder.Items, 1):
    folder.Items.Remove(index)
both remove only half of the items in the folder! The reason seems to be that Items uses a range of indices internally to provide an iterator, so each time an element is deleted, the tail of the list is shifted by one...
To remove all items in the folder, try:
for i in range(len(folder.Items)):
    folder.Items.Remove(1)
If you need to filter by a certain criterion, consider first gathering EntryIDs and then deleting each item by looking it up by its ID:
ids = []
for mail in folder.Items:
    if to_be_deleted(mail):
        ids.append(mail.EntryID)
for id in ids:
    outlook.GetEntryByID(id).Delete()
I imagine performance of that is even worse, though :c
Great answer from Dedalus above. Wanted to make a more concise version of the code:
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
# Select main Inbox
inbox = outlook.GetDefaultFolder(6)
messages = inbox.Items
# Delete all messages from a specific sender
sender = 'myname@abc.com'
try:
    for message in messages:
        try:
            s = message.sender
            s = str(s)
            if s == sender:
                message.Delete()
        except:
            pass
except:
    pass
You may not need two try blocks, but I found it was more stable when applying the script to a long and heavily used inbox. Usually I combine this with a script that limits messages = inbox.Items to the last week so it doesn't process the entire inbox.
For me it worked by iterating the items in reverse.
Old:
for mail in folder.Items:
    if 'whatever' in mail.Subject:  # just a condition (optional)
        mail.Delete()
New code:
for mail in reversed(folder.Items):  # just tried deleting Items in reverse order
    if 'whatever' in mail.Subject:  # just a condition (optional)
        mail.Delete()
Hope this helps someone.
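Walking the collection by index from the end to the beginning, as the reverse-order answer does, avoids the index shift described earlier. A minimal sketch that uses only Items.Count, Items.Item(index) and MailItem.Delete(); delete_matching and its predicate argument are hypothetical names, and items is assumed to be a win32com Items object such as inbox.Items.

```python
def delete_matching(items, should_delete):
    """Delete every item in an Outlook Items collection for which should_delete(item) is true.

    Outlook collections are 1-based, so indices run from items.Count down to 1.
    Going backwards means a deletion only shifts items that were already visited.
    Returns the number of deleted items.
    """
    deleted = 0
    for index in range(items.Count, 0, -1):
        item = items.Item(index)
        if should_delete(item):
            item.Delete()
            deleted += 1
    return deleted
```

For example, delete_matching(inbox.Items, lambda m: 'whatever' in m.Subject). This fixes the "only half got deleted" problem, not the speed: it still makes one COM call per deleted item.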
Am I missing something? Neither Application nor NameSpace objects appear to have a GetEntryByID method, though the rest of what Dedalus pointed out was correct.
NameSpace objects have a GetItemFromID method, and MailItem objects have an EntryID property which uniquely identifies them as long as they don't get moved into different folders.
Documentation: https://learn.microsoft.com/en-us/office/vba/outlook/how-to/items-folders-and-stores/working-with-entryids-and-storeids
My full solution:
import win32com.client

outlook = win32com.client.gencache.EnsureDispatch("Outlook.Application")
folders = outlook.GetNamespace("MAPI")
inbox = folders.GetDefaultFolder(6)
messages = inbox.Items
email_ids = []
folder_id = inbox.StoreID
# Here create a function to isolate/exclude. Below is just an example of filtering by a subject line.
email_subjects = ['Subj1', 'Subj2', 'Subj3']
for i in range(len(messages)):
    if any(header in inbox.Items[i].Subject for header in email_subjects):
        email_ids.append(inbox.Items[i].EntryID)
for id in email_ids:
    folders.GetItemFromID(id, folder_id).Delete()
I've implemented an alternative solution in local Outlook by moving email items from the Inbox folder to the Deleted Items folder or to an archive folder, using VBA code or Outlook filter rules directly.
This way, I just manually empty the Deleted Items folder once a week (of course, this periodic step can also be automated).
I observed that this strategy can be more efficient than deleting items one by one in code (you mentioned the internal indexes problem).

how to delete kafka message after reading

I am using the code below to read messages from a topic. How do I delete a message after it is read?
from kafka import KafkaConsumer

consumer = KafkaConsumer('my-topic',
                         group_id='my-group',
                         bootstrap_servers=['localhost:9092'])
for message in consumer:
    # message value and key are raw bytes -- decode if necessary!
    # e.g., for unicode: `message.value.decode('utf-8')`
    print("%s:%d:%d: key=%s value=%s" % (message.topic, message.partition,
                                          message.offset, message.key,
                                          message.value))
There is no way to delete a specific message from Kafka; Kafka simply is not designed to do that. The only way to delete messages is through retention, by setting log.retention.hours in Kafka's config/server.properties to a value of your liking. The default is 168, meaning that messages are not kept after 168 hours (7 days).
If you instead are looking for a way to read messages from a specific offset - i.e. not read from the beginning every time, look here http://kafka-python.readthedocs.org/en/master/apidoc/KafkaConsumer.html
commit() - committing read offsets to kafka
seek_to_end() - fast forward to consuming only newly arriving messages
seek() - moving to a given offset (presumably stored somewhere else than in kafka)
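What usually achieves the effect of "delete after reading" is committing the consumer group's offset only after a message has been handled, so that a restarted consumer in the same group continues after the last committed message instead of re-reading everything. A sketch with kafka-python, assuming auto-commit is turned off; make_consumer and consume_and_commit are hypothetical helper names:

```python
def make_consumer(topic, group_id, servers):
    """Build a KafkaConsumer that only advances the group's offset when told to."""
    # Imported here so consume_and_commit() can be exercised without kafka-python installed.
    from kafka import KafkaConsumer

    return KafkaConsumer(topic,
                         group_id=group_id,
                         bootstrap_servers=servers,
                         enable_auto_commit=False,       # we commit explicitly below
                         auto_offset_reset="earliest")   # first run of a new group starts at the beginning


def consume_and_commit(consumer, handle, max_messages=None):
    """Call handle(message) for each message, committing the offset after each one.

    Returns the number of messages processed. Iterating a real KafkaConsumer blocks
    waiting for new messages, so max_messages (optional) bounds the loop.
    """
    processed = 0
    for message in consumer:
        handle(message)
        consumer.commit()  # synchronously commit the consumed positions
        processed += 1
        if max_messages is not None and processed >= max_messages:
            break
    return processed
```

For example, consume_and_commit(make_consumer('my-topic', 'my-group', ['localhost:9092']), print). Committing after every message is simple but slow; committing every N messages trades throughput against how much may be re-read after a crash.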

Maildir - open latest email and reply to sender

I've configured postfix on the email server with .forward file which saves a copy of email and invokes a python script. These emails are stored in Maildir format.
I want to use this Python script to send a reply to the sender acknowledging that the email has been received. I was wondering if there is any way I can open/access that e-mail, get the header info and sender address and send email back.
I looked at several examples of Maildir functions of Python, but they mostly add/delete e-mails. How can I open the latest e-mail received in Maildir/new and get the required information?
The program I have so far:
import mailbox

md = mailbox.Maildir('/home/abcd/Maildir')
message = next(md.itervalues())
#print(message)
#for msg in md:
#    subject = msg.get('Subject', "")
#    print(subject)
print(message)
sender = message.get('From', "")
print(sender)
When I execute this, I do get the sender name, but it belongs to the oldest email in the Maildir/new folder rather than the latest one.
Also, if I use get_date function, what if two (or more) e-mails arrive on the same day?
The MaildirMessage method .get_date() gets you the timestamp of the message file on disk. Depending on your filesystem, this may have anywhere between two-second and nanosecond accuracy. The chances of two messages giving the same value with .get_date() are vastly smaller than if it actually returned only a date.
However, if the message files were touched for some reason, the value returned by .get_date() would not be relevant at all. Dovecot, for example, explicitly states that a file's mtime should not be changed.
There are several dates associated with a MaildirMessage:
The arrival time timestamp, as encoded in the name of the message (the part before the first dot; these are whole seconds). If the part between the first and second dot has a segment of the form Mn, then n is the microsecond part of the arrival time, and it can be used to improve the resolution of the timestamp.
The timestamp of the file on disk
The 'Date:' header field as set by the sending program (or added by some MTA)
The dates added by intermediate MTA in the 'Received:' header field
The last of these might not be available e.g. if you and the sender are on the same mail server. The third can be easily faked/incorrect (ever got spam in your inbox dated many years ago?). And the second is incorrect if the file ever got touched.
That leaves selecting on the first option:
d = {}
for name in md.keys():
    d.setdefault(int(name.split('.', 1)[0]), []).append(name)
result = sorted(d.items())[-1][1]
assert len(result) == 1  # might fail
msg = md.get_message(result[0])
If you are lucky, result is a list with a single item. But this value has only one-second resolution, so you might have multiple emails, and then you have to decide which message to select based on one of the other values (e.g. by sorting on the file timestamp from .get_date()), or just select the first or a random one. (If you have the log file, you can search it for the keys of the messages in result to determine which one arrived last.)
If you didn't convert to int and have old emails (i.e. from before 2001-09-09 01:46:40 UTC, when Unix timestamps grew from 9 to 10 digits), a string comparison would probably not give you the message with the latest arrival time.
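The same selection can also use the Mn microsecond segment mentioned above, which makes ties much less likely than with whole seconds. A minimal sketch, assuming unique names of the form "<seconds>.<middle>.<host>" (Python's own Maildir.add() writes names like 1526569495.M631234P4242Q1.myhost); arrival_key and newest_message are hypothetical helper names. Names without an M segment fall back to a microsecond value of 0, and exact ties are resolved by whichever key max() sees first.

```python
import mailbox
import re


def arrival_key(key):
    """Turn a Maildir unique name into a sortable (seconds, microseconds) tuple."""
    seconds, _, rest = key.partition(".")
    middle = rest.split(".", 1)[0]        # the part between the first and second dot
    match = re.search(r"M(\d+)", middle)  # optional Mn segment: n = microseconds
    microseconds = int(match.group(1)) if match else 0
    return (int(seconds), microseconds)


def newest_message(md):
    """Return the MaildirMessage with the latest arrival time, or None if md is empty."""
    keys = md.keys()
    if not keys:
        return None
    return md.get_message(max(keys, key=arrival_key))
```

Usage: newest_message(mailbox.Maildir('/home/abcd/Maildir'))['From'] gives the sender of the most recently delivered message. Comparing (int, int) tuples also avoids the string-comparison pitfall described above.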
Some hints for this:
You can open a Maildir with the mailbox.Maildir class (see the Documentation for mailbox)
You can iterate over all the mails in a Maildir via the method itervalues
Now you get all the mails in the Maildir. One of them is the most recent one.
The mails are objects of the class MaildirMessage, which is a subclass of Message. For these classes, also a documentation exists (on the same page as mailbox, currently)
With the method get_date() on those objects, you can find out which one is the most recent. You still have to select it yourself.
That's as much beginner's help as I'll give; you should do a little bit yourself, too.
You should make yourself familiar with the Python documentation. I agree that it is not easy to find the right packages and how to use them, but you can try them out directly in the Python shell.
OK, here is another code snippet:
newest = None
for message in md.itervalues():
    if newest is None or message.get_date() > newest.get_date():
        newest = message
# now newest should contain the newest message
I missed your last question: get_date() returns not only the date but also the time, because it gives the number of seconds since the epoch (normally 1970).
