Receive messages within a time frame from Google Pub/Sub - Python

I would like to receive only the messages published within a certain time frame, as the messages before and after it have corrupted data.
I know that I can seek to the starting position, so that end is solved. How do I configure the other end (stop at a specific time)?
Any answers or suggestions are highly appreciated.
If it helps: after setting the interval, I plan to read the messages from Python code.

There is no integrated solution for that. You have to implement a filter in Python that discards the messages with corrupted data, for example by comparing each message's publish time against the end of your window.
However, your use case is interesting, and opening a feature request on the issue tracker could help you (and others) in the future.
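A minimal sketch of such a filter, assuming the google-cloud-pubsub client; the project, subscription, handler, and cutoff timestamp are placeholders you would replace with your own:

from datetime import datetime, timezone
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
# Placeholder project and subscription names.
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Placeholder: the end of the time frame whose data you trust.
WINDOW_END = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)

def callback(message):
    # publish_time is assigned by Pub/Sub when it accepts the message.
    if message.publish_time > WINDOW_END:
        message.ack()  # acknowledge and discard: outside the window
        return
    handle(message.data)  # hypothetical handler for in-window messages
    message.ack()

future = subscriber.subscribe(subscription_path, callback=callback)
future.result()  # block and keep receiving until cancelled

Combined with seeking to the start of the window, this covers both ends: seek handles the beginning, the filter handles everything published after the cutoff.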

Related

RabbitMQ / rstream: can I choose between getting all messages or only the messages since the start of the program?

I am currently attempting to use RabbitMQ streams for my project.
I have tried the rstream library, as recommended on the RabbitMQ website. It works to an extent, but it has no documentation and I cannot get it to work the way I need.
I would like to be able to choose whether to consume from an index (such as the last previously processed message) or from the start (as the code example shows).
Currently my workaround is to name the messages in a way that includes the date, and then parse the names to know where I need to start. This is not ideal and can cause issues once I accumulate many messages.
How can I get the messages starting from the last received message rather than from the start (for example, if my program is restarted for whatever reason)?
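One direction worth trying: RabbitMQ streams support server-side offset tracking, which rstream appears to expose through store_offset and query_offset. A sketch, assuming that API (it may differ between rstream versions) and placeholder stream/subscriber names:

import asyncio
from rstream import Consumer, ConsumerOffsetSpecification, OffsetType

STREAM = "my-stream"          # placeholder stream name
SUBSCRIBER = "my-subscriber"  # placeholder key the server stores the offset under

async def consume():
    consumer = Consumer(host="localhost", username="guest", password="guest")
    await consumer.start()

    # Resume after the last stored offset, or from the start on the first run.
    try:
        last = await consumer.query_offset(stream=STREAM, subscriber_name=SUBSCRIBER)
        spec = ConsumerOffsetSpecification(OffsetType.OFFSET, last + 1)
    except Exception:  # no offset stored yet for this subscriber name
        spec = ConsumerOffsetSpecification(OffsetType.FIRST, None)

    async def on_message(msg, message_context):
        print(msg)
        # Store progress server-side so a restart resumes here, not at the start.
        await consumer.store_offset(
            stream=STREAM,
            subscriber_name=SUBSCRIBER,
            offset=message_context.offset,
        )

    await consumer.subscribe(
        stream=STREAM, callback=on_message, offset_specification=spec
    )
    await consumer.run()

asyncio.run(consume())

This removes the need to encode dates in message names: the broker remembers the offset per subscriber name.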

Can't receive any data from websocket - Only "sleeping to keep loop open"

When using the sample code in the wiki of python-kucoin:
https://python-kucoin.readthedocs.io/en/latest/websockets.html
I keep getting "sleeping to keep loop open" while in debug mode (I am using PyCharm).
Digging into the code, I also realized that at this stage of the code (res['instanceServers'][0]['endpoint']):
https://github.com/sammchardy/python-kucoin/blob/develop/kucoin/client.py#L183
the endpoint is
wss://ws-api.kucoin.com/endpoint
while according to the Kucoin documentation it should be:
wss://push1-v2.kucoin.com/endpoint
Is this expected?
I forced the code to use the documented endpoint, but that doesn't help; I still don't receive any data.
How long am I supposed to wait before receiving any data from the websocket?
SammChardy's Kucoin Python package hasn't been updated in years. The other one at:
https://github.com/Kucoin/kucoin-python-sdk
looks very similar to SammChardy's but is updated more regularly. I tried both and settled on the latter. Give it a go and see if it resolves your issues.
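A minimal sketch of a public market-data subscription with kucoin-python-sdk, based on its README example; the ticker topic is an assumption you would swap for your own:

import asyncio
from kucoin.client import WsToken
from kucoin.ws_client import KucoinWsClient

async def main():
    async def handle_msg(msg):
        # Each message is a dict carrying 'topic' and 'data' keys.
        print(msg["topic"], msg["data"])

    client = WsToken()  # public token, no API credentials required
    ws_client = await KucoinWsClient.create(None, client, handle_msg, private=False)
    await ws_client.subscribe("/market/ticker:BTC-USDT")
    while True:
        # The client dispatches messages in the background; just keep the loop alive.
        await asyncio.sleep(60)

if __name__ == "__main__":
    asyncio.run(main())

Note the SDK fetches the correct websocket endpoint itself via the token request, so you shouldn't need to hard-code one.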

What is the behaviour of Kafka when a commit is made without reading a message?

I have code that looks like this:
def message_reader(consumer):
    consumed_message = consumer.consume_batch()
    if consumed_message:
        pass  # do something with the batch

def run_reader():
    process_consumer = get_consumer()  # gets a SimpleConsumer()
    message_reader(process_consumer)
    process_consumer.commit()
    process_consumer.close()
So my question is: suppose there is no message in the topic and no messages are consumed - does the commit() increase the offset?
And also: does the producer check for the latest offset before producing a message?
I'm not an expert on the Python client, but the Java one would just re-commit the same position if it hasn't actually consumed anything between commit calls.
I'm certain, however, that all clients do the same (commit the same position), as doing otherwise would cause you to skip records. Entire Kafka monitoring systems have been written to rely on this behavior - for example burrow.
As for the producer: it does not look at consumer offsets at all. The broker simply appends each record to the partition log and assigns it the next offset.
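A quick way to check this from Python - a sketch using kafka-python (an assumption, since the question's SimpleConsumer may come from another client), with placeholder topic and group names:

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    group_id="my-group",            # placeholder consumer group
    enable_auto_commit=False,
)
tp = TopicPartition("my-topic", 0)  # placeholder topic
consumer.assign([tp])

before = consumer.committed(tp)     # the group's committed offset, or None
consumer.commit()                   # nothing consumed since the last commit
after = consumer.committed(tp)
assert before == after              # the position was re-committed, not advanced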

Python - Plot.ly - MySQL real-time streaming visualization

Hope you're well and thanks for reading.
I've been revisiting an old project, leveraging Plotly to stream data out of MySQL with Python in between the two. I've never had great luck with Plot.ly (which I'm sure says more about my understanding than their platform); streams/iframes seem to stall over time and I am not apt enough to troubleshoot completely.
My current symptom is this: plots arbitrarily stall - I'm pushing data, but the iframe isn't updating.
The current solution is: refresh the browser every X minutes.
The solution works, but it's aggravating, because I don't understand why the visual is stalling in the first place (is it me, is it them, etc.).
As I was reviewing some of the documentation, specifically this link:
https://plot.ly/streaming/
I noticed they call out NOT to continually open and close streams, and say that heartbeats should be sent every so often to keep things alive/fresh.
Here's what I'm currently calling every 10 minutes:
pullData(mysql)
format data
open(plotly.stream1)
write data to plotly.stream1
close(plotly.stream1)
open(plotly.stream2)
write data to plotly.stream2
close(plotly.stream2)
Based on what I am reading, it sounds like I should actually execute the script once on startup and keep the streams open, but heartbeat() them every 15 or so seconds between actual write() calls, like this:
open(plotly.stream1)
open(plotly.stream2)
every 10 minutes:
    pullData(mysql)
    format data
    write data to plotly.stream1
    write data to plotly.stream2
while not pulling and writing:
    every 15 seconds:
        heartbeat(plotly.stream1)
        heartbeat(plotly.stream2)
if error:
    close(plotly.stream1)
    close(plotly.stream2)
Please excuse the pseudo-mess; I'm just trying to convey an idea. Anyone have any advice? I started down my original path of opening, writing, and closing based on the streaming example, but that's a one-time write. The other example is a constant stream of data. I'm somewhere in between those two.
Furthermore - is this train of thought even related to the iframe not refreshing? Part of me believes the symptom is unrelated to my idea: the data is getting to Plot.ly fine; it's my session that's expiring, or the iframe "connection" that's going stale. If the symptom is unrelated, at least I'll have made my source code a bit cleaner and more appropriate.
Any advice is greatly appreciated!
Thanks
-justin
Plotly will close a stream that is inactive for more than 60 seconds. You must send a newline down the streaming channel (a heartbeat) to keep it open. I recommend every 30 seconds.
Your first code example may not work as expected because the client-side websocket (that connects the plot to our system) may close when your first source stream (the stream that connects your script to our system) exits. When you disconnect a source stream, a signal is sent to our system that lets it know your stream is now inactive. If a new source stream does not reconnect quickly, we close the connected client websockets.
Now, when your script gets more data and opens a new stream, it will successfully stream data to our system, but the client-side websocket, now closed, will not pass the data to the plot. We cache a certain number of points for you behind the scenes, so that when you refresh the page the websocket reconnects and you get the last n points (where n is set by max-points in the API call).
This is why sending the heartbeat is important. It keeps the source stream open, and that in turn ensures that all the connected clients keep their websockets open.
This isn't necessarily the most robust behaviour for a streaming platform to have, and we will likely improve it in the future. For now, though, you will likely see better results by implementing the code in your second example.
Hope that helped!
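For reference, a minimal sketch of that keep-alive loop using the legacy py.Stream API; the stream tokens and the MySQL helper are placeholders:

import time
import plotly.plotly as py

# Placeholder stream tokens from your Plotly settings page.
stream1 = py.Stream("token-1")
stream2 = py.Stream("token-2")
stream1.open()
stream2.open()

last_write = 0.0
while True:
    if time.time() - last_write >= 600:  # every 10 minutes: pull and write
        rows = pull_data_from_mysql()    # hypothetical helper returning (x, y) pairs
        for x, y in rows:
            stream1.write(dict(x=x, y=y))
            stream2.write(dict(x=x, y=y))
        last_write = time.time()
    else:
        stream1.heartbeat()              # a newline that keeps the stream open
        stream2.heartbeat()
    time.sleep(30)                       # well under the 60-second idle timeout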

Using pykestrel, the Python library for the Kestrel queue system

I am new to asynchronous message queues and will be using the Python API to Kestrel, pykestrel, in my project (https://github.com/empower/pykestrel).
The example on the GitHub page has the following line:
q.add("test job")
What is "test job" in practice? Can someone please provide some more examples demonstrating the use of pykestrel?
Please help.
Thank you.
The code in your question adds a message to the Kestrel queue.
kestrel.next()
will get the next message in the queue.
You can find full documentation in the code: https://github.com/empower/pykestrel/blob/master/kestrel/client.py
Also, Kestrel uses the memcache protocol, which you can find here: http://code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
Basically, anything that works with memcache can be used with Kestrel.
For posterity, note that the original project is at https://github.com/matterkkila/pykestrel/ and is newer.
"test job", in practice, is the description of the action to be done by your worker. For example, if you're a video site, once you receive a new video:
"MakeIcon('/path/to/video')"
Your worker process should know what to do based on that message. The message can be larger and contain more information.
It can be anything, encoded anyway you please.
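A sketch of that producer/worker split, assuming the kestrel.Client constructor shown in the project's README; the server address and queue name are placeholders:

import kestrel

# Placeholder server address and queue name.
q = kestrel.Client(servers=["127.0.0.1:22133"], queue="video_jobs")

# Producer side: enqueue a description of the work to be done.
q.add("MakeIcon('/path/to/video')")

# Worker side: pull the next job and act on it.
job = q.next()
if job is not None and job.startswith("MakeIcon"):
    pass  # parse the path out of the message and render the icon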
