ZooKeeper Python client watches really slow - python

I recently set up a config server using ZooKeeper (3.4.3) with the Python client (http://pypi.python.org/pypi/zc-zookeeper-static).
I noticed that if I set up just one watch, the notification that the node changed arrives quickly. But when I try to watch 100 nodes from the same session, it takes about 2 minutes to get notified, for some reason. Here's my Python script: http://pastebin.com/BC6nKdRV
The ZooKeeper server config is pretty simple:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
I'm not sure if I did something wrong here. Any advice would be great. Thanks!

It turns out the problem was with the client. kazoo has no problem with multiple watches per session; I tested with 5000 watches and change notification was still almost instant. https://github.com/python-zk/kazoo
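For reference, a minimal sketch of registering many watches with kazoo; the /config/node-N paths are made up for illustration:

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

def make_watcher(path):
    def on_change(data, stat):
        # Called once when the watch is registered and again on every
        # change to the node; data is the node's content as bytes.
        print('%s changed: %r' % (path, data))
    return on_change

for i in range(100):
    path = '/config/node-%d' % i
    zk.ensure_path(path)
    zk.DataWatch(path, make_watcher(path))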

Related

Streaming Kafka with Python: kafka.errors.NoBrokersAvailable: NoBrokersAvailable

I have tried this tutorial and have successfully consumed Kafka topics that are published on a server at my workplace. I am not the producer, just purely a consumer. However, the code in that tutorial streams to a terminal. Now I want to try it with Python and record the messages into a text file (or something of that sort).
This is the code I use, after reading a few more threads and tutorials (such as here):
from kafka import KafkaConsumer

bootstrap_servers = ['xxxxxx:xxxx']
topicName = 'my_topic_name'
consumer = KafkaConsumer(topicName, group_id='group1', bootstrap_servers=bootstrap_servers, consumer_timeout_ms=1000)
for msg in consumer:
    print(msg.value)
Here I want to first print out the messages, but I get this error after the 1000 ms timeout:
kafka.errors.NoBrokersAvailable: NoBrokersAvailable
which sounds logical to me, since a broker is needed and the code above does not seem to do anything with a broker.
If I don't set consumer_timeout_ms=1000, the Python console just gets stuck without displaying anything.
How do I resolve this?
More details:
I am doing the following in parallel:
1 - Run ZooKeeper in one terminal
2 - Run the Kafka cluster in another terminal
3 - Stream the topics (nicely) in another terminal with the kafka-console-consumer command
4 - In another terminal, run the Python code in this question.
All of these terminals are Ubuntu in WSL2 (Windows).
If you're able to use the WSL terminal with kafka-console-consumer, then running Python code there should work the same.
If you're connecting to a remote Kafka server, chances are the WSL2 network settings simply cannot reach that address (multiple issues elsewhere discuss WSL2 not having external internet access). Therefore, you should really consider running the Python code on the Windows host itself. Otherwise, it sounds like you'll need to adjust your network configuration.
The for loop will wait for new messages on the topic; it will not read existing data unless you add another parameter to the consumer (auto_offset_reset='earliest', with a group id that has no committed offsets yet) to tell it to.
FWIW, you can use kafka-console-consumer ... >> file.txt to write out to a file
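Putting those together, a minimal sketch, assuming the placeholder broker address from the question and a fresh group id so auto_offset_reset takes effect:

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'my_topic_name',
    bootstrap_servers=['xxxxxx:xxxx'],  # placeholder address from the question
    group_id='group2',                  # fresh group id: no committed offsets yet
    auto_offset_reset='earliest',       # read existing data, not only new messages
    consumer_timeout_ms=1000,           # stop iterating after 1 s with no messages
)

with open('messages.txt', 'a') as f:
    for msg in consumer:
        f.write(msg.value.decode('utf-8') + '\n')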

Methodology to stop remote autostart Celery servers on startup when changes are needed

I have a rather big project with a fairly simple structure. A VPC (virtual private cloud) has a bunch of servers. The main server contains RabbitMQ and a MySQL server bound to a private IP, plus a script I wrote for autoscaling the consumer servers; a number of other servers run Celery workers (consumers) under supervisor (autostart, autorestart) and are closed and opened on demand. The problem I'm facing now is that in production I cannot edit the source code on the droplets: as soon as I open a worker server it will auto-bind to the broker, and it may consume a task from the queue (which can be very long and costly to execute). Schema: (diagram not shown)
The only solution I came up with is to pre-ban the IP of the consumer I want to edit on the firewall (ufw), which is very messy as it logs a bunch of errors and can be tedious for a big number of remote servers. Is there a better solution to achieve this?
Pretty fast to answer my own question. The best solution was to add a kill function on startup of the Celery worker that connects to the backend database and queries a mode argument. Now I can edit the backend database and choose whether the worker should connect or exit on startup.
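A minimal sketch of that startup kill switch, assuming a worker_control table with a mode column; the table name, credentials, and the pymysql driver are illustrative, not from the original setup:

import sys
import pymysql
from celery.signals import worker_init

@worker_init.connect
def check_mode(**kwargs):
    # Query the backend database before the worker starts consuming,
    # so no task is picked up while the code is being edited.
    conn = pymysql.connect(host='10.0.0.1', user='celery',
                           password='secret', database='control')
    with conn.cursor() as cur:
        cur.execute('SELECT mode FROM worker_control LIMIT 1')
        (mode,) = cur.fetchone()
    conn.close()
    if mode != 'run':
        sys.exit(0)  # exit before binding to the broker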

How to create a chat room over the internet with a Raspberry Pi and Python

I have a bit of an open-ended question for you all. I wish to create a simple chat room such as this example here: https://www.geeksforgeeks.org/simple-chat-room-using-python/ but I am lost as to how to do it over the internet rather than just the local network.
Any pointers/help would be appreciated!
Thanks :)
There are multiple ways about this. You can either:
Run locally and expose your Python chat system to the internet.
Run your Python chat system in some online server provider (Heroku, AWS, etc.).
The first method requires you to do some port forwarding on your local network, essentially mapping your 127.0.0.1:8081 local server to your public IP (so you would connect over the internet as myip:8081). This method, however, comes with its limitations: when you turn off your computer you are also effectively turning off your server for the rest of the internet. The second method will ensure the server stays on at all times, and is likely what you are looking for. Heroku is a great starting point, as they provide a free tier where you can test everything out.
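One detail worth noting for the port-forwarding route: the server socket must bind to 0.0.0.0 (all interfaces) rather than 127.0.0.1, or the forwarded traffic will never reach it. A minimal sketch, reusing the example port 8081:

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(('0.0.0.0', 8081))  # reachable via the router's forwarded port, not just localhost
server.listen()
conn, addr = server.accept()    # blocks until a client connects
print('Client connected from', addr)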

Communication Between Processes and Machines Python

I'm struggling to design an efficient way to exchange information between processes in my LAN.
Till now, I've been working with one single RPi, and I had a bunch of Python scripts running as services. The services communicated using sockets (multiprocessing.connection Client and Listener), and it was kind of OK.
I recently installed another RPi with some further services, and I realized that as the number of services grows, the problem scales pretty badly. In general, I don't need all the services to communicate with any other, but I'm looking for an elegant solution to enable me to scale quickly in case I need to add other services.
So essentially I thought I first needed a map of where each service is, like:
Service 1 -> RPi 1
Service 2 -> RPi 2
...
The first approach I came up with was the following:
I thought I could add an additional "gateway" service, so that any application running on an RPi would send its data/request to the gateway, and the gateway would then forward it to the proper service, or to the gateway running on the other device.
Later I also realized that I could actually just give the map to each service and let all the services manage their own connections. This would mean opening many listeners on the external address, though, and I'm not sure it's the best option.
Do you have any suggestions? I'm also interested in exploring different options for implementing the actual connection, in case the Client/Listener one is not efficient.
Thank you for your help. I'm learning so much with this project!
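For concreteness, a minimal sketch of the "map in each service" idea with multiprocessing.connection; the hostnames, ports, and authkey below are placeholders:

from multiprocessing.connection import Client

SERVICE_MAP = {
    'service1': ('rpi1.local', 6000),
    'service2': ('rpi2.local', 6001),
}

def send(service_name, message):
    # Look up the host/port of the target service and open a short-lived
    # connection to deliver one message; the receiving service must run a
    # Listener on that address with the same authkey.
    address = SERVICE_MAP[service_name]
    with Client(address, authkey=b'shared-secret') as conn:
        conn.send(message)

send('service2', {'cmd': 'status'})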

How can there be differences between what my computer sends and what my router receives?

Since this is my first question, please excuse me if I did anything wrong, I'm happy to learn :)
I have been trying to solve this for about 3 months but couldn't get it to work. I think the fault is mine, but the only thing clear to me is that something is wrong. However, I've run out of ideas as to where it could be.
tl;dr:
My desktop and my router appear to capture different traffic, even though there is nothing between the two. I have rewritten my scripts several times but couldn't get it to work.
Here is my context:
In my bachelor thesis I'm interested in middlebox behaviour.
For this I have a setup with one Ubuntu Server machine set up as a router using dnsmasq and isc-dhcp-server, and another machine running Ubuntu Desktop connected to the server machine's subnet over Ethernet.
To test the middleboxes, I'm calling every one of the Alexa top sites (for testing purposes either the top 10 or top 100) using Firefox + Selenium, once with each middlebox and once with nothing between the desktop and server (router). At the same time I'm logging the requested domains using tcpdump on both the desktop and the server. For my question, however, the middleboxes are not really important; they only illustrate why I'm doing this.
To illustrate my setup I made a diagram (I'm not allowed to post images since I don't have enough reputation).
The desktop loops through the Alexa list, whereas the server sits in an infinite loop until it receives a quit message from the desktop.
In the desktop's script there are timeouts between each step (I've experimented with timeouts between 3 s and 60 s) as it cycles through the list of websites.
The tcpdumps are named according to the current domain plus middlebox/plain.
Afterwards, another Python script loads the tcpdumps, cycles through the DNS packets, and creates a dictionary with IP:domain mappings. Then it creates a dictionary with each domain from the Alexa list as a key and, as the value, a set of subsequently called domains. This is done both for traffic captured on the server and for traffic captured on the desktop; both use the desktop's DNS dictionary, however.
Finally I have a Script comparing the generated Dictionaries.
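To picture the mapping step, here is a minimal sketch with scapy (an assumption on my part, not the original code), extracting IP -> domain pairs from DNS A-record answers in a capture file:

from scapy.all import rdpcap, DNSRR

def build_ip_domain_map(pcap_path):
    ip_to_domain = {}
    for pkt in rdpcap(pcap_path):
        # Only DNS responses carry answer records (DNSRR layers).
        if pkt.haslayer(DNSRR):
            i = 1
            while True:
                rr = pkt.getlayer(DNSRR, i)
                if rr is None:
                    break
                if rr.type == 1:  # A record: rdata is the resolved IP
                    ip_to_domain[rr.rdata] = rr.rrname.decode().rstrip('.')
                i += 1
    return ip_to_domain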
To verify the differences between desktop and server for the middleboxes, I compare the plain pages as well. However, there are always differences between the domains captured on the desktop and on the server, usually between 2 and 5 subcalls differing per Alexa domain. (These are subcalls I would expect other Alexa domains to make. For example, wikipedia.org is probably not calling facebook.com, but facebook.com itself probably is. Facebook showing up as a subcall of Wikipedia is what irritates me.) From my understanding this shouldn't be the case. In the beginning I was using the Python library PyShark, but because these problems were appearing I thought using tcpdump directly might do the trick.
I tried setting bigger timeouts, I tried capturing all traffic in a single file and I tried rewriting every line of code I thought could be erroneous.
There has to be an error somewhere, but I can't seem to find it. I know there is always some packet loss, but especially when connected directly over Ethernet I can't imagine it being this high.
I expect unexpected behavior from the combination of Selenium/Firefox and tcpdump. Latency when starting them up or shutting them down may be an issue, but I don't think it could be longer than 60 s. I also expect the Ubuntu desktop to send auto-update requests and other system-service traffic while I'm running the test, but first, I don't think those produce that many requests, and second, my iptables are set up to only allow TCP requests from the user that starts the Python script.
Thank you so much for taking the time.
If you have any ideas/remarks where I could have gone wrong, I'd be grateful to hear it. If you have further questions, please don't hesitate to ask.
EDIT: (Clarification about what I'm trying to achieve)
My hypothesis is, that if I call a domain with my desktop computers browser and capture the network traffic on both the desktop and the router, both captures should contain the same packets.
If I have a middlebox which is blocking some of the domains and put it between the desktop computer and router, comparing the domains appearing in the captured traffic on the pc and on the router should yield exactly those domains, which the middlebox blocked.
My Problem:
Even without a middlebox, there is a difference in the captured traffic and I don't know where it is coming from.
Example (I made this one up; I'll post a real one once I'm back at uni):
Expected behavior:
wikipedia.org: {On PC but not on Router: [], On Router but not on PC: []}
facebook.com: {On PC but not on Router: [], On Router but not on PC: []}
Actual behavior:
wikipedia.org: {On PC but not on Router: [facebook.com], On Router but not on PC: []}
facebook.com: {On PC but not on Router: [], On Router but not on PC: []}
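For illustration, the comparison step that produces output in the shape above can be as simple as two set differences per domain (a sketch, with made-up dictionary names):

def compare(pc_calls, router_calls):
    # pc_calls / router_calls: {alexa_domain: set(subcall_domains)}
    result = {}
    for domain in pc_calls:
        pc = pc_calls.get(domain, set())
        rt = router_calls.get(domain, set())
        result[domain] = {
            'On PC but not on Router': sorted(pc - rt),
            'On Router but not on PC': sorted(rt - pc),
        }
    return result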
