Write UDP data to CSV in Python?

DISCLAIMER: I am a total Python n00b and have never ever written anything in Python, I haven't programmed anything in years, and the last language I learned was Visual Basic 6. So bear with me!
So I have an Android app that transmits my phone's sensor (accelerometer, magnet, light etc) data to my Windows PC via UDP, and I have a Python 3.3 script to display that data on screen, and write it to a CSV:
#include libraries n stuff
import socket
import traceback
import csv
#assign variables n stuff
host = ''
port = 5555
csvf = 'accelerometer.csv'
#do UDP stuff
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
s.bind((host, port))
#do CSV stuff
with open(csvf, 'w', newline='', encoding='ascii') as csv_handle:
    csv_writer = csv.writer(csv_handle, delimiter=',')
    while 1:
        try:
            message, address = s.recvfrom(8192)
            print(message)  # display data on screen
            csv_writer.writerow(message)  # write data to CSV
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            traceback.print_exc()
The data on screen looks like this, which is correct:
b'7407.75961, 3, 0.865, 1.423, 9.022, 5,
The data in the CSV file looks like the numerical values of the ASCII codes of the data (note: codes won't match with above because data is slightly different):
57,48,48,50,46,54,51,57,57,57,44,32,51,44,32,32,32,48,46,53,52,57,44,32,32,53,46,54,56,56,44,32,32,56,46,51,53,53
How can I get my CSV to just write the string that the UDP socket is receiving? I tried adding "encoding='ascii'", as you can see, but that didn't make a difference from leaving it out.

writerow expects a sequence or iterable of values. But you just have one value.
The reason it sort of works, but does the wrong thing, is that your one value—a bytes string—is actually itself a sequence. But it's not a sequence of the comma-separated values, it's a sequence of bytes.
So, how do you get a sequence of the separate values?
One option is to use split. Either message.split(b', ') or map(bytes.strip, message.split(b',')) seems reasonable here (bytes.strip rather than str.strip, because the pieces are bytes, not str). That will give you this sequence:
[b'7407.75961', b'3', b'0.865', b'1.423', b'9.022', b'5']
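For example (a minimal sketch, assuming the payload is ASCII and the separator is always ', '), you would decode first so the writer gets real strings; in Python 3, writing bytes fields directly makes csv.writer stringify each one as a b'...' repr:

fields = message.decode('ascii').split(', ')
csv_writer.writerow(fields)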
But really, this is exactly what the csv module is for. You can, e.g., decode the message, wrap it in an io.StringIO, and pass it to a csv.reader (in Python 3 the csv module wants text, not bytes), and then you just copy rows from that reader to the writer.
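A minimal sketch of that round-trip (assuming, as above, an ASCII payload; skipinitialspace is my choice here to swallow the spaces after the commas, not part of the original code):

import csv
import io

# Parse one datagram's text with csv.reader, then copy its rows
# straight to the csv.writer.
for row in csv.reader(io.StringIO(message.decode('ascii')),
                      skipinitialspace=True):
    csv_writer.writerow(row)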
But if you think about it, you're getting data in csv format, and you want to write it out in the exact same csv format, without using it in any other way… so you don't even need the csv module here. Just use a plain old binary file:
with open(csvf, 'wb') as csv_handle:
    while True:
        try:
            message, address = s.recvfrom(8192)
            print(message)  # display data on screen
            csv_handle.write(message + b'\n')
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            traceback.print_exc()
While we're at it, you almost never need to do this:
except (KeyboardInterrupt, SystemExit):
    raise
except:
    traceback.print_exc()
The only exceptions in Python 3.x that don't inherit from Exception are KeyboardInterrupt, SystemExit, GeneratorExit (which can't happen here), and any third-party exceptions that go out of their way to act like KeyboardInterrupt and SystemExit. So, just do this:
except Exception:
    traceback.print_exc()

Try:
csv_writer.writerow([message])
csv_writer.writerow expects an iterable as its first argument, which should be the list of comma-separated values. In your example the message is a bytes object, which is itself iterable, so csv_writer writes each individual byte (as its integer value) into a separate, comma-separated column.
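Note that in Python 3 the message is bytes, so writerow([message]) writes its repr (b'...') into the cell. If you want the plain text there, decode first (a sketch, assuming an ASCII payload):

csv_writer.writerow([message.decode('ascii')])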

Related

Python Socket is receiving inconsistent messages from Server

So I am very new to networking and I was using the Python Socket library to connect to a server that is transmitting a stream of location data.
Here is the code used.
import socket

BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('gump.gatech.edu', 756))
try:
    while 1:
        data = s.recv(BUFFER_SIZE).decode('utf-8')
        print(data)
except KeyboardInterrupt:
    s.close()
The issue is that the data arrives in inconsistent forms.
Most of the times it arrives in the correct form like this:
2016-01-21 22:40:07,441,-84.404153,33.778685,5,3
Yet other times it can arrive split up into two lines like so:
2016-01-21
22:40:07,404,-84.396004,33.778085,0,0
The interesting thing is that when I establish a raw connection to the server using PuTTY I only get the correct form and never the split. So I imagine that something must be happening that splits the message, or PuTTY is doing something to always assemble it correctly.
What I need is for the variable data to contain the proper line always. Any idea how to accomplish this?
It is best to think of a socket as a continuous stream of data, that may arrive in dribs and drabs, or a flood.
In particular, it is the receiver's job to break the data up into the "records" it should consist of; the socket does not magically know how to do this for you. Here the records are lines, so you must read the data and split it into lines yourself.
You cannot guarantee that a single recv will be a single full line. It could be:
just part of a line;
or several lines;
or, most probably, several lines and another part line.
Try something like: (untested)
# we'll use this to collate partial data
data = ""
while 1:
    # receive the next batch of data
    chunk = s.recv(BUFFER_SIZE).decode('utf-8')
    if not chunk:
        break  # socket closed, no more data
    data += chunk
    # split the data into lines
    lines = data.splitlines(keepends=True)
    # the last of these may be a part line
    full_lines, last_line = lines[:-1], lines[-1]
    # print (or do something else!) with the full lines
    for l in full_lines:
        print(l, end="")
    # was the last line received a full line, or just half a line?
    if last_line.endswith("\n"):
        # print it (or do something else!)
        print(last_line, end="")
        # and reset our partial data to nothing
        data = ""
    else:
        # reset our partial data to this part line
        data = last_line
The easiest way to fix your code is to print the received data without adding a newline, which both the print statement (Python 2) and the print() function (Python 3) do by default. Like this:
Python 2:
print data,
Python 3:
print(data, end='')
Now print will not add its own newline character to the end of each printed value, and only the newlines present in the received data will be printed. The result is that each line is printed whole, without being split based on the amount of data returned by each socket.recv() call. For example:
from __future__ import print_function
import socket

s = socket.socket()
s.connect(('gump.gatech.edu', 756))
while True:
    data = s.recv(3).decode('utf8')
    if not data:
        break  # socket closed, all data read
    print(data, end='')
Here I have used a very small buffer size of 3 which helps to highlight the problem.
Note that this only fixes the problem from the POV of printing the data. If you wanted to process the data line-by-line then you would need to do your own buffering of the incoming data, and process the line when you receive a new line or the socket is closed.
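Here is a minimal sketch of that buffering (my own illustration, not tested against the server; it assumes UTF-8 text and '\n' line endings):

import socket

def read_lines(sock, buffer_size=1024):
    # Yield complete lines from a stream socket, one at a time.
    pending = ''
    while True:
        chunk = sock.recv(buffer_size).decode('utf-8')
        if not chunk:          # socket closed
            if pending:
                yield pending  # the final, unterminated line
            return
        pending += chunk
        while '\n' in pending:
            line, pending = pending.split('\n', 1)
            yield line

# Usage:
# s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# s.connect(('gump.gatech.edu', 756))
# for line in read_lines(s):
#     print(line)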
Edit:
socket.recv() is blocking and, as the others said, you won't get an exact line each time you call the method. The socket waits for data, gets what it can, and then returns. When you print this, because of Python's default end argument, you may get more newlines than you expected. So to get the raw stream from your server, use this:
import socket

BUFFER_SIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('gump.gatech.edu', 756))
try:
    while 1:
        data = s.recv(BUFFER_SIZE).decode('utf-8')
        if not data:
            break
        print(data, end="")
except KeyboardInterrupt:
    s.close()

Converting data from serial/usb using PySerial

I have a u-blox receiver connected to my computer, and I am trying to read it using PySerial. However, I am new to Python and was hoping to get some clarification/help on understanding the data.
My code looks like:
import serial
# open the connection port
connection = serial.Serial('/dev/ttyACM0', 9600)
# open a file to print the data. I am doing this to make
# sure it is working
file1 = open('output_file', 'wb+')
# All messages from ublox receivers end with a carriage return
# and a newline
msg = connection.readline()
# print the message to the file
print >> file1, msg
Here is what I get in the file (and when I print the type of msg, it says it is a list):
['\xb5b\x01\x064\x00\xe0\x88\x96#\xd3\xb9\xff\xffX\x07\x03\xdd6\xc31\xf6\xfd)\x18\xea\xe6\x8fd\x1d\x00\x01\x00\x00\x00\x00\x00\x00\xfd\xff\xff\xff\x01\x00\x00\x00\x02\x00\x00\x00p\x00\x02\x0f\x16\xa2\x02\x00\x9c\xeb\xb5b\x01\x07\\x00\xe0\x88\x96#\xe0\x07\x01\x17\x15237\x04\x00\x00\x00\xd6\xb9\xff\xff\x03\x01\n']
["\x1a\x0c\x04\x19'y\x00$\xf7\xff\xff\x1a\x1d\x04\x01\x00\x007\x00\x00\x00\x00\x00\x02\x1f\x0c\x01\x00+:\x00\x00\x00\x00\x00\x01 \r\x07&-\x9f\x00\xff\x01\x00\x00\x17\xc1\x0c\x04\x16\n"]
To interpret/decode them: the u-blox messages come in two format types. Some of the messages are in NMEA format (basically comma-delimited):
$MSG, 1, 2, 3, 4
The other messages are straight hexadecimal, where each byte or set of bytes represents some piece of information:
[AA BB CC DD EE]
So my question is: is there a way I can interpret/convert the data from the serial connection to a readable or more usable format so I can actually work with the messages? Like I said, I am new to Python and more used to C++-style strings or arrays of characters.
A typical parsing task. In this case, it'll probably be simplest to make tokenization two-stage:
read the data until you run into a message boundary (you didn't give enough info on how to recognize it);
split the read message into its meaningful parts:
for type 1, it's likely re.split(", *", text) (see the sketch below);
for type 2, none needed;
then display the parts however you want.
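For the NMEA case, the split step might look like this (a sketch; the sentence is the example from the question):

import re

line = '$MSG, 1, 2, 3, 4'
parts = re.split(', *', line)
# parts == ['$MSG', '1', '2', '3', '4']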
Regarding why serial.Serial.readline returns a list. I consulted the sources - serial.Serial delegates readline to io.IOBase, and its source indeed shows that it should return a bytestring.
So, the function might be overridden in your code by something. E.g. what do print connection.readline and print serial.Serial.readline show?

Receive image in Python

The following code is for a python server that can receive a string.
import socket

TCP_IP = '127.0.0.1'
TCP_PORT = 8001
BUFFER_SIZE = 1024

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((TCP_IP, TCP_PORT))
s.listen(1)
conn, addr = s.accept()
print 'Connection address:', addr
while 1:
    length = conn.recv(1027)
    data = conn.recv(int(length))
    import StringIO
    buff = StringIO.StringIO()
    buff.write(data)
    if not data: break
    print "received data:", data
    conn.send('Thanks')  # echo
    get_result(buff)
conn.close()
Can anyone help me to edit this code or create a similar one to be able to receive images instead of string?
First, your code actually can't receive a string. Sockets are byte streams, not message streams.
This line:
length = conn.recv(1027)
… will receive anywhere from 1 to 1027 bytes.
You need to loop around each recv and accumulate a buffer, like this:
def recvall(conn, length):
    buf = b''
    while len(buf) < length:
        data = conn.recv(length - len(buf))
        if not data:
            return data
        buf += data
    return buf
Now you can make it work like this:
while True:
    length = recvall(conn, 1027)
    if not length: break
    data = recvall(conn, int(length))
    if not data: break
    print "received data:", data
    conn.send('Thanks')  # echo
You can use StringIO or other techniques instead of concatenation for performance reasons, but I left that out because it's simpler and more concise this way, and understanding the code is more important than performance.
Meanwhile, it's worth pointing out that 1027 bytes is a ridiculously huge amount of space to use for a length prefix. Also, your sending code has to make sure to actually send 1027 bytes, no matter what. And your responses have to always be exactly 6 bytes long for this to work.
def send_string(conn, msg):
    conn.sendall(str(len(msg)).ljust(1027))
    conn.sendall(msg)
    response = recvall(conn, 6)
    return response
But at least now it is workable.
So, why did you think it worked?
TCP is a stream of bytes, not a stream of messages. There's no guarantee that a single send from one side will match up with the next recv on the other side. However, when you're running both sides on the same computer, sending relatively small buffers, and aren't loading the computer down too badly, they will often happen to match up 1-to-1. After all, each time you call recv, the other side has probably only had time to send one message, which is sitting in the OS's buffers all by itself, so the OS just gives you the whole thing. So, your code will appear to work in initial testing.
But if you send the message through a router to another computer, or if you wait long enough for the other side to make multiple send calls, or if your message is too big to fit into a single buffer, or if you just get unlucky, there could be 2-1/2 messages waiting in the buffer, and the OS will give you the whole 2-1/2 messages. And then your next recv will get the leftover 1/2 message.
So, how do you make this work for images? Well, it depends on what you mean by that.
You can read an image file into memory as a sequence of bytes, and call send_string on that sequence, and it will work fine. Then the other side can save that file, or interpret it as an image file and display it, or whatever it wants.
Alternatively, you can use something like PIL to parse and decompress an image file into a bitmap. Then, you encode the header data (width, height, pixel format, etc.) in some way (e.g., pickle it), send_string the header, then send_string the bitmap.
If the header has a fixed size (e.g., it's a simple structure that you can serialize with struct.pack), and contains enough information for the other side to figure out the length of the bitmap in bytes, you don't need to send_string each one; just use conn.sendall(serialized_header) then conn.sendall(bitmap).
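A hedged sketch of that fixed-size header idea, reusing the recvall helper above (the three-int header layout here is made up for illustration):

import struct

# Made-up header: width, height, bytes-per-pixel as unsigned 32-bit ints.
HEADER = struct.Struct('!III')

def send_bitmap(conn, width, height, bpp, bitmap):
    conn.sendall(HEADER.pack(width, height, bpp))
    conn.sendall(bitmap)

def recv_bitmap(conn):
    header = recvall(conn, HEADER.size)
    width, height, bpp = HEADER.unpack(header)
    bitmap = recvall(conn, width * height * bpp)
    return width, height, bpp, bitmap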

Python: Upload huge amount of files via FTP

I'm developing a Python script that monitors a directory (using libinotify) for new files, and for each new file it does some processing and then copies it to a storage server. We were using an NFS mount but had some performance issues, and now we are testing with FTP. It looks like FTP is using far fewer resources than NFS (the load is always under 2; with NFS it was above 5).
The problem we are having now is the number of connections that stay open in TIME_WAIT state. The storage server has peaks of about 15k connections in TIME_WAIT.
I was wondering if there is some way to reuse previous connections for new transfers.
Anyone knows if there is some way of doing that?
Thanks
Here's a new answer, based on the comments to the previous one.
We'll use a single TCP socket, and send each file by alternating sending name and contents, as netstrings, for each file, all in one big stream.
I'm assuming Python 2.6, that the filesystems on both sides use the same encoding, and that you don't need lots of concurrent clients (but you might occasionally need, say, two—e.g., the real one, and a tester). And I'm again assuming you've got a module filegenerator whose generate() method registers with inotify, queues up notifications, and yields them one by one.
client.py:
import contextlib
import socket

import filegenerator

sock = socket.socket()
with contextlib.closing(sock):
    sock.connect((HOST, 12345))
    for filename in filegenerator.generate():
        with open(filename, 'rb') as f:
            contents = f.read()
        buf = '{0}:{1},{2}:{3},'.format(len(filename), filename,
                                        len(contents), contents)
        sock.sendall(buf)
server.py:
import contextlib
import socket
import threading

def pairs(iterable):
    return zip(*[iter(iterable)]*2)

def netstrings(conn):
    buf = ''
    while True:
        newbuf = conn.recv(1536*1024)
        if not newbuf:
            return
        buf += newbuf
        while True:
            colon = buf.find(':')
            if colon == -1:
                break
            length = int(buf[:colon])
            if len(buf) >= colon + length + 2:
                if buf[colon+length+1] != ',':
                    raise ValueError('Not a netstring')
                yield buf[colon+1:colon+length+1]
                buf = buf[colon+length+2:]
            else:
                break  # wait for more data before retrying this netstring

def client(conn):
    with contextlib.closing(conn):
        for filename, contents in pairs(netstrings(conn)):
            with open(filename, 'wb') as f:
                f.write(contents)

sock = socket.socket()
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
with contextlib.closing(sock):
    sock.bind(('0.0.0.0', 12345))
    sock.listen(1)
    while True:
        conn, addr = sock.accept()
        t = threading.Thread(target=client, args=[conn])
        t.daemon = True
        t.start()
If you need more than about 200 clients on Windows, 100 on Linux and BSD (including Mac), or a dozen on less capable platforms, you probably want to go with an event-loop design instead of a threaded design, using epoll on Linux, kqueue on BSD, and I/O completion ports on Windows. This can be painful, but fortunately there are frameworks that wrap everything up for you. Two popular (and very different) choices are Twisted and gevent.
One nice thing about gevent in particular is that you can write threaded code today, and with a handful of simple changes turn it into event-based code like magic.
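For example, the gevent version of the threaded server above can be as small as this (a sketch; monkey.patch_all is gevent's standard way to make the blocking socket and threading modules cooperative):

from gevent import monkey
monkey.patch_all()  # blocking sockets/threads become cooperative greenlets

# ...then run the threaded server code above unchanged; each
# threading.Thread is now a cheap greenlet instead of an OS thread.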
On the other hand, if you're eventually going to want event-based code, it's probably better to learn and use a framework from the start, so you don't have to deal with all the fiddly bits of accepting and looping around recv until you get a full message and shutting down cleanly and so on, and just write the parts you care about. After all, more than half the code above is basically boilerplate for stuff that every server shares, so if you don't have to write it, why bother?
In a comment, you said:
Also the files are binary, so it's possible that I'll have problems if client encodings are different from the server's.
Notice that I opened each file in binary mode ('rb' and 'wb'), and intentionally chose a protocol (netstrings) that can handle binary strings without trying to interpret them as characters or treat embedded NUL characters as EOF or anything like that. And, while I'm using str.format, in Python 2.x that won't do any implicit encoding unless you feed it unicode strings or give it locale-based format types, neither of which I'm doing. (Note that in 3.x, you'd need to use bytes instead of str, which would change a bit of the code.)
In other words, the client and server encodings don't enter into it; you're doing a binary transfer exactly the same as FTP's I mode.
But what if you wanted the opposite, to transfer text and reencode automatically for the target system? There are three easy ways to do that:
Send the client's encoding (either once at the top, or once per file), and on the server, decode from the client and reencode to the local file.
Do everything in text/unicode mode, even the socket. This is silly, and in 2.x it's hard to do as well.
Define a wire encoding—say, UTF-8. The client is responsible for decoding files and encoding to UTF-8 for send; the server is responsible for decoding UTF-8 on receive and encoding files.
Going with the third option, assuming that the files are going to be in your default filesystem encoding, the changed client code is:
with io.open(filename, 'r', encoding=sys.getfilesystemencoding()) as f:
    contents = f.read().encode('utf-8')
And on the server:
with io.open(filename, 'w', encoding=sys.getfilesystemencoding()) as f:
    f.write(contents.decode('utf-8'))
The io.open function also, by default, uses universal newlines, so the client will translate anything into Unix-style newlines, and the server will translate to its own native newline type.
Note that FTP's T mode actually doesn't do any re-encoding; it only does newline conversion (and a more limited version of it).
Yes, you can reuse connections with ftplib. All you have to do is not close them and keep using them.
For example, assuming you've got a module filegenerator whose generate() method registers with inotify, queues up notifications, and yields them one by one:
import ftplib
import os

import filegenerator

ftp = ftplib.FTP('ftp.example.com')
ftp.login()
ftp.cwd('/path/to/store/stuff')

os.chdir('/path/to/read/from/')

for filename in filegenerator.generate():
    with open(filename, 'rb') as f:
        ftp.storbinary('STOR {}'.format(filename), f)

ftp.close()
I'm a bit confused by this:
The problem we are having now is the number of connections that stay open in TIME_WAIT state.
It sounds like your problem is not that you create a new connection for each file, but that you never close the old ones. In which case the solution is easy: just close them.
Either that, or you're trying to do them all in parallel, but don't realize that's what you're doing.
If you want some parallelism, but not unboundedly so, you can easily, e.g., create a pool of 4 threads, each with an open ftplib connection, each reading from a queue, plus an inotify thread that just pushes onto that queue (see the sketch below).
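A sketch of that design (Python 2, matching the answer above; ftp.example.com and the paths are placeholders, and filegenerator is the hypothetical inotify module from earlier):

import ftplib
import threading
import Queue  # "queue" in Python 3

import filegenerator

work = Queue.Queue()

def worker():
    # One persistent FTP connection per worker thread.
    ftp = ftplib.FTP('ftp.example.com')
    ftp.login()
    ftp.cwd('/path/to/store/stuff')
    while True:
        filename = work.get()
        with open(filename, 'rb') as f:
            ftp.storbinary('STOR {}'.format(filename), f)
        work.task_done()

for _ in range(4):
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()

# The inotify side just pushes filenames onto the queue:
for filename in filegenerator.generate():
    work.put(filename)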

How to read rackspace cloudfile into string in python?

I want to parse logfiles from rackspace. I'm using the official python sdk.
I have previously saved the file to disk and then read it from there with gzip.open.
Now I'm on Heroku and can't / don't want to save the file to disk, but want to do the unzipping in memory.
However, I can't manage to download the object as a string or pseudo file object to handle it.
Does someone have an idea?
logString = ''
buffer = logfile.stream()
while True:
    try:
        logString += buffer.next()
    except StopIteration:
        break

# logString is always empty here

# I'd like to have something that enables me to do this:
for line in zlib.decompress(logString):
    pass  # having each line of the log here
Update
I've noticed that "the empty string" is not entirely true. This runs in a loop, and just the first occurrence is empty. The following occurrences do have data (that looks like it's gzipped), but I get this zlib error:
zlib.error: Error -3 while decompressing data: incorrect header check
Update II
As suggested, I implemented cStringIO, with the same result:
import cStringIO

buffer = logfile.stream()
output = cStringIO.StringIO()
while True:
    try:
        output.write(buffer.next())
    except StopIteration:
        break
print(output.getvalue())
Update III
This does work now:
output = cStringIO.StringIO()
for buffer in logfile.stream():
    output.write(buffer)
# (the for loop handles StopIteration itself)
At least nothing crashes here, but it seems I am not getting actual lines:
for line in gzip.GzipFile(fileobj=output).readlines():
    pass  # this is never reached
How do I proceed here? Is there some easy way to see the incoming data as a normal string, to know if I'm on the right track?
I found out that read() is also an option, which led to an easy solution like this:
import cStringIO
from gzip import GzipFile

io = cStringIO.StringIO(logfile.read())
for line in GzipFile(fileobj=io).readlines():
    impression = LogParser._parseLine(line)
    if impression is not None:
        impressions.append(impression)
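For reference, a Python 3 version of the same idea would swap cStringIO for io.BytesIO (a sketch, assuming logfile.read() returns the gzipped bytes):

import gzip
import io

bio = io.BytesIO(logfile.read())
for line in gzip.GzipFile(fileobj=bio):
    # note: in Python 3 each line is bytes; decode if _parseLine expects str
    impression = LogParser._parseLine(line)
    if impression is not None:
        impressions.append(impression)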
