jabber messages for file transfer - python

I'm interested in finding out the xml messages which are sent among two clients when a file is transferred from one to another(just for fun).
So far I've been finding out the xml messages involved in actions such as authentication, setting the status, sending a message etc. by using jabber.py - I modified xmlstream.py's network write function to print the data just before it writes it to the network.
However, jabber.py does not provide functions for file transfer. Could someone:-
Suggest a Python library that does that?
Or, show me some place where the xml messages sent from client to client would be documented.
Thanks.

Take a look at
XEP-0096: SI File Transfer
XEP-0234: Jingle File Transfer
for details (Edit: about the XML).
Edit: I don't know about the current support for these XPEs in Python libraries.

Related

Task processing in python

I need a help on technology choice for the following problem:
There are data files coming to the system to process. Some of them are self contained (.a) and can be processed immediately and some (.b) need to wait for more files to get a full set. I'm loading everything that arrives at the system to a DB assigning package ids and I can send a message on the MQ.
What I need here is a component that connects to that queue and listens to those messages. When it receives a message that file arrived it needs to do more or less:
If a file name is taskA.a then create a request to workerA.
If a file name is taskB.a then create a request to workerB.
If a file name is taskA.b and we got taskA.c before than send a request to workerA with both ids.
Which technology should I use. There are a few like Celery just its hard to find the proper one by just reading docs.

Python how to encrypt smb path?

How can I encrypt smb with python?
Basically writing to a share in a way that the path is concealed. I made this audit system that saves log files to specific path in a netapp that everyone can access.
The problem is that it sends the logs in cleartext and if someone uses wireshark they can figure the path immediately. What can I do to overcome it? Encryption? Run it with specific service that only it got access to that share?
Somehow conceal the path?
I have tried pysmb but it didn't quite work.
You have two or three options here:
Encrypt your logs; so that even if the location is known, the logs themselves are not easily read. This has the benefit of concealing information during transit, and while at rest (ie, while on disk).
So to read the logs you'll have to write a decryption tool. Now you have two problems. The first is, your tool needs to be written such that the encryption secret sauce you are using can't be figured out; and secondly if there is a problem in reading the logs - you won't know where to look - is it a problem with the decryption? Is is a problem with the encryption? Is it a problem with the hard disk itself? The network?
You also have to consider that logs are designed to be in plain text because eventually you will be reading/consuming those logs by some third party program.
For all that and more, this option isn't recommended.
You can prevent access to the file location. This way even if the location is discovered, the user will not have access to read the files. They can still read the information that's going across in transit.
You can encrypt the channel; and then make sure you count for the overhead that encryption brings.

Linux program to take newest ftp file and send to other ftp server

I was wondering if it was possible to take the newest files uploaded to an ftp server and send them to another ftp server. BUT, every file can only be sent once. If you can do this in python that would be nice, I know intermediate python. EXAMPLE:
2:14 PM file.txt is uploaded to the server. the program takes the file and sensd it to another server.
2:15 PM example.txt is uploaded to the server. the program takes just that file and sends it to another server.
I have searched online for this but cant find anything. Please help!
As you said that you already know python, I will give you some conceptual hints. Basically, you are looking for a one-way synchronisation. The main problem with this task is to make your program detect new files. The simplest way to do this is to create a database (note that by database I mean a way of storing data, not necessarly a specialized database). For example, a text file. In this database, each file will be recorded. Periodically, check the database with the current files (the basic ls or something similar will do). If a new file appears (meaning that there are files that are not in database), upload them.
This is the basic idea. You can improve it by using multi threading, some checks if a file has modified and so on.
EDIT: This is a programming way. As it has been suggested in comments, there are also some software solutions that will do this for you.

Listening to the Output of a Web Application

I have been trying, in vain, to make a program that reads text out loud using the web application found here (http://www.ispeech.org/text.to.speech.demo.php). It is a demo text-to-speech program, that works very well, and is relatively fast. What I am trying to do is make a Python program that would input text to the application, then output the result. The result, in this case, would be sound. Is there any way in Python to do this, like, say, a library? And if not, is it possible to do this through any other means? I have looked into the iSpeech API (found here), but the only problem with it is that there is a limited number of free uses (I believe that it is 200). While this program is only meant to be used a couple of times, I would rather it be able to use the service more then 200 times. Also, if this solution is impractical, could anyone direct me towards another alternative?
# AKX I am currently using eSpeak, and it works well. It just, well, doesn't sound too good, and it is hard to tell at times what is being said.
If using iSpeech is not required, there's a decent (it's surely not as beautifully articulated as many commercial solutions) open-source text-to-speech solution available called eSpeak.
It's usable from the command line (subprocess with Python), or as a shared library. It seems there's also a Python wrapper (python-espeak) for it.
Hope this helps.
OK. I found a way to do it, seems to work fine. Thanks to everyone who helped! Here is the code I'm using:
from urllib import quote_plus
def speak(text):
import pydshow
words = text.split()
temp = []
stuff = []
while words:
temp.append(words.pop(0))
if len(temp) == 24:
stuff.append(' '.join(temp))
temp = []
stuff.append(' '.join(temp))
for i in stuff:
pydshow.PlayFileWait('http://api.ispeech.org/api/rest?apikey=8d1e2e5d3909929860aede288d6b974e&format=mp3&action=convert&voice=ukenglishmale&text='+quote_plus(i))
if __name__ == '__main__':
speak('Hello. This is a text-to speech test.')
I find this ideal because it DOES use the API, but it uses the API key that is used for the demo program. Therefore, it never runs out. The key is 8d1e2e5d3909929860aede288d6b974e.
You can actually test this at work without the program, by typing the following into your address bar:
http://api.ispeech.org/api/rest?apikey=8d1e2e5d3909929860aede288d6b974e&format=mp3&action=convert&voice=ukenglishmale&text=
Followed by the text you want to speak. You can also adjust the language, by changing, in this case, the ukenglishmale to something else that iSpeech offers. For example, ukenglishfemale. This will speak the same text, but in a feminine voice.
NOTE: Pydshow is my wrapper around DirectShow. You can use yours instead.
The flow of your application would be like this:
Client-side: User inputs text into form, and form submits a request to server
Server: may be python or whatever language/framework you want. Receives http request with text.
Server: Runs text-to-speech either with pure python library or by running a subprocess to a utility that can generate speech as a wav/mp3/aiff/etc
Server: Sends HTTP response back by streaming file with a mime type to Client
Client: Receives the http response and plays the content
Specifically about step 3...
I don't have any particular advise on the most articulate open source speech synthesizing software available, but I can say that it does not have to necessarily be pure python, or even python at all for that matter. Most of these packages have some form of a command line utility to take stdin or a file and produce an audio file as output. You would simply launch this utility as a subprocess to generate the file, and then stream the file back in your http response.
If you decide to make use of an existing web service that provides text-to-speech via an API (iSpeech), then step 3 would be replaced with making your own server-side http request out to iSpeech, receiving the response and pretty much forwarding that response back to the original client request, like a proxy. I would say the benefit is not having to maintain your own speech synthesis solution or getting better quality that you could from an open source... but the downside is that you probably will have a bit more latency in your response time since your server has to make its own external http request and download the data first.

Twisted FTPFileListProtocol and file names with spaces

I am using Python and the Twisted framework to connect to an FTP site to perform various automated tasks. Our FTP server happens to be Pure-FTPd, if that's relevant.
When connecting and calling the list method on an FTPClient, the resulting FTPFileListProtocol's files collection does not contain any directories or file names that contain a space (' ').
Has anyone else seen this? Is the only solution to create a sub-class of FTPFileListProtocol and override its unknownLine method, parsing the file/directory names manually?
Firstly, if you're performing automated tasks on a retrieived FTP listing then you should probably be looking at NLST rather than LIST as noted in RFC 959 section 4.1.3:
NAME LIST (NLST)
...
This command is intended to return information that
can be used by a program to further process the
files automatically.
The Twisted documentation for LIST says:
It can cope with most common file listing formats.
This make me suspicious; I do not like solutions that "cope". LIST was intended for human consumption not machine processing.
If your target server supports them then you should prefer MLST and MLSD as defined in RFC 3659 section 7:
7. Listings for Machine Processing (MLST and MLSD)
The MLST and MLSD commands are intended to standardize the file and
directory information returned by the server-FTP process. These
commands differ from the LIST command in that the format of the
replies is strictly defined although extensible.
However, these newer commands may not be available on your target server and I don't see them in Twisted. Therefore NLST is probably your best bet.
As to the nub of your problem, there are three likely causes:
The processing of the returned results is incorrect (Twisted may be at fault, as you suggest, or perhaps elsewhere)
The server is buggy and not sending a correct (complete) response
The wrong command is being sent (unlikely with straight NLST/LIST, but some servers react differently if arguments are supplied to these commands)
You can eliminate (2) and (3) and prove that the cause is (1) by looking at what is sent over the wire. If this option is not available to you as part of the Twisted API or the Pure-FTPD server logging configuration, then you may need to break out a network sniffer such as tcpdump, snoop or WireShark (assuming you're allowed to do this in your environment). Note that you will need to trace not only the control connection (port 21) but also the data connection (since that carries the results of the LIST/NLST command). WireShark is nice since it will perform the protocol-level analysis for you.
Good luck.
This is somehow expected. FTPFileListProtocol isn't able to understand every FTP output, because, well, some are wacky. As explained in the docstring:
If you need different evil for a wacky FTP server, you can
override either C{fileLinePattern} or C{parseDirectoryLine()}.
In this case, it may be a bug: maybe you can improve fileLinePattern and makes it understand filename with spaces. If so, you're welcome to open a bug in the Twisted tracker.

Categories