Parse metadata from http live stream

Parse metadata from http live stream - python

I'd like to extract the info string from an internet radio streamed over HTTP. By info string I mean the short note about the currently played song, band name etc.
Preferably I'd like to do it in python. So far I've tried opening a socket but from there I got a bunch of binary data that I could not parse...
thanks for any hints

Sounds like you might need some stepping stone projects before you're ready for this. There's no reason to use a low-level socket library for HTTP. There are great tools both command line utilities and python standard library modules like urlopen2 that can handle the low level TCP and HTTP specifics for you.
Do you know the URL where you data resides? Have you tried something simple on the command line like using cURL to grab the raw HTML and then some basic tools like grep to hunt down the info you need? I assume here the metadata is actually available as HTML as opposed to being in a binary format read directly by the radio streamer (which presumably is in flash perhaps?).
Hard to give you any specifics because your question doesn't include any technical details about your data source.

Related

How can I transfer Data twice a day from an FTP Server to Python and then to a Website

I have climate data that is sent to a ftp server and I want to make it visable on our website. New data comes in twice a day. Beforehand the data needs to be prepared and I also want to visualize it preferably with python. I want to learn the best way to do this but have no idea where to start. I am not looking for a definite solution since I know this is a big project. I just need some suggestions on how to start or which tools might help me.

Like you said this is a project itself but I can give you some directions.
I would use an ftp library for handling pulling data to python for processing.
Check https://docs.python.org/3/library/ftplib.html
For visualization that would really depend on data you have. But generally, after processing the data, save it to a file(or db) and when backend recieves a request for that data, backend reads this file(or db) and you visualize it with javascript. Something like d3js would work.
Also you might use a tool for visualization but I don't know any(Power BI???).

Tools to convert CANopen raw data using *.dbc file?

I'm pretty new with CANopen, and also a little bit shooting in the dark...
I would like to know if there are tools or packages in R or Python to convert raw data logged from a CANopen device, to human readable values, with a *.dbc file?
Does someone have experience with this?
In advance thank you for your answers.

Looks like cantools might do the trick:
https://pypi.org/project/cantools/
It can decode CAN data using a DBC file. For actually reading the CAN bus it integrates with python-can.
https://python-can.readthedocs.io/en/master/index.html#
To read CAN data from a log file there's a module in python-can for that
https://python-can.readthedocs.io/en/master/listeners.html
And finally if you want to interact directly with a live CANopen bus there's the CANopen for Python library
https://canopen.readthedocs.io/en/latest/

Is pandas a local-only library

I recently started coding, but took a brief stint. I started a new job and I’m under some confidential restrictions. I need to make sure python and pandas are secure before I do this—I’ll also be talking with IT on Monday
I was wondering if pandas in python was a local library, or does the data get sent to or from elsewhere? If I write something in pandas—will the data be stored somewhere under pandas?
The best example of what I’m doing is best found on a medium article about stripping data from tables that don’t have csv Exports.
https://medium.com/#ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58

Creating a DataFrame out of a dict, doing vectorized operations on its rows, printing out slices of it, etc. are all completely local. I'm not sure why this matters. Is your IT department going to say, "Well, this looks fishy—but some random guy on the internet says it's safe, so forget our policies, we'll allow it"? But, for what it's worth, you have this random guy on the internet saying it's safe.
However, Pandas can be used to make network requests. Some of the IO functions can take a URL instead of a filename or file object. Some of them can also use another library that does so—e.g., if you have lxml installed, read_html, will pass the filename to lxml to open, and if that filename is an HTTP URL, lxml will go fetch it.
This is rarely a concern, but if you want to get paranoid, you could imagine ways in which it might be.
For example, let's say your program is parsing user-supplied CSV files and doing some data processing on them. That's safe; there's no network access at all.
Now you add a way for the user to specify CSV files by URL, and you pass them into read_csv and go fetch them. Still safe; there is network access, but it's transparent to the end user and obviously needed for the user's task; if this weren't appropriate, your company wouldn't have asked you to add this feature.
Now you add a way for CSV files to reference other CSV files: if column 1 is #path/to/other/file, you recursively read and parse path/to/other/file and embed it in place of the current row. Now, what happens if I can give one of your users a CSV file where, buried at line 69105, there's #http://example.com/evilendpoint?track=me (an endpoint which does something evil, but then returns something that looks like a perfectly valid thing to insert at line 69105 of that CSV)? Now you may be facilitating my hacking of your employees, without even realizing it.
Of course this is a more limited version of exactly the same functionality that's in every web browser with HTML pages. But maybe your IT department has gotten paranoid and clamped down security on browsers and written an application-level sniffer to detect suspicious followup requests from HTML, and haven't thought to do the same thing for references in CSV files.
I don't think that's a problem a sane IT department should worry about. If your company doesn't trust you to think about these issues, they shouldn't hire you and assign you to write software that involves scraping the web. But then not every IT department is sane about what they do and don't get paranoid about. ("Sure, we can forward this under-1024 port to your laptop for you… but you'd better not install a newer version of Firefox than 16.0…")

django/python: How to convert pptx/docx formats to PDF using python?

First of all, I agree that this might sound like a question which has already been asked many times in the past. However I couldn't find any answer that was relevant to me in the similar questions so I'll try to be more specific.
I would need to transform PPTX/DOCX files into PDF using Python but I don't have any experience in file format conversion. I have been looking in many places/forums/websites, read a lot of documentation and came across some useful libraries (python-pptx and pyPdf mainly), but I still don't know where to start.
When looking on the Internet, I can see many websites that offer file format conversions as a paying service, even with advanced API's: submit a file via POST and get the transformed PDF file in return. This could work for me, but I am really interested in writing myself the code that does the conversion work from OOXML to PDF.
How would you start doing this? Or is it just impossible on my own?
Thanks for your help!

After some research and with the help of python-pptx's creator, I was able to write to the PowerPoint COM interface using a Virtual Machine.
In case someone reads this thread, this is how I managed to get this done:
- Setup a VM with Microsoft Windows/Office installed on it ;
- Install Python, Django and win32com libraries on the VM.
The files are sent locally from the original Django project to the virtual machine (which are on the same network) through a simple POST request. The file is converted on the VM using win32com.client (which is just a simple call to the win32com.client library) and then sent back as a response to the original Django view, which in turn processes the response.
Note: it took me some time to realize I needed to use the #csrf_exempt decorator for this setup to work.

Output PCL from Word document using Python

I'm building a web application which will include functionality that takes MS Word (and possibly input from a web-based rich text editor) documents, substitutes values into the formfield placeholders in those documents, and generates a PCL document as output.
I'm developing in python and django on windows, but this whole solution will need to be deployed to a web host (yet to be chosen), which in practice means that the solution will need to run on linux.
I'm open to linux-only solutions if that's the only way. I'm open to solutions that involve talking to a server written in another language. I am able to write C++ or java if necessary to get this done. The final output does have to be in PCL format.
My question is: what is a good tool chain for generating PCL from word documents using python?
I am considering using some kind of interface to openoffice to open the word documents, do the substitutions, and send the output to some kind of printer driver. Does anyone have experience with this? What libraries would you recommend?
Options for interfacing that I have identified include the following; any other suggestions would be greatly welcomed:
Ulif.openoffice: http://pypi.python.org/pypi/ulif.openoffice/0.4
Py3o.renderserver: https://bitbucket.org/faide/py3o.renderserver
OpenOffice-python: http://openoffice-python.origo.ethz.ch/
A second approach would be to use something like paradocx ( https://bitbucket.org/yougov/paradocx/wiki/Home ) to open the word files, do the substitutions using that in python, then somehow interface with something that can output PCL. Again, any experience or comments on this approach would be appreciated.
I will very much appreciate any comments on tools and toolchains, and ideas or recipes that you may have.
This question covers similar ground to, but is not the same as: How to Create PCL file from MS word

Ghostscript can read PS (Postscript) or PDF and create PCL. You can use python libraries or just subprocess....

OK, so my final solution involved creating a java webservice to perform my transcoding.
Docx4j provides a class org.docx4j.convert.out.pdf.viaXSLFO.Conversion which hooks into apache FOP to convert Docx to PDF; that can be easily hacked to convert to PCL (because FOP outputs PCL)
Spark is a lightweight java web framework which allowed me to wrap my transcoder in a web service
Because I also manipulate the document, I need to have some metadata, so the perfect thing is a multipart form. I decode that using Apache Fileupload
In almost all cases, I had to upgrade to the development versions of libraries to get this to work.
On the python side I use:
requests to communicate with the web service
poster to prepare the multi-part request

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.