Read SPSS Viewer files (.spv) with Python - python

There seem to be a million questions on reading SPSS .sav files via Python, but nothing on reading SPSS .spv files, a.k.a. Viewer files. See image below, highlighted in yellow.
My aim is to read the information within it (usually frequencies, tables, charts, etc.) and do something fun with it. I know you can export the same information to Excel, but I want to know if I can work directly with the .spv file.
Is this possible?
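One possible starting point, offered as a sketch rather than a confirmed answer: the GNU PSPP project's description of the format says a .spv file is a ZIP archive holding XML and binary members. Assuming that holds for your files, the standard library can open one without SPSS installed. The function names below are hypothetical, and the member names you get back depend on the file.

```python
import zipfile


def list_spv_members(path):
    """Return the names of the files stored inside a .spv archive."""
    # Assumption: the .spv file is a ZIP archive; fail loudly if it is not.
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path!r} does not look like a ZIP archive")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def read_spv_member(path, member):
    """Return the raw bytes of one member of the archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(member)
```

Listing the members first shows which parts are XML (and can be parsed with an XML library) and which are binary. Interpreting the contents of each member is a separate problem that this sketch does not address.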

Related

How to move and duplicate multiple images in Excel worksheets using Python?

I have Excel worksheets that I would like to move / duplicate multiple images in using Python (choosing Python due to reasons of existing infrastructure and technology choices).
Using xlwings, I can insert an image from either a .png or .jpg file, but cannot insert an image that comes from another worksheet in the workbook.
I am aware of the ability to insert a fig, but this appears to work only for matplotlib-generated figures. Looking through the documentation and searching online has not turned up relevant answers.
Using openpyxl, I can insert images into a worksheet from a file (.png or .jpg), but multiple copies of the same image cause the file to be corrupted on save due to Zip issues, and the images still need to come from a file, not from elsewhere in the workbook.
Any help on either of these modules or an explanation of how to do it another way in Python would be great.

How to retrieve highlighted sequences of an .epub file in Python?

I usually highlight the important parts of the .epub books I read on my Kobo reader, and I'd like to write a script that extracts these highlighted parts and saves them to a .txt file.
I've checked out the epub library documentation, but I couldn't find anything relevant to my problem.
Can anyone give me some tips on how to select and save the highlighted parts of my epub files? Is there a specific tag I should search for in the file?
The highlights and the annotations are stored in the KoboReader.sqlite file, not in each EPUB file.
You might want to check my script out:
http://github.com/pettarin/export-kobo
or search for Calibre plugins (several of them can export highlights/annotations).
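To illustrate reading the highlights from KoboReader.sqlite directly, here is a minimal sketch using the standard sqlite3 module. It is not the linked script. It assumes the database has a `Bookmark` table with `VolumeID`, `Text`, `Annotation` and `DateCreated` columns, so check the schema of your own file first, since firmware versions may differ. The function name is hypothetical.

```python
import sqlite3


def export_highlights(db_path, out_path):
    """Write every non-empty highlight in the Kobo database to a text file."""
    conn = sqlite3.connect(db_path)
    try:
        # Assumed schema: Bookmark(VolumeID, Text, Annotation, DateCreated).
        rows = conn.execute(
            "SELECT VolumeID, Text, Annotation FROM Bookmark "
            "WHERE Text IS NOT NULL ORDER BY VolumeID, DateCreated"
        ).fetchall()
    finally:
        conn.close()
    with open(out_path, "w", encoding="utf-8") as out:
        for volume, text, annotation in rows:
            out.write(f"[{volume}] {text.strip()}\n")
            if annotation:
                out.write(f"    note: {annotation.strip()}\n")
    return len(rows)
```

Work on a copy of the database file rather than the one on the device, so that nothing the reader depends on can be modified by accident.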

open and read a Sigmaplot .JNB file in Python

My lab has a very large directory of Sigmaplot files, saved as .JNB. I would like to process the data in these files using Python. However, I have thus far been unable to read the files into anything interpretable.
I've already tried pretty much every numpy read function and most of the pandas read functions, and am getting nothing but gibberish.
Does anyone have any advice about reading these files, short of exporting them all to Excel one by one?

XLSX to XML with schema map

I have built a couple basic workflows using XML tools on top of XLSX workbooks that are mapped to an XML schema. You would enter data into the spreadsheet, export the XML and I had some scripts that would then work with the data.
Now I'm trying to eliminate that step and build a more integrated and portable tool that others could use easily by moving from XSLT/XQuery to Python. I would still like to use Excel for the data entry, but have the Python script read the XLSX file directly.
I found a bunch of easy-to-use libraries to read from Excel, but they need you to state explicitly which cells the data is in, like range('A1:C2') etc. The useful thing about using the XML maps was that users could resize or even move tables to fit different rows, and rename sheets. Is there a library that would let me select tables as units?
Another approach I tried was to just uncompress the XLSX and just parse the XML directly. The problem with that is that our data is quite complex (taking up to 30-50 sheets) and parsing that in the uncompressed XLSX structure is really daunting. I did find my XML schema within the uncompressed XLSX, so is there any way to reformat the data into this schema outside of Excel? (basically what Excel does when I save a workbook as an .xml file)
The Excel format is pretty complicated, with dependencies between components. You can't, for example, be sure that the order of the worksheets in the worksheets folder has any bearing on what the file looks like in Excel.
I don't really understand exactly what you're trying to do, but the existing libraries present an interface for client code that hides the XML layer. If you don't want that, you'll have to root around for the parts that you find useful. In openpyxl you want to look at the stuff in openpyxl/reader, specifically worksheet.py.
However, you might have better luck using lxml, as this (using libxml2 in the background) will allow you to load a single XML file into Python and manipulate it directly using the lxml.objectify module. We don't do this in openpyxl because XML trees consume a lot of memory (and many people have very large worksheets), but the library for working with PowerPoint shows just how easy this can be.
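To make the point about worksheet order concrete: in an uncompressed XLSX, the sheet names and their order come from xl/workbook.xml, and each sheet is tied to its XML part through a relationship id resolved in xl/_rels/workbook.xml.rels. The sketch below uses only the standard library (not lxml, and not openpyxl's own reader) to build that mapping. The function name `sheet_parts` is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def sheet_parts(xlsx_path):
    """Map each sheet name, in workbook order, to its XML part in the archive."""
    with zipfile.ZipFile(xlsx_path) as archive:
        workbook = ET.fromstring(archive.read("xl/workbook.xml"))
        rels = ET.fromstring(archive.read("xl/_rels/workbook.xml.rels"))

    # Relationship Id -> member path; relative targets are resolved against xl/.
    targets = {}
    for rel in rels.findall(f"{{{PKG_REL}}}Relationship"):
        target = rel.get("Target")
        if target.startswith("/"):
            target = target.lstrip("/")
        else:
            target = posixpath.normpath(posixpath.join("xl", target))
        targets[rel.get("Id")] = target

    result = {}
    for sheet in workbook.findall(f"{{{MAIN}}}sheets/{{{MAIN}}}sheet"):
        result[sheet.get("name")] = targets[sheet.get(f"{{{DOC_REL}}}id")]
    return result
```

With this mapping, a script can look a sheet up by the name users see in Excel instead of relying on file names such as sheet1.xml, which need not match the visible order.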

How to programmatically import csv into excel and use excel formatting?

I have a very large (> 2 million rows) csv file that is being generated and viewed in an internal web service. The problem is that when users of this system want to export this csv to run custom queries, they open these files in Excel. Excel formats the numbers as best it can, but there are some requests to have the data in xlsx format with filters and whatnot.
The question boils down to: using Python 2.7, how can I read a large csv file (> 2 million rows) into Excel (or multiple Excel files) and control the formatting (dates, numbers, autofilters, etc.)?
I am open to Python and internal Excel solutions.
Without more information about the data types in the csv, or your exact issue with Excel properly handling those data types, it's hard to give you an exact answer.
However, I recommend looking at this module (https://xlsxwriter.readthedocs.org/), which can be used in Python to create xlsx files. I haven't used it, but it seems to have more features than you need, especially if you need to split the data between multiple files or workbooks.
It also looks like you can pre-create the filters and have total control over the formatting.
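On the splitting part: a single worksheet in the xlsx format holds at most 1,048,576 rows, so a file with more than 2 million rows has to be spread over several sheets or workbooks whichever writer is used. Here is a minimal sketch of the chunking step using only the standard library. It is written for Python 3 (not the 2.7 mentioned in the question), the function name `csv_chunks` is hypothetical, and it assumes the first csv row is a header that should be repeated on every sheet.

```python
import csv
from itertools import islice

EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in the xlsx format


def csv_chunks(path, rows_per_sheet=EXCEL_MAX_ROWS - 1):
    """Yield (header, rows) pairs, each small enough for one worksheet.

    One row per sheet is reserved for the header.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            return  # empty file: nothing to yield
        while True:
            # Pull the next block of data rows without reading the whole file.
            rows = list(islice(reader, rows_per_sheet))
            if not rows:
                break
            yield header, rows
```

Each yielded pair can then be written to its own worksheet or workbook with the xlsx library of your choice. Only one chunk is held in memory at a time, although a full chunk of about a million rows is still sizeable.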
