Read Binary string in Python, zlib

I want to store a large JSON (dict) from Python in DynamoDB.
After some investigation, zlib seems to be the way to go to get a good level of compression. Using the line below, I'm able to encode the dict:
ranking_compressed = zlib.compress(simplejson.dumps(response["Item"]["ranking"]).encode('utf-8'))
The (string?) then looks like this: b'x\x9c\xc5Z\xdfo\xd3....
I can directly decompress this and get the dict back with:
ranking_decompressed = simplejson.loads(str(zlib.decompress(ranking_compressed).decode('utf-8')))
All good so far. However, after putting this in DynamoDB and reading it back, the same decompress code as above fails. The value now looks like this:
Binary(b'x\x9c\xc5Z\xdf...
The error I get is:
TypeError: a bytes-like object is required, not 'Binary'
I've tried accessing the Binary with e.g. .data, but I can't reach it.
Any help is appreciated.

Boto3 Binary objects have a value property.
# in general...
binary_obj.value
# for your specific case...
ranking_decompressed = simplejson.loads(str(zlib.decompress(response["Item"]["ranking_compressed"].value).decode('utf-8')))
Oddly, this seems to be documented nowhere except in the source code for the Binary class here.
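For context, the full round trip might look something like this (a sketch; the table name and key are hypothetical, and ranking stands in for the dict from the question):
import zlib
import boto3
import simplejson

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('rankings')  # hypothetical table name

# Compress the dict; DynamoDB stores the bytes as a Binary attribute
compressed = zlib.compress(simplejson.dumps(ranking).encode('utf-8'))
table.put_item(Item={'id': 'player-1', 'ranking_compressed': compressed})

# On the way back, boto3 wraps the attribute in a Binary object,
# so unwrap the raw bytes with .value before decompressing
item = table.get_item(Key={'id': 'player-1'})['Item']
ranking = simplejson.loads(
    zlib.decompress(item['ranking_compressed'].value).decode('utf-8'))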

Related

How to properly return dataframe as JSON using FastAPI?

I created an API using FastAPI that returns JSON. At first, I turned the DataFrame into JSON using the pandas .to_json() method, which let me choose the correct "orient" parameter. This saved a .json file, which I then opened to have FastAPI return it, as follows:
DATA2.to_json("json_records.json", orient="records")
with open('json_records.json', 'r') as f:
    data = json.load(f)
return data
This worked perfectly, but I was told that my script shouldn't save any files, since it would be running on my company's server, so I had to turn the DataFrame directly into JSON and return it. I tried doing this:
data = DATA2.to_json(orient="records")
return data
But now the API's output is JSON full of "\" characters. I guess there is a problem with the parsing, but I can't really find a way to do it properly.
The output now looks like this:
"[{\"ExtraccionHora\":\"12:53:00\",\"MiembroCompensadorCodigo\":117,\"MiembroCompensadorDescripcion\":\"OMEGA CAPITAL S.A.\",\"CuentaCompensacionCodigo\":\"1143517\",\"CuentaNeteoCodigo\":\"160234117\",\"CuentaNeteoDescripcion\":\"UNION FERRO SRA A\",\"ActivoDescripcion\":\"X17F3\",\"ActivoID\":8,\"FinalidadID\":2,\"FinalidadDescripcion\":\"Margenes\",\"Cantidad\":11441952,\"Monto\":-16924935.3999999985,\"Saldo\":-11379200.0,\"IngresosVerificados\":11538288.0,\"IngresosNoVerificado\":0.0,\"MargenDelDia\":0.0,\"SaldoConsolidadoFinal\":-16765847.3999999985,\"CuentaCompensacionCodigoPropia\":\"80500\",\"SaldoCuentaPropia\":-7411284.3200000003,\"Resultado\":\"0\",\"MiembroCompensadorID\":859,\"CuentaCompensacionID\":15161,\"CuentaNeteoID\":7315285}.....
What would be a proper way of turning my dataframe into a JSON using the "records" orient, and then returning it as the FastAPI output?
Thanks!
Update: I changed the to_json() method to to_dict() with the same parameters and it seems to work... I don't know if it's correct.
data = DATA2.to_dict(orient="records")
return data
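That is the right fix: to_json() already returns a JSON string, and FastAPI then serializes that string a second time, escaping every quote, which is where the backslashes come from. Returning plain Python objects, which is what to_dict(orient="records") gives you, lets FastAPI do the encoding exactly once. A minimal sketch with a hypothetical endpoint and stand-in data:
from fastapi import FastAPI
import pandas as pd

app = FastAPI()

@app.get("/records")
def get_records():
    DATA2 = pd.DataFrame([{"ActivoID": 8, "Cantidad": 11441952}])  # stand-in data
    # a list of dicts, which FastAPI serializes to JSON exactly once
    return DATA2.to_dict(orient="records")
One caveat: to_dict() hands FastAPI raw Python objects, so values such as NaN or pandas Timestamps may need converting first before they serialize cleanly.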

What happens exactly in the i/o of json files?

I struggled with the following for a couple of hours yesterday. I figured out a workaround, but I'd like to understand a little more of what's going on in the background and, ideally, I'd like to remove the intermediate file from my code just for the sake of elegance. I'm using Python, by the way, and files_df starts off as a pandas DataFrame.
Can you help me understand why the following code gives me an error?
files_json = files_df.to_json(orient='records')
for file_json in files_json:
    print(file_json)  # do stuff
But this code works?
files_json = files_df.to_json(orient='records')
with open('export_json.json', 'w') as f:
    f.write(files_json)
with open('export_json.json') as data:
    files_json = json.load(data)
for file_json in files_json:
    print(file_json)  # do stuff
Obviously, the export/import is converting the data somehow into a usable format. I would like to understand that a little better and know if there is some option within the pandas files_df.to_json command to perform the same conversion.
json.load is the opposite of json.dump: you export from a pandas DataFrame into a file and then import it again with the standard library into a Python structure. The key point is that to_json returns a str, so iterating over it in your first snippet yields individual characters; json.load parses the file back into the list of dicts your loop expects.
Try files_df.to_dict(orient='records') instead.
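To drop the intermediate file, you can parse the string that to_json() produces, or skip the JSON detour entirely; a sketch with stand-in data:
import json
import pandas as pd

files_df = pd.DataFrame([{'name': 'a.txt', 'size': 10}])  # stand-in data

# to_json() returns a str, so iterating over it yields single characters;
# json.loads() turns it back into the list of dicts the loop expects
files_json = json.loads(files_df.to_json(orient='records'))
# equivalent, without going through JSON at all:
# files_json = files_df.to_dict(orient='records')
for file_json in files_json:
    print(file_json)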

Converting JSON file to SQLITE or CSV

I'm attempting to convert a JSON file to an SQLite or CSV file so that I can manipulate the data with Python. Here is where the data is housed: JSON File.
I found a few converters online, but those couldn't handle the quite large JSON file I was working with. I tried a Python module called sqlbiter but, like the others, was never really able to output or convert the file.
I'm not sure where to go now. If anyone has any recommendations or insights on how to get this data into a database, I'd really appreciate it.
Thanks in advance!
EDIT: I'm not looking for anyone to do it for me, I just need to be pointed in the right direction. Are there other methods I haven't tried that I could learn?
You can use the pandas module for this data-processing task as follows:
First, read the JSON file using with, open, and json.load.
Second, change the shape of the data a bit by turning the large dictionary that has a main key for every airport into a list of dictionaries instead.
Third, some pandas magic converts your list of dictionaries into a DataFrame: pd.DataFrame(data=list_of_dicts).
Finally, use pandas' to_csv function to write your DataFrame to disk as a CSV file.
It would look something like this:
import pandas as pd
import json

with open('./airports.json.txt', 'r') as f:
    j = json.load(f)

# turn {airport_code: {...}, ...} into [{...}, ...]
l = list(j.values())
df = pd.DataFrame(data=l)
df.to_csv('./airports.csv', index=False)
You need to load your JSON file and parse it so that all the fields are available, or load the contents into a dictionary; then you could use pyodbc to write these fields to the database, or write them to a CSV using the csv module.
But this is just a general idea. You need to study Python and how to do every step.
For instance, for writing to the database you could do something like:
for i in range(0, max_len):
    sql_order = "UPDATE MYTABLE SET MYTABLE.MYFIELD ...."
    cursor1.execute(sql_order)
cursor1.commit()
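As a rough sketch of that idea (the file name, connection string, table, and columns are all hypothetical), you could insert the parsed records with parameterized queries:
import json
import pyodbc

with open('airports.json', 'r') as f:  # hypothetical file name
    airports = json.load(f)

conn = pyodbc.connect('DSN=mydatabase')  # hypothetical connection string
cursor = conn.cursor()
for code, fields in airports.items():
    # '?' placeholders let the driver handle quoting and escaping
    cursor.execute("INSERT INTO airports (code, name) VALUES (?, ?)",
                   code, fields.get('name'))
conn.commit()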

Open BytesIO (xlsx) with xlrd

I'm working with Django and need to read the sheets and cells of an uploaded xlsx file. It should be possible with xlrd, but because the file has to stay in memory and may not be saved to a location, I'm not sure how to continue.
The starting point in this case is a web page with an upload input and a submit button. On submit, the file is caught with request.FILES['xlsx_file'].file and sent to a processing class that has to extract all the important data for further processing.
The type of request.FILES['xlsx_file'].file is BytesIO, and xlrd is not able to read that type because it has no __getitem__ method.
After converting the BytesIO to StringIO, the error message stays essentially the same: '_io.StringIO' object has no attribute '__getitem__'
file_enc = chardet.detect(xlsx_file.read(8))['encoding']
xlsx_file.seek(0)
sio = io.StringIO(xlsx_file.read().decode(encoding=file_enc, errors='replace'))
workbook = xlrd.open_workbook(file_contents=sio)
I'm moving my comment into an answer of its own. It relates to the example code (which includes decoding) given in the updated question:
Ok, thanks for your pointers. I downloaded xlrd and tested it locally. It seems the best way to go here is to pass it a string, i.e. open_workbook(file_contents=xlsx_file.read().decode(encoding=file_enc, errors='replace')). I misunderstood the docs, but I'm positive that file_contents= will work with a string.
Try xlrd.open_workbook(file_contents=request.FILES['xlsx_file'].read())
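In a view, that could look something like this (a sketch; the field name and sheet layout are assumptions):
import xlrd

def handle_upload(request):
    # .read() returns the raw bytes of the upload, which is exactly
    # what xlrd's file_contents parameter expects
    book = xlrd.open_workbook(file_contents=request.FILES['xlsx_file'].read())
    sheet = book.sheet_by_index(0)
    for row in range(sheet.nrows):
        print(sheet.row_values(row))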
I had a similar problem, but in my case I needed to unit test a Django app where the user downloads an xls file.
The basic code using StringIO worked for me.
class myTest(TestCase):
    def test_download(self):
        response = self.client.get('...')
        f = StringIO.StringIO(response.content)
        book = xlrd.open_workbook(file_contents=f.getvalue())
        ...
        # unit tests here

Import Error - No module named numpyio

Anyone know how to solve this error?
Exception Type: ImportError
Exception Value: No module named numpyio
Here are the imports from my Python code:
from scipy.io.numpyio import fwrite, fread
Can you help me?
This is because the scipy.io.numpyio module was removed sometime after SciPy 0.7 (see, for example, this thread). From the SciPy Input/Output Cookbook page, you can instead use the functions numpy.fromfile and numpy.ndarray.tofile (see under the heading "Raw binary").
While numpy.fromfile() allows you to specify the binary format to read (e.g. 'f' for float), the .tofile() function doesn't have such binary options. This is a highly inconvenient inconsistency for those of us who need to write binary files in a specific format for other software to read. Unfortunately, this problem seems to be ignored by the development community, as there seems to be no open ticket.
I have created a simple replacement function using the array module. The basic code goes something like this:
import array

def fwrite(filename, formatstring, ndarray):
    # pack the flattened array into the requested binary format
    arr = array.array(formatstring, ndarray.flatten())
    f = open(filename, 'wb')  # binary mode, so no newline translation mangles the bytes
    arr.tofile(f)
    f.close()
So far that seems to work. Obviously this could/should be embellished with error checks etc.
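To sanity-check the round trip, a file written this way can be read back with numpy.fromfile (a sketch, assuming float32 data):
import numpy as np

data = np.arange(6, dtype=np.float32).reshape(2, 3)
fwrite('data.bin', 'f', data)  # 'f' = 4-byte float

# the raw file carries no shape or dtype metadata, so restore both by hand
restored = np.fromfile('data.bin', dtype=np.float32).reshape(2, 3)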
From the archives:
The I/O functions for numpy arrays have been moved into numpy where it made sense, or removed when they provided duplicate functionality. Use numpy.load and numpy.save for reading/writing arrays in numpy's own .npy format, and loadtxt/savetxt for ASCII.
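For completeness, the .npy route mentioned there keeps the shape and dtype for you:
import numpy as np

arr = np.arange(6, dtype=np.float32).reshape(2, 3)
np.save('arr.npy', arr)        # .npy records shape and dtype in a header
restored = np.load('arr.npy')  # comes back as a (2, 3) float32 array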
