Trouble retrieving data from kivy's jsonstore - python

I'm having trouble retrieving data from a '.json' file when the key contains non-ASCII characters.
Here is an example that illustrates the issue.
Say I want to save data into a JSON file as follows:
from kivy.storage.jsonstore import JsonStore
store = JsonStore('example.json')
store.put('André Rose', type='sparkling wine', comment='favourite')
Then I want to retrieve it as follows:
store.get('André Rose')
This returns an error that says:
KeyError: 'Andr\xc3\xa9'
I believe the problem is the non-ASCII character "é".
So my question is: how can I save keys like this into a JSON file and retrieve them without getting this KeyError?

"There is a bug in Kivy 1.8.0 under Python 3. When you are using Kivy 1.8.0 and Python 3, UrlRequest fails to convert the incoming data to JSON. If you are using this combination you'll need to add:" (Phillips, Creating Apps in Kivy)
import json
data = json.loads(data.decode())
I'm not sure if this will help your particular problem, but I thought I might throw it out there.
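To make the decode step above concrete, here is a minimal sketch using only the standard library. It assumes the stored data is UTF-8 encoded bytes; the `raw` payload is made up for illustration and does not come from Kivy itself.

```python
import json

# Hypothetical payload: the bytes a UTF-8 encoded JSON file would contain.
raw = json.dumps(
    {"André Rose": {"type": "sparkling wine", "comment": "favourite"}}
).encode("utf-8")

# Decode the bytes to text first, then parse the text as JSON.
data = json.loads(raw.decode("utf-8"))

# The key comes back as a text string, so the non-ASCII lookup succeeds.
entry = data["André Rose"]
print(entry["type"])
```

The point is that the key is looked up as text rather than as raw bytes, which is the mismatch the `'Andr\xc3\xa9'` in the KeyError hints at.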

Related

Why is Python writing the Unicode code instead of the character to a file

I'm making a Python program that (after doing a lot of other things, haha) creates an HTML file with some of the generated info.
I open an HTML template and then replace some 'tokens' with the generated info.
This is how I open the template and replace the tokens:
from tabulate import tabulate

def getPlantilla():
    with open('assets/plantillas/plantilla_proyecto3.html','r') as file:
        plantilla = file.read()
    return plantilla

def remplazarTokens(plantilla:str,PID,Pmap):
    tabla_html = tabulate(Pmap,headers="firstrow",tablefmt='html')
    return plantilla.format(PID=PID,TABLA=tabla_html)
But before replacing the tokens, I generate some HTML code from the generated info with this function:
def crearTrigger(uso,id):
    return f"{uso}"
And finally I create the file:
with open(filename,'w',encoding='UTF-8') as file:
    file.write(html)
The problem is that in the final .html file, the code generated by crearTrigger() doesn't work because some characters are replaced with their Unicode codes.
Example:
Out: <a href="#heap">Heap</a>
How it should be: Heap
I think this is an encoding problem, but I have tried encoding it with .encode("utf-8") and still have the same problem.
I hope someone can help me. Thanks.
Update: While writing the question, I realised that the tabulate library, which I use to convert the info into an HTML table, is causing the problem (it puts the Unicode code in place of the character). The outputs of crearTrigger() are saved in a list, which tabulate later converts into an HTML table. But I still don't know how to solve it.
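The symptom described in the update looks like HTML escaping rather than a file-encoding issue: an HTML formatter that escapes cell contents turns `<` and `>` into entities. The sketch below only uses the standard library to show that effect. The `crear_trigger` helper and its link markup are a hypothetical stand-in, not the original function. As far as I know, recent tabulate versions also document a `tablefmt='unsafehtml'` format that leaves cell contents unescaped, but check that against your installed version.

```python
import html

def crear_trigger(uso, id_):
    # Hypothetical stand-in: builds an anchor tag like the one in the example.
    return f'<a href="#{id_}">{uso}</a>'

link = crear_trigger("Heap", "heap")

# What an escaping HTML formatter would write into the table cell.
escaped = html.escape(link)

# Undoing the escaping recovers the original markup.
restored = html.unescape(escaped)

print(escaped)
print(restored)
```

Only leave markup unescaped when every cell value is trusted, since unescaped cells can inject arbitrary HTML into the page.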

How to access remote and encrypted PDF text without writing to local drive [closed]

I am very new to coding and have been stuck on this one problem for 3 days now, searching everywhere for an answer, so any help will be greatly appreciated. I need to extract a small amount of text from a PDF file located at a URL. I'm using sessions.get(chart_PDF) to fetch it, where chart_PDF is the example URL below.
The example URL is https://www.airservicesaustralia.com/aip/pending/dap/PADGN01-166_09SEP2021.pdf
I know I can write the file to my local drive, but I don't want to do that. I want to process it remotely, since I only need a couple of numbers from it.
I have tried to find the decryption password on the page hosting the URL, but couldn't. I've tried PyPDF2, pdfminer and pikepdf (probably not well).
I only need to retrieve two numbers near the bottom of the PDF to use in the rest of my code. Please help, even if it is a simple fix. I'm new to all this. Thanks.
from io import BytesIO
import urllib.request
import requests
from pikepdf import Pdf as PDF
from pdfminer import high_level
from pdfminer.pdfpage import PDFPage
chart_PDF = 'https://www.airservicesaustralia.com/aip/pending/dap/PADGN01-166_09SEP2021.pdf'
s = requests.Session()  # assumed: s is a requests session
retrieve = s.get(chart_PDF)
content = retrieve.content
response = urllib.request.urlopen(chart_PDF)
p = BytesIO(content)
p.getbuffer()
check = PDFPage.get_pages(p, check_extractable=False)
extract = high_level.extract_text(p)
I'm getting:
PDFTextExtractionNotAllowedWarning: The PDF <_io.BytesIO object at 0x000001B007ABEC20> contains a metadata field indicating that it should not allow text extraction. Ignoring this field and proceeding.
warnings.warn(warning_msg, PDFTextExtractionNotAllowedWarning)
Alternately, if I try this:
from pikepdf import Pdf as PDF
from pdfminer.pdfpage import PDFPage
from PyPDF2 import PdfFileReader
new_pdf = PDF.new()
with PDF.open(p) as pdf:
    print(len(pdf.pages))
    page1 = pdf.pages[0]
    if PdfFileReader.getIsEncrypted(pdf):
        print(True)
        PdfFileReader.decrypt(page1, password='')
pdf.close()
I get:
line 1987, in decrypt
return self._decrypt(password)
AttributeError: _decrypt
UPDATE 3/8/21
Thank you so much K J! You've seriously been a huge help!
import re
from io import BytesIO
from pdfminer.pdfpage import PDFPage
from pdfminer import high_level
retrieve = s.get(chart_PDF)
content = retrieve.content
bytes = BytesIO(content)
bytes.getbuffer()
PDFPage.get_pages(bytes, check_extractable=False)
extract = high_level.extract_text(bytes, password='') #THIS LINE THROWS ERROR
joined = ''.join(extract)
find_txt = re.findall(r'[(]\d*[-]\d[.]\d[)]', joined)
print(find_txt)
bytes.close()
This is now working well and I have been able to pull the numbers that I need (I have basically pulled all the numbers inside brackets from the PDF). I'll sort through them to find the one I need.
Strangely enough, although it's giving me what I need, my extract = high_level.extract_text(bytes, password='') line still raises the warning (warning_msg, PDFTextExtractionNotAllowedWarning), which is rather annoying. I'm not sure how this process works, but it's still letting the info out.
I can't use try/except, because the warning is not an exception and it gets skipped. What is the way around this? How can I stop that warning from coming up?
FINAL UPDATE
I got around the warning and it works well now.
import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    extract = high_level.extract_text(bytes)
Cheers fellas for putting up with my ignorance, you've helped so much.
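A possible refinement of the final update: `simplefilter("ignore")` silences every warning inside the block, which could hide unrelated problems. The sketch below ignores just one warning category instead. It is self-contained, so `ExtractionNotAllowedWarning` and `noisy_extract` are hypothetical stand-ins for pdfminer's warning class and for `high_level.extract_text`; `record=True` is only there to show which warnings still get through.

```python
import warnings

class ExtractionNotAllowedWarning(UserWarning):
    """Hypothetical stand-in for pdfminer's PDFTextExtractionNotAllowedWarning."""

def noisy_extract():
    # Hypothetical stand-in for high_level.extract_text(): emits two warnings.
    warnings.warn("text extraction not allowed", ExtractionNotAllowedWarning)
    warnings.warn("something unrelated", DeprecationWarning)
    return "(1380-4.4)"

with warnings.catch_warnings(record=True) as caught:
    # Show everything by default inside this block...
    warnings.simplefilter("always")
    # ...except the one category we have decided to silence.
    warnings.simplefilter("ignore", category=ExtractionNotAllowedWarning)
    text = noisy_extract()

# Only the unrelated warning was recorded; the silenced one is gone.
print([w.category.__name__ for w in caught])
```

This also shows why try/except did not help: a warning is reported through the warnings machinery and does not interrupt the call the way an exception does.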
The whole file has to be downloaded to the device, at least into RAM, so that the blob can be parsed as a FILE from the very END, looking for one OR more %%EOF markers and the location of page 0 (which gets shown as 1 or i). That page could be ANYWHERE IN THE STREAM.
THEN you can navigate to the other sequentially numbered pages, in the RANDOM order in which they were built. Any complaints, please contact Adobe.
However, it is easiest if the file is cached as a physical FILE object. If you don't want that on disk, use a RAM drive for your browser.
Again, those two objects at the bottom of page one could be anywhere, mixed into the content of "page" 99's objects or elsewhere. In the extreme, each letter in a PDF can be more than one object anywhere in the file, although a good authoring editor would try to keep them together line by line. (There is no such PDF thing as a word or paragraph.)
We can print the file as plain text to see how it is composited; although the file is secured, that is allowed.
I tried printing from a browser with little success, but that can depend on the browser, the system and the OS print drivers. Here I have printed the page as text using Acrobat portable, so we can see the sequential offsets of each text block from the left-hand margin, JUST LIKE a PDF VIEWER would need in order to rebuild them.
UPDATE
You said your target is (1380-4.4), to the RIGHT of ALTERNATE. But again, a PDF has no concept of left and right, or BEFORE and AFTER. IN THIS FILE the variable target is stored in 2 separate pieces PRIOR to the KNOWN characters, which luckily form a complete single block (ALTERNATE). So here, proximity in the plain text could well work if the capture is confined to that nearby locality. However, there is no guarantee that ALTERNATE will always be a single block.
It was perhaps not a good idea to show the way a printer would be given a stream of sequential data.
Here is the way one PDF viewer goes about decrypting the file.
As stated, on this occasion the word ALTERNATE is defined as text. However, the next item is the "3" under "B", which is text drawn as a vector path: it is not called a "character", although it looks like one, but is a numbered glyph from a font table. We see later that some of those numbers are stored as "text", and your target is mixed in with similar text in the same object.
Thus you need to call a PDF interpreter to give you a meaningful translation of all the bits and pieces of objects, so that you can extract the "right" text.
The easiest way to reach a "simple" one-line target in a complex file is to use MuPDF to tidy up the file first:
mutool clean -gggg -D infile.pdf outfile.pdf
combined with
pdftotext -layout outfile.pdf outfile.txt
or similar, to hopefully export that text on a line-by-line basis, so that you can consistently find your target immediately before or after ALTERNATE.
N.B. mutool convert to HTML would place the target value in a table entry AFTER the keyword, and if the number of lines is consistent, that would be a simpler way to find or grep it.
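Once the text has been exported line by line, the proximity idea can be sketched in Python. This is only an illustration under stated assumptions: the `sample` text is made up, `value_near_keyword` is a hypothetical helper, and the pattern assumes targets shaped like (1380-4.4).

```python
import re

# Values shaped like "(1380-4.4)": digits, a hyphen, then a decimal number.
VALUE_RE = re.compile(r"\(\d+-\d+\.\d+\)")

def value_near_keyword(text, keyword="ALTERNATE", window=80):
    """Return the bracketed value closest to `keyword`, or None if not found."""
    pos = text.find(keyword)
    if pos == -1:
        return None
    # Confine the search to a window of characters around the keyword.
    start = max(0, pos - window)
    end = pos + len(keyword) + window
    best = None
    for match in VALUE_RE.finditer(text, start, end):
        distance = min(abs(match.start() - pos), abs(match.end() - pos))
        if best is None or distance < best[0]:
            best = (distance, match.group(0))
    return best[1] if best else None

# Made-up sample of exported text; a real chart will look different.
sample = "CAT A B (760-2.4)\n" + "-" * 100 + "\n(1380-4.4) ALTERNATE\n"
print(value_near_keyword(sample))
```

The window keeps the capture confined to the locality of the keyword, which matters because similar bracketed values appear elsewhere on the page.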

Reading old HDF5 stores created by pandas

I'm having some trouble reading an old HDF5 file that I made with pandas in Python 2.7.
At the time I was using the to_hdf method to append groups to the file (e.g. db.to_hdf('File.h5', 'groupNameA', mode='a', data_columns=True, format='table')).
Now when I open the store and get the keys of the groups, I find that each one has a slash prepended to the name ('/groupNameA' in the example above). Attempting to access those groups with store['/groupNameA'], store.select('/groupNameA'), etc. produces TypeError: getattr(): attribute name must be string. Getting that error seems correct (slashes should not be used in these keys), but that doesn't help me get my data into a Python 3 environment.
If there's a way to get around this problem in Python 3, that'd be great.
Alternatively, I can still load the data in my 2.7 environment, so changing the code that writes the store so that slashes don't get added would probably solve the issue as well.

Trying to use json load with txt file

import json
f=open("99_jiayi.txt",'r',encoding = 'utf-8-sig')
a=json.load(f)
f.close()
https://i.stack.imgur.com/vD3M5.png
https://i.stack.imgur.com/QP0oW.png
I am trying to turn the text file into a list in order to analyze the data.
I used json.load, and it worked on another file written in the same format.
But when I use it on this file, it raises the error shown above.
I have searched Google for a long time but still can't find the answer.
I hope someone can help me with this question.
English is not my first language, so if anything I wrote is unclear, please let me know. Thanks!
The error message points to "1":NR, which does not look like valid JSON, so it seems like a valid error.
Edit: try putting all NR within quotes.
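As a rough illustration of that suggestion, the sketch below reproduces the failure on a made-up string and then quotes the bare NR values before parsing. The `bad` sample and the regular expression are assumptions about what the file looks like; the substitution would also touch an `NR` that follows a colon inside a quoted string, so check it against the real data.

```python
import json
import re

# Made-up sample resembling the fragment in the error message.
bad = '{"1":NR, "2":"ok"}'

try:
    json.loads(bad)
    parsed_without_fix = True
except json.JSONDecodeError as err:
    parsed_without_fix = False
    print(f"invalid JSON at position {err.pos}: {err.msg}")

# Wrap every bare NR that follows a colon in double quotes.
fixed = re.sub(r':\s*NR\b', ': "NR"', bad)
data = json.loads(fixed)
print(data)
```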

Python reading responses and validating the result

Currently I can print the result I get from the API, but I can't modify or read it without first writing it to a text file.
Furthermore, I don't need all of the information that the API provides; it would be great if I could get only the match_id.
The response from the API: Result.
From the result I only need the match_id. Once I have it, I want to compare it with a list of values, e.g. 3238829394, 3238829395 and more, to check whether any of them matches mine, and if so, the system should alert me.
I have found a way of doing it by writing the results into a text file and then comparing it with the list that I have.
The code for getting the response:
import dota2api
import json
import requests
api = dota2api.Initialise("[Value API][2]")
hist = api.get_match_history_by_seq_num(start_at_match_seq_num=2829690055, matches_requested=1)
response = str(hist)
f = open('myfile.txt', 'w')
f.write(response)
f.close()
However, I am hoping to find a faster and better way to do this, as the current process is very time-consuming and unstable. Thank you.
You are getting JSON back from that API, and the library hands it to you as a Python object, so you can access the data directly without converting it to a string or writing it to a file.
Working with the response will be something like this (sorry, I cannot copy and paste from that image to read the JSON properly):
for match in response['matches']:
    if is_similar(match['match_id']):
        do_something_cool_here
I think that should do what you need. If you give the answer as string I can help you building the code properly, but I guess you get the idea of what I am trying to say there :)
Hope it helps!
EDIT:
We talked in private and this works:
import dota2api
import requests
api = dota2api.Initialise("API_KEY")
response = api.get_match_history_by_seq_num(start_at_match_seq_num=SEQ_NUM, matches_requested=1)
match_id_check = MATCH_ID
for match in response['matches']:
    if match_id_check == match['match_id']:
        print(match)
where API_KEY, SEQ_NUM and MATCH_ID are values you need to configure.
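To check against several IDs at once, as the question describes, one option is a set lookup. The sketch below uses a made-up `response` dictionary in place of a real API call, so it runs without an API key; the `matches` and `match_id` names follow the code above, and the rest of the structure is an assumption.

```python
# IDs to watch for, taken from the examples in the question.
watched_ids = {3238829394, 3238829395}

# Made-up stand-in for the dictionary returned by the API call.
response = {
    "matches": [
        {"match_id": 3238829394, "match_seq_num": 2829690055},
        {"match_id": 1111111111, "match_seq_num": 2829690056},
    ]
}

# Collect every returned match whose ID is in the watched set.
hits = [m["match_id"] for m in response["matches"] if m["match_id"] in watched_ids]

for match_id in hits:
    print(f"ALERT: match {match_id} is on the watch list")
```

A set makes each membership test cheap even when the watch list grows, and no intermediate text file is needed.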
