Python basics - request data from API and write to a file

I am trying to use "requests" package and retrieve info from Github, like the Requests doc page explains:
import requests
r = requests.get('https://api.github.com/events')
And this:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
I have to say I don't understand the second code block.
filename - in what form do I provide the path to the file, if I create it? Where will it be saved if I don't?
'wb' - what is this argument? (Shouldn't the second parameter be 'mode'?)
The following two lines presumably iterate over the data retrieved with the request and write it to the file.
The explanation in the Python docs isn't helping much either.
EDIT: What I am trying to do:
use Requests to connect to an API (Github and later Facebook GraphAPI)
retrieve data into a variable
write this into a file (later, as I get more familiar with Python, into my local MySQL database)

Filename
When using open, the path is relative to your current working directory. So if you said open('file.txt','w') it would create a new file named file.txt in the directory you ran your Python script from. You can also specify an absolute path, for example /home/user/file.txt on Linux. If a file named 'file.txt' already exists, its contents will be completely overwritten.
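For instance, both forms are valid; the first is relative to the working directory, while the second (a Linux path used purely as an illustration) is absolute:
# relative path: created in the current working directory
with open('file.txt', 'w') as fd:
    fd.write('hello')
# absolute path (Linux-style, illustrative only)
with open('/home/user/file.txt', 'w') as fd:
    fd.write('hello')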
Mode
The 'wb' option is indeed the mode. The 'w' means write and the 'b' means bytes. You use 'w' when you want to write to (rather than read from) a file, and you use 'b' for binary files (rather than text files). It is actually a little odd to use 'b' in this case, as the content you are writing is text. Specifying 'w' would work just as well here. Read more on the modes in the docs for open.
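As a quick sketch of the difference, text mode expects str and binary mode expects bytes (the file names here are just placeholders):
with open('example.txt', 'w') as fd:   # text mode: write str
    fd.write('some text')
with open('example.bin', 'wb') as fd:  # binary mode: write bytes
    fd.write(b'some bytes')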
The Loop
This part is using the iter_content method from requests, which is intended for use with large files that you may not want in memory all at once. This is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
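For a download that really is too large for memory, iter_content is usually paired with stream=True; a minimal sketch (the chunk size and output file name are just example values):
import requests

# stream the response so the whole body is not loaded into memory at once
r = requests.get('https://api.github.com/events', stream=True)
with open('events.json', 'wb') as fd:
    for chunk in r.iter_content(chunk_size=8192):
        fd.write(chunk)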
Conclusion
The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to be in memory. However, we can make your code more readable and easy to understand if you are only accessing small webpages containing text:
import requests
r = requests.get('https://api.github.com/events')
with open('events.txt','w') as fd:
    fd.write(r.text)

filename is a string of the path you want to save the file at. It accepts either a relative or an absolute path, so you can just use filename = 'example.html'
wb stands for WRITE & BYTES; learn more here
The for loop goes over the entire returned content (in chunks, in case it is too large for proper memory handling) and writes the chunks until there are none left. This is useful for large files, but for a single webpage you could just do:
# just 'w' because we are not writing bytes anymore, just text
with open(filename, 'w') as fd:
    fd.write(r.text)

Related

Can't open and read content of an uploaded zip file with FastAPI

I am currently developing a little backend project for myself with the Python framework FastAPI. I made an endpoint where the user should be able to upload 2 files: the first is a zip file (which contains X .xml files) and the second a normal .xml file.
The code is as follows:
@router.post("/sendxmlinzip/")
def create_upload_files_with_zip(files: List[UploadFile] = File(...)):
    if not len(files) == 2:
        raise Httpex.EXPECTEDTWOFILES
    my_file = files[0].file
    zfile = zipfile.ZipFile(my_file, 'r')
    filelist = []
    for finfo in zfile.infolist():
        print(finfo)
        ifile = zfile.open(finfo)
        line_list = ifile.readlines()
        print(line_list)
This should print the content of the files that are in the .zip file, but it raises the exception
AttributeError: 'SpooledTemporaryFile' object has no attribute 'seekable'
in the line ifile = zfile.open(finfo).
After approximately 3 days of research with a lot of trial and error involved, trying to use different functions such as .read() or .extract(), I gave up, because the Python docs literally state that this should be possible in this way...
For those of you who do not know FastAPI: it's a backend framework for RESTful web services and it uses the starlette data structure UploadFile. Please forgive me if I have overlooked something VERY obvious, but I have literally tried to check every corner that may have been the possible cause of the error, such as:
Check whether another implementation is possible
Check that the .zip file is correct
Check that I attach the correct file (lol)
Debug to see whether the actual data that comes to the backend is indeed the .zip file
This is a known Python bug:
SpooledTemporaryFile does not fully satisfy the abstract for IOBase. Namely, seekable, readable, and writable are missing. This was discovered when seeking a SpooledTemporaryFile-backed lzma file.
As @larsks suggested in his comment, I would try writing the contents of the spooled file to a new TemporaryFile, and then operate on that. As long as your files aren't too large, that should work just as well.
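For illustration, a minimal sketch of that suggestion, assuming files[0].file is the spooled upload object from the question:
import shutil
import tempfile
import zipfile

spooled = files[0].file                 # the SpooledTemporaryFile behind the upload (assumed)
spooled.seek(0)                         # make sure we copy from the start
with tempfile.TemporaryFile() as tmp:
    shutil.copyfileobj(spooled, tmp)    # copy the uploaded bytes into a real, seekable temp file
    tmp.seek(0)                         # rewind before handing it to ZipFile
    with zipfile.ZipFile(tmp, 'r') as zf:
        for name in zf.namelist():
            print(zf.read(name))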
This is my workaround:
with zipfile.ZipFile(io.BytesIO(file.read()), 'r') as zf:
    ...  # work with zf here

Requests.get function not retrieving any data. What's wrong?

Trying to loop through some license ids to get data from a website. Example: when I enter id "E09OS0018" in the search box, I get a list of one school/daycare. But when I type the following code in my python script (website link and arguments obtained from developer tools), I get no data in the file. What's wrong with this requests.get() command? If I should use requests.post() instead, what arguments would I use with the requests.post() command (not very familiar with this approach).
flLicenseData = requests.get('https://cares.myflfamilies.com/PublicSearch/SuggestionSearch?text=E09OS0018&filter%5Bfilters%5D%5B0%5D%5Bvalue%5D=e09os0018&filter%5Bfilters%5D%5B0%5D%5Boperator%5D=contains&filter%5Bfilters%5D%5B0%5D%5Bfield%5D=&filter%5Bfilters%5D%5B0%5D%5BignoreCase%5D=true&filter%5Blogic%5D=and')
openFile = open('fldata', 'wb')
for chunk in flLicenseData.iter_content(100000):
    openFile.write(chunk)
Do openFile.flush() before checking the file's content.
Most likely, you are reading the file before the contents are actually written to it.
There can be a lag between the contents written to the file handle and the contents actually transferred to the physical file, due to the levels of buffering between the programming language API, the OS, and the actual physical file.
Use openFile.flush() to ensure that the data is written to the file.
An excellent explanation of flush can be found here.
Or, alternatively, close the file with openFile.close() or use a context manager:
with open('fldata', 'wb') as open_file:
    for chunk in flLicenseData.iter_content(100000):
        open_file.write(chunk)
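As a sketch of the explicit flush/close alternative described above (assuming flLicenseData is the response object from the question):
openFile = open('fldata', 'wb')
for chunk in flLicenseData.iter_content(100000):
    openFile.write(chunk)
openFile.flush()   # push Python's buffer down to the OS
openFile.close()   # closing also flushes and releases the handle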

How to prevent multiple Python scripts from overwriting the same file?

I use multiple python scripts that collect data and write it into one single json data file.
It is not possible to combine the scripts.
The writing process is fast, and it often happens that errors occur (e.g. some chars at the end are duplicated), which is fatal, especially since I am using the JSON format.
Is there a way to prevent a Python script from writing to a file if other scripts are currently trying to write to it? (It would be absolutely OK if the data that the Python script tries to write into the file gets lost, but it is important that the file's syntax does not somehow get corrupted.)
Code Snipped:
This opens the file and retrieves the data:
data = json.loads(open("data.json").read())
This appends a new dictionary:
data.append(new_dict)
And the old file is overwritten:
open("data.json","w").write( json.dumps(data) )
Info: data is a list which contains dicts.
Operating system: the whole process takes place on a Linux server.
On Windows, you could try to create the file and bail out if an exception occurs (because the file is locked by another script), but on Linux your approach is bound to fail.
Instead, I would:
write one file per new dictionary, suffixing the filename with the process ID and a counter
have the consuming process(es) read not one single file, but all the files sorted by modification time, and build the data from them
So in each script:
filename = "data_{}_{}.json".format(os.getpid(),counter)
counter+=1
open(filename ,"w").write( json.dumps(new_dict) )
and in the consumers (reading each dict from the sorted files in a protected loop):
files = sorted(glob.glob("*.json"), key=os.path.getmtime)
data = []
for f in files:
    try:
        with open(f) as fh:
            data.append(json.load(fh))
    except Exception:
        # IO error, malformed json file: ignore
        pass
I will post my own solution, since it works for me:
Every single Python script checks (before opening and writing the data file) whether a file called data_check exists. If so, the Python script does not try to read and write the file and discards the data that was supposed to be written into it. If not, the Python script creates the file data_check and then starts to read and write the data file. After the writing process is done, the file data_check is removed.
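A minimal sketch of that lock-file idea (the helper name is mine, and note that the check and the creation are separate steps, so it is not fully race-free):
import json
import os

LOCK = "data_check"

def append_dict(new_dict):
    if os.path.exists(LOCK):        # another script is writing: discard this record
        return False
    open(LOCK, "w").close()         # create the marker file
    try:
        data = json.loads(open("data.json").read())
        data.append(new_dict)
        open("data.json", "w").write(json.dumps(data))
    finally:
        os.remove(LOCK)             # always remove the marker, even on error
    return True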

cPickle.load( ) error

I am working with cPickle for the purpose of converting structured data into a datastream format and passing it to the library. What I have to do is read the file contents from a manually written file named "targetstrings.txt" and convert the contents of the file into the format that the Netcdf library needs, in the following manner.
Note: targetstrings.txt contains Latin characters
op=open("targetstrings.txt",'rb')
targetStrings=cPickle.load(op)
The Netcdf library takes the contents as strings.
While loading the file, it fails with the following error:
cPickle.UnpicklingError: invalid load key, 'A'.
Please tell me how I can rectify this error. I have googled around but did not find an appropriate solution.
Any suggestions?
pickle is not for reading/writing generic text files, but for serializing/deserializing Python objects to a file. If you want to read text data you should use Python's usual IO functions.
with open('targetstrings.txt', 'r') as f:
    fileContent = f.read()
If, as it seems, the library just wants to have a list of strings, taking each line as a list element, you just have to do:
with open('targetstrings.txt', 'r') as f:
    lines = [l for l in f]
    # now in lines you have the lines read from the file
As stated, pickle is not meant to be used in this way.
If you need to manually edit complex Python objects that are to be read and passed as Python objects to another function, there are plenty of other formats to use - for example XML, JSON, or Python files themselves. Pickle uses a Python-specific protocol that, while not binary (in version 0 of the protocol) and not changing across Python versions, is not meant for this, and is not even the recommended way to record Python objects for persistence or communication (although it can be used for those purposes).
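By way of contrast, a small sketch of what cPickle actually expects: the file must have been written by pickle in the first place (the file name and data here are just placeholders):
import cPickle

target_strings = ["first line", "second line"]

# write a Python object with pickle...
with open("targetstrings.pkl", "wb") as op:
    cPickle.dump(target_strings, op)

# ...and only then can cPickle.load read it back
with open("targetstrings.pkl", "rb") as op:
    loaded = cPickle.load(op)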

How to copy a JSON file into another JSON file with Python

I want to copy the contents of a JSON file into another JSON file with Python.
Any ideas ?
Thank you :)
Given the lack of research effort, I normally wouldn't answer, but given the poor suggestions in comments, I'll bite and give a better option.
Now, this largely depends on what you mean: do you wish to overwrite the contents of one file with another, or insert one into the other? The latter can be done like so:
with open("from.json", "r") as from, open("to.json", "r") as to:
to_insert = json.load(from)
destination = json.load(to)
destination.append(to_insert) #The exact nature of this line varies. See below.
with open("to.json", "w") as to:
json.dump(to, destination)
This uses python's json module, which allows us to do this very easily.
We open the two files for reading, then open the destination file again in writing mode to truncate it and write to it.
The marked line depends on the JSON data structure, here I am appending it to the root list element (which could not exist), but you may want to place it at a particular dict key, or somesuch.
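For example, if the destination's top level is a dict rather than a list, the marked line might instead assign to a key (the key name here is purely illustrative):
destination["imported"] = to_insert  # hypothetical key; adapt to your structure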
In the case of replacing the contents, it becomes easier:
with open("from.json", "r") as from, open("to.json", "w") as to:
to.write(from.read())
Here we literally just read the data out of one file and write it into the other file.
Of course, you may wish to check the data is JSON, in which case, you can use the JSON methods as in the first solution, which will throw exceptions on invalid data.
Another, arguably better, solution to this could also be shutil's copy methods, which would avoid actually reading or writing the file contents manually.
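For completeness, a minimal sketch of the shutil approach mentioned above:
import shutil

# copies the file wholesale, without reading the contents into Python ourselves
shutil.copyfile("from.json", "to.json")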
Using the with statement gives us the benefit of automatically closing our files - even if exceptions occur. It's best to always use them where we can.
Note that in versions of Python before 2.7, multiple context managers are not handled by a single with statement; instead you will need to nest them:
with open("from.json", "r") as from:
with open("to.json", "r+") as to:
...
