Requests.get function not retrieving any data. What's wrong? - python

I'm trying to loop through some license IDs to get data from a website. For example, when I enter the ID "E09OS0018" in the search box, I get a list with one school/daycare. But when I run the following code in my Python script (URL and arguments taken from the browser's developer tools), no data ends up in the file. What's wrong with this requests.get() call? And if I should use requests.post() instead, what arguments would I pass to it (I'm not very familiar with that approach)?
import requests

flLicenseData = requests.get('https://cares.myflfamilies.com/PublicSearch/SuggestionSearch?text=E09OS0018&filter%5Bfilters%5D%5B0%5D%5Bvalue%5D=e09os0018&filter%5Bfilters%5D%5B0%5D%5Boperator%5D=contains&filter%5Bfilters%5D%5B0%5D%5Bfield%5D=&filter%5Bfilters%5D%5B0%5D%5BignoreCase%5D=true&filter%5Blogic%5D=and')
openFile = open('fldata', 'wb')
for chunk in flLicenseData.iter_content(100000):
    openFile.write(chunk)

Call openFile.flush() before checking the file's contents.
Most likely, you are reading the file before the contents are actually written to it.
There can be a lag between writing to the file handle and the data reaching the physical file, due to the layers of buffering between the language's I/O API, the OS, and the disk.
Use openFile.flush() to ensure that the data is written to the file.
An excellent explanation of flush can be found here.
Alternatively, close the file with openFile.close(), or use a context manager:
with open('fldata', 'wb') as open_file:
    for chunk in flLicenseData.iter_content(100000):
        open_file.write(chunk)
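For completeness, the flush variant of the same idea, a minimal sketch reusing the names from the question:

openFile = open('fldata', 'wb')
for chunk in flLicenseData.iter_content(100000):
    openFile.write(chunk)
openFile.flush()  # force the buffered bytes out to the OS before inspecting the file
openFile.close()  # close the handle once everything is written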

Related

Problems with requests Python 3 retrieving an excel file from a WP site

Here's my problem: I am trying to download Excel .xlsx files from a WordPress site. If I type the URL I assigned to the variable stock in my code directly into the browser, Firefox downloads it perfectly.
I'm trying to do the same with Python, so I've written a script using requests, plus pandas for processing and manipulation.
However, even though the file seems to download, I get an error. I tried both open and with open as suggested in similar questions here, but in my case it always raises ValueError: Seek of closed file, no matter which variation of the code I attempt.
Here is my code:
import pandas as pd
import requests, os
import http.client

http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

# URL of the same link I used to manually fetch the file
stock = 'https://filmar.com/wp-content/uploads/2021/05/Apple-Lot-5-14-21.xlsx'
resp = requests.get(stock)  # GET request for the URL
print("Downloading...")  # This works

# When I try to retrieve the file it fails
with open('Apple-Lot-5-14-21.xlsx', 'wb') as output:
    output.write(resp.content)
print('The file has been downloaded')  # this is printed

# The error happens when I pass the file handle to pd.read_excel
apple = pd.read_excel(output)
Addendum
After inspecting the resp object as suggested by @MattDMo, there apparently is a permission problem or something similar: the response object returned a 404 Not Found, so either some protection or a redirection takes place on the server, and requests retrieves an empty file.
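A quick way to confirm that in code, a short sketch (raise_for_status() turns 4xx/5xx responses into exceptions):

resp = requests.get(stock)
print(resp.status_code)   # prints 404 here, so the body is an error page, not the spreadsheet
resp.raise_for_status()   # raises requests.exceptions.HTTPError for 4xx/5xx responses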
You can't pass output to pd.read_excel(), because when the with context manager exits, the file object output is closed. One option here, if you don't really need to save the Excel file for anything else, is to pass resp.content directly to read_excel(). Alternatively, if you want the Excel file for backup or other purposes, create a filename variable like so:
xl_file = 'Apple-Lot-5-14-21.xlsx'
then use that variable both when you're calling with open(... and when you're calling read_excel(), as that function can take both file names and file-like objects.
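A minimal sketch of both options, reusing resp and the filename from the question:

import io
import pandas as pd

xl_file = 'Apple-Lot-5-14-21.xlsx'

# Option 1: no file on disk needed; read the spreadsheet straight from memory
apple = pd.read_excel(io.BytesIO(resp.content))

# Option 2: keep a copy on disk, then read it back by name
with open(xl_file, 'wb') as output:
    output.write(resp.content)
apple = pd.read_excel(xl_file)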
As an extra note, I'm not sure why you're using http.client, as requests doesn't look at any of those values to my knowledge.

Can't open and read content of an uploaded zip file with FastAPI

I am currently developing a little backend project for myself with the Python framework FastAPI. I made an endpoint where the user should be able to upload two files: the first a zip file (which contains X .xml files) and the second a normal .xml file.
The code is as follows:
import zipfile
from typing import List
from fastapi import File, UploadFile

@router.post("/sendxmlinzip/")
def create_upload_files_with_zip(files: List[UploadFile] = File(...)):
    if not len(files) == 2:
        raise Httpex.EXPECTEDTWOFILES
    my_file = files[0].file
    zfile = zipfile.ZipFile(my_file, 'r')
    filelist = []
    for finfo in zfile.infolist():
        print(finfo)
        ifile = zfile.open(finfo)
        line_list = ifile.readlines()
        print(line_list)
This should print the contents of the files inside the .zip file, but it raises the exception
AttributeError: 'SpooledTemporaryFile' object has no attribute 'seekable'
in the line ifile = zfile.open(finfo).
After approximately three days of research, with a lot of trial and error and attempts at different functions such as .read() or .extract(), I gave up, because the Python docs literally state that it should be possible this way...
For those who don't know FastAPI: it's a backend framework for RESTful web services, and it uses Starlette's data structure for UploadFile. Please forgive me if I have overlooked something VERY obvious, but I have literally tried to check every corner that might be the cause of the error, such as:
Check whether another implementation is possible
Check that the .zip file is correct
Check that I attach the correct file (lol)
Debug to see whether the data that reaches the backend is indeed the .zip file
This is a known Python bug:
SpooledTemporaryFile does not fully satisfy the abstract for IOBase. Namely, seekable, readable, and writable are missing. This was discovered when seeking a SpooledTemporaryFile-backed lzma file.
As @larsks suggested in his comment, I would try writing the contents of the spooled file to a new TemporaryFile, and then operating on that. As long as your files aren't too large, that should work just as well.
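A sketch of that suggestion, assuming the files list from the question (a regular TemporaryFile is backed by an actual file, so it is seekable):

import shutil
import tempfile
import zipfile

tmp = tempfile.TemporaryFile()
shutil.copyfileobj(files[0].file, tmp)  # copy the spooled upload into a real temp file
tmp.seek(0)                             # rewind before handing it to zipfile
zfile = zipfile.ZipFile(tmp, 'r')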
This is my workaround:
import io
import zipfile

with zipfile.ZipFile(io.BytesIO(file.read()), 'r') as zf:
    ...  # work with the archive contents here

How to prevent multiple Python scripts from overwriting the same file?

I use multiple Python scripts that collect data and write it into one single JSON data file.
It is not possible to combine the scripts.
The writing process is fast, and errors often occur (e.g. some characters at the end get duplicated), which is fatal, especially since I am using the JSON format.
Is there a way to prevent a Python script from writing to a file while other scripts are trying to write to it? (It would be absolutely OK if the data a script tries to write gets lost, but it is important that the file syntax does not somehow get corrupted.)
Code snippet:
This opens the file and retrieves the data:
data = json.loads(open("data.json").read())
This appends a new dictionary:
data.append(new_dict)
And the old file is overwritten:
open("data.json","w").write( json.dumps(data) )
Info: data is a list which contains dicts.
Operating system: the whole process takes place on a Linux server.
On Windows, you could try to create the file and bail out if an exception occurs (because the file is locked by another script). But on Linux, your approach is bound to fail.
Instead, I would:
write one file per new dictionary, suffixing the filename with the process ID and a counter
have the consuming process(es) read not one single file, but all the files sorted by modification time, and build the data from them
So in each script:
import json
import os

filename = "data_{}_{}.json".format(os.getpid(), counter)
counter += 1
open(filename, "w").write(json.dumps(new_dict))
and in the consumers (reading each dict from the sorted files in a protected loop):
import glob
import json
import os

files = sorted(glob.glob("*.json"), key=os.path.getmtime)
data = []
for f in files:
    try:
        with open(f) as fh:
            data.append(json.load(fh))
    except Exception:
        # IO error or malformed JSON file: ignore it
        pass
I will post my own solution, since it works for me:
Every single Python script checks (before opening and writing the data file) whether a file called data_check exists. If so, the script does not try to read and write the data file and dismisses the data it was about to write. If not, it creates the file data_check and then starts reading and writing the data file. After the writing process is done, the file data_check is removed.
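A minimal sketch of that scheme, as a hypothetical try_write() helper (note that the exists-then-create check is not atomic; os.open with os.O_CREAT | os.O_EXCL would close that gap):

import json
import os

LOCKFILE = "data_check"

def try_write(new_dict, path="data.json"):
    if os.path.exists(LOCKFILE):
        return False                 # another script is writing; dismiss this data
    open(LOCKFILE, "w").close()      # create the lock file
    try:
        data = json.loads(open(path).read())
        data.append(new_dict)
        open(path, "w").write(json.dumps(data))
    finally:
        os.remove(LOCKFILE)          # release the lock
    return True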

Python basics - request data from API and write to a file

I am trying to use the requests package to retrieve info from GitHub, just like the Requests documentation page explains:
import requests
r = requests.get('https://api.github.com/events')
And this:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
I have to say I don't understand the second code block.
filename - in what form do I provide the path? If the file gets created, where will it be saved?
'wb' - what is this variable? (Shouldn't the second parameter be the mode?)
The following two lines probably iterate over the data retrieved with the request and write it to the file.
The Python docs' explanation also doesn't help much.
EDIT: What I am trying to do:
use requests to connect to an API (GitHub, and later the Facebook Graph API)
retrieve data into a variable
write this into a file (later, as I get more familiar with Python, into my local MySQL database)
Filename
When using open, the path is relative to your current directory. So if you write open('file.txt', 'w'), it creates a new file named file.txt in whatever folder your Python script runs in. You can also specify an absolute path, for example /home/user/file.txt on Linux. If a file named file.txt already exists, its contents will be completely overwritten.
Mode
The 'wb' option is indeed the mode. The 'w' means write and the 'b' means bytes. You use 'w' when you want to write to (rather than read from) a file, and you use 'b' for binary (rather than text) files. It is actually a little odd to use 'b' in this case, as the content being written is text, so 'w' alone would work just as well here. Read more about the modes in the docs for open.
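A small illustration of the difference (a sketch; the filenames are arbitrary):

with open('notes.txt', 'w') as f:   # text mode: write str
    f.write('hello')

with open('blob.bin', 'wb') as f:   # binary mode: write bytes
    f.write(b'hello')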
The Loop
This part uses the iter_content method from requests, which is intended for large files that you may not want in memory all at once. That is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
Conclusion
The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to fit in memory. However, the code becomes more readable and easier to understand if you are only accessing small webpages containing text:
import requests

r = requests.get('https://api.github.com/events')
with open('events.txt', 'w') as fd:
    fd.write(r.text)
filename is a string with the path you want to save to. It accepts either a relative or an absolute path, so you can just use filename = 'example.html'.
wb stands for write and bytes; learn more here.
The for loop goes over the entire returned content (in chunks, in case it is too large for proper memory handling) and writes it out until there is no more. That is useful for large files, but for a single webpage you could just do:
# just 'w', because we are writing text now, not bytes
with open(filename, 'w') as fd:
    fd.write(r.text)

how to delete a tempfile later

On Python 2.7, I am currently using the following code to send data via a POST request to a webpage (unfortunately, I cannot really change this). I prepare a string data according to http://everydayscripting.blogspot.co.at/2009/09/python-jquery-open-browser-and-post.html, write it to a file, and then open the file with webbrowser.open:
import tempfile
import time
import webbrowser

f = tempfile.NamedTemporaryFile(delete=False)
f.write(data)
f.close()
webbrowser.open(f.name)
time.sleep(1)  # hope the browser has read the file by now
f.unlink(f.name)
However, I had to learn that sleeping a little is sometimes a little too little: I might delete the file before the data has been submitted.
How can I avoid this?
One idea is, of course, to delete the file later, but when would that be? The whole thing is a method in a class: is there a method that is reliably executed on destruction? Or is it somehow possible to start the browser in a way that it does not return until the tab is closed?
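One way to realize the "delete later" idea is to register the cleanup with atexit, so the file is removed when the interpreter exits rather than after a guessed sleep. A sketch (it still assumes the browser reads the file before the script ends):

import atexit
import os
import tempfile
import webbrowser

f = tempfile.NamedTemporaryFile(delete=False)
f.write(data)  # data prepared as in the question
f.close()
atexit.register(os.unlink, f.name)  # delete the file when the interpreter exits
webbrowser.open(f.name)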
