If I go to this website:
https://covid.cdc.gov/covid-data-tracker/#ed-visits
and click the "download" button (on the right), a .csv file is downloaded.
I can't find the address of that CSV file, so that I could fetch it automatically with pd.read_csv(). I had a snoop around the web inspector, but I don't really know what I'm doing, and nothing jumped out at me as the obvious answer. I've also looked around various other sites to try to find an API that gives me access to this data, but there doesn't appear to be such a thing.
Can anyone help me with that?
Thanks so much!
You might want to open your web inspector, go to the "Network" tab, and reload the page. You will see that there's never actually a CSV being loaded.
The export button doesn't link to any file either. Rather, it has a JavaScript binding that exports the data already present in your client (the browser) to a CSV on your filesystem. In other words: there isn't an address for that file; it's being created in your browser.
Even better, you can read the JSON directly. Just find the correct request in the Network tab; I think it might be this one: https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data
So instead you could read the JSON directly:
pd.read_json('https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data')
and then filter for the data that you need.
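For example, a minimal sketch (assuming the endpoint returns a flat JSON table; if the records turn out to be nested under a key, you'd parse with the json module and flatten first):

import pandas as pd

# Fetch the JSON behind the dashboard. Not verified against the live
# endpoint: if the payload is nested, pandas' json_normalize on the parsed
# dict may be needed instead; inspect the raw response first.
url = 'https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data'
df = pd.read_json(url)
print(df.head())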
I'm sorry if this is the wrong place for this, but I'm not exactly sure where the right place to pose this issue is.
I'm being asked to make a web page or spreadsheet with a table of PDFs and add a button so that each PDF can be attached to an email. As in, a one-click solution to add the actual PDF into an email, not a link to the file. I don't think this is possible since the file is hosted on the web... BUT...
If the file is hosted locally on our internal server, is there code or a function that would be able to do that?
Basically, the company owner is old and doesn't want to have to search for files to add to emails. He wants to just be able to open a list (web, Excel, whatever), click on the one(s) he wants, and have them added to an email. I need to be able to manage and update the PDF files on the back end and update the table.
Of course the easy thing is to have a shared folder, but it lacks the organization of an easily readable table. Also, he wants to attach the actual file, not send a link.
I can't think of a way to do this, but plenty of people are smarter than I.
Any advice or help is appreciated.
Hi, there's a button on a web page; if you click it, it downloads a file.
Say the corresponding URL is like this:
http://www.mydata.com/data/filedownload.aspx?e=MyArgu1&k=kfhk22wykq
If I put this URL in the browser's address bar, it downloads the file properly as well.
Now I do this in Python:
import urllib
urllib.urlretrieve(url, "myData.csv")
The CSV file is empty. Any suggestions, please?
This may not be possible with every website. If a link contains a token (like the k=... parameter here), Python is unlikely to be able to use the link as-is, because the token is probably tied to your browser session.
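If it is a session token, one thing you can try is replaying the browser's own headers; a minimal sketch (Python 2; the cookie name and value are placeholders you'd copy from your browser's Network tab, and this only works if the token is backed by a cookie):

import urllib2

# Replay the Cookie and User-Agent headers from the real browser request.
# The cookie below is a placeholder, not a real value.
url = 'http://www.mydata.com/data/filedownload.aspx?e=MyArgu1&k=kfhk22wykq'
req = urllib2.Request(url, headers={
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'ASP.NET_SessionId=PASTE_VALUE_FROM_BROWSER',
})
with open('myData.csv', 'wb') as f:
    f.write(urllib2.urlopen(req).read())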
I need to create a web scraper for this website.
However, I need to get the links for the counties, which are stored in the interactive map.
Unfortunately, for some reason, their search engine doesn't provide all the results that the interactive map does.
My question:
Could anyone tell me how to get all the links for all the counties, without manually accessing them?
Thanks
Technically you can use a decompiler to do this job. There are free (e.g. ActionScript Extractor) and paid (e.g. Sothink SWF Decompiler) tools out there.
You can reference this answer.
Edit:
Most SWF content gets its external records from either a .xml or a .json file.
Without decompiling, and just using the browser's developer tools, we can see that an XML file is indeed accessed (maybe it contains what you want): http://www.allpetservices.co.uk/uk_ir_locator.xml
Put view-source: in front of the link to read it (in case you get an error message otherwise).
In that XML, you want to extract the contents (the xyz) of each and every <link>xyz</link> tag. That will give you the link for every entry on the map.
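A minimal sketch of that extraction (assuming the <link> tags carry no XML namespace; if they do, the tag name would need the namespace prefix):

import urllib2
import xml.etree.ElementTree as ET

# iter('link') walks the whole tree, so the tags are found regardless of
# how deeply they are nested in the document.
xml_data = urllib2.urlopen('http://www.allpetservices.co.uk/uk_ir_locator.xml').read()
root = ET.fromstring(xml_data)
for el in root.iter('link'):
    if el.text:
        print(el.text.strip())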
The short answer to your question: there's no way to get the links directly from the site.
The solution: the links you are trying to retrieve are very predictable. They all follow the same structure:
http://www.allpetservices.co.uk/search_map.asp?ccounty={COUNTY_NAME}
So, if you can use another site or data source to get the names of the counties, you can construct each of the links you need.
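For example (Python 2; the county names below are an illustrative sample, assuming you've obtained the real list elsewhere):

import urllib

# urllib.quote handles county names containing spaces or other characters
# that need URL-encoding.
counties = ['Antrim', 'Bedfordshire', 'Berkshire']
for name in counties:
    print('http://www.allpetservices.co.uk/search_map.asp?ccounty=' + urllib.quote(name))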
There is a stock screening website called finviz. You can set up specific parameters for it to screen, and there is a button in the bottom right corner that lets you export the results as a .csv file.
I would like to create a script, in Python 2.7, that will download and analyze the file. I imagine I can use urllib2 to access the website, but how can I trigger the export and then read the resulting file? The standard urllib2.urlopen(url).read() returns the HTML for the entire site, not the export I need.
So it turns out, at least in this case, that the export button is really a link to a different URL. Where the screener's URL might be http://finviz.com/screener.ashx?v=111&f=sh_price_u1,
the export version of the URL is http://finviz.com/export.ashx?v=111&f=sh_price_u1.
The second URL has the functionality of triggering the download, so instead of urllib2.urlopen("http://finviz.com/screener.ashx?v=111&f=sh_price_u1").read() I need
urllib2.urlopen("http://finviz.com/export.ashx?v=111&f=sh_price_u1").read()
This one does the job in Python; have a look: https://github.com/nicolamr/trending-value
I am trying to grab a PNG image which is being dynamically generated with JSP in a web service.
I have tried visiting the web page it is contained in and grabbing the image's src attribute, but the link leads to a .jsp file, and reading the response with urllib2 just shows a lot of gibberish.
I also need to do this while logged into the web service in question, using mechanize. That seems to rule out grabbing a screenshot with webkit2png or similar.
Thanks for any suggestions.
If you use urllib correctly (for example, making sure your User-Agent resembles a browser's), the "gibberish" you get back is the actual image file. You just need to write it out to disk (open the file with "wb" for writing in binary mode) and re-read it with an image-manipulation library if you need to work with it. Or you can use urlretrieve to save it directly to the filesystem.
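Since you're already logged in with mechanize, you can reuse that same browser object to fetch the image and write it out in binary mode; a minimal sketch (the URL and login steps are placeholders for your own):

import mechanize

# Assumes you have already performed your login steps with this Browser,
# so the session cookies are attached to subsequent requests.
br = mechanize.Browser()
br.addheaders = [('User-Agent', 'Mozilla/5.0')]
# ... your existing login steps with br go here ...
response = br.open('http://example.com/render_image.jsp?id=123')  # placeholder
with open('image.png', 'wb') as f:
    f.write(response.read())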
If it's a JSP, chances are that it takes parameters, which might be appended by the browser via JavaScript before the request is made; you should look at the real request your browser makes before trying to reproduce it. You can do that with the Chrome Developer Tools, Firefox's LiveHTTPHeaders, etc.
I do hope you're not trying to break a captcha.