Trigger a website's file download with Python

There is a stock screening website called finviz. You can set up specific parameters for it to screen on, and a button in the bottom right corner lets you export the results as a .csv file.
I would like to create a script, in Python 2.7, that will download and analyze the file. I imagine I can use urllib2 to access the website, but how can I trigger the export and then read from the resulting file? The standard urllib2.urlopen(url).read() returns the HTML for the whole page, not the export I need.

It turns out that, at least in this case, the export button is really a link to a different URL. Where the screener's URL might be http://finviz.com/screener.ashx?v=111&f=sh_price_u1,
the export version of the URL is http://finviz.com/export.ashx?v=111&f=sh_price_u1.
Requesting the second URL is what triggers the download, so instead of urllib2.urlopen("http://finviz.com/screener.ashx?v=111&f=sh_price_u1").read() I need
urllib2.urlopen("http://finviz.com/export.ashx?v=111&f=sh_price_u1").read()
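The same idea can be sketched for Python 3, where urllib2 became urllib.request. The helper names to_export_url, parse_rows and fetch_rows are made up for illustration, and it is an assumption that the export endpoint still answers a plain request without a login or browser-like headers.

```python
import csv
import io
import urllib.request


def to_export_url(screener_url):
    """Swap the screener page for the export endpoint, keeping the query string."""
    return screener_url.replace("/screener.ashx", "/export.ashx", 1)


def parse_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_rows(screener_url):
    """Download the export for a screener URL and parse it (needs network access)."""
    with urllib.request.urlopen(to_export_url(screener_url)) as response:
        # Assumption: the export is UTF-8 encoded text.
        return parse_rows(response.read().decode("utf-8"))
```

Calling fetch_rows("http://finviz.com/screener.ashx?v=111&f=sh_price_u1") would then return one dict per exported row, ready for analysis.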

This one does the job in python. Have a look. https://github.com/nicolamr/trending-value

Related

How can I find the address of this web downloadable .csv file?

If I go to this website:
https://covid.cdc.gov/covid-data-tracker/#ed-visits
and click the "download" button (on the right), a .csv file is downloaded.
I can't find the address of that csv file, so that I could fetch it automatically with pd.read_csv(). I had a snoop around the web inspector, but I don't really know what I'm doing, and nothing jumped out at me as being the obvious answer. I've also looked around various other sites to try to find an API that gives me access to this data, but there doesn't appear to be any such thing.
Can anyone help me with that?
Thanks so much!
Open your web inspector, go to the "Network" tab, and reload the page. You will see that no csv is ever actually loaded.
The export button doesn't link to any file either. Instead it has a JavaScript binding that exports the data already held in your client (the browser) as a csv to your filesystem. In other words, there isn't an address for that file; it's created in your browser.
Better still, you can read the underlying JSON directly. Find the right request in the Network tab; I think it might be this one: https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data
So instead of looking for a csv, you could do:
pd.read_json('https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData?id=ed_trend_data') and then filter for the data that you need.
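Since the layout of that JSON isn't described here, a small standard-library sketch like the following can help locate the part worth filtering. It walks the decoded payload and reports where lists of records sit. The function names are hypothetical, and no particular key name in the real response is assumed.

```python
import json
import urllib.request


def fetch_json(url):
    """Download and decode a JSON document (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_record_lists(payload, path=""):
    """Return {path: row_count} for every non-empty list of dicts in decoded JSON."""
    found = {}
    if isinstance(payload, dict):
        for key, value in payload.items():
            found.update(find_record_lists(value, f"{path}/{key}"))
    elif isinstance(payload, list):
        # A list whose items are all dicts looks like a table of records.
        if payload and all(isinstance(item, dict) for item in payload):
            found[path or "/"] = len(payload)
    return found
```

Once the path of the interesting list is known, that list can be passed to pd.DataFrame(...) to get a table comparable to the downloaded csv.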

urllib.urlretrieve() cannot download the file

Hi, there's a button on a web page; if you click it, it downloads a file.
Say the corresponding URL looks like this:
http://www.mydata.com/data/filedownload.aspx?e=MyArgu1&k=kfhk22wykq
If I put this URL in the browser's address bar, it downloads the file properly as well.
Now I do this in Python:
urllib.urlretrieve(url, "myData.csv")
The csv file is empty. Any suggestions, please?
This may not be possible with every website. If a link contains a token, Python is unlikely to be able to use the link as-is, because the token is tied to your browser.
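One thing that may be worth trying before giving up is to send the request with the headers your browser would send, and to fail loudly on an empty body instead of silently writing an empty file. This is only a sketch for Python 3: the function names are hypothetical, the cookie value would have to be copied from your own browser session, and there is no guarantee a given site will accept it.

```python
import urllib.request


def build_request(url, cookie=None):
    """Build a request that carries a browser-like User-Agent and an optional cookie."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if cookie:
        # Assumption: the site identifies the browser session through a cookie.
        headers["Cookie"] = cookie
    return urllib.request.Request(url, headers=headers)


def download(url, path, cookie=None):
    """Save the response body to path; raise if the server sent nothing."""
    with urllib.request.urlopen(build_request(url, cookie)) as response:
        body = response.read()
    if not body:
        raise ValueError("server returned an empty body")
    with open(path, "wb") as handle:
        handle.write(body)
    return len(body)
```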

Trying to get links of an interactive map (Web scraping .swf)

I need to create a web scraper for this website.
However, I need to get the links for the counties, which are stored in the interactive map.
Unfortunately, for some reason, their search engine doesn't provide all the results as the interactive map does.
My question:
Could anyone tell me how to get all the links for all the counties, without manually accessing them?
Thanks
Technically you can use a decompiler to do this job.
There are free (e.g. ActionScript Extractor) and paid (e.g. Sothink SWF Decompiler) tools out there.
You can reference this answer.
Edit:
Most swf content loads its external records from either a .xml or a .json file.
Without decompiling, just using the browser's Developer Tools, we can see that an xml file is indeed accessed (maybe it contains what you want):
http://www.allpetservices.co.uk/uk_ir_locator.xml
Put view-source: in front of the link to read it (if there's an error message).
In that xml you want to extract the contents (the xyz) of each and every <link> xyz </link> tag. This will give you the links of every entry on the map.
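Pulling out those tag contents could look roughly like this with the standard library. It is a sketch that assumes the file is well-formed XML and that the tags are literally named link with no namespace; the function name is made up.

```python
import xml.etree.ElementTree as ET


def extract_links(xml_text):
    """Return the stripped text of every non-empty <link> element in the document."""
    root = ET.fromstring(xml_text)
    return [
        element.text.strip()
        for element in root.iter("link")
        if element.text and element.text.strip()
    ]
```

The XML text itself can be saved from the browser or downloaded first, then passed to extract_links as a string.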
The short answer to your question: There's no way to get the links from the site.
The solution: the structure of the links you are trying to retrieve is very predictable. They all follow the same pattern:
http://www.allpetservices.co.uk/search_map.asp?ccounty={COUNTY_NAME}
So, if you can use another site or data source to get the names of each of the counties, you can formulate each of the links that you need.
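Building the links from a list of names might then be sketched as follows. The county names have to come from elsewhere, and it is an assumption that the site accepts standard query-string encoding (for example a space sent as +).

```python
from urllib.parse import urlencode

BASE_URL = "http://www.allpetservices.co.uk/search_map.asp"


def county_url(county_name):
    """Fill the ccounty parameter of the search URL with one county name."""
    return f"{BASE_URL}?{urlencode({'ccounty': county_name})}"


def county_urls(county_names):
    """Build one search URL per county name, preserving order."""
    return [county_url(name) for name in county_names]
```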

Starting with Python scripts

I need to write a Python script. The script should access a web page that has an "upload" button; normally, when you upload a photo with that button, a new page opens, and once that page opens I need to look for a string there.
So the script should upload a photo that I provide and then check the output page for a string.
I have no background in that sort of coding (I know basic Python).
Can I get a reference or some pointers on what I should read to perform this task? Thank you very much.
While this question is not specific enough to give you a good answer, I can make a couple of suggestions. I would look into using a library for sending requests to pages, such as requests. I would also look into libraries for parsing html, such as Beautiful Soup. Essentially you will need to use requests to get the page's html, and then you'll need to parse that html using Beautiful Soup to find what you're looking for on the page.
You should do some reading about these libraries and/or other similar ones and try to get a better understanding of your problem. Afterward, come back to Stack Overflow once you have more specific questions or problems you've run into.
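As a rough starting point, the flow could look like the sketch below. It uses requests for the upload, but swaps in the standard library's html.parser for Beautiful Soup so the text check runs without an extra dependency. The upload URL (the address the form submits to), the form field name "file", and the function names are all assumptions to adapt to the real page.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect the text found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_text(html):
    """Return the page's text content with whitespace collapsed."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(" ".join(collector.chunks).split())


def upload_and_check(upload_url, photo_path, needle, field_name="file"):
    """Upload a photo and report whether needle appears in the resulting page."""
    import requests  # third-party; imported here so page_text works without it

    with open(photo_path, "rb") as photo:
        # field_name must match the name of the form's file input (assumed here).
        response = requests.post(upload_url, files={field_name: photo})
    response.raise_for_status()
    return needle in page_text(response.text)
```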

Python code for navigating through existing url

I am searching for Python code to navigate through a URL and download the resulting file, i.e. the site has multiple selections to make before I get the file to download. I want that whole selection process to be done in code, followed by the download of the resulting file.
You should definitely use Python Mechanize, which implements commands like "click link" or "fill form".
