I'm pretty inexperienced with any form of HTTP request more complicated than a basic GET. I've tried to do research online, but I'm having trouble figuring out where to start because I don't know all the required terminology.
For several years I've worked a side job for a data entry company. Basically what I do is Google several things, find the results from a few specific webpages, and copy those URLs into the company's system. About two years ago I wrote a very basic Python program to do the Googling part for me, and now I want to rewrite it and expand it to do the rest of it as well.
The website uses a combination of POST and PATCH requests to update the information in the database, and because the information is attached to my account I assume there is some form of authentication involved. I don't have access to the system's backend, so the best I can do is head to the Network tab under Inspect Element. I can't find anything in the requests' headers that seems to be tied to my account.
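From the research I've done so far, my best guess is that the authentication lives in a session cookie rather than a header. If that's right, replaying the site's requests from Python would look roughly like the sketch below; everything in it (the URL, cookie name, and payload) is a placeholder, since I don't know the site's actual endpoints:
import requests

# Placeholder values -- the real endpoint, cookie name, and payload depend on the site.
# A session cookie can usually be copied from the browser's Network tab after logging in.
session = requests.Session()
session.cookies.set("sessionid", "value-copied-from-browser")

# Replay the POST the site sends when a new entry is saved
response = session.post(
    "https://example.com/api/entries",
    json={"url": "https://result.example.com/page"},
)
print(response.status_code)

# Update an existing entry with PATCH the same way
response = session.patch(
    "https://example.com/api/entries/123",
    json={"url": "https://updated.example.com/page"},
)
print(response.status_code)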
What do I need to do to authenticate, and if it's not that simple, where's the best place to start learning?
Let me know if you need more information and I'll try to give you what you need--I don't know exactly what's required.
As the title states, I'm looking for a way to get backlinks for a given URL/website using Google APIs, since I already have an API key and I'd rather use it than rely on other services.
I already tested services like Ahrefs, Majestic, Moz, Serpstat, etc., and they can actually give me the information I need, but I was wondering if there was a way to do it with Google.
From what I've read in my past research, Google used to offer a way to do this, but it has since been deprecated, so it's no longer usable. Did they really take this feature away for good?
I've also noticed that Google offers a similar service with its Google Search Console, but it can only be used for your own website; I'd like to get that kind of information for any given URL.
I will be using Python in my project, but I don't think there's a package that can deliver this kind of data, or at least I looked for one and didn't find anything.
Any help would be appreciated.
I'm currently trying to design a filter that lets me block certain URLs and also block based on keywords that may appear in the data that comes back in the HTTP response.
Just for clarification I'm working on a Windows 10 x64 machine for this project.
In order to do such a thing, I quickly understood that I would need a web proxy.
I checked out about six proxies written in Python that I found on GitHub.
These are the projects I tried to use (some are Python 3, some Python 2):
https://github.com/abhinavsingh/proxy.py/blob/develop/proxy.py
https://github.com/inaz2/proxy2/blob/master/proxy2.py
https://github.com/inaz2/SimpleHTTPProxy - this one is the earlier version of the top one
https://github.com/FeeiCN/WebProxy
Abhinavsingh's Proxy (first in the list):
what I want to happen
I want the proxy to be able to block sites based on both the request and the content that comes back. I also need the filter to live in a separate file and to be generic, so I can apply it to every site and every request/response.
I'd like to understand where the correct place is to put a filter in this proxy, and how to redirect, or just send a block page back, when the client tries to access sites with specific URLs, or when the response is a page that contains certain keywords.
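To make it concrete, the generic filter I have in mind would be a separate module along these lines (a sketch only; the function names and the blocklists are mine, not from any of the proxies above):
# filters.py -- a generic filter module, kept separate from the proxy itself

BLOCKED_URL_PARTS = ["ads.example.com", "/tracker/"]
BLOCKED_KEYWORDS = [b"forbidden-word", b"another-keyword"]

BLOCK_PAGE = (
    b"HTTP/1.1 403 Forbidden\r\n"
    b"Content-Type: text/html\r\n\r\n"
    b"<html><body><h1>Blocked by proxy</h1></body></html>"
)

def should_block_request(url: str) -> bool:
    """Decide before forwarding, based on the requested URL."""
    return any(part in url for part in BLOCKED_URL_PARTS)

def should_block_response(body: bytes) -> bool:
    """Decide after the upstream reply, based on keywords in the body."""
    return any(keyword in body for keyword in BLOCKED_KEYWORDS)
The proxy would then call should_block_request where it parses the client's request and should_block_response where it reads the upstream body, sending BLOCK_PAGE back instead of the real content whenever either returns True.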
what I tried
I enabled the proxy in Google Chrome's 'Open proxy settings' and executed the script. It looked pretty promising, and I noticed that I can insert a call to a filter function at line 383, in the _process_request function, so that it can return another host to redirect to, or just block. It worked partially for me.
The problems
First of all, I couldn't fully redirect/block the sites. Sometimes it worked, sometimes it didn't.
Another problem I ran into was that I can't access the content of the returned site if it is HTTPS.
Also, how to filter the response was unfortunately not clear to me.
I also noticed that proxy2 (the second in the list) can solve the problem I had of filtering an HTTPS page's content, but I couldn't figure out how to make this feature work (and I also think it requires Linux utilities anyway).
The process I described above is pretty much the one I tried with every proxy on the list. With some proxies, like proxy2.py, I couldn't understand at all what I needed to do.
If anybody has managed to build a filter on top of this proxy, or any other from this list, and can help me understand how to do it, I'll be grateful if you comment down below.
Thank you.
I am currently pulling a public data series from https://www3.bcb.gov.br/expectativas/publico/en/serieestatisticas
This is a public page that I believe uses Apache Wicket.
I'm usually OK with scraping, whether GET or POST. Here my colleagues and I are stuck. Can anyone help us understand what URL needs to be used to actually make the request? Here's what I've got so far:
The form with inputs:
The Fiddler capture of the manually executed request, text view, showing the form data I'm passing:
form19_hf_0=&indicador=0&calculo=0&linhaPeriodicidade%3Aperiodicidade=0&tfDataInicial=11%2F10%2F2015&tfDataFinal=11%2F24%2F2015&divPeriodoRefereEstatisticas%3AgrupoAnoReferencia%3AanoReferenciaInicial=16&divPeriodoRefereEstatisticas%3AgrupoAnoReferencia%3AanoReferenciaFinal=16&btnCSV=Generate+CSV
Summary:
I need some help; I can't seem to get the POST working correctly. It takes me to a different page, and I'm not sure how to work through this one.
NB: I'm trying to grab back a CSV.
The library I'm using is primarily Requests (I was going to use lxml, but I don't think it's going to be applicable here).
I've been trying to figure out the right form with Postman and Fiddler to understand what the request needs to be.
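For reference, the direct attempt in Requests looked roughly like this; the form fields are the decoded version of the capture above, and the POST URL is exactly the part I haven't been able to pin down:
import requests

# Decoded form fields from the Fiddler capture above
payload = {
    "form19_hf_0": "",
    "indicador": "0",
    "calculo": "0",
    "linhaPeriodicidade:periodicidade": "0",
    "tfDataInicial": "11/10/2015",
    "tfDataFinal": "11/24/2015",
    "divPeriodoRefereEstatisticas:grupoAnoReferencia:anoReferenciaInicial": "16",
    "divPeriodoRefereEstatisticas:grupoAnoReferencia:anoReferenciaFinal": "16",
    "btnCSV": "Generate CSV",
}

# The URL below is just the page itself, which is almost certainly wrong --
# Wicket appears to rewrite the real POST target per session.
url = "https://www3.bcb.gov.br/expectativas/publico/en/serieestatisticas"
response = requests.post(url, data=payload)
print(response.status_code, response.headers.get("Content-Type"))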
So, the solution to this was somewhat indirect. We were not able to do a straight POST because the page incremented the actual POST URL in a way that was generally impossible to predict.
The solution we used was to install the Selenium WebDriver and use it to simulate selecting the dropdowns' visible values and clicking the buttons.
This worked out very cleanly.
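For anyone who lands here with the same problem, the gist of the approach looks something like the sketch below; the element names come from the form fields in the capture above, but the visible option texts are illustrative:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select

driver = webdriver.Firefox()  # or webdriver.Chrome()
driver.get("https://www3.bcb.gov.br/expectativas/publico/en/serieestatisticas")

# Choose the series via the dropdowns' visible values (option texts are illustrative)
Select(driver.find_element(By.NAME, "indicador")).select_by_visible_text("Exchange rate")
Select(driver.find_element(By.NAME, "calculo")).select_by_visible_text("Average")

# Fill in the date range, just like in the manual form
driver.find_element(By.NAME, "tfDataInicial").send_keys("11/10/2015")
driver.find_element(By.NAME, "tfDataFinal").send_keys("11/24/2015")

# Click the CSV button; the browser works out the session-specific Wicket URL for us
driver.find_element(By.NAME, "btnCSV").click()

# In a real run, wait for the download to finish before quitting
driver.quit()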
Thanks and HTH anyone else who might have a similar problem.
I am making a web application that will monitor the number of members and discussions in each of the groups listed here (http://www.codecademy.com/groups#web) and display that information in nice graphs.
However, as you have already seen, it looks like I need to create an account and log in with it.
Bearing in mind that my project uses Python for the server side, how do I do it? Which API is easiest: Google, Facebook, or Twitter?
I would really love it if you could also provide some examples, because I am really new at this (and at Python too).
The official wrapper around the Twitter API for Python is this one. I used it and it's very easy. You should first read this page and also register an application to get OAuth keys.
Example:
import twitter

# Remember to fill in these values from your registered application
api = twitter.Api(consumer_key="",
                  consumer_secret="",
                  access_token_key="",
                  access_token_secret="")

# Get your home timeline and print it
print(api.GetHomeTimeline())
Hope it helps.
I've looked at a lot of questions and libraries and didn't find exactly what I wanted. Here's the thing: I'm developing an application in Python for a user to get all sorts of things from social network accounts. I'm having trouble with Facebook. I would like, if possible, a step-by-step tutorial on the code and libraries to use to get a user's information, from posts to photo information (given the user's login information, and how to handle it, because I've had a lot of problems with authentication).
Thank you
I strongly encourage you to use Facebook's own APIs.
First of all, check out the documentation on Facebook's Graph API: https://developers.facebook.com/docs/reference/api/. If you are not familiar with JSON, DO read a tutorial on it (for instance http://secretgeek.net/json_3mins.asp).
Once you grasp the concepts, start using this API. For Python, there are several alternatives:
facebook/python-sdk https://github.com/facebook/python-sdk
pyFaceGraph https://github.com/iplatform/pyFaceGraph/
It is also semitrivial to write a simple HTTP client that uses the Graph API.
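For example, fetching the authenticated user's profile is one GET plus a JSON parse; the access token below is a placeholder you would obtain through the OAuth flow:
import json
import urllib.request

ACCESS_TOKEN = "your-oauth-access-token"  # placeholder

# Fetch the authenticated user's basic profile from the Graph API
url = "https://graph.facebook.com/me?access_token=" + ACCESS_TOKEN
with urllib.request.urlopen(url) as response:
    profile = json.loads(response.read().decode("utf-8"))

print(profile.get("name"))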
I would suggest you check out the Python libraries, try the examples in their documentation, and see if they work and do the stuff you need.
Only as a last resort would I write a scraper and try to extract data with screen scraping (it is much more painful and breaks more easily).
I have not used this with Facebook, but in the past, when I had to scrape a site that required login, I used Mechanize to handle the login and scraping, and Beautiful Soup to parse the resulting HTML.
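A rough sketch of that combination, assuming a simple form-based login (the form index and field names are guesses; inspect the actual login page for the real ones):
import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()
br.set_handle_robots(False)  # the login page may disallow robots

# Log in by filling out the site's login form (form index and field names are guesses)
br.open("https://example.com/login")
br.select_form(nr=0)  # assume the login form is the first form on the page
br["username"] = "my-username"
br["password"] = "my-password"
br.submit()

# Fetch a protected page with the now-authenticated browser and parse it
html = br.open("https://example.com/members-only").read()
soup = BeautifulSoup(html, "html.parser")
print(soup.title.string)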