Delete cookies / end session using cookiejar - python

I have written a Python script that fetches a State of Health web page from some hardware we are using. After looking into the HTML and JavaScript files, it was relatively easy to fetch what I wanted. The problem is that I am not able to end the session and delete the cookies, so the web interface stays locked for other users until my session times out. I have no way to change anything on the server side.
What I am doing is basically:
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
url = mydataurl
headers = {"Authorization": "username:encryptedpassword",
           "Cookie": "user=username; password=encryptedpassword"}
# Both Authorization and Cookie need to be set to go further
data = "something"
req = urllib2.Request(url, data, headers)
connection = urllib2.urlopen(req)
response = connection.read()
# Now I have what I want in response and can work on that.
# But the server thinks I am still active and does not let anybody else in
# So I call what is called when I press logout on the web page:
url = logouturl
headers = {}
req = urllib2.Request(url, None, headers)
connection = urllib2.urlopen(req)
logoutresponse = connection.read()
# and just in case:
cj.clear()
cj.clear_session_cookies()
url = "http://myserver/index.htm"
req = urllib2.Request(url, None, headers)
connection = urllib2.urlopen(req)
logoutresponse = connection.read()
connection.close()
Am I doing something wrong when trying to get rid of the cookies in this session? I have also tried closing all three connections I opened, but to no avail.
I can open the web page in a browser on the computer the script runs on, log out, and immediately afterwards open it on another computer. If I run the script instead, I have to wait some minutes until something times out on the server before I can log on again.
It might, of course, be that the server is doing something else as well to keep the session alive; if so, I may be out of luck.
I prefer to use built-in libraries, and it is not possible for me at the moment to upgrade to Python 3.

A common way to "delete" cookies is simply to set the cookie's expiration date to a time in the past (many systems can just set it to time=0, which should work). I am not familiar with cookiejar, but you might want to look into this method.
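In cookielib terms, a minimal sketch of that approach (assuming cj is the CookieJar from the script above; note this only clears the client side, so the server may well keep its session open regardless):
import time
# mark every cookie in the jar as expired, then drop them
for cookie in cj:
    cookie.expires = time.time() - 3600  # one hour in the past
cj.clear_expired_cookies()  # removes everything now marked as expired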

Related

Using the Python library pyodata to access data in OData

So, I am trying to use the pyodata library in Python to access and download data from an OData service.
I tried accessing the Northwind data and it worked, so I guess the code I used is OK.
import requests
import pyodata

url_t = 'http://services.odata.org/V2/Northwind/Northwind.svc'
# connection set-up
northwind = pyodata.Client(url_t, requests.Session())
# This prints out a single row from the Customers entity set
for customer in northwind.entity_sets.Customers.get_entities().execute():
    print(customer.CustomerID, ",", customer.CompanyName)
    break
# This will print out - ALFKI , Alfreds Futterkiste
I also tried connecting to the OData source in Excel to see if the code above returns the correct data, and it did.
Now, using the same code to connect to the data source I actually want to pull from did not work:
url_1 = 'https://batch.decisionkey.npd.com/odata/dkusers'
session = requests.Session()
session.auth = (user_name, psw)
theservice = pyodata.Client(url_1, session)
The code above returns an HTTP 404 error message (is it something about security?).
I am thinking it might be a security issue that is blocking me from accessing the data, but it could be something else. Please let me know if anything needs to be clarified. Thanks.
This is my first time asking a question, so please let me know if I did anything wrong here. ^_^
You got HTTP 404 - Not Found.
The service "https://batch.decisionkey.npd.com/odata/dkusers" is not accessible from outside world for me to try it, so there is something more from networking point of view that happens in the second picture in the Excel import.
You can forget the pyodata at the moment, for your problem it is just wrapper around HTTP networking layer, the Requests library. You need to find a way initialize the Requests session in a way, that will return HTTP 200 OK instead.
Northwind example service is just plain and simple, so no problem during initialization of pyodata.Client
Refer to Requests library documentation- https://docs.python-requests.org/en/latest/user/advanced/
# sample script
import requests

url_1 = 'https://batch.decisionkey.npd.com/odata/dkusers'
session = requests.Session()
session.auth = (user_name, psw)
# ??? maybe an SSL certificate needs to be provided?
# ??? or maybe you are behind some proxy that Excel uses but Python does not; try ping in CMD
response = session.get(url_1)
print(response.text)
The pyodata documentation on initialization may also be useful, although you will not find the reason for your HTTP 404 there: https://pyodata.readthedocs.io/en/latest/usage/initialization.html
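If the 404 comes from a corporate proxy or TLS interception rather than the service itself, these Requests session options are the usual knobs to try; the certificate path and proxy URL below are hypothetical placeholders, not values from the question:
import requests

session = requests.Session()
session.auth = (user_name, psw)
# hypothetical corporate CA bundle, in case TLS verification fails
session.verify = '/path/to/corporate-ca.pem'
# hypothetical proxy, in case Excel goes through one that Python does not
session.proxies = {'https': 'http://proxy.example.com:8080'}
response = session.get(url_1)
print(response.status_code)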

Can't bypass Cloudflare with Python cloudscraper

I ran into a Cloudflare issue when I tried to scrape a website.
Here is my code:
import cloudscraper
url = "https://author.today"
scraper = cloudscraper.create_scraper()
print(scraper.post(url).status_code)
This code raises:
cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.
I searched for a workaround but couldn't find any solution. If you visit the website in a browser, you see:
Checking your browser before accessing author.today.
Is there any solution to bypass Cloudflare in my case?
Install httpx with HTTP/2 support:
pip3 install httpx[http2]
Define an HTTP/2 client:
import httpx

client = httpx.Client(http2=True)
Make the request:
response = client.get("https://author.today")
Cheers!
Although it does not seem to work for this site, sometimes adding some parameters when initializing the scraper helps:
import cloudscraper

url = "https://author.today"
scraper = cloudscraper.create_scraper(
    browser={
        'browser': 'chrome',
        'platform': 'android',
        'desktop': False
    }
)
print(scraper.post(url).status_code)
import cfscrape
from fake_useragent import UserAgent

ua = UserAgent()
s = cfscrape.create_scraper()
# send a random real-browser User-Agent header with the request
k = s.post("https://author.today", headers={"User-Agent": ua.random})
print(k)
I'd try to create a Playwright scraper that mimics a real user; this works for me most of the time, you just need to find the right settings (they can vary from website to website). See the sketch below.
Otherwise, if the website has a native app, try to figure out how the app behaves and then mimic it.
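As an illustration of the Playwright approach, here is a minimal sketch; the user agent, viewport, and headed mode are illustrative guesses, not settings known to work for this particular site:
# pip install playwright && playwright install chromium
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    # headed mode plus a realistic user agent tends to look less bot-like
    browser = p.chromium.launch(headless=False)
    context = browser.new_context(
        user_agent=("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"),
        viewport={"width": 1366, "height": 768},
    )
    page = context.new_page()
    page.goto("https://author.today", wait_until="networkidle")
    print(page.title())
    browser.close()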
I can suggest the following workflow to try to avoid Cloudflare WAF/bot mitigation:
don't cycle user agents, proxies or weird tunnels to surf
don't use fixed IP addresses; prefer leased lines like xDSL, home links and 4G/LTE
try to appear as mobile instead of a desktop/tablet
try to reproduce pointer movements like never before, i.e. record your mouse moves and replay them 1:1 while scraping (yes, you need JS enabled and a headless browser that can pass as a "common" one)
don't cycle against different Cloudflare-protected entities, otherwise the attacking IP will be greylisted in a minute (i.e. build your own blacklist of targets and never touch such entities, or you will land on the CF blacklist in no time)
try to reproduce real-life navigation in all aspects, including errors, waits and more
check the IP you used after every scrape against popular blacklists, otherwise bad errors will shortly appear (crowdsec is a good starting point)
the usual scrape is a Googlebot-style scrape, and a single regex WAF rule on Cloudflare will block 99.99% of those tries, so avoid faking Google and try to be LESS evil instead (e.g. ask webmasters for APIs or data exports, if any exist)
Source: I have used Cloudflare with hundreds of domains and thousands of records (Enterprise) since the beginning of the company.
That way you will be closer to the point (and you will help them increase overall security).
I used this line:
scraper = cloudscraper.create_scraper(browser={'browser': 'chrome', 'platform': 'windows', 'mobile': False})
and then used the httpx package after that:
with httpx.Client() as s:
    # remaining code
And I was able to bypass the issue cloudscraper.exceptions.CloudflareChallengeError: Detected a Cloudflare version 2 challenge, This feature is not available in the opensource (free) version.

Session variables like PHP in Python

I'm making a web application with Python and I want to save some variables for the session, until the browser closes, like I would do with PHP:
<?php
session_start();
$_SESSION['size']='small';
?>
What's an easy yet safe way to do this?
I'm using both lighttpd and Apache, so I want something that'll work with both.
Also, there will be passwords saved, so I need something safe.
When you use session_start() in PHP, you are not using "pure code" either; it's also smoke and mirrors.
Leaving out all the caveats: what you can do is use a global dictionary to store session data. Once a client makes a request and passes the "session" cookie, you look up all the session data in that dictionary. If there is no entry, or the client has no session cookie, you create a new session and pass the cookie to the client.
The session cookie is made of a random, say sixteen-character, string. Other clients are unable to guess another user's session because the keyspace is too large. From time to time, you prune sessions your server has not seen in a while from the dictionary.
You should really take a look at CherryPy's documentation on using sessions, though.
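A minimal sketch of that idea; get_cookie and set_cookie are hypothetical stand-ins for whatever your framework or CGI layer actually provides:
import secrets

sessions = {}  # global: session id -> dict of per-user data

def get_session(get_cookie, set_cookie):
    sid = get_cookie("session")
    if sid is None or sid not in sessions:
        sid = secrets.token_hex(16)   # random, practically unguessable id
        sessions[sid] = {}
        set_cookie("session", sid)    # hand the new id back to the client
    return sessions[sid]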
I decided to do it with cookies, which is easier/safer. Here's the code for everyone interested:
# importing the libs
from http import cookies
import os
# setting the cookies
C = cookies.SimpleCookie()
C["cookie1"] = "some_text"
C["cookie2"] = "another_text"
print(C.output())
# sending the html header
print('Content-type: text/html;\n')
# reading the "cookie1" cookie
cookievalue = cookies.SimpleCookie(os.environ["HTTP_COOKIE"])
print(cookievalue["cookie1"].value)

How to send a cookie and PHPSESSID with urllib2 in Python?

I wonder how I can send a cookie and PHPSESSID with urllib2 in Python?
Actually, I want to read a page I've logged into with my browser, but when I try to read it with this script I get text which seems to say that I've missed something.
My script :
#!/usr/bin/python
import urllib2

f = urllib2.urlopen('http://mywebsite.com/sub/create.php?id=20')
content = f.read()
out = open('file.txt', 'w')  # avoid 'file' as a name; it shadows a builtin
out.write(content)
out.close()
The error messages I get saved instead of the real page:
Warning: session_start() [function.session-start]: Cannot send session cookie - headers already sent by (output started at /home/number/domains/1number.com/public_html/s4/app/mywidgets.php:1) in /home/number/domains/1number.com/public_html/s4/app/mywidgets.php on line 23
Warning: session_start() [function.session-start]: Cannot send session cache limiter - headers already sent (output started at /home/number/domains/1number.com/public_html/s4/app/mywidgets.php:1) in /home/number/domains/1number.com/public_html/s4/app/mywidgets.php on line 23
Warning: Cannot modify header information - headers already sent by (output started at /home/number/domains/1number.com/public_html/s4/app/mywidgets.php:1) in /home/number/domains/1number.com/public_html/s4/lib/webservice.php on line 0
What is the exact problem? (Please give me a simple way to implement what I want.)
Thanks in advance
For the SID, one of the ways to send that is as part of the query string, and you're already doing that. At least I assume that's what the id=20 part of your URL is.
For cookies, everything you want is in cookielib.
Just creating a CookieJar to use for a session with the server is trivial. If you want to import cookies from your browser, there are three possibilities:
If your browser uses the old Netscape cookie file format, you can use MozillaCookieJar (a FileCookieJar subclass).
If your browser uses a sqlite database (as at least Firefox and Safari/Chrome do), use the sqlite3 module to read it, and populate a CookieJar manually.
If worst comes to worst, copy and paste the cookies from your browser into your script as hardcoded strings and populate a CookieJar manually.
If you don't want to read the docs on how to use cookielib, just look at the examples at the end of the docs; they show how to use a CookieJar with urllib2, which is exactly what you want to do.
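The core pattern is short; a minimal sketch (the login URL is a hypothetical placeholder):
import urllib2
import cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# the first response's Set-Cookie headers (e.g. PHPSESSID) land in cj
opener.open('http://mywebsite.com/login.php')

# later requests through the same opener send those cookies back automatically
content = opener.open('http://mywebsite.com/sub/create.php?id=20').read()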
If you have a problem, read the docs.
Meanwhile, what you're showing us are (a) warnings, not errors, and (b) obviously a problem on the server side, not in your script. The server should never spew out a bunch of warnings and an otherwise blank page. If you, or one of your coworkers, is responsible for the server code, that needs to be fixed first (and your current simple Python script can serve as a great regression test case).

How can I use TOR as a proxy?

I'm trying to use Tor as a generic proxy but it fails.
Right now I'm trying with Python, but I'm pretty sure it would be the same with any other language. I can connect to other proxies with Python, so I get how it "should" be done.
I found a list of Tor entry nodes, and tried:
h = httplib.HTTPConnection("one entry node", 80)
h.connect()
h.request("GET", "www.google.com")
resp = h.getresponse()
page = resp.read()
Unfortunately that doesn't work; I get redirected to a 404 page.
I'm just not sure what I'm doing wrong. Probably the list of entry nodes cannot be connected to just like that. I'm searching for how to do it properly, but I can't find any documentation about how to program applications with Tor.
Edit:
Ditch the Tor proxy list; I don't know why I thought I needed it.
The "entry node" is your own machine, once you've installed the (Windows) Vidalia client and Privoxy (all bundled as one):
httplib.HTTPConnection("one entry node", 80)
becomes
httplib.HTTPConnection("127.0.0.1", 8118)
and voilà, everything is routed through Tor.
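Putting the pieces together, a minimal end-to-end sketch, assuming Privoxy is listening on its default 127.0.0.1:8118:
import httplib

# Privoxy (bundled with Vidalia) listens locally and forwards into Tor
h = httplib.HTTPConnection("127.0.0.1", 8118)
h.connect()
# when talking to a proxy, request the absolute URL, scheme included
h.request("GET", "http://www.google.com/")
resp = h.getresponse()
page = resp.read()
h.close()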
First, make sure you are using the correct node location and port; most proxies use ports other than 80. Second, specify the protocol with a full URL in your request string.
Under normal circumstances, your code should work if it looks something like this:
h = httplib.HTTPConnection("138.45.68.134", 8080)
h.connect()
h.request("GET", "http://www.google.com")
resp = h.getresponse()
page = resp.read()
h.close()
You can also use the socket module directly as an alternative, but that's another issue, and it's even more complicated than the approach above.
Hope that helps! :-)
