I have a web-based Django app where users congregate and chat with one another, under pseudonyms.
Most of the users hitting this website do so via Opera Mini. Unlike conventional web browsers, Opera Mini fetches all content through a proxy server and reformats web pages into a format more suitable for small screens.
I want to implement a banning feature in this app. Some users are terrorizing others - if I manually ban them right now, they simply return under new nicknames. Note that these users aren't very tech savvy - almost all are no more than semi-educated. My question is three-pronged:
Is banning a user's IP effective when they're using a proxy such as Opera Mini?
Is there any reputable Django plugin available that handles IP blocking elegantly?
If 1 doesn't hold (in which case, 2 won't either), is there any other robust method I can follow to keep out antagonistic users and protect my community?
Currently, I've given these users a "downvote" feature, muting accounts whose posts receive too many downvotes. But that is of virtually no help in flame-wars. The abuser keeps returning under new pseudonyms, undermining the whole community. Maybe I should try hellbanning, if nothing else works?
Note: I'm not an advanced programmer (more of a designer), so I'll prefer swift solutions that have a small time-to-market for someone like me.
Have a look at this section of the Opera Mini documentation. The Opera servers will be sending you an X-Forwarded-For header which contains the original IP of the client, which you can access in Django with request.META['HTTP_X_FORWARDED_FOR'].
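In practice, something like this helper in a Django view works (get_client_ip is just a name I made up; the header may contain a comma-separated chain of addresses):

def get_client_ip(request):
    # X-Forwarded-For can contain a chain of IPs added by intermediate
    # proxies; the left-most entry is usually the original client.
    forwarded = request.META.get('HTTP_X_FORWARDED_FOR')
    if forwarded:
        return forwarded.split(',')[0].strip()
    # Fall back to the direct peer address when no proxy header is present.
    return request.META.get('REMOTE_ADDR')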
That said, some things to bear in mind (I live and build websites in a country where Opera Mini has the largest web browser market share):
It sounds like many of your users are connecting from phones. This means that they are highly likely to have dynamic IP addresses. Their IP can change frequently, and if you ban one IP now it may end up blocking access for a different user a few minutes/hours later. If you do ban IPs, it is advisable to set a fairly short timeout (see the sketch after these points).
X-Forwarded-For headers are notoriously unreliable. They will often contain internal IPs and you need to filter those out. There is also no guarantee that you will get the correct upstream IP (consider cases where the user is behind a VPN/Tor node/etc). You also need to account for any reverse proxies that you may have in front of your application.
People who really want to abuse the system will find a way in. A moderation and/or reputation based system is the only way to keep their noise to a minimum.
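If you do decide to ban IPs despite all that, here is a rough sketch of the short-timeout ban mentioned above, using Django's cache framework (the key names and the one-hour timeout are just assumptions):

from django.core.cache import cache

BAN_SECONDS = 60 * 60  # keep bans short because mobile IPs rotate

def ban_ip(ip):
    cache.set('banned:%s' % ip, True, BAN_SECONDS)

def is_banned(ip):
    return bool(cache.get('banned:%s' % ip))

A middleware or view decorator can then call is_banned(get_client_ip(request)) and return an error response for banned addresses.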
Related
I want to scrape a website, but it should look like I am from a specific (let's say USA for this example) country (to make sure that my results are valid).
I am working in Python (Scrapy). And for scraping, I am using the rotating user agents (see: https://pypi.org/project/scrapy-fake-useragent-fix/).
The rotating user agents cover part of what I need. But can I use this in combination with the request to pretend that I am in a specific country?
If there are any possibilities (in Scrapy/Python), please let me know. Appreciated!
Example of how I used the user agents in my script:
DOWNLOADER_MIDDLEWARES = {
    # disable the built-in user agent middleware
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # enable the random user agent middleware from scrapy-fake-useragent
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
To pretend to be from a certain country you need an IP address from that country. Unfortunately this is not something you can configure with Scrapy settings alone. But you could use a proxy service like Crawlera:
https://support.scrapinghub.com/support/solutions/articles/22000188398-restricting-crawlera-ips-to-a-specific-region
Note: unfortunately this service is not free and the cheapest plan is about 25 EUR. There are many other, cheaper services available. The reason Crawlera is expensive is that they offer ban detection and only serve good IPs for your chosen domain. I've found them useful for the cost on Amazon and Google. Though on lesser domains a cheaper service with unlimited usage would be more suitable.
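If you go with a proxy provider, the generic way to route Scrapy requests through it is the built-in HttpProxyMiddleware, which reads the proxy from request.meta. A rough sketch (the spider name and proxy URL are placeholders you would replace with your provider's details):

import scrapy

class CountrySpider(scrapy.Spider):
    name = 'country_spider'  # placeholder name
    start_urls = ['https://example.com']

    def start_requests(self):
        for url in self.start_urls:
            # HttpProxyMiddleware (enabled by default) picks up this key
            # and sends the request through the given proxy.
            yield scrapy.Request(url, meta={'proxy': 'http://user:pass@us-proxy.example.com:8010'})

    def parse(self, response):
        self.log(response.url)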
You can do this using Selenium (I don't know about Scrapy). First tell the bot to go to this site:
Proxy Site
Then enter your target site in its search box and scrape.
Hello #helloworld1990,
Based on your requirement: if you want to make each request from a different IP, i.e. use IP rotation (useful when the site detects and blocks you after a certain number of requests), then go for proxy providers; there are many such providers, you just need to google them.
If that's not the case, then for short-term use you can try Tor IPs. But Tor IPs are well known and are generally blocked. Otherwise, you can buy a few static IPs from proxy providers and make the requests.
if (uniqueIpForEachRequestFromDifferentGeoLocations) {
    // go for proxy providers - IP rotation
} else {
    if (shortTermUse) {
        // go for Tor nodes
    } else {
        // go for static IPs
    }
}
Cheers! Hope this helps..
I have already asked a question about IP Authentication here: TastyPie Authentication from the same server
However, I need something more! An IP address could be very easily spoofed.
Scenario: My API (TastyPie) and client app (in JavaScript) are on the same server/site/domain. My users don't log in. I want to consume my API from my JavaScript client side.
Question: How can I make sure (authentication) that my AJAX requests are originating from the same server?
I'm using Tastypie. I need to verify that the requests from the client are being made on the same server/domain etc. I cannot use 'logged in sessions' as my users don't log in.
I have looked at private keys and generating a signature, but they can be viewed in the JavaScript, making that method insecure. If I instead request a signature from the server (hiding the private key in some Python code), anyone can make the same HTTP request to get_signature that my JavaScript makes, thus defeating the point.
I also tried having the Django view embed the signature in the page, eliminating the need to make the get_signature call. This is safe, but means that I now have to refresh the page every time to get a new signature. From a user's point of view only the first call to the API would work, after which they need to refresh, which is again pointless.
I cannot believe I'm the only person with this requirement. This is a common scenario I'm sure. Please help :) An example using custom authentication in Tastypie would be welcome too.
Thanks
Added:
Depending on your infrastructure #dragonx's answer might interest you most.
my 2c
You want to make sure that only clients who visit your website can use the API? Hmm, do bots, robots and crawlers fall into the same category as clients then? Or am I wrong? This can be easily exploited if you really want to secure it.
I cannot believe I'm the only person with this requirement.
Maybe not, but as you can see your API is prone to several attacks, and that is a reason for others not to share your design and instead to make security stricter with auth.
EDIT
Since we are talking about AJAX requests, what does the IP part have to do with this? The IP will always be the client's IP! So probably, you want a public API...
I would go with the tokens/sessions/cookies part.
I'd go with a generated token that lasts a little while, and the flow described below.
I'd also add a rate limiter per some time window, like GitHub does, e.g. 60 requests per hour per IP, or more for registered users.
To overcome the problem with the refreshing token I would just do this:
Client visits the site
-> server generates API TOKEN INIT
-> Client gets API TOKEN INIT which is valid only for starting 1 request.
Client makes AJAX Request to API
-> Client uses API TOKEN INIT
-> Server checks against API TOKEN INIT and limits
-> Server accepts request
-> Server passes back API TOKEN
-> Client consumes response data and stores API TOKEN for further usage (Will be stored in browser memory via JS)
Client starts communicating with the API for a limited amount of time or number of requests. Notice that you also know the init token date, so you can check it against the first visit to the page.
The 1st token is generated via the server when the client visits.
Then the client uses that token in order to obtain a real one, that lasts for some time or something else as of limitation.
This forces someone to actually visit the webpage, and only then can they access the API for a limited amount of time, number of requests, etc.
This way you don't need refreshing.
Of course the above scenario could be simplified with only one token and a time limit as mentioned above.
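A rough sketch of that simplified single-token variant as a Tastypie custom authentication class; the helper name, cache keys, header name and lifetime below are assumptions, not a drop-in implementation:

import uuid
from django.core.cache import cache
from tastypie.authentication import Authentication

TOKEN_SECONDS = 15 * 60  # assumed lifetime

def issue_token():
    # Call this from the Django view that renders the page,
    # so only real page visitors receive a token to embed in the template.
    token = uuid.uuid4().hex
    cache.set('api-token:%s' % token, True, TOKEN_SECONDS)
    return token

class TokenAuthentication(Authentication):
    def is_authenticated(self, request, **kwargs):
        # The JavaScript client sends the token back in a custom header
        # (X-Api-Token is a name chosen here purely for illustration).
        token = request.META.get('HTTP_X_API_TOKEN', '')
        return bool(cache.get('api-token:%s' % token))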
Of course the above scenario is prone to advanced crawlers, etc since you have no authentication.
Of course a clever attacker can grab tokens from the server and repeat the steps, but then you already had that problem from the start.
Some extra points
As the comments suggested, please close writes to the API. You don't want to be a victim of DoS attacks on writes if you have doubts about your implementation (if so, use auth), or just for extra security.
The token scenario described above can also be made more elaborate, e.g. by constantly exchanging tokens.
Just for reference, GAE Cloud Storage uses signed URLs for much the same purpose.
Hope it helps.
PS: regarding IP spoofing and defense against spoofing attacks, Wikipedia explains why reply packets won't reach the attacker:
Some upper layer protocols provide their own defense against IP spoofing attacks. For example, Transmission Control Protocol (TCP) uses sequence numbers negotiated with the remote machine to ensure that arriving packets are part of an established connection. Since the attacker normally can't see any reply packets, the sequence number must be guessed in order to hijack the connection. The poor implementation in many older operating systems and network devices, however, means that TCP sequence numbers can be predicted.
If it's purely the same server, you can verify requests against 127.0.0.1 or localhost.
Otherwise the solution is probably at the network level, to have a separate private subnet that you can check against. It should be difficult for an attacker to spoof your subnet without being on your subnet.
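A minimal sketch of such a check in Django using the standard-library ipaddress module; the 10.0.0.0/24 subnet is purely an example:

import ipaddress

TRUSTED_NET = ipaddress.ip_network('10.0.0.0/24')  # example private subnet

def request_is_trusted(request):
    # REMOTE_ADDR comes from the TCP connection itself,
    # so it is much harder to spoof than forwarded headers.
    return ipaddress.ip_address(request.META['REMOTE_ADDR']) in TRUSTED_NET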
I guess you're a bit confused (or I am, please correct me). The fact that your JS code is published on the same server as your API does not mean AJAX requests will come from your server. The clients download the JS from your server and execute it, which results in requests to your API being sent from the clients, not from the same server.
Now if the above scenario correctly describes your case, what you are probably trying to do is to protect your API from bot scraping. The easiest protection is CAPTCHA, and you can find some more ideas on the Wiki page.
If you are concerned that other sites may make AJAX calls to your API to copy your site functionality, you shouldn't be--AJAX requests can only be sent to the same server as the page the JS is running on, unless it is JSONP.
Short answer: It is not possible to prevent a dedicated attacker.
You have no method of identifying a client other than with the information that they give you. For instance, username/password authentication works under the assumption that only a valid client would be able to provide valid credentials. When someone logs in, all you know is that some person provided those credentials -- you assume that this means they are a legitimate user.
Let's take a look at your scenario here, as I understand it. The only method you have of authenticating a client is the IP address, a very weak form of authentication. As you stated, this can be easily spoofed, and with some effort your server's response can be routed back to the attacker's original IP address. If this happens, you can't do anything about it. The fact is, if you assume someone from a valid IP address is a valid user, then spoofers and legitimate users are indistinguishable. This is just like if someone steals your password and tries to log in to StackOverflow: to StackOverflow, the attacker and you are indistinguishable, since all it has to go on is the username and password.
You can do fancy things with the client as mentioned in other answers, such as tokens, time limits, etc., but a dedicated attacker would be able to mimic the actions of a legitimate client, and you wouldn't be able to tell them apart because they would both appear to be from valid IP addresses. For instance, in your last example, if I were an attacker looking to make API calls, I would spoof a legitimate IP address, get the signature, and use it to make an API call, just as a legitimate client would.
If your application is critical enough to warrant this level of thought about security, you should at least think about implementing something like API tokens, public key encryption, or other authentication methods that are more secure than IP addresses to tell your clients apart from attackers. Authentication by IP address (or other easily forged tokens like hostname or headers) simply won't cut it.
Maybe you could achieve this by using the same-origin policy.
Refer to http://en.wikipedia.org/wiki/Same_origin_policy
As suggested by Venkatesh Bachu, Same Origin Policy and http://en.wikipedia.org/wiki/Cross-Origin_Resource_Sharing (CORS) could be used as a solution.
In your API, you can check the Origin header and respond accordingly.
You need to check whether the Origin header can be modified by using extensions like Tamper Data.
A determined hacker can still snoop by pointing browser to a local proxy server.
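A rough sketch of checking the Origin header in a Tastypie custom authentication class (the allowed origin is an assumption, and keep in mind the header can be absent or forged by non-browser clients):

from tastypie.authentication import Authentication

ALLOWED_ORIGINS = {'https://example.com'}  # your own domain(s)

class OriginAuthentication(Authentication):
    def is_authenticated(self, request, **kwargs):
        # Browsers set the Origin header on AJAX requests;
        # a non-browser client can send whatever it likes here.
        origin = request.META.get('HTTP_ORIGIN', '')
        return origin in ALLOWED_ORIGINS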
If this app server is running on an ordinary web server that has a configurable listening IP address, set it to 127.0.0.1. With the TCPServer module, it looks like this:
import SocketServer  # the module is named socketserver in Python 3
SocketServer.TCPServer(("127.0.0.1", 12345), TheHandlerClass)
Use the netstat command to verify that the listening address is "127.0.0.1":
tcp4 0 0 127.0.0.1.12345 *.* LISTEN
This effectively makes any connection originating outside the same host impossible at the TCP level.
There are two general solution types: in-band solutions using normal web server/client mechanisms, that are easy to implement but have limitations; and out-of-band solutions that rely on you to configure something externally, that take a little more work but don't have the same limitations as in-band.
If you prefer an in-band solution, then the typical approach used to prevent cross-site request forgery (XSRF) would work well. Server issues a token with a limited life span; client uses the token in requests; privacy of token is (sort of) assured by using an HTTPS connection. This approach is used widely, and works well unless you are worried about man-in-the-middle attacks that could intercept the token, or buggy browsers that could leak data to other client-side code that's being naughty.
You can eliminate those limitations, if you're motivated, by introducing client certificates. These are kind of the flip side to the SSL certificates we all use on web servers -- they operate the same way, but are used to identify the client rather than the server. Because the certificate itself never goes over the wire (you install it locally in the browser or other client), you don't have the same threats from man-in-the-middle and browser leakage. This solution isn't used much in the wild because it's confusing to set up (very confusing for the typical user), but if you have a limited number of clients and they are under your control, then it could be feasible to deploy and manage this limited number of client certificates. The certificate operations are handled by the browser, not in client code (i.e. not in JavaScript) so your concern about key data being visible in JavaScript would not apply in this scenario.
Lastly, if you want to skip over the client configuration nonsense, use the ultimate out-of-band solution -- iptables or a similar tool to create an application-level firewall that only allows sessions that originate from network interfaces (like local loopback) that you know for certain can't be accessed off the box.
I will keep it short.
Can someone please point me in the right direction in:
How to authenticate users in native applications written in Python?
I know on the web there are sessions, but I can't think of a way to implement authentication that will 'live' for some time and, on expiry, log the user out.
EDIT:
I am referring to desktop-type apps. I am fairly happy with the implementation for web-based development in Twisted.
EDIT 2
The application I am thinking about will not authenticate against a server; it is a self-contained application. An example of the idea is a cash register/point of sale (my idea is somewhat different, but parts of the functionality are the same), in which I need to authenticate the cashier so I can log the transactions processed by him/her, print the name on the receipt, etc. All of this will be based on one single machine, with no server communication or anything.
It’s not entirely clear what kind of security you are expecting.
In general, if the end user has physical access to the machine and a screwdriver, you’re pretty much screwed—they can do whatever they want on that machine.
If you take hardware security as a given, but want to ensure software security, then you’re going to have to do server communication within the machine’s boundaries. You have to separate the server and the client, and run the server in a security context that is inaccessible to the user. The server will then do both the authentication and whatever operations need authentication (printing out receipts etc.). For example, under a Unix-like OS, you would run a daemon under a dedicated system user or under root; on Windows, you would have a system service running as LOCAL SERVICE or whatever that’s called. In this way, the operating system’s built-in security features will ensure (given proper maintenance, like timely application of security hotfixes) that the user cannot influence the behavior of the software that does the sensitive operations. The protocol between the client and the server can be anything, and you can do authentication in much the same way as in HTTP—indeed, you may even use HTTP itself.
Finally, if you’re certain that your users will not be tampering with your system at all—e.g. because they lack the technical skills, or are being watched by CCTV cameras—you can forget all that stuff and go with Puciek’s answer.
You seem to be very confused and fixated on "sessions" for some reason, maybe because your background is in web apps?
Anyhow, you don't need "sessions", because with a desktop application you have no trouble telling who is using the software without needing elaborate tools. You don't need a server, you don't need authentication tools, you don't need anything - just store that user within your single application. That is all, really - a variable within your application called "user" and maybe some interface at boot to pick one from the available users.
And if you need it to last between boots, just save it in a file and read from it.
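A tiny sketch of that, storing the selected user in a JSON file (the file name is arbitrary):

import json
import os

USER_FILE = 'current_user.json'  # arbitrary file name

def save_user(username):
    with open(USER_FILE, 'w') as fh:
        json.dump({'user': username}, fh)

def load_user():
    # Returns None on first boot, before any user has been saved.
    if not os.path.exists(USER_FILE):
        return None
    with open(USER_FILE) as fh:
        return json.load(fh).get('user')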
If you're using Unix, rely on the fact that it's a multi user system. That is, the user has already logged in using his own credentials, so you don't need to do anything, just use its home directory to store the data, taking care to block other users from accessing it by using permissions. You can improve this to provide encryption too. For global application data, you can specify a "manager" user or group, with its own directory, where the application can write.
All this might be possible on Windows systems too.
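On a Unix-like system, a minimal sketch of such a per-user data directory with permissions that keep other users out (the directory name is just an example):

import os

def app_data_dir():
    # Keep the data inside the logged-in user's home directory
    # and restrict it so only that user can read or write it.
    path = os.path.join(os.path.expanduser('~'), '.myapp')  # example name
    os.makedirs(path, mode=0o700, exist_ok=True)
    os.chmod(path, 0o700)  # makedirs' mode can be narrowed by the umask
    return path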
I have developed a web interface for a system in django, which is running on my institution server (abc.edu). So the web address for the interface is http://def.abc.edu:8000/mysystem.
I am going to submit a paper about the system to a double-blind conference (reviewers should not know which institution I am from). So, I cannot put the link http://def.abc.edu:8000/mysystem in my paper; I have to hide the domain name. Is there a way to do that in Django, or in any other way? Any help will be appreciated.
As stated in the comments, this is not done in Django but with DNS. The reason is simple: when you type an address into the URL bar of your browser, it asks a DNS server which IP the domain of the URL corresponds to, something Django (or any other web framework) is oblivious of. Changing your address in Django will only change the URLs on links, which will become invalid.
Directly providing the IP of your server, as stated in the comments, won't provide any protection either, because universities' IP address ranges are well known. Finding which university a given IP comes from is easy.
The easiest way to achieve this would be to get (for free or by purchase) a domain name which redirects to your address. Dyndns.org, noip.com and similar DNS service providers give you features such as embedding your website in a frame to hide its address from the URL bar, and similar tricks. Most of these tricks are pretty easy to defeat in order to discover the origin URL or address, though.
You may also host your project on another server, outside your university. Depending on the requirements of your web interface, some hosts may host you for free.
In order to make the registration process on my website easy, I allow users to enter their email address which I will send a verification code to or alternatively they can solve a captcha.
The problem is that in order to prevent robots from registering accounts (with fake emails) I limit the number of registrations allowed per IP address and if this limit is exceeded I trigger a warning in the logs.
However ... what seems to be happening is that I am using os.environ['REMOTE_ADDR'] to check the remote address -- but it seems that I am triggering warnings on addresses that are owned by Google (66.249.65.XXX). It is possible that this is happening only after I change the version (but not confirmed). Does anyone know how/why this might be happening? Shouldn't the REMOTE_ADDR return the address of the client computer (and hopefully in all cases it would do this)?
I am curious whether there is some behind-the-scenes redirection going on, and whether this is a normal event or only happens when a new version is installed (perhaps when a new version is installed, the original server then proxies the user to the new server, creating the illusion that the IP address is an internal IP?).
I believe I have figured out the reason for seeing so many warnings from Google server IP addresses. It seems that immediately after a new user registers, the Google crawlers visit the same (registration) webpage (to which I send information as a GET instead of a POST, for reasons I will not get into). Of course, since many users are registering but there are only a few crawler machines checking for periodic updates to my website, I am triggering warning messages that a particular (Google) IP is accessing the registration area repeatedly.