Implementing a "one vote per day" system in pyramid based on cookies - python

Background:
I have the core functionality of a very simple vote-based site setup and working well in pyramid utilizing a sqlite database. The last requirement for this application is to allow only one vote per day, per user. It has been specified that this must be done via cookies, and that no users shall be allowed to vote on Saturdays or Sundays.
I am currently using UnencryptedCookieSessionFactoryConfig for session management and to handle flash messages.
Question:
I've identified that I need the following functionality, but can't determine what modules of pyramid might provide it (or if I should be looking elsewhere):
Create a cookie for each user that persists between browser sessions (I am aware this is insecure as a method of preventing multiple votes. That's fine.)
Allow a single vote to be placed per day, per user.
Give a new vote to a user once 24 hours have elapsed.
Prevent all voting if the day of the week is Saturday or Sunday (this should be trivial with a datetime check placed before any cookie-checking logic).
Additional Info:
My current db schema is as follows, and must stay this way:
create table if not exists games (
    id integer primary key autoincrement,
    title char(100) not null,
    owned bool not null,
    created char(40) not null
);
create table if not exists votes (
    gameId integer,
    created char(40) not null,
    FOREIGN KEY(gameId) REFERENCES games(id)
);
and the current vote function is as follows:
@view_config(route_name='usevote')
def usevote_view(request):
    game_id = int(request.matchdict['id'])
    request.db.execute('insert into votes (gameId,created) values (?,?)',
                       (game_id, now))
    request.db.commit()
    request.session.flash('Your vote has been counted. You can vote again in 24 hours.')
    return HTTPFound(location=request.route_url('list'))
Thanks!

session data on cookies only
To integrate cookie sessions in Pyramid, take a look at pyramid_beaker.
To guarantee integrity using only cookies (and stop the user from poking into the cookie data), you should use an encrypted cookie (take a look at the Session Based Cookie and the Encryption Options).
Your main configuration will look somewhat like this:
[app:main]
...
session.type = cookie
session.key = SESSION
session.encrypt_key = R9RD9qx7uzcybJt1iBzeMoohyDUbZAnFCyfkWfxOoX8s5ay3pM
session.validate_key = pKs3JDwWiJmt0N0wQjJIqdG5c1XsHSlauM6T2DfB8FqOifsWZN
...
The session.key is just the name of the cookie. Change it to whatever you want.
The session.encrypt_key and session.validate_key above are just examples of big random strings. You should generate them yourself and keep them private.
Also, to encrypt the cookies properly you will need an AES cipher implementation. Installing pycrypto should do it:
pip install pycrypto
Also your main function that creates the wsgi application should be changed to something like this:
from pyramid_beaker import session_factory_from_settings
...
def main(global_config, **settings):
    ...
    config = Configurator(settings=settings)
    ...
    config.set_session_factory(session_factory_from_settings(settings))
Now you can store the session data directly in the client's browser and guard it against tampering. The simplest solution to your problem is to set this cookie to never expire, store the date of the user's last vote inside it, and compare that date with today's date whenever the user tries to vote.
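The date comparison described above could be sketched as a small framework-independent function. This is only a sketch under assumptions: the name can_vote, the ISO-8601 string format for the stored date, and reading "one day" as a rolling 24 hours are all choices made for illustration, not something the question or Pyramid prescribes.

```python
from datetime import datetime, timedelta

# Assumption: "one vote per day" means a rolling 24-hour window.
VOTE_INTERVAL = timedelta(hours=24)


def can_vote(last_vote_iso, now):
    """Return True if a vote is allowed at `now`.

    last_vote_iso: ISO-8601 string read from the cookie session, or None
    if no vote is recorded (new user, or the cookie was deleted).
    """
    # weekday(): Monday is 0, so Saturday and Sunday are 5 and 6.
    if now.weekday() >= 5:
        return False
    if last_vote_iso is None:
        return True
    last_vote = datetime.fromisoformat(last_vote_iso)
    return now - last_vote >= VOTE_INTERVAL
```

In the view, the stored value could be read with something like request.session.get('last_vote') (the key name is hypothetical), and now.isoformat() written back after a successful vote.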
the main issue
The main problem now is dealing with users who delete the cookie, use another browser, or simply use the browser's incognito window (Chrome) or private browsing (Firefox). Such a user appears to be a new user to your system and thus can vote again.
IMO, to solve that you will need server-side control, or you will need to penalize the user so that deleting the cookie makes his life hard enough that gaining an extra vote is no longer worth it.
Security is not about perfect, unhackable systems, but about building systems where the cost of bypassing them is higher than the benefit of doing so.

Using cookies for that kind of control will not prevent even the simplest attack (using a different browser, for example :)). But you seem to know that and not actually care, so it should be fine, I guess:
Every time a user votes, add a field to the cookie with the value of the current date (you should also set the cookie's age limit to at least a week).
Next time the user tries to vote, check whether it's Saturday or Sunday (according to the user's time settings), whether that field exists, and whether its value is older than one day.
If you set the cookie validity to the next Saturday, you will have an extra verification mechanism as the cookie won't be valid anyway if it's Saturday :)
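To make the "expire on the next Saturday" idea concrete, here is a rough sketch of how the remaining lifetime could be computed. The function name is made up for this example, and it assumes the server's local clock is the one that matters.

```python
from datetime import datetime, timedelta


def seconds_until_next_saturday(now):
    """Seconds from `now` until the coming Saturday at 00:00."""
    # weekday(): Monday is 0 ... Saturday is 5, Sunday is 6.
    days_ahead = (5 - now.weekday()) % 7
    if days_ahead == 0:
        days_ahead = 7  # already Saturday: aim for next week's Saturday
    midnight_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    target = midnight_today + timedelta(days=days_ahead)
    return int((target - now).total_seconds())
```

The result could then be used as the cookie's max age when the vote cookie is set.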

Related

Scraping ASPX after login with Python but every login gives you a different URL

I'm trying to get the exam result data from my college website for every Roll No. in my class.
Normally you can POST the login information to a URL (www.example.com/login.aspx) and then GET a fixed URL after login (www.example.com/home.aspx).
But the page I'm trying to get has a different URL for every Roll No. entered. The URL of the login page looks like this: "www.example.com/View.aspx". After login, the URL of the result page looks like: "www.example.com/ovengine.aspx?enc=BunchOfNumbersandAlphabets". And those numbers and letters are different for each roll number.
So I can't put a URL in my code to get the final result. I don't know how to get the page that comes automatically after the login without specifying its URL.
"But the page I'm trying to get has a different URL for every Roll no. entered"
No, it is the same URL, and the URL has a parameter. You see this in URLs all the time.
So, for a temperature site it might look like
www.TheWeatherSite.com/?City=Rome
So, the above URL is always the same, but the "City" parameter tells the web site that we want the City of Rome. The code behind the page can thus grab that parameter and use it. That way we don't have to create a separate web page for the weather in EACH city.
So you create ONE page, and then PASS that page a city value that the code behind can consume and use (say, to query temperature data from a database for city = above value).
And thus you have to know ahead of time what city you want the weather for. Of course this approach is great since you don't have to create a new web site page to just show/display the weather in a given city.
You are in effect passing a value to some code behind that will run, and use that passed value.
The same goes for your example URL. You note there is ONE parameter called "enc".
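Since the question is about Python, here is a minimal sketch of pulling such a parameter out of a URL on the client side with the standard library. The helper name query_parameter is invented for this example, and the URLs are just the placeholder ones used in this thread.

```python
from urllib.parse import urlparse, parse_qs


def query_parameter(url, name):
    """Return the first value of query parameter `name`, or None if absent."""
    values = parse_qs(urlparse(url).query).get(name)
    return values[0] if values else None


# The page is the same; only the parameter value differs.
city = query_parameter("http://www.TheWeatherSite.com/?City=Rome", "City")
enc = query_parameter(
    "http://www.example.com/ovengine.aspx?enc=BunchOfNumbersandAlphabets", "enc")
```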
So, the web site code behind would:
Get the user's ID. However, the user's ID would come from the security system and the authentication provider. Unless you are logged in as that particular user, you will not get that user ID.
So, both a user ID (limited to the internal code) and the "enc" value passed as the parameter in the URL would be required.
So, note that in the SQL shown further down, we VERY likely need both a studentID and ALSO the "enc" value that some OTHER code from another page gets from the database.
Now that funny "GUID" (please do google what a GUID is), from a programmer's point of view, WOULD be sufficient to pull this one row of data from the database. But what if the query ALSO uses the user's logged-on internal ID?
Well, then only a given logged on user would be able to see their own set of values that belong to them.
In other words?
Only a drunken un-employed Rodeo clown would require JUST that GUID for pulling out that data, since in that case any user could type in that GUID and see other people's marks. However, there is "some" security in using a GUID, since a user could never guess that value.
If they used "city" like my first URL and parameter example? Then yes, you could guess and know the city value to type in. Or they could have used say student name, or even student number - those you COULD guess with relative ease.
But, for such data, no doubt the developers adopted something MUCH more difficult to guess than a simple number like a row number or PK id from a database. So, when the code added the results to that table? They also added a GUID of some type and saved it in that row of the database as well.
So you do NOT need JUST the GUID; that URL will ONLY work for a given pair of values: the GUID and the student ID, which is ONLY internal to the code and pulled FROM the authentication provider. That is this line of code:
= Membership.GetUser.ProviderUserKey
So that above value is going to be the users logon internal ID.
Both the enc (externally exposed) value passed as a parameter in the URL and ALSO the internal logged-on value are used. So the code behind (ASP.NET) would look something like this:
Dim strSQL As String
strSQL = "SELECT * from tblStudentMarks where StudentID = @pID " &
         " AND TestResultsGID = @GID"
Dim cmdSQL As New SqlCommand(strSQL, GetCon)
cmdSQL.Parameters.Add("@pID", SqlDbType.Int).Value = Membership.GetUser.ProviderUserKey
cmdSQL.Parameters.Add("@GID", SqlDbType.VarChar).Value = Request.QueryString("enc")
Dim dReader As New SqlDataAdapter(cmdSQL)
Dim rstData As New DataTable
dReader.Fill(rstData)
Note the code:
Request.QueryString("enc")
That allows the code behind to grab the parameter (enc) from the URL. But, as I stated, it is highly unlikely that JUST the "enc" value is required here. It is possible that ONLY this value is required to pull the row of data, but then that would be a security hole the size of an open barn door.
Think of your on-line banking.
www.mybank.com/?CustomerNumber=1234
Well, if we JUST use the above CustomerNumber as the means to pull bank data, then I could go to the site and type in YOUR number, or someone else's number.
So, for this to work?
You will need to obtain a list of enc values (that messy, funny long string). Without that value you will not be able to set the parameter in the URL.
However, as I stated, you ALSO very likely need some internal "user" logon id that is NOT included in the public exposed URL to ALSO grab that one row of data from the database.
And, even more important? Such web pages usually cannot be hit UNLESS you are logged in as an authenticated user. In other words, that web page will ONLY be dished out to logged-in users; if you are not logged in, the server security will automatically refuse to dish out the page.
So, for this to work, you need to contact the web site developers and obtain that list of "enc" values. Once you have that list, you can write some code to process it and insert the correct parameter in the URL. However, you also need to ask whether that URL and parameter value will work for ANY logged-in user, or ONLY for the one given logged-in user the value belongs to. Without these values, and without knowing whether the URL and parameter will work for any user (which I doubt), just using a URL to get these values will not work.
It would be even BETTER to have the web site folks create a web service that you can call and in one command it would return all of the data you need anyway, as opposed to over and over having to send the "enc" value, which you don't have anyway.

Google Analytics pagepath market basket analysis in Python

I would like to prepare a market basket analysis in Python based on Google Analytics data. I would like to examine the most common paths users go through, on a cookie level. I have encountered two problems: first, when I query the data from BigQuery, the hit number is on a session level and not on a cookie level. How would I be able to show the path a user has gone through (on a cookie and not on a session level)? Second, I do not know how to prepare the data: in R, a transaction class is needed to prepare the data for the apriori algorithm. I know that in Python the solution is to one-hot encode the data; however, my problem is that with this solution the sequence of page paths is lost.
Could somebody please help me? Thank you!
I think your best bet for aggregating page_paths at a cookie level would be to group by visitor_id. The visitor_id is what GA assigns as the cookie, and it should persist across visits unless a user goes incognito or clears cookies. If you are using a Custom Dimension to track users logging on to your website, you will see that one user can have multiple visitor_ids.
Before you aggregate, you can combine all this information by using visit_id to distinguish between different sessions. You can query all hit-level data for a given user and then roll up from there.
I think this could be done by adjusting the WHERE clause of the query you use now for session-level hits: keep the hit number, but look at all sessions of the visitor instead of a single one.
SELECT
  fullVisitorId,
  visitId,
  visitNumber,
  hits.hitNumber AS hitNumber,
  hits.page.pagePath AS pagePath
FROM
  TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
    TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
  hits.type="PAGE"
ORDER BY
  fullVisitorId,
  visitId,
  visitNumber,
  hitNumber
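On the Python side, rows from a query like this could be rolled up into one ordered path per visitor, which keeps the page sequence that one-hot encoding would lose. The sketch below assumes the rows have already been fetched as plain tuples of (fullVisitorId, visitNumber, hitNumber, pagePath); the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def paths_by_visitor(rows):
    """Map each fullVisitorId to its page paths in visit and hit order.

    rows: iterable of (fullVisitorId, visitNumber, hitNumber, pagePath).
    """
    paths = defaultdict(list)
    # Sorting by visitor, then visitNumber, then hitNumber keeps the
    # sequence intact across sessions of the same visitor.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for visitor, _visit_number, _hit_number, page in ordered:
        paths[visitor].append(page)
    return dict(paths)


# Hypothetical sample data: visitor "A" has two sessions, "B" has one.
sample_rows = [
    ("A", 2, 1, "/cart"),
    ("A", 1, 2, "/product"),
    ("B", 1, 1, "/home"),
    ("A", 1, 1, "/home"),
]
result = paths_by_visitor(sample_rows)
```

The resulting lists can then be fed to a sequence-aware method rather than a plain apriori basket, if the order of pages matters.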

Django : How to count number of people viewed

I'm making a simple BBS application in Django and I want it so that whenever someone sees a post, the number of views on that post (post_view_no) is increased.
At the moment, I face two difficulties:
I need to limit the increase in post_view_no so that one user can only increase it once regardless of how many times the user refreshes/clicks on the post.
I also need to be able to track the users that are not logged in.
Regarding the first issue, it seems pretty easy as long as I create a model called 'View' and check the db, but I have a feeling this may be overkill.
As for the second issue, all I can think of is using cookies or the IP address to track the users, but an IP is hardly unique and I cannot figure out how to use cookies.
I believe this is a common feature on forum/bbs solutions but google search only turned up with plugins or 'dumb' solutions that increase the view each time the post is viewed.
What would be the best way to go about this?
I think you can do both things via cookies. For example, when user visits a page, you can
Check if they have “viewed_post_%s” (where %s is post ID) key set in their session.
If they have, do nothing. If they don't, increase view_count numeric field of your corresponding Post object by one, and set the key (cookie) “viewed_post_%s” in their session (so that it won't count in future).
This would work with both anonymous and registered users, however by clearing cookies or setting up browser to reject them user can game the view count.
Now using cookies (sessions) with Django is quite easy: to set a value for current user, you just invoke something like
request.session['viewed_post_%s' % post.id] = True
in your view, and done. (Check the docs, and especially examples.)
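The two steps above can be sketched without any Django machinery. Here the session is treated as a plain dict-like object (which is how request.session behaves), and view_counts is a stand-in dict for the view_count field on the Post model; both the function name and that stand-in are assumptions made for this example.

```python
def count_view_once(session, view_counts, post_id):
    """Increment view_counts[post_id] only on this session's first visit.

    Returns True if the view was counted, False if it was already counted.
    """
    key = 'viewed_post_%s' % post_id
    if session.get(key):
        return False  # this session has already been counted for the post
    view_counts[post_id] = view_counts.get(post_id, 0) + 1
    session[key] = True
    return True


# Simulate one visitor refreshing post 7 twice, then a second visitor.
counts = {}
first_visitor = {}
count_view_once(first_visitor, counts, 7)
count_view_once(first_visitor, counts, 7)
count_view_once({}, counts, 7)
```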
Disclaimer: this is off the top of my head; I haven't done this personally. Usually, when there's a need for page view or activity tracking (so that you can see what drives more traffic to your website, when users are more active, etc.), there's a point in using a specialized system (e.g., Google Analytics, StatsD). But for some specific use case, or as an exercise, this should work.
Just to offer a secondary solution, which I think would work but is also prone to gaming (if users come through a proxy or from different devices). I haven't tried this either, but I think it should work and wouldn't require thinking about cookies, plus you aggregate some extra data, which is nice.
I would make a model called TrackedPost.
class TrackedPost(models.Model):
    post = models.ForeignKey(Post)
    ip = models.CharField(max_length=16)  # only accounting for IPv4
    user = models.ForeignKey(User)  # if you want to track logged in or anonymous
Then when you view a post, you would take the requests ip.
def my_post_view(request, post_id):
    # you could check for logged in users as well.
    # note: untested sketch, not the exact API
    tracked_post, created = TrackedPost.objects.get_or_create(
        post_id=post_id, ip=request.META['REMOTE_ADDR'], user=request.user)
    if created:
        tracked_post.post.count += 1
        tracked_post.post.save()
    return render_to_response('')

how to prevent multiple votes from a single user

I am writing a web app on google app engine with python. I am using jinja2 as a templating engine.
I currently have it set up so that users can upvote and downvote posts but right now they can vote on them as many times as they would like. I simply have the vote record in a database and then calculate it right after that. How can I efficiently prevent users from casting multiple votes?
I suggest making a toggleVote method, which accepts the key of the item you want to toggle the vote on, and the key of the user making the vote.
I'd also suggest adding a table to record the votes, basically containing two fields:
"keyOfUserVoting", "keyOfItemBeingVotedOn"
That way you can simply do a quick query where the keys match, and if a record exists, you know the user voted on that item. (Query where keyOfUserVoting = 'param1' and keyOfItemBeingVotedOn = 'param2'; if result != None, it means the user voted.)
For the toggleVote() method the case could be very simple:
toggleVote(keyOfUserVoting, keyOfItemToVoteOn):
    if (queryResultExists):
        // delete this record from the 'votes' table
    else:
        // add record to the 'votes' table
That way you'll never have to worry about keeping track on an individual basis of how many times the user has voted or not.
Also this way, if you want to find out how many votes are on an item, you can do another query to quickly count where keyOfItemToVoteOn = paramKeyOfItem. Again, with GAE, this will be very fast.
In this setup, you can also quickly tell how many times a user has voted on one item (count where userKey = value and where itemKey = value), or how many times a user has voted in the entire system (count where userKey = value)...
Lastly, for best reliability, you can wrap the updates in the toggleVote() method in a transaction, especially if you'll be doing other things on the user or item being voted on.
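As a rough illustration of the toggleVote() idea with the update wrapped in a transaction, here is a sketch that uses an in-memory SQLite database as a stand-in for the App Engine datastore (the datastore API itself is not shown). The table layout and the function names are assumptions for this example only.

```python
import sqlite3


def make_db():
    """Create an in-memory votes table with one row per (user, item) pair."""
    conn = sqlite3.connect(':memory:')
    conn.execute(
        'create table votes ('
        ' user_key text not null,'
        ' item_key text not null,'
        ' unique (user_key, item_key))'
    )
    return conn


def toggle_vote(conn, user_key, item_key):
    """Remove the vote if it exists, add it otherwise.

    Returns True if the user has a vote on the item afterwards.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            'delete from votes where user_key = ? and item_key = ?',
            (user_key, item_key))
        if cur.rowcount:
            return False
        conn.execute(
            'insert into votes (user_key, item_key) values (?, ?)',
            (user_key, item_key))
        return True


def vote_count(conn, item_key):
    """Count the votes currently recorded for one item."""
    row = conn.execute(
        'select count(*) from votes where item_key = ?',
        (item_key,)).fetchone()
    return row[0]
```

The unique constraint is a second line of defence: even if two requests race, the same user cannot end up with two rows for one item.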
Hope this helps.
Store the voting user with the vote, and check for an existing vote by the current user, using your database.
You can perform the check either before you serve the page (and so disable your voting buttons), or when you get the vote attempt (and show some kind of message). You should probably write code to handle both scenarios if the voting really matters to you.

Generating unique and opaque user IDs in Google App Engine

I'm working on an application that lets registered users create or upload content, and allows anonymous users to view that content and browse registered users' pages to find that content - this is very similar to how a site like Flickr, for example, allows people to browse its users' pages.
To do this, I need a way to identify the user in the anonymous HTTP GET request. A user should be able to type http://myapplication.com/browse/<userid>/<contentid> and get to the right page. <userid> should be unique, but mustn't be something like the user's email address, for privacy reasons.
Through Google App Engine, I can get the email address associated with the user, but like I said, I don't want to use that. I can have users of my application pick a unique user name when they register, but I would like to make that optional if at all possible, so that the registration process is as short as possible.
Another option is to generate some random cookie (a GUID?) during the registration process and use that, but I don't see an obvious way of guaranteeing uniqueness of such a cookie without a trip to the database.
Is there a way, given an App Engine user object, of getting a unique identifier for that object that can be used in this way?
I'm looking for a Python solution - I forgot that GAE also supports Java now. Still, I expect the techniques to be similar, regardless of the language.
Your timing is impeccable: Just yesterday, a new release of the SDK came out, with support for unique, permanent user IDs. They meet all the criteria you specified.
I think you should distinguish between two types of users:
1) users that have logged in via Google Accounts or that have already registered on your site with a non-google e-mail address
2) users that opened your site for the first time and are not logged in in any way
For the second case, I can see no other way than to generate some random string (e.g. via uuid.uuid4() or from this user's session cookie key), as an anonymous user does not carry any unique information with himself.
For users that are logged in, however, you already have a unique identifier -- their e-mail address. I agree with your privacy concerns -- you shouldn't use it as an identifier. Instead, how about generating a string that seems random, but is in fact generated from the e-mail address? Hashing functions are perfect for this purpose. Example:
>>> import hashlib
>>> email = 'user@host.com'
>>> salt = 'SomeLongStringThatWillBeAppendedToEachEmail'
>>> key = hashlib.sha1(('%s$%s' % (email, salt)).encode('utf-8')).hexdigest()
>>> print(key)
f6cd3459f9a39c97635c652884b3e328f05be0f7
hashlib.sha1 is not a random function: for given data it always returns the same result. Because it is considered practically irreversible, you can present the hashed key on the website without exposing the user's e-mail address. Also, you can safely assume that no two hashes of distinct e-mails will be the same (they can be, but the probability of it happening is very, very small). For more information on hashing functions, consult the Wikipedia entry.
Do you mean session cookies?
Try http://code.google.com/p/gaeutilities/
What DzinX said. The only way to create an opaque key that can be authenticated without a database roundtrip is using encryption or a cryptographic hash.
Give the user a random number and hash it or encrypt it with a private key. You still run the (tiny) risk of collisions, but you can avoid this by touching the database on key creation, changing the random number in case of a collision. Make sure the random number is cryptographic, and add a long server-side random number to prevent chosen plaintext attacks.
You'll end up with a token like the Google Docs key, basically a signature proving the user is authenticated, which can be verified without touching the database.
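A token of that kind could look roughly like the sketch below. Note that it uses an HMAC (a keyed hash) from the standard library as one concrete way to "hash with a server-side secret"; that choice, the token layout, and all names here are assumptions for illustration. SERVER_SECRET is a placeholder and would have to be a long random value kept private on the server.

```python
import hashlib
import hmac
import secrets

# Hypothetical placeholder: use a long random value and keep it private.
SERVER_SECRET = b'replace-with-a-long-random-server-side-secret'


def _sign(random_id):
    """Keyed hash of the id; only the holder of SERVER_SECRET can compute it."""
    return hmac.new(SERVER_SECRET, random_id.encode('utf-8'),
                    hashlib.sha256).hexdigest()


def make_token():
    """Return 'random_id.signature', with the id from a cryptographic RNG."""
    random_id = secrets.token_hex(16)
    return '%s.%s' % (random_id, _sign(random_id))


def verify_token(token):
    """Check the signature without any database lookup."""
    random_id, separator, signature = token.partition('.')
    if not separator:
        return False
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(signature, _sign(random_id))
```

Verification only proves the token was issued by the server; guaranteeing that two users never share an id would still need the database check on creation mentioned above.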
However, given the pricing of GAE and the speed of bigtable, you're probably better off using a session ID if you really can't use Google's own authentication.
