Best way to prevent view counts from being abused - python

I'm currently using Redis to store a view count for each time a page is loaded. It's working fantastic but my only worry is that it will be abused. When a user is logged in on my site, the view counter will only update if they have not viewed the thread yet (again, tracked by Redis) so that abuse is negated.
My problem is with users I don't have an account for. If I were to let the view count be updated every time the page is loaded if someone created some content they could just log out and refresh the page as many times as they wanted to get their view count up. My first thought was identify every non-account user through a session cookie (I already track users logged in through a session cookie) but if someone cleared that cookie, it would be rendered useless again. Another thought is an IP address but through dynamic IP addresses it wouldn't be very reliable either.
So my question is, what is the most reliable way to track a view count from an unknown user?

Welcome to the world of tracking view counts!
Ill let you in on a couple of trade secrets.
What you want is probably a pixel tracker.
A pixel tracker is a service that serves a 1x1 pixel that does nothing except log the time that it is requested. So if you say you want to track someone visiting a thread, you could, make a sha512 hash of the thread name, sha512(thread_name) and get a hash. Then with this hash request a pixel from your webserver as follows
/px/<sha512hash>.gif
Then you can insert this request into a database, next time that page is rendered, you sha512 the title, you request that pixel, then you realise that this user has already been served a pixel for this title. You ignore the request, you do not increment the view counter.

Related

Scraping ASPX after login with Python but every login gives you a different URL

I'm trying to get the exam result data from my college website for every Roll No. in my class.
Normally you can POST url (www.example.com/login.aspx)with login information, and GET a fixed url after login(www.example.com/home.aspx).
But the page I'm trying to get has a different URL for every Roll no. entered. The URL of login page look like this: "www.example.com/View.aspx". After login, the URL of the result page looks like: "www.example.com/ovengine.aspx?enc=BunchOfNumbersandAlphabets". And those numbers and alphabets are different for each roll number.
So I can't put a URL in my code to get the final result. I don't know how to get the page that comes automatically after the login, without mentioning it's URL.
But the page I'm trying to get has a different URL for every Roll no. entered
No, it is the same URL, and the URL has a parameter. You see this in URL's all the time.
So, for a temperature site it might look like
www.TheWeatherSite.com/?City=Rome
So, the above URL is always the same, but the web site "city" parameter is for the City of Rome. The web code behind can thus use/get/grab/consume that parameter in the code behind. That way we don't create a web page for EACH weather for each city.
so you create ONE page, and then and then PASS the web page a city value that the code behind can consume and use. (say query temperature data from a database for city = above value).
And thus you have to know ahead of time what city you want the weather for. Of course this approach is great since you don't have to create a new web site page to just show/display the weather in a given city.
You are in effect passing a value to some code behind that will run, and use that passed value.
The same goes for your example URL. You note there is ONE parameter called "enc".
So, the web site code behind would:
Grab, get, set the users ID. However, the users ID would be from the security system and the authentication provider. Unless you logged in as that particular user, then you not get that user id.
So, both a user ID (limited to the internal code).
And the "enc" value as the parameter in the URL you have would be required.
So, note in the above sql, we VERY likely need both a studentID and ALSO the "enc" value that some OTHER code from another page gets/grabs from the database.
Now that funny "GUID" (please do google what a GUID is), from a programmers point of view WOULD be sufficient to pull this one row of data from the database, but by ALSO using in the query the users logged on internal id?
Well, then only a given logged on user would be able to see their own set of values that belong to them.
In other words?
Only a drunken un-employed Rodeo clown would JUST require that GUID for pulling out that data. Since if that was the case, then any user could type in that GUID and see others peoples marks. However, there is "some" security by using a GUID, since a user could never guess that value.
If they used "city" like my first URL and parameter example? Then yes, you could guess and know the city value to type in. Or they could have used say student name, or even student number - those you COULD guess with relative ease.
But, for such data, no doubt the user adopted something MUCH more difficult then a starting number like a row number or PK id from a database. So, when the code added the results to that table? They also added a GUID of some type and saved that as a row in the database also.
So you NOT only need JUST the GUID, but that URL will ONLY work for a given pair of values. (the student ID - which is ONLY internal to the code and pulled FROM the authenticated provider. That was this line of code:
= Membership.GetUser.ProviderUserKey
So that above value is going to be the users logon internal ID.
The enc (external) exposed value in the web URL as a parameter, and ALSO the internal logged on value. So the code behind (asp.net) would look something like this:
Dim strSQL As String
strSQL = "SELECT * from tblStudentMarks where StudentID = #pID " &
" AND TestResultsGID = #GID"
Dim cmdSQL As New SqlCommand(strSQL, GetCon)
cmdSQL.Parameters.Add("#pID", SqlDbType.Int).Value = Membership.GetUser.ProviderUserKey
cmdSQL.Parameters.Add("#GID", SqlDbType.VarChar).Value = Request.QueryString("enc")
Dim dReader As New SqlDataAdapter(cmdSQL)
Dim rstData As DataTable
dReader.Fill(rstData)
Note the code:
Request.QueryString("enc")
That allows the code behind to get/grab the parameter (enc) from the URL. But, as I stated, it is high unlikely that JUST the "enc" number is required here. It is possible that ONLY this value is required to pull the data from the row, but then that would be a security hole the size of a open barn door.
Think of your on-line banking.
www.mybank.com/?CustomerNumber=1234
Well, if we JUST use the above CustomerNumber as the means to pull bank data, then I could go to the site and type in YOUR number, or someone's else's number.
So, for this to work?
You will need to obtain a list of enc values (that messy funny long string). Without that parameter then you not be able to set the parameter in the URL.
However, as I stated, you ALSO very likely need some internal "user" logon id that is NOT included in the public exposed URL to ALSO grab that one row of data from the database.
And, even more important? Such web pages usually cannot be hit UNLESS you are a logged in as an authenticated user. In other words that web page will ONLY be dished out to logged in users - if you not logged in, then the server security will automatic NOT dish out the web page unless you are logged in user.
So, for this to work, you need to contact the web site developers, and obtain that list of "enc" values. Once you have that list, then you can generate some code to process that list and insert the correct parameter in the URL. However, you also need to ask if that URL and parameter value will work for JUST you the logged in user, or if that this URL and parameter ONLY works for a give logged in user. Without these values, and without knowing if the URL and parameter will work for any user? (which I doubt it would), then just using a URL to get these values will not work.
It would be even BETTER to have the web site folks create a web service that you can call and in one command it would return all of the data you need anyway, as opposed to over and over having to send the "enc" value, which you don't have anyway.

Redirect loading old content?

a little bit of a weird issue. To simplify the problem for explanation purposes. A user goes onto /Total page and gets the count of the records in the datastore with "steve" as the name, at the moment there is 2, he can then presses increment and the user gets redirected to /Increment so another record is added into the nbd, the user is then once again redirected to /Total however it still shows 2! If he simply refreshes the page, it then shows 3. I assume it's because the redirect back to /Total happens before the entity is fully committed into the datastore? If not, here's the code, please let me know what's wrong. Thank you!
PYTHON:
#app.route("/Total", methods=['GET', 'POST'])
def total():
data = Logins.query(Logins.name == "steve").count()
return render_template('Total.html', count=count)
#app.route("/Increment", methods=['GET', 'POST'])
def incre():
new_data = oAuthLogins()
new_data.name = "steve"
new_data.put()
return redirect(url_for('total'))
Total.html:
{{count}}
<a href={{url_for('incre')}}> Increment! </a>
This can happen for a bunch of reasons: slow db update (eventually consistent), caching, etc.
But, is this a good user experience?
If a user sees a number (2) then presses increment, but that record has changed, is it better to show the user the number they expect (3), or the actual number (which may be much higher than 3 by the time they click increment)?
For UX, I would think the user should see the number 3. He/she can always see the latest number with another refresh.
With that in mind, you solve this by changing the /increment call to be done via AJAX and simply increment the number on the front end when you get a 200 response.
You're obtaining the count from a (Logins kind) query which is always eventually consistent unless it's an ancestor query.
You could display the correct count if you transform the method of getting it into a strongly consistent one, for example by storing it in a property of the user entity which you can always obtain by key lookup (or some other equivalent entity kind always in a 1-to-1 relationship with the user entity, see re-using an entity's ID for other entities of different kinds - sane idea?)
I have tried to replicate the issue and it worked for me. I am not sure if this is what you are looking for but here is my example code in GitHub.
When running the code follow these steps:
Navigating to localhost:8080 will print Hello World
Navigate to localhost:8080/add for the first time to enter the first value in entity. (Will add number 2)
Go to the Entities in the GCP page and get the id from the record
Replace that id with the one in the code ENTITY_KEY
Navigate to localhost:8080/get will print the number saved in the entity
Navigate to localhost:8080/update will get that number, add 1 to that number and will save the number again.
After that it will automatically redirect to localhost:8080/get and will show you the updated value in the entity

Django : How to count number of people viewed

I'm making a simple BBS application in Django and I want it so that whenever someone sees a post, the number of views on that post (post_view_no) is increased.
At the moment, I face two difficulties:
I need to limit the increase in post_view_no so that one user can only increase it once regardless of how many times the user refreshes/clicks on the post.
I also need to be able to track the users that are not logged in.
Regards to the first issue, it seems pretty easy as long as I create a model called 'View' and check the db but I have a feeling this may be an overkill.
In terms of second issue, all I can think of is using cookies / IP address to track the users but IP is hardly unique and I cannot figure out how to use cookies
I believe this is a common feature on forum/bbs solutions but google search only turned up with plugins or 'dumb' solutions that increase the view each time the post is viewed.
What would be the best way to go about this?
I think you can do both things via cookies. For example, when user visits a page, you can
Check if they have “viewed_post_%s” (where %s is post ID) key set in their session.
If they have, do nothing. If they don't, increase view_count numeric field of your corresponding Post object by one, and set the key (cookie) “viewed_post_%s” in their session (so that it won't count in future).
This would work with both anonymous and registered users, however by clearing cookies or setting up browser to reject them user can game the view count.
Now using cookies (sessions) with Django is quite easy: to set a value for current user, you just invoke something like
request.session['viewed_post_%s' % post.id] = True
in your view, and done. (Check the docs, and especially examples.)
Disclaimer: this is off the top of my head, I haven't done this personally, usually when there's a need to do some page view / activity tracking (so that you see what drives more traffic to your website, when users are more active, etc.) then there's a point in using a specialized system (e.g., Google Analytics, StatsD). But for some specific use case, or as an exercise, this should work.
Just to offer a secondary solution, which I think would work but is also prone to gaming (if coming by proxy or different devices). I haven't tried this either but I think it should work and wouldn't require to think about cookies, plus you aggregate some extra data which is noice.
I would make a model called TrackedPosts.
class TrackedPosts(models.Model):
post = models.ForeignKey(Post)
ip = models.CharField(max_length=16) #only accounting for ipv4
user = models.ForeignKey(User) #if you want to track logged in or anonymous
Then when you view a post, you would take the requests ip.
def my_post_view(request, post_id):
#you could check for logged in users as well.
tracked_post, created = TrackedPost.objects.get_or_create(post__pk=id, ip=request.ip, user=request.user) #note, not actual api
if created:
tracked_post.post.count += 1
tracked_post.post.save()
return render_to_response('')

Best way to show a user random data from an SQL database?

I'm working on a web app in Python (Flask) that, essentially, shows the user information from a PostgreSQL database (via Flask-SQLAlchemy) in a random order, with each set of information being shown on one page. Hitting a Next button will direct the user to the next set of data by replacing all data on the page with new data, and so on.
My conundrum comes with making the presentation truly random - not showing the user the same information twice by remembering what they've seen and not showing them those already seen sets of data again.
The site has no user system, and the "already seen" sets of data should be forgotten when they close the tab/window or navigate away.
I should also add that I'm a total newbie to SQL in general.
What is the best way to do this?
The easiest way is to do the random number generation in javascript at the client end...
Tell the client what the highest number row is, then the client page keeps track of which ids it has requested (just a simple js array). Then when the "request next random page" button is clicked, it generates a new random number less than the highest valid row id, and providing that the number isn't in its list of previously viewed items, it will send a request for that item.
This way, you (on the server) only have to have 2 database accessing views:
main page (which gives the js, and the highest valid row id)
display an item (by id)
You don't have any complex session tracking, and the user's browser is only having to keep track of a simple list of numbers, which even if they personally view several thousand different items is still only going to be a meg or two of memory.
For performance reasons, you can even pre-fetch the next item as soon as the current item loads, so that it displays instantly and loads the next one in the background while they're looking at it. (jQuery .load() is your friend :-) )
If you expect a large number of items to be removed from the database (so that the highest number is not helpful), then you can instead generate a list of random ids, send that, and then request them one at a time. Pre-generate the random list, as it were.
Hope this helps! :-)
You could stick the "already seen" data in a session cookie. Selecting random SQL data is explained here

Server side form validation and POST data

I have a user input form here:
http://www.7bks.com/create (Google login required)
When you first create a list you are asked to create a public username. Unfortuantely currently there is no constraint to make this unique. I'm working on the code to enforce unique usernames at the moment and would like to know the best way to do it.
Tech details: appengine, python, webapp framework
What I'm planning is something like this:
first the /create form posts the data to /inputlist/ (this is the same as currently happens)
/inputlist/ queries the datastore for the given username. If it already exists then redirect back to /create
display the /create page with all the info previously but with an additional error message of "this username is already taken"
My question is:
Is this the best way of handling server side validation?
What's the best way of storing the list details while I verify and modify the username?
As I see it I have 3 options to store the list details but I'm not sure which is "best":
Store the list details in the session cookie (I am using GAEsessions for cookies)
Define a separate POST class for /create and post the list data back from /inputlist/ to the /create page (currently /create only has a GET class)
Store the list in the datastore, even though the username is non-unique.
Thank you very much for your help :)
I'm pretty new to python and coding in general so if I've missed something obvious my apologies.
Tom
PS - I'm sure I can eventually figure it out but I can't find any documentation on POSTing data using the webapp appengine framework which I'd need in order to do solution 2 above :s maybe you could point me in the right direction for that too? Thanks!
PPS - It's a little out of date now but you can see roughly how the /create and /inputlist/ code works at the moment here: 7bks.com Gist
I would use Ajax to do an initial validation. For example as soon as the user name input box loses focus I would in the background send a question to the server asking if the user name is free, and clearly signal the result of that to the user.
Having form validation done through ajax is a real user experience delight for the user if done correctly.
Of course before any of the data was saved I would definitely redo the validation server side to avoid request spoofing.
jQuery has a nice form validation plugin if you are interested. http://docs.jquery.com/Plugins/validation.
In my career, I've never gotten around having to validate server side as well as client side though.
About the storing of the list (before you persist it to the datastore). If you use ajax to validate the user name you could keep the other fields disabled until a valid user name is filled in. Don't allow the form to be posted with an invalid user name!
That would perhaps solve your problem for most cases. There is the remote possibility that someone else steals the user name while your first user is still filling in his list of books. If you want to solve that problem I suggest simply displaying the list as you got it from the request from the user. He just sent it to you, you don't have to save it anywhere else.
Can you use the django form validation functionality (which I think should just abstract all this away from you):
http://code.google.com/appengine/articles/djangoforms.html
Search in that page for "adding an item" - it handles errors automatically (which I think could include non-unique username).
Warning: also a beginner... :)

Categories