Best way to show a user random data from an SQL database? - python

I'm working on a web app in Python (Flask) that, essentially, shows the user information from a PostgreSQL database (via Flask-SQLAlchemy) in a random order, with each set of information being shown on one page. Hitting a Next button will direct the user to the next set of data by replacing all data on the page with new data, and so on.
My conundrum comes with making the presentation truly random - not showing the user the same information twice by remembering what they've seen and not showing them those already seen sets of data again.
The site has no user system, and the "already seen" sets of data should be forgotten when they close the tab/window or navigate away.
I should also add that I'm a total newbie to SQL in general.
What is the best way to do this?

The easiest way is to do the random number generation in JavaScript on the client.
Tell the client what the highest row id is, and have the page keep track of which ids it has already requested (just a simple JS array). When the "request next random page" button is clicked, it generates a random number no greater than the highest valid row id. If that number is already in its list of previously viewed items it picks again; otherwise it sends a request for that item.
This way, you (on the server) only have to have 2 database accessing views:
main page (which gives the js, and the highest valid row id)
display an item (by id)
You don't have any complex session tracking, and the user's browser is only having to keep track of a simple list of numbers, which even if they personally view several thousand different items is still only going to be a meg or two of memory.
For performance reasons, you can even pre-fetch the next item as soon as the current item loads, so that it displays instantly and loads the next one in the background while they're looking at it. (jQuery .load() is your friend :-) )
If you expect a large number of items to be removed from the database (so that the highest number is not helpful), then you can instead generate a list of random ids, send that, and then request them one at a time. Pre-generate the random list, as it were.
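A minimal sketch of the "pre-generate the random list" idea, done on the server in Python. The function name and the sample ids are assumptions made up for illustration; in the real app the ids would come from a query on the table's primary key.

```python
import random


def shuffled_ids(valid_ids, seed=None):
    """Return every valid row id exactly once, in random order."""
    rng = random.Random(seed)   # seed is only there to make runs repeatable
    ids = list(valid_ids)
    rng.shuffle(ids)
    return ids


# Hypothetical ids that still exist after some rows were deleted.
order = shuffled_ids([1, 2, 5, 8, 13])
```

The client can then walk through `order` one id at a time, so nothing repeats and no deleted id is ever requested.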
Hope this helps! :-)

You could stick the "already seen" data in a session cookie. Selecting random SQL data is explained here
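A rough sketch of how the two pieces could fit together: pick one random row whose id is not in the "already seen" list. It uses an in-memory SQLite table so it runs on its own; the table name `items`, its columns, and the helper `pick_unseen` are assumptions. PostgreSQL also accepts `ORDER BY random()`, but its Python drivers usually use a different placeholder style than `?`.

```python
import sqlite3


def pick_unseen(conn, seen_ids):
    """Return one random (id, body) row whose id is not in seen_ids, or None."""
    seen = list(seen_ids)
    sql = "SELECT id, body FROM items"
    if seen:
        placeholders = ", ".join("?" for _ in seen)
        sql += f" WHERE id NOT IN ({placeholders})"
    sql += " ORDER BY RANDOM() LIMIT 1"
    return conn.execute(sql, seen).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

seen = []   # in Flask this list could be kept in the session cookie
while (row := pick_unseen(conn, seen)) is not None:
    seen.append(row[0])
```

Note that `ORDER BY RANDOM()` sorts the whole table on every request, which is fine for small tables but may get slow for large ones.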

Related

Scraping ASPX after login with Python but every login gives you a different URL

I'm trying to get the exam result data from my college website for every Roll No. in my class.
Normally you can POST the login information to a URL (www.example.com/login.aspx) and then GET a fixed URL after login (www.example.com/home.aspx).
But the page I'm trying to get has a different URL for every Roll no. entered. The URL of the login page looks like this: "www.example.com/View.aspx". After login, the URL of the result page looks like: "www.example.com/ovengine.aspx?enc=BunchOfNumbersandAlphabets". And those numbers and letters are different for each roll number.
So I can't put a URL in my code to get the final result. I don't know how to get the page that comes automatically after the login without specifying its URL.
But the page I'm trying to get has a different URL for every Roll no. entered
No, it is the same URL, and the URL has a parameter. You see this in URLs all the time.
So, for a temperature site it might look like
www.TheWeatherSite.com/?City=Rome
So, the above URL is always the same, but the "City" parameter tells the web site we want the city of Rome. The code behind the page can thus get/grab/consume that parameter. That way we don't create a separate web page for the weather in each city.
So you create ONE page, and then PASS that page a city value that the code behind can consume and use (say, to query temperature data from a database for city = the above value).
And thus you have to know ahead of time what city you want the weather for. Of course this approach is great since you don't have to create a new web site page to just show/display the weather in a given city.
You are in effect passing a value to some code behind that will run, and use that passed value.
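To make the "same page, different parameter" point concrete, here is a small Python sketch that splits a URL into the page and its query parameters. The helper name `split_url` and the `enc` values `abc123` and `xyz789` are placeholders invented for illustration, not real values from any site.

```python
from urllib.parse import parse_qs, urlsplit


def split_url(url):
    """Return (host + path, dict of query parameters) for a URL."""
    parts = urlsplit(url)
    return parts.netloc + parts.path, parse_qs(parts.query)


weather_page, weather_params = split_url("https://www.TheWeatherSite.com/?City=Rome")

# Two result URLs with different (made-up) enc values point at the same page.
page_a, params_a = split_url("https://www.example.com/ovengine.aspx?enc=abc123")
page_b, params_b = split_url("https://www.example.com/ovengine.aspx?enc=xyz789")
```

Here `page_a` and `page_b` are identical; only the value of the parameter handed to the code behind differs.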
The same goes for your example URL. You note there is ONE parameter called "enc".
So, the web site code behind would:
Grab the user's ID. However, the user's ID would come from the security system and the authentication provider. Unless you are logged in as that particular user, you will not get that user ID.
So both a user ID (kept internal to the code) and the "enc" value passed as the parameter in the URL would be required.
So, note that in the SQL further down we VERY likely need both a studentID and ALSO the "enc" value that some OTHER code from another page gets/grabs from the database.
Now that funny "GUID" (please do google what a GUID is), from a programmer's point of view, WOULD be sufficient to pull this one row of data from the database. But what about ALSO using the user's logged-on internal ID in the query?
Well, then only a given logged-on user would be able to see the set of values that belongs to them.
In other words?
Only a drunken unemployed rodeo clown would require JUST that GUID for pulling out that data. If that was the case, then any user could type in that GUID and see other people's marks. However, there is "some" security in using a GUID, since a user could never guess that value.
If they used "city" like my first URL and parameter example? Then yes, you could guess and know the city value to type in. Or they could have used say student name, or even student number - those you COULD guess with relative ease.
But, for such data, no doubt the site's developers adopted something MUCH more difficult to guess than a simple number like a row number or PK id from a database. So, when the code added the results to that table? They also added a GUID of some type and saved that in the row as well.
So you do NOT need JUST the GUID; that URL will ONLY work for a given pair of values: the GUID and the student ID, which is ONLY internal to the code and pulled FROM the authentication provider. That is this piece of code:
= Membership.GetUser.ProviderUserKey
So the above value is going to be the user's internal logon ID.
The query uses both the externally exposed enc value from the URL parameter and ALSO the internal logged-on value. So the code behind (asp.net) would look something like this:
Dim strSQL As String
' Both the internal user ID and the GUID from the URL must match
strSQL = "SELECT * from tblStudentMarks where StudentID = @pID " &
         " AND TestResultsGID = @GID"
Dim cmdSQL As New SqlCommand(strSQL, GetCon)
cmdSQL.Parameters.Add("@pID", SqlDbType.Int).Value = Membership.GetUser.ProviderUserKey
cmdSQL.Parameters.Add("@GID", SqlDbType.VarChar).Value = Request.QueryString("enc")
Dim dReader As New SqlDataAdapter(cmdSQL)
Dim rstData As New DataTable
dReader.Fill(rstData)
Note the code:
Request.QueryString("enc")
That allows the code behind to get/grab the parameter (enc) from the URL. But, as I stated, it is highly unlikely that JUST the "enc" value is required here. It is possible that ONLY this value is required to pull the row of data, but then that would be a security hole the size of an open barn door.
Think of your on-line banking.
www.mybank.com/?CustomerNumber=1234
Well, if we JUST use the above CustomerNumber as the means to pull bank data, then I could go to the site and type in YOUR number, or someone else's number.
So, for this to work?
You will need to obtain a list of enc values (that messy funny long string). Without that value you will not be able to set the parameter in the URL.
However, as I stated, you ALSO very likely need some internal "user" logon id that is NOT included in the publicly exposed URL to grab that one row of data from the database.
And, even more important? Such web pages usually cannot be hit UNLESS you are logged in as an authenticated user. In other words, that web page will ONLY be dished out to logged-in users; if you are not logged in, the server security will automatically refuse to dish out the page.
So, for this to work, you need to contact the web site developers and obtain that list of "enc" values. Once you have that list, you can write some code to process it and insert the correct parameter in the URL. However, you also need to ask whether that URL and parameter value will work for any logged-in user (such as you), or ONLY for the specific logged-in user the result belongs to. Without these values, and without knowing whether the URL and parameter will work for any user (which I doubt), just using a URL to get these values will not work.
It would be even BETTER to have the web site folks create a web service that you can call and in one command it would return all of the data you need anyway, as opposed to over and over having to send the "enc" value, which you don't have anyway.

Redirect loading old content?

A slightly weird issue, simplified for explanation purposes. A user goes to the /Total page and gets the count of the records in the datastore with "steve" as the name; at the moment there are 2. He then presses Increment and gets redirected to /Increment, so another record is added via ndb. The user is then redirected back to /Total, but it still shows 2! If he simply refreshes the page, it then shows 3. I assume it's because the redirect back to /Total happens before the entity is fully committed to the datastore? If not, here's the code, please let me know what's wrong. Thank you!
PYTHON:
@app.route("/Total", methods=['GET', 'POST'])
def total():
    # Count the records whose name is "steve"
    count = Logins.query(Logins.name == "steve").count()
    return render_template('Total.html', count=count)
@app.route("/Increment", methods=['GET', 'POST'])
def incre():
    # Add one more "steve" record, then go back to the total page
    new_data = oAuthLogins()
    new_data.name = "steve"
    new_data.put()
    return redirect(url_for('total'))
Total.html:
{{count}}
<a href={{url_for('incre')}}> Increment! </a>
This can happen for a bunch of reasons: slow db update (eventually consistent), caching, etc.
But, is this a good user experience?
If a user sees a number (2) then presses increment, but that record has changed, is it better to show the user the number they expect (3), or the actual number (which may be much higher than 3 by the time they click increment)?
For UX, I would think the user should see the number 3. He/she can always see the latest number with another refresh.
With that in mind, you solve this by changing the /increment call to be done via AJAX and simply increment the number on the front end when you get a 200 response.
You're obtaining the count from a (Logins kind) query which is always eventually consistent unless it's an ancestor query.
You could display the correct count if you transform the method of getting it into a strongly consistent one, for example by storing it in a property of the user entity which you can always obtain by key lookup (or some other equivalent entity kind always in a 1-to-1 relationship with the user entity, see re-using an entity's ID for other entities of different kinds - sane idea?)
I have tried to replicate the issue and it worked for me. I am not sure if this is what you are looking for but here is my example code in GitHub.
When running the code follow these steps:
Navigating to localhost:8080 will print Hello World
Navigate to localhost:8080/add for the first time to enter the first value in entity. (Will add number 2)
Go to the Entities in the GCP page and get the id from the record
Replace that id with the one in the code ENTITY_KEY
Navigate to localhost:8080/get will print the number saved in the entity
Navigate to localhost:8080/update will get that number, add 1 to that number and will save the number again.
After that it will automatically redirect to localhost:8080/get and will show you the updated value in the entity

Keep form result in memory

The image above gives an example of what I hope to achieve with flask.
For now I have a list of tuples such as [(B,Q), (A,B,C), (T,R,E,P), (M,N)].
The list can be any length, as can the tuples. When I submit or pass my form, I receive the data on the server side, all good.
However, I am now asked to remember the state of previously submitted and passed forms, in order to go back to them and eventually modify the information.
What would be the best way to remember the state of the forms?
Python dictionary with the key being the form number as displayed at the bottom (1 to 4)
Store the result in an SQL table and query it every time I need to access a form again
Other ideas?
Notes: The raw data should be kept for max one day, however, the data are to be processed to generate meaningful information to be stored permanently. Hence, if a modification is made to the form, the final database should reflect it.
This will very much depend on how the application is built.
One option is to simply return all the answers posted so far with each request, but that won't work well if you have a lot of data.
However, you say that the data needs to remain accessible for a day, so it seems reasonable to store it in a database. The cost of select queries on an indexed key is insignificant in most cases.
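A minimal sketch of the database option, using an in-memory SQLite table so it runs on its own. The table name `form_state`, its columns, and the helper names are all assumptions for illustration; the answers are stored as JSON text keyed by the form number.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE form_state ("
    " form_number INTEGER PRIMARY KEY,"
    " answers TEXT NOT NULL,"
    " saved_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)


def save_form(form_number, answers):
    """Insert the form's answers, or overwrite them if the form was saved before."""
    conn.execute(
        "INSERT OR REPLACE INTO form_state (form_number, answers) VALUES (?, ?)",
        (form_number, json.dumps(answers)),
    )


def load_form(form_number):
    """Return the saved answers for a form, or None if nothing was saved."""
    row = conn.execute(
        "SELECT answers FROM form_state WHERE form_number = ?", (form_number,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def purge_old():
    """Delete raw form data saved more than one day ago."""
    conn.execute("DELETE FROM form_state WHERE saved_at < datetime('now', '-1 day')")


save_form(2, ["A", "B", "C"])
save_form(2, ["A", "C"])   # going back and modifying form 2
```

The lookup by `form_number` hits the primary key index, which is the cheap indexed select mentioned above.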

Best way to prevent view counts from being abused

I'm currently using Redis to store a view count for each time a page is loaded. It's working fantastic but my only worry is that it will be abused. When a user is logged in on my site, the view counter will only update if they have not viewed the thread yet (again, tracked by Redis) so that abuse is negated.
My problem is with users I don't have an account for. If I were to let the view count be updated every time the page is loaded if someone created some content they could just log out and refresh the page as many times as they wanted to get their view count up. My first thought was identify every non-account user through a session cookie (I already track users logged in through a session cookie) but if someone cleared that cookie, it would be rendered useless again. Another thought is an IP address but through dynamic IP addresses it wouldn't be very reliable either.
So my question is, what is the most reliable way to track a view count from an unknown user?
Welcome to the world of tracking view counts!
I'll let you in on a couple of trade secrets.
What you want is probably a pixel tracker.
A pixel tracker is a service that serves a 1x1 pixel image and does nothing except log the time at which it is requested. So if you want to track someone visiting a thread, you could compute a SHA-512 hash of the thread name, sha512(thread_name), and then request a pixel from your web server using that hash, as follows
/px/<sha512hash>.gif
Then you can insert this request into a database. The next time that page is rendered, you hash the title again and request that pixel; if the database shows that this user has already been served a pixel for this title, you ignore the request and do not increment the view counter.
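A rough Python sketch of that bookkeeping. The answer does not say how "this user" is recognised, so `visitor_key` is purely an assumption (it could be a cookie value, an IP address, or similar); the in-memory set stands in for the database, and the function names are made up.

```python
import hashlib

served = set()      # stands in for the database of logged pixel requests
view_counts = {}    # pixel path -> number of counted views


def pixel_path(thread_name):
    """Build the /px/<sha512hash>.gif path for a thread name."""
    digest = hashlib.sha512(thread_name.encode("utf-8")).hexdigest()
    return f"/px/{digest}.gif"


def record_view(visitor_key, thread_name):
    """Count the view only the first time this visitor requests this pixel."""
    path = pixel_path(thread_name)
    if (visitor_key, path) in served:
        return False            # already served: do not increment the counter
    served.add((visitor_key, path))
    view_counts[path] = view_counts.get(path, 0) + 1
    return True
```

How well this resists abuse depends entirely on how stable `visitor_key` is, which is the same cookie/IP trade-off raised in the question.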

Get records before and after current selection in Django query

It sounds like an odd one but it's a really simple idea. I'm trying to make a simple Flickr for a website I'm building. This specific problem comes when I want to show a single photo (from my Photo model) on the page but I also want to show the image before it in the stream and the image after it.
If I were only sorting these streams by date, or was only sorting by ID, that might be simpler... But I'm not. I want to allow the user to sort and filter by a whole variety of methods. The sorting is simple. I've done that and I have a result-set, containing 0-many Photos.
If I want a single Photo, I start off with that filtered/sorted/etc stream. From it I need to get the current Photo, the Photo before it and the Photo after it.
Here's what I'm looking at, at the moment.
prev = None
next = None
photo = None
count = filtered_queryset.count()
for i in range(count):
    if filtered_queryset[i].pk == desired_pk:
        # Only look at neighbours that actually exist
        if i > 0: prev = filtered_queryset[i-1]
        if i < count - 1: next = filtered_queryset[i+1]
        photo = filtered_queryset[i]
        break
It just seems disgustingly messy. And inefficient. Oh my lord, so inefficient. Can anybody improve on it though?
Django queries are late-binding, so it would be nice to make use of that though I guess that might be impossible given my horrible restrictions.
Edit: it occurs to me that I can just chuck in some SQL to re-filter queryset. If there's a way of selecting something with its two (or one, or zero) closest neighbours with SQL, I'd love to know!
You could try the following:
Evaluate the filtered/sorted queryset and get the list of photo ids, which you hold in the session. These ids all match the filter/sort criteria.
Keep the current index into this list in the session too, and update it when the user moves to the previous/next photo. Use this index to get the prev/current/next ids to use in showing the photos.
When the filtering/sorting criteria change, re-evaluate the list and set the current index to a suitable value (e.g. 0 for the first photo in the new list).
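The index-based lookup described above could be sketched like this. The function name and the sample ids are assumptions; in Django the id list could come from something like `list(filtered_queryset.values_list("pk", flat=True))` and be stored in the session along with the index.

```python
def neighbours_at(photo_ids, index):
    """Return (prev_id, current_id, next_id) for a position in the id list.

    A neighbour that does not exist (start or end of the list) is None.
    """
    current_id = photo_ids[index]
    prev_id = photo_ids[index - 1] if index > 0 else None
    next_id = photo_ids[index + 1] if index + 1 < len(photo_ids) else None
    return prev_id, current_id, next_id


# Hypothetical ids in the user's current sort order.
photo_ids = [42, 7, 19, 3]
```

Moving to the next or previous photo is then just a matter of adding or subtracting one from the stored index and fetching the three photos by id.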
I see the following possibilities:
Your URL query parameters contain the sort/filtering information and some kind of 'item number', which is the item number within your filtered queryset. This is the simple case - previous and next are item number minus one and plus one respectively (plus some bounds checking)
You want the URL to be a permalink, and contain the photo primary key (or some unique ID). In this case, you are presumably storing the sorting/filtering in:
the URL, as query parameters. In this case you don't have true permalinks, and so you may as well stick the item number in the URL as well, getting you back to option 1.
hidden fields in the page, and using POSTs for links instead of normal links. In this case, stick the item number in the hidden fields as well.
session data/cookies. This will break if the user has two tabs open with different sorts/filtering applied, but that might be a limitation you don't mind - after all, you have envisaged that they will probably just be using one tab and clicking through the list. In this case, store the item number in the session as well. You might be able to do something clever to "namespace" the item number for the case where they have multiple tabs open.
In short, store the item number wherever you are storing the filtering/sorting information.
