Should I add third party data to web application database? - python

I have a Django application in which users fill in some background information, such as "graduated university". These fields are optional, so for some users they can be empty.
Given that, I have another service that crawls the web for the missing information. This service is completely decoupled from the Django application. It has its own schedule and saves the scraped data to S3 as JSON periodically.
The Django application has admin pages that summarize user information. For these pages, I need to use the real application data which is stored in the application database, as well as the scraped data that resides in S3.
Currently, I have a Django model named ScrapedUser that has a JSON field named data. To populate this model, I manually sync it with the data in S3. (Download files from S3 and create a ScrapedUser instance with it etc.)
My question is, to use those different data sources together, should I populate my real user data in the application database with the third party data that I scraped from the web?
In other words, I wonder if it would be better to map scraped information at ScrapedUser to the real User model.
To better illustrate it, here is a mocked version of the architecture:
I have a standard User model and a ScrapedUser.
# models.py
class User(AbstractUser):
    ...
    university = CharField(...)

class ScrapedUser(Model):
    user = ForeignKey(to="User", ...)
    data = JSONField(...)
They look like this in the database:

User

| id | university |
|----|------------|
| 1  | Harvard    |
| 2  | NULL       |

ScrapedUser

| user_id | data                               |
|---------|------------------------------------|
| 2       | {"university": "UC Berkeley", ...} |

The final report I would like to see in the admin page:

| user_id | university  |
|---------|-------------|
| 1       | Harvard     |
| 2       | UC Berkeley |
In the end, should I keep these tables separate and use Django QuerySet features to merge them on the view side? Or should I replace NULL fields in User.university with ScrapedUser.data['university']?

Yes, you will have to add the third-party information to your database. If you don't, and the data remains resident in S3, then every time you need to run a report the admin would have to fetch it from S3 and process it.
In addition, instead of storing the information from S3 as a single JSON field in ScrapedUser (you have stored it as data), I would "unpack" that information into separate fields in the ScrapedUser model:
class ScrapedUser(Model):
    # OneToOneField (rather than ForeignKey) so that the reverse accessor
    # user.scrapped_info returns a single instance instead of a manager
    user = OneToOneField(to="User", related_name="scrapped_info", on_delete=CASCADE)
    university = CharField(...)
    twitter = CharField(...)
    facebook = CharField(...)
    ...
This would make the processing of the information much easier at a later date.
So given the related_name (as shown above), you can easily fetch the information like this:
user = User.objects.get(pk=2)
user.scrapped_info.university
This returns the university information that was scraped. (Note that this direct attribute access requires a OneToOneField; with a plain ForeignKey, user.scrapped_info is a related manager, so you would need something like user.scrapped_info.first().university.)
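Whichever way the scraped data is stored, the report's fallback rule ("use the user's own value, else the scraped one") is easy to express. Here is a minimal plain-Python sketch of that merge, with mocked dicts standing in for the two tables:

```python
# Mocked data standing in for the User and ScrapedUser tables.
users = {1: {"university": "Harvard"}, 2: {"university": None}}
scraped = {2: {"university": "UC Berkeley"}}

# Prefer the user's own value; fall back to the scraped one if missing.
report = {
    user_id: fields["university"] or scraped.get(user_id, {}).get("university")
    for user_id, fields in users.items()
}

print(report)  # {1: 'Harvard', 2: 'UC Berkeley'}
```

The same fallback can be pushed into the queryset (e.g. with Coalesce) once the scraped values live in their own columns, but the logic is identical.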

Related

Is using UUId as pk a good idea in microservices?

I am working on a microservice project that contains 4 services developed in Django. I use dj-rest-auth to handle the login and registration process. Each service has its own database; user information is kept in the account service, and the other 3 services get user information via an API request to the account service. In each service I only have access to the logged-in user's pk (dj-rest-auth handles this), so when I need to save a record, for example the location of the logged-in user, I save a user object that only has a pk alongside the other info. The record in the DB looks like this:
user = request.user  # saves the logged-in user, but I only see the pk
lat = latitude value
lng = longitude value
Everything is fine, but if I lose the database of the account service and restore a backup, the records can somehow end up with different pks from the ones saved in the other services (for example, new records added before the backup was restored shift the auto-increment), which causes a huge problem in all services. The solution I tried is to change the pk to a UUID field, but is that a good idea? Or is it better to add a UUID field to the user model in the account database, and have the other services save this UUID alongside the user's pk?
The answers to this question may be subjective to different perspectives.
Here is my view on this:
There should be an id field of type INT which is a primary key that can auto-increment. Alongside that, you can add a UUID field, let's say uid.
Advantages:
Using id as a primary key makes your schema consistent with the rest of the database tables.
You can use the id field as a foreign key and this will take up less space than UUID.
In public URLs you can use the uid field, which does not expose guessable information. For example, if you use id and a URL contains the resource id 5, an attacker can guess that there might also be resources with ids 6 and 7. With the uid field, which is a UUID, you are not exposing any information related to the database.
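A sketch of that layout as a Django model (the model name Account is an assumption; the field is a standard models.UUIDField with uuid.uuid4 as its default):

```python
import uuid

from django.db import models


class Account(models.Model):
    # The implicit auto-increment integer `id` stays the primary key and
    # is used for internal foreign keys (smaller, index-friendly).
    # `uid` is the stable, non-guessable identifier exposed in public
    # URLs and shared with the other services.
    uid = models.UUIDField(
        default=uuid.uuid4, editable=False, unique=True, db_index=True
    )
```

The other services would then store uid rather than id: after a backup restore the auto-increment id may drift, but the uid survives unchanged.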

Is there a way to analyse single user data in google analytics?

I'm trying to analyze (for business intelligence purpose) some google analytics data in python.
After many tutorials, all I get is "aggregated" data, like the number of views in a day. What I need instead is something capable of tracking the behavior of a single user: which pages of the web site they visited, their bounce rate, whether they used the e-commerce features, and so on.
I saw many CSV already prepared for such analysis but I'm starting from scratch with my web site.
You can use the User-ID feature: when you send Analytics an ID and related data from multiple sessions, your reports tell a more unified, holistic story about a user's relationship with your business:
https://support.google.com/analytics/answer/3123662?hl=en
Otherwise, you can examine individual-user behavior at the session level in User Explorer report. The User Explorer report lets you isolate and examine individual rather than aggregate user behavior. Individual user behavior is associated with either Client ID or User ID.
https://support.google.com/analytics/answer/6339208?hl=en

How to store and access user data in database in Django?

I'm a newbie in Django and I'm building this web app that allows three different types of users to login. A customer, operator and an accountant. When the customer logs in, he is asked to upload two jpeg documents. When he is done, these documents will be converted into editable text(I'm using Google's Tesseract engine for Optical character recognition for this) and this data is stored in three columns. The first two columns are non editable but the third is editable. In the third column, the user makes changes if the converted text has any errors(since OCR is not 100 % accurate).
At this point an email has to be sent to the operator. The operator logs in and checks whether the customer has uploaded the right documents or not. If there are any errors, he edits them and hits the save button. At this stage an email is sent to the accountant and he logs in to verify the data for the second time. If he confirms, an email is sent to the customer saying his documents have been verified.
As of now, my app is taking an image and converting it into editable text and displaying it in an HTML template. I need to know how to store this text in a table of three columns and make it available for the operator and accountant to edit. And also, I need to know how to make three different types of logins for three different users.
Please help. I will really appreciate it.
You could've edited your question better but still, I'll try to answer as much as I understood:
Firstly, let's start with the login. What you want is role-based login, which you can easily achieve through Django's auth_user and user groups. Create users through Django's built-in auth system (django authentication) and assign a group to every user you create, so that when a user logs in you can redirect them accordingly.
Next, you mentioned that you wanted to save data in DB. For that, you'll need to connect a DB through Django settings (my preference PostgreSQL) and then you have to create models according to your need (django models).
Lastly, for data read and write operations in DB you can look at Django ORM (django ORM)
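A minimal sketch of the group-based approach described above, assuming the three role names from the question; the dashboard URL names are placeholders:

```python
from django.contrib.auth.models import Group
from django.shortcuts import redirect

# One-time setup (e.g. in a data migration or the shell): create the roles.
for name in ("customer", "operator", "accountant"):
    Group.objects.get_or_create(name=name)

def post_login_redirect(request):
    # Branch on the logged-in user's group; URL names are assumptions.
    user = request.user
    if user.groups.filter(name="operator").exists():
        return redirect("operator_dashboard")
    if user.groups.filter(name="accountant").exists():
        return redirect("accountant_dashboard")
    return redirect("customer_dashboard")
```

Assign a user to a role with user.groups.add(Group.objects.get(name="operator")) when the account is created.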

Use django to expose python functions on the web

I have not worked with Django seriously and my only experience is the tutorials on their site.
I am trying to write my own application now, and what I want is to have some sort of API. My idea is that I will later be able to use it with a client written in any other language.
I have the simplest of all apps, a model that has a name and surname field.
So the idea is that I can now write an app lets say in c++ that will send two strings to my Django app so they can be saved in the database as name, surname respectively.
What I know until now is to create a form so a user can enter that information, or have the information in the url, and of curse adding them myself from the admin menu.
What I want though is some other better way, maybe creating a packet that contains that data. Later my client sends this data to my Django webpage and it will extract the info and save it as needed. But I do not know how to do this.
If my suggested method is a good idea, then I would like an example of how this is done. If not the I would like suggestions for possible things I could try out.
Typically, as stated by @DanielRoseman, you certainly want to:
Create a REST API so data can be sent from another site or client
Receive data, typically in JSON or XML, that contains all the required fields (name and surname)
In the REST view, convert this data to the Model and save the Model to the database
Send a response.
More information here: http://www.django-rest-framework.org/
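On the client side (shown here in Python, but any language with HTTP and JSON works), the "packet" is just a JSON body POSTed to the API; the endpoint URL below is a placeholder:

```python
import json
import urllib.request

# Build the JSON payload containing the two fields the model needs.
payload = {"name": "Ada", "surname": "Lovelace"}
body = json.dumps(payload).encode("utf-8")

# POST it to the (hypothetical) endpoint exposed by the Django app.
req = urllib.request.Request(
    "http://localhost:8000/api/people/",  # placeholder URL
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment with a running server

# Round-trip check: the body decodes back to the original fields.
assert json.loads(body) == payload
```

A C++ client would do the same thing with any HTTP library: serialize the two strings to JSON and POST them with a Content-Type: application/json header.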

Where does a Facebook user likes have been stored when using django_facebook and django-celery?

I am developing a Facebook canvas application.
I'm a little bit confused about the process of storing a user's likes using the django_facebook framework together with Celery.
I have set FACEBOOK_CELERY_STORE = True in my settings.py
and added the app djcelery to my INSTALLED_APPS.
@facebook_required(canvas=True)
def home(request, graph):
    facebook = FacebookUserConverter(graph)
    print "facebooklikes", facebook.get_likes()  # lists all of the user's likes
Where are these users' likes stored? There are many celery tables in my MySQL database, but none of them contain this data.
Likes are not stored by this method, and Celery is not used for intermediate data processing; the data is requested from the Facebook API and returned to you.
Likes can be stored with the get_and_store_likes or store_likes methods, where Celery is used for asynchronous calls and intermediate data storage. At the end of a call, the likes are stored in the FacebookLike model, one record per like, linking a user_id to a facebook_id field.
As a consequence, the table you are looking for is named django_facebook_facebooklike.
I found the django-facebook package poorly documented, so there is no link to the docs; one can consult the source code for details.
