Is using UUID as pk a good idea in microservices? - python

I am working on a microservice project made up of 4 services developed in Django. I use dj-rest-auth to handle login and registration. Each service has its own database; user information is kept in the account service, and the other 3 services get it via an API request to the account service. In each service I only have access to the logged-in user's pk (dj-rest-auth handles this), so when I need to save a record, for example the location of the logged-in user, I save a user object that only has the pk, alongside the other info. The record in the db looks like this:
user = request.user (saves the logged-in user, but I only see the pk)
lat = latitude number
lng = longitude number
Everything is fine, but if I lose the database of the account service and restore a backup, the records may end up with different pks from the ones saved in the other services (for example, if some new records were added before the backup was restored), which causes a huge problem in all services. The solution I tried is to change the pk to a UUID field, but is that a good idea? Or would it be better to add a UUID field to the user model in the account database and have the other services save this UUID alongside the user's pk?

The answers to this question may be subjective to different perspectives.
Here is my view on this:
There should be an id field of type INT which is a primary key that can auto-increment. Alongside that, you can add a UUID field, let's say uid.
Advantages:
Using id as a primary key makes your schema consistent with the rest of the database tables.
You can use the id field as a foreign key and this will take up less space than UUID.
In public URLs you can use the uid field, which does not expose guessable information. For example, if you use id and the resource id in the URL is 5, an attacker can guess that there might be resources with ids 6 and 7. With the uid field, which is a UUID, you are not exposing information about the database.
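As a rough illustration of the "integer id plus uid" layout, here is a minimal sketch using only the standard library (sqlite3 instead of Django, so it runs anywhere). The table name account_user, the helper names, and the location record are assumptions made up for the example; in Django the equivalent would be an auto-incrementing pk plus a unique UUIDField.

```python
import sqlite3
import uuid

# Hypothetical account-service table: integer pk for internal joins,
# plus a UUID column (uid) that other services store as their reference.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_user ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "uid TEXT NOT NULL UNIQUE, "
    "username TEXT NOT NULL)"
)

def create_user(username, uid=None):
    """Insert a user; a restore passes the backed-up uid, a signup omits it."""
    uid = uid or str(uuid.uuid4())
    conn.execute(
        "INSERT INTO account_user (uid, username) VALUES (?, ?)", (uid, username)
    )
    return uid

def find_user(uid):
    """Look a user up by the stable uid rather than by the integer pk."""
    return conn.execute(
        "SELECT id, username FROM account_user WHERE uid = ?", (uid,)
    ).fetchone()

alice_uid = create_user("alice")
# What another service (e.g. a location service) would store:
location_record = {"user_uid": alice_uid, "lat": 35.7, "lng": 51.4}
```

Because the other services keep only user_uid, it no longer matters which integer id a row receives after a backup is restored, as long as the uid column is restored with it.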

Related

Should I add third party data to web application database?

I have a Django application that users fill in some background information, such as "graduated university". These fields are optional, so for some users, they can be empty.
Given that, I have another service that crawls the web for the missing information. This service is completely decoupled from the Django application. It has its own schedule and saves the scraped data to S3 as JSON periodically.
The Django application has admin pages that summarize user information. For these pages, I need to use the real application data which is stored in the application database, as well as the scraped data that resides in S3.
Currently, I have a Django model named ScrapedUser that has a JSON field named data. To populate this model, I manually sync it with the data in S3. (Download files from S3 and create a ScrapedUser instance with it etc.)
My question is, to use those different data sources together, should I populate my real user data in the application database with the third party data that I scraped from the web?
In other words, I wonder if it would be better to map scraped information at ScrapedUser to the real User model.
To better illustrate it, here is a mocked version of the architecture:
I have a standard User model and a ScrapedUser.
# models.py
class User(AbstractUser):
    ...
    university = CharField(...)

class ScrapedUser(Model):
    user = ForeignKey(to="User", ...)
    data = JSONField(...)
They look like this in the database:
User
id | university
1  | Harvard
2  | NULL
ScrapedUser
user_id | data
2       | {"university": "UC Berkeley", ...}
The final report I would like to see in the admin page:
user_id | university
1       | Harvard
2       | UC Berkeley
At the end, should I keep these tables separate and use Django QuerySet features to merge them on view side? Or replace NULL fields in User.university with ScrapedUser.data['university']?
Yes, you will have to add the third-party information to your database. If you don't, and the data stays in S3, then every time you need to run a report, the admin page would have to fetch it from S3 and process it.
In addition, instead of storing the information from s3 as a single field in ScrapedUser (you have stored it as 'data'), I would 'unpack' that information into fields in the ScrapedUser model.
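class ScrapedUser(Model):
    # OneToOneField rather than ForeignKey, so that user.scrapped_info is a
    # single object and not a related manager. on_delete is required by
    # Django 2.0+; CASCADE is an assumption here.
    user = OneToOneField(to="User", on_delete=CASCADE, related_name="scrapped_info")
    university = CharField(...)
    twitter = CharField(...)
    facebook = CharField(...)
    ...
This would make the processing of the information much easier at a later date.
So given the related_name (as shown above), you can easily fetch the information like this:
user = User.objects.get(pk=2)  # an existing user that has scraped data
user.scrapped_info.university
This gives the university information that was scraped.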
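For the report in the question, the merge rule is "use the user's own value, and fall back to the scraped value when it is NULL". A minimal sketch of that rule in plain Python, with dicts standing in for the two tables (the function name build_report and the row shapes are assumptions for illustration, not Django API):

```python
def build_report(users, scraped):
    """Return report rows, preferring the user's own value over the scraped one."""
    scraped_by_user = {row["user_id"]: row["data"] for row in scraped}
    report = []
    for user in users:
        university = user["university"]
        if university is None:
            # Fall back to scraped data only when the user left the field empty.
            university = scraped_by_user.get(user["id"], {}).get("university")
        report.append({"user_id": user["id"], "university": university})
    return report

users = [{"id": 1, "university": "Harvard"}, {"id": 2, "university": None}]
scraped = [{"user_id": 2, "data": {"university": "UC Berkeley"}}]
report = build_report(users, scraped)
```

Keeping the fallback in the report layer leaves User.university untouched, so it stays clear which values the user entered and which were scraped.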

Pact-python how to test a get request having UUID in URL

I am using pact-python (0.10.0). I want to make a request to a provider with an entity id:
/entity/6000d04d-d5d6-4a5f-81d3-7d8a72b46174
but this (6000d04d-d5d6-4a5f-81d3-7d8a72b46174) should exist in the database.
what'd be the better solution:
Creating a provider state with the data present in it (but how will the provider verifier work? shouldn't the contract be having the id that's present in real provider?)
Query for all the id's in the database and pick one for making the request (for this I need to somehow update and publish pact with the fetched id)
Or is there any better solution available that i might have missed?
You should create a provider state, "given entity 6000d04d-d5d6-4a5f-81d3-7d8a72b46174 exists", that sets up the entity with the correct UUID before the interaction is replayed.
To use contract testing to its fullest potential, you need to be able to control the data in the provider for each interaction. If you can't, then contract tests are not a good fit for your problem space. Have a read of https://docs.pact.io/documentation/provider_states.html and https://github.com/pact-foundation/pact-ruby/wiki/Why-Pact-may-not-be-the-best-tool-for-testing-public-APIs
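On the provider side, this usually means a test-only setup hook that receives the state name and seeds the data. The sketch below shows only that dispatch logic in plain Python; it assumes the verifier sends a JSON body with a "state" key to a setup endpoint you expose, and the entities dict, the handler names, and the consumer name are hypothetical stand-ins for your real database and web framework.

```python
ENTITY_ID = "6000d04d-d5d6-4a5f-81d3-7d8a72b46174"

# Stand-in for the provider's test database (hypothetical).
entities = {}

def setup_entity_exists():
    """Seed the entity the contract refers to."""
    entities[ENTITY_ID] = {"id": ENTITY_ID}

# Maps provider state names from the contract to setup functions.
STATE_HANDLERS = {
    f"entity {ENTITY_ID} exists": setup_entity_exists,
}

def handle_provider_state(payload):
    """Run the setup for the state named in the verifier's request body."""
    entities.clear()  # start every interaction from a known, empty state
    handler = STATE_HANDLERS.get(payload["state"])
    if handler is None:
        raise KeyError(f"unknown provider state: {payload['state']}")
    handler()

# Simulates the verifier announcing the state before replaying the request.
handle_provider_state(
    {"consumer": "my-consumer", "state": f"entity {ENTITY_ID} exists"}
)
```

With this in place the id in the contract does not need to exist in any real database; the provider creates it on demand for the duration of the verification.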

how to give some unique id to each anonymous user in django

I want to create a table (Postgres) that stores data about which items were viewed by which user. Authenticated users are no problem, but how can I tell one anonymous user from another? This is needed for analysis purposes.
Maybe store their IP address as a unique ID? How can I do this?
I think you should use cookies.
When a user who is not authenticated makes a request, look for a cookie with a name of your choice ("nonuserid" in this case). If the cookie is not present, it is a new user, so you should set the cookie with a random id. If it is present, you can use the id in it to identify the anonymous user.
Option 1
Use the IP address. Check out this answer. There is no need to write the IP into the session because you can always get the client IP when you receive a request.
Option 2
Generate a unique ID with the uuid module, as described in its documentation, and store it in the session under a given name, say USER_ID.
When you get a request from a user, check whether USER_ID is in the session. If so, read its value and write a record to the database, such as "user id visited page X". If not, generate an ID and set it.
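The "check, else generate and set" step can be sketched as follows. A plain dict stands in for Django's request.session here (an assumption made so the example runs without Django), and the helper name get_anonymous_id is made up for illustration.

```python
import uuid

SESSION_KEY = "USER_ID"  # name taken from the answer above

def get_anonymous_id(session):
    """Return the visitor's anonymous id, creating one on first visit."""
    if SESSION_KEY not in session:
        session[SESSION_KEY] = str(uuid.uuid4())
    return session[SESSION_KEY]

session = {}  # a plain dict stands in for request.session
first = get_anonymous_id(session)   # first request: id is generated
second = get_anonymous_id(session)  # later request: same id is reused
```

In a view you would call the helper with request.session and store the returned id alongside the viewed item.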

Custom properties not saved correctly for Expando models in repeated StructuredProperty

I am trying to use an Expando model as a repeated StructuredProperty in another model. Namely, I would like to add an indefinite number of Accounts to my User model. As Accounts can have different properties depending on their types (Accounts are references to social network accounts, and for example Twitter requires more information than Facebook for its OAuth process), I have designed my Account model as an Expando. I've added all basic information in the model definition, but I plan to add custom properties for specific social networks (e.g., a specific access_token_secret property for Twitter).
1/ Can you confirm the following design (Expando in repeated StructuredProperty) should work?
class Account(ndb.Expando):
    account_type = ndb.StringProperty(required=True, choices=['fb', 'tw', 'li'])
    account_id = ndb.StringProperty()
    state = ndb.StringProperty()
    access_token = ndb.StringProperty()

class HUser(User):
    email = ndb.StringProperty(required=True, validator=validate_email)
    created = ndb.DateTimeProperty(auto_now_add=True)
    accounts = ndb.StructuredProperty(Account, repeated=True)
2/ Now the problem I am facing is: when I add a Facebook account to my HUser instance, everything works fine; however, the problem arises when I append a Twitter account to that same instance and add a new property not declared in the model, like this:
for account in huser.accounts:
    if account.state == "state_we_re_looking_for" and account.account_type == 'tw':
        # we found the appropriate Twitter account reference
        account.access_token_secret = "..."  # store the access token secret fetched from Twitter API
        huser.put()  # save to the Datastore
        break
This operation is supposed to save the access token secret in the Twitter Account instance of my User, but in fact it saves it in the Facebook Account instance (at index 0)!
What am I doing wrong?
Thanks.
This is a fundamental problem with how ndb stores the StructuredProperty. Datastore does not currently have a way to store this, so ndb basically explodes your properties.
For example, consider the entity:
HUser(email='test#example.com',
      accounts=(Account(type='fb',
                        account_id='1',
                        state='1',
                        access_token='1'),
                Account(type='tw',
                        account_id='2',
                        state='2',
                        access_token='2',
                        access_token_secret='2')))
This will actually get stored in an entity that looks like:
{
  email : 'test#example.com',
  accounts.type : ['fb', 'tw'],
  accounts.account_id : ['1', '2'],
  accounts.state : ['1', '2'],
  accounts.access_token : ['1', '2'],
  accounts.access_token_secret : ['2']
}
Because you are using an ndb.Expando, ndb doesn't know that it should populate the access_token_secret field with a None for the facebook account. When ndb repopulates your entities, it will fill in the access_token_secret for the first account it sees, which is the facebook account.
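The effect can be reproduced with a small simulation in plain Python. This is a simplified model of the flattening described above, not ndb's actual implementation; explode and repopulate are hypothetical helper names.

```python
def explode(accounts):
    """Flatten a list of dicts into one list of values per property name."""
    flat = {}
    for account in accounts:
        for name, value in account.items():
            flat.setdefault(name, []).append(value)
    return flat

def repopulate(flat):
    """Rebuild sub-entities by handing out each list's values in order."""
    count = max(len(values) for values in flat.values())
    rebuilt = [{} for _ in range(count)]
    for name, values in flat.items():
        for index, value in enumerate(values):
            rebuilt[index][name] = value
    return rebuilt

accounts = [
    {"type": "fb", "account_id": "1", "state": "1", "access_token": "1"},
    {"type": "tw", "account_id": "2", "state": "2", "access_token": "2",
     "access_token_secret": "2"},
]
flat = explode(accounts)    # access_token_secret becomes a one-element list
rebuilt = repopulate(flat)  # that single value is handed to index 0
```

After the round trip, the secret sits on the first (Facebook) account because the flattened list carries no placeholder for the account that lacked the property.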
Restructuring your data sounds like the right way to go about this, but you may want to make your HUser an ancestor of the Account for that HUser so that you query for a user's accounts using strong consistency.
From what I understand, it seems like App Engine NDB does not support Expando entities containing Expando entities themselves.
One thing that I didn't realize at first is that my HUser model inherits from Google's User class, which is precisely an Expando model!
So without even knowing it, I was trying to put a repeated StructuredProperty of Expando objects inside another Expando, which seemingly is not supported (I didn't find anything clearly written on this limitation, however).
The solution is to design the data model in a different way. I put my Account objects in a separate entity kind (and this time, they are truly Expando objects!), and I've added a KeyProperty to reference the HUser entity. This involves more read/write ops, but the code is actually much simpler to read now...
I'll mark my own question as answered, unless someone has another interesting input regarding the limitation found here.

Modify a Google App Engine entity id?

I'm using Google App Engine NDB. Sometimes I will want to get all users with a phone number in a specified list. Using queries is extremely expensive for this, so I thought I'll just make the id value of the User entity the phone number of the user so I can fetch directly by ids.
The problem is that the phone number field is optional, so initially a User entity is created without a phone number, and thus with no value for id. It would be created as user = User() as opposed to user = User(id = phone_number).
So when a user at a later point decides to add a phone number to their account, is there any way to change that User entity's id value to the new phone number?
The entity ID forms part of the primary key for the entity, so there's no way to change it. Changing it is identical to creating a new entity with the new key and deleting the old one - which is one thing you can do, if you want.
A better solution would be to create a PhoneNumber kind that provides a reference to the associated User, allowing you to do lookups with get operations, but not requiring every user to have exactly one phone number.
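The shape of that design can be sketched without the Datastore, using two dicts as stand-ins for the User and PhoneNumber kinds (all names here are hypothetical). The phone number is the id of the PhoneNumber record, so looking up a list of numbers is a series of direct gets rather than a query.

```python
# In-memory stand-ins for two entity kinds, keyed by entity id (hypothetical).
users = {}          # user_id -> user data
phone_numbers = {}  # phone number (used as the id) -> {"user_id": ...}

def set_phone_number(user_id, phone_number):
    """Attach a phone number to an existing user without touching the user's id."""
    phone_numbers[phone_number] = {"user_id": user_id}

def users_for_phone_numbers(numbers):
    """Fetch users by direct id lookups instead of a query."""
    found = []
    for number in numbers:
        entry = phone_numbers.get(number)
        if entry is not None:
            found.append(users[entry["user_id"]])
    return found

users[1] = {"name": "alice"}      # created without a phone number
users[2] = {"name": "bob"}
set_phone_number(1, "+15550100")  # added later, user id unchanged
matches = users_for_phone_numbers(["+15550100", "+15550199"])
```

Changing a number then means deleting one PhoneNumber record and creating another, while the User entity and everything that references it stay as they are.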
