I have a lot of data in my MongoDB.
I want to add new data with pymongo, but skip any document whose phone_number already exists in the DB. I'm using the code below, but it's just a loop inside a loop, so it's painfully slow...
connected_db = self.main_mongo_terema  # Connect to DB
data_to_insert = []

try:
    collection = connected_db['terema_data']  # Connect to Collection
except:
    self.log("Cant connect to DB")

try:
    for dicts_ in data:  # list of dicts
        for key, val in dicts_.items():
            if key == "phone_number":
                match = collection.find({}, {"phone_number": val})
                for x in match:
                    if x:
                        continue
                    else:
                        data_to_insert.append(dicts_)
except Exception as e:
    self.log(f"Loop problem - {str(e)}")

try:
    collection.insert_many(data_to_insert)
    # Empty list
except TypeError:
    pass
Something like this:
dict_ = [{'phone_number': '0123', 'col1': 'abc'}, {'phone_number': '456', 'col1': 'def'}]
When the 'phone_number' value of a dict from dict_ is not already in the DB, add that dict to the DB; otherwise do nothing.
You are using three nested loops (the two for loops plus iterating over the find() cursor), so in simple terms the time complexity is O(n^3), which will hurt badly once there is more data.
You can speed up the insert operation with bulk_write(); the $setOnInsert operator will be helpful here.
from pymongo import UpdateOne

requests = []
try:
    for dict_ in data:
        requests.append(UpdateOne(
            {'phone_number': dict_['phone_number']},
            {
                '$setOnInsert': {
                    'field1': dict_['field1'],
                    'field2': dict_['field2'],
                    # Rest of the fields.
                }
            },
            upsert=True))
    collection.bulk_write(requests)
except Exception as e:
    self.log(f"Loop problem - {str(e)}")
Create a unique index so that you do not have to check whether the number already exists.
Create the index with the following command (mongo shell syntax):
collection.createIndex( { "phone_number": 1 }, { unique: true } )
PS: You only need to do this once. After that, every phone_number in the database is guaranteed to be unique.
Then you can insert the data directly. You do not need to check whether the value already exists; MongoDB handles that for you.
collection.insert_many(data)
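One caveat, shown in a minimal sketch below (assuming the same collection and data variables as in the question): with the unique index in place, insert_many raises a BulkWriteError when it hits a duplicate, and by default it stops at the first one. Passing ordered=False lets the remaining documents go in, and the error can simply be caught:

from pymongo.errors import BulkWriteError

# One-time setup: pymongo's equivalent of the shell command above.
collection.create_index("phone_number", unique=True)

try:
    # ordered=False keeps inserting the remaining documents even after
    # a duplicate phone_number is rejected by the unique index.
    collection.insert_many(data, ordered=False)
except BulkWriteError:
    # Duplicates were skipped; everything else was inserted.
    pass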
UserService is a join table connecting the Users and Services tables. I have a query that returns all the rows that have a user_id equal to the id passed in the route.
@bp.route('/user/<id>/services', methods=['GET'])
def get_services_from_user(id):
    user = db_session.query(User).filter(id == User.id).first()
    if not user:
        return jsonify({'Message': f'User with id: {id} does not exist', 'Status': 404})
    user_services = db_session.query(UserService).filter(user.id == UserService.user_id).all()
    result = user_services_schema.dump(user_services)
    for service in result:
        user_service = db_session.query(Service).filter(service['service_id'] == Service.id).all()
        result = services_schema.dump(user_service)
    return jsonify(result)
result holds a list that looks like this:
[
    {
        "id": 1,
        "service_id": 1,
        "user_id": 1
    },
    {
        "id": 2,
        "service_id": 2,
        "user_id": 1
    }
]
How could I then continue this query, or add another query, to get all the actual populated services (Service class) instead of just the service_id, and return all of them in a list? The for loop is my attempt at that, but it is currently failing: I only get back a list with one populated service, not the second.
You could try something like this:
userServices = db_session.query(Users, Services).filter(Users.id == Services.id).all()
userServices is an iterable of row tuples. You can iterate through it with:
for user, service in userServices:
(each row holds one object per entity in the query, in the order they were listed in query()).
There is another way using .join() and adding the columns that you need with .add_columns().
There is also another way using
db_session.query(Users.id, Services.id, ... 'all the columns that you need' ...).filter(Users.id == Services.id).all()
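Applied to the models from the question, a rough sketch of the .join() route could look like the following (assuming the same db_session, UserService, Service and services_schema names, and the user object already fetched as in the question's view). It also sidesteps the bug in the original loop, where result is overwritten on every iteration:

services = (
    db_session.query(Service)
    .join(UserService, UserService.service_id == Service.id)
    .filter(UserService.user_id == user.id)
    .all()
)
result = services_schema.dump(services)
return jsonify(result)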
So I'm trying to insert multiple items into a DynamoDB table. I'm reading my data from a CSV file and everything is going right (my logs in AWS CloudWatch show that I'm correctly extracting my data from the CSV file).
I first tried, in a loop, to write each element to the table like this:
for item in Items:
    response = dynamodb.put_item(
        TableName='some_table_name',
        Item=item)
For this syntax I'm using the dynamodb client like this:
dynamodb = boto3.client('dynamodb', region_name=region)
After this attempt, I tried to use batch write in a loop like this:
for item in Items:
    response = dynamodb.batch_write_item(
        RequestItems={
            'some_table': [
                {
                    'PutRequest': {
                        'Item': item
                    }
                }
            ]
        }
    )
    print('Successfully uploaded to DynamoDB')
And I'm still using the dynamodb client.
After those two attempts, I tried the same calls with the dynamodb resource (dynamodb = boto3.resource('dynamodb', region_name=region)).
The problem is that my list Items has 42 items, and all those attempts write a different number of items (25, 27, 29) but never 42. So where am I going wrong? Can you guys help me please?
Try implementing a retry mechanism for your batch write:
result, batchError := svc.DynamoDBBatchWrite(input)
if result != nil && len(result.UnprocessedItems) > 0 {
    input = &dynamodb.BatchWriteItemInput{
        RequestItems: result.UnprocessedItems,
    }
}
and then retry until result.UnprocessedItems is empty.
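The snippet above is Go; the same idea in Python/boto3 could look roughly like this (a sketch only, assuming the dynamodb client and Items list from the question and a hypothetical helper name; note that batch_write_item accepts at most 25 put requests per call, so the list is sent in chunks):

import time

def batch_write_with_retry(client, table_name, items):
    # DynamoDB accepts at most 25 put requests per batch_write_item call,
    # so send the items in chunks and resubmit whatever comes back unprocessed.
    for start in range(0, len(items), 25):
        chunk = items[start:start + 25]
        request_items = {table_name: [{'PutRequest': {'Item': it}} for it in chunk]}
        while request_items:
            response = client.batch_write_item(RequestItems=request_items)
            request_items = response.get('UnprocessedItems', {})
            if request_items:
                time.sleep(1)  # simple backoff before retrying

batch_write_with_retry(dynamodb, 'some_table_name', Items)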
I have a JSON file where each object looks like the following example:
[
    {
        "timestamp": 1569177699,
        "attachments": [
        ],
        "data": [
            {
                "post": "\u00f0\u009f\u0096\u00a4\u00f0\u009f\u0092\u0099"
            },
            {
                "update_timestamp": 1569177699
            }
        ],
        "title": "firstName LastName"
    }
]
I want to check whether the key post exists, nested within the key data. I wrote this, but it doesn't work:
posts = json.loads(open(file).read())
for post in posts:
    if 'data' in post:
        if 'post' in post['data']:
            print(post['data']['post'])
Here is my solution. I see from your sample data that post["data"] is a list, so the program should iterate over it:
posts = json.loads(open(file).read())
for post in posts:
    if 'data' in post:
        # THIS IS THE NEW PART: iterate over the list under "data"
        for d in post["data"]:
            if 'post' in d:
                print(d['post'])
Try:
posts = json.loads(open(file).read())
for data in posts:
    for key, value in data.items():
        if key == 'data':
            for item in value:
                if 'post' in item:
                    print(key, item['post'])
Try this answer, it works:
Elegant way to check if a nested key exists in a python dict
def keys_exists(element, *keys):
    '''
    Check if *keys (nested) exists in `element` (dict).
    '''
    if not isinstance(element, dict):
        raise AttributeError('keys_exists() expects dict as first argument.')
    if len(keys) == 0:
        raise AttributeError('keys_exists() expects at least two arguments, one given.')

    _element = element
    for key in keys:
        try:
            _element = _element[key]
        except KeyError:
            return False
    return True
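For the data shape in the question, note that post['data'] is a list, so the helper has to be applied per entry rather than as keys_exists(post, 'data', 'post'). A small usage sketch, assuming posts has been loaded as in the other answers:

for post in posts:
    if keys_exists(post, 'data'):
        for entry in post['data']:
            if keys_exists(entry, 'post'):
                print(entry['post'])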
You could do it generically by adapting my answer to the question How to find a particular json value by key?.
It's generic in the sense that it doesn't care much about the details of how the JSON data is structured, it just checks every dictionary it finds inside it.
import json

def find_values(id, json_file):
    results = []

    def _decode_dict(a_dict):
        try:
            results.append(a_dict[id])
        except KeyError:
            pass
        return a_dict

    json.load(json_file, object_hook=_decode_dict)  # Return value ignored.
    return len(results) > 0  # If there are any results, id was found.

with open('find_key_test.json', 'r') as json_file:
    print(find_values('post', json_file))  # -> True
Please try the following:
posts = json.loads(open(file).read())
for post in posts:
    if 'data' in post:
        for data in post['data']:
            if 'post' in data:
                print(data['post'])
A simple query looks like this:
User.query.filter(User.name == 'admin')
In my code, I need to check the parameters that are being passed and then filter the results from the database based on the parameter.
For example, if the User table contains columns like username, location and email, the request parameters can contain any one of them or a combination of them. Instead of checking each parameter as shown below and chaining the filters, I'd like to build one dynamic query that can be passed to a single filter call to get the results back. I'd like to create a separate function that evaluates all the parameters and generates the query; once generated, I can pass that query object along and get the desired result. I want to avoid using a raw SQL query, as that defeats the purpose of using the ORM.
if location:
    User.query.filter(User.name == 'admin', User.location == location)
elif email:
    User.query.filter(User.email == email)
You can apply filter to the query repeatedly:
query = User.query
if location:
    query = query.filter(User.location == location)
if email:
    query = query.filter(User.email == email)
If you only need exact matches, there’s also filter_by:
criteria = {}
# If you already have a dict, there are easier ways to get a subset of its keys
if location: criteria['location'] = location
if email: criteria['email'] = email
query = User.query.filter_by(**criteria)
If you don’t like those for some reason, the best I can offer is this:
from sqlalchemy.sql.expression import and_

def get_query(table, lookups, form_data):
    conditions = [
        getattr(table, field_name) == form_data[field_name]
        for field_name in lookups if form_data[field_name]
    ]
    return table.query.filter(and_(*conditions))

get_query(User, ['location', 'email', ...], form_data)
Late to write an answer, but if anyone is still looking for one, sqlalchemy-json-querybuilder can be useful. It can be installed with:
pip install sqlalchemy-json-querybuilder
e.g.
filter_by = [{
    "field_name": "SomeModel.field1",
    "field_value": "somevalue",
    "operator": "contains"
}]
order_by = ['-SomeModel.field2']
results = Search(session, "pkg.models", (SomeModel,), filter_by=filter_by, order_by=order_by, page=1, per_page=5).results
https://github.com/kolypto/py-mongosql/
MongoSQL is a query builder that uses JSON as the input.
Capable of:
Choosing which columns to load
Loading relationships
Filtering using complex conditions
Ordering
Pagination
Example:
{
    project: ['id', 'name'],   // Only fetch these columns
    sort: ['age+'],            // Sort by age, ascending
    filter: {
        // Filter condition
        sex: 'female',         // Girls
        age: { $gte: 18 },     // Age >= 18
    },
    join: ['user_profile'],    // Load the 'user_profile' relationship
    limit: 100,                // Display 100 per page
    skip: 10,                  // Skip first 10 rows
}
I have a flask application which is receiving a request from dataTables Editor. Upon receipt at the server, request.form looks like (e.g.)
ImmutableMultiDict([('data[59282][gender]', u'M'), ('data[59282][hometown]', u''),
('data[59282][disposition]', u''), ('data[59282][id]', u'59282'),
('data[59282][resultname]', u'Joe Doe'), ('data[59282][confirm]', 'true'),
('data[59282][age]', u'27'), ('data[59282][place]', u'3'), ('action', u'remove'),
('data[59282][runnerid]', u''), ('data[59282][time]', u'29:49'),
('data[59282][club]', u'')])
I am thinking to use something similar to this really ugly code to decode it. Is there a better way?
from collections import defaultdict
# request.form comes in multidict [('data[id][field]',value), ...]
# so we need to exec this string to turn into python data structure
data = defaultdict(lambda: {}) # default is empty dict
# need to define text for each field to be received in data[id][field]
age = 'age'
club = 'club'
confirm = 'confirm'
disposition = 'disposition'
gender = 'gender'
hometown = 'hometown'
id = 'id'
place = 'place'
resultname = 'resultname'
runnerid = 'runnerid'
time = 'time'
# fill in data[id][field] = value
for formkey in request.form.keys():
    exec '{} = {}'.format(d,repr(request.form[formkey]))
This question has an accepted answer and is a bit old, but since the DataTables module still seems to be popular in the jQuery community, this approach may be useful for someone else. I wrote a simple parsing function based on a regular expression and the dpath module (which does not appear to be an entirely reliable module). The snippet is not very straightforward because of an exception-based fragment, but that was the only way I found to prevent dpath from trying to resolve strings as integer indices.
import re, dpath.util

# matches a single bracketed key, e.g. "gender" in "[gender]"
rxsKey = r'(?P<key>[^\W\[\]]+)'
# matches a whole entry, e.g. "data[59282][gender]"
rxsEntry = r'(?P<primaryKey>[^\W]+)(?P<secondaryKeys>(\[' \
           + rxsKey \
           + r'\])*)\W*'
rxKey = re.compile(rxsKey)
rxEntry = re.compile(rxsEntry)

def form2dict( frmDct ):
    res = {}
    for k, v in frmDct.items():
        m = rxEntry.match( k )
        if not m: continue
        mdct = m.groupdict()
        if not 'secondaryKeys' in mdct.keys():
            res[mdct['primaryKey']] = v
        else:
            fullPath = [mdct['primaryKey']]
            for sk in re.finditer( rxKey, mdct['secondaryKeys'] ):
                k = sk.groupdict()['key']
                try:
                    dpath.util.get(res, fullPath)
                except KeyError:
                    dpath.util.new(res, fullPath, [] if k.isdigit() else {})
                fullPath.append(int(k) if k.isdigit() else k)
            dpath.util.new(res, fullPath, v)
    return res
Practical usage builds on Flask's native request.form.to_dict() method:
# ... somewhere in a view code
pars = form2dict(request.form.to_dict())
The output structure includes both dictionaries and lists, as one would expect. E.g.:
# A little test:
rs = form2dict( {
    'columns[2][search][regex]' : False,
    'columns[2][search][value]' : None,
} )
generates:
{
    "columns": [
        null,
        null,
        {
            "search": {
                "regex": false,
                "value": null
            }
        }
    ]
}
Update: to handle lists as dictionaries (in a more efficient way), one may simplify this snippet with the following block in the else part of the if clause:
# ...
else:
    fullPathStr = mdct['primaryKey']
    for sk in re.finditer( rxKey, mdct['secondaryKeys'] ):
        fullPathStr += '/' + sk.groupdict()['key']
    dpath.util.new(res, fullPathStr, v)
I decided on a way that is more secure than using exec:
from collections import defaultdict

def get_request_data(form):
    '''
    return dict list with data from request.form

    :param form: MultiDict from `request.form`
    :rtype: {id1: {field1:val1, ...}, ...} [fieldn and valn are strings]
    '''
    # request.form comes in multidict [('data[id][field]',value), ...]
    # fill in id field automatically
    data = defaultdict(lambda: {})

    # fill in data[id][field] = value
    for formkey in form.keys():
        if formkey == 'action': continue
        datapart, idpart, fieldpart = formkey.split('[')
        if datapart != 'data':
            # ParameterError is an application-specific exception
            raise ParameterError("invalid input in request: {}".format(formkey))

        idvalue = int(idpart[0:-1])
        fieldname = fieldpart[0:-1]
        data[idvalue][fieldname] = form[formkey]

    # return decoded result
    return data
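A hypothetical call for the request shown in the question would look roughly like this (the output values come from the sample ImmutableMultiDict above):

# somewhere in the view handling the Editor request
data = get_request_data(request.form)
# data -> {59282: {'gender': 'M', 'resultname': 'Joe Doe', 'age': '27',
#                  'place': '3', 'time': '29:49', ...}}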