How to map the string using Python

This is my code:
a = ''' ddwqqf{x}'''
def b():
    ...
c = b(a, {'x': '!!!!!'})
print c
I want to get ddwqqf!!!!!, so how do I write the b function? Thanks!
Updated: but how do I do this:
a = ''' ddwqqf{x},{'a':'aaaa'}'''
c = a.format(x="!!!!!")
d = open('a.txt', 'a')
d.write(c)
It shows this error:
Traceback (most recent call last):
  File "d.py", line 8, in <module>
    c = a.format(x="!!!!!")
KeyError: "'a'"
Updated 2: this is the string:
'''
{
    'skill': {x_1},
    'power': {x_2},
    'magic': {x_3},
    'level': {x_4},
    'weapon': {
        0 : {
            'item': {
                'weight': 40,
                'target': 1,
                'defence': 100,
                'name': u'\uff75\uff70\uff78\uff7f\uff70\uff84',
                'attack': 100,
                'type': 1
            },
        },
        1 : {
            'item': {
                'weight': 40,
                'target': 1,
                'defence': 100,
                'name': u'\uff75\uff70\uff78\uff7f\uff70\uff84',
                'attack': 100,
                'type': 1
            },
        },
        2 : {
            'item': {
                'weight': 40,
                'target': 1,
                'defence': 100,
                'name': u'\uff75\uff70\uff78\uff7f\uff70\uff84',
                'attack': 100,
                'type': 1
            },
        }
        ......
    }
}
'''

Try
def b(a, d):
    return a.format(**d)
This works in Python 2.6 or above. Of course, you don't need to define a function for this:
a = " ddwqqf{x}"
c = a.format(x="!!!!!")
will be enough.
Edit regarding your update: write
a = " ddwqqf{x},{{'a':'aaaa'}}"
to avoid substitution in the second pair of braces; str.format emits doubled braces as literal braces.
Another Edit: I don't really know where your string comes from or what the context of all this is. One solution might be
import re
d = {"x_1": "1", "x_2": "2", "x_3": "3", "x_4": "4"}
re.sub(r"\{([a-z_0-9]+)\}", lambda m: d[m.group(1)], s)
where s is your string.
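For instance, applying that substitution to a cut-down version of your second template might look like this (a minimal sketch; the values in d are made up):
import re

s = "{ 'skill': {x_1}, 'power': {x_2}, 'magic': {x_3}, 'level': {x_4} }"
d = {"x_1": "10", "x_2": "20", "x_3": "30", "x_4": "40"}
print re.sub(r"\{([a-z_0-9]+)\}", lambda m: d[m.group(1)], s)
# { 'skill': 10, 'power': 20, 'magic': 30, 'level': 40 }
The outer braces survive because the pattern only matches braces that directly wrap a lowercase identifier.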

Related

Elasticsearch - How to create buckets by using information from two fields at the same time?

My documents are like this:
{'start': 0, 'stop': 3, 'val': 3}
{'start': 2, 'stop': 4, 'val': 1}
{'start': 5, 'stop': 6, 'val': 4}
We can imagine that each document occupies the x-coordinates from 'start' to 'stop',
and has a certain value 'val' ('start' < 'stop' is guaranteed).
The goal is to plot a line showing, for each x-coordinate, the sum of the values 'val' over all documents that cover that coordinate.
In reality there are many documents with many different 'start' and 'stop' coordinates. Speed is important, so: is it possible to do this with at most a couple of Elasticsearch requests? How?
What I've tried:
With one Elasticsearch request we can get the min_start and max_stop coordinates; these will be the boundaries of x.
Then we divide the x-coordinates into N intervals and, in a loop, make one Elasticsearch request per interval: filter out all documents lying completely outside the interval, and run a sum aggregation on 'val'.
This approach takes too much time because it makes N+1 requests, and raising the precision of the line increases that number linearly.
Code:
N = 300  # number of intervals along x
x = []
y = []
data = es.search(index='index_name',
                 body={
                     'aggs': {
                         'min_start': {'min': {'field': 'start'}},
                         'max_stop': {'max': {'field': 'stop'}}
                     }
                 })
min_x = data['aggregations']['min_start']['value']
max_x = data['aggregations']['max_stop']['value']
x_from = min_x
x_step = (max_x - min_x) / N
for _ in range(N):
    x_to = x_from + x_step
    data = es.search(
        index='index_name',
        body={
            'size': 0,  # to not return any actual documents
            'query': {
                'bool': {
                    'should': [
                        # start is in the current x-interval:
                        {'bool': {'must': [
                            {'range': {'start': {'gte': x_from}}},
                            {'range': {'start': {'lte': x_to}}}
                        ]}},
                        # stop is in the current x-interval:
                        {'bool': {'must': [
                            {'range': {'stop': {'gte': x_from}}},
                            {'range': {'stop': {'lte': x_to}}}
                        ]}},
                        # current x-interval is inside start--stop:
                        {'bool': {'must': [
                            {'range': {'start': {'lte': x_from}}},
                            {'range': {'stop': {'gte': x_to}}}
                        ]}}
                    ],
                    'minimum_should_match': 1  # at least 1 of these 3 conditions should match
                }
            },
            'aggs': {
                'vals_sum': {'sum': {'field': 'val'}}
            }
        }
    )
    # Append info to the lists:
    x.append(x_from)
    y.append(data['aggregations']['vals_sum']['value'])
    # Next x-interval:
    x_from = x_to

from matplotlib import pyplot as plt
plt.plot(x, y)
The right way to do this in one single query is to use the range field type (available since 5.2) instead of the two separate fields start and stop, which force you to reimplement the same logic yourself. Like this:
PUT test
{
  "mappings": {
    "properties": {
      "range": {
        "type": "integer_range"
      },
      "val": {
        "type": "integer"
      }
    }
  }
}
Your documents would look like this:
{
  "range": {
    "gte": 0,
    "lt": 3
  },
  "val": 3
}
And then the query simply leverages a histogram aggregation, like this:
POST test/_search
{
  "size": 0,
  "aggs": {
    "histo": {
      "histogram": {
        "field": "range",
        "interval": 1
      },
      "aggs": {
        "total": {
          "sum": {
            "field": "val"
          }
        }
      }
    }
  }
}
And the results are as expected: 3, 3, 4, 1, 0, 4
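If your data already lives in an index with separate start/stop fields, one way to migrate it into the range-typed index is a scan-and-bulk pass with the Python client (a sketch, not a definitive recipe; it assumes the elasticsearch Python package, the source index 'index_name' from the question, and the 'test' index mapped above):
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch()

def to_range_docs():
    # Stream every old-style document and re-emit it with a range field.
    for hit in helpers.scan(es, index='index_name'):
        src = hit['_source']
        yield {
            '_index': 'test',
            '_source': {
                'range': {'gte': src['start'], 'lt': src['stop']},
                'val': src['val'],
            },
        }

helpers.bulk(es, to_range_docs())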

Conditional Parameter inside dictionary Python

From my dataframe here:
   OrigC  OrigZ OrigN  Weigh DestC  DestZ DestN  Mvt
0     PL     97    TP     59    DE     63    SN   DD
1     TR     23    GH     66    SN     65    US   DP
I want to pass a conditional parameter into my dictionary based on the value of a column in my dataframe.
My code looks like this without the condition:
dic = {}
dic['section'] = []
for ix, row in df.iterrows():
    in_dict = {
        'location': {
            'zip_code': {
                'OrigC': row['OrigC'],
                'OrigZ': row['OrigZ'],
            },
            'location': {'id': 1},
            'OrigN': 'TP',
        },
        'CarriageParameter': {
            'road': {
                'truckLoad': 'Auto'}
        },
        'load': {
            'Weigh': str(row['Weigh']),
        }
    }
    dic['section'].append(in_dict)
I want to embed a condition inside my dictionary, something like this, which of course won't work:
dic = {}
dic['section'] = []
for ix, row in df.iterrows():
    in_dict = {
        'location': {
            if row['Mvt'] = 'DP':
                return 'zip_code': {
                    'OrigC': row['OrigC'],
                    'OrigZ': row['OrigZ'],
                }
            elif row['Mvt'] = 'DD':
                return 'iata_code': {
                    'OrigC': row['OrigN'],
                }
            'location': {'id': 1},
            'OrigN': 'TP',
        },
        'CarriageParameter': {
            'road': {
                'truckLoad': 'Auto'}
        },
        'load': {
            'Weigh': str(row['Weigh']),
        }
    }
    dic['section'].append(in_dict)
Set up the common key-value pairs in the in_dict dictionary initially, and update the dictionary later according to the condition.
dic = {}
dic['section'] = []
for ix, row in df.iterrows():
    in_dict = {
        'location': {},
        'CarriageParameter': {
            'road': {
                'truckLoad': 'Auto'
            }
        },
        'load': {
            'Weigh': str(row['Weigh']),
        }
    }
    if row['Mvt'] == 'DP':
        in_dict['location']['zip_code'] = {
            'OrigC': row['OrigC'],
            'OrigZ': row['OrigZ'],
        }
    elif row['Mvt'] == 'DD':
        in_dict['location']['iata_code'] = {
            'OrigC': row['OrigN']
        }
    in_dict['location']['location'] = {'id': 1}
    in_dict['location']['OrigN'] = 'TP'
    dic['section'].append(in_dict)
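If you'd rather keep the branching out of the loop body, a small helper can build the location block (a sketch; location_for is a made-up name, and the column names are taken from the question):
def location_for(row):
    # Common keys first, then the Mvt-dependent part.
    loc = {'location': {'id': 1}, 'OrigN': 'TP'}
    if row['Mvt'] == 'DP':
        loc['zip_code'] = {'OrigC': row['OrigC'], 'OrigZ': row['OrigZ']}
    elif row['Mvt'] == 'DD':
        loc['iata_code'] = {'OrigC': row['OrigN']}
    return loc
The loop body then shrinks to in_dict = {'location': location_for(row), ...} with the CarriageParameter and load keys unchanged.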

How to best display a random string, with some strings weighted more heavily than others

I am trying to display a random string but would like some strings to occur more often than others. My current strategy is nested dictionaries (for ease of updating) and the choices function.
msg_list = {
    'msg_1': {
        'msg': 'Hi',
        'weight': 40,
    },
    'msg_2': {
        'msg': 'hello',
        'weight': 50,
    },
    'msg_3': {
        'msg': "What's up",
        'weight': 10,
    },
}
message = choices(msg_list['msg'], msg_list['weight'])
string = message['msg']
This obviously doesn't work, and I imagine I could build the lists with a loop, but I am curious if there is a faster way of doing this. Thanks!
You're almost there.
You just need to create lists for the 2 parameters of random.choices.
from random import choices

msg_list = {
    'msg_1': {
        'msg': 'Hi',
        'weight': 40,
    },
    'msg_2': {
        'msg': 'hello',
        'weight': 50,
    },
    'msg_3': {
        'msg': "What's up",
        'weight': 10,
    },
}
weights = [msg_list[key]['weight'] for key in msg_list.keys()]
messages = [msg_list[key]['msg'] for key in msg_list.keys()]
message = choices(messages, weights)
string = message[0]
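If the 'msg_1'-style keys aren't needed elsewhere, a flat list of (message, weight) pairs avoids building the two lists separately (a sketch of an alternative layout, not part of the original answer):
from random import choices

weighted = [('Hi', 40), ('hello', 50), ("What's up", 10)]
messages, weights = zip(*weighted)
string = choices(messages, weights)[0]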

MongoDB query takes too long

I have the following documents in my MongoDB collection:
{'name' : 'abc-1','parent':'abc', 'price': 10}
{'name' : 'abc-2','parent':'abc', 'price': 5}
{'name' : 'abc-3','parent':'abc', 'price': 9}
{'name' : 'abc-4','parent':'abc', 'price': 11}
{'name' : 'efg', 'parent':'', 'price': 10}
{'name' : 'efg-1','parent':'efg', 'price': 5}
{'name' : 'abc-2','parent':'efg','price': 9}
{'name' : 'abc-3','parent':'efg','price': 11}
I want to perform the following actions:
a. Group by distinct parent.
b. Sort all the groups based on price.
c. For each group, select the document with the minimum price, then:
   i. check whether that record's parent sku exists as a value in the name field;
   ii. if the name exists, do nothing;
   iii. if it does not exist, insert a document with an empty parent and the other values copied from the record selected previously (the minimum-price one).
I tried to use forEach as follows:
db.file.find().sort([("price", 1)]).forEach(function(doc){
    cnt = db.file.count({"sku": {"$eq": doc.parent}});
    if (cnt < 1){
        newdoc = doc;
        newdoc.name = doc.parent;
        newdoc.parent = "";
        delete newdoc["_id"];
        db.file.insertOne(newdoc);
    }
});
The problem is that it takes too much time. What is wrong here? How can it be optimized? Would an aggregation pipeline be a good solution, and if so, how?
1. Retrieve a set of product names ✔
def product_names():
    for product in db.file.aggregate([{'$group': {'_id': '$name'}}]):
        yield product['_id']

product_names = set(product_names())
2. Retrieve the product with the minimum price from each group ✔
result_set = db.file.aggregate([
    {
        '$sort': {
            'price': 1,
        }
    },
    {
        '$group': {
            '_id': '$parent',
            'name': {
                '$first': '$name',
            },
            'price': {
                '$min': '$price',
            }
        }
    },
    {
        '$sort': {
            'price': 1,
        }
    }
])
3. Insert the products retrieved in step 2 if their name is not in the set of product names retrieved in step 1. ✔
from pymongo.operations import InsertOne

def insert_request(product):
    return InsertOne({
        'name': product['name'],
        'price': product['price'],
        'parent': ''
    })

requests = (
    insert_request(product)
    for product in result_set
    if product['name'] not in product_names
)
db.file.bulk_write(list(requests))
Steps 2 and 3 can be implemented in the aggregation pipeline.
db.file.aggregate([
    {
        '$sort': {'price': 1}
    },
    {
        '$group': {
            '_id': '$parent',
            'name': {
                '$first': '$name'
            },
            'price': {
                '$min': '$price'
            },
        }
    },
    {
        '$sort': {
            'price': 1
        }
    },
    {
        '$project': {
            'name': 1,
            'price': 1,
            '_id': 0,
            'parent': ''
        }
    },
    {
        '$match': {
            'name': {
                '$nin': list(product_names())
            }
        }
    },
    # Note: $out replaces the target collection's contents, so writing back
    # to 'file' overwrites the source collection with the pipeline's output.
    {
        '$out': 'file'
    }
])
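As an aside, the original forEach version issues one count query per document; if you keep that approach, an index on the field being looked up should help considerably (a sketch with pymongo; the connection details are hypothetical and the field name follows the question's count filter):
from pymongo import MongoClient, ASCENDING

db = MongoClient().get_database('test')  # hypothetical connection/database
db.file.create_index([('sku', ASCENDING)])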

Replace information in a JSON string based on a condition

I have a very large JSON file with several nested keys. From what I've read so far, if you do:
x = json.loads(data)
Python will interpret it as a dictionary (correct me if I'm wrong). The fourth level of nesting in the JSON file contains several elements named by an ID number, and all of them contain an element called children, something like this:
{"level1":
{"level2":
{"level3":
{"ID1":
{"children": [1,2,3,4,5]}
}
{"ID2":
{"children": []}
}
{"ID3":
{"children": [6,7,8,9,10]}
}
}
}
}
What I need to do is replace all items in all the "children" elements with nothing, meaning "children": [], whenever the ID number is in a list called new_ids, and then convert it back to JSON. I've been reading on the subject for a few hours now, but I haven't found anything similar to this to help me along.
I'm running Python 3.3.3. Any ideas are greatly appreciated!
Thanks!
EDIT
List:
new_ids=["ID1","ID3"]
Expected result:
{"level1":
{"level2":
{"level3":
{"ID1":
{"children": []}
}
{"ID2":
{"children": []}
}
{"ID3":
{"children": []}
}
}
}
}
First of all, your JSON is invalid. I assume you want this:
{"level1":
{"level2":
{"level3":
{
"ID1":{"children": [1,2,3,4,5]},
"ID2":{"children": []},
"ID3":{"children": [6,7,8,9,10]}
}
}
}
}
Now, load your data as a dictionary:
>>> with open('file', 'r') as f:
...     x = json.load(f)
...
>>> x
{u'level1': {u'level2': {u'level3': {u'ID2': {u'children': []}, u'ID3': {u'children': [6, 7, 8, 9, 10]}, u'ID1': {u'children': [1, 2, 3, 4, 5]}}}}}
Now you can loop over the keys in x['level1']['level2']['level3'] and check whether they are in your new_ids.
>>> new_ids = ["ID1", "ID3"]
>>> for key in x['level1']['level2']['level3']:
...     if key in new_ids:
...         x['level1']['level2']['level3'][key]['children'] = []
...
>>> x
{u'level1': {u'level2': {u'level3': {u'ID2': {u'children': []}, u'ID3': {u'children': []}, u'ID1': {u'children': []}}}}}
You can now write x back to a file like this:
with open('myfile', 'w') as f:
    f.write(json.dumps(x))
If your new_ids list is large, consider making it a set.
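For example, a set literal gives O(1) membership tests (a one-line tweak):
new_ids = {"ID1", "ID3"}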
If you have a simple dictionary like this
data_dict = {
    "level1": {
        "level2": {
            "level3": {
                "ID1": {"children": [1,2,3,4,5]},
                "ID2": {"children": []},
                "ID3": {"children": [6,7,8,9,10]},
            }
        }
    }
}
then you only need this:
data_dict = {
    "level1": {
        "level2": {
            "level3": {
                "ID1": {"children": [1,2,3,4,5]},
                "ID2": {"children": []},
                "ID3": {"children": [6,7,8,9,10]},
            }
        }
    }
}

new_ids = ["ID1", "ID3"]

for idx in new_ids:
    if idx in data_dict['level1']["level2"]["level3"]:
        data_dict['level1']["level2"]["level3"][idx]['children'] = []

print data_dict
'''
{
    'level1': {
        'level2': {
            'level3': {
                'ID2': {'children': []},
                'ID3': {'children': []},
                'ID1': {'children': []}
            }
        }
    }
}
'''
But if you have a more complicated dictionary
data_dict = {
    "level1a": {
        "level2a": {
            "level3a": {
                "ID2": {"children": []},
                "ID3": {"children": [6,7,8,9,10]},
            }
        }
    },
    "level1b": {
        "level2b": {
            "level3b": {
                "ID1": {"children": [1,2,3,4,5]},
            }
        }
    }
}

new_ids = ["ID1", "ID3"]

for level1 in data_dict.values():
    for level2 in level1.values():
        for level3 in level2.values():
            for idx in new_ids:
                if idx in level3:
                    level3[idx]['children'] = []

print data_dict
'''
{
    'level1a': {
        'level2a': {
            'level3a': {
                'ID2': {'children': []},
                'ID3': {'children': []}
            }
        }
    },
    'level1b': {
        'level2b': {
            'level3b': {
                'ID1': {'children': []}
            }
        }
    }
}
'''
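If the nesting depth varies from branch to branch, a recursive walk generalizes the fixed loops above (a sketch; clear_children is a made-up name):
def clear_children(node, new_ids):
    # Walk nested dicts and empty the 'children' list under any matching ID.
    if isinstance(node, dict):
        for key, value in node.items():
            if key in new_ids and isinstance(value, dict) and 'children' in value:
                value['children'] = []
            else:
                clear_children(value, new_ids)

clear_children(data_dict, {"ID1", "ID3"})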
