Python: Mutability and dictionaries in config

I want to keep some large, static dictionaries in config to keep my main application code clean. Another reason for doing that is so the dicts can be occasionally edited without having to touch the application.
I thought a good solution was using a JSON config, as described here:
http://www.ilovetux.com/Using-JSON-Configs-In-Python/
JSON is a natural, readable format for this type of data. Example:
{
  "search_dsl_full": {
    "function_score": {
      "boost_mode": "avg",
      "functions": [
        {
          "filter": {
            "range": {
              "sort_priority_inverse": {
                "gte": 200
              }
            }
          },
          "weight": 2.4
        }
      ],
      "query": {
        "multi_match": {
          "fields": [
            "name^10",
            "search_words^5",
            "description",
            "skuid",
            "backend_skuid"
          ],
          "operator": "and",
          "type": "cross_fields"
        }
      },
      "score_mode": "multiply"
    }
  }
}
The big problem is that when I import it into my Python app and set a dict equal to it like this:
import json

with open("config.json", "r") as fin:
    config = json.load(fin)
...
def create_query():
    query_dsl = config['search_dsl_full']
    return query_dsl
and then later, only when a certain condition is met, I need to update that dict like this:
if (special condition is met):
    query_dsl['function_score']['query']['multi_match']['operator'] = 'or'
Since query_dsl is a reference, updating it updates the config dictionary too. So when I call the function again, it reflects the updated-for-special-condition version ("or") rather than the desired config default ("and").
I realize this is a newb issue (yes, I'm a python newb), but I can't seem to figure out a 'pythonic' solution. I'm trying to not be a hack.
Possible options:
When I set query_dsl equal to the config dict, use copy.deepcopy()
Figure out how to make all nested slices of the config dictionary immutable
Maybe find a better way to accomplish what I'm trying to do? I'm totally open to this whole approach being a preposterous newbie mistake.
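The first option is the most direct fix: copy.deepcopy() duplicates every nested dict, so the caller can mutate its copy without touching the loaded config. A minimal sketch, using a stripped-down stand-in for the dict loaded from config.json:

```python
import copy

# Stand-in for the dict loaded from config.json
config = {
    "search_dsl_full": {
        "function_score": {
            "query": {"multi_match": {"operator": "and"}}
        }
    }
}

def create_query():
    # Deep copy: nested dicts are duplicated too, so mutations on
    # the returned dict never reach `config`.
    return copy.deepcopy(config["search_dsl_full"])

query_dsl = create_query()
query_dsl["function_score"]["query"]["multi_match"]["operator"] = "or"

# The loaded config still holds the default.
print(config["search_dsl_full"]["function_score"]["query"]["multi_match"]["operator"])  # and
```

A plain `dict(...)` or `.copy()` would not be enough here: those are shallow copies, and the nested `multi_match` dict would still be shared.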
Any help appreciated. Thanks!

Related

How to collect specific values in a deeply nested structure with Python

I'm trying to get a list of instance IDs that I get from the describe_instances call using the boto3 API in my Python script. For those of you who aren't aware of AWS, I can post detailed code after removing the specifics if you need it. I'm trying to access an item from a structure like this:
u'Reservations': [
    {
        u'Instances': [
            {
                u'InstanceId': 'i-0000ffffdd'
            },
            { },  ### each of these dicts contains an id like above
            { },
            { },
            { }
        ]
    },
    {
        u'Instances': [
            { },
            { },
            { },
            { },
            { }
        ]
    },
    {
        u'Instances': [
            { }
        ]
    }
]
I'm currently accessing it like
instanceLdict = []
instanceList = []
instances = []
for r in reservations:
    instanceList.append(r['Instances'])
for ilist in instanceList:
    for i in ilist:
        instanceLdict.append(i)
for i in instanceLdict:
    instances.append(i['InstanceId'])  #### i need them in a list
print instances
fyi: my reservations variable contains the whole list of u'Reservations':
I feel this is inefficient, and since I'm a Python newbie I really think there must be a better way to do this than the multiple for loops. Is there a better way? Kindly point me to the structure/method that might be useful in my scenario.
Your solution is not actually that inefficient, except that you don't have to build all those intermediate lists just to collect the instance ids at the end. What you could do is a nested loop that keeps only what you need (note that r['Instances'] is already the list of instance dicts, so one level of looping in your version was iterating over dict keys by mistake):
instances = list()
for r in reservations:
    for i in r['Instances']:
        instances.append(i['InstanceId'])  # that's what you're looping for
Yes, there are ways to do this with shorter code, but explicit is better than implicit, so stick with what you can read best. Python is quite good at iteration; remember maintainability first, performance second. Also, this part is hardly the bottleneck compared to all those API calls, DB lookups, etc.
But if you really insist on a fancy one-liner, have a look at the itertools helpers; chain.from_iterable() is what you need:
from itertools import chain
instances = [i['InstanceId'] for i in chain.from_iterable(r['Instances'] for r in reservations)]
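With a small, made-up reservations list (shaped like a describe_instances response) you can check that the nested loop and the chain.from_iterable one-liner agree:

```python
from itertools import chain

# Hypothetical sample shaped like a describe_instances response
reservations = [
    {'Instances': [{'InstanceId': 'i-0001'}, {'InstanceId': 'i-0002'}]},
    {'Instances': [{'InstanceId': 'i-0003'}]},
]

# Nested-loop version
instances = []
for r in reservations:
    for i in r['Instances']:
        instances.append(i['InstanceId'])

# One-liner version: chain.from_iterable flattens the per-reservation lists
flat = [i['InstanceId']
        for i in chain.from_iterable(r['Instances'] for r in reservations)]

print(instances)  # ['i-0001', 'i-0002', 'i-0003']
```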

EventRegistry.org api query string for multiple keywords

In my project I have to use the Eventregistry.org events API to search for specific articles with specific keywords.
The problem is that if I add more than one keyword, it seems to perform an "AND" sort of search instead of an "OR". (searched for ipad alone ~8k results, searched for surface alone ~40k results, searched for ipad surface together got 9 results)
I am using CakePHP 3, but I think the language is not the problem; I think it is the final URL. I went through the Python project and found some Query.AND(params) and Query.OR(params), so I assume this can be done, but I don't know Python.
This is my url:
http://eventregistry.org/json/article?ignoreKeywords=&keywords=surface%20ipad&lang=eng&action=getArticles&articlesSortBy=date&resultType=articles&articlesCount=20
Here you can test the API
This is the Python repo on github
Well, their documentation is not overly informative, to say the least.
Looks like they're using some kind of query language. You could probably figure out what things look like by debugging the request generated by the Python script, but if you're not familiar with Python, try using their web interface instead; apparently it supports boolean conditions (OR, AND, NOT, the latter expressed as -), which are composed into a JSON structure:
http://blog.eventregistry.org/.../phrase-search-boolean-keyword-queries-web-interface
http://blog.eventregistry.org/2017/05/15/number-changes-api-users
Check your browser's network console to inspect the generated URLs; they'll contain a query key that holds a JSON string like this:
{
  "$query": {
    "$and": [
      {
        "$or": [
          {
            "keyword": {
              "$and": [
                "ipad"
              ]
            }
          },
          {
            "keyword": {
              "$and": [
                "surface"
              ]
            }
          }
        ]
      }
    ]
  }
}
That looks a little different from what the blog post shows, but it seems that the more compact variant shown there works too:
{
  "$query": {
    "keyword": {
      "$or": [
        "ipad",
        "surface"
      ]
    }
  }
}
So the final URL could look like this:
http://eventregistry.org/json/article?action=getArticles&articlesCount=20&articlesSortBy=date&resultType=articles&query={"$query":{"keyword":{"$or":["ipad","surface"]}}}
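Rather than hand-escaping the JSON inside the URL, the standard library can do the quoting. A minimal sketch (parameter names copied from the URL above; I haven't hit the live endpoint, so treat it as illustrative):

```python
import json
from urllib.parse import urlencode

# The $or query from the answer above
query = {"$query": {"keyword": {"$or": ["ipad", "surface"]}}}

params = {
    "action": "getArticles",
    "articlesCount": 20,
    "articlesSortBy": "date",
    "resultType": "articles",
    # compact separators keep the query string short
    "query": json.dumps(query, separators=(",", ":")),
}

# urlencode percent-escapes the braces, quotes and $ signs for us
url = "http://eventregistry.org/json/article?" + urlencode(params)
print(url)
```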

How to copy a python script which includes dictionaries to a new python script?

I have a Python script which contains dictionaries and is used as input by another Python script which performs calculations. I want to use the first script to create more scripts with the exact same dictionary structure but different values for the keys.
Original Script: Car1.py
Owner = {
    "Name": "Jim",
    "Surname": "Johnson",
}

Car_Type = {
    "Make": "Ford",
    "Model": "Focus",
    "Year": "2008"
}

Car_Info = {
    "Fuel": "Gas",
    "Consumption": 5,
    "Max Speed": 190
}
I want to be able to create more input files with identical format but for different cases, e.g.
New Script: Car2.py
Owner = {
    "Name": "Nick",
    "Surname": "Perry",
}

Car_Type = {
    "Make": "BMW",
    "Model": "528",
    "Year": "2015"
}

Car_Info = {
    "Fuel": "Gas",
    "Consumption": 10,
    "Max Speed": 280
}
So far, I have only seen answers that print just the keys and the values to a new file, but not the actual name of the dictionary as well. Can someone provide some help? Thanks in advance!
If you really want to do it that way (not recommended, for the reasons stated in the comment by spectras, who also suggests good alternatives) and import your input Python file:
This question has answers on how to read out the dictionary names from the imported module (using vars() on the module while filtering out names that start with "__").
Then get the new values for the dictionary entries and construct the new dicts.
Finally, you need to write an exporter that takes care of storing the data in a Python-readable form, just like you would when constructing a normal text file.
I do not see any advantage over just storing it in a proper storage format.
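The exporter step can be as simple as pprint.pformat() on each dictionary, which emits valid Python literals. A minimal sketch (the new values here are just the Car2 data from the question; in practice they would come from wherever your new cases originate):

```python
import pprint

# Hypothetical new values for the generated script
new_data = {
    "Owner": {"Name": "Nick", "Surname": "Perry"},
    "Car_Type": {"Make": "BMW", "Model": "528", "Year": "2015"},
    "Car_Info": {"Fuel": "Gas", "Consumption": 10, "Max Speed": 280},
}

# Render each dict as `Name = {...}`, keeping the file importable Python
lines = []
for name, d in new_data.items():
    lines.append("%s = %s\n" % (name, pprint.pformat(d)))
script = "\n".join(lines)

with open("Car2.py", "w") as fout:
    fout.write(script)
```

The generated file can then be imported (or exec'd) exactly like the hand-written Car1.py.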
Read the file with something like
text = open('yourfile.py', 'r').read().split('\n')
and then interpret the list of strings you get... after that you can save it with something like
new_text = open('newfile.py', 'w')
for line in text:
    new_text.write(line + '\n')  # split('\n') stripped the newlines, so add them back
new_text.close()
as spectras said earlier, not ideal... but if that's what you want to do... go for it

How to define and select groups of values using configobj?

I would like to define several groups of values where the values of a particular group are used if that group is selected.
Here's a an example to make that clearer:
[environment]
type=prod
[prod]
folder=data/
debug=False
[dev]
folder=dev_data/
debug=True
Then to use it:
print config['folder'] # prints 'data/' because config['environment']=='prod'
Is there a natural or idiomatic way to do this in configobj or otherwise?
Additional Info
My current thoughts are overwriting or adding to the resulting config object using some logic post parsing the config file. However, this feels contrary to the nature of a config file, and feels like it would require somewhat complex logic to validate.
I know this is maybe not exactly what you're searching for, but have you considered using JSON for easy nested access?
For example, if your config file looks like
{
    "environment": {
        "type": "prod"
    },
    "[dev]": {
        "debug": "True",
        "folder": "dev_data/"
    },
    "[prod]": {
        "debug": "False",
        "folder": "data/"
    }
}
you can access it with the [dev] or [prod] key to get your folder:
>>> config = json.loads(config_data)
>>> config['[dev]']['folder']
'dev_data/'
>>> config['[prod]']['folder']
'data/'
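Whichever format you use, the selection logic after parsing stays a single lookup: read the active environment name, then index into that section. A sketch with plain dicts standing in for the parsed config (section names without brackets, as in the original configobj example):

```python
# Parsed config, as configobj or json.load would produce it
config = {
    "environment": {"type": "prod"},
    "prod": {"folder": "data/", "debug": "False"},
    "dev": {"folder": "dev_data/", "debug": "True"},
}

# Select the active section once, then use it everywhere
active = config[config["environment"]["type"]]
print(active["folder"])  # data/
```

This keeps the post-parse logic to one line and avoids merging sections into the top-level config object.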

How to get a list of all document types

I use elasticsearch_dsl in Python to do searching, and I really like it. But the thing I do not know how to implement is how to get a list of all the different document types. The catch is that the type field plays almost the same role for me as a table name in SQL, and what I want to do is somehow mimic the SHOW TABLES command.
I don't know how to do this in Python, but from the Elasticsearch point of view, this is how the request looks:
GET /_all/_search?search_type=count
{
    "aggs": {
        "NAME": {
            "terms": {
                "field": "_type",
                "size": 100
            }
        }
    }
}
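The same terms aggregation on _type can be built as a plain request body in Python and sent with the official client. A sketch (the bucket name "doc_types" is arbitrary, and the commented-out part is untested against a live cluster):

```python
# Raw aggregation body, equivalent to the GET request above;
# "size": 0 suppresses hits, the modern replacement for search_type=count
body = {
    "size": 0,
    "aggs": {
        "doc_types": {  # bucket name is arbitrary
            "terms": {"field": "_type", "size": 100}
        }
    }
}

# With the official elasticsearch client it would be sent like this:
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# resp = es.search(index="_all", body=body)
# types = [b["key"] for b in resp["aggregations"]["doc_types"]["buckets"]]

print(body["aggs"]["doc_types"]["terms"])
```

Each returned bucket key is one document type, which is the SHOW TABLES-like listing the question asks for.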
