Snakemake expand() arguments - python

I inherited a complicated Snakemake setup. It uses a configfile that contains
{
    "sub": [
        1234
    ],
    "ses": [
        "1"
    ],
    "task": [
        "fake"
    ],
    "run": [
        "1"
    ],
    "acq": [
        "mb"
    ],
    "bids_dir": "../../bids"
}
In the all recipe, the input is built from calls to expand() that look like this:
expand('data/{task}/preproc/acq-{acq}/sub-{sub}/ses-{ses}/run-{run}/bold.nii', **config)
Then, I have a recipe that looks like this:
rule getRawFunc:
    input:
        rawFunc = config['bids_dir'] + '/sub-{sub}/ses-{ses}/func/sub-{sub}_ses-{ses}_task-{task}_acq-{acq}_run-{run}_bold.nii.gz'
    output:
        func = temp('data/{task}/preproc/acq-{acq}/sub-{sub}/ses-{ses}/run-{run}/bold.nii')
    shell:
        'gunzip -c {input} > {output}'
I do not understand why the rule needs config['bids_dir'] to get that value, while expand() seemingly needs nothing of the kind to fill in {sub} and the other placeholders.
I looked at the section about expand at
https://snakemake.readthedocs.io/en/latest/snakefiles/configuration.html#standard-configuration
That page and the tutorials explain the use of config['bids_dir'] well; it is just the **config part that I am not quite getting.
Further explication would be most appreciated!

In general, the expand function takes a template plus the keyword arguments used to fill it in, like so:
expand('{a}_{b}', a='some', b='test')
# this will return ['some_test'] (expand returns a list)
Now, in Python one can unpack a dictionary by placing two asterisks before it, as in '**some_dict'. This passes each entry of the dictionary as a 'key=value' keyword argument. In the example above, we get the same result by unpacking a dictionary:
some_dict = {'a': 'some', 'b': 'test'}
expand('{a}_{b}', **some_dict)
# this will return ['some_test']
You can read more details in this answer.
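To see the unpacking without running Snakemake, here is a minimal plain-Python sketch. The function expand_sketch is a hypothetical stand-in written for this illustration, not Snakemake's implementation; it only fills in the fields the template names and ignores every other key, which is an assumption of the sketch. The config dictionary mirrors the one in the question.

```python
from itertools import product
from string import Formatter


def expand_sketch(template, **values):
    """Hypothetical stand-in for expand(); NOT Snakemake's real code."""
    # Collect the {field} names that appear in the template, in order, once each.
    fields = list(dict.fromkeys(
        name for _, name, _, _ in Formatter().parse(template) if name
    ))
    # A list or tuple supplies several values; anything else counts as one value.
    choices = [
        list(values[f]) if isinstance(values[f], (list, tuple)) else [values[f]]
        for f in fields
    ]
    # One output string per combination of values.
    return [
        template.format(**dict(zip(fields, combo)))
        for combo in product(*choices)
    ]


config = {
    "sub": [1234],
    "ses": ["1"],
    "task": ["fake"],
    "run": ["1"],
    "acq": ["mb"],
    "bids_dir": "../../bids",
}

template = "data/{task}/preproc/acq-{acq}/sub-{sub}/ses-{ses}/run-{run}/bold.nii"

# Spelling out every keyword by hand ...
explicit = expand_sketch(
    template,
    task=config["task"], acq=config["acq"], sub=config["sub"],
    ses=config["ses"], run=config["run"],
)
# ... is what **config does for you: each key becomes a keyword argument.
unpacked = expand_sketch(template, **config)

print(unpacked)
# ['data/fake/preproc/acq-mb/sub-1234/ses-1/run-1/bold.nii']
```

By contrast, config['bids_dir'] in the rule is an ordinary dictionary lookup that picks out one value by hand, which is why it has to be written explicitly there.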

Related

List in Python help needed: how to Avoid calling repetitive code

I have a function shown below:
def _extract_parent(matched_list, json_data, info_type):
    return [json_data[match_lst][info_type] for match_lst in matched_list]
parent_lst = list(map(lambda x: _extract_parent(x, json_data, "parent_info"), flashtext_matched_pattern))
family_lst = list(map(lambda x: _extract_parent(x, json_data, "family_info"), flashtext_matched_pattern))
This code works, but I would like to simplify it.
Instead of calling map from the outside each time, I want to do the mapping inside the _extract_parent function and call it like this:
res = _extract_parent(flashtext_matched_pattern, json_data, "parent_info")
My JSON looks like this:
{
    "power series": {
        "parent_info": "abc",
        "family_info": "xyz",
        "base_info": "pqr"
    }
}
flashtext_matched_pattern is a list of lists that looks like the one below. It can contain empty lists as well as lists with values, as in this example:
[[],
['power series'],
[],
[],
[],
[]]
So if "power series" is matched, it gives me the info I requested, for example parent_info or family_info.
I want to extend the _extract_parent function to include the mapping part too, and call it like this:
parent_lst = _extract_parent(flashtext_matched_pattern, json_data, "parent_info")
family_lst = _extract_parent(flashtext_matched_pattern, json_data, "family_info")
How can I do that? Any help would be highly appreciated.
Or am I thinking in the wrong direction? Would it be a better solution to return, for each matching entry, a tuple containing the parent, family and base info? (If yes, how do I do that?)
Your long statements that call your code via map and a lambda are just an inconvenient way of writing another list comprehension:
parent_lst = [_extract_parent(x, json_data, "parent_info")
              for x in flashtext_matched_pattern]
If you want to do everything inside the function, you can do that with a nested list comprehension:
def _extract_parent(list_of_match_lists, json_data, info_type):
    return [
        [json_data[match][info_type] for match in match_list]
        for match_list in list_of_match_lists
    ]
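Here is a runnable sketch of the nested comprehension using the sample data from the question. The second helper, _extract_all, is a hypothetical addition for the tuple idea raised in the question; its name and default field order are assumptions, not something from the original code.

```python
json_data = {
    "power series": {
        "parent_info": "abc",
        "family_info": "xyz",
        "base_info": "pqr",
    }
}
flashtext_matched_pattern = [[], ["power series"], [], [], [], []]


def _extract_parent(list_of_match_lists, json_data, info_type):
    # Outer loop: one result list per inner match list.
    # Inner loop: one looked-up value per matched key.
    return [
        [json_data[match][info_type] for match in match_list]
        for match_list in list_of_match_lists
    ]


def _extract_all(list_of_match_lists, json_data,
                 info_types=("parent_info", "family_info", "base_info")):
    # Hypothetical variant: one tuple per match, holding every requested field.
    return [
        [tuple(json_data[match][t] for t in info_types) for match in match_list]
        for match_list in list_of_match_lists
    ]


parent_lst = _extract_parent(flashtext_matched_pattern, json_data, "parent_info")
family_lst = _extract_parent(flashtext_matched_pattern, json_data, "family_info")
all_info = _extract_all(flashtext_matched_pattern, json_data)

print(parent_lst)  # [[], ['abc'], [], [], [], []]
print(all_info)    # [[], [('abc', 'xyz', 'pqr')], [], [], [], []]
```

The tuple variant walks the match lists once instead of once per field, which may be preferable if you always need all three values.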

How to iterate over a python dictionary, setting the key of the dictionary as another dictionary's value

I come from a C++ background, I am new to Python, and I suspect this problem has something to do with [im]mutability.
I am building a JSON representation in Python that involves several layers of nested lists and dictionaries in one "object". My goal is to call jsonify on the end result and have it look like nicely structured data.
I hit a problem while building out an object:
approval_groups_list = list()
approval_group_dict = dict()
for groupMemKey, groupvals in groupsAndMembersDict.items():
    approval_group_dict["group_name"] = groupMemKey
    approval_group_dict["name_dot_numbers"] = groupvals  # groupvals is a list of strings
    approval_groups_list.append(approval_group_dict)
entity_approval_unit["approval_groups"] = approval_groups_list
The first iteration behaves as expected, but afterwards every object in the list mirrors whichever groupMemKey was processed last.
groupsAndMembersDict = {
    'Art': ['string.1', 'string.2', 'string.3'],
    'Math': ['string.10', 'string.20', 'string.30']
}
Expected result:
approval_groups:
[
    {
        "group_name": "Art",
        "name_dot_numbers": ['string.1', 'string.2', 'string.3']
    },
    {
        "group_name": "Math",
        "name_dot_numbers": ['string.10', 'string.20', 'string.30']
    }
]
Actual Result:
approval_groups:
[
    {
        "group_name": "Math",
        "name_dot_numbers": ['string.10', 'string.20', 'string.30']
    },
    {
        "group_name": "Math",
        "name_dot_numbers": ['string.10', 'string.20', 'string.30']
    }
]
What is happening, and how do I fix it?
Your problem is not the immutability, but the mutability of objects. You would get the same result in C++ if the list held pointers or references to one shared object instead of copies.
You construct approval_group_dict before the for loop and keep reusing it, so every list entry refers to the same dict. All you have to do is move the construction inside the for loop so that a new object is created on each iteration:
approval_groups_list = list()
for groupMemKey, groupvals in groupsAndMembersDict.items():
    approval_group_dict = dict()
    ...
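For completeness, here is a runnable version of that fix using the sample data from the question. The snake_case variable names are my own, and the list comprehension at the end is an equivalent alternative that I am adding as a sketch; it was not part of the original answer.

```python
groups_and_members = {
    "Art": ["string.1", "string.2", "string.3"],
    "Math": ["string.10", "string.20", "string.30"],
}

approval_groups_list = []
for group_name, members in groups_and_members.items():
    # A fresh dict is created on every iteration, so each list entry is its own object.
    approval_group_dict = {}
    approval_group_dict["group_name"] = group_name
    approval_group_dict["name_dot_numbers"] = members
    approval_groups_list.append(approval_group_dict)

# Equivalent alternative: build each dict directly inside a list comprehension.
approval_groups_alt = [
    {"group_name": group_name, "name_dot_numbers": members}
    for group_name, members in groups_and_members.items()
]

print([group["group_name"] for group in approval_groups_list])
# ['Art', 'Math']
```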
Through writing this question, it dawned on me to try a few things, including this, which fixed my problem. However, I still don't know exactly why it works. Perhaps it is more like a pointer/referencing problem?
approval_groups_list = list()
approval_group_dict = dict()
for groupMemKey, groupvals in groupsAndMembersDict.items():
    approval_group_dict["group_name"] = groupMemKey
    approval_group_dict["name_dot_numbers"] = groupvals
    approval_groups_list.append(approval_group_dict.copy())  # <== note, here is the difference ".copy()"
entity_approval_unit["approval_groups"] = approval_groups_list
EDIT: The problem turns out to be that Python always passes and stores references to objects (often called "pass by object reference"). Appending a dict to a list stores a reference to that same dict, not a copy, so later changes to the dict show up in every list entry that refers to it. Immutable objects such as strings and numbers are handled the same way; they simply cannot be changed in place, which makes them look as if they were passed by value. So in a way it did have to do with [im]mutability. Mostly it had to do with my lack of understanding of how Python handles references.
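A small sketch may make the reference behaviour concrete. All names here are made up for illustration. It shows that a list can hold the same dict twice, that .copy() produces a separate (but shallow) dict, and that rebinding a name to a new string does not affect other names that referred to the old one.

```python
shared = {}
entries = [shared, shared]          # two references to ONE dict
shared["group_name"] = "Math"

print(entries[0] is entries[1])     # True: both entries are the same object
print(entries[0]["group_name"])     # Math

members = ["string.1"]
original = {"name_dot_numbers": members}
snapshot = original.copy()          # a new dict object ...
original["group_name"] = "Art"      # ... so new keys do not leak into the copy
print("group_name" in snapshot)     # False

# The copy is shallow: the inner list is still shared by both dicts.
members.append("string.2")
print(snapshot["name_dot_numbers"])  # ['string.1', 'string.2']

# Strings are immutable: "changing" one really rebinds the name to a new object.
name = "Art"
alias = name
name = name + "!"
print(alias)                        # Art
```

If the inner lists also need to be independent, copy.deepcopy from the standard library would be the usual tool, though that is not needed for the problem in the question.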

How to extract nested JSON data?

I am trying to get a value from a data JSON. I have successfully traversed deep into the JSON data and almost have what I need!
Running this command in Python :
autoscaling_name = response['Reservations'][0]['Instances'][0]['Tags']
Gives me this :
'Tags': [{'Key': 'Name', 'Value': 'Trove-Dev-Inst : App WebServer'}, {'Key': 'aws:autoscaling:groupName', 'Value': 'CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT'}, {'Key': 'CodeDeployProvisioningDeploymentId', 'Value': 'd-4WTRTRTRT'}, {'Key': 'Environment', 'Value': 'ernie-dev'}]
I only want to get the value "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT". This is from the key "aws:autoscaling:groupName".
How can I further my command to only return the value "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT"?
Is this the full output? This is a dictionary containing a list of nested dictionaries, so you should treat it that way. Suppose it is called:
A = {
    "Tags": [
        {
            "Key": "Name",
            "Value": "Trove-Dev-Inst : App WebServer"
        },
        {
            "Key": "aws:autoscaling:groupName",
            "Value": "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT"
        },
        {
            "Key": "CodeDeployProvisioningDeploymentId",
            "Value": "d-4WTRTRTRT"
        },
        {
            "Key": "Environment",
            "Value": "ernie-dev"
        }
    ]
}
You first address the object, then the key in the dictionary, then the index within the list, and finally the key in that inner dictionary:
print(A['Tags'][1]['Value'])
Output:
CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT
EDIT: Based on what you are getting, you should try:
autoscaling_name = response['Reservations'][0]['Instances'][0]['Tags'][1]['Value']
You could also use glom. It is great for deeply nested data and has many uses that make complicated nested tasks easy.
For example, translating @Celius's answer:
from glom import glom
glom(A, 'Tags.1.Value')
Returns the same thing:
CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT
So to answer your original question you'd use:
glom(response, 'Reservations.0.Instances.0.Tags.1.Value')
The final code for this is:
tags = response['Reservations'][0]['Instances'][0]['Tags']
autoscaling_name = next(t["Value"] for t in tags if t["Key"] == "aws:autoscaling:groupName")
This also ensures that the correct tag is still found if the order of the entries in the JSON data changes.
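One caveat with a bare next(...) is that it raises StopIteration when no tag matches. Below is a minimal sketch that adds a default value, plus an alternative that turns the tag list into a plain dict. The get_tag helper and the trimmed response dictionary are assumptions made for this example; a real response contains many more fields.

```python
# Trimmed-down stand-in for the real response, keeping only the fields used here.
response = {
    "Reservations": [{
        "Instances": [{
            "Tags": [
                {"Key": "Name", "Value": "Trove-Dev-Inst : App WebServer"},
                {"Key": "aws:autoscaling:groupName",
                 "Value": "CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT"},
                {"Key": "CodeDeployProvisioningDeploymentId", "Value": "d-4WTRTRTRT"},
                {"Key": "Environment", "Value": "ernie-dev"},
            ]
        }]
    }]
}

tags = response["Reservations"][0]["Instances"][0]["Tags"]


def get_tag(tags, key, default=None):
    # next() with a second argument returns the default instead of raising
    # StopIteration when no tag has the requested key.
    return next((t["Value"] for t in tags if t["Key"] == key), default)


autoscaling_name = get_tag(tags, "aws:autoscaling:groupName")
missing = get_tag(tags, "no-such-key", default="n/a")

# Alternative: build a Key -> Value mapping once, then use normal dict lookups.
tag_map = {t["Key"]: t["Value"] for t in tags}

print(autoscaling_name)        # CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT
print(missing)                 # n/a
print(tag_map["Environment"])  # ernie-dev
```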
For anyone struggling to get their heads around list comprehensions and iterators, the cherrypicker package (pip install --user cherrypicker) does this sort of thing for you pretty easily:
from cherrypicker import CherryPicker
tags = CherryPicker(response['Reservations'][0]['Instances'][0]['Tags'])
tags(Key="aws:autoscaling:groupName")[0]["Value"].get()
which gives you 'CodeDeploy_Ernie-dev-Autoscaling-Deploy_d-4WTRTRTRT'. If you're expecting multiple values, omit the [0] to get back a list of all values that have an associated "aws:autoscaling:groupName" key.
This is probably all a bit overkill for your question, which can be solved easily with a simple list comprehension. But this approach might come in handy if you need to do more complicated things later, like matching on partial keys only (e.g. aws:* or something more complicated like a regular expression), or you need to filter based on the values in an intermediate layer of the nested object. This sort of task could lead to lots of complicated nested for loops or list comprehensions, whereas with CherryPicker it stays as a simple, potentially one-line command.
You can find out more about advanced usage at https://cherrypicker.readthedocs.io.

Modifying a python dictionary from user inputted dot notation

I'm trying to provide an API-like interface in my Django Python application that allows someone to input an id and also include key/value pairs with the request as form data.
For example the following field name and values for ticket 111:
ticket.subject = Hello World
ticket.group_id = 12345678
ticket.collaborators = [123, 4567, 890]
ticket.custom_fields: [{id: 32656147,value: "something"}]
On the backend, I have a corresponding dict that should match this structure (and I'd do validation). Something like this:
ticket: {
    subject: "some subject I want to change",
    group_id: 99999,
    collaborator_ids: [ ],
    custom_fields: [
        {
            id: 32656147,
            value: null
        }
    ]
}
1) I'm not sure exactly the best way to parse the dot notation there, and
2) Assuming I am able to parse it, how would I be able to change the values of the Dict to match what was passed in. I'd imagine maybe something like a class with these inputs?
class SetDictValueFromUserInput(userDotNotation, userNewValue, originalDict)
...
SetDictValueFromUserInput("ticket.subject", "hello world", myDict)
The fastest way is probably to split the string and index step by step. For example:
obj = "ticket.subject".split(".")
actual_obj = eval(obj[0])  # this is risky; there is a way around it if you use if statements and predefined variables instead
actual_obj[obj[1]] = value
To support deeper paths, such as ticket.subject.name, walk the intermediate keys with a for loop:
for key in obj[1:-1]:  # all the keys between the object name and the final key
    actual_obj = actual_obj[key]  # descend one level
actual_obj[obj[-1]] = value
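Here is a sketch of the same idea without eval, assuming the top-level name (ticket) is itself a key of the dictionary being modified. The function name set_by_path and the choice to reject unknown keys with a KeyError are my own assumptions, meant as a simple form of the validation mentioned in the question.

```python
def set_by_path(data, dotted_path, value):
    """Set data[k1][k2]...[kn] = value for a path like 'k1.k2.kn'.

    Hypothetical helper: only keys that already exist may be changed.
    """
    *parents, last = dotted_path.split(".")
    node = data
    for key in parents:
        node = node[key]  # raises KeyError if the path leaves the known structure
    if last not in node:
        raise KeyError(f"unknown field: {dotted_path}")
    node[last] = value


my_dict = {
    "ticket": {
        "subject": "some subject I want to change",
        "group_id": 99999,
        "collaborator_ids": [],
        "custom_fields": [{"id": 32656147, "value": None}],
    }
}

set_by_path(my_dict, "ticket.subject", "Hello World")
set_by_path(my_dict, "ticket.group_id", 12345678)

print(my_dict["ticket"]["subject"])   # Hello World
print(my_dict["ticket"]["group_id"])  # 12345678

try:
    set_by_path(my_dict, "ticket.not_a_field", 1)
except KeyError as err:
    print("rejected:", err)
```

Because the path is only ever used for dictionary lookups, user input never gets executed as code, which is the main risk with eval.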

Temporary names within an expression

I'm looking for a way to name a value within an expression to use it multiple times within that expression. Since the value is found inside the expression, I can't save it as a variable using a typical assign statement. I also want its use to be in the same function as the rest of the expression, so I would rather not break it out into a separate function.
More specifically, I enjoy comprehension. List/dictionary comprehension is my favorite Python feature. I'm trying to use both to coerce a dictionary of untrusted structure into a trusted structure (all fields exist, and their values are of the correct type). Without what I'm looking for, it would look something like this:
{
    ...
    'outer': [{
        ...
        'inner': {
            key: {
                ...
                'foo': {
                    'a': get_foo_from_value(value)['a'],
                    'b': get_foo_from_value(value)['b'],
                    ...
                }
            } for key, value in get_inner_from_outer(outer)
        }
    } for outer in get_outer_from_dictionary(dictionary)]
}
Those function calls are actually expressions, but I would like to only evaluate get_foo_from_value(value) once. Ideally there would be something like this:
'foo': {
    'a': foo['a'],
    'b': foo['b'],
    ...
} with get_foo_from_value(value) as foo
So far the options I've come up with are single-item generators and lambda expressions. I'm going to include an example of each as an answer so they can be discussed separately.
lambda solution
'foo': (lambda foo: {
    'a': foo['a'],
    'b': foo['b'],
    ...
})(get_foo_from_value(value))
I feel like this one isn't as readable as it could be. I also don't like creating a lambda that only gets called once. I like that the name appears before it's used, but I don't like the separation of its name and value.
single-item generator solution
This is currently my favorite solution to the problem (I like comprehension, remember).
'foo': next({
    'a': foo['a'],
    'b': foo['b'],
    ...
} for foo in [get_foo_from_value(value)])
I like it because the generator expression matches the rest of the comprehension in the expression, but I'm not a huge fan of the next and having to wrap get_foo_from_value(value) in brackets.
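To check that both options really evaluate the call only once per item, here is a small self-contained sketch. The body of get_foo_from_value and the sample data are invented for the test; only the two expression patterns come from the answers above.

```python
calls = []


def get_foo_from_value(value):
    # Stand-in implementation that records every call so we can count them.
    calls.append(value)
    return {"a": value + 1, "b": value * 2}


inner = {"x": 1, "y": 2}

# Option 1: immediately invoked lambda; foo is the lambda's parameter.
via_lambda = {
    key: (lambda foo: {"a": foo["a"], "b": foo["b"]})(get_foo_from_value(value))
    for key, value in inner.items()
}

# Option 2: single-item generator; foo is bound by iterating a one-element list.
via_generator = {
    key: next({"a": foo["a"], "b": foo["b"]} for foo in [get_foo_from_value(value)])
    for key, value in inner.items()
}

print(via_lambda)   # {'x': {'a': 2, 'b': 2}, 'y': {'a': 3, 'b': 4}}
print(len(calls))   # 4: one call per item for each of the two options
```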
