How to remove duplicated output when `kur dump mnist.yml`? - python

The problem
When run kur dump mnist.yml, the output has duplicated parts: training is duplicated part of train, testing is for test, evaluation is for evaluate and etc. See example below:
{
"evaluate": {
"data": [
{
"mnist": {
"images": {
"checksum": "8d422c7b0a1c1c79245a5bcf07fe86e33eeafee792b84584aec276f5a2dbc4e6",
"path": "~/kur",
"url": "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz"
},
"labels": {
"checksum": "f7ae60f92e00ec6debd23a6088c31dbd2371eca3ffa0defaefb259924204aec6",
"path": "~/kur",
"url": "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"
}
}
}
],
"destination": "mnist.results.pkl",
"hooks": [
"mnist"
],
"provider": {
"num_batches": null
},
"weights": "mnist.w"
},
"evaluation": {
"data": [
{
"mnist": {
"images": {
"checksum": "8d422c7b0a1c1c79245a5bcf07fe86e33eeafee792b84584aec276f5a2dbc4e6",
"path": "~/kur",
"url": "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz"
...
So, how do we remove the duplicated parts from the output?

The point of having the duplicate section names is be more user-friendly. That way, someone can use train or training to define their training-related section, rather than thinking, "Hmm... I don't remember what the key is. Is it 'train' or 'training'? I guess I'll look in the documentation."
It is true that the JSON dump can be more confusing, though. On the other hand, it is also a more advanced feature. Maybe a compromise would be to hide the duplicate keys when JSON is dumped, but this could also lead to other forms of confusion (e.g., "why did Kur accept 'training' when it isn't even a valid key, nor does it appear in a dump?").

One solution
comment out parts of the lines like below in kur/kur/kurfile.py:
builtin = {
'settings' : ('settings', ),
'train' : ('train',),# 'training'),
'validate' : ('validate',),# 'validation'),
'test' : ('test', ), #'testing'),
'evaluate' : ('evaluate', ), #'evaluation'),
'templates' : ('templates', ),
'model' : ('model', ),
'loss' : ('loss', )
}
I have experimented through kur.Kurfile.parse and kur.Kurfile._parse_section, so far I don't see any side effect of doing so.

Related

AWS WAF CDK Python How to change rule action

Here is my python cdk code which create 2 rules "AWS-AWSManagedRulesCommonRuleSet" and "AWS-AWS-ManagedRulesAmazonIpReputationList".
In each rule there are child rules that i can change their Rule Actions to Count, the question is how can i add this to my code, i didn't find any good explanation for those child rules.
Added some changes but still doesn't work, i get this error:
Resource handler returned message: "Error reason: You have used none or multiple values for a field that requires exactly one value., field: RULE, parameter: Rule (Service: Wafv2, Status Code: 400, Request ID: 248d9235-bd01-49f4-963b-109bac2776c5, Extended Request ID: null)" (RequestToken: 8bb5****-****-3e95-****-
8e336ae3eed4, HandlerErrorCode: InvalidRequest)
the code:
class PyCdkStack(core.Stack):
def __init__(self, scope: core.Construct, construct_id: str, **kwargs) -> None:
super().__init__(scope, construct_id, **kwargs)
web_acl = wafv2.CfnWebACL(
scope_=self, id='WebAcl',
default_action=wafv2.CfnWebACL.DefaultActionProperty(allow={}),
scope='REGIONAL',
visibility_config=wafv2.CfnWebACL.VisibilityConfigProperty(
cloud_watch_metrics_enabled=True,
sampled_requests_enabled=True,
metric_name='testwafmetric',
),
name='Test-Test-WebACL',
rules=[
{
'name': 'AWS-AWSManagedRulesCommonRuleSet',
'priority': 1,
'statement': {
'RuleGroupReferenceStatement': {
'vendorName': 'AWS',
'name': 'AWSManagedRulesCommonRuleSet',
'ARN': 'string',
"ExcludedRules": [
{
"Name": "CrossSiteScripting_QUERYARGUMENTS"
},
{
"Name": "GenericLFI_QUERYARGUMENTS"
},
{
"Name": "GenericRFI_QUERYARGUMENTS"
},
{
"Name": "NoUserAgent_HEADER"
},
{
"Name": "SizeRestrictions_QUERYSTRING"
}
]
}
},
'overrideAction': {
'none': {}
},
'visibilityConfig': {
'sampledRequestsEnabled': True,
'cloudWatchMetricsEnabled': True,
'metricName': "AWS-AWSManagedRulesCommonRuleSet"
}
},
]
)
The Cfn- constructs are a one to one mapping to the cloudformation resources. You can simply check the docs for aws::wafv2::webacl.
For an example on how to exclude in cloudformation, see below. Note that object keys need to start with lowercase in order for CDK to process them.
{
"name": "AWS-AWSBotControl-Example",
"priority": 5,
"statement": {
"managedRuleGroupStatement": {
"vendorName": "AWS",
"name": "AWSManagedRulesBotControlRuleSet",
"excludedRules": [
{
"name": "CategoryVerifiedSearchEngine"
},
{
"name": "CategoryVerifiedSocialMedia"
}
]
},
"visibilityConfig": {
"sampledRequestsEnabled": true,
"cloudWatchMetricsEnabled": true,
"metricName": "AWS-AWSBotControl-Example"
}
}
This actually sets the two mentioned rules to Count mode. See https://docs.aws.amazon.com/waf/latest/developerguide/web-acl-rule-group-settings.html#web-acl-rule-group-rule-to-count. Note it sais:
Rules that you alter like this are described as being excluded rules in the rule group. If you have metrics enabled, you receive COUNT metrics for each excluded rule. This change alters how the rules in the rule group are evaluated.

Save reference to a python dictionary (from a json file) item in a variable

I'm trying to save the reference to a value in a json file where item orders could not be guaranteed. So far, what I have for a dataset like this one:
"Values": [
{
"Object": "DFC_Asset_05",
"Properties": [
{
"Property": "WeightKilograms",
"Value Offset": 5
},
{
"Property": "WeightPounds",
"Value Offset": 10
}
]
},
{
"Object": "DFC_Asset_05",
"Properties": [
{
"Property": "Name",
"Value Offset": 25
},
{
"Property": "ShortName",
"Value Offset": 119
}
]
}
]
and retrieving this object:
{
"Property": "ShortName",
"Value Offset": 119
}
Is a string like this:
reference = "[Object=DFC_Asset_06][Properties][Property=Name]"
Which looks nice and understandable in a string, but it's very unclean to find the referenced value as I must first parse the reference it with a regex then loop in the data to retrieve the matching item.
Am I doing this wrong? Is there a better way to do this? I looked at the reduce() function however it seems like it's made for dictionaries with static data. For example, I could not save the direct keys:
reference = "[1][Properties][1]"
reference_using_reduce = [1, "Properties", 1]
As they might not always be in that order
You can run "queries" on JSONs without referencing specific indexes using the pyjq module:
query = (
'.Values[]' # On all the items in "Values"
'|select(."Object" == "DFC_Asset_06")' # Find key "Object" which holds this value
'|."Properties"[]' # And get all the items of "Properties"
'|select(."Property" == "Name")' # Where the key "Property" holds the value "Name"
)
pyjq.first(query, d)
Result:
{'Property': 'Name', 'Value Offset': 25}
You can read more about jq in the documentations.

Pymongo: Unable to find record from mongodb

I have a collection containing country records, I need to find particular country with uid and it's countryId
Below is the sample collection data:
{
"uid": 15024,
"countries": [{
"countryId": 123,
"popullation": 45000000
},
{
"countryId": 456,
"poppulation": 9000000000
}
]
},
{
"uid": 15025,
"countries": [{
"countryId": 987,
"popullation": 560000000
},
{
"countryId": 456,
"poppulation": 8900000000
}
]
}
I have tried with below query in in python but unable to find any result:
foundRecord = collection.find_one({"uid" : 15024, "countries.countryId": 456})
but it return None.
Please help and suggest.
I think following will work better :
foundRecord = collection.find_one({"uid" : 15024,
"countries" : {"$elemMatch" : { "countryId" : 456 }})
Are you sure you're using the same Database / Collection source?
Seems that you're saving results on another collection.
I've tried to reproduce your problem and it works on my mongodb ( note that I'm using v4)
EDIT: Would be nice to have the piece of code where you're defining "collection"

Parse Json data

a= [{
"data" : {
"check": true,
},
"AMI": {
"status": 1,
"firewall":{
"status": enable
},
"d_suffix": "x.y.com",
"id": 4
},
"tags": [ #Sometime tags could be like "tags": ["default","auto"]
"default"
],
"hostname": "abc.com",
}
]
How to get a hostname on the basis of tags?I am trying to implement it using
for i in a:
if i['tags'] == 'default':
output = i['hostname']
but it's failing because 'tags' is a list which is not mapping to hostname key.Is there any way i can get hostname on the basis of 'tags'?
Use in to test if something is in a list. You also need to put default in quotes to make it a string.
for i in a:
if 'default' in i['tags']:
output = i['hostname']
break
If you only need to find one match, you should break out of the loop once you find it.
If you need to find multiple matches, use #phihag's answer with the list comprehension.
To get all hostnames tagged as default, use a list comprehension:
def_hostnames = [i['hostname'] for i in a if 'default' in i['tags']]
print('Default hostnames: %s' % ','.join(def_hostnames))
If you only want the first hit, either use def_hostnames[0] or the equivalent generator expression:
print('first: %s' % next(i['hostname'] for i in a if 'default' in i['tags']))
Your current code fails because it uses default, which is a variable named default. You want to look for a string default.
Make sure that you have everything in Json format like
a= [{
"data" : {
"check": True,
},
"AMI": {
"status": 1,
"firewall":{
"status": "enable"
},
"d_suffix": "x.y.com",
"id": 4
},
"tags": [
"default"
],
"hostname": "abc.com",
}
]
and then you can easily get it by using in
for i in a:
if 'default' in i['tags']:
output = i['hostname']

Formatting JSON output

I have a JSON file with key value pair data. My JSON file looks like this.
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally",
"helpfullness": "3.3",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=111119",
"reviews": [
{
"attendance": "N/A",
"class": "CHEM 1A",
"textbook_use": "It's a must have",
"review_text": "Tests were incredibly difficult (averages in the 40s) and lectures were essentially useless. I attended both lectures every day and still was unable to grasp most concepts on the midterms. Scope out a good GSI to get help and ride the curve."
},
{
"attendance": "N/A",
"class": "CHEMISTRY1A",
"textbook_use": "Essential to passing",
"review_text": "Saykally really isn't as bad as everyone made him out to be. If you go to his lectures he spends about half the time blowing things up, but if you actually read the texts before his lectures and pay attention to what he's writing/saying, you'd do okay. He posts practice tests that were representative of actual tests and curves the class nicely!"
}]
{
{
"first_name": "Laura",
"last_name": "Stoker",
"helpfullness": "4.1",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=536606",
"reviews": [
{
"attendance": "N/A",
"class": "PS3",
"textbook_use": "You need it sometimes",
"review_text": "Stoker is by far the best professor. If you put in the effort, take good notes, and ask questions, you will be fine in the class. As far as her lecture, she does go a bit fast, but her lecture is in the form of an outline. As long as you take good notes, you will have everything you need for exams. She is funny and super nice if you speak with her"
},
{
"attendance": "Mandatory",
"class": "164A",
"textbook_use": "Barely cracked it open",
"review_text": "AMAZING professor. She has a good way of keeping lectures interesting. Yes, she can be a little everywhere and really quick with her lecture, but the GSI's are useful to make sure you understand the material. Oh, and did I mention she's hilarious!"
}]
}]
So I'm trying to do multiple things.
I'm trying to get the most mentioned ['class'] key under reviews. Then get the class name and the times it was mentioned.
Then I'd like to output my format in this manner. Also under professor array. It's just the info of professors for instance for CHEM 1A, CHEMISTRY1A - It's Richard Saykally.
{
courses:[
{
"course_name" : # class name
"course_mentioned_times" : # The amount of times the class was mentioned
professors:[ #The professor array should have professor that teaches this class which is in my shown json file
{
'first_name' : 'professor name'
'last_name' : 'professor last name'
}
}
So I'd like to sort my json file key-value where I have max to minimum. So far all I've been able to figure out isd
if __name__ == "__main__":
open_json = open('result.json')
load_as_json = json.load(open_json)['professors']
outer_arr = []
outer_dict = {}
for items in load_as_json:
output_dictionary = {}
all_classes = items['reviews']
for classes in all_classes:
arr_info = []
output_dictionary['class'] = classes['class']
output_dictionary['first_name'] = items['first_name']
output_dictionary['last_name'] = items['last_name']
#output_dictionary['department'] = items['department']
output_dictionary['reviews'] = classes['review_text']
with open('output_info.json','wb') as outfile:
json.dump(output_dictionary,outfile,indent=4)
I think this program does what you want:
import json
with open('result.json') as open_json:
load_as_json = json.load(open_json)
courses = {}
for professor in load_as_json['professors']:
for review in professor['reviews']:
course = courses.setdefault(review['class'], {})
course.setdefault('course_name', review['class'])
course.setdefault('course_mentioned_times', 0)
course['course_mentioned_times'] += 1
course.setdefault('professors', [])
prof_name = {
'first_name': professor['first_name'],
'last_name': professor['last_name'],
}
if prof_name not in course['professors']:
course['professors'].append(prof_name)
courses = {
'courses': sorted(courses.values(),
key=lambda x: x['course_mentioned_times'],
reverse=True)
}
with open('output_info.json', 'w') as outfile:
json.dump(courses, outfile, indent=4)
Result, using the example input in the question:
{
"courses": [
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "PS3",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "164A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEM 1A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEMISTRY1A",
"course_mentioned_times": 1
}
]
}

Categories