Error when formatting large string - How to detect error? - python

I am pretty new to Python and I am currently working on formatting a large string that I need for a library I am using.
The problem is that I do not understand where exactly within the large format string the error happens. More precisely, I get an error of the form
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2731, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
  File "<ipython-input-3-f6a3bb7fe2f9>", line 13, in <module>
    trainCV = trainCV % (train_params)
ValueError: unsupported format character ',' (0x2c) at index 2726
Is there a way to precisely detect the line where the error occurs?
My complete code looks like this:
trainCV = open('Conv_Constructor.yaml', 'r').read()
train_params = {'batch_size': 100,
                'output_channels_h2': 64,
                'conv_kernel_size': 8,
                'pool_size': 2,
                'stride_size': 1,
                'output_channels_h3': 64,
                'num_classes': 6,
                'valid_stop': 4200,
                'test_start': 4200,
                'test_stop': 4400,
                'max_epochs': 5}
trainCV = trainCV % (train_params)
print trainCV
And the Conv_Constructor.yaml file I am trying to format as a string is the following
# ---------- INPUTS ---------
#
# batch_size
# output_channels_h2
# conv_kernel_size
# pool_size
# stride_size
# output_channels_h3
# num_classes
# valid_stop
# test_start
# test_stop
# max_epochs
##################################################################
!obj:pylearn2.train.Train {
    dataset: !obj:pylearn2.official_train_data.load_data {
        start: 0,
        stop: 4000
        # one_hot: 1,
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: %(batch_size)i,
        input_space: !obj:pylearn2.space.Conv2DSpace {
            shape: [32, 32],
            num_channels: 1,
            axes = ('b',0,1,'c')
        },
        layers: [ !obj:pylearn2.models.mlp.ConvRectifiedLinear {
            layer_name: 'h2',
            output_channels: %(output_channels_h2)i,
            #params : !pkl: 'dae_layer_1_weights.plk',
            irange: .05,
            kernel_shape: [%(conv_kernel_size)i, %(conv_kernel_size)i],
            pool_shape: [%(pool_size)i, %(pool_size)i],
            pool_stride: [%(stride_size)i, %(stride_size)i],
            max_kernel_norm: 1.9365
        }, !obj:pylearn2.models.mlp.ConvRectifiedLinear {
            layer_name: 'h3',
            output_channels: %(output_channels_h3)i,
            #params : !pkl: 'dae_layer_1_weights.plk',
            irange: .05,
            kernel_shape: %(conv_kernel_size)i, %(conv_kernel_size)i],
            pool_shape:[%(pool_size)i, %(pool_size)i],
            pool_stride: [%(stride_size)i, %(stride_size)i],
            max_kernel_norm: 1.9365
        }, !obj:pylearn2.models.mlp.Softmax {
            max_col_norm: 1.9365,
            layer_name: 'y',
            n_classes: %(num_classes)i,
            istdev: .05
        }
        ],
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        batch_size: %(batch_size)i,
        learning_rate: .01,
        init_momentum: .5,
        monitoring_dataset:
        {
            'valid' : !obj:pylearn2.official_train_data.load_data {
                start: 4000,
                stop: %(valid_stop)i
                #one_hot: 1,
            },
            'test' : !obj:pylearn2.official_train_data.load_data {
                start: %(test_start),
                stop: %(test_stop)
                #one_hot: 1,
            }
        },
        cost: !obj:pylearn2.costs.cost.SumOfCosts { costs: [
            !obj:pylearn2.costs.cost.MethodCost {
                method: 'cost_from_X'
            }, !obj:pylearn2.costs.mlp.WeightDecay {
                coeffs: [ .00005, .00005, .00005 ]
            }
        ]
        },
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.MonitorBased {
                    channel_name: "valid_y_misclass",
                    prop_decrease: 0.50,
                    N: 50
                },
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: %(max_epochs)i
                },
            ]
        },
    },
    extensions:
    [ !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
        channel_name: 'valid_y_misclass',
        save_path: "%(save_path)s/convolutional_network_best.pkl"
    }, !obj:pylearn2.training_algorithms.sgd.MomentumAdjustor {
        start: 1,
        saturate: 10,
        final_momentum: .99
    }
    ]
}

You can locate the error more easily by processing each line separately instead of the whole string at once.
Replace
trainCV = trainCV % (train_params)
with
trainCV = trainCV.split('\n')
t1 = []
try:
    for i, t in enumerate(trainCV):
        t1.append(t % train_params)
except ValueError:
    print 'Error in line {}:'.format(i)
    print t
    raise
and you will get the following output:
Error in line 78:
start: %(test_start),
meaning your string formatting didn't quite work (in this case the i conversion character after %(test_start) is missing). Debug your large string this way and you should end up with working code.
After that is done you can print it by joining the list:
print '\n'.join(t1)
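Alternatively, the ValueError already reports the character index of the failure ("at index 2726" in the traceback above). A minimal sketch that maps that index back to a line number in the raw template, assuming the file from the question:
raw = open('Conv_Constructor.yaml', 'r').read()
idx = 2726  # character index taken from the ValueError message
line_no = raw.count('\n', 0, idx) + 1  # newlines before the failure point
print('Format error is on line {}:'.format(line_no))
print(raw.split('\n')[line_no - 1])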

Related

Elasticsearch - How to create buckets by using information from two fields at the same time?

My documents are like this:
{'start': 0, 'stop': 3, 'val': 3}
{'start': 2, 'stop': 4, 'val': 1}
{'start': 5, 'stop': 6, 'val': 4}
We can imagine that each document occupies the x-coordinates from 'start' to 'stop',
and has a certain value 'val' ('start' < 'stop' is guaranteed).
The goal is to plot a line showing the sum of these values 'val' from all the
documents which occupy an x-coordinate:
this graph online
In reality there are many documents with many different 'start' and 'stop' coordinates. Speed is important, so:
Is it possible to do this with at most a couple of Elasticsearch requests? If so, how?
What I've tried:
With one Elasticsearch request we can get the min_start and max_stop coordinates. These will be the boundaries of x.
Then we divide the x-coordinates into N intervals, and in a loop for each interval we make an Elasticsearch request: we filter out all the documents which lie completely outside of this interval, and do a sum aggregation of 'val'.
This approach takes too much time because there are N+1 requests, and if we want a line with higher precision, the time increases linearly.
Code:
N = 300  # number of intervals along x
x = []
y = []
data = es.search(index='index_name',
                 body={
                     'aggs': {
                         'min_start': {'min': {'field': 'start'}},
                         'max_stop': {'max': {'field': 'stop'}}
                     }
                 })
min_x = data['aggregations']['min_start']['value']
max_x = data['aggregations']['max_stop']['value']
x_from = min_x
x_step = (max_x - min_x) / N
for _ in range(N):
    x_to = x_from + x_step
    data = es.search(
        index='index_name',
        body={
            'size': 0,  # to not return any actual documents
            'query': {
                'bool': {
                    'should': [
                        # start is in the current x-interval:
                        {'bool': {'must': [
                            {'range': {'start': {'gte': x_from}}},
                            {'range': {'start': {'lte': x_to}}}
                        ]}},
                        # stop is in the current x-interval:
                        {'bool': {'must': [
                            {'range': {'stop': {'gte': x_from}}},
                            {'range': {'stop': {'lte': x_to}}}
                        ]}},
                        # current x-interval is inside start--stop:
                        {'bool': {'must': [
                            {'range': {'start': {'lte': x_from}}},
                            {'range': {'stop': {'gte': x_to}}}
                        ]}}
                    ],
                    'minimum_should_match': 1  # at least 1 of these 3 conditions should match
                }
            },
            'aggs': {
                'vals_sum': {'sum': {'field': 'val'}}
            }
        }
    )
    # Append info to the lists:
    x.append(x_from)
    y.append(data['aggregations']['vals_sum']['value'])
    # Next x-interval:
    x_from = x_to

from matplotlib import pyplot as plt
plt.plot(x, y)
The right way to do this in a single query is to use the range field type (available since Elasticsearch 5.2) instead of the two fields start and stop, which forces you to reimplement the same logic yourself. Like this:
PUT test
{
  "mappings": {
    "properties": {
      "range": {
        "type": "integer_range"
      },
      "val": {
        "type": "integer"
      }
    }
  }
}
Your documents would look like this:
{
  "range" : {
    "gte" : 0,
    "lt" : 3
  },
  "val" : 3
}
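To get there from the question's documents, a small sketch with the elasticsearch Python client (the client setup is an assumption; index name and bounds follow the console examples):
from elasticsearch import Elasticsearch

es = Elasticsearch()
for doc in [{'start': 0, 'stop': 3, 'val': 3},
            {'start': 2, 'stop': 4, 'val': 1},
            {'start': 5, 'stop': 6, 'val': 4}]:
    # 'stop' maps to the exclusive "lt" bound, matching the example doc above
    es.index(index='test', body={
        'range': {'gte': doc['start'], 'lt': doc['stop']},
        'val': doc['val']
    })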
And then the query would simply leverage a histogram aggregation, like this:
POST test/_search
{
  "size": 0,
  "aggs": {
    "histo": {
      "histogram": {
        "field": "range",
        "interval": 1
      },
      "aggs": {
        "total": {
          "sum": {
            "field": "val"
          }
        }
      }
    }
  }
}
And the results are as expected: 3, 3, 4, 1, 0, 4
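From Python, the same single request yields the x/y series for plotting directly, in one request instead of N+1 (a sketch reusing the es client from the indexing sketch above):
resp = es.search(index='test', body={
    'size': 0,
    'aggs': {
        'histo': {
            'histogram': {'field': 'range', 'interval': 1},
            'aggs': {'total': {'sum': {'field': 'val'}}}
        }
    }
})
buckets = resp['aggregations']['histo']['buckets']
x = [b['key'] for b in buckets]            # bucket start coordinate
y = [b['total']['value'] for b in buckets]  # summed 'val' in that bucket

from matplotlib import pyplot as plt
plt.plot(x, y)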

How do I delete an element of a JSON object with Python?

I'm trying to delete elements from _notes that have _type set to 1, but I keep getting an error and I'm not sure what it means, nor do I know how to fix it. Can anyone help me?
My trimmed JSON:
{
    "_notes": [
        {
            "_time": 10,
            "_lineIndex": 2,
            "_lineLayer": 0,
            "_type": 0,
            "_cutDirection": 7
        },
        {
            "_time": 12,
            "_lineIndex": 2,
            "_lineLayer": 0,
            "_type": 1,
            "_cutDirection": 1
        },
        {
            "_time": 14,
            "_lineIndex": 2,
            "_lineLayer": 1,
            "_type": 1,
            "_cutDirection": 0
        }
    ]
}
My Python script:
#!/usr/bin/python3
import json

obj = json.load(open("ExpertPlusStandardd.dat"))
for i in range(len(obj["_notes"])):
    print(obj["_notes"][i]["_type"])
    if obj["_notes"][i]["_type"] == 1:
        obj.pop(obj["_notes"][i])
open("test.dat", "w").write(
    json.dumps(obj, indent=4, separators=(',', ': '))
)
Error:
Traceback (most recent call last):
  File "C:\programming\python\train_one_hand\run.py", line 9, in <module>
    obj.pop(obj["_notes"][i])
TypeError: unhashable type: 'dict'
The immediate TypeError happens because dict.pop expects a hashable key, and a dict is not hashable. More fundamentally, it is usually a bad idea to delete from a list that you're iterating over. Reverse iteration avoids some of the pitfalls, but code that does that is much harder to follow, so usually you're better off using a list comprehension or filter.
obj["_notes"] = [x for x in obj["_notes"] if x["_type"] != 1]
This gives us the expected output:
{'_notes':
    [
        {
            '_time': 10,
            '_lineIndex': 2,
            '_lineLayer': 0,
            '_type': 0,
            '_cutDirection': 7
        }
    ]
}
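Putting it together, the whole script becomes (a sketch; file names are taken from the question):
#!/usr/bin/python3
import json

with open("ExpertPlusStandardd.dat") as f:
    obj = json.load(f)

# keep only the notes whose _type is not 1
obj["_notes"] = [note for note in obj["_notes"] if note["_type"] != 1]

with open("test.dat", "w") as f:
    json.dump(obj, f, indent=4, separators=(',', ': '))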

Passing dictionary as parameter to a function

So, I am working on a project in which the user provides inputs in a JSON file; the parser reads the data from that file and then updates a data structure with the inputs given in the file.
My JSON file (input_file.json5) looks like this:
{
    "clock_frequency": 25000,
    "Triggering_Mode": "positive_edge_triggered",
    "Mode": "Offline",
    "overshoot": 0.05,
    "duty_cycle": 0.5,
    "amplitude/high_level": 1,
    "offset/low_level": 0
}
The data structure (data_struc.py) looks like this:
Parameters = {
    "Global_parameters": {
        "frequency": 3000,
        "Triggering_Mode": "positive_edge_triggered"
    },
    "Executor_param": {
        "Mode": "Offline"
    },
    "Waveform_Settings": {
        "overshoot": 0.05,
        "duty_cycle": 0.5,
        "amplitude/high_level": 1,
        "offset/low_level": 0,
    }
}
The code for the parser is:
import json5
from data_struc import Parameters

class Parser(object):
    def read_input_file(self, path_name, file_name):
        input_file = open(path_name + file_name + '.json5')
        data = json5.load(input_file)
        print(Parameters['Global_parameters'])
        parameters = self.parser_parameters(data)
        input_file.close()
        return parameters

    def parser_parameters(self, data):
        parameter = {
            "Global_parameters": {
                "frequency": data["clock_frequency"],
                "Triggering_Mode": data["Triggering_Mode"]
            }
        }
        return parameter
I want to pass data as a parameter to the function and update the contents of the data structure using the values from data (passed as a dictionary). How do I implement the function parser_parameters?
Here is a one-liner to map the data to a schema. If you can change the schema, you could also just grab the keys directly instead of creating a list of items to match. This formats the data to the schema based on matching keys:
EDIT: added 'Data' tag to the schema and output for nested list data
schema = {
    'Global_parameters': [
        'clock_frequency',  # I noticed you had this as just 'clock' in your desired output
        'Triggering_Mode'
    ],
    'Executor_param': [
        'Mode'
    ],
    'Waveform_Settings': [
        'overshoot',
        'duty_cycle',
        'amplitude/high_level',
        'offset/low_level'
    ],
    'Data': {
        'Packet'
    }
}
data = {
    "clock_frequency": 25000,
    "Triggering_Mode": "positive_edge_triggered",
    "Mode": "Offline",
    "overshoot": 0.05,
    "duty_cycle": 0.5,
    "amplitude/high_level": 1,
    "offset/low_level": 0,
    "Packet": [
        {"time_index": 0.1, "data": 0x110},
        {"time_index": 1.21, "data": 123},
        {"time_index": 2.0, "data": 0x45}
    ]
}
# "one line" nested dict comprehension
data_structured = {k0: {k1: v1 for k1, v1 in data.items() if k1 in v0}  # in v0.keys() if you are using the structure you have above
                   for k0, v0 in schema.items()}

import json
print(json.dumps(data_structured, indent=4))  # pretty print in json format
Output:
{
    "Global_parameters": {
        "clock_frequency": 25000,
        "Triggering_Mode": "positive_edge_triggered"
    },
    "Executor_param": {
        "Mode": "Offline"
    },
    "Waveform_Settings": {
        "overshoot": 0.05,
        "duty_cycle": 0.5,
        "amplitude/high_level": 1,
        "offset/low_level": 0
    },
    "Data": {
        "Packet": [
            {
                "time_index": 0.1,
                "data": 272
            },
            {
                "time_index": 1.21,
                "data": 123
            },
            {
                "time_index": 2.0,
                "data": 69
            }
        ]
    }
}
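A hedged sketch of how the same comprehension could be wired into the question's Parser class (the schema constructor argument is an assumption, not part of the original code):
import json5

class Parser(object):
    def __init__(self, schema):
        # schema: mapping of section name -> keys, like the schema dict above
        self.schema = schema

    def parser_parameters(self, data):
        # distribute the flat input keys into the schema's sections
        return {section: {k: v for k, v in data.items() if k in keys}
                for section, keys in self.schema.items()}

    def read_input_file(self, path_name, file_name):
        with open(path_name + file_name + '.json5') as input_file:
            data = json5.load(input_file)
        return self.parser_parameters(data)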

tf.train.SequenceExample with lists at each step

How is a list of values passed into a feature_list? The documentation suggests that this is valid, but it is not clear how it is accomplished, given that passing a list results in an error.
>>> tf.train.SequenceExample().feature_lists.feature_list["multiple"].feature.add().int64_list.value.append([1,2,3,4])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/containers.py", line 251, in append
    self._values.append(self._type_checker.CheckValue(value))
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/type_checkers.py", line 132, in CheckValue
    raise TypeError(message)
TypeError: [1, 2, 3, 4] has type <type 'list'>, but expected one of: (<type 'int'>, <type 'long'>)
This is an example given in the example.proto file:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/example/example.proto
// Conditionally conformant FeatureLists, the parser configuration determines
// if the feature sizes must match:
// feature_lists: { feature_list: {
// key: "movie_ratings"
// value: { feature: { float_list: { value: [ 4.5 ] } }
// feature: { float_list: { value: [ 5.0, 6.0 ] } } }
// } }
Is it necessary to use something other than append when adding a list?
An example of a sequence would be...
[[1,2,3],[4,5],[6,7],[8,9,10]]
...where there are four steps in this sequence, and at each step there is a set of values. The desired result would look something like the example below.
feature_lists: { feature_list: {
  key: "movie_ratings"
  value: { feature: { float_list: { value: [ 1, 2, 3 ] } }
           feature: { float_list: { value: [ 4, 5 ] } }
           feature: { float_list: { value: [ 6, 7 ] } }
           feature: { float_list: { value: [ 8, 9, 10 ] } } }
} }
Use extend instead of append.
tf.train.SequenceExample().feature_lists.feature_list["multiple"].feature.add().int64_list.value.extend([1,2,3,4])
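For the four-step sequence from the question, a minimal sketch that builds the desired structure (the key "movie_ratings" and float_list follow the proto comment above):
import tensorflow as tf

steps = [[1, 2, 3], [4, 5], [6, 7], [8, 9, 10]]
ex = tf.train.SequenceExample()
ratings = ex.feature_lists.feature_list["movie_ratings"]
for step in steps:
    # one new feature per step; extend copies all values of that step
    ratings.feature.add().float_list.value.extend(step)
print(ex)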

How to map a string using Python

This is my code:
a = ''' ddwqqf{x}'''
def b():
    ...
c = b(a, {'x': '!!!!!'})
print c
I want to get ddwqqf!!!!!, so how do I create the b function? Thanks.
Update: but how do I do this:
a = ''' ddwqqf{x},{'a':'aaaa'}'''
c = a.format(x="!!!!!")
d = open('a.txt', 'a')
d.write(c)
It shows this error:
Traceback (most recent call last):
  File "d.py", line 8, in <module>
    c = a.format(x="!!!!!")
KeyError: "'a'"
Update 2: this is the string:
'''
{
    'skill': {x_1},
    'power': {x_2},
    'magic': {x_3},
    'level': {x_4},
    'weapon': {
        0 : {
            'item': {
                'weight': 40,
                'target': 1,
                'defence': 100,
                'name': u'\uff75\uff70\uff78\uff7f\uff70\uff84',
                'attack': 100,
                'type': 1
            },
        },
        1 : {
            'item': {
                'weight': 40,
                'target': 1,
                'defence': 100,
                'name': u'\uff75\uff70\uff78\uff7f\uff70\uff84',
                'attack': 100,
                'type': 1
            },
        },
        2 : {
            'item': {
                'weight': 40,
                'target': 1,
                'defence': 100,
                'name': u'\uff75\uff70\uff78\uff7f\uff70\uff84',
                'attack': 100,
                'type': 1
            },
        }
        ......
    }
}
'''
Try
def b(a, d):
    return a.format(**d)
This works in Python 2.6 or above. Of course you would not need to define a function for this:
a = " ddwqqf{x}"
c = a.format(x="!!!!!")
will be enough.
Edit regarding your update: use
a = " ddwqqf{x},{{'a':'aaaa'}}"
because doubled braces are escaped to literal braces, so format does not try to substitute the second pair.
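With the doubled braces the second pair survives formatting:
a = " ddwqqf{x},{{'a':'aaaa'}}"
print(a.format(x="!!!!!"))  # ddwqqf!!!!!,{'a':'aaaa'}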
Another Edit: I don't really know where your string comes from and what's the context of all this. One solution might be
import re
d = {"x_1": "1", "x_2": "2", "x_3": "3", "x_4": "4"}
re.sub(r"\{([a-z_0-9]+)\}", lambda m: d[m.group(1)], s)
where s is your string.
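A quick usage sketch of that substitution on a fragment of the template (the values in d are made up for illustration):
import re

s = "'skill': {x_1}, 'power': {x_2}, 'magic': {x_3}, 'level': {x_4}"
d = {"x_1": "10", "x_2": "20", "x_3": "30", "x_4": "5"}
print(re.sub(r"\{([a-z_0-9]+)\}", lambda m: d[m.group(1)], s))
# 'skill': 10, 'power': 20, 'magic': 30, 'level': 5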
