I store data in a CSV file as follows:
row = dict_items([('Time ': '2017-12-01T13:54:04'), ('Energy [kWh]': '0.01'), ('Voltage [V]': '221.64'), ('Current [A]': '0.08')])
Now I want to store it like this:
('Time ': '2017-12-01T13:54:04', 'Energy [kWh]': '0.01'),
('Time ': '2017-12-01T13:54:04','Voltage [V]': '221.64'),
('Time ': '2017-12-01T13:54:04','Current [A]': '0.08')])
So I wrote the code below, where I define:
Device=""
Value=""
for key, value in row.items():
    print(row.items())
    if key == 'Time':
        Timevalue = value
        print(Zeitvalue)
    Device = key
    Value = value
    doc = {'Device':Device, 'Measure':Value , 'Time':Timevalue }
I got this error:
NameError: name 'Timevalue' is not defined
How can I make the Timevalue variable global to avoid this problem?
Thank you
That ('Time ': '2017-12-01T13:54:04', 'Energy [kWh]': '0.01') syntax is not valid Python. Assuming you actually want a list of dictionaries, it's not too hard to perform that conversion.
You first need to grab the time stamp so it can be combined with each of the other items, and then you can drop it from the dict to make it easier to copy the remaining items to the new structure. Note that this modifies the original dict passed to the function. If you don't want that, you can either make a copy of the passed-in dict, or put an if test in the loop that copies the data to the new structure so that the time item is skipped.
def convert(old):
    time_key = 'Time '
    # Save the time
    time_item = (time_key, old[time_key])
    # And remove it
    del old[time_key]
    # Copy remaining items to new dicts and save them in a list
    return [dict([time_item, item]) for item in old.items()]

row = {
    'Time ': '2017-12-01T13:54:04',
    'Energy [kWh]': '0.01',
    'Voltage [V]': '221.64',
    'Current [A]': '0.08',
}
new_data = convert(row)
for d in new_data:
    print(d)
output
{'Time ': '2017-12-01T13:54:04', 'Energy [kWh]': '0.01'}
{'Time ': '2017-12-01T13:54:04', 'Voltage [V]': '221.64'}
{'Time ': '2017-12-01T13:54:04', 'Current [A]': '0.08'}
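As for the NameError in the question: the key in the data is 'Time ' (with a trailing space), while the loop compares against 'Time', so the assignment to Timevalue most likely never runs. A minimal sketch of one way around that, assuming it is safe to strip whitespace from the keys (normalize_keys is a hypothetical helper, not part of the original code):

```python
def normalize_keys(row):
    """Return a copy of row with surrounding whitespace stripped from each key."""
    return {key.strip(): value for key, value in row.items()}


row = {'Time ': '2017-12-01T13:54:04', 'Energy [kWh]': '0.01'}
clean = normalize_keys(row)

print('Time' in row)    # False: the original key has a trailing space
print('Time' in clean)  # True: the stripped key matches
```

The convert() function above sidesteps the problem differently, by using the exact key 'Time ' everywhere.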
Here's how to do it if you don't want to mutate or copy the original dict:
def convert(old):
    time_key = 'Time '
    # Save the time
    time_item = (time_key, old[time_key])
    # Copy other items to new dicts and save them in a list
    return [dict([time_item, (key, val)])
            for key, val in old.items() if key != time_key]
Note that this is less efficient because it has to test every key to ensure it's not the time key.
To save the data in a list of OrderedDicts, we need to change the logic slightly. We also need to create the OrderedDicts properly so that the items are in the desired order.
from collections import OrderedDict
from pprint import pprint

def convert(row):
    time_key = 'Time '
    time_value = row[time_key]
    new_data = []
    for key, val in row.items():
        if key == time_key:
            continue
        new_data.append(OrderedDict(Device=key, Measure=val, Time=time_value))
    return new_data

row = {
    'Time ': '2017-12-01T13:54:04', 'Energy [kWh]': '0.01',
    'Voltage [V]': '221.64', 'Current [A]': '0.08'
}
new_data = convert(row)
pprint(new_data)
output
[OrderedDict([('Device', 'Energy [kWh]'),
('Measure', '0.01'),
('Time', '2017-12-01T13:54:04')]),
OrderedDict([('Device', 'Voltage [V]'),
('Measure', '221.64'),
('Time', '2017-12-01T13:54:04')]),
OrderedDict([('Device', 'Current [A]'),
('Measure', '0.08'),
('Time', '2017-12-01T13:54:04')])]
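On Python 3.7 and later, plain dicts keep insertion order, so an OrderedDict may not be needed at all. A minimal sketch under that assumption (the time_key parameter is added here for illustration and is not in the original function):

```python
def convert(row, time_key='Time '):
    """Build one Device/Measure/Time dict per non-time item in row."""
    time_value = row[time_key]
    return [
        {'Device': key, 'Measure': val, 'Time': time_value}
        for key, val in row.items()
        if key != time_key
    ]


row = {
    'Time ': '2017-12-01T13:54:04', 'Energy [kWh]': '0.01',
    'Voltage [V]': '221.64', 'Current [A]': '0.08',
}
new_data = convert(row)
for d in new_data:
    print(d)
```

Each resulting dict lists its keys in the order Device, Measure, Time, and the original row is left untouched.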
I have the following function which produces results;
myNames = ['ULTA', 'CSCO', ...]
def get_from_min_match(var):
    temp = []
    count_elem = generate_elem_count()
    for item in count_elem:
        if var <= count_elem[item]:
            temp.append(item)
    return set(temp) if len(set(temp)) > 0 else "None"

def generate_elem_count():
    result_data = []
    for val in mapper.values():
        if type(val) == list:
            result_data += val
        elif type(val) == dict:
            for key in val:
                result_data.append(key)
    count_elem = {elem: result_data.count(elem) for elem in result_data}
    return count_elem
I call this function like this;
myNames_dict_1 = ['AME', 'IEX', 'PAYC']
myNames_dict_2 = ['ULTA', 'CSCO', 'PAYC']
mapper = {1: myNames_dict_1, 2: myNames_dict_2}
print(" These meet three values ", get_from_min_match(3))
print(" These meet four values ", get_from_min_match(4))
The output I get from these functions are as follows;
These meet three values {'ULTA', 'CSCO', 'SHW', 'MANH', 'TTWO', 'SAM', 'RHI', 'PAYC', 'AME', 'CCOI', 'RMD', 'AMD', 'UNH', 'AZO', 'APH', 'EW', 'FFIV', 'IEX', 'IDXX', 'ANET', 'SWKS', 'HRL', 'ILMN', 'PGR', 'ATVI', 'CNS', 'EA', 'ORLY', 'TSCO'}
These meet four values {'EW', 'PAYC', 'TTWO', 'AME', 'IEX', 'IDXX', 'ANET', 'RMD', 'SWKS', 'HRL', 'UNH', 'CCOI', 'ORLY', 'APH', 'PGR', 'TSCO'}
Now I want to insert the output of the get_from_min_match function into an SQLite database. The insert statement looks like this;
dbase.execute("INSERT OR REPLACE INTO min_match (DATE, SYMBOL, NAME, NUMBEROFMETRICSMET) \
VALUES (?,?,?,?)", (datetime.today(), symbol, name, NUMBEROFMETRICSMET?))
dbase.commit()
So, it's basically a new function to calculate the "NUMBEROFMETRICSMET" parameter rather than calling each of these functions many times. And I want the output of the function inserted into the database. How to achieve this? Here 3, 4 would be the number of times the companies matched.
date ULTA name 3
date EW name 4
...
should be the result.
How can I achieve this? Thanks!
I fixed this by just using my already written function;
count_elem = generate_elem_count()
print("Count Elem: " + str(count_elem))
This prints {'AMPY': 1} and so on.
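A minimal sketch of how those counts could be written to SQLite in one pass, rather than calling get_from_min_match once per threshold. The table schema, the in-memory database, and the count_symbols helper are assumptions made for illustration; NAME is left empty because the question does not show where names come from.

```python
import sqlite3
from collections import Counter
from datetime import date


def count_symbols(mapper):
    """Count how many of the lists (or dicts) in mapper mention each symbol."""
    counts = Counter()
    for val in mapper.values():
        if isinstance(val, (list, dict)):
            # list(val) yields the elements of a list or the keys of a dict
            counts.update(list(val))
    return counts


mapper = {1: ['AME', 'IEX', 'PAYC'], 2: ['ULTA', 'CSCO', 'PAYC']}
counts = count_symbols(mapper)

dbase = sqlite3.connect(':memory:')  # assumption: a throwaway database
dbase.execute(
    "CREATE TABLE min_match ("
    "DATE TEXT, SYMBOL TEXT PRIMARY KEY, NAME TEXT, NUMBEROFMETRICSMET INTEGER)"
)

today = date.today().isoformat()
rows = [(today, symbol, None, n) for symbol, n in counts.items()]
dbase.executemany(
    "INSERT OR REPLACE INTO min_match (DATE, SYMBOL, NAME, NUMBEROFMETRICSMET) "
    "VALUES (?,?,?,?)",
    rows,
)
dbase.commit()

for record in dbase.execute(
    "SELECT SYMBOL, NUMBEROFMETRICSMET FROM min_match ORDER BY SYMBOL"
):
    print(record)
```

Each symbol ends up in one row together with the number of lists it appeared in, which is the NUMBEROFMETRICSMET value the question asks for.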
I use this code to store the items of the dictionaries in the doc variable.
This code works fine, but the time value is missing from the first element because of the if statement.
def convert(old):
    time_key = 'Time '
    # Save the time
    time_item = (time_key, old[time_key])
    # And remove it
    del old[time_key]
    # Copy remaining items to new dicts and save them in a list
    return [dict([time_item, item]) for item in old.items()]

row = {
    'Time ': '2017-12-01T13:54:04',
    'Energy [kWh]': '0.01',
    'Voltage [V]': '221.64',
    'Current [A]': '0.08',
}
new_data = convert(row)
#print(new_data)
Zeitvalue= ""
Device=""
Value=""
for d in new_data:
    #print(d)
    for key, value in d.items():
        if key == 'Time ':
            Zeitvalue = value
            #print(value)
            continue
        else:
            Device = key
            Value = value
        doc = {'Time ':Zeitvalue,'Device':Device, 'Measure':Value}
        print("This is doc variable:",doc) # doc variable with the time element missing
So when I print doc I get this:
Output:
doc: {'Device': 'Voltage [V]', 'Measure': '221.64', 'Time ': ''} # **ISSUE: variable time is missed here, How to fix it ?**
doc: {'Device': 'Current [A]', 'Measure': '0.08', 'Time ': '2017-12-01T13:54:04'}
doc: {'Device': 'Energy [kWh]', 'Measure': '0.01', 'Time ': '2017-12-01T13:54:04'}
See the changes in the code below. Remove the continue statement, and assign doc only after the inner loop over the dictionary has finished, since you need all three values.
for d in new_data:
    #print(d)
    for key, value in d.items():
        if key == 'Time ':
            Zeitvalue = value
            #print(value)
        else:
            Device = key
            Value = value
    doc = {'Time ':Zeitvalue,'Device':Device, 'Measure':Value}
    print(doc)
If you are just setting values, then place the doc assignment outside the inner for loop:
for d in new_data:
    for key, value in d.items():
        if key == 'Time ':
            Zeitvalue = value
            continue
        else:
            Device = key
            Value = value
    doc = {'Time ':Zeitvalue,'Device':Device, 'Measure':Value}
You have a problem in this line:
doc = {'Time ':Zeitvalue,'Device':Device, 'Measure':Value}
When it sits inside the inner for loop, each iteration overwrites the previous assignment. It also causes unexpected behavior, because a dictionary is not guaranteed to be ordered (before Python 3.7): if the 'Time ' key happens to come first, everything works, but if it does not, Zeitvalue is still "", since you initialized it to that value and have not updated it yet.
Move the doc = {'Time ':Zeitvalue,'Device':Device, 'Measure':Value} assignment to the outer loop, not the one going over each key and value, and you will be fine.
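A minimal sketch of that advice, assuming new_data has the shape produced by convert() in the question. It goes one step further by looking the time up directly and collecting every doc in a list (docs is a name introduced here for illustration):

```python
new_data = [
    {'Time ': '2017-12-01T13:54:04', 'Energy [kWh]': '0.01'},
    {'Time ': '2017-12-01T13:54:04', 'Voltage [V]': '221.64'},
    {'Time ': '2017-12-01T13:54:04', 'Current [A]': '0.08'},
]

docs = []
for d in new_data:
    # Look the time up directly, so key order inside d does not matter.
    doc = {'Time ': d['Time ']}
    for key, value in d.items():
        if key != 'Time ':
            doc['Device'] = key
            doc['Measure'] = value
    # Build one doc per input dict, after the inner loop has finished.
    docs.append(doc)

for doc in docs:
    print(doc)
```

Because the time is read by key rather than picked up during iteration, no doc can end up with an empty time value.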
I have a CSV file that I've filtered into a list and grouped. Example:
52713
['52713', '', 'Vmax', '', 'Start Value', '', '\n']
['52713', '', 'Vmax', '', 'ECNumber', '1.14.12.17', '\n']
['52713', 'O2', 'Km', 'M', 'Start Value', '3.5E-5', '\n']
['52713', 'O2', 'Km', 'M', 'ECNumber', '1.14.12.17', '\n']
52714
['52714', '', 'Vmax', '', 'Start Value', '', '\n']
['52714', '', 'Vmax', '', 'ECNumber', '1.14.12.17', '\n']
['52714', 'O2', 'Km', 'M', 'Start Value', '1.3E-5', '\n']
['52714', 'O2', 'Km', 'M', 'ECNumber', '1.14.12.17', '\n']
From this, I create a nested dictionary with the structure:
dict = ID number:{Km:n, Kcat:n, ECNumber:n}
...for every ID in the list.
I use the following code to create this dictionary
dict = {}
for key, items in groupby(FilteredTable1[1:], itemgetter(0)):
    #print key
    for subitem in items:
        #print subitem
        dict[subitem[EntryID]] = {}
        dict[subitem[EntryID]]['EC'] = []
        dict[subitem[EntryID]]['Km'] = []
        dict[subitem[EntryID]]['Kcat'] = []
        if 'ECNumber' in subitem:
            dict[subitem[EntryID]]['EC'] = subitem[value]
        if 'Km' in subitem and 'Start Value' in subitem:
            dict[subitem[EntryID]]['Km'] = subitem[value]
            #print subitem
This works for the ECNumber value, but not the Km value. It can print the line, showing that it identifies the Km value as being present, but doesn't put it in the dictionary.
Example output:
{'Km': [], 'EC': '1.14.12.17', 'Kcat': []}
Any ideas?
Ben
The problem is that your inner for loop keeps reinitializing dict[subitem[EntryID]] even though it may already exist. That's fixed in the following by explicitly checking to see if it's already there:
dict = {}
for key, items in groupby(FilteredTable1[1:], itemgetter(0)):
    #print key
    for subitem in items:
        #print '  ', subitem
        if subitem[EntryID] not in dict:
            dict[subitem[EntryID]] = {}
            dict[subitem[EntryID]]['EC'] = []
            dict[subitem[EntryID]]['Km'] = []
            dict[subitem[EntryID]]['Kcat'] = []
        if 'ECNumber' in subitem:
            dict[subitem[EntryID]]['EC'] = subitem[value]
        if 'Km' in subitem and 'Start Value' in subitem:
            dict[subitem[EntryID]]['Km'] = subitem[value]
            #print subitem
However, this code could be made more efficient by using something like the following instead, which avoids recomputing values and double dictionary lookups. It also avoids using the name of a built-in type (dict) as a variable name, a practice that goes against the guidelines in PEP 8, the Style Guide for Python Code. PEP 8 also suggests using CamelCase only for class names, not for variable names like FilteredTable1, but I didn't change that.
adict = {}
for key, items in groupby(FilteredTable1[1:], itemgetter(0)):
    #print key
    for subitem in items:
        #print '  ', subitem
        entry_id = subitem[EntryID]
        if entry_id not in adict:
            adict[entry_id] = {'EC': [], 'Km': [], 'Kcat': []}
        entry = adict[entry_id]
        if 'ECNumber' in subitem:
            entry['EC'] = subitem[value]
        if 'Km' in subitem and 'Start Value' in subitem:
            entry['Km'] = subitem[value]
            #print subitem
Actually, since you're building a dictionary of dictionaries, it's not clear that there's any advantage to using groupby to do so.
I'm posting this to follow-up and extend on my previous answer.
For starters, you could streamline the code a little further by eliminating the need to check for preexisting entries, simply by making the dictionary being created a collections.defaultdict (a dict subclass) instead of a regular one:
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

adict = defaultdict(lambda: {'EC': [], 'Km': [], 'Kcat': []})
for key, items in groupby(FilteredTable1[1:], itemgetter(0)):
    for subitem in items:
        entry = adict[subitem[EntryID]]
        if 'ECNumber' in subitem:
            entry['EC'] = subitem[value]
        if 'Km' in subitem and 'Start Value' in subitem:
            entry['Km'] = subitem[value]
Secondly, as I mentioned in the other answer, I don't think you're gaining anything by using itertools.groupby() to do this, except making the process more complicated than needed. This is because what you're doing is basically building a dictionary of dictionaries whose entries can all be randomly accessed, so there's no benefit in going to the trouble of grouping them first. The code below shows this (in conjunction with using a defaultdict as shown above):
adict = defaultdict(lambda: {'EC': [], 'Km': [], 'Kcat': []})
for subitem in FilteredTable1[1:]:
    entry = adict[subitem[EntryID]]
    if 'ECNumber' in subitem:
        entry['EC'] = subitem[value]
    if 'Km' in subitem and 'Start Value' in subitem:
        entry['Km'] = subitem[value]
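For reference, here is a self-contained run of that last version against the sample rows from the question. The column indices EntryID = 0 and value = 5 and the header row are assumptions, since the question never shows how they are defined:

```python
from collections import defaultdict

EntryID = 0  # assumed: column index of the ID
value = 5    # assumed: column index of the value field

FilteredTable1 = [
    # Hypothetical header row; it is skipped by the [1:] slice below.
    ['EntryID', 'Substrate', 'Parameter', 'Unit', 'Attribute', 'Value', '\n'],
    ['52713', '', 'Vmax', '', 'Start Value', '', '\n'],
    ['52713', '', 'Vmax', '', 'ECNumber', '1.14.12.17', '\n'],
    ['52713', 'O2', 'Km', 'M', 'Start Value', '3.5E-5', '\n'],
    ['52713', 'O2', 'Km', 'M', 'ECNumber', '1.14.12.17', '\n'],
    ['52714', '', 'Vmax', '', 'Start Value', '', '\n'],
    ['52714', '', 'Vmax', '', 'ECNumber', '1.14.12.17', '\n'],
    ['52714', 'O2', 'Km', 'M', 'Start Value', '1.3E-5', '\n'],
    ['52714', 'O2', 'Km', 'M', 'ECNumber', '1.14.12.17', '\n'],
]

adict = defaultdict(lambda: {'EC': [], 'Km': [], 'Kcat': []})
for subitem in FilteredTable1[1:]:
    entry = adict[subitem[EntryID]]  # created on first access
    if 'ECNumber' in subitem:
        entry['EC'] = subitem[value]
    if 'Km' in subitem and 'Start Value' in subitem:
        entry['Km'] = subitem[value]

for entry_id, entry in adict.items():
    print(entry_id, entry)
```

With these rows, each ID gets its EC number and its Km start value, while Kcat stays an empty list because no row supplies it.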
I'm struggling with a recursive merge problem.
Let's say I have:
a=[{'name':"bob",
'age':10,
'email':"bob#bla",
'profile':{'id':1, 'role':"admin"}},
{'name':"bob",
'age':10,
'email':"other mail",
'profile':{'id':2, 'role':"dba"},
'home':"/home/bob"
}]
and I need something to recursively merge entries. If value for an existing given key on the same level is different it appends the value to an array.
b = merge(a)
print b
{'name':"bob",
'age':10,
'email':["bob#bla","other mail"],
'profile':{'id':[1,2], 'role':["admin", "dba"]},
'home':"/home/bob"}
I wrote this code:
def merge(items):
    merged = {}
    for item in items:
        for key in item.keys():
            if key in merged.keys():
                if item[key] != merged[key]:
                    if not isinstance(merged[key], list):
                        merged[key] = [merged[key]]
                    if item[key] not in merged[key]:
                        merged[key].append(item[key])
            else:
                merged[key] = item[key]
    return merged
The output is:
{'age': 10,
'email': ['bob#bla', 'other mail'],
'home': '/home/bob',
'name': 'bob',
'profile': [{'id': 1, 'role': 'admin'}, {'id': 2, 'role': 'dba'}]}
Which is not what I want.
I can't figure out how to deal with recursion.
Thanks :)
As you iterate over each dictionary in the arguments, then each key and value in each dictionary, you want the following rules:
1. If there is nothing against that key in the output, add the new key and value to the output;
2. If there is a value for that key, and it's the same as the new value, do nothing;
3. If there is a value for that key, and it's a list, append the new value to the list;
4. If there is a value for that key, and it's a dictionary, recursively merge the new value with the existing dictionary;
5. If there is a value for that key, and it's neither a list nor a dictionary, make the value in the output a list of the current value and the new value.
In code:
def merge(*dicts):
    """Recursively merge the argument dictionaries."""
    out = {}
    for dct in dicts:
        for key, val in dct.items():
            try:
                out[key].append(val)  # 3.
            except AttributeError:
                if out[key] == val:
                    pass  # 2.
                elif isinstance(out[key], dict):
                    out[key] = merge(out[key], val)  # 4.
                else:
                    out[key] = [out[key], val]  # 5.
            except KeyError:
                out[key] = val  # 1.
    return out
In use:
>>> import pprint
>>> pprint.pprint(merge(*a))
{'age': 10,
'email': ['bob#bla', 'other mail'],
'home': '/home/bob',
'name': 'bob',
'profile': {'id': [1, 2], 'role': ['admin', 'dba']}}
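One thing to be aware of: once a value has become a list, the version above appends every later value, including ones already present. If that matters, the same rules can be written with explicit type checks and a membership test. This is a sketch of a variant, not the answer's original code; merge_unique is a hypothetical name, and it assumes the input values are not themselves lists:

```python
def merge_unique(*dicts):
    """Recursively merge dicts, collecting differing values without duplicates."""
    out = {}
    for dct in dicts:
        for key, val in dct.items():
            if key not in out:
                out[key] = val                           # new key
            elif isinstance(out[key], dict) and isinstance(val, dict):
                out[key] = merge_unique(out[key], val)   # recurse into dicts
            elif isinstance(out[key], list):
                if val not in out[key]:
                    out[key].append(val)                 # extend list, no repeats
            elif out[key] != val:
                out[key] = [out[key], val]               # start a list
    return out


a = [
    {'name': "bob", 'age': 10, 'email': "bob#bla",
     'profile': {'id': 1, 'role': "admin"}},
    {'name': "bob", 'age': 10, 'email': "other mail",
     'profile': {'id': 2, 'role': "dba"}, 'home': "/home/bob"},
    # A third record repeating earlier values, to exercise the duplicate check.
    {'name': "bob", 'email': "bob#bla",
     'profile': {'id': 1, 'role': "admin"}},
]
merged = merge_unique(*a)
print(merged)
```

The third record adds nothing new, so the merged result matches the one for the first two records alone.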
I am experiencing a strange faulty behaviour, where a dictionary is only appended once and I can not add more key value pairs to it.
My code reads in a multi-line string and extracts substrings via split(), to be added to a dictionary. I make use of conditional statements. Strangely only the key:value pairs under the first conditional statement are added.
Therefore I can not complete the dictionary.
How can I solve this issue?
Minimal code:
#I hope the '\n' is sufficient or use '\r\n'
example = "Name: Bugs Bunny\nDOB: 01/04/1900\nAddress: 111 Jokes Drive, Hollywood Hills, CA 11111, United States"
def format(data):
    dic = {}
    for line in data.splitlines():
        #print('Line:', line)
        if ':' in line:
            info = line.split(': ', 1)[1].rstrip() #does not work with files
            #print('Info: ', info)
            if ' Name:' in info: #middle name problems! /maiden name
                dic['F_NAME'] = info.split(' ', 1)[0].rstrip()
                dic['L_NAME'] = info.split(' ', 1)[1].rstrip()
            elif 'DOB' in info: #overhang
                dic['DD'] = info.split('/', 2)[0].rstrip()
                dic['MM'] = info.split('/', 2)[1].rstrip()
                dic['YY'] = info.split('/', 2)[2].rstrip()
            elif 'Address' in info:
                dic['STREET'] = info.split(', ', 2)[0].rstrip()
                dic['CITY'] = info.split(', ', 2)[1].rstrip()
                dic['ZIP'] = info.split(', ', 2)[2].rstrip()
    return dic

if __name__ == '__main__':
    x = format(example)
    for v, k in x.iteritems():
        print v, k
Your code doesn't work, at all. You split off the name before the colon and discard it, looking only at the value after the colon, stored in info. That value never contains the names you are looking for; Name, DOB and Address all are part of the line before the :.
Python lets you assign to multiple names at once; make use of this when splitting:
def format(data):
    dic = {}
    for line in data.splitlines():
        if ':' not in line:
            continue
        name, _, value = line.partition(':')
        name = name.strip()
        if name == 'Name':
            dic['F_NAME'], dic['L_NAME'] = value.split(None, 1)  # strips whitespace for us
        elif name == 'DOB':
            dic['DD'], dic['MM'], dic['YY'] = (v.strip() for v in value.split('/', 2))
        elif name == 'Address':
            dic['STREET'], dic['CITY'], dic['ZIP'] = (v.strip() for v in value.split(', ', 2))
    return dic
I used str.partition() here rather than limit str.split() to just one split; it is slightly faster that way.
For your sample input this produces:
>>> format(example)
{'CITY': 'Hollywood Hills', 'ZIP': 'CA 11111, United States', 'L_NAME': 'Bunny', 'F_NAME': 'Bugs', 'YY': '1900', 'MM': '04', 'STREET': '111 Jokes Drive', 'DD': '01'}
>>> from pprint import pprint
>>> pprint(format(example))
{'CITY': 'Hollywood Hills',
'DD': '01',
'F_NAME': 'Bugs',
'L_NAME': 'Bunny',
'MM': '04',
'STREET': '111 Jokes Drive',
'YY': '1900',
'ZIP': 'CA 11111, United States'}
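On the choice of str.partition() in the function above: besides being a single call, it always returns exactly three parts, which makes the tuple unpacking safe even when the separator is absent. A short illustration using one line of the sample input:

```python
line = "Address: 111 Jokes Drive, Hollywood Hills, CA 11111, United States"

# partition() splits at the first separator only and returns
# (text before, separator, text after).
name, sep, value = line.partition(':')
print(repr(name))           # 'Address'
print(repr(sep))            # ':'
print(repr(value.strip()))

# Without the separator, the whole string lands in the first part
# and the other two parts are empty strings.
print("no colon here".partition(':'))
```

By contrast, unpacking the result of line.split(':', 1) into two names raises ValueError when the line contains no colon.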