The code below works fine:

class p:
    def __init__(self):
        self.log = {
            'name': '',
            'id': '',
            'age': '',
            'grade': ''
        }

    def parse(self, line):
        self.log['id'] = line[0]
        self.log['name'] = line[1]
        self.log['age'] = line[2]
        self.log['grade'] = line[3].replace('\n', '')
        return self.log
obj = p()
with open(r"C:\Users\sksar\Desktop\Azure DE\Datasets\dark.csv", 'r') as fp:
    line = fp.read()
    data = [i.split(',') for i in line.split('\n')]
    for i in data:
        a = obj.parse(i)
        print(a)
Input:
1,jonas,23,A
2,martha,23,B
Output:
{'name': 'jonas', 'id': '1', 'age': '23', 'grade': 'A'}
{'name': 'martha', 'id': '2', 'age': '23', 'grade': 'B'}
The question is: when I move the method call (a = obj.parse(i)) out of the loop, earlier inputs are overwritten and I get only the following output: {'name': 'martha', 'id': '2', 'age': '23', 'grade': 'B'}. The previous records are simply missing.
How can I call the method (parse) and feed it the input data without having to iterate through a nested loop? Simply put: how do I get the desired output without a for loop?
I don't get why you are trying to avoid an explicit loop. Even if you don't see it in your code, if something is being iterated, there is a loop somewhere, and if so, "explicit is better than implicit".
In any case, check this:
with open(r"C:\Users\sksar\Desktop\Azure DE\Datasets\dark.csv", 'r') as fp:
    [print(obj.parse(x.split(','))) for x in fp.readlines()]
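For what it's worth, the records disappear because parse always mutates and returns the same self.log dict, so every stored reference points at the last record parsed. A minimal sketch of one fix (using inline sample data instead of the CSV file, so the rows here are illustrative): return a copy from parse, and each call then yields an independent dict.

```python
class p:
    def __init__(self):
        self.log = {'name': '', 'id': '', 'age': '', 'grade': ''}

    def parse(self, line):
        self.log['id'] = line[0]
        self.log['name'] = line[1]
        self.log['age'] = line[2]
        self.log['grade'] = line[3].replace('\n', '')
        # Return a copy so each call produces an independent dict
        return dict(self.log)

obj = p()
rows = ["1,jonas,23,A", "2,martha,23,B"]
records = [obj.parse(r.split(',')) for r in rows]
print(records)
```

With the copy in place, storing the results of repeated calls keeps every record instead of just the last one.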
If I have a dictionary with data in it like below, what process should I use (such as an if statement) to delete duplicate entries like nested dictionaries 1 and 4? Let's say I want to delete 4 because the user entered it, and I'm assuming that people are unique, so they can't have the same demographics; there can't be two John R. Smiths.
people = {1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'},
          2: {'name': 'Marie', 'age': '22', 'sex': 'Female'},
          3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'},
          4: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}}
I am just learning, so I wouldn't be surprised if there is something simple I was unable to come up with.
I attempted to compare the entries with something like if ['1']['name'] and ['1']['sex'] == ['4']['name'] and ['4']['sex']: then print ['4'], just to test, and the error message told me that I need to be using indexes.
I've also turned it into a list, which was successful, but I was met with another error when trying to compare entries in a manner like: if person['name'] and person['age'] and person['sex'] are equal to another row within a for loop, then print a message. I got nowhere.
I've also tried turning it into a DataFrame and using pandas' duplicate-removal functionality, but I got an error yesterday about 'dict', probably because the dictionaries get nested in the DataFrame, in contrast to a list with nested dictionaries, which tends to look like this:
[{1: {'name': 'John', 'age': '27', 'sex': 'Male'},
  2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}}]
You can take advantage of the fact that dict keys are always unique to help de-duplicate. Since dicts are unhashable and can't be used as keys directly, you can convert each sub-dict to a tuple of items first. Use dict.setdefault to keep only the first value for each distinct key:
records = {}
for number, record in people.items():
    records.setdefault(tuple(record.items()), (number, record))
print(dict(records.values()))
Given your sample input, this outputs:
{1: {'name': 'John R. Smith', 'age': '27', 'sex': 'Male'}, 2: {'name': 'Marie', 'age': '22', 'sex': 'Female'}, 3: {'name': 'Mariah', 'age': '32', 'sex': 'Female'}}
Demo: https://replit.com/#blhsing/LonelyNumbWatch
One approach is to build a new dictionary by iterating over people and assigning a person to the new dictionary if their data is unique. The following solution uses a set for tracking unique users:
from pprint import pprint

unique_people = {}
unique_ids = set()
for key, data in people.items():
    data_id = tuple(data.values())
    if data_id in unique_ids:
        continue
    unique_people[key] = data
    unique_ids.add(data_id)
pprint(unique_people)
Output:
{1: {'age': '27', 'name': 'John R. Smith', 'sex': 'Male'},
2: {'age': '22', 'name': 'Marie', 'sex': 'Female'},
3: {'age': '32', 'name': 'Mariah', 'sex': 'Female'}}
I have a JSON file that combines all previous data-storage versions. An example would be like this:

myList = {1: {'name': 'John', 'age': '27', 'class': '2', 'drop': True},
          2: {'name': 'Marie', 'other_info': {'age': '22', 'class': '3', 'dropped': True}},
          3: {'name': 'James', 'other_info': {'age': '23', 'class': '1', 'is_dropped': False}},
          4: {'name': 'Lucy', 'some_info': {'age': '20', 'class': '4', 'other_branch': {'is_dropped': True, 'how_drop': 'Foo'}}}}
I want to reach the information whose key (or subkey) contains drop. I don't know all the dictionary structures; there can be 20 or more. All I know is that they all contain the phrase 'drop'. There might be other keys that contain the phrase 'drop' as well, but they are few, and if there are multiple matches I can manually adjust which one to take.
I tried to flatten, but after flattening every dictionary item had a different key name.
There is other information I'd like to reach, but most of those attributes have the same problem too.
I want to get the True, True, False, True values in the drop, dropped, and is_dropped keys.
How can I reach these nodes?
You can use recursion to solve this:
def get_drop(dct):
    for key, val in dct.items():
        if isinstance(key, str) and 'drop' in key and isinstance(val, bool):
            yield val
        elif isinstance(val, dict):
            yield from get_drop(val)

print(list(get_drop(myList)))
[True, True, False, True]
Create a recursive function that searches the dictionary and builds up the key paths incrementally. Without putting in detailed safety checks, you can do something like this:
def find(input_dict, base='', search_key='drop'):
    found_paths = []
    if search_key in input_dict.keys():
        found_paths.append(base)
    for each_key in input_dict.keys():
        if isinstance(input_dict[each_key], dict):
            new_base = base + '.' + each_key
            found_paths += find(input_dict[each_key], base=new_base, search_key=search_key)
    return found_paths
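A quick usage sketch (with hypothetical sample data using string keys, since the path concatenation base + '.' + each_key assumes string keys; note also that the membership test is an exact key match, so a key like 'dropped' would not be found by search_key='drop'):

```python
# The find function from the answer above
def find(input_dict, base='', search_key='drop'):
    found_paths = []
    if search_key in input_dict.keys():
        found_paths.append(base)
    for each_key in input_dict.keys():
        if isinstance(input_dict[each_key], dict):
            new_base = base + '.' + each_key
            found_paths += find(input_dict[each_key], base=new_base, search_key=search_key)
    return found_paths

# Hypothetical nested data: 'drop' appears at two different depths
sample = {'a': {'drop': True},
          'b': {'c': {'drop': False}}}
print(find(sample))  # ['.a', '.b.c']
```

Each returned string is the dotted path to the dict that contains the search key, so you can then walk that path to read the value.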
I want to copy a common dictionary
list_common_dictionary = [{'Gender':'M', 'Age':'25'}]
inside a list of data dictionaries
list_data_dictionary = [{'name':'john','id':'1'},
{'name':'albert','id':'2'},
{'name':'jasper','id':'3'},
{'name':'guillaume','id':'4'}]
and get an output like :
output_dictionary = [{'Gender':'M', 'Age':'25','name':'john','id':'1'},
{'Gender':'M', 'Age':'25','name':'albert','id':'2'},
{'Gender':'M', 'Age':'25','name':'jasper','id':'3'},
{'Gender':'M', 'Age':'25','name':'guillaume','id':'4'}]
But respect the order: the fields of the common dictionary must come first in each output dictionary.
Regarding CPU time, is deepcopy the most efficient way?
Use:
result = [{**list_common_dictionary[0], **d} for d in list_data_dictionary]
print(result)
Output
[{'Gender': 'M', 'Age': '25', 'name': 'john', 'id': '1'}, {'Gender': 'M', 'Age': '25', 'name': 'albert', 'id': '2'}, {'Gender': 'M', 'Age': '25', 'name': 'jasper', 'id': '3'}, {'Gender': 'M', 'Age': '25', 'name': 'guillaume', 'id': '4'}]
Dictionaries keep insertion order in Python 3.6+ so this will guarantee that the keys from the common dictionary are the first ones.
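On Python 3.9+, the dict union operator (PEP 584) is an equivalent and arguably more readable spelling; the left operand's keys still come first in the result:

```python
list_common_dictionary = [{'Gender': 'M', 'Age': '25'}]
list_data_dictionary = [{'name': 'john', 'id': '1'},
                        {'name': 'albert', 'id': '2'}]

# dict union (Python 3.9+): keys of the left operand come first,
# values from the right operand win on conflicts
result = [list_common_dictionary[0] | d for d in list_data_dictionary]
print(result)
```

As with the {**a, **b} form, each merged dict is a new object, so the inputs are left untouched.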
You can use dict.update to merge in place, like below. Note that this mutates list_data_dictionary, and the common keys end up after the existing ones (as the output shows), so it does not satisfy the key-order requirement.
For those coming late to the party, I put some timings together (Py 3.7) showing that .update()-based methods look a bit (~5%) faster when the inputs are preserved and noticeably (~30%) faster when just updating in place.
>>> for ldd in list_data_dictionary:
...     ldd.update(*list_common_dictionary)
>>> list_data_dictionary
[{'name': 'john', 'id': '1', 'Gender': 'M', 'Age': '25'},
{'name': 'albert', 'id': '2', 'Gender': 'M', 'Age': '25'},
{'name': 'jasper', 'id': '3', 'Gender': 'M', 'Age': '25'},
{'name': 'guillaume', 'id': '4', 'Gender': 'M', 'Age': '25'}]
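A rough way to reproduce such timings yourself, sketched with timeit on the small sample data (so absolute numbers will differ from the ~5%/~30% figures quoted above):

```python
import timeit

common = {'Gender': 'M', 'Age': '25'}
data = [{'name': 'john', 'id': '1'},
        {'name': 'albert', 'id': '2'}]

# Variant that preserves the inputs: build new dicts each time
def merge_unpack():
    return [{**common, **d} for d in data]

# Variant that copies first, then updates the copies in place
def merge_update():
    rows = [dict(d) for d in data]
    for d in rows:
        d.update(common)
    return rows

print(timeit.timeit(merge_unpack, number=100_000))
print(timeit.timeit(merge_update, number=100_000))
```

Both variants produce equal dicts (dict equality ignores key order); only the key order and the mutation behaviour differ.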
How can I verify the 'context' of a filter statement (normally I would just use a print in a function)? For example:
data=[{'Name':'Greg', 'Age': 10}, {'Name': 'Sarah', 'Age': 20}]
filter(lambda item: item['Name'] == 'Greg', data)
# [{'Age': 10, 'Name': 'Greg'}]
Instead of passing it a lambda, define a function that has the print statement and filter using that.
data = [{'Name': 'Greg', 'Age': 10}, {'Name': 'Sarah', 'Age': 20}]

def my_filter(item):
    print(f'from inside filter: {item}')
    return item['Name'] == 'Greg'

print(list(filter(my_filter, data)))
I'm not going to claim that this is good practice, but for very quick debugging, you can use a tuple inside the lambda to shimmy a call to print in:
data=[{'Name':'Greg', 'Age': 10}, {'Name': 'Sarah', 'Age': 20}]
print(*filter(lambda item: (print(item), item['Name'] == 'Greg')[1], data))
{'Name': 'Greg', 'Age': 10}
{'Name': 'Sarah', 'Age': 20}
{'Name': 'Greg', 'Age': 10}
A lambda expects a single expression whose value it returns. If you want to add in "sequencing" of operations, you need to get a bit creative (although I don't recommend it for real, lasting code).
@juanpa.arrivillaga's idea is cleaner than mine:
lambda item: print(item) or item["Name"] == "Greg"
But it's the same idea. You need to put the print inside in such a way that the internal expression will evaluate in the end to item["Name"] == "Greg". I used the evaluation/indexing of a sequence to do that, while they used the behavior of or.
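For completeness, the or variant as a runnable snippet. It works because print returns None, which is falsy, so or falls through to the comparison:

```python
data = [{'Name': 'Greg', 'Age': 10}, {'Name': 'Sarah', 'Age': 20}]

# print(...) returns None (falsy), so `or` evaluates and returns
# the comparison, which becomes the lambda's result
result = list(filter(lambda item: print(item) or item['Name'] == 'Greg', data))
print(result)  # [{'Name': 'Greg', 'Age': 10}]
```

Every item is printed as the filter examines it, while the filtered result is unchanged.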
Please help me. I have a dataset like this:
my_dict = { 'project_1' : [{'commit_number':'14','name':'john'},
{'commit_number':'10','name':'steve'}],
'project_2' : [{'commit_number':'12','name':'jack'},
{'commit_number':'15','name':'anna'},
{'commit_number':'11','name':'andy'}]
}
I need to sort the dataset by commit number in descending order and turn it into a new list, ignoring the project names, using Python. The expected list will be like this:
ordered_list_of_dict = [{'commit_number':'15','name':'anna'},
{'commit_number':'14','name':'john'},
{'commit_number':'12','name':'jack'},
{'commit_number':'11','name':'andy'},
{'commit_number':'10','name':'steve'}]
Thank you so much for helping me.
Extract my_dict's values as a list of lists*
Join the sub-lists together (flattening the nested structure) to form one flat list
Sort the elements by commit_number in descending order
*a list of lists on Python 2; on Python 3, dict.values() returns a dict_values view.
from itertools import chain
res = sorted(chain.from_iterable(my_dict.values()),
key=lambda x: x['commit_number'],
reverse=True)
[{'commit_number': '15', 'name': 'anna'},
{'commit_number': '14', 'name': 'john'},
{'commit_number': '12', 'name': 'jack'},
{'commit_number': '11', 'name': 'andy'},
{'commit_number': '10', 'name': 'steve'}]
On python2, you'd use dict.itervalues instead of dict.values to the same effect.
Coldspeed's answer is great as usual but as an alternative, you can use the following:
ordered_list_of_dict = sorted([x for y in my_dict.values() for x in y], key=lambda x: x['commit_number'], reverse=True)
which, when printed, gives:
print(ordered_list_of_dict)
# [{'commit_number': '15', 'name': 'anna'}, {'commit_number': '14', 'name': 'john'}, {'commit_number': '12', 'name': 'jack'}, {'commit_number': '11', 'name': 'andy'}, {'commit_number': '10', 'name': 'steve'}]
Note that in the list-comprehension you have the standard construct for flattening a list of lists:
[x for sublist in big_list for x in sublist]
I'll provide the less-pythonic and more reader-friendly answer.
First, iterate through the key-value pairs in my_dict and append each element of each value to an empty list. This way you don't need to flatten a list of lists afterwards:

commits = []
for key, val in my_dict.items():
    for commit in val:
        commits.append(commit)
which gives this:
In [121]: commits
Out[121]:
[{'commit_number': '12', 'name': 'jack'},
{'commit_number': '15', 'name': 'anna'},
{'commit_number': '11', 'name': 'andy'},
{'commit_number': '14', 'name': 'john'},
{'commit_number': '10', 'name': 'steve'}]
Then sort it in descending order. Note that on Python 2, sorted(commits, reverse=True) would sort by 'commit_number' even without a key, because it comes alphabetically before 'name'; on Python 3, dicts are not comparable, so you must specify the key explicitly. This is the fastest and cleanest way, to the best of my knowledge:
from operator import itemgetter
sorted(commits, key = itemgetter('commit_number'), reverse = True)
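One caveat worth adding: the commit numbers in this dataset are strings, so all of the sorts above compare them lexicographically. That happens to give the right order for the sample data, but '9' would sort above '15'. Converting the key to int (a variation on the answers above, shown with hypothetical values that expose the difference) avoids this:

```python
commits = [{'commit_number': '9', 'name': 'pat'},
           {'commit_number': '15', 'name': 'anna'},
           {'commit_number': '100', 'name': 'lee'}]

# Lexicographic string comparison: '9' > '15' > '100'
by_string = sorted(commits, key=lambda d: d['commit_number'], reverse=True)
# Numeric comparison: 100 > 15 > 9
by_int = sorted(commits, key=lambda d: int(d['commit_number']), reverse=True)

print([d['commit_number'] for d in by_string])  # ['9', '15', '100']
print([d['commit_number'] for d in by_int])     # ['100', '15', '9']
```

If the commit numbers can have varying digit counts, the int key is the safe choice.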