I have one dictionary of key/value pairs and a list:

dict_ = {0.000806: 1.341382, 0.023886: 39.63012, 7.525935: 63.89669, 7.571048: 62.47208}
list_ = [7.525935, 7.571048, 0.000806, 0.023886]

With this list and dictionary, my expected output is a dictionary whose keys follow the order of the list:

{7.525935: 63.89669,
 7.571048: 62.47208,
 0.000806: 1.341382,
 0.023886: 39.63012}

Here is what I tried:

thing_category = dict((t, c) for c, t in category_thing.items())
list_ = [7.525935, 7.571048, 0.000806, 0.023886]
for stuff in list_of_things:
    if stuff in category_thing:
        print(stuff)

How can I match the dictionary entries against the list values? I also tried merge and map, but neither worked. The idea is that, with the help of these two columns, I have to match the values against one another.
You can use a comprehension to construct a new dict with keys in the order of the list_ entries:
dict_ = {0.000806:1.341382,0.023886:39.63012,7.525935:63.89669,7.571048:62.47208}
list_ = [7.525935,7.571048,0.000806,0.023886]
{k: dict_[k] for k in list_}
However, you might run into problems with float keys if there are slight numeric differences.
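If exact lookups do fail because of floating-point noise, one workaround is to match each list entry against the dict keys within a tolerance. This is a minimal sketch using math.isclose from the standard library; the helper name find_close_key and the tolerance are my own choices, not part of the question:

```python
import math

dict_ = {0.000806: 1.341382, 0.023886: 39.63012, 7.525935: 63.89669, 7.571048: 62.47208}
list_ = [7.525937, 7.571048, 0.000806, 0.023886]  # note 7.525937 differs slightly from the dict key

def find_close_key(target, keys, rel_tol=1e-5):
    # return the first key within tolerance of target, else None
    return next((k for k in keys if math.isclose(k, target, rel_tol=rel_tol)), None)

# keep only entries whose key can be matched within tolerance
result = {k: dict_[k] for t in list_ if (k := find_close_key(t, dict_)) is not None}
```

Entries with no sufficiently close key are simply skipped, so the result only contains matched pairs.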
Suppose I have List of dictionaries as
l = [{'car':'good'},
{'mileage':'high'},
{'interior':'stylish'},
{'car':'bad'},
{'engine':'powerful'},
{'safety':'low'}]
Basically these are noun-adjective pairs.
How can I visualize which adjectives are most associated with, say, 'car' here?
How can I convert this to a DataFrame? I have tried pd.DataFrame(l), but since each dict has a different key, the keys do not line up as column names, so it gets a little tricky.
Any help would be appreciated.
Given that you want this to be done column-wise, you have to restructure your list of dictionaries: one dictionary should represent one row. Note also that a dict cannot hold duplicate keys, so each noun can appear only once per row. Your example list should therefore be (I added a second row for better explainability):

l = [
    {'car': 'good', 'mileage': 'high', 'interior': 'stylish', 'engine': 'powerful', 'safety': 'low'},  # row 1
    {'car': 'bad', 'mileage': 'low', 'interior': 'old', 'engine': 'powerful', 'safety': 'low'}         # row 2
]
At this point, all you have to do is call pd.DataFrame(l).
EDIT: Based on your comments, I think you need to flatten the list of dictionaries into a list of [key, value] pairs to get your desired result. Here is a quick way (I'm sure it can be made more efficient):
l = [{'car':'good'},
{'mileage':'high'},
{'interior':'stylish'},
{'car':'bad'},
{'engine':'powerful'},
{'safety':'low'}]
new_list = []
for item in l:
    for key, value in item.items():
        temp = [key, value]
        new_list.append(temp)

df = pd.DataFrame(new_list, columns=['Noun', 'Adjective'])
You can construct your DataFrame from a list of tuples. To get tuples from a dict, use the items() method. Build the list of tuples with a list comprehension by taking the first tuple of each dict's items (in Python 3, items() returns a view, so wrap it in list() before indexing):

import pandas as pd
df = pd.DataFrame(data=[list(d.items())[0] for d in l], columns=['A', 'B'])
print(df)

Gives:
A B
0 car good
1 mileage high
2 interior stylish
3 car bad
4 engine powerful
5 safety low
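For the visualization part of the question, i.e. which adjectives are most associated with a given noun, a minimal sketch in plain Python (no pandas) is to group the adjectives per noun and count them with collections.Counter:

```python
from collections import Counter, defaultdict

l = [{'car': 'good'}, {'mileage': 'high'}, {'interior': 'stylish'},
     {'car': 'bad'}, {'engine': 'powerful'}, {'safety': 'low'}]

# group all adjectives seen for each noun
by_noun = defaultdict(list)
for d in l:
    for noun, adjective in d.items():
        by_noun[noun].append(adjective)

# count how often each adjective occurs for 'car'
car_counts = Counter(by_noun['car'])
```

car_counts.most_common() then gives the adjectives ranked by frequency, which can be fed straight into a bar chart.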
I have a dictionary where each value is keyed by a pair of keys, like so:
Initial Dict
Key : ('106338', '2006-12-27') , Value : []
Dict after populating
Key : ('106338', '2006-12-27') , Value : [8, 7, 9, 8, 7]
The value for each key pair is a list holding some amount of information, and I need its length. I created this dictionary by first itertupling across a DataFrame and generating a key pair and an empty list for each unique record. I then iterated across it again and populated the lists by appending values under each key pair. Key pairs were generated from row values: the first item in the key is the identification number for the asset and the second is the date for the asset. Here is the code for the dict creation:
perm_dict = {}
for row in df_perm.itertuples():
    perm_dict[str(row[1]), str(row[3])] = []

for row in df_perm.itertuples():
    if row[6].to_datetime().date() < row[9].to_datetime().date() and row[9].to_datetime().date() < row[5].to_datetime().date():
        perm_dict[str(row[1]), str(row[3])].append(row[10])
My problem is that I now need to look those values up again by key pair while iterating through the original DataFrame, so I can take my list lengths and turn them into a new column. Screenshot of the DataFrame:
I am having trouble working out how to apply these counts back to the original DataFrame as a new column for only the rows with key matches. I can't iterate back through to add them, because then I'd be modifying my original DF while iterating, and I've read that's a big no-no. Any help you all may be able to provide would be greatly appreciated! Also, please let me know if I need to include more information.
Edit1
Here are the outputs after running the dictionary comprehension code provided.
This might be what you are looking for.
import pandas as pd
# sample data
d = {('106338', '2006-12-27'): [8, 7, 9, 8, 7]}
df = pd.DataFrame([['106338', '2006-12-27']], columns=['Key1', 'Key2'])
# first make dictionary mapping to length of list
d_len = {k: len(v) for k, v in d.items()}
# perform the mapping: build the (Key1, Key2) tuple for each row and look up its length
df['Len'] = [d_len.get(key) for key in zip(df['Key1'], df['Key2'])]
# output:
#      Key1        Key2  Len
# 0  106338  2006-12-27    5
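The core of the mapping, stripped of pandas, is just a per-row tuple lookup. Here is the same idea as a runnable sketch with plain lists; the sample rows and second key pair are made up for illustration:

```python
# dict of (id, date) -> list of observations, as in the question
d = {('106338', '2006-12-27'): [8, 7, 9, 8, 7],
     ('106339', '2006-12-28'): [1, 2]}

# map each key pair to the length of its list
d_len = {k: len(v) for k, v in d.items()}

# rows of the "DataFrame": (Key1, Key2) pairs, including one with no dict match
rows = [('106338', '2006-12-27'), ('106339', '2006-12-28'), ('999999', '2000-01-01')]

# dict.get returns None for rows with no key match, which pandas would show as NaN
lens = [d_len.get(row) for row in rows]
```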
I have two dictionaries, called dict_names and dict_values, each with about 20,000 keys.
Here are two elements of each dictionary:

dict_names = {'Dunk_g09_c03': ['Dunk_g09_c03_0159', 'Dunk_g09_c03_005', 'Dunk_g09_c03_0149', ..., 'Dunk_g09_c03_0001'],
              'Bulk_g08_c07': ['Bulk_g08_c07_0256', 'Bulk_g08_c07_0800', ..., 'Bulk_g08_c07_0015']}
dict_values = {'Dunk_g09_c03': [[0.45, 0.78, ..., 0.16], [0.48, 0.12, ..., 0.89], ..., [0.12, 0.59, ..., 0.23]],
               'Bulk_g08_c07': [[0.01, 0.17, ..., 0.89], [0.23, 0.47, ..., 0.45], ..., [0.12, 0.15, ..., 0.12]]}

I want to reorder the dictionaries in ascending order of the number after the last _ in the dict_names values. For instance:

'Dunk_g09_c03': ['Dunk_g09_c03_0159', 'Dunk_g09_c03_005', 'Dunk_g09_c03_0149', ..., 'Dunk_g09_c03_0001']

becomes:

'Dunk_g09_c03': ['Dunk_g09_c03_0001', 'Dunk_g09_c03_0002', 'Dunk_g09_c03_0003', ..., 'Dunk_g09_c03_LAST']

and dict_values is reordered to respect the new order in dict_names.
For each key (assuming both dicts have identical keys), we pair the name and value lists element by element, sort those pairs by the trailing number in the name, then replace the original lists with their sorted counterparts:

def key_func(pair):
    # sort by the integer after the last underscore in the name
    return int(pair[0].split('_')[-1])

for k, names in dict_names.items():
    sorted_pairs = sorted(zip(names, dict_values[k]), key=key_func)
    dict_names[k], dict_values[k] = zip(*sorted_pairs)

Note that zip(*sorted_pairs) yields tuples; wrap each side in list(...) if you need lists back.
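As a quick sanity check, here is the same sorting applied to a tiny made-up sample (the names and inner values are illustrative, not the asker's real 20,000-key data):

```python
dict_names = {'Dunk_g09_c03': ['Dunk_g09_c03_0159', 'Dunk_g09_c03_0005', 'Dunk_g09_c03_0001']}
dict_values = {'Dunk_g09_c03': [[0.45], [0.48], [0.12]]}

def key_func(pair):
    # sort by the integer after the last underscore in the name
    return int(pair[0].split('_')[-1])

for k, names in dict_names.items():
    # pair each name with its value row, sort by trailing number, then unzip
    sorted_pairs = sorted(zip(names, dict_values[k]), key=key_func)
    dict_names[k], dict_values[k] = zip(*sorted_pairs)
```

After running this, the names are in ascending numeric order and the value rows have moved with them.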
I am currently trying to understand how RDDs work. For example, I want to count the lines in some RDD object based on their content. I have some experience with DataFrames, and my code for a DF that has, for example, columns A and B (and probably some others) looks like this:
df = sqlContext.read.json("filepath")
df2 = df.groupBy(['A', 'B']).count()
The logical part of this code is clear to me: I do a groupBy over the column names in the DF. In an RDD I don't have column names, just similar lines, which could be tuples or Row objects... How can I count similar tuples and attach the count as an integer to each unique line? For example, my first code is:
df = sqlContext.read.json("filepath")
rddob = df.rdd.map(lambda line:(line.A, line.B))
I do the map operation and create tuples of the values from keys A and B. The unique line doesn't have any keys anymore (this is the most important difference from the DataFrame, which has column names).
Now I can produce something like this, but it calculates only the total number of lines in the RDD:
rddcalc = rddob.distinct().count()
What I want for my output is just:
((a1, b1), 2)
((a2, b2), 3)
((a2, b3), 1)
...
PS
I have found my own solution to this question. Here, rdd is the initial RDD, rddlist is a list of all lines with their counts, and rddmod is the final modified RDD, and consequently the solution:
rddlist = rdd.map(lambda line: (line.A, line.B)).map(lambda line: (line, 1)).countByKey().items()
rddmod = sc.parallelize(rddlist)
I believe what you are looking for here is reduceByKey. This will give you a count of how many times each distinct (a, b) pair appears.
It would look like this:

rddob = df.rdd.map(lambda line: ((line.A, line.B), 1))
counts_by_key = rddob.reduceByKey(lambda a, b: a + b)

You will now have key/value pairs of the form:

((a, b), count-of-times-pair-appears)

Please note that the key must be hashable. If A or B is a list, convert it to a tuple (or build some other hashable "primary key" object) before the reduce; you can't perform a reduceByKey where the key is an unhashable object.
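The counting logic itself, independent of Spark, can be sketched in plain Python with collections.Counter; reduceByKey does the same aggregation, just distributed across partitions. The rows below are a made-up stand-in for the RDD contents:

```python
from collections import Counter

# stand-in for the RDD rows: (A, B) value pairs
rows = [('a1', 'b1'), ('a2', 'b2'), ('a1', 'b1'),
        ('a2', 'b3'), ('a2', 'b2'), ('a2', 'b2')]

# Counter aggregates exactly like reduceByKey(lambda a, b: a + b) over (key, 1) pairs
counts = Counter(rows)
```

Each distinct (a, b) tuple ends up mapped to the number of times it appeared, matching the desired ((a, b), count) output.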
I have a 2D array as follows.
[['FE0456143', '218.04'], ['FB1357448', '217.52'], ['FB1482960', '222.70'], ['FB1483107', '223.32'], ['FE0456556', '12429.67'], ['FE0456594', '213.71'], ['FB1483056', '218.86'], ['FE0456061', '12392.33'], ['FB1482479', '223.35']]
The first element is the key and the second is the value. I have tried:

keys = list(zip(*data))[0]
vals = list(zip(*data))[1]
dict(zip(keys, vals))

However, some elements of the array may have duplicate keys, and later values overwrite the earlier ones. Each key may have up to 3 values associated with it, and I want to keep all of them. How can I do that?
Sounds like you want a one-to-many mapping. You can get this by making each value a list:

from collections import defaultdict

d = defaultdict(list)
for k, v in data:
    d[k].append(v)
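Applied to a small sample, each key then maps to the list of every value seen for it. This runnable sketch uses a shortened version of the data above with duplicate keys added for illustration (the original sample happens to have no duplicates):

```python
from collections import defaultdict

# abbreviated sample with deliberate duplicate keys
data = [['FE0456143', '218.04'], ['FB1357448', '217.52'], ['FE0456143', '222.70'],
        ['FB1357448', '223.32'], ['FE0456143', '12429.67']]

d = defaultdict(list)
for k, v in data:
    d[k].append(v)  # duplicate keys accumulate instead of overwriting
```

A defaultdict(list) creates the empty list on first access, so no explicit "is the key already there?" check is needed.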