Understanding initializing an empty dictionary - python

I really do not understand how the check (if entry in langs_count) is possible when the dictionary was initialized to be empty. What is inside the dictionary, and how did it get there? I'm really confused.
import pandas as pd

# Import Twitter data as DataFrame: df
df = pd.read_csv("tweets.csv")

# Initialize an empty dictionary: langs_count
langs_count = {}

# Extract column from DataFrame: col
col = df['lang']

# Iterate over lang column in DataFrame
for entry in col:
    # If the language is in langs_count, add 1
    if entry in langs_count.keys():
        langs_count[entry] += 1
    # Else add the language to langs_count, set the value to 1
    else:
        langs_count[entry] = 1

# Print the populated dictionary
print(langs_count)
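The membership test is exactly what makes this work: while the dictionary is empty, entry in langs_count is simply False, so on the first occurrence of a language the else branch runs and inserts that key with the value 1; from then on the key exists and the if branch increments it. A minimal sketch of the first two iterations (using "en" only as an example language code):

langs_count = {}                 # starts empty

entry = "en"                     # example language code
print(entry in langs_count)      # False -> the else branch runs
langs_count[entry] = 1

print(entry in langs_count)      # True -> the if branch increments the count
langs_count[entry] += 1

print(langs_count)               # {'en': 2}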

You can implement the count functionality using groupby.
import pandas as pd
# Import Twitter data as DataFrame: df
df = pd.read_csv("tweets.csv")
# Populate dictionary with count of occurrences in 'lang' column
langs_count = dict(df.groupby(['lang']).size())
# Print the populated dictionary
print(langs_count)
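Equivalently, value_counts already does this counting; a minimal alternative sketch using the same tweets.csv file as above:

import pandas as pd

df = pd.read_csv("tweets.csv")

# value_counts returns a Series of counts indexed by the language codes
langs_count = df['lang'].value_counts().to_dict()
print(langs_count)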


updating dictionary in a nested loop

In the code below, I would like to update the fruit_dict dictionary with the mean price of each row. But the code is not working as expected. Kindly help.
#!/usr/bin/python3
import random
import numpy as np
import pandas as pd

price = np.array(range(20)).reshape(5, 4)  # sample data for illustration
fruit_keys = []  # list of keys for dictionary
for i in range(5):
    key = "fruit_" + str(i)
    fruit_keys.append(key)

# initialize a dictionary
fruit_dict = dict.fromkeys(fruit_keys)
fruit_list = []
# print(fruit_dict)

# update dictionary values
for i in range(price.shape[1]):
    for key, value in fruit_dict.items():
        for j in range(price.shape[0]):
            fruit_dict[key] = np.mean(price[j])
    fruit_list.append(fruit_dict)

fruit_df = pd.DataFrame(fruit_list)
print(fruit_df)
Instead of building the dictionary from the string pattern first, you can fill in the row means directly by iterating over the rows only.
If you already have a dictionary whose keys follow a certain pattern, you can update the values in a single loop, using that pattern as the key. You also don't need to create an additional list to build a data frame; a DataFrame can be created from the dictionary itself (see the pandas documentation on constructing DataFrames from a dictionary). I have provided a sample output which may suit your requirement.
If you need an output with the mean value as a column and the fruits as rows, you can use the implementation below.
#!/usr/bin/python3
import random
import numpy as np
import pandas as pd

row = 5
column = 4
price = np.array(range(20)).reshape(row, column)  # sample data for illustration

# initialize a dictionary
fruit_dict = {}
for j in range(row):
    fruit_dict['fruit_' + str(j)] = np.mean(price[j])

fruit_df = pd.DataFrame.from_dict(fruit_dict, orient='index', columns=['mean_value'])
print(fruit_df)
This produces the output below. As mentioned above, you can shape the data frame however you like from the dictionary by referring to the DataFrame-from-dictionary documentation.
         mean_value
fruit_0         1.5
fruit_1         5.5
fruit_2         9.5
fruit_3        13.5
fruit_4        17.5
You shouldn't nest the loop over the range and the dictionary items, you should iterate over them together. You can do this with enumerate().
You're also not using value, so there's no need to use items().
for i, key in enumerate(fruit_dict):
    fruit_dict[key] = np.mean(price[i])
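Put together, a sketch of the whole thing with the question's sample data, using a single pass over the rows and no nested loops:

import numpy as np
import pandas as pd

price = np.array(range(20)).reshape(5, 4)  # sample data from the question
fruit_dict = dict.fromkeys("fruit_" + str(i) for i in range(5))

# one row of price per key
for i, key in enumerate(fruit_dict):
    fruit_dict[key] = np.mean(price[i])

fruit_df = pd.DataFrame.from_dict(fruit_dict, orient='index', columns=['mean_value'])
print(fruit_df)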
I arrived at a solution based on the answer provided by Sangeerththan; please find it below.
#!/usr/bin/python3
import numpy as np
import pandas as pd

fruit_dict = {}
fruit_list = []
price = np.array(range(40)).reshape(4, 10)

for i in range(price.shape[0]):
    mark_price = np.square(price[i])
    for j in range(mark_price.shape[0]):
        fruit_dict['proj_fruit_price_' + str(j)] = np.mean(mark_price[j])
    fruit_list.append(fruit_dict.copy())

fruit_df = pd.DataFrame(fruit_list)
You can use this instead of your loops:
fruit_keys = []  # list of keys for dictionary
for i in range(5):
    key = "fruit_" + str(i)
    fruit_keys.append(key)

out = {fruit_keys[index]: np.mean(price[index]) for index in range(price.shape[0])}
Output:
{'fruit_0': 1.5, 'fruit_1': 5.5, 'fruit_2': 9.5, 'fruit_3': 13.5, 'fruit_4': 17.5}

Add values from a nested JSON to a pandas dataframe

I have the following JSON object:
{"code":"Ok","matchings":[{"confidence":0.025755,"geometry":"qnp{bBww{kH??~D_I}E_J{EaJ{E{I{AsCoJgQfKuTjJwNtF}HdBuBnAgBpFsF~EeEzAsAt#i#lA}#x#q#lEmCjDuBdDoAvFmAfYmEtAUrJyDj#_#h#m#`#u#T}#J{#B_A?gAGmAM}#Su#]u#wN{QwI{KcA}Aa#gASiAWsBOwCGmDCoJ??cEH?{FA{HgIXuG`#eHrAsLdDkI|CkIfDq#VoDlB_GzDaE`D_A|#kA`AeAx#sI~G}DlDk#j#mClCiOrQwGvJiGxJoFdK_HjP{Pne#aLt\\sK~]oKb_#sG~TeJ`_#q#fD{#dEoBlMwBxQaAbI{Dh\\wKrfAiRbvBy#`KaLjwAyHj_AANM~AUxC}#tKi#bHe#jGfBj#t#V|#\\TFjAXz#HhASxAy#vCcBjX~GvG`BlEjAv\\xJfBf#dThG~Ad#nFrBnCbBdCvBzB`DbCfEr{#b~A","legs":[{"annotation":{"nodes":[330029575,5896466632,330029575,5896466588,5896466587,5896466586,5896466637,330029340,330029339,330029338,1497356855,1880770263,46388213,1880770262,1880770257,2021835257,3306177380,46387099,2021835255,6909770873,46385948,6909770874,46384887,46382454]},"steps":[],"distance":332.2,"duration":93.1,"summary":"","weight":93.1},{"annotation":{"nodes":[46384887,46382454,5888264001,6909802199,3296872014,6909802198,5888264003,6909802197,3296872012,6909802194,6909802195,6909802193,6909802196,3296872013,3296872015]},"steps":[],"distance":88.1,"duration":13.5,"summary":"","weight":13.5},{"annotation":{"nodes":[3296872013,3296872015,6909802186,6909802187,6909770884,3296872017,6909802185,4904066416,3296872018,1614187163]},"steps":[],"distance":62.3,"duration":12.4,"summary":"","weight":12.4},{"annotation":{"nodes":[3296872018,1614187163,2054127599,1614187129,5896479942,6909802219,46384372,1027299576,6909802220,46389815]},"steps":[],"distance":144,"duration":25.2,"summary":"","weight":25.2},{"annotation":{"nodes":[6909802220,46389815,6296436095,6296436094,298079716,6296436096,46391324,1083528076,6909802221,6909802222,46393158]},"steps":[],"distance":90.6,"duration":10.1,"summary":"","weight":10.1},{"annotation":{"nodes":[6909802222,46393158,46393795,6909802223,1027299602,6909802224,46396846,46398397,2054127645,46399502,46400708,1027299589,6712474212,6903665704,46402805,46403163,4374153462]},"steps":[],"distance":422.9,"duration":40.1,"summary":"","weight":40.1},{"annotation":{"nodes":[46403163,4374153462,46404084,1027299603,364146312,2262500170]},"steps":[],"distance":273.6,"duration":24.7,"summary":"","weight":24.7},{"annotation":{"nodes":[364146312,2262500170,5289718695]},"steps":[],"distance":170.9,"duration":15.3,"summary":"","weight":15.3},{"annotation":{"nodes":[2262500170,5289718695,2054127657,1693195716,46408565,6913837768,1693195721,2262500247,1693195714,2262500104,1693195717]},"steps":[],"distance":56.9,"duration":14.2,"summary":"","weight":14.2},{"annotation":{"nodes":[46397705,46401323,46405521]},"steps":[],"distance":86.6,"duration":12.6,"summary":"","weight":12.6},{"annotation":{"nodes":[46401323,46405521,46410773]},"steps":[],"distance":156.5,"duration":22.5,"summary":"","weight":22.5},{"annotation":{"nodes":[46405521,46410773,452003319,452003320]},"steps":[],"distance":95.4,"duration":13.8,"summary":"","weight":13.8},{"annotation":{"nodes":[452003319,452003320,46411428,46414457,46419384,46421801]},"steps":[],"distance":226.4,"duration":32.6,"summary":"","weight":32.6},{"annotation":{"nodes":[46419384,46421801,46421802,46421735]},"steps":[],"distance":69.2,"duration":10,"summary":"","weight":10},{"annotation":{"nodes":[46421802,46421735,46421416]},"steps":[],"distance":34.1,"duration":4.9,"summary":"","weight":4.9},{"annotation":{"nodes":[46421735,46421416,46420466]},"steps":[],"distance":2.7,"duration":0.3,"summary":"","weight":0.3},{"annotation":{"nodes":[46421416,46420466]},"steps":[],"distance":31.4,"duration":4.6,"summary":"","weight":4.6},{"annota
tion":{"nodes":[46421416,46420466,452003307,452003308,46421260,46422467,5761752102,46423905]},"steps":[],"distance":135.5,"duration":25,"summary":"","weight":25},{"annotation":{"nodes":[5761752102,46423905,46424346,5777055555,5713213408,46425605,5777055050,5777346784,5777055556,5713221227,46426685,46427741,3175895442,3183752428,5826014405,46428227]},"steps":[],"distance":106.5,"duration":14.9,"summary":"","weight":14.9},{"annotation":{"nodes":[5826014405,46428227,3175895443,5826014406,3175895444,5826014368,5826014369,5826014374,46429570,5826014373,5826014375,5826014372,5826014358,5826014371,5826014370,5826014376]},"steps":[],"distance":172.7,"duration":15.7,"summary":"","weight":15.7},{"annotation":{"nodes":[2054127660,2054127638,2054127605,6296435009,2054127599,6909770882,3296872018,4904066416,6909802185,3296872017,6909770884,6909802187,6909802186,3296872015,3296872013,6909802196,6909802193,6909802195,6909802194,3296872012,6909802197,5888264003,6909802198,3296872014,6909802199,5888264001,46382454,46384887,6909770874,46385948,6909770873,2021835255,46387099,3306177380,2021835257]},"steps":[],"distance":317.7,"duration":46.1,"summary":"","weight":46.1},{"annotation":{"nodes":[3306177380,2021835257,1880770257,1880770262,46388213,1880770263,1497356855,330029338,330029339,330029340,5896466637]},"steps":[],"distance":150.4,"duration":29.4,"summary":"","weight":29.4}],"distance":80317.8,"duration":10983.5,"weight_name":"duration","weight":10983.5}],"tracepoints":[{"alternatives_count":0,"waypoint_index":0,"matchings_index":0,"location":[4.929932,52.372217],"name":"Willem Theunisse Blokstraat","distance":10.791613,"hint":"CAkHgHAJBwAlAAAAAAAAAAAAAAAAAAAALCd0QQAAAAAAAAAAAAAAACUAAAAAAAAAAAAAAAAAAAABAAAAjDlLAPkiHwP3OEsAGiMfAwAArxMz7Ejh"},null,{"alternatives_count":0,"waypoint_index":1,"matchings_index":0,"location":[4.932506,52.3709],"name":"Frans de Wollantstraat","distance":11.915926,"hint":"pwUBAPYEAYAHAAAARwAAAAAAAAAAAAAA3_qaQE0JPUIAAAAAAAAAAAcAAABHAAAAAAAAAAAAAAABAAAAmkNLANQdHwPtQksAxB0fAwAA_xUz7Ejh"},{"alternatives_count":0,"waypoint_index":472,"matchings_index":0,"location":[4.932745,52.373288],"name":"Piet Heinkade","distance":0.98867,"hint":"gwUBgMgFAQAFAAAADQAAABoBAABYAAAAQMS3QHTNW0HsWZ1DmZ2WQgUAAAANAAAAGgEAAFgAAAABAAAAiURLACgnHwN9REsAIycfAwoADwkz7Ejh"},null,null,{"alternatives_count":1,"waypoint_index":473,"matchings_index":0,"location":[4.934022,52.371637],"name":"Piet Heinkade","distance":2.713742,"hint":"NA8HADsPB4ACAAAADwAAADoAAAA-AAAAjU82QIAqg0FUpSdCLoWJQgIAAAAPAAAAOgAAAD4AAAABAAAAhklLALUgHwNfSUsAsCAfAwQAvxUz7Ejh"},null,null,{"alternatives_count":1,"waypoint_index":474,"matchings_index":0,"location":[4.93213,52.371794],"name":"Frans de Wollantstraat","distance":10.337677,"hint":"AgUBgAcFAQABAAAABAAAAAwAAAAAAAAA1paeP-KrBUAomAdBAAAAAAEAAAAEAAAADAAAAAAAAAABAAAAIkJLAFIhHwOrQksAeiEfAwIA7xQz7Ejh"},{"alternatives_count":1,"waypoint_index":475,"matchings_index":0,"location":[4.93074,52.372528],"name":"Isaac Titsinghkade","distance":0.65222,"hint":"AwkHgAYJBwA5AAAACwAAAAAAAACMAAAA_Fe_QWP_k0AAAAAA33FqQjkAAAALAAAAAAAAAIwAAAABAAAAtDxLADAkHwOtPEsANCQfAwAADw4z7Ejh"},null,null]}
I want to add all values that belong to the key 'nodes' to one column in a pandas dataframe.
When I run:
for i in output["matchings"][0]['legs']:
    result = i['annotation']['nodes']
    df = pd.DataFrame(result, columns=['node'])

df
only a fraction gets added to the dataframe. What am I doing wrong?
At the end of your for loop, df only keeps the 'nodes' list of the last leg of your JSON. You have to append all the 'nodes' lists into a single dataframe instead.
Extending your code:
df = pd.DataFrame({'node': {}})
for i in output["matchings"][0]['legs']:
    result = i['annotation']['nodes']
    df_temp = pd.DataFrame(result, columns=['node'])
    df = df.append(df_temp, ignore_index=True)
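Note that DataFrame.append has been deprecated and removed in recent pandas versions; with current pandas the same idea is usually written by collecting the pieces and concatenating once. A sketch, assuming output is the parsed JSON from the question:

import pandas as pd

frames = [pd.DataFrame(leg['annotation']['nodes'], columns=['node'])
          for leg in output["matchings"][0]['legs']]
df = pd.concat(frames, ignore_index=True)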

How to retrieve the column name and row name with a condition satisfied in a dataframe?

I need to check whether the sum of a column is 1, and if that condition is satisfied I want to retrieve the column name and row number into a dictionary.
The output should be list1=({8:1004},{9:1001}).
I have tried some python code but couldn't move forward with the code.
list1 = []
for Emp in SkillsA:
    sum_row = (SkillsA.sum(axis=0))
    # print(sum_row)
    # print((Skills_A[0]))
    if sum_row[Emp] == 1:
        # print(Emp)
        for ws in SkillsA:
            # if SkillsA[ws][Emp] == 1:
            print(SkillsA[ws][Emp])
            # list1.update({Emp:ws})
With pandas you can do it like this:
import pandas as pd

# Import data
df = pd.read_excel("location_file")

# Create a dictionary (use a name other than the built-in dict)
result = dict()

# Iterate over columns
for i in df.columns:
    if df[i].sum() == 1:
        result[i] = df.Employee_No[df[i] == 1]  # add filtered data to the dict
Based on describing the problem as
"find a mapping of column labels to row labels for the 1s in columns containing only a single 1",
it can also be done with a one-line function:
def indices_of_single_ones_by_column(df):
    return [{col: df[col].idxmax()} for col in df.columns if df[col].sum() == 1]
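For example, on a small made-up frame (the labels are only for illustration) where just column 'B' sums to 1:

import pandas as pd

df = pd.DataFrame({'A': [1, 1, 0], 'B': [0, 1, 0]}, index=[1001, 1004, 1007])
print(indices_of_single_ones_by_column(df))   # [{'B': 1004}]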

How do I fix the For Loop to return a certain character from a DataFrame?

I have imported an Excel file into a DataFrame and iterated over a column called "Title" to pull out the titles containing certain keywords; I have that list of titles as "match_titles". What I want to do now is create a for loop that returns the value of the column before "Title" for each title in match_titles. I'm not sure why the code is not working. Any help would be appreciated.
import pandas as pd

data = pd.read_excel(r'C:\Users\bryanmccormack\Downloads\asin_list.xlsx')
df = pd.DataFrame(data, columns=['Track', 'Asin', 'Title'])

excludes = ["Chainsaw", "Diaper pail", "Leaf Blower"]
my_excludes = [set(key_word.lower().split()) for key_word in excludes]
match_titles = [e for e in df.Title if
                any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]

a = []
for i in match_titles:
    a.append(df['Asin'])

print(a)
In your for loop you are appending the unfiltered column df['Asin'] to your list a as many times as there are values in match_titles. But there isn't any filtering of df.
One solution is to add a boolean column marking the matches; then you can return the Asin column after filtering on that column:
# Make a function to perform your match analysis.
def is_match(title, excludes=["Chainsaw", "Diaper pail", "Leaf Blower"]):
    my_excludes = [set(key_word.lower().split()) for key_word in excludes]
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False

# Make a new boolean column for the matches. This applies your
# function to each value in df['Title'] and puts the output in
# the new column.
df['match_titles'] = df['Title'].apply(is_match)

# Filter the df to only matches and return the column you want.
# Because the match_titles column is boolean it can be used as
# an index.
result = df[df['match_titles']]['Asin']
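A quick check on a small made-up frame (the Asin and Title values are placeholders):

import pandas as pd

df = pd.DataFrame({'Asin': ['B001', 'B002', 'B003'],
                   'Title': ['Gas Chainsaw 16in', 'Garden Hose', 'Diaper Pail Refill']})

df['match_titles'] = df['Title'].apply(is_match)
print(df[df['match_titles']]['Asin'])   # rows B001 and B003 match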

Save data frame from inside for loop

I have a function that takes in a dataframe and returns a (reduced) dataframe, e.g. like this:
def transforming_data(dataframe, col_1, col_2, normalized = True):
    ''' takes in dataframe, groups col_1 according to col_2 and returns dataframe
    '''
    df = dataframe[col_1].groupby(dataframe[col_2]).value_counts(normalize = normalized).unstack(fill_value = 0)
    return df
For the following code, this gives me:
import pandas as pd
import numpy as np

np.random.seed(12)

def transforming_data(df, col_1, col_2, normalized = True):
    ''' takes in df, groups col_1 according to col_2 and returns df '''
    df = df[col_1].groupby(df[col_2]).value_counts(normalize = normalized).unstack(fill_value = 0)
    return df

numrows = 1000
dataframe = pd.DataFrame({'Numerical': np.random.randn(numrows),
                          'Category': np.random.choice(['Panda', 'Elephant', 'Anaconda'], numrows),
                          'Response 1': np.random.choice(['Yes', 'Maybe', 'No', 'Don\'t know'], numrows),
                          'Response 2': np.random.choice(['Very Much', 'Much', 'A bit', 'Not at all'], numrows)})

test = transforming_data(dataframe, 'Response 1', 'Category')
print(test)
# Output
# Response 1 Don't know Maybe No Yes
# Category
# Anaconda 0.275229 0.232416 0.217125 0.275229
# Elephant 0.220588 0.270588 0.255882 0.252941
# Panda 0.258258 0.222222 0.273273 0.246246
So far, so good.
Now I want to use the function transforming_data inside a for loop for every column in dataframe (as I have lots of columns, not just two) and save the resulting dataframe to a new dataframe, e.g. test_response_1 and test_response_2 for this example.
Can someone point me in the right direction - i.e. how to implement the loop correctly?
So far, I am using something like this - but cannot figure out how to save the data frame
for column in dataframe.columns.tolist():
    temp_df = transforming_data(dataframe, column, 'Category')
    # here, I need to save temp_df outside of the loop but don't know how to
Thanks a lot for pointers and help. (Note: the most similar question I found does not talk about actually saving the data frame, so it doesn't help me with this.)
If you want to save (in memory) all of the temp_df's from your loop, you can append them to a list that you can then index afterwards:
temp_dfs = []
for column in dataframe.columns.tolist():  # you don't actually need the tolist() method here
    temp_df = transforming_data(dataframe, column, 'Category')
    temp_dfs.append(temp_df)
If you'd rather be able to access these temp_df's by the column name that was used to transform them, you could assign each to a dictionary, using the column as the key:
temp_dfs = {}
for column in dataframe.columns.tolist():
    temp_df = transforming_data(dataframe, column, 'Category')
    temp_dfs[column] = temp_df
If by "save" you meant "write to disk", then you can use one of the many to_<file_format>() methods that pandas provides:
for column in dataframe.columns.tolist():
    temp_df = transforming_data(dataframe, column, 'Category')
    temp_df.to_csv('temp_df{}.csv'.format(column))
Here's the to_csv() docs.
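If you would rather keep everything in a single file, one option is one workbook with a sheet per transformed column. This is just a sketch: the file name and the list of columns are examples from this question's data, and writing Excel requires openpyxl or xlsxwriter to be installed.

with pd.ExcelWriter('transformed_responses.xlsx') as writer:
    for column in ['Response 1', 'Response 2']:
        transforming_data(dataframe, column, 'Category').to_excel(writer, sheet_name=column)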
The simplest solution would be to save the result dataframes into a list. Assuming that all the columns you want to loop over have the text Response in their name:
result_dframes = []
for col_name in dataframe.filter(like='Response').columns:
    result_dframe = transforming_data(dataframe, col_name, 'Category')
    result_dframes.append(result_dframe)
Alternatively you can also obtain the exact same result with a list comprehension instead of a for-loop:
result_dframes = [
    transforming_data(dataframe, col_name, 'Category')
    for col_name in dataframe.filter(like='Response')
]
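If you eventually need everything in a single frame, the list can also be concatenated with the source column names as keys. A sketch; note that the answer sets for 'Response 1' and 'Response 2' differ, so the combined frame will contain NaNs for the missing columns:

# keys line up with the order of the list comprehension above
response_cols = dataframe.filter(like='Response').columns
combined = pd.concat(result_dframes, keys=response_cols)
print(combined)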
