I'm trying to create a dictionary in Python from this output:
["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
I tried with this code:
list_columns = list(df2.columns)
list_dictionary = []
for row in list_columns:
    resultado = "'"+str(row)+"'" + "=" + "df2[" + "'" + row + "'" + "]"
    list_dictionary.append(resultado)
clean_list_dictionary = ','.join(list_dictionary).replace('"','')
dictionary = dict(clean_list_dictionary)
print(dictionary)
But I get an error:
ValueError: dictionary update sequence element #0 has length 1; 2 is required
Do you have any idea how I can make this work?
Thank you in advance!
Output dictionary should look like this:
{
'a' : df2['a'],
'b' : df2['b'],
'c' : df2['c'],
'd' : df2['d']
}
Method 1: Transforming your list of strings for a later eval
As you mentioned in your comment:
I would like to create a dictionary with this format: {'a' : df2['a'], 'b' : df2['b'], 'c' : df2['c'], 'd' : df2['d']}. I will use it as global variables in an eval() function.
You can use the following to convert your input list into a dictionary string:
import pandas as pd

# dummy dataframe
df2 = pd.DataFrame([[1,2,3,4]], columns=['a','b','c','d'])
# your list of strings
l = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
# Solution
def dict_string(l):
    s0 = [i.split('=') for i in l]
    s1 = '{' + ', '.join([': '.join([k,v]) for k,v in s0]) + '}'
    return s1
output = dict_string(l)
print(output)
eval(output)
# String before eval
{'a': df2['a'], 'b': df2['b'], 'c': df2['c'], 'd': df2['d']}
# Result after eval
{'a': 0    1
 Name: a, dtype: int64,
 'b': 0    2
 Name: b, dtype: int64,
 'c': 0    3
 Name: c, dtype: int64,
 'd': 0    4
 Name: d, dtype: int64}
Method 2: Using eval as part of your iteration of the list of strings
Here is a way to do this using list comprehensions and eval, as part of the iteration on the list of strings itself. This will give you the final output that you would get if you were to use eval on the dictionary string you are expecting.
import pandas as pd

# dummy dataframe
df2 = pd.DataFrame([[1,2,3,4]], columns=['a','b','c','d'])
# your list of strings
l = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
# Solution
def eval_dict(l):
    # evaluate both sides of each 'key=value' string
    s0 = [(eval(j) for j in i.split('=')) for i in l]
    s1 = {k:v for k,v in s0}
    return s1
output = eval_dict(l)
print(output)
{'a': 0    1
 Name: a, dtype: int64,
 'b': 0    2
 Name: b, dtype: int64,
 'c': 0    3
 Name: c, dtype: int64,
 'd': 0    4
 Name: d, dtype: int64}
The output is a dict that has 4 keys, (a,b,c,d) and 4 corresponding values for columns a, b, c, d from df2 respectively.
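Since the stated goal is to use the dictionary as the namespace for a later eval(), a sketch that skips the intermediate strings entirely might look like the following. The expression "a + b * c" and the name namespace are illustrative assumptions, not taken from the question, and emptying __builtins__ should not be mistaken for a security sandbox.

```python
import pandas as pd

df2 = pd.DataFrame([[1, 2, 3, 4]], columns=['a', 'b', 'c', 'd'])

# Map each column name to its Series directly; no string building needed.
namespace = {col: df2[col] for col in df2.columns}

# Hypothetical expression; the names a, b, c resolve through `namespace`.
result = eval("a + b * c", {"__builtins__": {}}, namespace)
print(result)
```

Building the dictionary from df2.columns keeps the keys and the column lookups in sync, so eval is only needed for the final expression.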
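You can loop over the list, split each string on '=', and convert the resulting pairs to a dict. Note that both keys and values remain strings here (for example "'a'" and "df2['a']").
Code:
ls = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
dic = dict(item.split('=') for item in ls)
dic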
I think this is exactly what you want.
data = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
dic = {}
for d in data:
    k = d.split("=")[0].strip("'")  # drop the quotes around the key
    v = df2[d.split("=")[1].split("'")[1]]  # look up the column by name
    dic.update({k: v})
print(dic)
It's not clear what exactly you want to achieve.
If you have a pd.DataFrame and want to convert it to a dictionary where the column names are the keys and the columns (as Series) are the values, you should use df.to_dict('series').
import pandas as pd
# Generate the dataframe
data = {'a': [1, 2, 1, 0], 'b': [2, 3, 4, 5], 'c': [10, 11, 12, 13], 'd': [21, 22, 23, 24]}
df = pd.DataFrame.from_dict(data)
# Convert to dictionary
result = df.to_dict('series')
print(result)
If you have a list of strings that you need to convert to the desired output, then you should do it differently. What you have are strings containing 'df2', while the DataFrame in your target dict is a variable. So you only need to extract the column names and look them up on the DataFrame variable (df here), not on the string.
import pandas as pd
# Generate the dataframe
data = {'a': [1, 2, 1, 0], 'b': [2, 3, 4, 5], 'c': [10, 11, 12, 13], 'd': [21, 22, 23, 24]}
df = pd.DataFrame.from_dict(data)
# create string list
lst = ["'a'=df2['a']", "'b'=df2['b']", "'c'=df2['c']", "'d'=df2['d']"]
# Convert to dictionary
result = {}
for item in lst:
    key = item.split('=')[0].strip("'")  # column name without the quotes
    result[key] = df[key]
print(result)
The results are the same, but in the second case the list of strings is created for no reason, because the first example achieves the same result without it.
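For reference, the ValueError in the question can be reproduced without pandas: dict() iterates over its argument, and iterating over a string yields single characters, which cannot be unpacked into key/value pairs. A minimal sketch (the variable names text, message and pairs are illustrative):

```python
text = "'a'=df2['a'],'b'=df2['b']"

try:
    dict(text)  # each element is a 1-character string, not a (key, value) pair
except ValueError as exc:
    message = str(exc)
    print(message)

# Splitting into 2-item pairs first gives dict() what it expects,
# although the values are still plain strings, not DataFrame columns.
pairs = dict(item.split("=", 1) for item in text.split(","))
print(pairs)
```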
I have the following pandas dataframe
data = [{'a': 1, 'b': '[2,3,4,5,6' }, {'a': 10, 'b': '[54,3,40,5'}]
test = pd.DataFrame(data)
display(test)
    a           b
0   1  [2,3,4,5,6
1  10  [54,3,40,5
I want to turn the numbers in column b into a list, but because each string has the "[" only at the beginning, I can't create the list directly. I'm trying to remove the "[" so I can extract the numbers, but I keep getting errors. What am I doing wrong?
This is how the numbers are stored
test.iloc[1,1]
'[54,3,40,5'
And this is what I've tried to remove the "[".
test.iloc[0,1].replace("[",'', regex=True).to_list()
test.iloc[0,1].str.replace("[\]\[]", "")
What I want to achieve is to have b as a proper list so I can apply other functions.
    a            b
0   1  [2,3,4,5,6]
1  10  [54,3,40,5]
To make your 'b' column a list, first remove the opening square bracket at the beginning, then use the split method on each element of the column. Note that the list elements produced this way are strings.
test['b'] = test['b'].str.replace('[', '', regex=False).map(lambda x: x.split(','))
test
#     a                b
# 0   1  [2, 3, 4, 5, 6]
# 1  10   [54, 3, 40, 5]
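If the numbers are needed as integers rather than strings, one possible variant converts each piece after splitting. This is a sketch that assumes every entry starts with a single "[" and contains only comma-separated integers:

```python
import pandas as pd

data = [{'a': 1, 'b': '[2,3,4,5,6'}, {'a': 10, 'b': '[54,3,40,5'}]
test = pd.DataFrame(data)

# Strip the leading bracket, split on commas, then convert each piece to int.
test['b'] = (
    test['b']
    .str.lstrip('[')
    .str.split(',')
    .apply(lambda parts: [int(p) for p in parts])
)
print(test['b'].tolist())
```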
Try this:
def func(col):
    # close the bracket, then evaluate the string as a Python list
    return eval(col+']')
test['b'] = test['b'].apply(func)
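Because eval executes arbitrary code, a more cautious variant of the same idea could use ast.literal_eval from the standard library, which only parses Python literals. The helper name parse_open_list is hypothetical, and the sketch assumes each string is missing exactly its closing bracket:

```python
import ast

import pandas as pd

data = [{'a': 1, 'b': '[2,3,4,5,6'}, {'a': 10, 'b': '[54,3,40,5'}]
test = pd.DataFrame(data)

def parse_open_list(text):
    # Close the bracket, then parse the text as a literal (no code is executed).
    return ast.literal_eval(text + ']')

test['b'] = test['b'].apply(parse_open_list)
print(test['b'].tolist())
```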
import pandas as pd
data = [{'a': 1, 'b': '[2,3,4,5,6' }, {'a': 10, 'b': '[54,3,40,5'}]
test = pd.DataFrame(data)
print(test['b'][0][1:])
# Drop the leading "[" from every entry in column b
for i in range(len(test['b'])):
    test.loc[i, 'b'] = test.loc[i, 'b'][1:]
If a pandas column contains a list, you can use a dictionary to convert all the values using
df['listColumn'] = df['listColumn'].apply(lambda x: [columnDictionary[i] for i in x])
However, there are instances where not all the items in a list are keys in the dictionary. In that case, how do you replace those items with nothing?
For example:
columnDictionary = {'a': 1, 'b': 2, 'd': 7, 'f': 8}
Specific Pandas row/column: [ a, b, c, d, e]
Specific Pandas row/column after conversion: [ 1, 2, 7]
With a simple condition that checks whether each list value is a key of the target dict:
In [47]: df = pd.DataFrame({'listColumn': ['a', 123, list('abcde')]})
In [48]: repl_dict = {'a':1, 'b':2, 'd':7, 'f':8 }
In [49]: df['listColumn'].apply(lambda x: [repl_dict[v] for v in x if v in repl_dict] if isinstance(x, list) else x)
Out[49]:
0            a
1          123
2    [1, 2, 7]
Name: listColumn, dtype: object
Use "if else" inside the lambda function:
Method 1: apply the lambda to a single column, element by element
# apply lambda to one column
d = {'col1':[ 'a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data=d)
columnDictionary ={'a':1, 'b':2, 'd':7, 'f':8 }
df['col1'] = df['col1'].apply(lambda x: [columnDictionary[x]] if x in columnDictionary else [])
df
Method 2: apply the lambda row by row (axis=1); this is likely slower
d = {'col1':[ 'a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data=d)
columnDictionary ={'a':1, 'b':2, 'd':7, 'f':8 }
df['listColumn'] = df.apply(lambda x: [columnDictionary[i] for i in x if i in columnDictionary], axis=1)
df
Result :
   col1  listColumn
0     a         [1]
1     b         [2]
2     c          []
3     d         [7]
4     e          []
There is a built-in function to check whether something is a list: isinstance(mydata, list), which returns True or False accordingly.
I tried to convert a set column to a list column in a pandas DataFrame, but failed. I'm not sure what the best way to do this is. Thanks.
Here is the example:
I tried to create a column 'c' that converts the set column 'b' to lists, but 'c' still contains sets.
data = [{'a': [1,2,3], 'b':{11,22,33}},{'a':[2,3,4],'b':{111,222}}]
tdf = pd.DataFrame(data)
tdf['c'] = list(tdf['b'])
tdf
           a             b             c
0  [1, 2, 3]  {33, 11, 22}  {33, 11, 22}
1  [2, 3, 4]    {222, 111}    {222, 111}
You could do:
import pandas as pd
data = [{'a': [1,2,3], 'b':{11,22,33}},{'a':[2,3,4],'b':{111,222}}]
tdf = pd.DataFrame(data)
tdf['c'] = [list(e) for e in tdf.b]
print(tdf)
Use apply:
tdf['c'] = tdf['b'].apply(list)
This is needed because list(tdf['b']) turns the whole column into one list of sets, rather than converting each set one by one.
Or do:
tdf['c'] = tdf['b'].map(list)
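The difference between calling list on the whole column and on each element can be checked directly. The sketch below also uses sorted instead of list, which is an assumption about what is wanted: sets have no defined order, and sorted returns a list with a predictable one.

```python
import pandas as pd

data = [{'a': [1, 2, 3], 'b': {11, 22, 33}}, {'a': [2, 3, 4], 'b': {111, 222}}]
tdf = pd.DataFrame(data)

# list() on the Series only collects its elements, which are still sets.
whole = list(tdf['b'])
print([type(x).__name__ for x in whole])

# Applying sorted to each element returns a list with a stable order.
tdf['c'] = tdf['b'].apply(sorted)
print(tdf['c'].tolist())
```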
I want to find duplicates in a selection of columns of a df,
# converts the sub df into matrix
mat = df[['idx', 'a', 'b']].values
str_dict = defaultdict(set)
for x in np.ndindex(mat.shape[0]):
    concat = ''.join(str(x) for x in mat[x][1:])
    # take idx as values of each key a + b
    str_dict[concat].update([mat[x][0]])
dups = {}
for key in str_dict.keys():
    dup = str_dict[key]
    if len(dup) < 2:
        continue
    dups[key] = dup
The code finds duplicates of the concatenation of a and b. Uses the concatenation as key for a set defaultdict (str_dict), updates the key with idx values; finally uses a dict (dups) to store any concatenation if the length of its value (set) is >= 2.
I am wondering if there is a better way to do that in terms of efficiency.
You can just concatenate and convert to set:
res = set(df['a'].astype(str) + df['b'].astype(str))
Example:
df = pd.DataFrame({'idx': [1, 2, 3],
                   'a': [4, 4, 5],
                   'b': [5, 5, 6]})
res = set(df['a'].astype(str) + df['b'].astype(str))
print(res)
# {'56', '45'}
If you need to map indices too:
df = pd.DataFrame({'idx': [1, 2, 3],
                   'a': [41, 4, 5],
                   'b': [3, 13, 6]})
df['conc'] = (df['a'].astype(str) + df['b'].astype(str))
df = df.reset_index()
res = df.groupby('conc')['index'].apply(set).to_dict()
print(res)
# {'413': {0, 1}, '56': {2}}
You can select the columns you need before calling drop_duplicates:
df[['a','b']].drop_duplicates().astype(str).apply(np.sum,1).tolist()
Out[1027]: ['45', '56']
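Concatenating the values as strings can merge distinct pairs: 41 and 3 give the same key '413' as 4 and 13. A sketch that groups on the columns themselves avoids this, using DataFrame.duplicated with keep=False to mark every row of a repeated pair. The sample data is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'idx': [1, 2, 3, 4],
                   'a': [4, 4, 41, 4],
                   'b': [5, 5, 3, 13]})

# Mark every row whose (a, b) pair occurs more than once.
mask = df.duplicated(subset=['a', 'b'], keep=False)

# Collect the idx values for each duplicated (a, b) pair.
dups = df[mask].groupby(['a', 'b'])['idx'].apply(set).to_dict()
print(dups)
```

Here the keys of dups are (a, b) tuples rather than concatenated strings, so (41, 3) and (4, 13) stay separate.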
Why doesn't a pandas.DataFrame object complain when I rename a column if the new column name already exists?
This makes referencing the new column in the future return a pandas.DataFrame as opposed to a pandas.Series , which can cause further errors.
Secondly, is there a suggested way to handle such a situation?
Example:
import pandas as pd
df = pd.DataFrame( {'A' : ['foo','bar'] ,'B' : ['bar','foo'] } )
df.B.map( {'bar':'foo','foo':'bar'} )
# 0    foo
# 1    bar
# Name: B, dtype: object
df.rename(columns={'A':'B'},inplace=True)
Now, the following will fail:
df.B.map( {'bar':'foo','foo':'bar'} )
#AttributeError: 'DataFrame' object has no attribute 'map'
Let's say you have a dictionary mapping old column names to new ones. When renaming your DataFrame, you can use a dictionary comprehension to test whether the new name v is already in the DataFrame. Note that the check only looks at the columns present before the rename, so two old names mapped to the same new name both pass it:
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
d = {'a': 'B', 'b': 'B'}
df.rename(columns={k: v for k, v in d.items() if v not in df}, inplace=True)
>>> df
   B  B
0  1  3
1  2  4
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
d = {'a': 'b'}
# 'b' is already a column, so this rename is filtered out
df.rename(columns={k: v for k, v in d.items() if v not in df}, inplace=True)
>>> df
   a  b
0  1  3
1  2  4
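A membership test against the existing columns does not catch two old names that map to the same new name. One way to cover both cases is to track the names already claimed while building the mapping. The helper safe_rename below is hypothetical, not part of pandas, and silently skipping a conflicting rename is a design assumption; raising an error would be equally reasonable.

```python
import pandas as pd

def safe_rename(df, mapping):
    """Rename columns, skipping any rename that would create a duplicate name.

    Hypothetical helper, not part of pandas.
    """
    taken = set(df.columns)
    accepted = {}
    for old, new in mapping.items():
        # Skip unknown source columns and targets that are already in use.
        if old not in df.columns or new in taken:
            continue
        accepted[old] = new
        taken.add(new)
    return df.rename(columns=accepted)

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
renamed = safe_rename(df, {'a': 'B', 'b': 'B'})
print(list(renamed.columns))
print(renamed.columns.is_unique)
```

Checking df.columns.is_unique after any rename is a cheap way to detect the situation described in the question.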