This should be a pretty simple question, but I'm looking to programmatically insert the name of a pandas DataFrame into that DataFrame's column names.
Say I have the following DataFrame:
name_of_df = pandas.DataFrame({1: ['a','b','c','d'], 2: [1,2,3,4]})
print name_of_df
1 2
0 a 1
1 b 2
2 c 3
3 d 4
I want to have the following:
name_of_df = %%some_function%%(name_of_df)
print(name_of_df)
name_of_df1 name_of_df2
0 a 1
1 b 2
2 c 3
3 d 4
..where, as you can see, the name of the DataFrame is programmatically inserted into the column names. I know pandas DataFrames don't have a __name__ attribute, so I'm drawing a blank on how to do this.
Please note that I want to do this programmatically, so altering the names of the columns with a hardcoded 'name_of_df' string won't work.
From the linked question, you can do something like this. Multiple names can point to the same DataFrame, so this will just grab the "first" one.
def first_name(obj):
    return [k for k in globals() if globals()[k] is obj and not k.startswith('_')][0]
In [24]: first_name(name_of_df)
Out[24]: 'name_of_df'
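To get from the looked-up name to the renamed columns the question asks for, something along these lines might work. It is only a sketch: the namespace argument is an addition for illustration (the answer above reads globals() directly), and the lookup still returns just the first matching name.

```python
import pandas as pd

def first_name(obj, namespace):
    """Return the first non-underscore name in `namespace` that is bound to `obj`."""
    return [k for k, v in namespace.items() if v is obj and not k.startswith('_')][0]

name_of_df = pd.DataFrame({1: ['a', 'b', 'c', 'd'], 2: [1, 2, 3, 4]})

# locals() is the same mapping as globals() when called at module level
prefix = first_name(name_of_df, locals())

# Prepend the variable name to every column label (labels are converted to str)
renamed = name_of_df.rename(columns=lambda col: f"{prefix}{col}")
print(renamed)
```

If several variables point at the same DataFrame, which name comes back depends on the order of the namespace, so this is fragile by design.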
I need to add the number of unique values in column C (right table) to the related row in the left table based on the values in common column A (as shown in the picture):
thank you in advance
Group the second dataset by column A and count the unique values of column C in each group. Merge the result with the first dataset on column A, then rename column C to C-count if needed:
>>> count_df = df2.groupby('A', as_index=False).C.nunique()
>>> output = pd.merge(df1, count_df, on='A')
>>> output.rename(columns={'C':'C-count'}, inplace=True)
>>> output
A B C-count
0 2 22 3
1 3 23 2
2 5 21 1
3 1 24 1
4 6 21 1
Use DataFrameGroupBy.nunique with Series.map to create a new column in df1:
df1['C-count'] = df1['A'].map(df2.groupby('A')['C'].nunique())
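A self-contained run of the one-liner above may make it easier to check. The contents of df1 and df2 here are made up for illustration (the original tables were only shown in a picture), chosen so the counts match the output printed in the previous answer.

```python
import pandas as pd

# Hypothetical left table
df1 = pd.DataFrame({'A': [2, 3, 5, 1, 6], 'B': [22, 23, 21, 24, 21]})

# Hypothetical right table; A=2 has three distinct C values, A=3 has two
df2 = pd.DataFrame({
    'A': [2, 2, 2, 2, 3, 3, 5, 1, 6],
    'C': ['x', 'y', 'z', 'x', 'x', 'y', 'x', 'y', 'z'],
})

# Series indexed by A holding the number of distinct C values per group
counts = df2.groupby('A')['C'].nunique()

# Look up each row's A value in that Series
df1['C-count'] = df1['A'].map(counts)
print(df1)
```

Values of A that never occur in df2 would come out as NaN; appending .fillna(0) is one way to handle that if a zero count is wanted.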
This may not be the most efficient way of doing this, so be careful if your DataFrames are large.
Define the following function:
def c_value(a_value, right_table):
    c_ids = []
    for index, row in right_table.iterrows():
        if row['A'] == a_value:
            if row['C'] not in c_ids:
                c_ids.append(row['C'])
    return len(c_ids)
For this function I'm assuming that right_table is a pandas.DataFrame.
Now, do the following to build the new column (assuming that the left table is also a pandas.DataFrame):
new_column = []
for index, row in left_table.iterrows():
    new_column.append(c_value(row['A'], right_table))
left_table["C-count"] = new_column
After this, the left_table DataFrame should be the desired one (as far as I understand what you need).
Good day everyone! I had trouble splitting a nested dictionary into separate columns. I fixed that using the concat and json_normalize functions, but for some reason the code I used removed all the column names and returned NaN as values for the columns...
Does anyone know how to fix this?
Code I used:
import pandas as pd
from pandas import json_normalize
c = ['photo.photo_replace', 'photo.photo_remove', 'photo.photo_add', 'photo.photo_effect', 'photo.photo_brightness',
'photo.background_color', 'photo.photo_resize', 'photo.photo_rotate', 'photo.photo_mirror', 'photo.photo_layer_rearrange',
'photo.photo_move', 'text.text_remove', 'text.text_add', 'text.text_edit', 'text.font_select', 'text.text_color', 'text.text_style',
'text.background_color', 'text.text_align', 'text.text_resize', 'text.text_rotate', 'text.text_move', 'text.text_layer_rearrange']
df_edit = pd.concat([json_normalize(x)[c] for x in df['editables']], ignore_index=True)
df.columns = df.columns.str.split('.').str[1]
Current problem:
Result I want:
df= pd.DataFrame({
'A':[1,2,3],
'B':[3,3,3]
})
print(df)
A B
0 1 3
1 2 3
2 3 3
c=['new_name1','new_name2']
df.columns=c
print(df)
new_name1 new_name2
0 1 3
1 2 3
2 3 3
Remember, the length of the list of column names (c) must equal the number of columns.
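One plausible cause of the NaN column names in the question is that the split is applied to df.columns (the original frame, whose labels contain no '.') instead of df_edit.columns. The sketch below uses a made-up editables list in place of the real df['editables'] column and takes the last piece of each dotted name on the flattened frame.

```python
import pandas as pd

# Hypothetical stand-in for the 'editables' column: one nested dict per row
editables = [
    {"photo": {"photo_add": 1, "photo_remove": 0}, "text": {"text_add": 2}},
    {"photo": {"photo_add": 0, "photo_remove": 3}, "text": {"text_add": 1}},
]

# Flatten the nested dicts; columns come out as 'photo.photo_add', etc.
df_edit = pd.json_normalize(editables)

# Strip the 'photo.' / 'text.' prefix on df_edit itself, not on the original df
df_edit.columns = df_edit.columns.str.split(".").str[-1]

print(df_edit)
```

Note that the question's list contains both photo.background_color and text.background_color, so stripping the prefix there would leave two columns with the same name.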
I need a fast way to extract the right values from a pandas dataframe:
Given a dataframe with (a lot of) data in several named columns and an additional column whose values are the names of the other columns, how do I select values from the data columns using the additional column as keys?
It's simple to do via an explicit loop, but this is extremely slow with something like .iterrows() directly on the DataFrame. If converting to numpy-arrays, it's faster, but still not fast. Can I combine methods from pandas to do it even faster?
Example: This is the kind of DataFrame structure, where columns A and B contain data and column keys contains the keys to select from:
import pandas
df = pandas.DataFrame(
{'A': [1,2,3,4],
'B': [5,6,7,8],
'keys': ['A','B','B','A']},
)
print(df)
output:
Out[1]:
A B keys
0 1 5 A
1 2 6 B
2 3 7 B
3 4 8 A
Now I need some fast code that returns a DataFrame like
Out[2]:
val_keys
0 1
1 6
2 7
3 4
I was thinking something along the lines of this:
tmp = df.melt(id_vars=['keys'], value_vars=['A','B'])
out = tmp.loc[tmp['keys']==tmp['variable']]
which produces:
Out[2]:
keys variable value
0 A A 1
3 A A 4
5 B B 6
6 B B 7
but doesn't have the right order or index. So it's not quite a solution.
Any suggestions?
See if either of these works for you (both need import numpy as np):
df['val_keys']= np.where(df['keys'] =='A', df['A'],df['B'])
or
df['val_keys']= np.select([df['keys'] =='A', df['keys'] =='B'], [df['A'],df['B']])
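With many data columns, writing one condition per column gets tedious. A possible generalization, sketched here with the question's example data, turns the key names into column positions and picks one value per row with NumPy integer indexing. The names data_cols, col_idx and out are illustrative, not from the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3, 4],
     'B': [5, 6, 7, 8],
     'keys': ['A', 'B', 'B', 'A']},
)

data_cols = ['A', 'B']  # the columns that hold data

# Position of each row's key within data_cols (-1 would mean "key not found")
col_idx = pd.Index(data_cols).get_indexer(df['keys'])

# Pick one value per row: row i, column col_idx[i]
values = df[data_cols].to_numpy()[np.arange(len(df)), col_idx]

out = pd.DataFrame({'val_keys': values}, index=df.index)
print(out)
```

This keeps the original order and index. A key that is not in data_cols gets position -1 and would silently select the last column, so it is worth checking (col_idx >= 0).all() on real data.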
The code below needs no hard-coded column names:
def value(row):
    a = row.name        # index label of the current row
    b = row['keys']     # name of the column to read from
    c = df.loc[a, b]
    return c
df.apply(value, axis=1)
Have you tried filtering then mapping:
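df_A = df[df['keys'].isin(['A'])]
df_B = df[df['keys'].isin(['B'])]
# Map by row index rather than by key, so rows sharing a key keep their own values
A_dict = dict(zip(df_A.index, df_A['A']))
B_dict = dict(zip(df_B.index, df_B['B']))
df['val_keys'] = df.index.to_series().map(A_dict)
df['val_keys'] = df.index.to_series().map(B_dict).fillna(df['val_keys'])  # non-exhaustive mapping for the second one
df['val_keys'] = df['val_keys'].astype(int)  # the intermediate NaNs turned the column into floats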
Your df['val_keys'] column will now contain the result as in your val_keys output.
If you want you can just retain that column as in your expected output by:
df = df[['val_keys']]
Hope this helps :))
I have two (actually many, but let's stick with two) datasets and I need to merge them together. However, they do not cover the same range of keys and they have different reference values. Let's consider
a 1
b 2
c 3
e 4
and
a 2
b 3
d 7
e 2
I tried to simulate Excel index and match function, but I am not able to get the right result
b = []
f = []
for i in data1["c1"]:
    if i in data2["c1"]:
        a = d3[data2["c4"].index[i]]
        f = b.append(a)
    else:
        continue
print(f)
Can you please help me understand how this works? I would also welcome links to further information about this topic. Thank you
If you want to create a consolidated file from the two above like:
Col1 Col2 Col3
a 1 2
b 2 3
c 3 7
d 4 2
You can simply use a dictionary, with your column 1 values (a, b, c, d) as keys and, as values, lists of the 2nd-column values from your two DataFrames respectively, like:
your_dict = {'a': [1, 2], 'b': [2, 3], 'c': [3, 7], 'd': [4, 2]}
Then, to output that as one DataFrame like the one above, use pandas.DataFrame.from_dict() with the orient parameter set to 'index' (see the pandas documentation for from_dict).
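As a rough illustration, the first half of the sketch below builds the consolidated frame from the dictionary with from_dict. The second half is a different technique, not part of the answer above: an outer pd.merge on the key column, which is closer to Excel's INDEX/MATCH and keeps keys that appear in only one dataset. The column names key, val1, val2, Col2 and Col3 are assumptions, since the question shows no headers.

```python
import pandas as pd

# Dictionary approach: keys become the index, list items become columns
your_dict = {'a': [1, 2], 'b': [2, 3], 'c': [3, 7], 'd': [4, 2]}
consolidated = pd.DataFrame.from_dict(your_dict, orient='index',
                                      columns=['Col2', 'Col3'])
print(consolidated)

# Alternative: merge the question's two datasets on their key column
data1 = pd.DataFrame({'key': ['a', 'b', 'c', 'e'], 'val1': [1, 2, 3, 4]})
data2 = pd.DataFrame({'key': ['a', 'b', 'd', 'e'], 'val2': [2, 3, 7, 2]})

# how='outer' keeps every key; a value missing on one side becomes NaN
merged = pd.merge(data1, data2, on='key', how='outer')
print(merged)
```

With how='inner' only the keys present in both datasets (a, b, e) would remain.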
The scenario here is that I've got a dataframe df with raw integer data, and a dict map_array which maps those ints to string values.
I need to replace the values in the dataframe with the corresponding values from the map, but keep the original value if it doesn't map to anything.
So far, the only way I've been able to figure out how to do what I want is by using a temporary column. However, with the size of data that I'm working with, this could sometimes get a little bit hairy. And so, I was wondering if there was some trick to do this in pandas without needing the temp column...
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,5, size=(100,1)))
map_array = {1:'one', 2:'two', 4:'four'}
df['__temp__'] = df[0].map(map_array, na_action=None)
#I've tried varying the na_action arg to no effect
nan_index = df['__temp__'][df['__temp__'].isnull()].index
df.loc[nan_index, '__temp__'] = df.loc[nan_index, 0]
df[0] = df['__temp__']
df = df.drop(['__temp__'], axis=1)
I think you can simply use .replace, whether on a DataFrame or a Series:
>>> df = pd.DataFrame(np.random.randint(1,5, size=(3,3)))
>>> df
0 1 2
0 3 4 3
1 2 1 2
2 4 2 3
>>> map_array = {1:'one', 2:'two', 4:'four'}
>>> df.replace(map_array)
0 1 2
0 3 four 3
1 two one two
2 four two 3
>>> df.replace(map_array, inplace=True)
>>> df
0 1 2
0 3 four 3
1 two one two
2 four two 3
I'm not sure what the memory hit of changing column dtypes will be, though.
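If only a single column needs remapping, another option that avoids the temporary column is to give the lookup a fallback. This is a sketch using a small fixed Series instead of the random data above; the names s and mapped are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 3])
map_array = {1: 'one', 2: 'two', 4: 'four'}

# dict.get(v, v) returns the mapped string, or the original value when v has no entry
mapped = s.map(lambda v: map_array.get(v, v))
print(mapped.tolist())
```

The result mixes strings and integers, so the column ends up with object dtype, just as with replace.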