How to modify pandas dataframe format? - python

There is a problem with my pandas dataframe. df is my original dataframe. First I select specific columns of df:
df1=df[['cod_far','geo_lat','geo_lon']]
Then I set new names for those columns:
df1.columns = ['cod_far', 'lat', 'lon']
Finally, I group df1 by specific columns and convert the result to a new dataframe called "occur":
occur = df1.groupby(['cod_far','lat','lon' ]).size()
occur=pd.DataFrame(occur)
The problem is that I am getting this: a dataframe with only ONE column. Rows are fine, but there should be 3 columns! Is there any way to drop that "0" and convert my dataframe "occur" into a dataframe of 3 columns?
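One possible way to get the group keys back as ordinary columns is to call reset_index on the result of size(). The sketch below uses made-up sample values (only the column names come from the question) and a hypothetical name, "count", for the column that otherwise shows up as "0":

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "cod_far": [1, 1, 2, 2, 2],
    "geo_lat": [40.1, 40.1, 41.5, 41.5, 41.5],
    "geo_lon": [-3.7, -3.7, 2.1, 2.1, 2.1],
})

# Select and rename in one step, so the number of names always matches.
df1 = df[["cod_far", "geo_lat", "geo_lon"]].rename(
    columns={"geo_lat": "lat", "geo_lon": "lon"}
)

# size() returns a Series indexed by the group keys. reset_index(name=...)
# moves the keys back into columns and names the counts ("count" is an
# arbitrary label chosen for this sketch).
occur = df1.groupby(["cod_far", "lat", "lon"]).size().reset_index(name="count")
print(occur)
```

If only the three key columns are wanted, the count column can be dropped afterwards with occur.drop(columns="count").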

Related

Merge Dataframes using List of Columns (Pandas Vlookup)

I'd like to lookup several columns from another dataframe that I have in a list to bring them over to my main dataframe, essentially doing a "v-lookup" of ~30 columns using ID as the key or lookup value for all columns.
However, for the columns that are the same between the two dataframes, I don't want to bring over the duplicate columns but have those values be filled in df1 from df2.
I've tried below:
df = pd.merge(df,df2[['ID', [look_up_cols]]] ,
on ='ID',
how ='left',
#suffixes=(False,False)
)
but it brings in the shared columns from df2 when I want df2's values filled into the same columns in df1.
I've also tried creating a dictionary with the column pairs from each df and using this for loop to look up each item in the dictionary (lookup_map) in the other df, with ID as the key:
for col in look_up_cols:
    df1[col] = df2['ID'].map(lookup_map)
but this just returns NaNs.
You should be able to do something like the following:
df = pd.merge(df, df2[look_up_cols + ['ID']],
              on='ID',
              how='left')
This just adds the ID column to the look_up_cols list so that it is available as the key in the merge function.
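If the goal is for shared columns to take df2's values rather than appear twice, one alternative is to map each lookup column through df1's own ID instead of merging. This is only a sketch: the sample data and the column names "shared" and "extra" are hypothetical, and it assumes ID is unique in df2.

```python
import pandas as pd

# Hypothetical data: "shared" exists in both frames, "extra" only in df2.
df1 = pd.DataFrame({"ID": [1, 2, 3], "shared": [None, "b", None]})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "shared": ["x", "y", "z"],
    "extra": [10, 20, 30],
})
look_up_cols = ["shared", "extra"]

# Index df2 by the key so each column becomes an ID -> value lookup.
# Assumption: ID values are unique in df2.
lookup = df2.set_index("ID")

for col in look_up_cols:
    # Map df1's IDs (not df2's) to the matching df2 values.
    # Existing columns are overwritten, new ones are created.
    df1[col] = df1["ID"].map(lookup[col])

print(df1)
```

To keep df1's existing values and fill only the missing ones, the assignment inside the loop could instead be df1[col] = df1[col].fillna(df1["ID"].map(lookup[col])) for columns that already exist in df1.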

How to transform a pandas dataframe with two index columns and one value column into a heat map?

I have a df which looks like this when exported to Excel, where columns A and B are index columns.
excel output of df
How to transform into this format?
wanted output
Use pandas.DataFrame.unstack:
df = df.unstack(level=-1)
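As a rough illustration of what unstack does here, the sketch below builds a small frame with two index levels and one value column. The data and the column name "value" are made up, since the original screenshots are not available.

```python
import pandas as pd

# Hypothetical data: index levels A and B, plus a single value column.
df = pd.DataFrame({
    "A": ["a1", "a1", "a2", "a2"],
    "B": ["b1", "b2", "b1", "b2"],
    "value": [1, 2, 3, 4],
}).set_index(["A", "B"])

# Selecting the value column first keeps the resulting column labels flat
# (just the B values) instead of a two-level column index.
heat = df["value"].unstack(level=-1)
print(heat)
```

The result has one row per A value and one column per B value, which is the grid layout a heat map plot typically expects.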

appending in pandas - row wise

I'm trying to append two columns of my dataframe to an existing dataframe with this:
dataframe.append(df2, ignore_index = True)
and this does not seem to be working.
This is what I'm looking for (kind of) --> a dataframe with 2 columns and 6 rows:
Although this is not correct and uses two print statements to print the two dataframes, I thought it might be helpful to show a selection of the data.
I tried to use concat(), but that leads to some issues as well.
dataframe = pd.concat([dataframe, df2])
but that appears to place the second dataframe in new columns rather than new rows, in addition to giving NaN values:
Any ideas on what I should do?
I assume this happened because your dataframes have different column names, so concat aligns them as separate columns and fills the gaps with NaN. Try assigning the first dataframe's column names to the second dataframe:
df2.columns = dataframe.columns
dataframe_new = pd.concat([dataframe, df2], ignore_index=True)
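The effect of the column names can be seen in a small sketch. The frames and their column names below are hypothetical, chosen only to reproduce the symptom described in the question:

```python
import pandas as pd

# Hypothetical frames with 3 rows each and different column names.
dataframe = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
df2 = pd.DataFrame({"a": [7, 8, 9], "b": [10, 11, 12]})

# With different names, concat aligns on labels: the result has four
# columns, and every row is NaN in the two columns it did not come from.
mismatched = pd.concat([dataframe, df2], ignore_index=True)

# Work on a copy so the original df2 keeps its own labels.
df2_renamed = df2.copy()
df2_renamed.columns = dataframe.columns

# Matching names stack the rows: 6 rows, 2 columns, no NaN.
stacked = pd.concat([dataframe, df2_renamed], ignore_index=True)
print(stacked)
```

Note that assigning the column names positionally like this assumes both frames list their columns in the same order.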

Slicing a set of columns when a pandas dataframe does not include column labels

How to slice the last n columns from pandas dataframe assuming the dataframe does not include column labels? For instance, I want to slice the last 4 columns:
data = np.random.uniform(0,10,(4,10)).astype(int)
df = pd.DataFrame(data)
print(df.ix[:,4])
Can someone fix this up?
You can use iloc instead of ix, with -4: as the column slice:
print(df.iloc[:,-4:])
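A deterministic variant makes it easier to see which columns are selected. This sketch swaps the random data for np.arange (an assumption made purely so the output is predictable):

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random data: 4 rows, 10 unlabeled columns.
data = np.arange(40).reshape(4, 10)
df = pd.DataFrame(data)

# iloc is purely positional, so a negative slice start counts from the end.
last4 = df.iloc[:, -4:]
print(last4)
```

Because iloc works on positions rather than labels, the same slice applies whether or not the dataframe has column labels.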

Adding a new column to a pandas dataframe

I have a dataframe df with one column and 500k rows (the first 5 elements of df are given below). I want to add new data to the existing column. The new data is a matrix of 200k rows and 1 column. How can I do it? I also want to add a new column named op.
X098_DE_time
0.046104
-0.037134
-0.089496
-0.084906
-0.038594
We can use the concat function after renaming the column of the second dataframe so that it matches the first:
df2.rename(columns={'op': 'X098_DE_time'}, inplace=True)
new_df = pd.concat([df, df2], axis=0)
Note: If we don't rename the df2 column, the resulting new_df will have 2 different columns.
To add a new column, you can use:
df["new column"] = list_of_values  # one value per row of df
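Both steps can be put together in a short sketch. The values in df2 and in the new column are made up, and naming df2's column "op" simply follows the assumption in the answer above:

```python
import pandas as pd

# First three values from the question; df2's contents are hypothetical.
df = pd.DataFrame({"X098_DE_time": [0.046104, -0.037134, -0.089496]})
df2 = pd.DataFrame({"op": [0.5, 0.6]})

# Rename so both frames share one column label, then stack the rows.
df2 = df2.rename(columns={"op": "X098_DE_time"})
new_df = pd.concat([df, df2], axis=0, ignore_index=True)

# A new column needs exactly one value per row of the frame it is added to.
new_df["op"] = range(len(new_df))  # placeholder values for illustration
print(new_df)
```

Passing ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating the index values of df2.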
