I created an empty dataframe (df1) with the python pandas package, only containing the columns: [var1, var2, var3]
I also have another dataframe (df2) which looks like this:
columns: [var2, var1, var3]
values: [1, 2, 3]
When I append df2 to df1, the order of the columns in the resulting dataframe changes. I tried to restore the original order using the old list of columns with sort_values and sort, but it didn't work. Does anyone know how I can solve this? I am using Python 2.7.
If I understand correctly, the append itself is not the problem; the issue is only the column order. To change the order of columns in a DataFrame, simply index it with the column names in the order you want.
import pandas as pds

df1 = pds.DataFrame(columns=['var1', 'var2', 'var3'])  # desired order
# generate a disordered df somehow
df_disordered = pds.DataFrame(columns=['var2', 'var1', 'var3'])
df_adjusted = df_disordered[df1.columns]  # select columns in df1's order
# or explicitly: df_disordered[['var1', 'var2', 'var3']]
# now df_adjusted has the same column order as df1
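To tie this back to the question, here is a minimal sketch of the full append-then-reorder flow. It assumes a recent pandas, where DataFrame.append no longer exists (it was removed in pandas 2.0) and pd.concat takes its place; the sample values are the ones from the question. Recent versions may emit a FutureWarning when one of the concatenated frames is empty.

```python
import pandas as pd

# Empty frame that fixes the desired column order.
df1 = pd.DataFrame(columns=["var1", "var2", "var3"])
# Frame holding the data, with its columns in a different order.
df2 = pd.DataFrame({"var2": [1], "var1": [2], "var3": [3]})

# Combine the frames, then select the columns in df1's order.
combined = pd.concat([df1, df2], ignore_index=True)
combined = combined[list(df1.columns)]

print(list(combined.columns))
```

Selecting with an explicit list of names makes the result independent of whatever order the concatenation happens to produce.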
I have a .csv file with many rows and columns. For analysis purposes, I want to select a row number from the dataset and pass it as a dataframe in pandas.
Instead of writing the column names and input values inside a dict, how can I make it faster?
Right now I have:
df = pd.read_csv('filename.csv')
df2 = pd.DataFrame({'var1': [5], 'var2': [10], 'var3': [15]})
var1, var2 and var3 are columns of df. I want to make a separate dataframe from the data in df.
The row can be either a random row or a given row number.
Thank you for your help.
df2 = df.iloc[rownum:rownum + 1, :]
If you want to extract data from an existing dataframe into a new one, you can use something like this.
Based on the particular rows required:
df2 = df.iloc[4:5,:]
Or based on a condition:
df3 = df[df['var1'] < 10]
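The answers above cover a given row number; the question also asks about a random row. The following is a small sketch of both, using a made-up three-row frame in place of the CSV file (the values and the variable names by_position and random_row are illustrative assumptions).

```python
import pandas as pd

# Stand-in for pd.read_csv('filename.csv'); the values are invented.
df = pd.DataFrame(
    {"var1": [5, 12, 7], "var2": [10, 20, 30], "var3": [15, 25, 35]}
)

rownum = 1
# Passing a list of positions keeps the result a DataFrame, not a Series.
by_position = df.iloc[[rownum]]

# One randomly chosen row; random_state makes the choice repeatable.
random_row = df.sample(n=1, random_state=0)

print(by_position)
print(random_row)
```

df.iloc[[rownum]] is equivalent to the slice df.iloc[rownum:rownum + 1, :] shown earlier; both return a one-row DataFrame that keeps the original column names.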
I have a dataframe which is the result of concatenating several dataframes. I use the "keys=" option to title each block when I export to Excel.
Now I want to make ID2 an index level alongside ID (to get a MultiIndex).
I tried to use .reset_index, but it didn't work the way I want.
I have:
I want:
You can extract your index values into lists, create a MultiIndex object from them, and then simply set that MultiIndex as the index of your DataFrame. This works on my side (pandas imported as pd):
Let's assume your initial DataFrame is this one (just a smaller version of what you have):
df = pd.DataFrame({'ID2': ['b','c','b'], 'name' : ['tomato', 'pizza', 'kebap']}, index = [1,2,4])
Then we extract the final index values from the existing index and from the ID2 column in order to build a list of tuples, from which we create the MultiIndex with the pandas.MultiIndex.from_tuples method:
ID2 = df.ID2.to_list()
ID1 = df.index.to_list()
indexes = [(id1, id2) for id1,id2 in zip(ID1,ID2)]
final_indices = pd.MultiIndex.from_tuples(indexes, names=["Id1", "Id2"])
Finally, you redefine your index and you can drop the 'ID2' column:
df.index = final_indices
df = df.drop('ID2', axis = 1)
This gives the following DataFrame:
Note: I also tried with the df.reindex method, but the values of the DataFrame became NaN, I do not know why.
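As an alternative to building the tuples by hand, DataFrame.set_index with append=True should give the same two-level index in one step. This is a sketch on the same small example frame, not the answer's original method; the level names Id1 and Id2 are taken from the answer above. It may also explain the note about reindex: reindex looks up the new labels in the existing index, and labels that are not found there produce rows of NaN.

```python
import pandas as pd

df = pd.DataFrame(
    {"ID2": ["b", "c", "b"], "name": ["tomato", "pizza", "kebap"]},
    index=[1, 2, 4],
)

# append=True keeps the existing index as the first level and
# adds the ID2 column as a second level (the column is removed).
result = df.set_index("ID2", append=True)
result.index.names = ["Id1", "Id2"]

print(result)
```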
I have a DataFrame with columns like:
>>> df.columns
['A_ugly_column_name', 'B_ugly_column_name', ...]
and a Series, series_column_names, with nice column names like:
>>> series_column_names = pd.Series(
data=["A_ugly_column_name", "B_ugly_column_name"],
index=["A", "B"],
)
>>> print(series_column_names)
A A_ugly_column_name
B B_ugly_column_name
...
Name: column_names, dtype: object
Is there a nice way to rename the columns in df according to series_column_names? More specifically, I'd like to rename each column in df to the index label in series_column_names whose value is the old column name in df.
Some context - I have several DataFrames with columns for the same thing, but they're all named slightly differently. I have a DataFrame where, like here, the index is a standardized name and the columns contain the column names used by the various DataFrames. I want to use this "name mapping" DataFrame to rename the columns in the several DataFrames to the same thing.
A solution I have
So far, the best solution I have is:
>>> df.rename(columns=lambda old: series_column_names.index[series_column_names == old][0])
which works but I'm wondering if there's a better, more pandas-native way to do this.
If each value in your Series is a single string of the form "nice_name ugly_name", first create a dictionary out of your Series by using .str.split:
cols = {y : x for x,y in series_column_names.str.split(r'\s+').tolist()}
print(cols)
Edit.
If your Series has the target column names as its index and the old column names as its values, you can still create a dictionary by inverting the keys and values.
cols = {y : x for x,y in series_column_names.to_dict().items()}
or
cols = dict(zip(series_column_names.tolist(), series_column_names.index))
print(cols)
{'B_ugly_column_name': 'B_nice_column_name',
'C_ugly_column_name': 'C_nice_column_name',
'A_ugly_column_name': 'A_nice_column_name'}
Then assign your column names:
df.columns = df.columns.map(cols)
print(df)
A_nice_column_name B_nice_column_name
0 0 0
Just invert the index and values of series_column_names and use the result to rename. It doesn't matter if the Series contains extra names.
series_column_names = pd.Series(
data=["A_ugly", "B_ugly", "C_ugly"],
index=["A", "B", "C"],
)
df.rename(columns=pd.Series(series_column_names.index.values, index=series_column_names))
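The two approaches above (assigning df.columns.map(cols) and calling df.rename) behave differently when df has a column that is missing from the mapping. The sketch below illustrates this with invented data; the column "untouched" and the variable names mapping, renamed and mapped are assumptions added for the example.

```python
import pandas as pd

df = pd.DataFrame({"A_ugly": [0], "B_ugly": [1], "untouched": [2]})
series_column_names = pd.Series(
    data=["A_ugly", "B_ugly", "C_ugly"],
    index=["A", "B", "C"],
)

# Invert the Series: old names become keys, nice names become values.
mapping = dict(zip(series_column_names, series_column_names.index))

# rename leaves columns that are not in the mapping unchanged.
renamed = df.rename(columns=mapping)

# Index.map turns columns that are not in the mapping into NaN.
mapped = df.columns.map(mapping)

print(list(renamed.columns))
print(mapped)
```

If every frame is guaranteed to contain only mapped columns, the two are interchangeable; otherwise rename is the safer choice.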
Wouldn't it be as simple as this?
series_column_names = pd.Series(['A_nice_column_name', 'B_nice_column_name'])
df.columns = series_column_names
My DF has the following columns:
df.columns = ['not-changing1', 'not-changing2', 'changing1', 'changing2', 'changing3', 'changing4']
I want to swap columns within the last 4 WITHOUT USING COLUMN NAMES, using their positional index instead.
So, the final column order would be:
result.columns = ['not-changing1', 'not-changing2', 'changing1', 'changing3', 'changing2', 'changing4']
How do I do that?
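One possible approach, sketched here under the assumption that the goal is exactly the order shown above (the 4th and 5th columns trade places), is to build a list of column positions, swap two entries, and select with iloc. The single row of data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["not-changing1", "not-changing2", "changing1",
             "changing2", "changing3", "changing4"],
)

# Positions of all columns: [0, 1, 2, 3, 4, 5].
order = list(range(df.shape[1]))
# Swap the 4th and 5th columns (zero-based positions 3 and 4).
order[3], order[4] = order[4], order[3]

# Select columns by position; no column names are used.
result = df.iloc[:, order]

print(list(result.columns))
```

Negative positions work the same way, so order[-3], order[-2] = order[-2], order[-3] performs the same swap counted from the end.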
I am using drop in pandas with inplace=True set. I am performing this on a duplicate dataframe, but the original dataframe is also being modified.
df1 = df
for col in df1.columns:
    if df1[col].sum() > 1:
        df1.drop(col, inplace=True, axis=1)
This is modifying my 'df' dataframe, and I don't understand why.
Use df1 = df.copy(). Otherwise they are the same object in memory.
However, it would be better to generate a new DataFrame directly, e.g.
df1 = df.loc[:, df.sum() <= 1]
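The difference between plain assignment and .copy() can be checked directly. The following sketch uses an invented three-column frame; the condition sum <= 1 mirrors the question's loop, which drops columns whose sum is greater than 1. The names alias, independent and kept are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1], "b": [1, 1], "c": [0, 0]})

alias = df               # a second name for the same object
independent = df.copy()  # a separate object with its own data

print(alias is df)        # True
print(independent is df)  # False

# Build the filtered frame directly, without mutating df.
kept = df.loc[:, df.sum() <= 1]

# Dropping in place on the copy leaves df untouched.
independent.drop(columns="b", inplace=True)

print(list(df.columns))
print(list(kept.columns))
```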