Create Dataframe with a certain number of columns - python

I have the following Dataframe:
Now I want to copy the column "Power" as many times as I like into new columns of the same DataFrame.
The column names should be: Power_1, Power_2, Power_3, ...
Creating the DataFrame is too complicated to share, but a simple example of how to add the columns with a while-loop would be sufficient.

# start at 1 so the names are Power_1 ... Power_10
for i in range(1, 11):
    df[f"Power_{i}"] = df["Power"]
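For a self-contained version, here is a minimal sketch. The sample values and the helper name add_copies are made up for illustration, and it builds all copies at once with DataFrame.assign instead of a loop:

```python
import pandas as pd

def add_copies(df, column, n):
    """Hypothetical helper: return a new DataFrame with n copies of
    `column`, named column_1 ... column_n."""
    copies = {f"{column}_{i}": df[column] for i in range(1, n + 1)}
    return df.assign(**copies)

# Made-up stand-in for the real DataFrame
df = pd.DataFrame({"Power": [1.5, 2.0, 3.25]})
df = add_copies(df, "Power", 3)
print(df.columns.tolist())  # ['Power', 'Power_1', 'Power_2', 'Power_3']
```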

Related

How to add a column with values depending on existing rows with lower index in pandas?

Is there a fast way of adding a column to a data frame df with values depending on all the rows of df with smaller index? A very simple example where the new column only depends on the value of one other column would be df["new_col"] = df["old_col"].cumsum() (if df is ordered), but I have something more complicated in mind. Ideally, I'd like to write something like
df["new_col"] = df.[some function here](f),
where [some function] sets the i-th value of df["new_col"] to f(df[df.index <= df.index[i]]). (Ideally [some function] can also be applied to groupby() objects.)
At the moment I loop through rows, add a temporary column containing a dict of relevant values and then apply a function, but this is very slow, memory-inefficient, etc.
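As a baseline for what is being asked, here is a minimal sketch of the desired behaviour. The helper name apply_on_prefixes and the sample columns are hypothetical, and it assumes the rows are already sorted by a unique index so that "the first i+1 rows" is the same as df[df.index <= df.index[i]]. It still evaluates f once per row, so it is tidier than a temporary dict column but not necessarily fast:

```python
import pandas as pd

def apply_on_prefixes(df, f):
    """Hypothetical helper: the i-th result is f applied to the first i+1 rows."""
    return pd.Series(
        [f(df.iloc[: i + 1]) for i in range(len(df))],
        index=df.index,
    )

# Made-up data for illustration
df = pd.DataFrame({"old_col": [1, 2, 3, 4], "weight": [1.0, 0.5, 2.0, 1.0]})

# Example f: weighted sum over all rows seen so far
df["new_col"] = apply_on_prefixes(
    df, lambda part: (part["old_col"] * part["weight"]).sum()
)
print(df["new_col"].tolist())  # [1.0, 2.0, 8.0, 12.0]
```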

Python: Create New Column based on values of other column using len()

My dataframe is a pandas dataframe df with many rows & columns.
Now I want to create a new column (series) based on the values of an object column, e.g.:
df.loc[0, 'oldcolumn'] is 0, which should give me 0 in the new column, and
df.loc[1, 'oldcolumn'] is 'ab%$.', which should give me 5 in the same new column (number of characters, including spaces).
In addition, is there a way to avoid loops or own functions?
Thank you
To create a new column based on the length of the value in another column, you can do
df['newcol'] = df['oldcol'].apply(lambda x: len(str(x)))
Note that this converts every value to a string first, so the integer 0 gives 1 rather than 0.
Although this is a generic way of creating a new column based on data from existing columns, Henry's approach is also a good one.
In addition, is there a way to avoid loops or own functions?
I recommend you take a look at How To Make Your Pandas Loop 71803 Times Faster.
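On avoiding loops and own functions, here is a minimal sketch using the vectorized .str accessor. The sample values are made up. The second variant relies on .str.len() returning NaN for non-string values such as the integer 0, which is then filled with 0 as the question asks:

```python
import pandas as pd

# Made-up object column mixing an integer and strings
df = pd.DataFrame({"oldcolumn": [0, "ab%$.", "a b"]})

# Variant 1: convert everything to str first (the integer 0 becomes "0")
df["len_as_str"] = df["oldcolumn"].astype(str).str.len()

# Variant 2: non-string values give NaN, which is replaced by 0
df["newcol"] = df["oldcolumn"].str.len().fillna(0).astype(int)

print(df["len_as_str"].tolist())  # [1, 5, 3]
print(df["newcol"].tolist())      # [0, 5, 3]
```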
You can try this:
# astype(str) is needed because len() fails on non-string values such as the integer 0
df['strlen'] = df['oldcolumn'].astype(str).apply(len)
print(df)

How can the following code be used to rearrange column names

I am trying to understand how the following code rearranges the columns of the resulting dataframe to match the other dataframe.
df_with_intercept = df_with_intercept[df_scorecard['Feature_names'].values]
Please note that the 'Feature_names' column in df_scorecard holds all the column names used in df_with_intercept, with a score against each.
The code above simply rearranges the columns in df_with_intercept to match the order of the rows in 'Feature_names'.
This is done to enable dot multiplication of the corresponding variables with each other.
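A minimal sketch of why the ordering matters is shown here. The feature names, the values, and the 'Score' column name are assumptions made for illustration only:

```python
import pandas as pd

# Made-up inputs whose columns are in an arbitrary order
df_with_intercept = pd.DataFrame(
    {"age": [30, 40], "Intercept": [1, 1], "income": [2, 3]}
)
# Made-up scorecard: one row per feature, with an assumed 'Score' column
df_scorecard = pd.DataFrame(
    {"Feature_names": ["Intercept", "age", "income"],
     "Score": [100.0, 2.0, 10.0]}
)

# Indexing with an array of names selects the columns in exactly that order
df_with_intercept = df_with_intercept[df_scorecard["Feature_names"].values]

# Column j of the inputs now lines up with row j of the scorecard
scores = df_with_intercept.values.dot(df_scorecard["Score"].values)
print(df_with_intercept.columns.tolist())  # ['Intercept', 'age', 'income']
print(scores.tolist())                     # [180.0, 210.0]
```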
df_scorecard['Feature_names']
inputs_test_with_ref_cat_w_intercept = \
    inputs_test_with_ref_cat_w_intercept[df_scorecard['Feature name'].values]
I think this might help explain things a bit.
Updating column names of one Pandas dataframe from the column of another dataframe
Start with a dataframe of data
import pandas as pd

df_pokemon = pd.DataFrame({
    "A": ["Eevee", "Vaporeon", "Flareon"],
    "C": ["Pichu", "Pikachu", "Raichu"]})
This produces a dataframe which looks like this
Create the dataframe which has the label names
# labels are listed in the same order as the columns of df_pokemon
df_labels = pd.DataFrame({
    "X": ["Eevvee_line", "Pikachu_line"]})
This produces a dataframe which looks like this
I can use the X column from df_labels to replace the column names in df_pokemon
df_pokemon.columns = df_labels['X'].tolist()
Thus
How to change the order of columns in one dataframe based on the data in another dataframe's column
Let's say we want to switch the columns in df_pokemon; we can do this.
I've created a new df_labels which has an updated order (Pikachu and Eevee are switched)
df_labels = pd.DataFrame({"X": ["Pikachu_line", "Eevvee_line"]})
I can use this data in column X to dictate the order in df_pokemon
df_pokemon[df_labels['X'].tolist()]
You will see the order of columns has changed
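Put together, the two steps can be sketched as follows. This assumes the labels used for renaming are listed in the same order as the existing columns; the variable names new_order and reordered are introduced here for illustration:

```python
import pandas as pd

df_pokemon = pd.DataFrame({
    "A": ["Eevee", "Vaporeon", "Flareon"],
    "C": ["Pichu", "Pikachu", "Raichu"]})

# Step 1: rename the columns (label order must match the current column order)
df_pokemon.columns = ["Eevvee_line", "Pikachu_line"]

# Step 2: reorder the columns using the values of another dataframe's column
new_order = pd.DataFrame({"X": ["Pikachu_line", "Eevvee_line"]})
reordered = df_pokemon[new_order["X"].tolist()]

print(reordered.columns.tolist())  # ['Pikachu_line', 'Eevvee_line']
print(reordered.iloc[0].tolist())  # ['Pichu', 'Eevee']
```

Selecting with a list returns a new dataframe, so the result has to be assigned if the new order should be kept.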

How to feed new columns every time in a loop to a spark dataframe?

I have a task of reading each column of a Cassandra table into a dataframe to perform some operations. For example, if there are 5 columns in a table, I want to feed the data like this:
first column in the first iteration
first and second column in the second iteration to the same dataframe
and likewise.
I need generic code. Has anyone tried something similar to this? Please help me out with an example.
This will work with a pandas DataFrame:
df2 = pd.DataFrame()
for i in range(len(df.columns)):
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df2 = pd.concat([df2, df.iloc[:, 0:i+1]], sort=True)
Because the same column names recur in each iteration, the slices are aligned on those names, so each iteration adds rows rather than new columns.
You can extract the names from the dataframe's schema, then access the columns you need and use them the way you want.
names = df.schema.names
columns = []
for name in names:
    columns.append(name)
    # df[columns] now holds the columns collected so far; use it the way you want
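The growing list of columns can be sketched without a Spark session as below. The helper name column_prefixes and the sample names are hypothetical; with a real Spark dataframe the list would come from df.schema.names and each prefix would be passed to df.select:

```python
def column_prefixes(names):
    """Hypothetical helper: yield [n0], [n0, n1], ... up to the full list."""
    for end in range(1, len(names) + 1):
        yield names[:end]

# Stand-in for df.schema.names
names = ["id", "name", "city"]

for columns in column_prefixes(names):
    # With a Spark DataFrame this step would be: subset = df.select(*columns)
    print(columns)
```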

Pandas merge DataFrames based on index/column combination

I have two DataFrames that I want to merge. I have read about merging on multiple columns, and preserving the index when merging. My problem needs to cater for both, and I am having difficulty figuring out the best way to do this.
The first DataFrame looks like this
and the second looks like this
I want to merge these based on the Date and the ID. In the first DataFrame the Date is the index and the ID is a column; in the second DataFrame both Date and ID are part of a MultiIndex.
Essentially, as a result I want a DataFrame that looks like DataFrame 2 with an additional column for the Events from DataFrame 1.
I'd suggest resetting the index (reset_index) on both DataFrames and then merging them on the Date and ID columns, as you've read. Then you can set the index again (set_index) to reproduce your desired MultiIndex.
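A minimal sketch of that approach follows. The data, the 'Value' and 'Event' column names, and the choice of a left merge are assumptions, since the original tables are not shown:

```python
import pandas as pd

# Made-up first DataFrame: Date is the index, ID is a column
df1 = pd.DataFrame(
    {"ID": [1, 2, 1], "Event": ["start", "start", "stop"]},
    index=pd.Index(["2020-01-01", "2020-01-01", "2020-01-02"], name="Date"),
)

# Made-up second DataFrame: Date and ID form a MultiIndex
df2 = pd.DataFrame(
    {"Value": [10, 20, 30]},
    index=pd.MultiIndex.from_tuples(
        [("2020-01-01", 1), ("2020-01-01", 2), ("2020-01-02", 1)],
        names=["Date", "ID"],
    ),
)

merged = (
    df2.reset_index()                                         # Date, ID become columns
    .merge(df1.reset_index(), on=["Date", "ID"], how="left")  # keep every row of df2
    .set_index(["Date", "ID"])                                # restore the MultiIndex
)
print(merged)
```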
