Resetting column headings after concat - python

I concatenated a df (6000, 13) with a sampleDF (6000, 1) and found that the column index of my pandas df ranges from 0-12 for the original df, as expected, and then shows 0 again for the concatenated sampleDF.
df = pd.concat([df, sampleDF], axis=1)
I am trying to reset this and have tried the following, but nothing seems to have any effect. Are there any other methods I can try, or any thoughts on why this may be happening?
df = df.reset_index(drop=True)
df = df.reindex()
df.index = range(len(df.index))
df.index = pd.RangeIndex(len(df.index))
I have also tried to append .reset_index(drop=True) to my original concat.
The only thing I can think of is that my data frame is one column wide after processing and should perhaps be a pandas Series?
Edit
I found a workaround: transpose, renumber the index, and then transpose back. There has to be a better way than this.
df = pd.concat([df, sampleDF], axis=1)
df = df.transpose()
df.index = range(len(df.index))
df = df.transpose()

You can simply rename your columns directly:
df = pd.concat([df, sampleDF], axis=1)
df.columns = range(len(df.columns))
This will be more efficient than transposing twice.
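Alternatively, concat can hand back fresh labels itself: with axis=1, ignore_index=True relabels the concatenation axis, which here is the columns. A minimal sketch with small stand-in frames for the question's df and sampleDF:

```python
import pandas as pd

# hypothetical stand-ins for the question's df (13 cols) and sampleDF (1 col)
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
sampleDF = pd.DataFrame({"score": [0.1, 0.2]})

# ignore_index=True relabels the concatenation axis; with axis=1 that is the columns
out = pd.concat([df, sampleDF], axis=1, ignore_index=True)
print(list(out.columns))  # [0, 1, 2]
```

This avoids the separate renaming step, at the cost of discarding the original column names entirely.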

Related

How can I add a column that has the same value in every row?

I was trying to add a new column to my dataset, but when I did, the column only had a value at one index.
Is there a way to make one value appear at all indexes in a column?
import pandas as pd
df = pd.read_json('file_1.json', lines=True)
df2 = pd.read_json('file_2.json', lines=True)
df3 = pd.concat([df,df2])
df3 = df.loc[:, ['renderedContent']]
görüş_column = ['Milet İttifakı']
df3['Siyasi Yönelim'] = görüş_column
As per my understanding, this could be a possible solution.
You have mentioned these lines of code:
df3 = pd.concat([df,df2])
df3 = df.loc[:, ['renderedContent']]
You can modify the first one into:
df3 = pd.concat([df, df2], axis=1)  # axis=1 adds df2 as new columns; the default axis=0 appends its rows
Second point:
df3 = df3.loc[:, ['renderedContent']]
I think you meant to write this, instead of df3 = df.loc[:, ['renderedContent']], which selects from the original df and discards the concatenated result.
Hope this solves your problem.
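Separately, for the "same value at all indexes" goal: assigning a plain scalar (rather than a one-element list) broadcasts it to every row. A minimal sketch with a made-up df3:

```python
import pandas as pd

# hypothetical df3 standing in for the concatenated frame from the question
df3 = pd.DataFrame({"renderedContent": ["t1", "t2", "t3"]})

# a scalar assignment broadcasts to every row; a 1-element list would not
df3["Siyasi Yönelim"] = "Milet İttifakı"
print(df3["Siyasi Yönelim"].nunique())  # 1
```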

Sum() returning a copy of the last column in df?

I am new to pandas and seem to get a repeat of the last column rather than the sum of the two columns.
df = pd.read_csv('0xb4a0a46d3042a739ec76fd67a3f1b99cc12ac1d9_mcap.csv', sep=',')
df1 = df.copy(deep=True)
df2 = df1.loc[:, ('mcap_token0', 'mcap_token1')]
df2 = df2.reset_index(drop=True)
df2.loc[:, 'sum'] = df2.sum(axis=1)
It's working fine; the sum differs from the last column only behind the displayed decimals, so you don't see the difference at the default print precision.
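A minimal sketch with made-up numbers, showing that the sum column genuinely differs from the last column once enough decimals are printed:

```python
import pandas as pd

# hypothetical values: the two columns differ only far behind the decimal point
df2 = pd.DataFrame({"mcap_token0": [1.00000001, 2.00000002],
                    "mcap_token1": [1.0, 2.0]})
df2["sum"] = df2[["mcap_token0", "mcap_token1"]].sum(axis=1)

# widen the printed precision to see that 'sum' is not a copy of a column
with pd.option_context("display.precision", 10):
    print(df2)
```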

Dropping index in DataFrame for CSV file

Working with a CSV file in PyCharm. I want to delete the automatically-generated index column, but when I print the result, the answer I get in the terminal is "None". All the answers by other users indicate that the reset_index method should work.
If I just say df = df.reset_index(drop=True), it does not delete the column either.
import pandas as pd
df = pd.read_csv("music.csv")
df['id'] = df.index + 1
cols = list(df.columns.values)
df = df[[cols[-1]]+cols[:3]]
df = df.reset_index(drop=True, inplace=True)
print(df)
I agree with #It_is_Chris. Also, this line does not work because reset_index with inplace=True returns None:
df = df.reset_index(drop=True, inplace=True)
It should be either:
df.reset_index(drop=True, inplace=True)
or
df = df.reset_index(drop=True)
Since you said you're trying to "delete the automatically-generated index column" I could think of two solutions!
First solution:
Assign that column as your dataset's index. If your dataset already ships with an index/row-number column, you could do something like this:
# assuming the first column in the dataset (column number zero) is your index column
df = pd.read_csv("yourfile.csv", index_col=0)
#you won't see the automatically-generated index column anymore
df.head()
Second solution:
You could delete it in the final csv:
#To export your df to a csv without the automatically-generated index column
df.to_csv("yourfile.csv", index=False)

Merge two dataframe to reduce memory consumption

I am trying to explode a list stored in a dataframe column and merge it back to the df, but I get a memory error while merging the flattened column with the initial dataframe. I would like to know if I can merge it in chunks, so that I can overcome the memory issue.
def flatten_column_with_list(df, column, reset_index=False):
    # build one row per (original index, list element) pair
    # .items() replaces Series.iteritems(), which was removed in pandas 2.0
    column_to_flatten = pd.DataFrame(
        [[i, x] for i, y in df[column].apply(list).items() for x in y],
        columns=['I', column])
    column_to_flatten = column_to_flatten.set_index('I')
    df = df.drop(column, axis=1)
    df = df.merge(column_to_flatten, left_index=True, right_index=True)
    if reset_index:
        df = df.reset_index(drop=True)
    return df
I would appreciate any support.
Regarding this, you can simply use the following code:
df = df.explode("column_name", ignore_index=True)
Setting ignore_index to True renumbers the resulting index 0, 1, 2, ...
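A minimal sketch of that call on a hypothetical frame with a list-valued column, replacing the whole flatten-and-merge helper:

```python
import pandas as pd

# hypothetical frame with a list-valued column
df = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

# one output row per list element; other columns are repeated alongside
flat = df.explode("tags", ignore_index=True)
print(flat)
```

Because explode works in place on the frame, there is no intermediate merge and therefore no doubled memory footprint from joining the flattened column back.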

Pandas xs slow for DataFrame.apply

I have a DataFrame with a multi-index ['timestamp', 'symbol'] that contains timeseries data. I am merging this data with other samples, and my apply function, which uses asof, is similar to:
df.apply(lambda x: df2.xs(x['symbol'], level='symbol').index.asof(x['timestamp']), axis=1)
I think the per-row xs call to filter on symbol is what makes this so slow, so I am instead creating a dict of 'symbol' -> DataFrame where the values are already filtered, so I can call index.asof directly. Am I approaching this the wrong way?
Example:
from io import StringIO
import pandas as pd

df = pd.read_csv(StringIO("ts,symbol,bid,ask\n2014-03-03T09:30:00,A,54.00,55.00\n2014-03-03T09:30:05,B,34.00,35.00"), parse_dates=['ts'], index_col=['ts', 'symbol'])
# keep ts and symbol as columns in df2 so the apply and merge below can reference them
df2 = pd.read_csv(StringIO("ts,eventId,symbol\n2014-03-03T09:32:00,1,A\n2014-03-03T09:33:05,2,B"), parse_dates=['ts'])
# find ts to join with and use xs so we can use indexof
df2['event_ts'] = df2.apply(lambda x: df.xs(x['symbol'], level='symbol').index.asof(x['ts']), axis=1)
# merge in fields
df2 = pd.merge(df2, df, left_on=['event_ts', 'symbol'], right_index=True)
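The pre-split dict described in the question might look like the following sketch, built from a tiny made-up version of the quote data; each per-row lookup then becomes a dict access plus Index.asof, with no repeated xs:

```python
import pandas as pd

# hypothetical quote data with a (ts, symbol) MultiIndex, as in the question
idx = pd.MultiIndex.from_tuples(
    [(pd.Timestamp("2014-03-03 09:30:00"), "A"),
     (pd.Timestamp("2014-03-03 09:30:05"), "B")],
    names=["ts", "symbol"])
df = pd.DataFrame({"bid": [54.0, 34.0]}, index=idx)

# pre-split once: symbol -> DatetimeIndex of that symbol's timestamps
ts_by_symbol = {sym: sub.index.get_level_values("ts")
                for sym, sub in df.groupby(level="symbol")}

df2 = pd.DataFrame({"ts": [pd.Timestamp("2014-03-03 09:32:00")],
                    "symbol": ["A"]})
# each row is now a dict access plus Index.asof, instead of a fresh xs
df2["event_ts"] = df2.apply(
    lambda x: ts_by_symbol[x["symbol"]].asof(x["ts"]), axis=1)
```

If the timestamps are sorted, something like pd.merge_asof(df2, df.reset_index(), on="ts", by="symbol") may do the same as-of join in one vectorized call, avoiding apply entirely.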
