Replace / Append data in Spreadsheet - python

I need to edit data in a spreadsheet as follows:
Replace: if the date already exists in the spreadsheet
Append: if the date doesn't exist in the spreadsheet
Sample data attached below.
Kindly help.

Use concat with DataFrame.drop_duplicates and DataFrame.sort_values. With keep='last', the row from df2 wins whenever a date appears in both frames, and dates found only in df2 are appended:
df1['Date'] = pd.to_datetime(df1['Date'], dayfirst=True)
df2['Date'] = pd.to_datetime(df2['Date'], dayfirst=True)
df = (pd.concat([df1, df2])
        .drop_duplicates('Date', keep='last')
        .sort_values('Date', ignore_index=True))
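Since the sample data from the question is not available here, the following is a minimal sketch of the same approach on made-up data. The column name Value and all of the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-ins for the existing sheet (df1) and the new data (df2).
df1 = pd.DataFrame({"Date": ["01/06/2021", "02/06/2021", "03/06/2021"],
                    "Value": [10, 20, 30]})
df2 = pd.DataFrame({"Date": ["03/06/2021", "04/06/2021"],
                    "Value": [99, 40]})

df1["Date"] = pd.to_datetime(df1["Date"], dayfirst=True)
df2["Date"] = pd.to_datetime(df2["Date"], dayfirst=True)

# df2 comes last in the concat, so keep="last" replaces 3 June with the
# df2 row, while 4 June (only in df2) is appended.
df = (pd.concat([df1, df2])
        .drop_duplicates("Date", keep="last")
        .sort_values("Date", ignore_index=True))
print(df)
```

The order of the frames inside pd.concat matters here: whichever frame comes last supplies the surviving row for a duplicated date.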

Alternatively, use an outer merge on the date and take the new price wherever one exists:
import numpy as np
import pandas as pd
df = pd.DataFrame({'date':pd.date_range('2021-6-1', '2021-6-15'), 'price': range(15)})
new_df = pd.DataFrame({'date':pd.date_range('2021-6-11', '2021-6-17'), 'price': range(15,22)})
df.merge(new_df, on='date', how='outer').apply(lambda x: x['price_y'] if not np.isnan(x['price_y']) else x['price_x'], axis=1)
Result:
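Note that the merge expression returns only a Series of prices, without the date column. As a hedged alternative (a different technique from the merge used in this answer), DataFrame.combine_first can do the replace-or-append step while keeping the dates. The sketch below reuses the df and new_df definitions from the answer; the name result is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.date_range("2021-6-1", "2021-6-15"),
                   "price": range(15)})
new_df = pd.DataFrame({"date": pd.date_range("2021-6-11", "2021-6-17"),
                       "price": range(15, 22)})

# Values from new_df take priority; dates missing from new_df are
# filled in from df. The union of both date indexes is kept.
result = (new_df.set_index("date")
                .combine_first(df.set_index("date"))
                .reset_index())
print(result)
```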

Related

pandas rename multi-level column having the same name

When I use an aggregate function, the resulting columns under 'price' and 'carat' both have the column name 'mean'.
How do I rename the mean under price to price_mean and the mean under carat to carat_mean?
I can't change them individually.
diamonds.groupby('cut').agg({
    'price': ['count', 'mean'],
    'carat': 'mean'
}).rename(columns={'mean':'price_mean','mean':'carat_mean'}, level = 1)

You could try this (df is the aggregated frame with two column levels):
# Rename columns of level 1
df1 = df["price"]
df1.columns = ["count", "price_mean"]
df2 = df["carat"]
df2.columns = ["carat_mean"]
# Aggregate dfs (with renamed columns) under level 0 columns
df = pd.concat([df1, df2], axis=1, keys=['price', 'carat'])
print(df)
# Outputs
          price                 carat
          count price_mean carat_mean
Fair   0.693995  -0.632283   0.789963
Good   0.099057   1.005623   0.143289
Ideal -0.277984  -0.105138  -0.611168
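If a flat set of column names is acceptable instead of two levels, pandas named aggregation sets the output names directly in the agg call. This is a sketch on a tiny made-up diamonds frame; the data values and the name summary are assumptions.

```python
import pandas as pd

# Hypothetical miniature of the diamonds dataset.
diamonds = pd.DataFrame({
    "cut": ["Fair", "Fair", "Good", "Good", "Ideal"],
    "price": [300, 500, 400, 600, 1000],
    "carat": [0.2, 0.4, 0.3, 0.5, 0.7],
})

# Each keyword is the output column name; the tuple is (source column, function).
summary = diamonds.groupby("cut").agg(
    price_count=("price", "count"),
    price_mean=("price", "mean"),
    carat_mean=("carat", "mean"),
)
print(summary)
```

This avoids the duplicate 'mean' label entirely, at the cost of losing the price/carat grouping level in the header.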

Python - Unpivotting multiple columns into single column

As the title suggests, I am trying to turn this
into this
The end goal is to be able to group the values and get a word count.
This is the code that I've tried so far, but I am unsure about it.
df_unpivoted = final_df.reset_index().melt(id_vars='index', var_name='Count', value_name='Value')
df = final_df.rename(columns=lambda x: x + x[-1] if x.startswith('index') else x)
df = pd.wide_to_long(df, ['0'], i='id', j='i')
df = df.reset_index(level=1, drop=True).reset_index()
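The before and after tables from the question are not available, so the following is only a sketch of the general idea: melt the wide columns into a single Value column, then count the words. The frame final_df below, its column names "0", "1", "2" and its contents are all assumptions.

```python
import pandas as pd

# Hypothetical wide frame: one word per cell, some cells empty.
final_df = pd.DataFrame({
    "0": ["apple", "banana"],
    "1": ["banana", "cherry"],
    "2": ["apple", None],
})

# reset_index() exposes the row label as a column named "index",
# which melt keeps as the identifier while stacking the other columns.
long_df = (final_df.reset_index()
                   .melt(id_vars="index", var_name="Count", value_name="Value")
                   .dropna(subset=["Value"]))

# Word count over the single unpivoted column.
word_counts = long_df["Value"].value_counts()
print(word_counts)
```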

Python Pandas Split DF

Please review the code below. Is there a more efficient way of splitting one DataFrame into two? In the code below, the query is run twice. Would it be faster to run the query once and say: if true, send the row to DF1, else to DF2? Or, after DF1 is created, is there some way to say that DF2 = DF minus DF1?
Code:
x1='john'
df = pd.read_csv(file, sep='\n', header=None, engine='python', quoting=3)
df = df[0].str.strip(' \t"').str.split('[,|;: \t]+', n=1, expand=True).rename(columns={0: 'email', 1: 'data'})
df1= df[df.email.str.startswith(x1)]
df2= df[~df.email.str.startswith(x1)]

There's no need to compute the mask df.email.str.startswith(x1) twice.
mask = df.email.str.startswith(x1)
# .copy() avoids SettingWithCopyWarning on later assignments, see
# https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
df1 = df[mask].copy()
df2 = df[~mask].copy()
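A self-contained sketch of the single-mask split, using made-up rows in place of the CSV file. The na=False argument is an addition not in the original answer: it treats missing emails as non-matches, so that ~mask stays a clean boolean inversion.

```python
import pandas as pd

# Hypothetical data standing in for the parsed file.
df = pd.DataFrame({
    "email": ["john@a.com", "mary@b.com", "johnny@c.com", None],
    "data": ["x", "y", "z", "w"],
})
x1 = "john"

# Compute the condition once and reuse it for both halves.
mask = df["email"].str.startswith(x1, na=False)
df1 = df[mask].copy()   # rows whose email starts with x1
df2 = df[~mask].copy()  # everything else, including missing emails
print(df1)
print(df2)
```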

Sort a Pandas DataFrame using both Date and Time

I'm trying to sort my dataframe using sort_values, but I'm not getting the desired output.
df1 = pd.read_csv('raw data/120_FT DDMG.csv')
df2 = pd.read_csv('raw data/120_FT MG.csv')
df3 = pd.read_csv('raw data/120_FT DD.csv')
dconcat = pd.concat([df1,df2,df3])
dconcat['date'] = pd.to_datetime(dconcat['ActivityDates(Individual)']+' '+dconcat['ScheduledStartTime'])
dconcat.sort_values(by='date')
dconcat = dconcat.set_index('date')
print(dconcat)

sort_values returns a new, sorted DataFrame and leaves the original unchanged unless inplace=True.
So assign the result back: dconcat = dconcat.sort_values(by='date')
Or sort in place: dconcat.sort_values(by='date', inplace=True)
You can also set the index first and then sort by it:
dconcat = pd.concat([df1,df2,df3])
dconcat['date'] = pd.to_datetime(dconcat['ActivityDates(Individual)']+' '+dconcat['ScheduledStartTime'])
dconcat.set_index('date', inplace=True)
dconcat.sort_index(inplace=True)
print(dconcat)
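The difference can be seen on a small example. The rows below are made up and stand in for the three CSV files; only the two column names are taken from the question.

```python
import pandas as pd

# Hypothetical rows replacing the concatenated CSV data.
dconcat = pd.DataFrame({
    "ActivityDates(Individual)": ["2021-06-02", "2021-06-01", "2021-06-01"],
    "ScheduledStartTime": ["09:00", "14:30", "08:15"],
})
dconcat["date"] = pd.to_datetime(
    dconcat["ActivityDates(Individual)"] + " " + dconcat["ScheduledStartTime"]
)

# The return value is discarded here, so dconcat keeps its original order.
dconcat.sort_values(by="date")
still_unsorted = dconcat["ScheduledStartTime"].tolist()

# Assigning the result keeps the sorted frame.
sorted_df = dconcat.sort_values(by="date").set_index("date")
print(sorted_df)
```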

How to replace a string in a pandas multiindex?

I have a dataframe with a large multiindex, sourced from a vast number of csv files. Some of those files have errors in the various labels, e.g. "window" is misspelled as "winZZw", which then causes problems when I select all windows with df.xs('window', level='middle', axis=1).
So I need a way to simply replace winZZw with window.
Here's a very minimal sample df (let's assume the data and the 'roof', 'window'… strings come from some convoluted text reader):
header = pd.MultiIndex.from_product([['roof'], ['window'], ['basement']], names = ['top', 'middle', 'bottom'])
dates = pd.date_range('01/01/2000','01/12/2010', freq='MS')
data = np.random.randn(len(dates))
df = pd.DataFrame(data, index=dates, columns=header)
header2 = pd.MultiIndex.from_product([['roof'], ['winZZw'], ['basement']], names = ['top', 'middle', 'bottom'])
data = 3*(np.random.randn(len(dates)))
df2 = pd.DataFrame(data, index=dates, columns=header2)
df = pd.concat([df, df2], axis=1)
header3 = pd.MultiIndex.from_product([['roof'], ['door'], ['basement']], names = ['top', 'middle', 'bottom'])
data = 2*(np.random.randn(len(dates)))
df3 = pd.DataFrame(data, index=dates, columns=header3)
df = pd.concat([df, df3], axis=1)
Now I want to xs a new dataframe for all the houses that have a window at their middle level: windf = df.xs('window', level='middle', axis=1)
But this obviously misses the misspelled winZZw.
So, how do I replace winZZw with window?
The only way I found was to use set_levels, but if I understood that correctly, I need to feed it the whole level, i.e.
df.columns.set_levels([u'window',u'window', u'door'], level='middle',inplace=True)
but this has two issues:
I need to pass it the whole index, which is easy in this sample, but impossible/stupid for a thousand-column df with hundreds of labels.
It seems to need the list backwards (now the first entry in my df has door in the middle, instead of the window it had). That can probably be fixed, but it seems weird.
I can work around these issues by xsing a new df of only winZZws, then setting the levels with set_levels(df.shape[1]*[u'window'], level='middle') and concatting it together again, but I'd like something more straightforward, analogous to str.replace('winZZw', 'window'), and I can't figure out how.

Use rename and specify the level:
header = pd.MultiIndex.from_product([['roof'],[ 'window'], ['basement']], names = ['top', 'middle', 'bottom'])
dates = pd.date_range('01/01/2000','01/12/2010', freq='MS')
data = np.random.randn(len(dates))
df = pd.DataFrame(data, index=dates, columns=header)
header2 = pd.MultiIndex.from_product([['roof'], ['winZZw'], ['basement']], names = ['top', 'middle', 'bottom'])
data = 3*(np.random.randn(len(dates)))
df2 = pd.DataFrame(data, index=dates, columns=header2)
df = pd.concat([df, df2], axis=1)
header3 = pd.MultiIndex.from_product([['roof'], ['door'], ['basement']], names = ['top', 'middle', 'bottom'])
data = 2*(np.random.randn(len(dates)))
df3 = pd.DataFrame(data, index=dates, columns=header3)
df = pd.concat([df, df3], axis=1)
df = df.rename(columns={'winZZw':'window'}, level='middle')
print(df.head())
top             roof
middle        window                door
bottom      basement  basement  basement
2000-01-01 -0.131052 -1.189049  1.310137
2000-02-01 -0.200646  1.893930  2.124765
2000-03-01 -1.690123 -2.128965  1.639439
2000-04-01 -0.794418  0.605021 -2.810978
2000-05-01  1.528002 -0.286614  0.736445
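rename also accepts a function in place of a dict, which comes close to the str.replace style the question asks for. The sketch below is a variation on the answer, not part of it: the top-level labels house_a, house_b and house_c are hypothetical and are used only so that the renamed columns stay unique.

```python
import numpy as np
import pandas as pd

# Hypothetical three-column frame with one misspelled middle label.
columns = pd.MultiIndex.from_tuples(
    [("house_a", "window", "basement"),
     ("house_b", "winZZw", "basement"),
     ("house_c", "door", "basement")],
    names=["top", "middle", "bottom"],
)
df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=columns)

# The function is applied to every label of the 'middle' level only.
fixed = df.rename(
    columns=lambda label: label.replace("winZZw", "window"),
    level="middle",
)

# Both window columns are now selected.
windows = fixed.xs("window", level="middle", axis=1)
print(windows)
```

A function is convenient when the misspellings follow a pattern. For a handful of known bad labels, the dict form shown in the answer is simpler.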

