Want MultiIndex for rows and columns with read_csv

My .csv file looks like:
Area When Year Month Tickets
City Day 2015 1 14
City Night 2015 1 5
Rural Day 2015 1 18
Rural Night 2015 1 21
Suburbs Day 2015 1 15
Suburbs Night 2015 1 21
City Day 2015 2 13
containing 75 rows. I want both a row multiindex and column multiindex that looks like:
Area City Rural Suburbs
When Day Night Day Night Day Night
Year Month
2015 1 5.0 3.0 22.0 11.0 13.0 2.0
2 22.0 8.0 4.0 16.0 6.0 18.0
3 26.0 25.0 22.0 23.0 22.0 2.0
2016 1 20.0 25.0 39.0 14.0 3.0 10.0
2 4.0 14.0 16.0 26.0 1.0 24.0
3 22.0 17.0 7.0 24.0 12.0 20.0
I've read the .read_csv doc at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
I can get the row multiindex with:
df2 = pd.read_csv(r'c:\Data\Tickets.csv', index_col=[2, 3])
I've tried:
df2 = pd.read_csv(r'c:\Data\Tickets.csv', index_col=[2, 3], header=[1, 3, 5])
thinking [1, 3, 5] fetches 'City', 'Rural', and 'Suburbs'. How do I get the desired column multiindex shown above?

It looks like you need pivot_table with multiple index keys and multiple column keys.
Start by reading your CSV plainly:
df = pd.read_csv('Tickets.csv')
Then pivot:
df.pivot_table(index=['Year', 'Month'], columns=['Area', 'When'], values='Tickets')
With the input data you provided (passing values as a string rather than a list keeps 'Tickets' out of the column index), you'd get:
Area City Rural Suburbs
When Day Night Day Night Day Night
Year Month
2015 1 14.0 5.0 18.0 21.0 15.0 21.0
2 13.0 NaN NaN NaN NaN NaN
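A minimal, self-contained sketch of the pivot_table approach above. It assumes the file is comma-separated and reads the seven sample rows from an in-memory string instead of Tickets.csv; the names csv_text and table are illustrative only.

```python
import io

import pandas as pd

# Assumption: the real file is comma-separated; these are the seven
# sample rows from the question, held in memory instead of on disk.
csv_text = """Area,When,Year,Month,Tickets
City,Day,2015,1,14
City,Night,2015,1,5
Rural,Day,2015,1,18
Rural,Night,2015,1,21
Suburbs,Day,2015,1,15
Suburbs,Night,2015,1,21
City,Day,2015,2,13
"""

df = pd.read_csv(io.StringIO(csv_text))

# Rows are indexed by (Year, Month); columns by (Area, When).
# pivot_table aggregates duplicate cells with the mean by default.
table = df.pivot_table(index=['Year', 'Month'],
                       columns=['Area', 'When'],
                       values='Tickets')
print(table)
```

Because pivot_table aggregates, repeated (Year, Month, Area, When) combinations are averaged rather than raising an error; pass aggfunc='sum' if totals are wanted instead.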

Related

Generate weeks from column with dates

I have a large dataset with a date column that starts in 2019. I want to generate week numbers, in a separate column, from those dates.
Here is what the date column looks like:
import pandas as pd
data = {'date': ['2019-09-10', 'NaN', '2019-10-07', '2019-11-04', '2019-11-28',
'2019-12-02', '2020-01-24', '2020-01-29', '2020-02-05',
'2020-02-12', '2020-02-14', '2020-02-24', '2020-03-11',
'2020-03-16', '2020-03-17', '2020-03-18', '2021-09-14',
'2021-09-30', '2021-10-07', '2021-10-08', '2021-10-12',
'2021-10-14', '2021-10-15', '2021-10-19', '2021-10-21',
'2021-10-26', '2021-10-28', '2021-10-29', '2021-11-02',
'2021-11-15', '2021-11-16', '2021-12-01', '2021-12-07',
'2021-12-09', '2021-12-10', '2021-12-14', '2021-12-15',
'2022-01-13', '2022-01-14', '2022-01-21', '2022-01-24',
'2022-01-25', '2022-01-27', '2022-01-31', '2022-02-01',
'2022-02-10', '2022-02-11', '2022-02-16', '2022-02-24']}
df = pd.DataFrame(data)
Starting from the first day the data was collected, I want to count 7 days using the date column and turn that span into a week. For example, the first 7 days form week one, and I want to repeat the same process until the last week the data was collected.
It may be a good idea to sort the dates from the first date to the most recent one.
I have tried this, but it does not generate weeks in order; the week numbers actually repeat.
pd.to_datetime(df['date'], errors='coerce').dt.week
My intention is to start from the first date the data was collected, count 7 days and store that as week one, then continue incrementally until the last week, say week number 66.
Here is the expected column of weeks created from the date column:
import pandas as pd
week_df = {'weeks': ['1', '2', "3", "5", '6']}
df_weeks = pd.DataFrame(week_df)

IIUC use:
df['date'] = pd.to_datetime(df['date'])
df['week'] = df['date'].sub(df['date'].iat[0]).dt.days // 7 + 1
print (df.head(10))
date week
0 2019-09-10 1.0
1 NaT NaN
2 2019-10-07 4.0
3 2019-11-04 8.0
4 2019-11-28 12.0
5 2019-12-02 12.0
6 2020-01-24 20.0
7 2020-01-29 21.0
8 2020-02-05 22.0
9 2020-02-12 23.0
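The snippet above measures weeks from the first row (iat[0]), which only matches "the first day the data was collected" if the rows are already in date order. A small sketch, assuming the dates may arrive unsorted, that anchors on the earliest date instead (the five-row frame is illustrative):

```python
import pandas as pd

# Illustrative subset of the question's dates, deliberately out of order.
df = pd.DataFrame({'date': ['2019-10-07', 'NaN', '2019-09-10',
                            '2019-11-28', '2019-12-02']})

# Unparseable entries such as 'NaN' become NaT.
df['date'] = pd.to_datetime(df['date'], errors='coerce')

# Anchor on the earliest date rather than on the first row.
start = df['date'].min()

# Whole days since the start, grouped into 7-day blocks, numbered from 1.
df['week'] = (df['date'] - start).dt.days // 7 + 1
print(df)
```

Rows with a missing date keep NaN in the week column, which is why the column is floating point.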

You have more than 66 weeks here, so either you want the real week count since the beginning or you want a dummy week rank. See below for both solutions:
# convert to week period
s = pd.to_datetime(df['date']).dt.to_period('W')
# get real week number
df['week'] = s.sub(s.iloc[0]).dropna().apply(lambda x: x.n).add(1)
# get dummy week rank
df['week2'] = s.rank(method='dense')
output:
date week week2
0 2019-09-10 1.0 1.0
1 NaN NaN NaN
2 2019-10-07 5.0 2.0
3 2019-11-04 9.0 3.0
4 2019-11-28 12.0 4.0
5 2019-12-02 13.0 5.0
6 2020-01-24 20.0 6.0
7 2020-01-29 21.0 7.0
8 2020-02-05 22.0 8.0
9 2020-02-12 23.0 9.0
10 2020-02-14 23.0 9.0
11 2020-02-24 25.0 10.0
12 2020-03-11 27.0 11.0
13 2020-03-16 28.0 12.0
14 2020-03-17 28.0 12.0
15 2020-03-18 28.0 12.0
16 2021-09-14 106.0 13.0
17 2021-09-30 108.0 14.0
18 2021-10-07 109.0 15.0
19 2021-10-08 109.0 15.0
...
42 2022-01-27 125.0 26.0
43 2022-01-31 126.0 27.0
44 2022-02-01 126.0 27.0
45 2022-02-10 127.0 28.0
46 2022-02-11 127.0 28.0
47 2022-02-16 128.0 29.0
48 2022-02-24 129.0 30.0

How to sort multiindex column month names?

I have this multiindex df:
YEARS_TMAX TMAX YEARS_TMAX TMAX YEARS_TMAX
MONTH April April August August December .....
CODE NAME
000130 RICA PLAYA 21.0 31.5 21.0 21.5 22.0
000132 PUERTO PIZARRO 12.0 33.8 12.0 32.4 11.0
000134 PAPAYAL 23.0 33.2 22.0 22.4 21.0
000135 EL SALTO 22.0 33.6 23.0 22.8 22.0
000136 CAÑAVERAL 16.0 32.7 15.0 33.1 11.0
... ... ... ... ...
158317 SUSAPAYA 19.0 17.6 19.0 17.3 21.0
158321 PALCA 16.0 19.3 17.0 19.8 16.0
158323 TALABAYA 12.0 17.6 13.0 17.5 13.0
158326 CAPAZO 17.0 13.6 17.0 13.0 19.0
158328 PAUCARANI 14.0 13.3 13.0 11.9 15.0
I want to sort columns by month name (and TMAX columns first) like this:
TMAX YEARS_TMAX TMAX YEARS_TMAX TMAX
MONTH January January February February March .....
CODE NAME
000130 RICA PLAYA 22.0 31.5 23.0 27.5 23.0
000132 PUERTO PIZARRO 17.0 32.8 18.0 30.4 18.0
000134 PAPAYAL 25.0 32.2 26.0 28.4 25.0
000135 EL SALTO 26.0 31.6 26.0 26.8 26.0
000136 CAÑAVERAL 16.0 32.7 18.0 31.1 15.0
... ... ... ... ...
158317 SUSAPAYA 19.0 17.6 19.0 17.3 21.0
158321 PALCA 16.0 19.3 17.0 19.8 16.0
158323 TALABAYA 12.0 17.6 13.0 17.5 13.0
158326 CAPAZO 17.0 13.6 17.0 13.0 19.0
158328 PAUCARANI 14.0 13.3 13.0 11.9 15.0
So I wrote this code (source: Sort "Date" in Multi-Index):
dates = pd.to_datetime(df.columns.get_level_values(1), format='%B')
df.columns = [df.columns.get_level_values(0), dates]
df = df.sort_index(axis=1, level=1)
This was meant to sort the columns by month, but dates does not contain month names; it contains seemingly random dates.
How can I solve this?
Thanks in advance.

Use a CategoricalDtype: creating an ordered dtype from calendar.month_name ensures that sort_index puts the months in calendar order.
month_dtype = pd.CategoricalDtype(categories=list(month_name), ordered=True)
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).astype(month_dtype)]
df = df.sort_index(axis=1, level=[1, 0])
Sample Data and Imports:
from calendar import month_name
import pandas as pd
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([
        ['YEARS_TMAX', 'TMAX'],
        ['March', 'January', 'February']
    ])
)
df before sort:
  YEARS_TMAX                    TMAX
       March January February  March January February
0          1       2        3      4       5        6
1          7       8        9     10      11       12
df after sort:
     TMAX YEARS_TMAX     TMAX YEARS_TMAX  TMAX YEARS_TMAX
  January    January February   February March      March
0       5          2        6          3     4          1
1      11          8       12          9    10          7
The datetime approach would also work, but level 1 then has to be converted back to strings with DatetimeIndex.strftime:
df.columns = [df.columns.get_level_values(0),
              pd.to_datetime(df.columns.get_level_values(1), format='%B')]
df = df.sort_index(axis=1, level=[1, 0])
# convert back to strings
df.columns = [df.columns.get_level_values(0),
              df.columns.get_level_values(1).strftime('%B')]
df:
     TMAX YEARS_TMAX     TMAX YEARS_TMAX  TMAX YEARS_TMAX
  January    January February   February March      March
0       5          2        6          3     4          1
1      11          8       12          9    10          7
The drawback of this approach is that level 1 is once again a plain string type, so it has to be converted again whenever the columns need re-sorting, because lexicographic order does not match calendar order.
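If level 1 should stay as plain strings throughout, one more option is to compute the column order explicitly and reindex. This is only a sketch using the same sample frame as above; month_order, ordered_cols and df_sorted are illustrative names, and month names are assumed to be in the default English locale.

```python
from calendar import month_name

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12]],
    columns=pd.MultiIndex.from_product([
        ['YEARS_TMAX', 'TMAX'],
        ['March', 'January', 'February']
    ])
)

# Map 'January' -> 1, ..., 'December' -> 12 (month_name[0] is an empty string).
month_order = {name: i for i, name in enumerate(month_name) if name}

# Sort the column tuples by calendar month first, then by the level-0 label,
# which places TMAX before YEARS_TMAX within each month.
ordered_cols = sorted(df.columns, key=lambda col: (month_order[col[1]], col[0]))

# Reindexing leaves the labels as ordinary strings.
df_sorted = df.reindex(columns=pd.MultiIndex.from_tuples(ordered_cols))
print(df_sorted)
```

The trade-off is that the ordering lives in the helper dict rather than in the index dtype, so any later sort_index call would fall back to alphabetical order.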

pandas dataframe interpolate for Nans with groupby using window of discrete days of the year

The small reproducible example below sets up a dataframe that is 100 years long and contains randomly generated values. It then inserts three 100-day stretches of missing values. Using this example, I am trying to work out the pandas commands that fill in the missing days with the average value for that day of the year (hence the use of .groupby), subject to a condition. For example, if April 12th is missing, how can the last line of code be altered so that only the 10 nearest April 12ths are used to fill in the missing value? In other words, a missing April 12th value in 1920 would be filled in using the mean of the April 12th values between 1915 and 1925; a missing April 12th value in 2000 would be filled in with the mean of the April 12th values between 1995 and 2005, and so on. I tried adding a .rolling() to the lambda function in the last line of the script, but was unsuccessful.
Bonus question: The example below extends from 1918 to 2018. If a value is missing on April 12th 1919, for example, it would still be nice if ten April 12ths were used to fill in the missing value even though the window couldn't be 'centered' on the missing day because of its proximity to the beginning of the time series. Is there a solution to the first question above that would be flexible enough to still use a minimum of 10 values when missing values are close to the beginning and ending of the time series?
import pandas as pd
import numpy as np
import random
# create 100 yr time series
dates = pd.date_range(start="1918-01-01", end="2018-12-31").strftime("%Y-%m-%d")
vals = [random.randrange(1, 50, 1) for i in range(len(dates))]
# Create some arbitrary gaps
vals[100:200] = vals[9962:10062] = vals[35895:35995] = [np.nan] * 100
# Create dataframe
df = pd.DataFrame(dict(
    list(
        zip(["Date", "vals"],
            [dates, vals])
    )
))
# confirm missing vals
df.iloc[95:105]
df.iloc[35890:35900]
# set a date index (for use by groupby)
df.index = pd.DatetimeIndex(df['Date'])
df['Date'] = df.index
# Need help restricting the mean to the 10 nearest same-days-of-the-year:
df['vals'] = df.groupby([df.index.month, df.index.day])['vals'].transform(lambda x: x.fillna(x.mean()))

This answers both parts. The steps are:
1. Build a DataFrame dfr that holds the calculation you want; the lambda function returns a dict {year: val, ...}.
2. Make sure the index levels are named in a reasonable way.
3. Expand the dict into columns with apply(pd.Series).
4. Reshape by moving the year columns back into the index.
5. merge() the built DataFrame with the original one; the vals column contains the NaNs and column 0 holds the value to fill with.
6. Finally, fillna().
import random
import numpy as np
import pandas as pd
# create 100 yr time series
dates = pd.date_range(start="1918-01-01", end="2018-12-31")
vals = [random.randrange(1, 50, 1) for i in range(len(dates))]
# Create some arbitrary gaps
vals[100:200] = vals[9962:10062] = vals[35895:35995] = [np.nan] * 100
# Create dataframe - simplified from question...
df = pd.DataFrame({"Date":dates,"vals":vals})
df[df.isna().any(axis=1)]
ystart = df.Date.dt.year.min()
# generate rolling means for month/day. bfill for when it's start of series
dfr = (df.groupby([df.Date.dt.month, df.Date.dt.day])["vals"]
       .agg(lambda s: {y+ystart:v for y,v in enumerate(s.dropna().rolling(5).mean().bfill())})
       .to_frame().rename_axis(["month","day"])
)
# expand dict into columns and reshape to by indexed by month,day,year
dfr = dfr.join(dfr.vals.apply(pd.Series)).drop(columns="vals").rename_axis("year",axis=1).stack().to_frame()
# get df index back, plus vals & fillna (column 0) can be seen alongside each other
dfm = df.merge(dfr, left_on=[df.Date.dt.month,df.Date.dt.day,df.Date.dt.year], right_index=True)
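# finally what we really want to do - fill the NaNs in vals from column 0 of dfm
# (a Series passed to DataFrame.fillna is matched against column names, so fill the vals column directly)
df["vals"] = df["vals"].fillna(dfm[0])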
analysis
taking NaN for 11-Apr-1918, default is 22 as it's backfilled from 1921
(12+2+47+47+2)/5 == 22
dfm.query("key_0==4 & key_1==11").head(7)
      key_0  key_1  key_2                 Date  vals     0
100       4     11   1918  1918-04-11 00:00:00   nan    22
465       4     11   1919  1919-04-11 00:00:00    12    22
831       4     11   1920  1920-04-11 00:00:00     2    22
1196      4     11   1921  1921-04-11 00:00:00    47    27
1561      4     11   1922  1922-04-11 00:00:00    47    36
1926      4     11   1923  1923-04-11 00:00:00     2  34.6
2292      4     11   1924  1924-04-11 00:00:00    37  29.4

I'm not sure I have fully captured the intent of your question. The approach I've taken is meant to satisfy two requirements:
Need an arbitrary number of averages
Use those averages to fill in the NA
Simply put, instead of filling in the NA from the dates before and after it, I fill in the NA with an average over any number of years taken from the same row (the same mm-dd) of the pivoted table.
import pandas as pd
import numpy as np
import random
# create 100 yr time series
dates = pd.date_range(start="1918-01-01", end="2018-12-31").strftime("%Y-%m-%d")
vals = [random.randrange(1, 50, 1) for i in range(len(dates))]
# Create some arbitrary gaps
vals[100:200] = vals[9962:10062] = vals[35895:35995] = [np.nan] * 100
# Create dataframe
df = pd.DataFrame(dict(
    list(
        zip(["Date", "vals"],
            [dates, vals])
    )
))
df['Date'] = pd.to_datetime(df['Date'])
df['mm-dd'] = df['Date'].apply(lambda x:'{:02}-{:02}'.format(x.month, x.day))
df['yyyy'] = df['Date'].apply(lambda x:'{:04}'.format(x.year))
df = df.iloc[:,1:].pivot(index='mm-dd', columns='yyyy')
df.columns = df.columns.droplevel(0)
df['nans'] = df.isnull().sum(axis=1)
df['10n_mean'] = df.iloc[:,:-1].sample(n=10, axis=1).mean(axis=1)
df['10n_mean'] = df['10n_mean'].round(1)
df.loc[df['nans'] >= 1]
yyyy 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 ... 2011 2012 2013 2014 2015 2016 2017 2018 nans 10n_mean
mm-dd
02-29 NaN NaN 34.0 NaN NaN NaN 2.0 NaN NaN NaN ... NaN 49.0 NaN NaN NaN 32.0 NaN NaN 76 21.6
04-11 NaN 43.0 12.0 28.0 29.0 28.0 1.0 38.0 11.0 3.0 ... 17.0 35.0 8.0 17.0 34.0 NaN 5.0 33.0 3 29.7
04-12 NaN 19.0 38.0 34.0 48.0 46.0 28.0 29.0 29.0 14.0 ... 41.0 16.0 9.0 39.0 8.0 NaN 1.0 12.0 3 21.3
04-13 NaN 33.0 26.0 47.0 21.0 26.0 20.0 16.0 11.0 7.0 ... 5.0 11.0 34.0 28.0 27.0 NaN 2.0 46.0 3 21.3
04-14 NaN 36.0 19.0 6.0 45.0 41.0 24.0 39.0 1.0 11.0 ... 30.0 47.0 45.0 14.0 48.0 NaN 16.0 8.0 3 24.7
df_mean = df.T.fillna(df['10n_mean'], downcast='infer').T
df_mean.loc[df_mean['nans'] >= 1]
yyyy 1918 1919 1920 1921 1922 1923 1924 1925 1926 1927 ... 2011 2012 2013 2014 2015 2016 2017 2018 nans 10n_mean
mm-dd
02-29 21.6 21.6 34.0 21.6 21.6 21.6 2.0 21.6 21.6 21.6 ... 21.6 49.0 21.6 21.6 21.6 32.0 21.6 21.6 76.0 21.6
04-11 29.7 43.0 12.0 28.0 29.0 28.0 1.0 38.0 11.0 3.0 ... 17.0 35.0 8.0 17.0 34.0 29.7 5.0 33.0 3.0 29.7
04-12 21.3 19.0 38.0 34.0 48.0 46.0 28.0 29.0 29.0 14.0 ... 41.0 16.0 9.0 39.0 8.0 21.3 1.0 12.0 3.0 21.3
04-13 21.3 33.0 26.0 47.0 21.0 26.0 20.0 16.0 11.0 7.0 ... 5.0 11.0 34.0 28.0 27.0 21.3 2.0 46.0 3.0 21.3
04-14 24.7 36.0 19.0 6.0 45.0 41.0 24.0 39.0 1.0 11.0 ... 30.0 47.0 45.0 14.0 48.0 24.7 16.0 8.0 3.0 24.7
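Neither answer restricts the mean to the ten nearest same-calendar-day values that the question asks for. The following is only a sketch of one way to do that, not the method of either answer: for every missing value it takes the k non-missing values of the same month and day whose years are closest, which also covers the bonus case near the ends of the series. The helper fill_with_nearest, the parameter k and the small 30-year frame are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fill_with_nearest(s, k=10):
    """Fill NaNs in s (one calendar day across many years) with the mean
    of the k non-missing values whose years are closest."""
    years = s.index.year.to_numpy()
    known = s.notna().to_numpy()
    known_years = years[known]
    known_vals = s.to_numpy()[known]
    out = s.copy()
    for pos in np.flatnonzero(~known):
        # Rank the known years by distance from the missing year.
        nearest = np.argsort(np.abs(known_years - years[pos]), kind="stable")[:k]
        out.iloc[pos] = known_vals[nearest].mean()
    return out


# Illustrative data: the value on each day is simply its year, so the
# expected fill values are easy to check by hand.
idx = pd.date_range("2000-01-01", "2029-12-31")
df = pd.DataFrame({"vals": idx.year.to_numpy(dtype=float)}, index=idx)
df.loc[pd.Timestamp("2015-04-12"), "vals"] = np.nan  # gap in the middle
df.loc[pd.Timestamp("2000-04-12"), "vals"] = np.nan  # gap at the very start

df["vals"] = (df.groupby([df.index.month, df.index.day])["vals"]
                .transform(fill_with_nearest))

print(df.loc[pd.Timestamp("2015-04-12"), "vals"])  # mean of 2010-2014 and 2016-2020
print(df.loc[pd.Timestamp("2000-04-12"), "vals"])  # mean of 2001-2010
```

The window is not forced to be symmetric: near the start or end of the series it simply reaches further in the direction where data exists. The Python loop runs once per missing value, which is fine for a few hundred gaps but would need vectorising for heavily gapped data.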

Reorganizing pandas dataframe turning Column into new Header, Original Header to be part of multiindex with a pre-existing Column

I have been tasked with reorganizing a fairly large data set for analysis. I want to make a dataframe where each employee has a list of Stats associated with their Employee Number ordered based on how many periods they have been with the company. The data does not go all the way back to the start of the company so some employees will not appear in the first period. My guess is there's some combination of pivot and merge that I am unable to wrap my head around.
df1 looks like this:
Periods since Start Period Employee Number Wage Sick Days
0 3 202001 101 20 14
1 2 202001 102 15 12
2 1 202001 103 10 17
3 4 202002 101 20 14
4 3 202002 102 20 10
5 2 202002 103 10 13
6 5 202003 101 25 13
7 4 202003 102 20 9
8 3 202003 103 10 13
And I want df2 (Column# for reference only):
Column1 Column2 Column3 Column4 Column5
101 102 103
1 Wage NaN NaN 10
1 Sick Days NaN NaN 17
2 Wage NaN 15 10
2 Sick Days NaN 12 13
3 Wage 20 20 10
3 Sick Days 14 10 13
4 Wage 20 20 NaN
4 Sick Days 14 9 NaN
Column1 = 'Periods since Start'
Column2 = "Stat" e.g. 'Wage', 'Sick Days'
Column3 - Column 5 Headers = 'Employee Number'
First thoughts were to try pivot/merge/stack but I have had no good results.
The second option I thought of was to create a dataframe with the index and headers that I wanted and then populate it from df1
import pandas as pd
import numpy as np
stat_list = ['Wage', 'Sick Days']
largest_period = df1['Periods since Start'].max()
df2 = np.tile(stat_list, largest_period)
df2 = pd.DataFrame(data=df2, columns = ['Stat'])
df2['Period_Number'] = df2.groupby('Stat').cumcount()+1
df2 = pd.DataFrame(index = df2[['Period_Number', 'Stat']],
                   columns = df1['Employee Number'])
Which Yields:
Employee Number 101 102 103
(1, 'Wage') NaN NaN NaN
(1, 'Sick Days') NaN NaN NaN
(2, 'Wage') NaN NaN NaN
(2, 'Sick Days') NaN NaN NaN
(3, 'Wage') NaN NaN NaN
(3, 'Sick Days') NaN NaN NaN
(4, 'Wage') NaN NaN NaN
(4, 'Sick Days') NaN NaN NaN
But I am at a loss on how to populate it.

You can .melt and then .unstack the dataframe.
Finish with some MultiIndex column cleanup: use .droplevel with axis=1 to drop the unnecessary level from the columns rather than the default axis=0, which would drop index levels. You can also use reset_index() to bring the index levels into your dataframe as columns:
df = (df.melt(id_vars=['Periods since Start', 'Employee Number'],
              value_vars=['Wage', 'Sick Days'])
        .set_index(['Periods since Start', 'Employee Number', 'variable']).unstack(1)
        .droplevel(0, axis=1)
        .reset_index())
df
Out[1]:
Employee Number Periods since Start variable 101 102 103
0 1 Sick Days NaN NaN 17.0
1 1 Wage NaN NaN 10.0
2 2 Sick Days NaN 12.0 13.0
3 2 Wage NaN 15.0 10.0
4 3 Sick Days 14.0 10.0 13.0
5 3 Wage 20.0 20.0 10.0
6 4 Sick Days 14.0 9.0 NaN
7 4 Wage 20.0 20.0 NaN
8 5 Sick Days 13.0 NaN NaN
9 5 Wage 25.0 NaN NaN
When melting the dataframe, you can pass var_name= as the default is "variable". If you do that make sure to change the column name when using set_index() as well.

Try this: first melt the dataframe, keeping Periods since Start, Employee Number, and Period as identifier columns. Next, pivot the melted dataframe so that Periods since Start and variable form the rows, Employee Number forms the columns, and the 'value' column from melt supplies the values. Lastly, clean up the index with reset_index and remove the column index name using rename_axis:
df.melt(['Periods since Start', 'Employee Number', 'Period'])\
  .pivot(index=['Periods since Start', 'variable'], columns='Employee Number', values='value')\
  .reset_index()\
  .rename_axis(None, axis=1)
Output:
Periods since Start variable 101 102 103
0 1 Sick Days NaN NaN 17.0
1 1 Wage NaN NaN 10.0
2 2 Sick Days NaN 12.0 13.0
3 2 Wage NaN 15.0 10.0
4 3 Sick Days 14.0 10.0 13.0
5 3 Wage 20.0 20.0 10.0
6 4 Sick Days 14.0 9.0 NaN
7 4 Wage 20.0 20.0 NaN
8 5 Sick Days 13.0 NaN NaN
9 5 Wage 25.0 NaN NaN
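For reference, a self-contained sketch of the melt-then-pivot idea that rebuilds df1 from the question and keeps the (period, stat) pair as a row MultiIndex, as the question's df2 layout suggests. The var_name 'Stat' and the result name df2 are choices made here for illustration.

```python
import pandas as pd

# df1 as shown in the question.
df1 = pd.DataFrame({
    'Periods since Start': [3, 2, 1, 4, 3, 2, 5, 4, 3],
    'Period': [202001] * 3 + [202002] * 3 + [202003] * 3,
    'Employee Number': [101, 102, 103] * 3,
    'Wage': [20, 15, 10, 20, 20, 10, 25, 20, 10],
    'Sick Days': [14, 12, 17, 14, 10, 13, 13, 9, 13],
})

df2 = (
    # Long format: one row per (period, employee, stat).
    df1.melt(id_vars=['Periods since Start', 'Employee Number'],
             value_vars=['Wage', 'Sick Days'],
             var_name='Stat')
    # Wide again: employees become columns, (period, stat) the row index.
    .pivot(index=['Periods since Start', 'Stat'],
           columns='Employee Number',
           values='value')
)
print(df2)
```

pivot raises an error if an (index, column) pair occurs twice, so this relies on each employee having one row per value of Periods since Start; pivot_table with an explicit aggfunc would be the fallback otherwise.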

Pivot Tables on python

I have a csv with a table of data. I would like to read the csv and write a new xlsx file based on the initial csv.
However, I would also like to add a new column whose value is derived from the header name (e.g. whether the header contains the word online) and build a kind of pivot table from it. So instead of three separate columns for leads, I would have a single leads column with three rows per date, one for each source.
date online_won retail_won outbound_won online_leads retail_leads outbound_leads
1/1/11 9 10 11 12 14
2/1/11 1 2 13 15
3/1/11 10 8 14 17
This is the desired output
date source won leads
1/1/11 online 9 12
1/1/11 retail 10 14
1/1/11 outbound 11
.....
I would assume that I can solve this using pd.pivot_table. But can't figure out how to return columns as won and leads and extract only the online/retail/outbound part from the existing columns.

You could use pd.wide_to_long, with a little extra work on the columns, since wide format variables are assumed to start with the stub names:
df.columns = ['_'.join(j for j in i[::-1]) for i in df.columns.str.split('_')]
(pd.wide_to_long(df, stubnames=['won','leads'], i='date', j='source', suffix=r'_\w+')
   .reset_index())
date source won leads
0 1/1/11 _online 9 12.0
1 2/1/11 _online 1 15.0
2 3/1/11 _online 10 17.0
3 1/1/11 _retail 10 14.0
4 2/1/11 _retail 2 NaN
5 3/1/11 _retail 8 NaN
6 1/1/11 _outbound 11 NaN
7 2/1/11 _outbound 13 NaN
8 3/1/11 _outbound 14 NaN

With a column rename, you can use wide_to_long
df.columns = ['_'.join(x.split('_')[::-1]) for x in df.columns ]
pd.wide_to_long(df, ['won','leads'], 'date', 'source', sep='_', suffix=r'\w+')
Output:
won leads
date source
1/1/11 online 9 12.0
2/1/11 online 1 15.0
3/1/11 online 10 17.0
1/1/11 retail 10 14.0
2/1/11 retail 2 NaN
3/1/11 retail 8 NaN
1/1/11 outbound 11 NaN
2/1/11 outbound 13 NaN
3/1/11 outbound 14 NaN

Another way, using melt and Series.str.split() with unstack():
m=df.melt('date').sort_values('date')
m[['Source','Status']]=m.pop('variable').str.split('_',expand=True)
final=(m.set_index(['date','Source','Status']).unstack()
        .droplevel(0,axis=1).reset_index().rename_axis(None,axis=1))
date Source leads won
0 1/1/11 online 12.0 9.0
1 1/1/11 outbound NaN 11.0
2 1/1/11 retail 14.0 10.0
3 2/1/11 online 15.0 1.0
4 2/1/11 outbound NaN 13.0
5 2/1/11 retail NaN 2.0
6 3/1/11 online 17.0 10.0
7 3/1/11 outbound NaN 14.0
8 3/1/11 retail NaN 8.0

Use DataFrame.set_index together with str.split to build a MultiIndex in the columns, which makes it possible to call DataFrame.stack on the first level (level 0):
df = df.set_index('date')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack(0).rename_axis(('date','Source')).reset_index()
print (df)
date Source leads won
0 1/1/11 online 12.0 9
1 1/1/11 outbound NaN 11
2 1/1/11 retail 14.0 10
3 2/1/11 online 15.0 1
4 2/1/11 outbound NaN 13
5 2/1/11 retail NaN 2
6 3/1/11 online 17.0 10
7 3/1/11 outbound NaN 14
8 3/1/11 retail NaN 8
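The wide_to_long answers above can be run end to end with the sketch below. The input frame is reconstructed from the outputs shown in the answers, so where the question's flattened table leaves blank cells their placement is an assumption; long_df is an illustrative name.

```python
import numpy as np
import pandas as pd

# Reconstructed from the answers' outputs; the NaN placement is an assumption.
df = pd.DataFrame({
    'date': ['1/1/11', '2/1/11', '3/1/11'],
    'online_won': [9, 1, 10],
    'retail_won': [10, 2, 8],
    'outbound_won': [11, 13, 14],
    'online_leads': [12, 15, 17],
    'retail_leads': [14, np.nan, np.nan],
    'outbound_leads': [np.nan, np.nan, np.nan],
})

# wide_to_long expects '<stub><sep><suffix>', so flip 'online_won' to
# 'won_online'; 'date' has no underscore and is left unchanged.
df.columns = ['_'.join(reversed(c.split('_'))) for c in df.columns]

long_df = (
    pd.wide_to_long(df, stubnames=['won', 'leads'], i='date', j='source',
                    sep='_', suffix=r'\w+')
    .reset_index()
)
print(long_df)
```

To produce the xlsx file mentioned in the question, long_df.to_excel('output.xlsx', index=False) should work when an Excel writer engine such as openpyxl is installed; the file name is a placeholder.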
