I have a time series DataFrame df1 with prices in a ticker column. A new DataFrame df2 is created by concatenating df1 with three other columns that share the same DateTimeIndex.
Now I need to make the ticker name "Equity(42950 [FB])" the new top-level header, nest the three other columns under it, and replace the ticker's prices with the values in the "closePrice" column.
How can I achieve this in Python?
You can use pd.MultiIndex:
import numpy as np
import pandas as pd

d = pd.DataFrame(np.arange(20).reshape(5, 4),
                 columns=['Equity', 'closePrice', 'mMb', 'mMv'])
# Two-level columns: 'Equity' on top, the three value columns beneath it
arrays = [['Equity', 'Equity', 'Equity'], ['closePrice', 'mMb', 'mMv']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples)
# Drop the original price column and attach the nested header
df = pd.DataFrame(d.values[:, 1:], columns=index)
df
Equity
closePrice mMb mMv
0 1 2 3
1 5 6 7
2 9 10 11
3 13 14 15
4 17 18 19
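To put the actual ticker name from the question on the top level, pd.MultiIndex.from_product is a bit more direct. A minimal sketch, assuming df2 holds the three value columns under the names shown in the question:

import pandas as pd

ticker = 'Equity(42950 [FB])'
cols = pd.MultiIndex.from_product([[ticker], ['closePrice', 'mMb', 'mMv']])

# Keep only the three value columns and nest them under the ticker name;
# the original ticker prices are dropped in favour of 'closePrice'
df = pd.DataFrame(df2[['closePrice', 'mMb', 'mMv']].to_numpy(),
                  columns=cols, index=df2.index)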
I'd like to use a lookup table (a list of strings) to search a DataFrame's name column and, for each match, return a time series of the row values from the other columns. The lookup strings will not exactly match the strings in the name column, but the name column contains similar strings that start with them (startswith?).
lookup=["AB","AC","AX"]

DF:
name  year  mean  upper  lower
AB2   2020  4     7      1
AB_7  2021  5     9      2
AC1   2022  3     9      2
AX    2019  4     9      2
Expected output:
AB.csv
2020 4 7 1
2021 5 9 2
AC.csv
2022 3 9 2
AX.csv
2019 4 9 2
You can use pandas' filter with the regex parameter set to rf'^{selector}'. On each iteration of the loop, this regex keeps only the rows (axis=0) whose index value (after set_index('name')) starts with the selector.
import pandas as pd

df = pd.read_csv('sample.csv')
print(df)

lookup = ["AB", "AC", "AX"]
df = df.set_index('name')

for selector in lookup:
    # Keep only rows whose 'name' starts with the selector
    filtered_df = df.filter(regex=rf'^{selector}', axis=0)
    filtered_df.to_csv(f'{selector}.csv', index=False)
AB.csv
year,mean,upper,lower
2020,4,7,1
2021,5,9,2
AC.csv
year,mean,upper,lower
2022,3,9,2
AX.csv
year,mean,upper,lower
2019,4,9,2
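One caveat: filter's regex parameter treats each selector as a regular expression, so a lookup string containing regex metacharacters would match unexpectedly. If that can happen, escaping the selector first is safer (a sketch using re.escape):

import re

for selector in lookup:
    # re.escape neutralises any regex metacharacters in the selector
    filtered_df = df.filter(regex=rf'^{re.escape(selector)}', axis=0)
    filtered_df.to_csv(f'{selector}.csv', index=False)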
I have a pandas pivot table that lists individuals in rows and data sources across the columns, with hundreds of individuals down the rows and hundreds of sources across the columns.
Desired_Value Source_1 Source_2 Source_3 ... Source_50
person1 20 20 20 20
person2 5 5 5 5
person3 Review 3 4 4 4
...
person50 1 1 1
What I want to do is create the Desired_Value column shown above: pull in a value as long as it is the same across all sources (ignoring blank fields), and show Review when the values do not match.
Currently I build the df (without any Desired_Value column) that I export to Excel with this pandas command:
df13 = df12.pivot_table(index='person', columns = 'source_name', values = 'actual_data', aggfunc='first')
I'm new to Python so apologies if this is a silly question.
This is one method to do it:
import numpy as np
import pandas as pd

df = df13.copy()
df = df.astype('Int64')  # So NaN and int values can coexist

# Create the new column and move it to the front of the DataFrame
df['Desired_Value'] = None  # object dtype, so ints and 'Review' can coexist
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]
df = df[cols]

# Loop over all rows and flag mismatching rows for review
for idx, row in df.iterrows():
    val = row.dropna().unique()
    if len(val) == 1:
        df.loc[idx, 'Desired_Value'] = val[0]
    else:
        df.loc[idx, 'Desired_Value'] = 'Review'
print(df)
Desired_Value Source_1 Source_2 Source_3 Source_50
person1 20 20 20 NaN 20
person2 5 5 NaN 5 5
person3 Review 3 4 4 4
person50 1 1 NaN NaN 1
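With hundreds of rows the loop above is fine, but for larger frames a vectorized version avoids iterrows entirely. This is a sketch using nunique and bfill, not the original method above:

import pandas as pd

df = df13.copy().astype('Int64')

# Rows where all non-NaN source values agree have exactly one unique value
agrees = df.nunique(axis=1, dropna=True) == 1
# Backfilling along the row brings its first non-NaN value into column 0
first = df.bfill(axis=1).iloc[:, 0]

df.insert(0, 'Desired_Value', first.astype(object).where(agrees, 'Review'))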
I have a huge data set in a pandas data frame. It looks something like this
df = pd.DataFrame([[1,2,3,4],[31,14,13,11],[115,613,1313,1]], columns=['c1','c1','c2','c2'])
Here the first two columns have the same name, so they should be concatenated into a single column with the values one below another. The DataFrame should then look something like this:
df1 = pd.DataFrame([[1,3],[31,13],[115,1313],[2,4],[14,11],[613,1]], columns=['c1','c2'])
Note: my original DataFrame has many columns, so I cannot use a simple concat call to stack them. I have also tried the stack function in addition to concat. What can I do?
Use groupby + cumcount to create a pd.MultiIndex. Reassign the columns with the new pd.MultiIndex, then stack:
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [31, 14, 13, 11], [115, 613, 1313, 1]],
    columns=['c1', 'c1', 'c2', 'c2'])

df1 = df.copy()
# Second column level: the occurrence count of each duplicated name
df1.columns = [df.columns, df.columns.to_series().groupby(level=0).cumcount()]
print(df1.stack().reset_index(drop=True))
c1 c2
0 1 3
1 2 4
2 31 13
3 14 11
4 115 1313
5 613 1
Or, with a bit of creativity, in one line:
df.T.set_index(
df.T.groupby([df.columns]).cumcount(),
append=True
).unstack().T.reset_index(drop=True)
c1 c2
0 1 3
1 2 4
2 31 13
3 14 11
4 115 1313
5 613 1
You could melt the DataFrame, then count the entries within each column to use as an index for the new DataFrame, and then unstack it back, like this:
import pandas as pd
df = pd.DataFrame(
[[1,2,3,4],[31,14,13,11],[115,613,1313,1]],
columns=['c1','c1','c2','c2'])
df1 = (pd.melt(df,var_name='column')
.assign(n = lambda x: x.groupby('column').cumcount())
.set_index(['n','column'])
.unstack())
df1.columns=df1.columns.get_level_values(1)
print(df1)
Which produces
column c1 c2
n
0 1 3
1 31 13
2 115 1313
3 2 4
4 14 11
5 613 1
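If every duplicated name appears the same number of times, a shorter alternative (a sketch, not taken from the answers above) flattens each group of same-named columns in column order:

import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [31, 14, 13, 11], [115, 613, 1313, 1]],
    columns=['c1', 'c1', 'c2', 'c2'])

# df[name] selects every column sharing that name; ravel(order='F')
# stacks their values one below another, column by column
df1 = pd.concat(
    [pd.Series(df[name].to_numpy().ravel(order='F'), name=name)
     for name in df.columns.unique()],
    axis=1)
print(df1)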
My dataframe looks as below:
id, date, target
1,2016-10-24,22
1,2016-10-25,31
1,2016-10-27,44
1,2016-10-28,12
2,2016-10-21,22
2,2016-10-22,31
2,2016-10-25,44
2,2016-10-27,12
Given the DataFrame above, I want to select the last 2 rows of each id to make df2, and put the rest into df1.
df1
id, date, target
1,2016-10-24,22
1,2016-10-25,31
2,2016-10-21,22
2,2016-10-22,31
df2
id, date, target
1,2016-10-27,44
1,2016-10-28,12
2,2016-10-25,44
2,2016-10-27,12
How can I do this?
Thanks in advance.
You can use GroupBy.tail to create df2, then take the difference between the original index and df2's index and select those rows from df with loc; this is df1:
df2 = df.groupby('id').tail(2)
print (df2)
id date target
2 1 2016-10-27 44
3 1 2016-10-28 12
6 2 2016-10-25 44
7 2 2016-10-27 12
print (df.index.difference(df2.index))
Int64Index([0, 1, 4, 5], dtype='int64')
df1 = df.loc[df.index.difference(df2.index)]
print (df1)
id date target
0 1 2016-10-24 22
1 1 2016-10-25 31
4 2 2016-10-21 22
5 2 2016-10-22 31
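An alternative that avoids the index arithmetic is a boolean mask built with cumcount counted from the end of each group (a sketch):

# Positions 0 and 1, counted from the end of each group, are the last two rows
mask = df.groupby('id').cumcount(ascending=False) < 2
df2 = df[mask]
df1 = df[~mask]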
You can use df.groupby('id').tail(2): http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.tail.html
I have the following DataFrame:
index PUBLICO CLASSIFICACAO_PUBLICO
0 19 143643 1
1 34 111879 2
2 31 50382 3
3 9 49204 4
4 32 37541 5
5 4 36095 6
I need to convert the column named index into the index of the DataFrame.
For example:
index PUBLICO CLASSIFICACAO_PUBLICO
19 143643 1
34 111879 2
31 50382 3
9 49204 4
32 37541 5
4 36095 6
I tried df.set_index('index'), but it didn't work.
The column named index was previously the index of the DataFrame, but I used reset_index(); now I need to do the reverse.
The method set_index doesn't operate in place by default, so you have to reassign your DataFrame or pass inplace=True:
df = df.set_index('index')
or
df.set_index('index', inplace=True)
see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html
You can try it this way:
df.set_index(df['index'], inplace=True)
This sets the index column as the index of your DataFrame while the original column also remains in the DataFrame. Then you can just drop that column:
df.drop('index', axis=1, inplace=True)
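For completeness, the first answer's set_index('index') already does both steps at once, so the round trip with reset_index is simply (a minimal sketch):

df = df.reset_index()       # the old index becomes a column named 'index'
df = df.set_index('index')  # move it back to being the index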