convert pandas series to a dataframe [duplicate]

I have a Series, like this:
series = pd.Series({'a': 1, 'b': 2, 'c': 3})
I want to convert it to a dataframe like this:
a b c
0 1 2 3
pd.Series.to_frame() doesn't work; it gives a result like:
0
a 1
b 2
c 3
How can I construct a DataFrame from a Series, with the index of the Series as the columns?

You can also try this:
df = pd.DataFrame(series).transpose()
Using the transpose() function, you can interchange the index and the columns.
The output looks like this:
a b c
0 1 2 3

You don't need the transposition step; just wrap your Series in a list and pass it to the DataFrame constructor:
pd.DataFrame([series])
a b c
0 1 2 3
Alternatively, call Series.to_frame, then transpose using the shortcut .T:
series.to_frame().T
a b c
0 1 2 3

You can also try this:
a = series.to_frame()
a['id'] = list(a.index)
Explanation:
The first line converts the Series into a single-column DataFrame.
The second line adds a column to this DataFrame whose values are the same as the index.
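For the Series above, a quick check of what this gives (the default column name is 0):
a = series.to_frame()
a['id'] = list(a.index)
print(a)
   0 id
a  1  a
b  2  b
c  3  c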

Try reset_index. It will convert your index into a column in your dataframe.
df = series.to_frame().reset_index()
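A quick check of what this produces for the example Series (the column names index and 0 are the pandas defaults):
import pandas as pd

series = pd.Series({'a': 1, 'b': 2, 'c': 3})
df = series.to_frame().reset_index()
print(df)
  index  0
0     a  1
1     b  2
2     c  3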

This
pd.DataFrame([series]) #method 1
produces a slightly different result than
series.to_frame().T #method 2
With method 1, the elements in the resulting dataframe retain their original types, e.g. an int64 in the series is kept as an int64.
With method 2, the elements in the resulting dataframe become objects if there is an object-type element anywhere in the series, e.g. an int64 in the series becomes an object.
This difference may cause different behaviors in your subsequent operations depending on the version of pandas.
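A minimal sketch of that difference, using a Series that mixes an int, a float, and a string (exact behavior may depend on your pandas version):
import pandas as pd

s = pd.Series({'a': 1, 'b': 2.5, 'c': 'x'})  # object dtype, since the types are mixed

print(pd.DataFrame([s]).dtypes)   # method 1: per-column dtypes are re-inferred (int64, float64, object)
print(s.to_frame().T.dtypes)      # method 2: every column stays object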

What's the fastest way to select values from columns based on keys in another columns in pandas?

I need a fast way to extract the right values from a pandas dataframe:
Given a dataframe with (a lot of) data in several named columns, plus an additional column whose values only contain names of the other columns, how do I select values from the data columns using the additional column as keys?
It's simple to do via an explicit loop, but this is extremely slow with something like .iterrows() directly on the DataFrame. Converting to numpy arrays is faster, but still not fast. Can I combine methods from pandas to do it even faster?
Example: This is the kind of DataFrame structure, where columns A and B contain data and column keys contains the keys to select from:
import pandas
df = pandas.DataFrame(
    {'A': [1, 2, 3, 4],
     'B': [5, 6, 7, 8],
     'keys': ['A', 'B', 'B', 'A']},
)
print(df)
output:
Out[1]:
A B keys
0 1 5 A
1 2 6 B
2 3 7 B
3 4 8 A
Now I need some fast code that returns a DataFrame like
Out[2]:
val_keys
0 1
1 6
2 7
3 4
I was thinking something along the lines of this:
tmp = df.melt(id_vars=['keys'], value_vars=['A','B'])
out = tmp.loc[tmp['keys'] == tmp['variable']]
which produces:
Out[2]:
keys variable value
0 A A 1
3 A A 4
5 B B 6
6 B B 7
but doesn't have the right order or index. So it's not quite a solution.
Any suggestions?
See if either of these works for you (both assume import numpy as np):
df['val_keys']= np.where(df['keys'] =='A', df['A'],df['B'])
or
df['val_keys']= np.select([df['keys'] =='A', df['keys'] =='B'], [df['A'],df['B']])
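If there are more than two data columns, here is a sketch that generalizes the np.select idea by building one condition per column (it assumes every entry in keys names an existing column):
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [5, 6, 7, 8],
                   'keys': ['A', 'B', 'B', 'A']})

cols = ['A', 'B']                              # the data columns
conditions = [df['keys'] == c for c in cols]   # one boolean mask per column
choices = [df[c] for c in cols]                # the matching values
df['val_keys'] = np.select(conditions, choices)
print(df['val_keys'].tolist())   # [1, 6, 7, 4]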
No need to hard-code the column names with the approach below:
def value(row):
    a = row.name        # the row's index label
    b = row['keys']     # the column named in 'keys'
    c = df.loc[a, b]    # look up that cell in the frame
    return c

df['val_keys'] = df.apply(value, axis=1)
Have you tried filtering, then mapping back by row index? (The maps have to be keyed on the index rather than on the key value itself, since a dict keyed on 'A'/'B' would collapse duplicate keys.)
df_A = df[df['keys'].isin(['A'])]
df_B = df[df['keys'].isin(['B'])]
A_map = df_A['A'].to_dict()  # row index -> value from column A
B_map = df_B['B'].to_dict()  # row index -> value from column B
df['val_keys'] = pd.Series({**A_map, **B_map})  # aligns on the row index
Your df['val_keys'] column will now contain the result, as in your val_keys output.
If you want, you can keep just that column, as in your expected output:
df = df[['val_keys']]
Hope this helps :))

Pandas - Python Remodel the Date Column

I have a date column in my pandas DataFrame. It looks like this:
ID SerialDate
1 2008-1-15
2 T1
3 2008-1-17
4 T1
T1 is the only text that will be found in this column, and there won't be any blanks. The dtype of this column is object.
I need to change this to look like,
Expected dataframe:
ID SerialDate
1 15/01/2008
2 T1
3 17/01/2008
4 T1
Expected dtype: object.
How can I do this using pandas? I would prefer a user-defined function, something like df[colb] = requiredfunction(df, colb).
You can generate the required output with basic string operations and pandas' apply function:
df['SerialDate'] = df['SerialDate'].apply(lambda x:'/'.join(x.split('-')[::-1]))
In your particular case, 'T1' is also not affected by this operation.
Explanation:
What does [::-1] do?
>>> [1, 2, 3][::-1]
[3, 2, 1]
It reverses a list.
To zero-pad single-digit day and month parts (e.g. 1 -> 01), use zfill:
df['SerialDate'] = df['SerialDate'].apply(lambda x:'/'.join([y.zfill(2) for y in x.split('-')[::-1]]))
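A quick check of the padded version on the example data (a sketch):
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'SerialDate': ['2008-1-15', 'T1', '2008-1-17', 'T1']})
df['SerialDate'] = df['SerialDate'].apply(
    lambda x: '/'.join(y.zfill(2) for y in x.split('-')[::-1]))
print(df['SerialDate'].tolist())
# ['15/01/2008', 'T1', '17/01/2008', 'T1']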
Using to_datetime with strftime, then masking back with the original column:
df=pd.read_clipboard()
s=pd.to_datetime(df.SerialDate,errors = 'coerce').dt.strftime('%d/%m/%Y')
df.SerialDate=s.mask(s=='NaT',df.SerialDate)
df
Out[402]:
ID SerialDate
0 1 15/01/2008
1 2 T1
2 3 17/01/2008
3 4 T1
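A self-contained version of the same idea, replacing read_clipboard with an explicit frame (the isna check is added because newer pandas returns NaN from strftime on NaT rather than the string 'NaT'):
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'SerialDate': ['2008-1-15', 'T1', '2008-1-17', 'T1']})

s = pd.to_datetime(df['SerialDate'], errors='coerce').dt.strftime('%d/%m/%Y')
df['SerialDate'] = s.mask(s.isna() | (s == 'NaT'), df['SerialDate'])
print(df['SerialDate'].tolist())
# ['15/01/2008', 'T1', '17/01/2008', 'T1']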

Create a column based on multiple column distinct count pandas [duplicate]

I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object and merging that, etc.
My df has track, type, and id columns. I want the number of unique ids for each track/type combination as a new column in the table (without collapsing track/type combos in the resulting df): same number of rows, one more column.
something like this isn't working:
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()
nor is
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)
This last one works with some aggregating functions but not others. The following works (but is meaningless on my dataset):
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)
in R this is easily done in data.table with
df[, n_unique_id := uniqueN(id), by = c('track', 'type')]
thanks!
df.groupby(['track', 'type'])['id'].transform(nunique)
implies that there is a name nunique in the namespace that performs some function. transform will take a function or a string that it knows a function for, and 'nunique' is definitely one of those strings.
As pointed out by @root, the methods pandas uses to perform the transformations indicated by these strings are often optimized and should generally be preferred to passing your own functions. This is true even for passing numpy functions in some cases.
For example transform('sum') should be preferred over transform(sum).
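A small sketch of that equivalence (the string form lets pandas dispatch to its optimized implementation):
import pandas as pd

df = pd.DataFrame({'g': ['x', 'x', 'y'], 'v': [1, 2, 3]})

print(df.groupby('g')['v'].transform('sum').tolist())  # [3, 3, 3] -- optimized path
print(df.groupby('g')['v'].transform(sum).tolist())    # [3, 3, 3] -- generic path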
Try this instead
df.groupby(['track', 'type'])['id'].transform('nunique')
demo
df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)
id track type
0 X 1 A
1 X 1 A
2 Y 1 A
3 Z 1 A
4 W 2 B
5 W 2 B
6 W 2 B
7 W 2 B
df.groupby(['track', 'type'])['id'].transform('nunique')
0 3
1 3
2 3
3 3
4 1
5 1
6 1
7 1
Name: id, dtype: int64

Key Error: 0 while searching in pandas dataframe [duplicate]

I have the dataframe shown below. I need to get the scalar value of column B, dependent on the value of A (which is a variable in my script). I'm trying .loc, but it returns a Series instead of a scalar value. How do I get the scalar value?
>>> x = pd.DataFrame({'A' : [0,1,2], 'B' : [4,5,6]})
>>> x
A B
0 0 4
1 1 5
2 2 6
>>> x.loc[x['A'] == 2]['B']
2 6
Name: B, dtype: int64
>>> type(x.loc[x['A'] == 2]['B'])
<class 'pandas.core.series.Series'>
First of all, you're better off passing both the row and column indexers to .loc:
x.loc[x['A'] == 2, 'B']
Second, you can always get at the underlying NumPy array using .values on a Series or DataFrame:
In : x.loc[x['A'] == 2, 'B'].values[0]
Out: 6
Finally, if you're not interested in the original question's "conditional indexing", there are also specific accessors designed to get a single scalar value from a DataFrame: dataframe.at[index, column] or dataframe.iat[i, j] (these are similar to .loc[] and .iloc[] but designed for quick access to a single value).
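A quick sketch of those scalar accessors on the example frame:
import pandas as pd

x = pd.DataFrame({'A': [0, 1, 2], 'B': [4, 5, 6]})

print(x.at[2, 'B'])   # 6 -- label-based: row label 2, column 'B'
print(x.iat[2, 1])    # 6 -- position-based: third row, second column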
Elaborating on @ShineZhang's comment:
x.set_index('A').at[2, 'B']
6
pd.__version__
u'0.22.0'

Forcing pandas .iloc to return a single-row dataframe?

For programming purpose, I want .iloc to consistently return a data frame, even when the resulting data frame has only one row. How to accomplish this?
Currently, .iloc returns a Series when the result only has one row. Example:
In [1]: df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
In [2]: df
Out[2]:
a b
0 1 3
1 2 4
In [3]: type(df.iloc[0, :])
Out[3]: pandas.core.series.Series
This behavior is poor for two reasons:
- Depending on the number of chosen rows, .iloc can return either a Series or a DataFrame, forcing me to manually check for this in my code
- .loc, on the other hand, always returns a DataFrame, making pandas inconsistent within itself (wrong info, as pointed out in the comments)
For the R user, this can be accomplished with drop = FALSE, or by using tidyverse's tibble, which always returns a data frame by default.
Use double brackets,
df.iloc[[0]]
Output:
a b
0 1 3
print(type(df.iloc[[0]]))
<class 'pandas.core.frame.DataFrame'>
This is short for df.iloc[[0], :].
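If you want this wrapped up for programmatic use, here is a small helper (the name get_rows is mine, not a pandas API) that always returns a DataFrame:
import pandas as pd

def get_rows(df, i):
    # Wrap a single integer position in a list so iloc
    # returns a one-row DataFrame instead of a Series.
    if isinstance(i, int):
        i = [i]
    return df.iloc[i]

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
print(type(get_rows(df, 0)))   # <class 'pandas.core.frame.DataFrame'>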
Accessing row(s) by label: loc
# Setup
df = pd.DataFrame({'X': [1, 2, 3], 'Y':[4, 5, 6]}, index=['a', 'b', 'c'])
df
X Y
a 1 4
b 2 5
c 3 6
To get a DataFrame instead of a Series, pass a list of indices of length 1,
df.loc[['a']]
# Same as
df.loc[['a'], :] # selects all columns
X Y
a 1 4
To select multiple specific rows, use
df.loc[['a', 'c']]
X Y
a 1 4
c 3 6
To select a contiguous range of rows, use
df.loc['b':'c']
X Y
b 2 5
c 3 6
Accessing row(s) by position: iloc
Specify a list of indices of length 1,
i = 1
df.iloc[[i]]
X Y
b 2 5
Or, specify a slice of length 1:
df.iloc[i:i+1]
X Y
b 2 5
To select multiple rows or a contiguous slice you'd use a similar syntax as with loc.
The double-bracket approach doesn't always work for me (e.g. when I use a conditional to select a timestamped row with loc).
You can, however, just add to_frame() to your operation.
>>> df = pd.DataFrame({'a':[1,2], 'b':[3,4]})
>>> df2 = df.iloc[0, :].to_frame().transpose()
>>> type(df2)
<class 'pandas.core.frame.DataFrame'>
Please use one of the options below:
df1 = df.iloc[[0], :]
# type(df1)
df1
or
df1 = df.iloc[0:1, :]
# type(df1)
df1
To extract a single row from a DataFrame as a DataFrame, use:
df_name.iloc[index, :].to_frame().transpose()
A multi-row slice also stays a DataFrame:
single_Sample1 = df.iloc[7:10]
single_Sample1