I need to display the values of a series in brackets ().
import pandas as pd
pd = pd.Series (['12.1','23.2','30.3', '40.0'])
print (pd)
0 12.1
1 23.2
2 30.3
3 40.0
dtype: object
The output should look like this:
0 (12.1)
1 (23.2)
2 (30.3)
3 (40.0)
Any suggestions?
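I don't know why you want to do this, but you can add the '(' and ')' manually. Note that assigning the Series to the name pd shadows the pandas module, so build it under another name such as s:
s = pd.Series(['12.1','23.2','30.3', '40.0'])
s = pd.Series([f"({i})" for i in s])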
I'm not sure if this is exactly what you want, but I hope it helps.
This works (the Series is named s here so that it does not shadow the pandas module):
import pandas as pd
s = pd.Series(['12.1','23.2','30.3', '40.0'])
print(s)
0 12.1
1 23.2
2 30.3
3 40.0
dtype: object
s = s.apply(lambda i: f"({i})")
print(s)
0 (12.1)
1 (23.2)
2 (30.3)
3 (40.0)
dtype: object
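As an alternative to the list comprehension and apply versions above, plain string concatenation is applied element-wise on a Series of strings. The sketch below assumes the values are strings (as in the question); the names wrapped, floats and wrapped_floats are made up for illustration.

```python
import pandas as pd

# Use a name other than `pd` so the pandas module is not shadowed.
s = pd.Series(['12.1', '23.2', '30.3', '40.0'])

# Element-wise string concatenation; this relies on the values being strings.
wrapped = '(' + s + ')'

# If the Series held floats instead, convert to str first.
floats = pd.Series([12.1, 23.2, 30.3, 40.0])
wrapped_floats = '(' + floats.astype(str) + ')'

print(wrapped)
```

The result is still a Series of strings, so it is only suitable for display, not for further arithmetic.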
Related
I am having a hard time getting rid of a string in my pandas Series. I'd like to remove the first two '-' strings but keep the last two numeric values. An example is below.
import pandas as pd
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
print(temp)
Out[135]:
0 -
1 -
2 -0.3
3 -0.9
dtype: object
I can't use temp.str.replace("-", "") since it also removes the minus sign from the last two numeric values. Can anyone help me with this? Thanks in advance!
Use a regular expression:
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
print(temp.str.replace('^-$', '', regex=True))
Output
0
1
2 -0.3
3 -0.9
dtype: object
Or simply use replace:
print(temp.replace('-', '')) # notice that there is no .str
Output
0
1
2 -0.3
3 -0.9
dtype: object
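The difference between the two calls is that Series.replace compares whole values, while Series.str.replace works on substrings. A small sketch of that contrast, where the names whole and partial are only for illustration:

```python
import pandas as pd

temp = pd.Series(['-', '-', '-0.3', '-0.9'])

# Series.replace matches the entire value, so only the lone '-' entries change.
whole = temp.replace('-', '')

# Series.str.replace matches substrings, so the minus signs are stripped too.
partial = temp.str.replace('-', '', regex=False)

print(whole.tolist())
print(partial.tolist())
```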
You can convert strings to numbers:
pd.to_numeric(temp, errors='coerce').fillna('')
Output:
0
1
2 -0.3
3 -0.9
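If the goal is to do arithmetic on the remaining values, it may be more useful to leave the coerced entries as NaN rather than filling them with an empty string, since that keeps the Series numeric. A minimal sketch, assuming that is acceptable for your use case (the names numeric and cleaned are illustrative):

```python
import pandas as pd

temp = pd.Series(['-', '-', '-0.3', '-0.9'])

# Anything that cannot be parsed as a number becomes NaN.
numeric = pd.to_numeric(temp, errors='coerce')

# NaN entries are skipped by default in reductions such as sum().
total = numeric.sum()

# Drop the unparseable rows entirely if they are not needed.
cleaned = numeric.dropna()

print(numeric.dtype, total)
```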
You can remove the unwanted strings like this:
import pandas as pd
temp = pd.Series(['-', '-', '-0.3', '-0.9'])
# keep only the entries that are not exactly '-'
new_temp= temp[temp != '-']
print(new_temp)
Output:
2 -0.3
3 -0.9
dtype: object
Here is my DataFrame:
df = pd.DataFrame({'pack':[2,2,2,2], 'a_cost':[10.5,0,11,0], 'b_cost':[0,6,0,6.5]})
The a_cost and b_cost columns each have 0s where the other column has a value. I would like my function to follow this logic:
for each value i in df.a_cost:
    if i == 0:
        take the b_cost value and multiply it by the pack value
        replace the 0 with this product (example: 6.0 * 2 = 12)
for each value i in df.b_cost:
    if i == 0:
        take the a_cost value and divide it by the pack value
        replace the 0 with this quotient (example: 10.5 / 2 = 5.25)
I can't figure out how to write this logic successfully... Here is the expected output:
Output in code:
df = pd.DataFrame({'pack':[2,2,2,2], 'a_cost':[10.5,12.0,11,13.0], 'b_cost':[5.25,6,5.50,6.5]})
Help is really appreciated!
IIUC,
df.loc[df.a_cost.eq(0), 'a_cost'] = df.b_cost * df.pack
df.loc[df.b_cost.eq(0), 'b_cost'] = df.a_cost / df.pack
You can also play with mask and fillna:
df['a_cost'] = df.a_cost.mask(df.a_cost.eq(0)).fillna(df.b_cost * df.pack)
df['b_cost'] = df.b_cost.mask(df.b_cost.eq(0)).fillna(df.a_cost / df.pack)
Update: as commented, you can use the other argument of mask:
df['a_cost'] = df.a_cost.mask(df.a_cost.eq(0), other=df.b_cost * df.pack)
Also note that the second filter is not needed once the 0s in a_cost have been filled, provided every existing b_cost already equals a_cost / pack, as it does in this data. That is, we can just do:
df['b_cost'] = df.a_cost / df.pack
after the first command in both methods.
Output:
pack a_cost b_cost
0 2 10.5 5.25
1 2 12.0 6.00
2 2 11.0 5.50
3 2 13.0 6.50
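For completeness, here is a self-contained sketch of the same idea using Series.where, which keeps a value where the condition holds and takes the replacement otherwise (the inverse of mask). It uses the data from the question, and the order of the two assignments matters because the second one relies on a_cost already being filled.

```python
import pandas as pd

df = pd.DataFrame({'pack': [2, 2, 2, 2],
                   'a_cost': [10.5, 0, 11, 0],
                   'b_cost': [0, 6, 0, 6.5]})

# Keep a_cost where it is non-zero; otherwise use b_cost * pack.
df['a_cost'] = df['a_cost'].where(df['a_cost'].ne(0), df['b_cost'] * df['pack'])

# Keep b_cost where it is non-zero; otherwise use the (now filled) a_cost / pack.
df['b_cost'] = df['b_cost'].where(df['b_cost'].ne(0), df['a_cost'] / df['pack'])

print(df)
```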
import numpy as np
import pandas as pd
df = pd.DataFrame({'pack':[2,2,2,2], 'a_cost':[10.5,0,11,0], 'b_cost':[0,6,0,6.5]})
df['a_cost'] = np.where(df['a_cost']==0, df['pack']*df['b_cost'], df['a_cost'])
df['b_cost'] = np.where(df['b_cost']==0, df['a_cost']/df['pack'], df['b_cost'])
print (df)
#pack a_cost b_cost
#0 2 10.5 5.25
#1 2 12.0 6.0
#2 2 11.0 5.50
#3 2 13.0 6.5
Try this:
df['a_pack'] = df.apply(lambda x: x['b_cost']*x['pack'] if x['a_cost'] == 0 and x['b_cost'] != 0 else x['a_cost'], axis = 1)
df['b_pack'] = df.apply(lambda x: x['a_cost']/x['pack'] if x['b_cost'] == 0 and x['a_cost'] != 0 else x['b_cost'], axis = 1)
I have a pandas DataFrame with more than 50 columns. All the data except the first column is float. I want to replace any value greater than 5.75 with 100. Can someone suggest a function to do this?
The replace function is not working, as to_replace only matches values exactly ("=") and cannot express a greater-than comparison.
This can be done using
df['ColumnName'] = np.where(df['ColumnName'] > 5.75, 100, df['ColumnName'])
You can make a custom function and pass it to apply:
import pandas as pd
import random
df = pd.DataFrame({'col_name': [random.randint(0,10) for x in range(100)]})
def f(x):
    # Replace values greater than 5.75 with 100; leave the rest unchanged.
    if x > 5.75:
        return 100
    return x
df['modified'] = df['col_name'].apply(f)
print(df.head())
col_name modified
0 2 2
1 5 5
2 7 100
3 1 1
4 9 100
If you have a dataframe:
import pandas as pd
import random
df = pd.DataFrame({'first_column': [random.uniform(5,6) for x in range(10)]})
print(df)
Gives me:
first_column
0 5.620439
1 5.640604
2 5.286608
3 5.642898
4 5.742910
5 5.096862
6 5.360492
7 5.923234
8 5.489964
9 5.127154
Then check if the value is greater than 5.75:
df[df > 5.75] = 100
print(df)
Gives me:
first_column
0 5.620439
1 5.640604
2 5.286608
3 5.642898
4 5.742910
5 5.096862
6 5.360492
7 100.000000
8 5.489964
9 5.127154
import numpy as np
import pandas as pd
#Create df
np.random.seed(0)
df = pd.DataFrame(2*np.random.randn(100,50))
for col_name in df.columns[1:]:  # Skip first column
    # A single .loc call avoids chained assignment, which may not modify df.
    df.loc[df[col_name] > 5.75, col_name] = 100
For a single column named value, np.where also works:
df['value'] = np.where(df['value'] > 5.75, 100, df['value'])
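Since the question describes a frame whose first column is not float, a comparison on the whole frame may fail or behave unexpectedly. One possible sketch is to restrict the operation to every column after the first and use DataFrame.mask. The small frame and the column names name, x and y below are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the 50+ column frame: one non-float column first.
df = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'x': [1.0, 6.0, 5.75],
    'y': [7.5, 2.0, 5.76],
})

# Every column except the first one.
float_cols = df.columns[1:]

# Replace values strictly greater than 5.75 with 100 in those columns only.
df[float_cols] = df[float_cols].mask(df[float_cols] > 5.75, 100)

print(df)
```

Note that 5.75 itself is left alone, because the comparison is strictly greater than.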
pandas.get_dummies emits a dummy variable per categorical value. Is there some automated, easy way to ask it to create only N-1 dummy variables? (just get rid of one "baseline" variable arbitrarily)?
Needed to avoid co-linearity in our dataset.
Pandas version 0.18.0 implemented exactly what you're looking for: the drop_first option. Here's an example:
In [1]: import pandas as pd
In [2]: pd.__version__
Out[2]: u'0.18.1'
In [3]: s = pd.Series(list('abcbacb'))
In [4]: pd.get_dummies(s, drop_first=True)
Out[4]:
b c
0 0.0 0.0
1 1.0 0.0
2 0.0 1.0
3 1.0 0.0
4 0.0 0.0
5 0.0 1.0
6 1.0 0.0
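The same option also applies when encoding a column of a DataFrame. The following is a small sketch with a made-up frame (the column names color and value are assumptions); the dtype of the dummy columns depends on the pandas version, so the check below casts to int.

```python
import pandas as pd

# Hypothetical frame with one categorical column and one numeric column.
df = pd.DataFrame({'color': list('abcbacb'), 'value': range(7)})

# Encode only 'color' and drop its first level ('a') as the baseline.
dummies = pd.get_dummies(df, columns=['color'], drop_first=True)

print(list(dummies.columns))
print(dummies['color_b'].astype(int).tolist())
```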
There are a number of ways of doing so.
Possibly the simplest is replacing one of the values by None before calling get_dummies. Say you have:
import pandas as pd
import numpy as np
s = pd.Series(list('babca'))
>> s
0 b
1 a
2 b
3 c
4 a
Then use:
>> pd.get_dummies(np.where(s == s.unique()[0], None, s))
a c
0 0 0
1 1 0
2 0 0
3 0 1
4 1 0
to drop b.
(Of course, you need to check whether your category column already contains None.)
Another way is to use the prefix argument to get_dummies:
pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False)
prefix: string, list of strings, or dict of strings, default None - String to append DataFrame column names. Pass a list with length equal to the number of columns when calling get_dummies on a DataFrame. Alternatively, prefix can be a dictionary mapping column names to prefixes.
This will add a prefix to all of the resulting columns, and you can then drop one of the columns with this prefix (just make it unique).
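A minimal sketch of the prefix approach, assuming a prefix of 'cat' (an arbitrary choice) and that b is the level to use as the baseline:

```python
import pandas as pd

s = pd.Series(list('babca'))

# Every dummy column gets the prefix, joined with the default '_' separator.
dummies = pd.get_dummies(s, prefix='cat')

# Drop the column for the chosen baseline level by its prefixed name.
reduced = dummies.drop(columns='cat_b')

print(list(reduced.columns))
```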
Imagine I have a Series that looks like this:
Out[64]:
2 0
3 1
80 1
83 1
84 2
85 2
How can I add an item at the very beginning of this Series? The native pandas.Series.append function only appends at the end.
Thanks a lot.
There is a pandas.concat function...
import pandas as pd
a = pd.Series([2,3,4])
pd.concat([pd.Series([1]), a])
See the Merge, Join, and Concatenate documentation.
Using concat, or append, the resulting series will have duplicate indices:
for concat():
import pandas as pd
a = pd.Series([2,3,4])
pd.concat([pd.Series([1]), a])
Out[143]:
0 1
0 2
1 3
2 4
and for append() (removed in pandas 2.0, so this needs an older version):
import pandas as pd
a = pd.Series([2,3,4])
a.append(pd.Series([1]))
Out[149]:
0 2
1 3
2 4
0 1
This could be a problem in the future, since a[0] (if you assign the result to a) will return two values for either case.
My solutions in this case are:
import pandas as pd
a = pd.Series([2,3,4])
b = [1]
b[1:] = a
pd.Series(b)
Out[199]:
0 1
1 2
2 3
3 4
or, by reindexing with concat():
import pandas as pd
a = pd.Series([2,3,4])
a.index = a.index + 1
pd.concat([pd.Series([1]), a])
Out[208]:
0 1
1 2
2 3
3 4
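Two further variations on the concat approach, sketched under different assumptions. If the existing index labels do not matter, ignore_index=True renumbers the result from 0. If they do matter, as with the non-consecutive index in the question, the new item can be given an explicit label that is not already in use (the value 9 and the label 1 below are arbitrary examples).

```python
import pandas as pd

a = pd.Series([2, 3, 4])

# Discard the old labels and renumber from 0, avoiding duplicate indices.
renumbered = pd.concat([pd.Series([1]), a], ignore_index=True)

# Keep meaningful labels by giving the new item its own unused label.
s = pd.Series([0, 1, 1], index=[2, 3, 80])
labeled = pd.concat([pd.Series([9], index=[1]), s])

print(renumbered)
print(labeled)
```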
In case you need to prepend a single value from a different Series b, say its last value, this is what works for me:
import pandas as pd
a = pd.Series([2, 3, 4])
b = pd.Series([0, 1])
pd.concat([b[-1:], a])
Similarly, you can call append on the new Series and pass it a list or tuple of Series (so long as you're using a pandas version from 0.13 up to, but not including, 2.0, where Series.append was removed):
import pandas as pd
a = pd.Series([2,3,4])
pd.Series([1]).append([a])