I have a dataframe with some numbers (or strings, it doesn't actually matter), and I need to add a character in the middle of them. The dataframe, which I got from Google Takeout, looks like this:
id A B
1 512343 -1234
1 213 1231345
1 18379 187623
And I want to add a comma after the second digit:
id A B
1 51,2343 -12,34
1 21,3 12,31345
1 18,379 18,7623
A and B are actually longitude and latitude, so I don't think it is possible to put the comma in exactly the right place: there is no way to know whether a coordinate is supposed to have one or two digits before the comma. But it would do the trick if I could simply put the comma after the second digit.
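For reference, a minimal frame matching the sample can be built like this (a sketch; it assumes id, A and B were read in as plain integers, which is what the answers below operate on):
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1],
    'A': [512343, 213, 18379],
    'B': [-1234, 1231345, 187623],
})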
This should do the trick:
df[["A", "B"]]=df[["A", "B"]].astype(str).replace(r"(\d{2})(\d+)", r"\1,\2", regex=True)
Outputs:
id A B
0 1 51,2343 -12,34
1 1 21,3 12,31345
2 1 18,379 18,7623
Here's another approach with str.extract:
for c in ['A', 'B']:
    df[c] = df[c].astype(str).str.extract(r'(-?\d{2})(\d*)').agg(','.join, axis=1)
Output:
id A B
0 1 51,2343 -12,34
1 1 21,3 12,31345
2 1 18,379 18,7623
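To see the intermediate step: str.extract with two capture groups returns a two-column DataFrame, and agg(','.join, axis=1) glues each row back into one string. A small sketch on column A of the original integer frame:
parts = df['A'].astype(str).str.extract(r'(-?\d{2})(\d*)')
# parts is a two-column DataFrame: ['51', '21', '18'] and ['2343', '3', '379']
joined = parts.agg(','.join, axis=1)
# joined is the Series ['51,2343', '21,3', '18,379']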
You could do something like this -
import numpy as np

df['A'] = np.where(df['A'] >= 0, '', '-') + (df['A'].abs().astype(str).str[:2] + ',' + df['A'].abs().astype(str).str[2:])
df['B'] = np.where(df['B'] >= 0, '', '-') + (df['B'].abs().astype(str).str[:2] + ',' + df['B'].abs().astype(str).str[2:])
df
id A B
0 1 51,2343 -12,34
1 1 21,3 12,31345
2 1 18,379 18,7623
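If more columns need the same treatment, the same idea can be wrapped in a small helper and applied in a loop. This is just a sketch (insert_comma is a made-up name) and it assumes the columns still hold the original integers:
import numpy as np

def insert_comma(s, pos=2):
    # Keep the sign aside, split the absolute value's digits at `pos`,
    # then reattach the sign.
    sign = np.where(s >= 0, '', '-')
    digits = s.abs().astype(str)
    return sign + digits.str[:pos] + ',' + digits.str[pos:]

for col in ['A', 'B']:
    df[col] = insert_comma(df[col])
This produces the same output as shown above.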
I would like to add a string at the beginning of each row's index label, either "positive" or "negative", depending on the value in the columns.
I keep getting a ValueError, as per the screenshot.
For a generic method to handle any number of columns, use pandas.from_dummies:
cols = ['positive', 'negative']
user_input_1.index = (pd.from_dummies(user_input_1[cols]).squeeze()
                      + '_' + user_input_1.index)
Example input:
Score positive negative
A 1 1 0
B 2 0 1
C 3 1 0
Output:
Score positive negative
positive_A 1 1 0
negative_B 2 0 1
positive_C 3 1 0
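A self-contained sketch of the above; the frame construction is an assumption based on the example input, and pd.from_dummies needs pandas 1.5 or newer:
import pandas as pd

# Hypothetical reconstruction of the example input shown above.
user_input_1 = pd.DataFrame(
    {'Score': [1, 2, 3], 'positive': [1, 0, 1], 'negative': [0, 1, 0]},
    index=['A', 'B', 'C'],
)

cols = ['positive', 'negative']
user_input_1.index = (pd.from_dummies(user_input_1[cols]).squeeze()
                      + '_' + user_input_1.index)
print(user_input_1)  # matches the output shown above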
Use Series.map to build the prefixes from the condition and prepend them to the index:
df.index = df['positive'].eq(1).map({True:'positive_', False:'negative_'}) + df.index
Or use numpy.where:
df.index = np.where(df['positive'].eq(1), 'positive_','negative_') + df.index
I have a cell that sometimes contains 2 characters and sometimes 3.
I need to format the cell like:
<2spaces>XX<2spaces>
and, if it contains 3 characters:
<2spaces>XXX<1space>.
I use a new-style format to centre the value in a field of width 6:
dx['C'] = dx['C'].map('{:^6s}'.format)
Note: dx['C'] is a column in a pandas DataFrame.
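For reference, this is what centring in a field of width 6 does for 2- and 3-character values; note that format (like str.center) puts the odd leftover space on the right, which is why the answer below prepends a space first:
>>> '{:^6s}'.format('aa')
'  aa  '
>>> '{:^6s}'.format('aaa')
' aaa  '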
Given:
C
0 aaa
1 aa
Doing (prepending a single space first, so that str.center pads a 3-character value as <2spaces>XXX<1space> and a 2-character value as <2spaces>XX<2spaces>):
df.C = (' ' + df.C).str.center(6)
Output:
C
0 aaa
1 aa
Let's say I have the following DataFrame:
df = pd.DataFrame({"my_col": ["one","two","two","one","two","one","one"]})
my_col
0 one
1 two
2 two
3 one
4 two
5 one
6 one
I would like to append their duplicate count as a suffix to the duplicated values. Here is what I mean:
my_col
0 one_0
1 two_0
2 two_1
3 one_1
4 two_2
5 one_2
6 one_3
I know I could do something like df.groupby('my_col').apply(my_function_to_do_this) with a function like this:
def my_function_to_do_this(group: pd.DataFrame) -> pd.DataFrame:
    str_to_append = pd.Series(range(group.shape[0]), index=group.index).astype(str)
    group["my_col"] += "_" + str_to_append
    return group
but that's quite slow on a large DataFrame with a lot of small groups of at most about 4 rows.
I'm trying to find a faster approach, if there is one.
Many thanks in advance for the help!
Use GroupBy.cumcount for the counter, convert it to string and join it to the original column with Series.str.cat:
df['my_col'] = df['my_col'].str.cat(df.groupby('my_col').cumcount().astype(str), sep='_')
print (df)
my_col
0 one_0
1 two_0
2 two_1
3 one_1
4 two_2
5 one_2
6 one_3
Or join by +:
df['my_col'] += '_' + df.groupby('my_col').cumcount().astype(str)
#longer version
#df['my_col'] = df['my_col'] + '_' + df.groupby('my_col').cumcount().astype(str)
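If you want to verify the speed difference on data shaped like yours, a rough benchmark might look like this; the frame below is synthetic, its sizes are arbitrary assumptions, and absolute timings will vary:
import timeit

import numpy as np
import pandas as pd

# Synthetic data with many small groups, roughly mimicking the question's setup.
rng = np.random.default_rng(0)
big = pd.DataFrame({'my_col': rng.integers(0, 25_000, size=100_000).astype(str)})

def with_apply():
    # The original groupby/apply idea from the question.
    def add_counter(group):
        group = group.copy()
        counter = pd.Series(range(len(group)), index=group.index).astype(str)
        group['my_col'] += '_' + counter
        return group
    return big.groupby('my_col', group_keys=False).apply(add_counter)

def with_cumcount():
    # The vectorised cumcount approach from this answer.
    return big['my_col'] + '_' + big.groupby('my_col').cumcount().astype(str)

print('apply:   ', timeit.timeit(with_apply, number=1))
print('cumcount:', timeit.timeit(with_cumcount, number=1))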
I have the following dataframe.
import pandas as pd
data=['ABC1','ABC2','ABC3','ABC4']
data = pd.DataFrame(data,columns=["Column A"])
Column A
0 ABC1
1 ABC2
2 ABC3
3 ABC4
How can I insert a "-" after "ABC" in Column A of data?
Output:
Column A
0 ABC-1
1 ABC-2
2 ABC-3
3 ABC-4
The simplest solution is to use the replace method with regex=True, plus inplace=True to make the change permanent in the dataframe.
>>> data['Column A'].replace(['ABC'], 'ABC-', regex=True, inplace=True)
>>> print(data)
Column A
0 ABC-1
1 ABC-2
2 ABC-3
3 ABC-4
A possible solution is
data['Column A'] = data['Column A'].str[:-1] + '-' + data['Column A'].str[-1]
print (data)
# Column A
#0 ABC-1
#1 ABC-2
#2 ABC-3
#3 ABC-4
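Note that the slicing above assumes exactly one trailing digit; with more digits the dash lands in the wrong place. A quick check with a hypothetical value:
>>> s = pd.Series(['ABC12'])
>>> (s.str[:-1] + '-' + s.str[-1])[0]
'ABC1-2'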
Here's a way which only assumes that the numbers to be preceded by a dash are at the end:
df['ColumnA'].str.split(r'([A-Za-z]+)(\d+)').str.join('-').str.strip('-')
0 ABC-1
1 ABC-2
2 ABC-3
3 ABC-4
Another example:
df = pd.DataFrame({'ColumnA':['asf1','Ads2','A34']})
Will give:
df['ColumnA'].str.split(r'([A-Za-z]+)(\d+)').str.join('-').str.strip('-')
0 asf-1
1 Ads-2
2 A-34
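A closely related spelling (not one of the original answers) uses str.replace with the same two capture groups, which avoids the join/strip steps; [A-Za-z] is used rather than the error-prone [A-z]:
df['ColumnA'].str.replace(r'([A-Za-z]+)(\d+)', r'\1-\2', regex=True)
which gives the same asf-1, Ads-2, A-34 result as above.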
I need to group by and then return the values of a column in concatenated form. While I have managed to do this, the returned dataframe has a column named 0. Just 0. Is there a way to specify what the resulting column will be called?
all_columns_grouped = all_columns.groupby(['INDEX','URL'], as_index = False)['VALUE'].apply(lambda x: ' '.join(x)).reset_index()
The resulting dataframe has the headers
INDEX | URL | 0
The results are in the 0 column.
I have managed to rename the column using
.rename(index=str, columns={0: "variant"}), but this seems very inelegant.
Is there any way to provide a header for the column? Thanks
The simplest is to remove as_index=False so that a Series is returned, and to pass the name parameter to reset_index:
Sample:
all_columns = pd.DataFrame({'VALUE':['a','s','d','ss','t','y'],
                            'URL':[5,5,4,4,4,4],
                            'INDEX':list('aaabbb')})
print (all_columns)
INDEX URL VALUE
0 a 5 a
1 a 5 s
2 a 4 d
3 b 4 ss
4 b 4 t
5 b 4 y
all_columns_grouped = all_columns.groupby(['INDEX','URL'])['VALUE'] \
                                 .apply(' '.join) \
                                 .reset_index(name='variant')
print (all_columns_grouped)
INDEX URL variant
0 a 4 d
1 a 5 a s
2 b 4 ss t y
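An equivalent spelling, if you prefer a plain reset_index, is to name the Series first (a minor variation, not from the original answer):
all_columns_grouped = (all_columns.groupby(['INDEX', 'URL'])['VALUE']
                       .apply(' '.join)
                       .rename('variant')
                       .reset_index())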
You can use agg on the selected column (VALUE in this case) with named aggregation to assign a column name to the result of the function.
# Sample data (thanks @jezrael)
all_columns = pd.DataFrame({'VALUE':['a','s','d','ss','t','y'],
                            'URL':[5,5,4,4,4,4],
                            'INDEX':list('aaabbb')})
# Solution (named aggregation; the old dict-renaming syntax is no longer supported in recent pandas)
>>> all_columns.groupby(['INDEX','URL'], as_index=False)['VALUE'].agg(
...     variant=lambda x: ' '.join(x))
INDEX URL variant
0 a 4 d
1 a 5 a s
2 b 4 ss t y