How to split a string every 6 digits in a certain column? - python

In the column 'id', I would like to split the string after every 6 digits and insert a comma. Here is what I have tried so far, but it does not work:
df = pd.pivot_table(df, values=['id'], index=['eu_crm'], aggfunc='sum')
df.loc[df[:, 1] for i in range(0, len(['id'], 6)

Code:
import pandas as pd
data = {'Column 1': ['a', 'b', 'c'],
'id': [2468938493843983, 345642232, 23343433]}
df = pd.DataFrame(data)
df['id'] = df['id'].astype(str)
df['fromleft'] = [','.join([df['id'][i][j:j+6] for j in range(0, len(df['id'][i]), 6)]) for i in range(len(df))]
print(df)
Output:
  Column 1                id            fromleft
0        a  2468938493843983  246893,849384,3983
1        b         345642232          345642,232
2        c          23343433           233434,33

Assuming you want to split from the left:
df['id'] = df['id'].astype(str).str.replace(r'(.{6})(?=.)', r'\1,', regex=True)
Output:
id
0 280530,284442,284690
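For comparison, applying the same replacement to the sample data from the answer above (a quick sketch, not part of the original answer) gives the same chunking:
import pandas as pd

df = pd.DataFrame({'Column 1': ['a', 'b', 'c'],
                   'id': [2468938493843983, 345642232, 23343433]})

# Insert a comma after every 6 characters that are followed by at least one more character
df['id'] = df['id'].astype(str).str.replace(r'(.{6})(?=.)', r'\1,', regex=True)
print(df)
#   Column 1                  id
# 0        a  246893,849384,3983
# 1        b          345642,232
# 2        c           233434,33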

Related

How can I remove string after last underscore in python dataframe?

I want to remove everything after the last underscore in each value of the dataframe. My data in the dataframe looks like this:
AA_XX,
AAA_BB_XX,
AA_BB_XYX,
AA_A_B_YXX
I would like to get this result
AA,
AAA_BB,
AA_BB,
AA_A_B
You can do this simply using Series.str.split and Series.str.join:
In [2381]: df
Out[2381]:
col1
0 AA_XX
1 AAA_BB_XX
2 AA_BB_XYX
3 AA_A_B_YXX
In [2386]: df['col1'] = df['col1'].str.split('_').str[:-1].str.join('_')
In [2387]: df
Out[2387]:
col1
0 AA
1 AAA_BB
2 AA_BB
3 AA_A_B
pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX']})['col'].apply(lambda r: '_'.join(r.split('_')[:-1]))
Explanation:
df = pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX']})
Creates
col
0 AA_XX
1 AAA_BB_XX
2 AA_BB_XYX
3 AA_A_B_YXX
Use apply in order to loop through the column you want to edit.
I split the string at _ and then joined all parts except the last one back with _.
df['col'] = df['col'].apply(lambda r: '_'.join(r.split('_')[:-1]))
print(df)
Results:
col
0 AA
1 AAA_BB
2 AA_BB
3 AA_A_B
If your dataset contains values like AA (values without an underscore), change the lambda like this:
df = pd.DataFrame({'col': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX', 'AA']})
df['col'] = df['col'].apply(lambda r: '_'.join(r.split('_')[:-1]) if len(r.split('_')) > 1 else r)
print(df)
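For reference (not part of the original answer), the same result can be obtained without a lambda by using Series.str.rsplit with n=1; values without an underscore are left untouched:
# Split once from the right and keep everything before the last underscore
df['col'] = df['col'].str.rsplit('_', n=1).str[0]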
Here is another way of going about it.
import pandas as pd
data = {'s': ['AA_XX', 'AAA_BB_XX', 'AA_BB_XYX', 'AA_A_B_YXX']}
df = pd.DataFrame(data)
def cond1(s):
    temp_s = s.split('_')
    if len(temp_s) == 1:
        # no underscore: keep the original string
        return s
    # drop the last part and join the rest back with underscores
    return '_'.join(temp_s[:-1])

df['result'] = df['s'].apply(cond1)
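With the function fixed as above, printing the frame should give (a quick check using the sample data defined in this answer):
print(df)
#             s  result
# 0       AA_XX      AA
# 1   AAA_BB_XX  AAA_BB
# 2   AA_BB_XYX   AA_BB
# 3  AA_A_B_YXX  AA_A_B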

Python dictionary conversion

I have a Python dictionary:
adict = {
'col1': [
{'id': 1, 'tag': '#one#two'},
{'id': 2, 'tag': '#two#'},
{'id': 1, 'tag': '#one#three#'}
]
}
I want the result as follows:
Id tag
1 one,two,three
2 two
Could someone please tell me how to do this?
Try this
import pandas as pd

d = {'col1': [{'id': 1, 'tag': '#one#two'}, {'id': 2, 'tag': '#two#'}, {'id': 1, 'tag': '#one#three#'}]}
df = pd.DataFrame()
for i in d:
    for k in d[i]:
        t = pd.DataFrame.from_dict(k, orient='index').T
        t["tag"] = t["tag"].str.replace("#", ",")
        df = pd.concat([df, t])

tf = df.groupby(["id"])["tag"].apply(lambda x: ",".join(set(''.join(list(x)).strip(",").split(","))))
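Printing tf should give something like the following (the order of tags within each id can vary, because a Python set is unordered):
print(tf)
# id
# 1    one,two,three
# 2              two
# Name: tag, dtype: object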
Here is some simple code:
import pandas as pd
d = {'col1':[{'id':1,'tag':'#one#two'},{'id':2,'tag':'#two#'},{'id':1,'tag':'#one#three#'}]}
df = pd.DataFrame(d)
df['Id'] = df.col1.apply(lambda x: x['id'])
df['tag'] = df.col1.apply(lambda x: ','.join(x['tag'].strip('#').split('#')))
df.drop(columns = 'col1', inplace = True)
Output:
   Id        tag
0   1    one,two
1   2        two
2   1  one,three
If the order of tags is important, first strip the leading/trailing # and split by #, then remove duplicates per group and join:
df = pd.DataFrame(d['col1'])
df['tag'] = df['tag'].str.strip('#').str.split('#')
f = lambda x: ','.join(dict.fromkeys([z for y in x for z in y]).keys())
df = df.groupby('id')['tag'].apply(f).reset_index()
print (df)
id tag
0 1 one,two,three
1 2 two
If the order of tags is not important, use sets to remove duplicates:
df = pd.DataFrame(d['col1'])
df['tag'] = df['tag'].str.strip('#').str.split('#')
f = lambda x: ','.join(set([z for y in x for z in y]))
df = df.groupby('id')['tag'].apply(f).reset_index()
print (df)
id tag
0 1 three,one,two
1 2 two
I tried it as below:
import pandas as pd
a = {'col1':[{'id':1, 'tag':'#one#two'},{'id':2, 'tag':'#two#'},{'id':1, 'tag':'#one#three#'}]}
df = pd.DataFrame(a)
df[["col1", "col2"]] = pd.DataFrame(df.col1.values.tolist(), index = df.index)
df['col1'] = df.col1.str.replace('#', ',')
df = df.groupby(["col2"])["col1"].apply(lambda x : ",".join(set(''.join(list(x)).strip(",").split(","))))
O/P:
col2
1 one,two,three
2 two
import pandas as pd

dic = [{'col1': [{'id': 1, 'tag': '#one#two'}, {'id': 2, 'tag': '#two#'}, {'id': 1, 'tag': '#one#three#'}]}]
row = []
for key in dic:
    data = key['col1']
    for rows in data:
        row.append(rows)
df = pd.DataFrame(row)
print(df)
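This only flattens the records into rows; to get the grouped result the question asks for, a possible follow-up step (not part of the original answer) would be:
# Strip the leading/trailing '#', split the remaining tags, then deduplicate per id
df['tag'] = df['tag'].str.strip('#').str.split('#')
out = df.groupby('id')['tag'].apply(lambda lists: ','.join(set(t for lst in lists for t in lst)))
print(out)
# id
# 1    one,two,three   (tag order may vary)
# 2              two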

get a column count without "NaN"

How do I count the number of columns without NaN ?
import pandas as pd
df = pd.DataFrame(columns = ['name', 'age', 'favorite_color', 'grade','NaN'])
print(len(df.columns))
Output: 5
But I am looking for the output: 4
How do I count the number of columns, excluding the 'NaN' column?
A few additional ways are:
Method 1 (Index ^ operator; note that ^ on an Index was deprecated and has been removed in recent pandas versions, use symmetric_difference instead):
len(df.columns ^ ['NaN'])
Method 2, pd.Index.difference:
len(df.columns.difference(['NaN']))
Method 3, set (almost the same as 2):
len(set(df.columns) - {'NaN'})
Try this
res = [x for x in df.columns if str(x) not in ['nan', 'NaN', 'NAN']]
print(res)
# ['name', 'age', 'favorite_color', 'grade']
print(len(res))
# 4
If you mean to drop columns whose name is np.nan or the string 'NaN', respectively, try this:
df = df.loc[:, df.columns.notnull()]
df = df.loc[:, [c for c in df.columns if c != 'NaN']]
print(len(df.columns))
On the other hand, you might want to drop columns that contain NaN values:
# drop the columns where at least one element is NaN:
df.dropna(axis=1)
# drop the columns where all elements are NaN:
df.dropna(axis=1, how='all')
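If the goal is instead to count the columns that contain no missing values (rather than columns named 'NaN'), a short sketch would be:
# Count columns where every element is non-null
num_complete_cols = df.notna().all().sum()
# Equivalent: df.dropna(axis=1).shape[1]
print(num_complete_cols)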
Use basic Python to get the result:
import pandas as pd

df = pd.DataFrame(columns=['name', 'age', 'favorite_color', 'grade', 'NaN'])
col = df.columns
count = 0
for var in col:
    if var not in ['NaN', 'nan']:
        count += 1
print(f"The length of columns without NaN values is\t {count} ")
Output:
The length of columns without NaN values is 4

How to modify cells in a pandas DataFrame?

I need to change individual elements in a DataFrame. I tried doing something like this, but it doesn't work:
for index, row in df.iterrows():
    if df.at[row, index] == 'something':
        df.at[row, index] = df.at[row, index] + 'add a string'
    else:
        df.at[row, index] = df.at[row, index] + 'add a value'
How can I do that?
If you need to modify all columns in the DataFrame, use numpy.where together with the DataFrame constructor, because where returns a NumPy array:
import numpy as np

df = pd.DataFrame(np.where(df == 'something', df + 'add a string', df + 'add a value'),
                  index=df.index,
                  columns=df.columns)
If only one column col:
df['col'] = np.where(df['col'] == 'something',
                     df['col'] + 'add a string',
                     df['col'] + 'add a value')
Sample:
df = pd.DataFrame({'col': ['a', 'b', 'a'], 'col1': ['a', 'b', 'b']})
print (df)
col col1
0 a a
1 b b
2 a b
df = pd.DataFrame(np.where(df == 'a', df + 'add a string', df + 'add a value'),
                  index=df.index,
                  columns=df.columns)
print (df)
col col1
0 aadd a string aadd a string
1 badd a value badd a value
2 aadd a string badd a value
df['col'] = np.where(df['col'] == 'a',
                     df['col'] + 'add a string',
                     df['col'] + 'add a value')
print (df)
col col1
0 aadd a string a
1 badd a value b
2 aadd a string b
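For reference, the same single-column update can also be written with boolean indexing via .loc (a sketch, not part of the original answer, starting again from the sample frame above):
mask = df['col'] == 'a'
# Append a different suffix depending on whether the condition holds
df.loc[mask, 'col'] = df.loc[mask, 'col'] + 'add a string'
df.loc[~mask, 'col'] = df.loc[~mask, 'col'] + 'add a value'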
You can use .loc (older answers used .ix, which has since been removed from pandas) and apply a function like this:
import pandas as pd

D = pd.DataFrame({'A': ['a', 'b', 3, 7, 'b', 'a'], 'B': ['a', 'b', 3, 7, 'b', 'a']})
D.loc[D.index % 2 == 0, 'A'] = D.loc[D.index % 2 == 0, 'A'].apply(lambda s: s + 'x' if isinstance(s, str) else s + 1)
D.loc[D.index[2:5], 'B'] = D.loc[D.index[2:5], 'B'].apply(lambda s: s + 'y' if isinstance(s, str) else s - 1)
First example appends x to each string or alternatively adds 1 to each non-string on column A for every even index.
The second example appends y to each string or alternatively subtracts 1 from each non-string on column B for the indices 2,3,4.
Original Frame:
A B
0 a a
1 b b
2 3 3
3 7 7
4 b b
5 a a
Modified Frame:
A B
0 ax a
1 b b
2 4 2
3 7 6
4 bx by
5 a a

move column in pandas dataframe

I have the following dataframe:
a b x y
0 1 2 3 -1
1 2 4 6 -2
2 3 6 9 -3
3 4 8 12 -4
How can I move columns b and x such that they are the last 2 columns in the dataframe? I would like to specify b and x by name, but not the other columns.
You can rearrange columns directly by specifying their order:
df = df[['a', 'y', 'b', 'x']]
In the case of larger dataframes where the column titles are dynamic, you can use a list comprehension to select every column not in your target set and then append the target set to the end.
>>> df[[c for c in df if c not in ['b', 'x']]
+ ['b', 'x']]
a y b x
0 1 -1 2 3
1 2 -2 4 6
2 3 -3 6 9
3 4 -4 8 12
To make it more bullet proof, you can ensure that your target columns are indeed in the dataframe:
cols_at_end = ['b', 'x']
df = df[[c for c in df if c not in cols_at_end]
+ [c for c in cols_at_end if c in df]]
cols = list(df.columns.values) #Make a list of all of the columns in the df
cols.pop(cols.index('b')) #Remove b from list
cols.pop(cols.index('x')) #Remove x from list
df = df[cols+['b','x']] #Create new dataframe with columns in the order you want
For example, to move column "name" to be the first column in df you can use insert:
column_to_move = df.pop("name")
# insert column with insert(location, column_name, column_value)
df.insert(0, "name", column_to_move)
similarly, if you want this column to be e.g. third column from the beginning:
df.insert(2, "name", column_to_move )
You can use the approach below. It's very simple, but similar to the good answer given by Charlie Haley.
df1 = df.pop('b') # remove column b and store it in df1
df2 = df.pop('x') # remove column x and store it in df2
df['b']=df1 # add b series as a 'new' column.
df['x']=df2 # add x series as a 'new' column.
Now you have your dataframe with the columns 'b' and 'x' in the end. You can see this video from OSPY : https://youtu.be/RlbO27N3Xg4
Similar to ROBBAT1's answer above, but hopefully a bit more robust:
df.insert(len(df.columns), 'b', df.pop('b'))  # pop first, then append at the new end
df.insert(len(df.columns), 'x', df.pop('x'))
This function will reorder your columns without losing data. Any omitted columns remain in the center of the data set:
def reorder_columns(columns, first_cols=[], last_cols=[], drop_cols=[]):
    # Keep the original relative order of the remaining columns (set() would not preserve it)
    remaining = [c for c in columns if c not in first_cols + last_cols + drop_cols]
    new_order = first_cols + remaining + last_cols
    return new_order
Example usage:
my_list = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth']
reorder_columns(my_list, first_cols=['fourth', 'third'], last_cols=['second'], drop_cols=['fifth'])
# Output:
['fourth', 'third', 'first', 'sixth', 'second']
To assign to your dataframe, use:
my_list = df.columns.tolist()
reordered_cols = reorder_columns(my_list, first_cols=['fourth', 'third'], last_cols=['second'], drop_cols=['fifth'])
df = df[reordered_cols]
Simple solution:
old_cols = df.columns.values
new_cols= ['a', 'y', 'b', 'x']
df = df.reindex(columns=new_cols)
An alternative, more generic method:
from pandas import DataFrame

def move_columns(df: DataFrame, cols_to_move: list, new_index: int) -> DataFrame:
    """
    This method re-arranges the columns in a dataframe to place the desired columns at the desired index.
    ex Usage: df = move_columns(df, ['Rev'], 2)
    :param df:
    :param cols_to_move: The names of the columns to move. They must be a list
    :param new_index: The 0-based location to place the columns.
    :return: Return a dataframe with the columns re-arranged
    """
    other = [c for c in df if c not in cols_to_move]
    start = other[0:new_index]
    end = other[new_index:]
    return df[start + cols_to_move + end]
You can use pd.Index.difference with np.hstack, then reindex or use label-based indexing. In general, it's a good idea to avoid list comprehensions or other explicit loops with NumPy / Pandas objects.
import numpy as np

cols_to_move = ['b', 'x']
new_cols = np.hstack((df.columns.difference(cols_to_move), cols_to_move))
# OPTION 1: reindex
df = df.reindex(columns=new_cols)
# OPTION 2: direct label-based indexing
df = df[new_cols]
# OPTION 3: loc label-based indexing
df = df.loc[:, new_cols]
print(df)
# a y b x
# 0 1 -1 2 3
# 1 2 -2 4 6
# 2 3 -3 6 9
# 3 4 -4 8 12
You can use the movecolumn package in Python to move columns:
pip install movecolumn
Then you can write your code as:
import movecolumn as mc
mc.MoveToLast(df,'b')
mc.MoveToLast(df,'x')
Hope that helps.
P.S.: The package can be found here: https://pypi.org/project/movecolumn/
You can also do this as a one-liner:
df.drop(columns=['b', 'x']).assign(b=df['b'], x=df['x'])
This will move any column to the last column of the dataframe:
df = df[[col for col in df.columns if col != 'col_name_to_moved'] + ['col_name_to_moved']]
Move any column to the first column of the dataframe:
df = df[['col_name_to_moved'] + [col for col in df.columns if col != 'col_name_to_moved']]
where col_name_to_moved is the column that you want to move.
I use a Pokémon database as an example; the columns of my database are
['Name', '#', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary']
Here is the code:
import pandas as pd
df = pd.read_html('https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6')[0]
cols = df.columns.to_list()
cos_end= ["Name", "Total", "HP", "Defense"]
for i, j in enumerate(cos_end, start=(len(cols)-len(cos_end))):
cols.insert(i, cols.pop(cols.index(j)))
print(cols)
df = df.reindex(columns=cols)
print(df)
