String Manipulation in Dataframe [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Hey guys, I have a quick question about string manipulation in a pandas DataFrame.
Suppose we have 2 columns that look like this:
Question:
How can I keep only the string part of each cell and delete the [' ']?
Thank you so much for your help! I am looking forward to hearing your brilliant ideas!

Use a regex to remove every character that is not a letter, digit, or underscore:
print(df)
State City
0 ['AK'] ['Yakutat']
1 ['AK'] ['Apache']
Solution
df = df.replace(regex=r'[^\w]', value='')
print(df)
State City
0 AK Yakutat
1 AK Apache
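Note that `[^\w]` also deletes spaces and hyphens inside the values, so a city such as 'New York' would become 'NewYork'. A minimal sketch of a narrower pattern that removes only the brackets and single quotes, assuming the cells are strings (the sample rows below are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; the cells are strings that merely look like lists.
df = pd.DataFrame({
    "State": ["['AK']", "['NY']"],
    "City": ["['Yakutat']", "['New York']"],
})

# Remove only square brackets and single quotes, leaving spaces intact.
cleaned = df.replace(regex=r"[\[\]']", value="")
print(cleaned)
```

This pattern would still strip an apostrophe that is part of a name, so it is only a starting point.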

It depends on whether the values in your cells are strings with brackets, "['AK']", or actual lists, ['AK'].
If they are strings, strip the bracket and quote characters from both ends:
df["State"] = df["State"].str.strip("[]'")
df["City"] = df["City"].str.strip("[]'")
If they are lists, you can join the elements with a comma to turn them into a string:
df["State"] = df["State"].str.join(", ")
df["City"] = df["City"].str.join(", ")
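If it is unclear which of the two cases applies, one option is a small helper that accepts either. This is only a sketch: `unwrap` is a hypothetical name, and it assumes that every string cell is a valid Python list literal that `ast.literal_eval` can parse.

```python
import ast
import pandas as pd

# Hypothetical sample data with string cells that look like lists.
df = pd.DataFrame({
    "State": ["['AK']", "['AK']"],
    "City": ["['Yakutat']", "['Apache']"],
})

def unwrap(cell):
    """Turn "['AK']" (a string) or ['AK'] (a list) into the plain string 'AK'."""
    if isinstance(cell, str):
        # Assumes the string is a valid Python list literal.
        cell = ast.literal_eval(cell)
    return ", ".join(cell)

# Apply the helper to every cell, column by column.
cleaned = df.apply(lambda col: col.map(unwrap))
print(cleaned)
print(unwrap(["AK", "HI"]))  # a real list is joined directly
```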

If the cells are strings, you can slice off the leading [' and the trailing '] from each value:
df['City']=df['City'].apply(lambda x: x[2:-2])
df['State']=df['State'].apply(lambda x: x[2:-2])

Related

Extraction of data from a delimited string in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I have a string variable which has some data as shown below:
'From\tTo\nA0A3Q8IUE6\t13392634\nA4I9M8\t5072523\nE9BQL4\t13392634\nQ4Q3E9\t5654813\nE9B4M7\t13452251\nA0A088S7I8\t22574266\nA4HAG8\t5414882\nA0A3P3Z499\t5414882'
The data basically has two columns 'From' and 'To'. How do I extract the entries from the 'To' column in python?
You can use split, and then extract the data from the odd indexes, like so:
data = 'From\tTo\nA0A3Q8IUE6\t13392634\nA4I9M8\t5072523\nE9BQL4\t13392634\nQ4Q3E9\t5654813\nE9B4M7\t13452251\nA0A088S7I8\t22574266\nA4HAG8\t5414882\nA0A3P3Z499\t5414882'
print(data)
data = data.split()
to = [data[i] for i in range(3, len(data), 2)]
print(to)
In Python you can split a string at specific characters. In your case \n delimits the rows and \t delimits the columns,
so something like this should work:
string='From\tTo\nA0A3Q8IUE6\t13392634\nA4I9M8\t5072523\nE9BQL4\t13392634\nQ4Q3E9\t5654813\nE9B4M7\t13452251\nA0A088S7I8\t22574266\nA4HAG8\t5414882\nA0A3P3Z499\t5414882'
f=[]
t=[]
for row in string.split("\n")[1:]:
    fr, to = row.split("\t")
    f.append(fr)
    t.append(to)
print(f,t)
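The same row and column splitting can also be left to the standard library's csv module, which reads the header for you. A minimal sketch on a shortened copy of the string (the variable name `to_values` is just illustrative):

```python
import csv
import io

data = 'From\tTo\nA0A3Q8IUE6\t13392634\nA4I9M8\t5072523\nE9BQL4\t13392634'

# DictReader treats the first row as the header, so each later row
# becomes a dict keyed by 'From' and 'To'.
reader = csv.DictReader(io.StringIO(data), delimiter="\t")
to_values = [row["To"] for row in reader]
print(to_values)
```

Looking the column up by name avoids depending on index arithmetic if more columns are added later.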

how to find all the text in a column where text without white space [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a data frame with a column named text that contains many strings. I want to display only the entries without white space.
e.g.,
'hi how are you' - I don't want this
'good' - I want this.
The final column should look like:
hi
good
You can use str.contains:
In [512]: df
Out[512]:
text
0 hi how are you
1 good
2 hello friends
## check for whitespaces and take a `not` of it
In [516]: df = df[~df.text.str.contains(' ')]
In [516]: df
Out[516]:
text
1 good
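Checking for ' ' only catches the space character. If tabs or newlines should count as white space too, a regex with \s is one way to cover them. A small sketch with made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"text": ["hi how are you", "good", "hello\tfriends"]})

# \s matches any whitespace character (space, tab, newline, ...).
has_whitespace = df["text"].str.contains(r"\s", regex=True)

# Keep only the rows whose text contains no whitespace at all.
single_words = df[~has_whitespace]
print(single_words)
```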

How to replace a substring with another string in a column in pandas [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a column as below:
df['name']
1 react trainer
2 react trainers
3 react trainer's
I need to replace the strings trainers/trainer's with trainer:
1 react trainer
2 react trainer
3 react trainer
df['name'].str.replace(r"trainer('?s)*", 'trainer', regex=True)
df['name'].replace(to_replace='trainer.*', value='trainer', regex=True)
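On pandas 2.0 and later, `str.replace` treats the pattern as a literal string unless `regex=True` is passed, and the result has to be assigned back to take effect. A minimal sketch with a slightly stricter pattern (the word boundary is an addition of mine, not part of the answers above):

```python
import pandas as pd

df = pd.DataFrame({"name": ["react trainer", "react trainers", "react trainer's"]})

# Optional "s" or "'s" after "trainer"; \b keeps longer words such as
# "trainership" from being touched.
df["name"] = df["name"].str.replace(r"trainer(?:'?s)?\b", "trainer", regex=True)
print(df["name"].tolist())
```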
If you want to both find the rows containing this substring and return the substring itself, you can use a regex:
import re

def return_substr(string, substring):
    # Return the substring if it occurs in the string
    if re.search(substring, string):
        return substring
    # If not found, return ''
    return ''

# Use apply to run this row by row
df['replaceCol'] = df['name'].apply(lambda x: return_substr(x, 'react trainer'))
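A vectorized variant of the same idea is possible with `str.extract`, which avoids the Python-level function call per row. This is a sketch on made-up rows, not a drop-in replacement for every case:

```python
import pandas as pd

df = pd.DataFrame({"name": ["react trainer", "angular coach", "react trainers"]})

# str.extract returns the first capture group, or a missing value when the
# pattern does not match; fillna('') mirrors the empty-string fallback.
df["replaceCol"] = df["name"].str.extract(r"(react trainer)", expand=False).fillna("")
print(df)
```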

Efficient way of parsing string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
How would you turn a string that looks like this
7.11,8,9:00,9:15,14:30,15:00
into this dictionary entry
{7.11 : [8, (9:00, 9:15), (14:30, 15:00)]}?
Suppose that the number of time pairs (such as 9:00,9:15 and 14:30,15:00) is unknown and you want to have them all as tuple pairs.
First split the string at the commas, then group the elements from the 3rd onward into pairs with zip and put the result into a dictionary:
s = "7.11,8,9:00,9:15,14:30,15:00"
ss = s.split(',')
d = {ss[0]: [ss[1]] + list(zip(*[iter(ss[2:])]*2))}
Output:
{'7.11': ['8', ('9:00', '9:15'), ('14:30', '15:00')]}
If you need to convert the strings to appropriate data types (you'll have to adapt this to your needs), then after getting the ss list:
import datetime
time_list = [datetime.datetime.strptime(t, '%H:%M').time() for t in ss[2:]]
d = {float(ss[0]): [int(ss[1])] + list(zip(*[iter(time_list)]*2))}
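The steps above can be wrapped into one function. This sketch pairs the times with slicing instead of the `zip(*[iter(...)]*2)` idiom, which some readers find easier to follow; `parse_entry` is a hypothetical name, and the check for an odd number of times is an assumption about how bad input should be handled.

```python
import datetime

def parse_entry(line):
    """Parse 'key,count,start,end,start,end,...' into {key: [count, (start, end), ...]}."""
    key, count, *times = line.split(",")
    if len(times) % 2:
        raise ValueError("time values must come in start/end pairs")
    parsed = [datetime.datetime.strptime(t, "%H:%M").time() for t in times]
    # Pair even-indexed start times with odd-indexed end times.
    pairs = list(zip(parsed[::2], parsed[1::2]))
    return {float(key): [int(count), *pairs]}

entry = parse_entry("7.11,8,9:00,9:15,14:30,15:00")
print(entry)
```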

How to read data without specific symbol in python? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
My dataset looks like the following. I am trying to read the numbers in the "per" column without reading the "%" symbol. Being a beginner in Python, I was wondering if we can do this in Python. Also, if you could provide an explanation, that would be great!
State Year per
A 1990 6.10%
A 1989 4.50%
B 1990 3.4%
B 1989 1.25%
Thanks in advance,
In case it is a csv file, this should help (or there might be another way to get a dataframe):
import pandas as pd
data = pd.read_csv("somefile.csv")
data["per"] = pd.to_numeric(data["per"].str.replace("%", "", regex=False))
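If the file is whitespace-separated like the sample in the question rather than comma-separated, `read_csv` can still parse it with a regex separator. A sketch that uses an in-memory string as a stand-in for the file:

```python
import io
import pandas as pd

# Inline stand-in for the file; real data would be read from a path instead.
raw = "State Year per\nA 1990 6.10%\nA 1989 4.50%\nB 1990 3.4%\nB 1989 1.25%\n"

# sep=r"\s+" splits the columns on any run of whitespace.
data = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Drop the trailing "%" and convert the remaining text to floats.
data["per"] = data["per"].str.rstrip("%").astype(float)
print(data)
```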
Your file type doesn't matter for this and no modules are required. It works by taking each row and going to the last word, then removing the percent symbol from it.
def readFile(filename):
    percents = []
    with open(filename, "r") as f:
        for row in f:  # for each line; the header is removed later
            splitRow = row.split()[-1]  # split the row into words; we want the last one only
            percent = splitRow
            percent = percent.split("%")[0]  # remove the percent sign
            percents.append(percent)  # kept as a string; convert with float() only after the header is dropped
    percents = percents[1:]  # removes the header "per"
    return percents
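A variant of the same approach that returns floats directly is sketched below. It skips the header before converting, so "per" is never passed to `float()`, and it ignores blank lines. `read_percents` is a hypothetical name; it accepts any iterable of lines, including an open file object.

```python
def read_percents(lines):
    """Return the last column of each data row as a float, without the '%'."""
    percents = []
    rows = iter(lines)
    next(rows, None)  # skip the header row
    for row in rows:
        fields = row.split()
        if not fields:
            continue  # ignore blank lines
        percents.append(float(fields[-1].rstrip("%")))
    return percents

sample = ["State Year per", "A 1990 6.10%", "", "B 1989 1.25%"]
print(read_percents(sample))
```

With a real file, it could be called as `read_percents(f)` inside a `with open(...) as f:` block.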
