Trying to remove spaces from a column of strings in a pandas DataFrame. I successfully did it using this method in another section of the code.
for index, row in summ.iterrows():
    row['TeamName'] = row['TeamName'].replace(" ", "")
summ.head() shows no change to the column of strings after this operation, but there is no error either.
I have no idea why this issue is happening considering I used this exact same method later in the code and accomplished the task successfully.
Why not use str.replace:
df["TeamName"] = df["TeamName"].str.replace(' ', '', regex=False)
I may be proven wrong here, but I am wondering if it's because you are iterating over it, and maybe working on a copy that isn't changing the data. This is what I found in the pandas.DataFrame.iterrows documentation:
"You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect."
just a thought... hth
mask = ~df.bar.isna()
df.bar.loc[mask] = df.bar.loc[mask].map(f)
This sets off a SettingWithCopyWarning, even though I am using loc.
I am aware of df.mask, but this will not work either, as the column contains missing values that throw errors when the mapping function is applied to them.
You get a SettingWithCopyWarning because pandas cannot be sure whether you want to manipulate the DataFrame through a reference or manipulate a copy of it. Try adding .copy() to the end of the second line and see if the warning goes away. Sometimes the cause of the warning is actually in the code a few lines before the place where you get it.
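A sketch with invented data and a stand-in for the mapping function: the chained `df.bar.loc[mask]` assignment goes through two indexing steps, which is what commonly triggers the warning; a single combined .loc call on the original frame avoids it.

```python
import pandas as pd

df = pd.DataFrame({"bar": ["x", None, "y"]})
f = str.upper  # hypothetical stand-in for the real mapping function

mask = df["bar"].notna()
# One combined .loc call instead of the chained df.bar.loc[mask] = ...
df.loc[mask, "bar"] = df.loc[mask, "bar"].map(f)

print(df["bar"].tolist())  # ['X', None, 'Y']
```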
Well, I got this problem:
I have a pandas DataFrame and I'm trying to find the values that start with "THRL-" and delete that exact prefix. I've tried converting it to a string with the result.to_string() method and then calling (where result is a DataFrame):
a = result.replace('THRL-', '')
But it doesn't work, I still see the same THRL- prefix in the string that I'm returning.
Is there a better way to do it? I also tried with a dictionary, but it didn't seem to work because the .to_dict() method apparently returned a list instead of a dictionary.
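For what it's worth, a vectorized sketch with a hypothetical column name and made-up values: DataFrame.replace only matches whole cell values by default, which is likely why the prefix survived; the .str accessor replaces substrings element-wise.

```python
import pandas as pd

# Hypothetical column name and values standing in for the real data.
df = pd.DataFrame({"code": ["THRL-001", "THRL-002", "OTHER-1"]})

# DataFrame.replace matches entire cell values, so 'THRL-001' is untouched;
# Series.str.replace operates on substrings within each string.
df["code"] = df["code"].str.replace("THRL-", "", regex=False)
print(df["code"].tolist())  # ['001', '002', 'OTHER-1']
```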
Just started learning Python. I'm trying to change a column's data type from object to float to take the mean. I have tried changing [] to () and even removing the "". I don't know whether it makes a difference or not. Please help me figure out what the issue is. Thanks!!
My code:
df["normalized-losses"]=df["normalized-losses"].astype(float)
The error I see is attached as an image.
Use:
df['normalized-losses'] = df['normalized-losses'][~(df['normalized-losses'] == '?' )].astype(float)
Using df.normalized-losses leads to the interpreter evaluating df.normalized, which doesn't exist; the statement you have written executes (df.normalized) - (losses.astype(float)). There also appears to be a question mark in your data which can't be converted to float. The statement above converts to float only those rows which don't contain a question mark and drops the rest. If you don't want to drop those rows, you can replace the question marks with 0 using:
df['normalized-losses'] = df['normalized-losses'].replace('?', 0.0)
df['normalized-losses'] = df['normalized-losses'].astype(float)
Welcome to Stack Overflow, and good luck on your Python journey! An important part of coding is learning how to interpret error messages. In this case, the traceback is quite helpful - it is telling you that you cannot call normalized after df, since a dataframe does not have a method of this name.
Of course you weren't trying to call something called normalized, but rather the normalized-losses column. The way to do this is as you already did once - df["normalized-losses"].
As to your main problem - if even one of your values can't be converted to a float, the column-wide operation will fail. This is very common. You need to first eliminate all of the non-numerical items in the column; one way to find them is with df[~df['normalized-losses'].str.isnumeric()].
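A sketch of both routes with toy values in place of the real column: inspect the offending rows first, or let pd.to_numeric coerce anything unparseable to NaN so the mean can be taken directly.

```python
import pandas as pd

# Toy values standing in for the real 'normalized-losses' column.
df = pd.DataFrame({"normalized-losses": ["164", "?", "128"]})

# Route 1: find the rows that can't be parsed as numbers.
bad = df[~df["normalized-losses"].str.isnumeric()]
print(bad["normalized-losses"].tolist())  # ['?']

# Route 2: coerce unparseable entries to NaN; mean() then skips them.
losses = pd.to_numeric(df["normalized-losses"], errors="coerce")
print(losses.mean())  # 146.0
```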
df.normalized-losses does not mean anything to Python in this case; you can replace it with df["normalized-losses"]. Usually, if you try
df["normalized-losses"]=df["normalized-losses"].astype(float)
this should work. What this does is take the normalized-losses column from the DataFrame, convert it to float, and reassign it to the same column in the same DataFrame. But sometimes the data might need some processing before you try the above statement.
You can't use - in an attribute or variable name. Perhaps you mean normalized_losses?
I have a dataframe. I want to slice it by checking if the value contains a string. For example, this code works:
data_df[data_df['column1'].str.contains('test')]
But I first want to set my column1 to be all lowercase first. So being the n00b that I am, I tried:
data_df[data_df['column1'].lower().str.contains('test')]
Of course the Python gods showed me no mercy and gave me an AttributeError. Any tips on how I can slice a DataFrame based on a substring, but make everything lowercase first?
I feel like the following post is very close to my answer but I can't get it to work exactly how I described up there:
Python pandas dataframe slicing, with if condition
Thanks Python pros!!!
Try using apply()
data_df[data_df['column1'].apply(str.lower).str.contains('test')]
You can drop the apply:
data_df[data_df['column1'].str.lower().str.contains('test')]
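A quick demonstration with made-up data (the column name follows the question): str.lower returns a lowercased Series, so chaining str.contains on it slices case-insensitively.

```python
import pandas as pd

data_df = pd.DataFrame({"column1": ["Test run", "nothing", "TESTING"]})

# Lowercase the whole column, then keep rows whose value contains 'test'.
out = data_df[data_df["column1"].str.lower().str.contains("test")]
print(out["column1"].tolist())  # ['Test run', 'TESTING']
```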
Ok I have this part of code:
def Reading_Old_File(self, Path, turn_index, SKU):
    print "Reading Old File! Turn Index = ", turn_index, "SKU= ", SKU
    lenght_of_array = 0
    array_with_data = []
    if turn_index == 1:
        reading_old_file = open(Path, 'rU')
        data = np.genfromtxt(reading_old_file, delimiter="''", dtype=None)
        for index, line_in_data in enumerate(data, start=0):
            if index < 3:
                print index, "Not Yet"
            if index >= 3:
                print ">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Reading All Old Items"
                i = index - 3
                old_items_data[i] = line_in_data.split("\t")
                old_items_data[i] = [lines_old.strip() for lines_old in old_items_data]
                print old_items_data[i]
    print len(old_items_data)
So what I am doing here is reading a file; on the first turn I want to read it all and keep all the data, so it would be something like:
old_items_data[1]=['123','dog','123','dog','123','dog']
old_items_data[2]=['124','cat','124','cat','124','cat']
old_items_data[n]=['amount of list members is equal each time']
Each line of the file should be stored in a list so I can use it later for comparing; when turn_index is greater than 2, I'll compare each incoming line with the lines in every list by iterating over all of them.
So the question is how to do this, or is there a better way to compare lists?
I'm new to python so maybe someone could help me with this issue?
Thanks
You just need to use append.
old_items_data.append(line_in_data.split("\t"))
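A sketch of the loop with append, using a small in-memory list in place of the file data (file handling elided):

```python
# Two made-up lines standing in for the data read from the file.
lines = ["123\tdog\t123", "124\tcat\t124"]

old_items_data = []
for line_in_data in lines:
    # Split on tabs and strip whitespace from each field, then append
    # the whole row; the list grows by one entry per line.
    fields = [f.strip() for f in line_in_data.split("\t")]
    old_items_data.append(fields)

print(old_items_data)  # [['123', 'dog', '123'], ['124', 'cat', '124']]
```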
I would use the package pandas for this. It will not only be much quicker, but also simpler. Use pandas.read_table to import the data (specifying delimiter and row-skipping can be done here by passing arguments to sep and skiprows). Then, use pandas.DataFrame.apply to apply your function to the rows of your data.
The speed gains are going to come from the fact that pandas was optimized to perform actions across lists like this (in the case of a pandas DataFrame, these would be called rows). This applies to both importing the data and applying a function to every row. The simplicity gains should hopefully be clear.
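A sketch of that approach, using an in-memory buffer in place of the real file (the actual call would pass the path plus sep and skiprows arguments):

```python
import io
import pandas as pd

# In-memory stand-in for the real tab-delimited file.
raw = "123\tdog\n124\tcat\n"
df = pd.read_table(io.StringIO(raw), sep="\t", header=None)

# Apply a function across rows (axis=1) instead of a manual loop;
# the combining function here is just an illustration.
labels = df.apply(lambda row: "%s-%s" % (row[0], row[1]), axis=1)
print(labels.tolist())  # ['123-dog', '124-cat']
```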