I have a large dataset (image: https://drive.google.com/file/d/1TFX1LQhQ-xwYp47fllL8PWwbwbv3wsb6/view?usp=drivesdk). The shape of the data is 10000x100. I want to remove irrelevant rows using the condition -0.5 <= angle <= 0.8.
The angles are a list of columns Angle_0, Angle_1, Angle_2, ..., Angle_20:
ang = [f"Angle_{i}" for i in range(20)]
I want to keep the rows where every angle satisfies the condition -0.5 <= angle <= 0.8 and delete the other rows.
How to do this in python and pandas?
For example, Angle_0 has the value 0.1926715 in row 24, so I want that entire row of the dataset. Then Angle_1 has values such as 0.1926715 and 0.192497, and I need those rows (rows 7, 9, 14, 16, 19, 21, 23, 26, 28, 29). Similarly for all the other angles.
I am a beginner in python.
Thank you very much in advance.
If you need to remove the rows that contain at least one angle that does not follow your conditions, this will solve your problem:
df.loc[((df >= -0.5) & (df <= 0.8)).all(axis=1)]
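Note that this compares every column of df against the bounds; since your frame is 10000x100 and presumably contains non-angle columns too, you may want to restrict the check to the ang list from your question (a sketch assuming all those columns exist in df):
df = df.loc[((df[ang] >= -0.5) & (df[ang] <= 0.8)).all(axis=1)]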
new_df = df[(df['Angle_0'] >= -0.5) & (df['Angle_0'] <= 0.8)]  # for a single angle column
Dirty way:
for n in range(20):
    column_name = "Angle_" + str(n)
    df = df[(df[column_name] >= -0.5) & (df[column_name] <= 0.8)]
Since you need to address several columns whose names start with Angle, you could use a regex like this to select them:
df = df.loc[(df.filter(regex='^Angle_') >= -0.5).all(axis=1) & (df.filter(regex='^Angle_') <= 0.8).all(axis=1)]
The second solution is better in case you have a variable number of angle columns, or in case the count might change later.
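For reference, here is a minimal, self-contained demo of the regex approach on made-up data:
import pandas as pd

df = pd.DataFrame({"Angle_0": [0.19, 0.95], "Angle_1": [0.30, 0.10], "other": [5, 6]})
angles = df.filter(regex='^Angle_')
kept = df.loc[((angles >= -0.5) & (angles <= 0.8)).all(axis=1)]
print(kept)  # only the first row satisfies the condition in every Angle_ column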
Related
Dear all,
I have a question which would be quite simple to do in Excel, but I fail to do it in Python.
The following formula needs to be calculated: (Hi + Hi_1) * qi.
[Table screenshot with the data]
How can I code this so that Hi and qi are picked from the same line, but Hi_1 from the previous line?
df1['gini_coef'] = (df['Hi'] + '???') * df['qi']
Any help and suggestions are most appreciated.
Kind regards
Sina
If you have an integer index on your dataframe, you can simply iterate over the rows. There are many ways to iterate over a dataframe; a simple one is below.
(If your index is not integer, you can use reset_index first.)
df['newcol'] = 0
for i in range(1, len(df)):
    df.loc[i, 'newcol'] = (df.loc[i, 'Hi'] + df.loc[i - 1, 'Hi']) * df.loc[i, 'qi']
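For what it's worth, a vectorized alternative using shift avoids the loop entirely; a sketch assuming the same column names as in your screenshot:
df['gini_coef'] = (df['Hi'] + df['Hi'].shift(1)) * df['qi']
Note the first row comes out NaN, since it has no previous Hi.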
I have a pandas dataframe that contains values between -1000 and 1000. I want to eliminate all the numbers in the range -0.00001 to 0.00001, i.e. replace them with NaN. It is worth mentioning that my df contains numerous very small positive and negative numbers that I want to include within this range as well, e.g. 6.26478E-52.
How do I go about doing this?
P.S. I am attaching an image of the first few rows of my df for reference.
IIUC, if you need values less than -0.00001 or less than 0.00001:
df = df.mask(df.lt(-0.00001) | df.lt(0.00001))
which is the same as simply below 0.00001:
df = df.mask(df.lt(0.00001))
Or if need values between:
df = df.mask(df.gt(-0.00001) & df.lt(0.00001))
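A quick sanity check on made-up data (note that a value like 6.26478E-52 falls inside the masked band):
import pandas as pd

df = pd.DataFrame({'a': [6.26478e-52, -0.5, 1000.0, -2e-06]})
df = df.mask(df.gt(-0.00001) & df.lt(0.00001))
print(df)  # the tiny values become NaN; -0.5 and 1000.0 survive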
In case this has been answered in the past, I apologize; I was not sure how to phrase the question.
I have a dataframe with 3D coordinates and a scalar value (magnetic field, in this case) for each point in space. I calculated the radius as the distance of each point from the line at (x, y) = (0, 0). The unique radius and z values are transferred into a new dataframe. Now I want to calculate the scalar value for every point (Z, R) in the volume by averaging over all points in the 3D system with equal radius.
Currently I am iterating over all unique Z and R values. It works but is awfully slow.
df is the original dataframe, dfn is the new one which - in the beginning - only contains the unique combinations of R and Z values.
for r in dfn.R.unique():
    for z in dfn.Z.unique():
        dfn.loc[(dfn["R"] == r) & (dfn["Z"] == z), "B"] = df["B"][(df["R"] == r) & (df["Z"] == z)].mean()
Is there any way to speed this up with a single line of code that tells pandas to grab all rows from the original dataframe where Z and R have the values given in each row of the new dataframe?
Thank you in advance for your help.
Try groupby!!!
It looks like you can achieve this with something like:
df[['R', 'Z', 'B']].groupby(['R', 'Z']).mean()
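If you want the result shaped like your dfn (one row per unique R/Z combination, with regular columns instead of a grouped index), a sketch:
dfn = df.groupby(['R', 'Z'], as_index=False)['B'].mean()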
I'm a beginner in pandas and Python, trying to learn them.
I would like to iterate over pandas rows to apply simple coded logic.
Instead of fancy mapping functions, I just want plain coded logic,
so that I can easily adapt it later for other rules as well.
In my dataframe dc,
I'd like to check if column AgeUnknown == 1 (or > 0).
If so, it should move the value of column Age to AgeUnknown,
and then set Age to 0.0.
I tried various combinations of my below code but it won't work.
# using a row reference
for index, row in dc.iterrows():
    r = row['AgeUnknown']
    if r > 0:
        w = dc.at[index, 'Age']
        dc.at[index, 'AgeUnknown'] = w
        dc.at[index, 'Age'] = 0
Another attempt
for index in dc.index:
    r = dc.at[index, 'AgeUnknown'].[0]  # also tried .sum here
    if r > 0:
        w = dc.at[index, 'Age']
        dc.at[index, 'AgeUnknown'] = w
        dc.at[index, 'Age'] = 0
Also tried:
if (dc[index, 'Age'] > 0):  # wasn't allowed either
Why isn't this working? As far as I understood, a dataframe should be addressable like the above.
I realize you requested a solution involving iterating over the df, but I thought I'd provide one that I think is more traditional.
A non-iterating solution to your problem goes something like this: 1) get all the indexes that meet your criteria, 2) set those indexes of the df to what you want.
# indexes where column AgeUnknown is >0
inds = dc[dc['AgeUnknown'] > 0].index.tolist()
# set AgeUnknown to the Age column at those indexes
dc.loc[inds, 'AgeUnknown'] = dc.loc[inds, 'Age']
# change the Age to 0 at those indexes
dc.loc[inds, 'Age'] = 0
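Equivalently, you can skip the index list and use the boolean mask directly; a small variant of the same idea:
mask = dc['AgeUnknown'] > 0
dc.loc[mask, 'AgeUnknown'] = dc.loc[mask, 'Age']
dc.loc[mask, 'Age'] = 0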
I am new to pandas. I have a CSV file which has latitude and longitude columns and also a tile ID column; the file has around 1 million rows. I have a list of around a hundred tile IDs and want to get the latitude and longitude coordinates for these tile IDs. Currently I have:
good_tiles_str = [str(q) for q in good_tiles]  # convert list elements to strings
file['tile'] = file.tile.astype(str)  # convert the tile column to string dtype
for i in range(len(good_tiles_str)):
    x = good_tiles_str[i]
    lat = file.loc[file['tile'].str.contains(x), 'BL_Latitude']    # find lat coordinates
    long = file.loc[file['tile'].str.contains(x), 'BL_Longitude']  # find long coordinates
    print(lat)
    print(long)
This method is very slow, and I know it is not the correct way, as I've heard you should not use for loops like this with pandas. Also, it does not work: it doesn't find all the latitude and longitude points for the tile IDs.
Any help would be greatly appreciated.
There is no need to iterate over the rows explicitly, I think, as far as I understood your question.
If you want a particular assignment given a condition, you can do so explicitly. Here's one way using numpy.where; we use ~ to negate a condition. rule1 and rule2 stand for two different conditions of your choosing:
import numpy as np

rule1 = file['tile'].str.contains(x)  # first condition
rule2 = file['tile'].str.contains(x)  # second condition -- replace with your own
file['flag'] = np.where(rule1, 'BL_Latitude', ' ')
file['flag'] = np.where(rule2 & ~rule1, 'BL_Longitude', file['flag'])
Try this:
search_for = '|'.join(good_tiles_str)
good = file[file.tile.str.contains(search_for)]
good = good[['BL_Latitude', 'BL_Longitude']].drop_duplicates()
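One caveat: str.contains does substring matching (e.g. the ID '12' also matches '123'), which can produce unexpected matches. If the tile IDs should match exactly, isin is safer and typically faster:
good = file[file['tile'].isin(good_tiles_str)]
good = good[['BL_Latitude', 'BL_Longitude']].drop_duplicates()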