pandas dataframe extracting values - python

I have a dataframe called "nums" and am trying to find the value of the column "angle" by specifying the values of other columns like this:
nums[(nums['frame']==300)&(nums['tad']==6)]['angl']
When I do so, I do not get a single number and cannot do calculations on the result. What am I doing wrong?

First of all, in general you should use a single .loc call rather than chaining indexers like that:
>>> s = nums.loc[(nums['frame']==300)&(nums['tad']==6), 'angl']
Now, to get the float, you can call the .item() method, which works when the selection contains exactly one value.
>>> s.item()
-0.466331
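As a minimal sketch of the two steps above: the data below is hypothetical and only stands in for the asker's "nums" frame, whose real contents are not shown. It also illustrates that .item() refuses to guess when more than one row matches.

```python
import pandas as pd

# Hypothetical data standing in for the asker's "nums" frame.
nums = pd.DataFrame({
    "frame": [300, 300, 301],
    "tad": [5, 6, 6],
    "angl": [0.12, -0.466331, 0.98],
})

# One .loc call: boolean row mask first, column label second.
mask = (nums["frame"] == 300) & (nums["tad"] == 6)
s = nums.loc[mask, "angl"]  # still a Series, here with one element

value = s.item()  # plain Python float, usable in calculations
print(value)

# .item() raises ValueError unless exactly one element is selected.
try:
    nums.loc[nums["tad"] == 6, "angl"].item()
except ValueError as exc:
    print("not a single match:", exc)
```

If several rows can match, selecting one explicitly (for example with .iloc[0]) is probably safer than relying on .item().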

Related

extract number of ranking position in pandas dataframe

I have a pandas dataframe with a column named ranking_pos. All the rows of this column look like this: #123 of 12,216.
The output I need is only the number of the ranking, so for this example: 123 (as an integer).
How do I extract the number after the # and get rid of the of 12,216?
Currently the type of the column is object, just converting it to integer with .astype() doesn't work because of the other characters.
You can use .str.extract:
df['ranking_pos'].str.extract(r'#(\d+)').astype(int)
or you can use .str.split():
df['ranking_pos'].str.split(' of ').str[0].str.replace('#', '').astype(int)
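To store the result back in the column, drop the " of 12,216" part before casting:
df["ranking_pos"] = df["ranking_pos"].str.split(" of ").str[0].str.replace("#", "", regex=False).astype(int)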
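A small runnable sketch of the .str.extract approach, using made-up rows in the format described in the question. Passing expand=False is a choice made here so that the result is a Series rather than a one-column DataFrame; the column name "rank" is hypothetical.

```python
import pandas as pd

# Made-up rows in the "#<rank> of <total>" format from the question.
df = pd.DataFrame({"ranking_pos": ["#123 of 12,216", "#7 of 12,216"]})

# Capture the digits that follow "#"; expand=False returns a Series.
ranks = df["ranking_pos"].str.extract(r"#(\d+)", expand=False).astype(int)

df["rank"] = ranks  # hypothetical new column holding the integer rank
print(df)
```

Note that the pattern assumes the rank itself has no thousands separator; a value such as "#1,024" would need the comma handled first.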

How can I assign a list's elements to corresponding rows of a dataframe in pandas?

I have numbers in a List that should get assigned to certain rows of a dataframe consecutively.
List=[2,5,7,12….]
In my dataframe that looks similar to the below table, I need to do the following:
A frame_index that starts with 1 gets the next element of List as “sequence_number”
Frame_Index==1 then assign first element of List as Sequence_number.
Frame_index == 1 again, so assign second element of List as Sequence_number.
So my goal is to achieve a new dataframe like this:
I don't know which functions to use. If this weren't python language, I would use a for loop and check where frame_index==1, but my dataset is large and I need a pythonic way to achieve the described solution. I appreciate any help.
EDIT: I tried the following to fill with my List values to use fillna with ffill afterwards:
concatenated_df['Sequence_number']=[List[i] for i in
concatenated_df.index if (concatenated_df['Frame_Index'] == 1).any()]
But of course I'm getting "list index out of range" error.
I think you could do that in two steps.
Add column and fill with your list where frame_index == 1.
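Forward-fill the gaps with df.ffill() (the older spelling, df.fillna() with the method="ffill" kwarg, is deprecated in recent pandas releases).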
import pandas as pd
df = pd.DataFrame({"frame_index": [1,2,3,4,1,2]})
sequence = [2,5]
df.loc[df["frame_index"] == 1, "sequence_number"] = sequence
df.ffill(inplace=True) # alias for df.fillna(method="ffill")
This leaves sequence_number as float64, because the rows that are not yet filled hold NaN until the forward fill. That might be acceptable in your use case. If you want int64, cast the column once it is filled, e.g. df["sequence_number"] = df["sequence_number"].astype("int64").
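An alternative sketch that avoids the intermediate NaN values, and therefore the float dtype, is to number the groups with a cumulative sum and use that number to index into the list. This is a different technique from the fill-forward answer above, and it assumes that the first row has frame_index == 1 and that the list has one element per group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"frame_index": [1, 2, 3, 4, 1, 2]})
sequence = [2, 5]  # one value per block that starts with frame_index == 1

# Each row with frame_index == 1 opens a new group: 1, 1, 1, 1, 2, 2
group = (df["frame_index"] == 1).cumsum()

# Map the 1-based group number onto the 0-based list position.
# Assumption: the first row starts a group; otherwise group would be 0
# there and the lookup would silently wrap around to the last element.
df["sequence_number"] = np.asarray(sequence)[group.to_numpy() - 1]

print(df)
```

Because no missing values are ever created, the new column keeps an integer dtype without a separate cast.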

Why does pandas DataFrame convert integer to object like this?

I am using value_counts() to get the frequency for sec_id. The output of value_counts() should be integers.
When I build DataFrame with these integers, I found those columns are object dtype. Does anyone know the reason?
They are the object dtype because your sec_id column contains string values (e.g. "94114G"). When you call .values on the dataframe created by .reset_index(), you get two arrays which both contain string objects.
More importantly, I think you are doing some unnecessary work. Try this:
>>> sec_count_df = df['sec_id'].value_counts().rename_axis("sec_id").rename("count").reset_index()
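To see the dtype behaviour on a small case, here is a sketch with invented sec_id values (only "94114G" comes from the answer above). It shows the counts staying integer in the DataFrame, while the combined .values array falls back to object because it has to hold both strings and integers.

```python
import pandas as pd

# Invented data; sec_id holds strings such as "94114G".
df = pd.DataFrame({"sec_id": ["94114G", "94114G", "12345", "12345", "12345"]})

sec_count_df = (
    df["sec_id"]
    .value_counts()          # counts, sorted from most to least frequent
    .rename_axis("sec_id")   # name the index so reset_index labels it
    .rename("count")         # name the values column
    .reset_index()
)
print(sec_count_df.dtypes)   # the "count" column is an integer dtype

# One NumPy array for the whole frame must fit strings and ints together.
mixed = sec_count_df.values
print(mixed.dtype)
```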

indexing into a column in pandas

I am trying to set colb in a pandas DataFrame depending on the value in cola.
The order in which I refer to the two column indices in the array seems to have an impact on whether the indexing works. Why is this?
Here is an example of what I mean.
I set up my dataframe:
test=pd.DataFrame(np.random.rand(20,1))
test['cola']=[x for x in range(20)]
test['colb']=0
If I try to set column b using the following code:
test.loc['colb',test.cola>2]=1
I get the error: ValueError: setting an array element with a sequence
If I use the following code, the code alters the dataframe as I expect.
test.loc[test.cola>2,'colb']=1
Why is this?
Further, is there a better way to assign a column using a test like this?
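On the ordering question: .loc takes the row indexer first and the column indexer second, so in test.loc['colb', test.cola>2] the string 'colb' is read as a row label and the boolean Series as a column selector. The sketch below repeats the working form and, as one possible alternative for the last question, builds the same values with np.where; the names mask and alt are introduced here for illustration.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(np.random.rand(20, 1))
test["cola"] = range(20)
test["colb"] = 0

# Rows first (boolean mask), columns second (label).
mask = test["cola"] > 2
test.loc[mask, "colb"] = 1

# Alternative: compute the whole column in one vectorised expression.
alt = np.where(mask, 1, 0)

print(test[["cola", "colb"]].head())
```

Both forms give the same column here; the .loc form is convenient when only some rows should change, while np.where states both outcomes explicitly.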

How to create a new empty pandas columns with a specific dtype?

I have a DataFrame df with a column 'a'. How would I create a new column 'b' which has dtype=object?
I know this may be considered poor form, but at the moment I have a dataframe df where the column 'a' contains arrays (each element is an np.array). I want to create a new column 'b' where each element is a new np.array that contains the logs of the corresponding element in 'a'.
At the moment I tried these two methods, but neither worked:
for i in df.index:
    df.set_value(i,'b', log10(df.loc[i,'a']))
and
for i in df.index:
    df.loc[i,'b'] = log10(df.loc[i,'a'])
Both give me ValueError: Must have equal len keys and value when setting with an iterable.
I'm assuming the error comes about because the dtype of the new column is defaulted to float although I may be wrong.
As each row of your column is an array, it's better to use the standard NumPy mathematical functions for computing their element-wise logarithms to the base 10:
df['log_a'] = df.a.apply(lambda x: np.log10(x))
