Select all dataframe rows containing a specific integer - python

My dataframe looks something like this:
x y
1 a
1 b
2 c
3 d
4 e
5 f
1 g
All I want is to count the number of rows that contain the number 1 in column 'x'. I know how to do this for strings, but I can't find anything similar for numbers. The printed output in this case would be
3

df.loc[df.x == 1, 'x'].count()
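A minimal, self-contained sketch of the same idea (the dataframe construction here is assumed for illustration): comparing the column to the value gives a boolean mask, and summing the mask counts the True entries.

import pandas as pd

# Sample frame reconstructed from the question (assumed for illustration)
df = pd.DataFrame({'x': [1, 1, 2, 3, 4, 5, 1],
                   'y': list('abcdefg')})

# Comparing the column to the value gives a boolean mask;
# summing it counts the True entries
count = (df.x == 1).sum()
print(count)  # 3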

Related

Saving small sub-dataframes containing all values associated to a specific 'key' string

I need a suggestion for a procedure using pandas. I have a two-column dataset that looks like this:
A 0.4533
B 0.2323
A 1.2343
A 1.2353
B 4.3521
C 3.2113
C 2.1233
.. ...
where the first column contains strings and the second one floats. I would like to save the minimum value for each group of identical strings, so that I end up with the minimum associated with A, B, and C. Does anybody have any suggestions? It would also help to somehow store all the values associated with each string.
Many thanks,
James
Input data:
>>> df
0 1
0 A 0.4533
1 B 0.2323
2 A 1.2343
3 A 1.2353
4 B 4.3521
5 C 3.2113
6 C 2.1233
Use groupby before min:
out = df.groupby(0).min()
Output result:
>>> out
1
0
A 0.4533
B 0.2323
C 2.1233
Update: to keep only the values in the original dataset that are within 20% of their group's minimum:
out = df[df.groupby(0)[1].apply(lambda x: x <= x.min() * 1.2)]
>>> out
0 1
0 A 0.4533
1 B 0.2323
6 C 2.1233
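A transform-based variant (a sketch, not the answer's exact code; the dataframe construction is reproduced from the sample above) keeps each row whose value is within 20% of its group's minimum without relying on how apply aligns its result:

import pandas as pd

# Sample data reconstructed from the question (column labels 0 and 1, as above)
df = pd.DataFrame({0: list('ABAABCC'),
                   1: [0.4533, 0.2323, 1.2343, 1.2353, 4.3521, 3.2113, 2.1233]})

# Keep rows whose value is within 20% of their group's minimum
out = df[df[1] <= df.groupby(0)[1].transform('min') * 1.2]
print(out)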
You can simply do it by
min_A=min(df[df["column_1"]=="A"]["value"])
min_B=min(df[df["column_1"]=="B"]["value"])
min_C=min(df[df["column_1"]=="C"]["value"])
where df is the dataframe and column_1 and value are the names of its columns.
You can also do it with pandas' built-in groupby():
>>> df.groupby(["column_1"]).min()
The above gives the same result.

How to get the different column in similar rows

I have a dataframe as follows:
ID NAME LOCATION OCCUPATION IND
1 A XYZ QWE 1
1 A WER QWE 1
2 B ERT NBV 1
2 B ERT BVC 1
3 C RTY VCX 1
As you can see, there are a few similar rows that differ only in the value of one or two columns. How can I find out which column(s) differentiate between similar rows, i.e. rows with the same ID?
Any way of indicating the column name works.
Extract each row as a list and compare the elements:
size = len(row1)
diff = [row1[i] != row2[i] for i in range(size)]
This gives you a sequence of Booleans, True where the elements differ. Now check how many elements differ:
if 1 <= sum(diff) <= 2:
Now, simply return the indices of the True elements.
You should be able to handle the coding from here.
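If the data is already in a pandas dataframe, a sketch of the same idea (column names taken from the sample above, the frame construction is assumed) is to count distinct values per ID with groupby and nunique; any column with more than one distinct value within a group is a differentiator:

import pandas as pd

# Sample frame reconstructed from the question (assumed for illustration)
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3],
                   'NAME': ['A', 'A', 'B', 'B', 'C'],
                   'LOCATION': ['XYZ', 'WER', 'ERT', 'ERT', 'RTY'],
                   'OCCUPATION': ['QWE', 'QWE', 'NBV', 'BVC', 'VCX'],
                   'IND': [1, 1, 1, 1, 1]})

# Count distinct values per column within each ID group
nuniq = df.groupby('ID').nunique()

# Columns with more than one distinct value are the differentiators
diff_cols = nuniq.gt(1).apply(lambda row: row[row].index.tolist(), axis=1)
print(diff_cols)
# ID
# 1      [LOCATION]
# 2    [OCCUPATION]
# 3              []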

How to split a column into a comma-separated string?

Here is a sample dataframe; a is my column name.
a b x
0 1 3 a
1 2 4 a
2 1 3 b
3 2 5 b
4 2 4 c
I need the column's unique values to be separated in this way;
required output: '1','2'
Below is my code:
x=x1['id'].unique()
x2=','.join("\'"+str(i)+"\'" for i in x)
With this code I'm getting output like this:
output: "'1','2'"
2nd approach:
x2=','.join("\'"+x1['id']+"\'")
If I do this, the joined string contains every row's id instead of only the unique values.
I need to pass the output into a SQL query like select * from abc where a in (x2), so I need output like this:
x2 --> '1','2'
but instead I'm getting
x2 ---> "'1','2'"
Try using your first approach with f-strings to make things easier.
x2 =' ,'.join(f"'{str(i)}'" for i in x)
query = rf"""
SELECT
*
FROM
abc
WHERE
a in ({x2})
"""
If you try print(query), it gives
SELECT
*
FROM
abc
WHERE
a in ('1' ,'2')
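Tying it back to the sample frame at the top of this question (the question's snippet refers to x1['id'], but the sample shows column a, so the column name here is an assumption), a self-contained sketch:

import pandas as pd

# Sample frame from the question; column 'a' is assumed in place of 'id'
x1 = pd.DataFrame({'a': [1, 2, 1, 2, 2],
                   'b': [3, 4, 3, 5, 4],
                   'x': list('aabbc')})

# Quote and comma-separate only the unique values
x2 = ','.join(f"'{i}'" for i in x1['a'].unique())
print(x2)  # '1','2'

query = f"SELECT * FROM abc WHERE a IN ({x2})"
print(query)  # SELECT * FROM abc WHERE a IN ('1','2')

For anything beyond quick ad-hoc queries, parameterized queries are generally preferable to building SQL strings by hand, but that is outside the scope of the original question.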

Split Pandas Dataframe Column According To a Value

I searched and couldn't find a problem like mine, so if there is one that I somehow missed, please let me know and I can delete this post.
I'm stuck on a problem: splitting a pandas dataframe into different dataframes by a value.
I have a dataset in a text file and I store it as a pandas dataframe with a single column. The dataset contains more than one set of information, and a certain value marks the end of each set; you can see a sample below:
The Sample Input
In [8]: df
Out[8]:
var1
0 a
1 b
2 c
3 d
4 endValue
5 h
6 f
7 b
8 w
9 endValue
So I want to split this df into different dataframes. I couldn't find a way to do that, but I'm sure there must be an easy way. The format I show in the sample output may be wrong, so if you have a better idea I'd love to see it. Thank you for the help.
The sample output I'd like
var1
{[0 a
1 b
2 c
3 d
4 endValue]},
{[0 h
1 f
2 b
3 w
4 endValue]}
You could check where var1 is endValue, take the cumsum, and use the result as a custom grouper. Then group by it and build a dictionary from the result:
d = dict(tuple(df.groupby(df.var1.eq('endValue').cumsum().shift(fill_value=0.))))
Or for a list of dataframes (effectively indexed in the same way):
l = [v for _,v in df.groupby(df.var1.eq('endValue').cumsum().shift(fill_value=0.))]
print(l[0])
var1
0 a
1 b
2 c
3 d
4 endValue
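To make the grouping step concrete, here is what the custom grouper looks like on the sample frame (a sketch; the frame construction is reproduced from the question):

import pandas as pd

# Sample frame reconstructed from the question
df = pd.DataFrame({'var1': ['a', 'b', 'c', 'd', 'endValue',
                            'h', 'f', 'b', 'w', 'endValue']})

# True on the sentinel rows; cumsum bumps the group id after every endValue;
# shifting keeps each endValue row inside the block it closes
grouper = df.var1.eq('endValue').cumsum().shift(fill_value=0)
# grouper is 0 for rows 0-4 and 1 for rows 5-9

d = dict(tuple(df.groupby(grouper)))
print(d[0])  # first block, ending with its endValue row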
One idea, assuming unique index values: replace non-matching index values with NaN and backfill them, then loop over the groupby object to build a list of DataFrames:
g = df.index.to_series().where(df['var1'].eq('endValue')).bfill()
dfs = [a for i, a in df.groupby(g, sort=False)]
print (dfs)
[ var1
0 a
1 b
2 c
3 d
4 endValue, var1
5 h
6 f
7 b
8 w
9 endValue]
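For clarity, the intermediate grouper g in this approach is just the index of the next endValue row, backfilled onto the rows above it (a sketch on the same sample frame):

import pandas as pd

# Same sample frame as above
df = pd.DataFrame({'var1': ['a', 'b', 'c', 'd', 'endValue',
                            'h', 'f', 'b', 'w', 'endValue']})

# Keep the index value only on the endValue rows, then backfill upwards,
# so every row is labelled with the index of the endValue that closes its block
g = df.index.to_series().where(df['var1'].eq('endValue')).bfill()
# g is 4.0 for rows 0-4 and 9.0 for rows 5-9

dfs = [a for _, a in df.groupby(g, sort=False)]
print(dfs[1])  # second block: rows 5-9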

How to find the number of an element in a column of a dataframe

For example, I have a dataframe A like the one below:
a b c
x 0 2 1
y 1 3 2
z 0 2 4
I want to get the number of 0s in column 'a', which should return 2 (A[x][a] and A[z][a]).
Is there a simple way, or a function, to do this easily?
I've Googled it, but I only find articles like this one:
count the frequency that a value occurs in a dataframe column
which builds a new dataframe and is more complicated than what I need.
Use sum with a boolean mask; True values are counted as 1, so the output is the number of 0 values:
out = A.a.eq(0).sum()
print (out)
2
Try value_counts from pandas:
A.a.value_counts()[0]
If the searched value can vary, use df[column_name].value_counts()[searched_value].
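A short, self-contained check of both counting approaches (the frame construction is assumed from the question's sample):

import pandas as pd

# Sample frame reconstructed from the question
A = pd.DataFrame({'a': [0, 1, 0], 'b': [2, 3, 2], 'c': [1, 2, 4]},
                 index=['x', 'y', 'z'])

print(A.a.eq(0).sum())           # 2  (boolean mask summed)
print(A['a'].value_counts()[0])  # 2  (integer key, since the values are ints)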
