Can you reverse the output to count on another key? [closed] - python

Closed 3 years ago.
I'd like to count the number of IDs by how many times each appears in the data.
Now I got
U6492ea665413f304b323fea3e7f76739 7
Uf873b1e4dfc9f18d92758020dc1435c6 7
Ua30d2a8da85ac1144f9cbbf390c10d3c 7
Uf169ffec7dc767b89694a26cb057a258 7
U9e9c89c308d6c2f77dad28f8ec8e7993 7
...
The left is ID, and the right is how many times ID appears in data.
What I want to get is something like:
7 900
6 435
5 434
4 343
3 453
2 34
1 121
The left is the number of appearances. The right is the number of IDs.
uid = data['id']
col=uid.value_counts()
col
The information of the original data is below.

I think this is what you want to do: reset the index to get the IDs as a separate column, then group by the counts that you previously got and count the IDs.
df = col.reset_index()  # in pandas >= 2.0 the columns are named 'id' and 'count'
df.groupby(by='count')['id'].count()
(In pandas < 2.0, reset_index() names the columns 'index' and 'id' instead, so you would group by 'id' and count the 'index' column.)

uid = data['uid']
col = uid.value_counts()   # how many times each ID appears
col
num = col.value_counts()   # how many IDs share each appearance count
num
Repeating value_counts() has resolved the issue.
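As a minimal, self-contained sketch of that accepted approach (the data here is made up; the real column name is assumed to be uid):

```python
import pandas as pd

# Hypothetical data: each row is one occurrence of an ID
data = pd.DataFrame({'uid': ['a', 'a', 'a', 'b', 'b', 'c']})

col = data['uid'].value_counts()   # appearances per ID: a -> 3, b -> 2, c -> 1
num = col.value_counts()           # number of IDs per appearance count
print(num.to_dict())               # {3: 1, 2: 1, 1: 1}
```

The first value_counts() maps each ID to its frequency; the second counts how many IDs share each frequency, which is exactly the desired output shape.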

Related

How to find which doctor a patient is using, when only given a list of doctor's patients? (code improvement request) [closed]

Closed 1 year ago.
I need to create a dataframe which lists all patients and their matching doctors.
I have a txt file with doctor/patient records organized in the following format:
Doctor_1: patient23423,patient292837,patient1232423...
Doctor_2: patient456785,patient25363,patient23425665...
And a list of all unique patients.
To do this, I imported the txt file into a doctorsDF dataframe, separated by a colon. I also created a patientsDF dataframe with 2 columns: 'Patients' filled from the patient list, and 'Doctors' column empty.
I then ran the following:
for pat in patientsDF['Patient']:
    for i, doc in enumerate(doctorsDF[1]):
        if doctorsDF[1][i].find(str(pat)) >= 0:
            patientsDF['Doctor'][i] = doctorsDF.loc[i, 0]
        else:
            continue
This worked fine, and now all patients are matched with the doctors, but the method seems clumsy. Is there any function that can more cleanly achieve the result? Thanks!
(First StackOverflow post here. Sorry if this is a newb question!)
If you use Pandas, try:
df = pd.read_csv('data.txt', sep=':', header=None, names=['Doctor', 'Patient'])
df = df[['Doctor']].join(df['Patient'].str.strip().str.split(',').explode()).reset_index(drop=True)
Output:
>>> df
Doctor Patient
0 Doctor_1 patient23423
1 Doctor_1 patient292837
2 Doctor_1 patient1232423
3 Doctor_2 patient456785
4 Doctor_2 patient25363
5 Doctor_2 patient23425665
How to search:
>>> df.loc[df['Patient'] == 'patient25363', 'Doctor'].squeeze()
'Doctor_2'
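The same pipeline can be tried without a file by reading from an in-memory string (a sketch, assuming the two-line record format shown in the question):

```python
import io
import pandas as pd

text = "Doctor_1: patient23423,patient292837\nDoctor_2: patient456785,patient25363\n"
df = pd.read_csv(io.StringIO(text), sep=':', header=None, names=['Doctor', 'Patient'])

# strip() removes the space after the colon, split(',') builds a list per doctor,
# and explode() turns that list into one row per (doctor, patient) pair
df = df[['Doctor']].join(df['Patient'].str.strip().str.split(',').explode()).reset_index(drop=True)

print(df.loc[df['Patient'] == 'patient25363', 'Doctor'].squeeze())  # Doctor_2
```

The join works because explode() keeps the original row index, so each patient row lines up with its doctor.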

how to get Unique count from a DataFrame in case of duplicate index [closed]

Closed 1 year ago.
I am working on a dataframe. (The data was shown in an image.)
Q. I want the number of shows released per year, but if I apply the count() function it gives me 6 instead of 3. Could anyone suggest how I can get the correct count?
To get the unique count for a single year, you can use
count = len(df.loc[df['release_year'] == 1945, 'show_id'].unique())
# or
count = df.loc[df['release_year'] == 1945, 'show_id'].nunique()
To summarize unique counts per year across the whole dataframe, you can drop_duplicates() on the show_id column first.
df.drop_duplicates(subset=['show_id']).groupby('release_year').count()
Or use value_counts() on the column after dropping duplicates.
df.drop_duplicates(subset=['show_id'])['release_year'].value_counts()
df.groupby('release_year')['show_id'].nunique()
should do the job. (The originally posted df['show_id'].nunique().count() would raise an error, because nunique() already returns a single integer.)
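A small sketch of the difference, using made-up data with the column names assumed from the question (show s2 appears twice, mimicking the duplicate-index situation):

```python
import pandas as pd

df = pd.DataFrame({
    'show_id': ['s1', 's2', 's2', 's3'],
    'release_year': [1945, 1945, 1945, 1946],
})

# Plain count() counts duplicate rows...
print(df.loc[df['release_year'] == 1945, 'show_id'].count())    # 3
# ...while nunique() counts each show only once
print(df.loc[df['release_year'] == 1945, 'show_id'].nunique())  # 2

# Unique shows per year in one step
print(df.groupby('release_year')['show_id'].nunique().to_dict())  # {1945: 2, 1946: 1}
```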

How do I remove squared brackets from data that is saved as a list in a Dataframe in Python [closed]

Closed 2 years ago.
The data is like this and it is in a data frame.
PatientId Payor
0 PAT10000 [Cash, Britam]
1 PAT10001 [Madison, Cash]
2 PAT10002 [Cash]
3 PAT10003 [Cash, Madison, Resolution]
4 PAT10004 [CIC Corporate, Cash]
I want to remove the square brackets and filter all patients who used at least a certain mode of payment, e.g. Madison, then obtain their IDs. Please help.
This will generate a list of tuples (id, payor), assuming the Payor values are strings like "[Cash, Britam]" (df is the dataframe):
payment = 'Madison'
# [1:-1] strips the surrounding square brackets from the string
ids = [(id, df.Payor[i][1:-1]) for i, id in enumerate(df.PatientId) if payment in df.Payor[i]]
Let's say your data frame variable is initialized as "df", and after removing the square brackets you want to filter all rows containing "Madison" in the "Payor" column:
# Escape the brackets (they are regex metacharacters) and assign the result back;
# replace() returns a new frame rather than modifying df in place
df = df.replace({r'\[': '', r'\]': ''}, regex=True)
filteredDf = df.loc[df['Payor'].str.contains("Madison")]
print(filteredDf)
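A self-contained sketch of that fixed version, with the sample data assumed from the question (an unescaped '[' would make pandas raise a regex error, which is why the backslashes matter):

```python
import pandas as pd

df = pd.DataFrame({
    'PatientId': ['PAT10000', 'PAT10001', 'PAT10002', 'PAT10003'],
    'Payor': ['[Cash, Britam]', '[Madison, Cash]', '[Cash]', '[Cash, Madison, Resolution]'],
})

# Escape the brackets (regex metacharacters) and assign the result back
df = df.replace({r'\[': '', r'\]': ''}, regex=True)

ids = df.loc[df['Payor'].str.contains('Madison'), 'PatientId'].tolist()
print(ids)  # ['PAT10001', 'PAT10003']
```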

Python :Select the rows for the most recent entry from multiple users [closed]

Closed 4 years ago.
I have a dataframe df with 3 columns :
df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Values': ['x', 'y', 'z', 'p', 'q', 'r', 's'],
    'Date': [14, 11, 14, 12, 13, 10, 14]
})
I want to create a new dataframe containing, for each user, the row(s) with the highest value in the 'Date' column. For the above dataframe the desired output (originally attached as a jpeg image) is the Date-14 rows for users A, B and C.
Can anyone help me with this problem?
This answer keeps every row that ties for the per-user maximum in the Date column:
def get_max(group):
    return group[group.Date == group.Date.max()]

df.groupby('User').apply(get_max).reset_index(drop=True)
Output:
   Date User Values
0    14    A      x
1    14    B      z
2    14    C      s
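An alternative sketch without apply, using sort_values plus drop_duplicates; note this keeps exactly one row per user (the first one at the maximum Date), whereas the groupby version above keeps all ties:

```python
import pandas as pd

df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Values': ['x', 'y', 'z', 'p', 'q', 'r', 's'],
    'Date': [14, 11, 14, 12, 13, 10, 14],
})

# Sort so the highest Date comes first, then keep the first row per user
latest = (df.sort_values('Date', ascending=False)
            .drop_duplicates('User')
            .sort_values('User')
            .reset_index(drop=True))
print(latest)
```

Because pandas sorts are stable, ties at the maximum Date resolve to the row that appeared first in the original frame.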

Counting different names in python [closed]

Closed 6 years ago.
I have a file and want to count a few names in it. The problem is that one of the names appears in more than one spelling. What can I do to count them as one name rather than as different names?
For example:
LR = lrr = LRr = lrrs: they are all the same thing, but when I want to count them they are treated as different names.
Thank you
It is not easy, and this solution is simplified: first read_csv, then convert all letters to lowercase, then replace one or more trailing 's' characters with an empty string, then collapse repeated letters into a single letter, and finally value_counts.
Be aware that words which legitimately end with 's' have it stripped too.
df = pd.read_csv('file.csv')
#sample DataFrame
df = pd.DataFrame({'names': ['LR','lrr','LRr','lrrs', 'lrss', 'lrsss']})
print (df)
names
0 LR
1 lrr
2 LRr
3 lrrs
4 lrss
5 lrsss
print (df.names.str.lower().str.replace(r's{1,}$', '', regex=True).str.replace(r'(.)\1+', r'\1', regex=True))
0 lr
1 lr
2 lr
3 lr
4 lr
5 lr
Name: names, dtype: object
print (df.names.str.lower()
        .str.replace(r's{1,}$', '', regex=True)
        .str.replace(r'(.)\1+', r'\1', regex=True)
        .value_counts())
lr 6
Name: names, dtype: int64
