Count occurrences in index - python

I have a dataframe like this:
         brand1  brand2  brand3
survey       22      33      12
clothes      19      22      19
shoes        34      12      15
What I'd like to do is count how many clothes I have and how many shoes in total, not taking into consideration the categories. I'm not sure how to do this since "survey" is not a column.
I basically want this:
survey
clothes 100
shoes 100
Any advice would be helpful.

Try:
df.sum(axis=1)
This gives the sum of the values in each row. To display the result, you can build a dictionary whose keys are the row labels from the survey index and whose values are the row sums (perhaps after storing them in a list).
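For example, a minimal sketch that rebuilds the dataframe from the question (the numbers are the ones shown above):

import pandas as pd

# rebuild the example dataframe; 'survey', 'clothes', 'shoes' are row labels here
df = pd.DataFrame({'brand1': [22, 19, 34],
                   'brand2': [33, 22, 12],
                   'brand3': [12, 19, 15]},
                  index=['survey', 'clothes', 'shoes'])

totals = df.sum(axis=1)    # sum across the brand columns for each row
print(totals['clothes'])   # 60
print(totals['shoes'])     # 61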

Related

How to count text event type and transform it into country-year data using pandas?

I am trying to convert a dataframe where each row is a specific event and each column has information about the event. I want to turn this into data in which each row is a country and year, with information about the number and characteristics of the events in that year. In this data set, each event is an occurrence of terrorism, and I want to count the number of events where the "target" is a government building. One of the columns is called "targettype" or "targettype_txt", and there are 5 different entries in this column I want to count (government building, military, police, diplomatic building, etc.). The target type is also coded as a number if that is easier (i.e. there is another column where a government building is 2, a military installation is 4, etc.).
FYI: this data set covers 16 countries in West Africa over the years 2000-2020, with roughly 8,000 events recorded in total. The data comes from the Global Terrorism Database, and this is for a thesis/independent research project (i.e. not a graded class assignment).
Right now my data looks like this (there are a ton of other columns but they aren't important for this):
eventID   iyear  country_txt  nkill  nwounded  nhostages  targettype_txt
10000102  2000   Nigeria          3        10          0  government building
10000103  2000   Mali             1         3         15  military installation
10000103  2000   Nigeria         15         0          0  government building
10000103  2001   Benin            1         0          0  police
10000103  2001   Nigeria          1         3         15  private business
...
And I would like it to look like this:
country_txt  iyear  total_nkill  total_nwounded  total_nhostages  total_public_target
Nigeria       2000          200             300              300                   15
Nigeria       2001          250             450               15                   17
I was able to get the totals for nkill, nwounded, and nhostages using this super simple line:
df2 = cdf.groupby(['country', 'country_txt', 'iyear'])[['nkill', 'nwound', 'nhostkid']].sum()
But this is a little different because I want to only count certain entries and sum up the total number of times they occur. Any thoughts or suggestions are really appreciated!
Try:
cdf['CountCondition'] = ((cdf['targettype_txt'] == 'government building') |
                         (cdf['targettype_txt'] == 'military installation') |
                         (cdf['targettype_txt'] == 'police'))
df2 = cdf[cdf['CountCondition']].groupby(['country', 'country_txt', 'iyear']).count()
You create a new column 'CountCondition' that simply marks, as True or False, whether the condition holds for each row. Then you count the number of rows where CountCondition is True. Hope this makes sense.
It is possible to combine all of this into one statement and not create the extra column, but the statement gets quite convoluted and harder to follow:
df2 = cdf[(cdf['targettype_txt'] == 'government building') |
          (cdf['targettype_txt'] == 'military installation') |
          (cdf['targettype_txt'] == 'police')].groupby(['country', 'country_txt', 'iyear']).count()
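If you want the sums and the count of "public" targets in a single pass, a hedged sketch along these lines may work; the column names follow the question, and the exact list of target types is an assumption you should adjust to your five categories:

import pandas as pd

public_targets = ['government building', 'military installation', 'police']  # assumed list

# flag each event, then aggregate per country and year; True sums as 1
cdf['public_target'] = cdf['targettype_txt'].isin(public_targets)
df2 = (cdf.groupby(['country_txt', 'iyear'])
          .agg(total_nkill=('nkill', 'sum'),
               total_nwounded=('nwound', 'sum'),
               total_nhostages=('nhostkid', 'sum'),
               total_public_target=('public_target', 'sum')))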

lookup within filtered range

I have a dataframe with data from an e-commerce panel.
It has orders and returns mixed together.
Each row has an orderID; it's the same number for a normal order and for the corresponding return that comes back from the customer.
My data looks like this:
orderID  Shop  Revenue  Note
     44     0      -32  Return
     45     0     -100  Return
     44     1       14
     45     3       20  Something else
     46     2       50
     47     1       80  Something
     48     2      222
For each return I want to find the 'Shop' value that corresponds to the original order.
For example: 'orderID' == 44 comes twice, once as a return (with 'Shop' == 0) and once as a normal order (with 'Shop' == 1).
I want to replace all the 0 values in the 'Shop' column with the values from the corresponding original orders.
My desired output looks like this:
orderID  Shop  Revenue  Note
     44     1      -32  Return
     45     3     -100  Return
     44     1       14
     45     3       20  Something else
     46     2       50
     47     1       80  Something
     48     2      222
I know how to do it in Google Sheets (first I filter the table, removing rows with 'Shop' == 0, and then I VLOOKUP the numbers in this filtered array).
I know how to filter this table using pandas, but I don't know how to write the lookup.
I assume I will need a temporary column first, where I store both types of values: copied over for normal orders and looked up for returns.
The original dataframe has 1,000,000+ rows.
My data in .csv is available here:
https://docs.google.com/spreadsheets/d/e/2PACX-1vQAJ4tMc_Bcvv-4FsUy3E7sG0m9hm-nLTVLj-LwlSEns-YJ1pbq6gSKp5mj5lZqRI2EgHOsOutwnn1I/pub?gid=0&single=true&output=csv
Thank you for any advice!
IIUC, using map:
m = df.query('Shop != 0').set_index('orderID')['Shop']
df['Shop'] = df['orderID'].map(m)
print(df)
Output:
   orderID  Shop  Revenue            Note
0       44     1      -32          Return
1       45     3     -100          Return
2       44     1       14             NaN
3       45     3       20  Something else
4       46     2       50             NaN
5       47     1       80       Something
6       48     2      222             NaN
Create a pd.Series by using query to filter out the zero shops, then set_index and map the shops to orderID.
This works if there is a one-to-one shop-to-order mapping. If you have multiple shops per order, then you'll need logic to determine which shop is valid.
If you have duplicate orders to the same shop, then you need to drop_duplicates first.
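A minimal sketch of that duplicate-safe variant, assuming the column names from the question:

# keep the first non-zero shop seen for each orderID, then map it back
m = (df.query('Shop != 0')
       .drop_duplicates('orderID')
       .set_index('orderID')['Shop'])
df['Shop'] = df['orderID'].map(m)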

Is it possible to do full-text search in a pandas dataframe?

Currently, I'm using pandas DataFrame.filter to filter the records of the dataset. If I give one word, I get all the records matching that word. But if I give two words that are present in the dataset, just not in the same record, I get an empty set. Is there any way, in pandas or another Python module, to search for multiple words that are not in one record?
With a Python list comprehension, we can build a full-text search by mapping. In pandas, DataFrame.filter uses indexing. Is there any difference between mapping and indexing? If yes, what is it, and which gives better performance?
CustomerID  Genre   Age  AnnualIncome (k$)  SpendingScore (1-100)
         1  Male     19                 15                     39
         2  Male     21                 15                     81
         3  Female   20                 16                      6
         4  Female   23                 16                     77
         5  Female   31                 17                     40
pokemon[pokemon['CustomerID'].isin([200, 5])]
Output:
CustomerID Genre Age AnnualIncome (k$) SpendingScore (1-100)
5 Female 31 17 40
200 Male 30 137 83
Name Qty.
0 Apple 3
1 Orange 4
2 Cake 5
Considering the above dataframe, if you want to find quantities of Apples and Oranges, you can do it like this:
result = df[df['Name'].isin(['Apple','Orange'])]
print (result)
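For the multi-word case from the question, where the words may sit in different records, one hedged option is str.contains with an alternation pattern; the column name 'Name' is taken from the example above:

import re

words = ['Apple', 'Orange']
pattern = '|'.join(re.escape(w) for w in words)  # matches ANY of the words

# rows containing at least one of the words; the words need not share a record
result = df[df['Name'].str.contains(pattern, case=False, na=False)]
print(result)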

Problem using groupby on a list column

I'm using the MovieLens 1M dataset to learn pandas, and I want to get some data based on the genres column.
The rows of the dataframe I get look like this:
movieid title genres rating userid gender age occupation zipcode timestamp
1000204 2198 Modulations (1998) [Documentary] 5 5949 M 18 17 47901 958846401
1000205 2703 Broken Vessels (1998) [Drama] 3 5675 M 35 14 30030 976029116
1000206 2845 White Boys (1999) [Drama] 1 5780 M 18 17 92886 958153068
1000207 3607 One Little Indian (1973) [Comedy, Drama, Western] 5 5851 F 18 20 55410 957756608
1000208 2909 Five Wives, Three Secretaries and Me (1998) [Documentary] 4 5938 M 25 1 35401 957273353
I want to use df.groupby('genres') to group the dataframe and then get the count of movies in each genre and the mean rating for each genre.
However, when I use df.groupby('genres').mean(), I get this error:
"TypeError: unhashable type: 'list'"
Please tell me why this error happened and how I can use groupby on a column whose values are lists.
Thanks very much!
The error happens because the values in the genres column are Python lists, and lists are unhashable, so pandas cannot use them as group keys. Explode the column first so each genre gets its own row, then group:
df.explode('genres').groupby('genres')['rating'].mean()
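A minimal sketch, assuming a genres column of lists as shown in the question:

import pandas as pd

df = pd.DataFrame({'title': ['Modulations (1998)', 'One Little Indian (1973)'],
                   'genres': [['Documentary'], ['Comedy', 'Drama', 'Western']],
                   'rating': [5, 5]})

exploded = df.explode('genres')                        # one row per (movie, genre) pair
counts = exploded.groupby('genres').size()             # number of movies per genre
means = exploded.groupby('genres')['rating'].mean()    # mean rating per genre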

Finding most common values from a count in a pandas Jupyter notebook

I currently have a massive dataset with a large number of rows, and I want to create a smaller dataframe that pulls only two columns from the larger one, along with how many times each name occurred in each chapter (here called 'Occurrence').
The code below is what I am using:
df1 = (Dec16.groupby(["BNF Chapter", "Name"]).size().reset_index(name="Occurrence"))
df1
It outputs this:
BNF Chapter  Name                                          Occurrence
          1  Aluminium hydroxide                                    2
          1  Aluminium hydroxide + Magnesium trisilicate            2
          1  Alverine                                             702
...
         21  Polihexanide                                           2
         21  Potassium hydroxide                                   32
         21  Sesame oil                                            22
         21  Sodium chloride                                      222
What I would like to get is the top 10 most frequently occurring names for a certain chapter, since the dataset is so large. For example, a dataframe that pulls only the top 10 most common names in chapter 1.
How would I go about doing this?
Many thanks!!!
You can use pandas.DataFrame.count; a guide on counting values in a pandas dataframe can help you out here, I hope.
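For the top 10 itself, a hedged sketch using nlargest on the df1 built in the question (assuming 'BNF Chapter' holds integers):

# filter to one chapter, then keep the ten largest 'Occurrence' values
top10_ch1 = df1[df1['BNF Chapter'] == 1].nlargest(10, 'Occurrence')
print(top10_ch1)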
