idx_list = []
for idx, row in df_quries_copy.iterrows():
    for brand in brand_name:
        if row['user_query'].contains(brand):
            idx_list.append(idx)
        else:
            continue
The brand_name list looks like this:
brand_name = ['Apple', 'Lenovo', 'Samsung', ... ]
I have a df_queries DataFrame which holds the query each user used. The table looks like this:

user_query      user_id
Apple Laptop    A
Lenovo 5GB      B
I also have the brand names as a list. I want to find the users whose queries relate to a brand, such as 'Apple Laptop'. But when I run the script, I get a message saying:
'str' object has no attribute 'contains'
What am I supposed to do to make the nested for loop work? Thank you in advance.
for brand in brand_name[:100]:
    if len(copy_df[copy_df['user_query'].str.contains(brand)]) > 0:
        ls.append(copy_df[copy_df['user_query'].str.contains(brand)].index)
    else:
        continue
I tried it like the answer, but the whole DataFrame suddenly came out as the result.
You can use df_quries_copy[df_quries_copy['user_query'].str.contains(brand)].index to get the index directly:
for brand in brand_name:
    df_quries_copy[df_quries_copy['user_query'].str.contains(brand)].index
Or, in your code, use brand in row['user_query'], since row['user_query'] is a string value.
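Putting the two fixes together, a minimal sketch (with made-up sample data standing in for the real df_quries_copy) might look like this:

```python
import pandas as pd

# Made-up sample data mirroring the question's table.
df_quries_copy = pd.DataFrame({
    'user_query': ['Apple Laptop', 'Lenovo 5GB', 'cheap mouse'],
    'user_id': ['A', 'B', 'C'],
})
brand_name = ['Apple', 'Lenovo', 'Samsung']

# Option 1: vectorized -- .str.contains works on a Series, not on a str.
idx_list = []
for brand in brand_name:
    idx_list.extend(df_quries_copy[df_quries_copy['user_query'].str.contains(brand)].index)

# Option 2: plain-Python membership test inside the original nested loop.
idx_list_2 = []
for idx, row in df_quries_copy.iterrows():
    for brand in brand_name:
        if brand in row['user_query']:   # str has no .contains, but 'in' works
            idx_list_2.append(idx)

print(sorted(set(idx_list)))   # indices of rows whose query mentions a brand
```

Deduplicating with set matters because a query that mentions two brands would otherwise be appended twice.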
I've tried doing some searching, but I'm having trouble finding what I specifically need. I currently have this:
location = 'Location'
data = pd.read_csv('testbook.csv')
df = pd.DataFrame(data)
search = 'OR'  # This will be replaced with an input
row = df[df.eq(search).any(axis=1)]
print(row)
Location = row.at[0, location]
print(Location)
This outputs:
row print out
Location City Price Etc
0 FL OR 50 123
Location print out
FL
This is the CSV information that it's pulling the data from.
My main question and issue concerns this specific line of code:
Location = row.at[0, location]
What I'm trying to do, and want to see if it is possible, is to automate the values in the brackets [0, location]. For example, in the future, instead of 'OR' I might need to find what data is in 'OR1'. The issue is that the 0 refers to the row number, as shown here (this is the entire df):
Location City Price Etc
0 FL OR 50 123
1 FL1 OR1 501 1231
2 FL2 OR2 502 1232
I would have to manually change the code every single time, which of course is unfeasible for what I'm trying to accomplish.
My main question is, how do I pull specific row numbers all the way on the left and take that output and make it a variable that I can input anywhere?
I'm having a bit of trouble figuring out what you are looking for, but this is my best guess:
import pandas as pd

data = {'Location': ['FL', 'FL1', 'FL2'],
        'City': ['OR', 'OR1', 'OR2'],
        'Price': [50, 501, 502],
        'Etc': [123, 1231, 1232]}
df = pd.DataFrame(data)

# Given search term -> find location
search = 'OR'

# Outputs 'FL'
df.loc[df['City'] == search, 'Location'].iloc[0]
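To answer the row-number part of the question directly: the matching row's label can be captured in a variable instead of hard-coding 0. A sketch using the same made-up table (the search term would come from input() in the real script):

```python
import pandas as pd

df = pd.DataFrame({'Location': ['FL', 'FL1', 'FL2'],
                   'City': ['OR', 'OR1', 'OR2'],
                   'Price': [50, 501, 502],
                   'Etc': [123, 1231, 1232]})

search = 'OR1'  # placeholder for the future input()

# Index labels of every row where any cell equals the search term.
matches = df.index[df.eq(search).any(axis=1)]

row_number = matches[0]                  # first matching row label
Location = df.at[row_number, 'Location']
print(row_number, Location)              # -> 1 FL1
```

Because row_number is now a variable, df.at[row_number, column] can be reused anywhere without editing the code per search term.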
I have some code which is (simplified) like this. The actual data lists are tens of thousands in size, not just 3.
There is a dictionary of staff which I make a DataFrame from.
There is a list of dictionary objects which contain additional staff information.
Also:
The staff list and the extra staff information (master_info_list) overlap but each has items that are unique to them.
The "index" I am using (StaffNumber) is actually prefixed with "SN_" in the extra staff information, so I can't compare them directly.
The duplication of StaffNumber in the master_info_list is intended (that's just how I receive it!).
What I want to do is populate two new columns into the dataframe which get their data from the extra staff information. I can do this by making 2 separate calls to get_department_and_manager, one for Department and one for Manager. That works. But, it "feels" like I should be able to take 2 fields from the output of get_department_and_manager and populate the dataframe in one go, but I'm struggling to get the syntax right. What is the correct syntax (if possible)? Also, iterating through the list the way I do (with a for loop) seems inefficient. Is there a better way?
The examples I have seen all seem to create new columns from existing data in the dataframe, or they are simple examples where no mashing of data is required before comparing the two "lists" (or list and dictionary).
import pandas as pd

def get_department_and_manager(row, master_list):
    dept = 'bbb'
    manager = 'aaa'
    for i in master_list:
        if i['StaffNumber'] == 'SN_' + row['StaffNumber']:
            dept = i['data']['Department']
            manager = i['data']['Manager']
            break
    return [dept, manager]

staff = {'Name': ['Alice', 'Bob', 'Dave'],
         'StaffNumber': ['001', '002', '004']}

master_info_list = [{'StaffNumber': 'SN_001', 'data': {'StaffNumber': 'SN_001', 'Department': 'Sales', 'Manager': 'Luke'}},
                    {'StaffNumber': 'SN_002', 'data': {'StaffNumber': 'SN_002', 'Department': 'Marketing', 'Manager': 'Mary'}},
                    {'StaffNumber': 'SN_003', 'data': {'StaffNumber': 'SN_003', 'Department': 'IT', 'Manager': 'Neal'}}]

df = pd.DataFrame(data=staff)
df[['Department']['Manager']] = df.apply(get_department_and_manager, axis='columns', args=[master_info_list])
print(df)
If I understand you correctly, you can use .merge:
x = pd.DataFrame([v["data"] for v in master_info_list])
x["StaffNumber"] = x["StaffNumber"].str.split("_").str[-1]
print(df.merge(x, on="StaffNumber", how="left"))
Prints:
Name StaffNumber Department Manager
0 Alice 001 Sales Luke
1 Bob 002 Marketing Mary
2 Dave 004 NaN NaN
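For completeness, the apply-based version the question was aiming for also works once the target columns are written as a single list and the result is expanded into columns; a sketch reusing the question's own data and helper, not the only way to write it:

```python
import pandas as pd

def get_department_and_manager(row, master_list):
    # Fall-back values used when no extra info exists for this staff number.
    dept = 'bbb'
    manager = 'aaa'
    for i in master_list:
        if i['StaffNumber'] == 'SN_' + row['StaffNumber']:
            dept = i['data']['Department']
            manager = i['data']['Manager']
            break
    return [dept, manager]

staff = {'Name': ['Alice', 'Bob', 'Dave'],
         'StaffNumber': ['001', '002', '004']}
master_info_list = [
    {'StaffNumber': 'SN_001', 'data': {'StaffNumber': 'SN_001', 'Department': 'Sales', 'Manager': 'Luke'}},
    {'StaffNumber': 'SN_002', 'data': {'StaffNumber': 'SN_002', 'Department': 'Marketing', 'Manager': 'Mary'}},
    {'StaffNumber': 'SN_003', 'data': {'StaffNumber': 'SN_003', 'Department': 'IT', 'Manager': 'Neal'}},
]

df = pd.DataFrame(data=staff)
# One list of column names on the left, result_type='expand' on the right,
# lets a single apply() call populate both new columns at once.
df[['Department', 'Manager']] = df.apply(get_department_and_manager,
                                         axis='columns',
                                         result_type='expand',
                                         args=[master_info_list])
print(df)
```

The merge-based answer above is still preferable for tens of thousands of rows, since apply runs the Python loop once per row.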
I have a dictionary of dataframes called names_and_places in pandas that looks like the below.
names_and_places:
Alfred,,,
Date,F_1,F_2,Key
4/1/2020,1,4,NAN
4/2/2020,2,5,NAN
4/3/2020,3,6,"[USA,NY,NY, NY]"
Brett,,,
Date,F_1,F_2,Key
4/1/2020,202,404,NAN
4/2/2020,101,401,NAN
4/3/2020,102,403,"[USA,CT, Fairfield, Stamford] "
Claire,,,
Date,F_1,F_2,Key
4/1/2020,NAN,12,NAN
4/2/2020,NAN,45,NAN
4/3/2020,7,78,"[USA,CT, Fairfield, Darian] "
Dane,,,
Date,F_1,F_2,Key
4/1/2020,4,17,NAN
4/2/2020,5,18,NAN
4/3/2020,7,19,"[USA,CT, Bridgeport, New Haven] "
Edward,,,
Date,F_1,F_2,Key
4/1/2020,4,17,NAN
4/2/2020,5,18,NAN
4/3/2020,7,19,"[USA,CT, Bridgeport, Milford] "
The Key column is either going to be NAN or of the form [Country, State, County, City], but it can have 3 or 4 elements (sometimes County is absent). I need to find all the names whose Key contains a given element. For instance, if the element = "CT", the script returns Edward, Brett, Dane and Claire (order is not important). If the element = "Stamford", then only Brett is returned. However, I am going about the identification process in a way that seems very inefficient. I basically have variables that iterate through each possible combination of State, County and City (all of which I currently input manually) to identify which names to extract, like below:
country = 'USA'  # this never needs to change
element = 'CT'
# These next two are actually in .txt files that I create once I am asked for
# a given breakdown, but I would like to not have to manually input these
middle_node = ['Fairfield', 'Bridgeport']
terminal_nodes = ['Stamford', 'Darian', 'New Haven', 'Milford']
names = []
for a in middle_node:
    for b in terminal_nodes:
        my_key = [country, element, a, b]
        for s in names_and_places:
            for z in names_and_places[s]['Key']:
                if my_key == z:
                    names.append(s)
# Note: having "if my_key in names_and_places[s]['Key']:" was causing
# sporadic failures for some reason
display(names)
Output:
Edward, Brett, Dane, Claire
What I would like is to input only the variable element, which can be a level 2 (State), level 3 (County), or level 4 (City) node. However, short of adding additional for loops and going into the Key column, I don't know how to do this. The one benefit (for a novice like myself) is that the double for loop keeps the bucketing intact and makes it easier for people to see where names come from when that is also needed.
But is there a better way? For bonus points: is there a way to handle the case where the element is 'NY' and values in the Key column can be like [USA, NY, NY, NY] or [USA, NY, NY, Queens]?
Edit: names_and_places is a dictionary with names as the keys, so
display(names_and_places['Alfred'])
would be
Date,F_1,F_2,Key
4/1/2020,1,4,NAN
4/2/2020,2,5,NAN
4/3/2020,3,6,"[USA,NY,NY, NY]"
I do have the raw dataframe, which has the columns: Date, Field Name, Value, Name.
Field Name is either F_1, F_2 or Key, and Value is the associated value of that field. I then pivot the data on Name with columns of Field Name to make my extraction easier.
Here's a somewhat more effective way to do that: start by building a single dataframe out of the dictionary, then do the actual work on that dataframe.
import numpy as np
import pandas as pd

single_df = pd.concat([df.assign(name=k) for k, df in names_and_places.items()])
single_df["Key"] = single_df.Key.replace("NAN", np.nan)
single_df.dropna(inplace=True)

# Since the location is a string, we have to parse it.
location_df = single_df.Key.str.replace(r"[\[\]]", "", regex=True).str.split(",", expand=True)
location_df.columns = ["Country", "State", "County", "City"]
single_df = pd.concat([single_df, location_df], axis=1)

# This is where the actual query goes.
single_df[(single_df.Country == "USA") & (single_df.State == "CT")].name
The output is:
2 Brett
2 Claire
2 Dane
2 Edward
Name: name, dtype: object
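Building on that single dataframe, the "input only the element" requirement (including the bonus NY case) can be met by matching the element against every parsed level at once. A sketch with a tiny made-up names_and_places dict, with the whitespace inside the Key strings already trimmed for simplicity:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature version of names_and_places.
names_and_places = {
    "Alfred": pd.DataFrame({"Key": ["NAN", "[USA,NY,NY,NY]"]}),
    "Brett": pd.DataFrame({"Key": ["NAN", "[USA,CT,Fairfield,Stamford]"]}),
}

# Build one dataframe; ignore_index avoids duplicate index labels.
single_df = pd.concat([df.assign(name=k) for k, df in names_and_places.items()],
                      ignore_index=True)
single_df["Key"] = single_df.Key.replace("NAN", np.nan)
single_df = single_df.dropna()

# Parse the bracketed string into one column per level.
location_df = single_df.Key.str.replace(r"[\[\]]", "", regex=True).str.split(",", expand=True)
location_df.columns = ["Country", "State", "County", "City"]

def names_for(element):
    # True wherever any of the four levels equals the element,
    # so 'NY' matches whether it appears as State, County, or City.
    mask = location_df.eq(element).any(axis=1)
    return sorted(single_df.loc[mask, "name"].unique())

print(names_for("CT"))   # -> ['Brett']
print(names_for("NY"))   # -> ['Alfred']
```

With the real data, the Key strings contain stray spaces after the commas, so a .str.strip() on each parsed column would be needed before the equality test.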
Most profitable element for every category
The program must read my file and determine the most profitable element for every category, within a range of dates given by user input.
File:
Date|Category|Name|Price
05/01/2016|category6|Name8|4200
06/01/2016|category1|Name1|1000
07/01/2016|category2|Name2|1200
07/01/2016|category3|Name1|1000
07/01/2016|category1|Name2|1200
07/01/2016|category3|Name2|1200
07/01/2016|category2|Name1|1000
07/01/2016|category2|Name2|1200
07/01/2016|category2|Name2|1200
08/01/2016|category2|Name1|1000
09/01/2016|category4|Name7|3100
My file will be a lot bigger; this is just an example.
Start Date : 07/01/2016
End Date: 07/01/2016
For every date in that range program will print most profitable element for every category
Category 1:
07/01/2016|category1|Name2|1200
Name2 = 1200
Comparing prices >>> Most profitable is: Name2
Category 2:
07/01/2016|category2|Name2|1200
07/01/2016|category2|Name1|1000
07/01/2016|category2|Name2|1200
07/01/2016|category2|Name2|1200
Name1 = 1000
Name2 = 3600
Comparing prices >>> Most profitable: Name2
Category 3:
07/01/2016|category3|Name1|1000
07/01/2016|category3|Name2|1200
Name1: 1000
Name2: 1200
Comparing prices >>> Most profitable: Name2
The problem is I don't know how to compare these prices across categories and names.
Also, the dates will always be in ascending order.
I'm using both a dictionary and lists.
INPUT AND OUTPUT:
Start Date : 07/01/2016
End Date: 07/01/2016
Category1; Most profitable is: Name2
Category2; Most profitable is: Name2
Category3; Most profitable is: Name2
In this case, the most profitable is Name2 for every category.
The following is not exactly what you need but should give you a fair idea to get going. I keep track of the most profitable name and value for combinations of date and category:
date_cat_profit_dict = {}

with open('data.txt') as f:
    next(f)  # skip the "Date|Category|Name|Price" header line
    for line in f:
        # Split and store into variables.
        # You could skip processing the line
        # if you are looking for a specific date.
        date, category, name, profit = line.strip().split('|')

        # Convert to int for comparison
        profit = int(profit)

        # Key for storing into the dict
        composite_key = '{0}|{1}'.format(date, category)

        # _ because we don't need the name right now
        _, max_profit = date_cat_profit_dict.setdefault(composite_key, ('', 0))

        if max_profit < profit:
            date_cat_profit_dict[composite_key] = (name, profit)

for composite_key, (name, profit) in date_cat_profit_dict.items():
    print('Max for {0} : {1}, {2}'.format(composite_key, name, profit))
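Since the expected output sums each name's revenue inside the date range before comparing (Name2 = 3600 for category2, not 1200), a sketch that adds the date-range filter and the per-name totals might look like the following. The file name data.txt, and reading the dates as day/month/year, are assumptions taken from the question:

```python
from datetime import datetime

def most_profitable(path, start, end):
    """Return {category: most profitable name} for rows within [start, end]."""
    fmt = '%d/%m/%Y'  # assumed day/month/year, per the sample dates
    start_d = datetime.strptime(start, fmt)
    end_d = datetime.strptime(end, fmt)

    totals = {}  # totals[category][name] = summed price inside the range
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            date, category, name, price = line.strip().split('|')
            d = datetime.strptime(date, fmt)
            if start_d <= d <= end_d:
                totals.setdefault(category, {}).setdefault(name, 0)
                totals[category][name] += int(price)

    # For each category, pick the name with the highest summed price.
    return {cat: max(names, key=names.get) for cat, names in totals.items()}

# e.g. most_profitable('data.txt', '07/01/2016', '07/01/2016')
```

For the sample file and the range 07/01/2016 to 07/01/2016, this would report Name2 for category1, category2 and category3, matching the desired output.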