How to update a dynamic list in Python? [closed]

I have a list Business = ['Company name','Mycompany','Revenue','1000','Income','2000','employee','3000','Facilities','4000','Stock','5000'], whose structure, printed in pairs, looks like this:
Company Mycompany
Revenue 1000
Income 2000
employee 3000
Facilities 4000
Stock 5000
The dynamic list gets updated on every iteration, and some of the items in the list can be missing. For example, on execution 1 the list comes back as below:
Company Mycompany
Income 2000 #revenue is missing
employee 3000
Facilities 4000
Stock 5000
In the list above, Revenue has been removed because the company has no revenue. In a second example:
Company Mycompany
Revenue 1000
Income 2000
Facilities 4000 #Employee is missing
Stock 5000
In example 2 above, Employee is missing. How do I create an output list that replaces the missing values with 0? In example 1, Revenue is missing, so I need to insert ['Revenue','0'] at its original position in the output list. For better understanding, please see below.
Output list created for example 1 (Revenue replaced with 0):
Company Mycompany| **Revenue 0**| Income 2000| employee 3000| Facilities 4000| Stock 5000
Output list for example 2: employee is replaced with 0
Company Mycompany| Revenue 1000| Income 2000| **employee 0**| Facilities 4000| Stock 5000
How can I build the output list, replacing missing items with 0, without changing the structure of the list? My code so far:
for line in Business:
    if 'Company' not in line:
        Business.insert(0, 'company')
        Business.insert(1, '0')
    if 'Revenue' not in line:
        # got stuck here
    if 'Income' not in line:
        # got stuck here
    if 'Employee' not in line:
        # got stuck here
    if 'Facilities' not in line:
        # got stuck here
    if 'Stock' not in line:
        # got stuck here
Thanks a lot in advance

If you are getting the input as a list, you can convert it into a dict like this; you will then have much easier access to the data. Receiving it as a dictionary in the first place would be the better choice, though:
Business = ['Company name','Mycompany','Revenue',1000,'Income',2000,'employee',3000,'Facilities',4000,'Stock',5000]
BusinessDict = {Business[i]:Business[i+1] for i in range(0,len(Business)-1,2)}
print(BusinessDict)
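For reference, the print above on that exact sample list would show a dict along these lines:
{'Company name': 'Mycompany', 'Revenue': 1000, 'Income': 2000, 'employee': 3000, 'Facilities': 4000, 'Stock': 5000}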

As said in the comments, a dict is a much better data structure for the problem. If you really need the list, you could use a temporary dict like this:
example = ['Company name','Mycompany','Income','2000','employee','3000','Facilities','4000','Stock','5000']
template = ['Company name', 'Revenue', 'Income', 'employee', 'Facilities', 'Stock']
# build a temporary dict
exDict = dict(zip(example[::2], example[1::2]))
# work on it
result = []
for i in template:
    result.append(i)
    if i in exDict:
        result.append(exDict[i])
    else:
        result.append(0)
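With the example list above (where 'Revenue' is missing), result comes back in the original flat structure with the gap filled in:
print(result)
# ['Company name', 'Mycompany', 'Revenue', 0, 'Income', '2000', 'employee', '3000', 'Facilities', '4000', 'Stock', '5000']
Note that the filler is the integer 0; append '0' instead if every value should stay a string.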
A bit more efficient (but harder to understand for beginners) would be to create the temporary dict like this:
i = iter(example)
example_dict = dict(zip(i, i))
This works because zip evaluates its arguments lazily: both arguments here are the same iterator, so each pair produced by zip consumes two consecutive items from example.
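A quick check (not part of the original answer) that this pairing matches the slicing approach above:
i = iter(example)
assert dict(zip(i, i)) == dict(zip(example[::2], example[1::2]))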

You can use a dictionary like this:
d = {'Company': 0, 'Revenue': 0, 'Income': 0, 'employee': 0, 'Facilities': 0, 'Stock': 0}
given = [['Company', 'Mycompany'], ['Income', 2000], ['employee', 3000], ['Facilities', 4000], ['Stock', 5000]]
for i in given:
    d[i[0]] = i[1]
ans = []
for key, value in d.items():
    ans.append([key, value])
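If you need the flat structure from the question rather than a list of pairs, ans can be flattened afterwards; a small sketch building on the loop above:
flat = []
for key, value in ans:
    flat.append(key)
    flat.append(value)
# or, equivalently, as a comprehension:
flat = [item for pair in ans for item in pair]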

Related

python multiple for loop question (pandas, dataframe)

idx_list = []
for idx, row in df_quries_copy.iterrows():
    for brand in brand_name:
        if row['user_query'].contains(brand):
            idx_list.append(idx)
        else:
            continue
The brand_name list looks like below:
brand_name = ['Apple', 'Lenovo', 'Samsung', ... ]
I have a df_queries data frame which holds the query each user used; the table looks like below:
user_query      user_id
Apple Laptop    A
Lenovo 5GB      B
and also I have a brand name as a list
I want to find the users whose queries mention a brand, such as 'Apple Laptop'. But when I run the script, I get a message saying:
'str' object has no attribute 'contains'
How am I supposed to use multiple for loops here?
Thank you in advance.
for brand in brand_name[:100]:
    if len(copy_df[copy_df['user_query'].str.contains(brand)]) > 0:
        ls.append(copy_df[copy_df['user_query'].str.contains(brand)].index)
    else:
        continue
I tried it like the answer, but the whole dataframe suddenly came out as the result.
You can use df_quries_copy[df_quries_copy['user_query'].str.contains(brand)].index to get the index directly:
for brand in brand_name:
    df_quries_copy[df_quries_copy['user_query'].str.contains(brand)].index
Or in your code, use brand in row['user_query'] since row['user_query'] is a string value.
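Putting the two suggestions together, a minimal sketch of the corrected loop (the sample frame and brand list here are made up to mirror the question):
import pandas as pd

df_quries_copy = pd.DataFrame({'user_query': ['Apple Laptop', 'Lenovo 5GB'],
                               'user_id': ['A', 'B']})
brand_name = ['Apple', 'Lenovo', 'Samsung']

idx_list = []
for idx, row in df_quries_copy.iterrows():
    for brand in brand_name:
        if brand in row['user_query']:  # plain `in`, since row['user_query'] is a str
            idx_list.append(idx)

print(idx_list)  # [0, 1]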

Python Pandas row selection

I've tried doing some searching, but I'm having trouble finding what I specifically need. I currently have this:
location = 'Location'
data = pd.read_csv('testbook.csv')
df = pd.DataFrame(data)
search = 'OR' # This will be replaced with an input
row = (df[df.eq(search).any(1)])
print(row)
Location = row.at[0, location]
print(Location)
This outputs the following:
row print out
Location City Price Etc
0 FL OR 50 123
Location print out
FL
This is the CSV information that the data is pulled from.
My main question and issue is about this specific line of code:
Location = row.at[0, location]
What I'm trying to do, if possible, concerns the brackets [0, location].
I want to automate this in the future: for example, instead of 'OR' I might need to find what data is in 'OR1'. The issue is that the [0] refers to the row number, hence this (this is the entire df):
Location City Price Etc
0 FL OR 50 123
1 FL1 OR1 501 1231
2 FL2 OR2 502 1232
I would have to manually change the code every single time, which of course is unfeasible for what I'm trying to accomplish.
My main question is: how do I pull the specific row number shown all the way on the left, and turn that output into a variable I can use anywhere?
I'm having a bit of trouble figuring out what you are looking for, but this is my best guess:
import pandas as pd

data = {'Location': ['FL', 'FL1', 'FL2'],
        'City': ['OR', 'OR1', 'OR2'],
        'Price': [50, 501, 502],
        'Etc': [123, 1231, 1232]}
df = pd.DataFrame(data)
# Given search term -> find location
search = 'OR'
# Outputs 'FL'
df['Location'][df['City'] == search].any()
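If you'd rather get the row number shown on the far left (so it can be reused elsewhere), one option is to take the index of the matching rows instead of calling .any(); a sketch along those lines:
matches = df.index[df['City'] == search]
row_number = matches[0]                 # 0 for 'OR', 1 for 'OR1', and so on
print(df.loc[row_number, 'Location'])   # 'FL'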

Populate 2 columns of dataframe at the same time using apply function [duplicate]

This question already has answers here:
Apply Python function to one pandas column and apply the output to multiple columns
(4 answers)
Closed 1 year ago.
I have some code which is (simplified) like this. The actual data lists are tens of thousands in size, not just 3.
There is a dictionary of staff which I make a DataFrame from.
There is a list of dictionary objects which contain additional staff information.
Also:
The staff list and the extra staff information (master_info_list) overlap but each has items that are unique to them.
The "index" I am using (StaffNumber) is actually prefixed with "SN_" in the extra staff information, so I can't compare them directly.
The duplication of StaffNumber in the master_info_list is intended (that's just how I receive it!).
What I want to do is populate two new columns into the dataframe which get their data from the extra staff information. I can do this by making 2 separate calls to get_department_and_manager, one for Department and one for Manager. That works. But, it "feels" like I should be able to take 2 fields from the output of get_department_and_manager and populate the dataframe in one go, but I'm struggling to get the syntax right. What is the correct syntax (if possible)? Also, iterating through the list the way I do (with a for loop) seems inefficient. Is there a better way?
The examples I have seen all seem to create new columns from existing data in the dataframe, or they are simple examples where no mashing of data is required before comparing the two "lists" (or list and dictionary).
import pandas as pd

def get_department_and_manager(row, master_list):
    dept = 'bbb'
    manager = 'aaa'
    for i in master_list:
        if i['StaffNumber'] == 'SN_' + row['StaffNumber']:
            dept = i['data']['Department']
            manager = i['data']['Manager']
            break
    return [dept, manager]

staff = {'Name': ['Alice', 'Bob', 'Dave'],
         'StaffNumber': ['001', '002', '004']}
master_info_list = [{'StaffNumber': 'SN_001', 'data': {'StaffNumber': 'SN_001', 'Department': 'Sales', 'Manager': 'Luke'}},
                    {'StaffNumber': 'SN_002', 'data': {'StaffNumber': 'SN_002', 'Department': 'Marketing', 'Manager': 'Mary'}},
                    {'StaffNumber': 'SN_003', 'data': {'StaffNumber': 'SN_003', 'Department': 'IT', 'Manager': 'Neal'}}]
df = pd.DataFrame(data=staff)
df[['Department']['Manager']] = df.apply(get_department_and_manager, axis='columns', args=[master_info_list])
print(df)
If I understand you correctly, you can use .merge:
x = pd.DataFrame([v["data"] for v in master_info_list])
x["StaffNumber"] = x["StaffNumber"].str.split("_").str[-1]
print(df.merge(x, on="StaffNumber", how="left"))
Prints:
Name StaffNumber Department Manager
0 Alice 001 Sales Luke
1 Bob 002 Marketing Mary
2 Dave 004 NaN NaN
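If you do want the apply-based version from the question, the missing piece is result_type='expand', which spreads the returned [dept, manager] pair across the two new columns; a sketch using the question's own function:
df[['Department', 'Manager']] = df.apply(
    get_department_and_manager, axis='columns',
    args=[master_info_list], result_type='expand')
Staff numbers with no match keep the 'bbb'/'aaa' defaults rather than NaN, which is one reason the merge above is usually the cleaner choice.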

Identifying elements in a dataframe

I have a dictionary of dataframes called names_and_places in pandas that looks like the below.
names_and_places:
Alfred,,,
Date,F_1,F_2,Key
4/1/2020,1,4,NAN
4/2/2020,2,5,NAN
4/3/2020,3,6,"[USA,NY,NY, NY]"
Brett,,,
Date,F_1,F_2,Key
4/1/2020,202,404,NAN
4/2/2020,101,401,NAN
4/3/2020,102,403,"[USA,CT, Fairfield, Stamford] "
Claire,,,
Date,F_1,F_2,Key
4/1/2020,NAN,12,NAN
4/2/2020,NAN,45,NAN
4/3/2020,7,78,"[USA,CT, Fairfield, Darian] "
Dane,,,
Date,F_1,F_2,Key
4/1/2020,4,17,NAN
4/2/2020,5,18,NAN
4/3/2020,7,19,"[USA,CT, Bridgeport, New Haven] "
Edward,,,
Date,F_1,F_2,Key
4/1/2020,4,17,NAN
4/2/2020,5,18,NAN
4/3/2020,7,19,"[USA,CT, Bridgeport, Milford] "
The Key column is either NAN or of the form [Country, State, County, City], but it can have 3 or 4 elements (sometimes County is absent). I need to find all the names whose Key contains a given element. For instance, if the element = "CT", the script should return Edward, Brett, Dane and Claire (order is not important). If the element = "Stamford", then only Brett is returned. However, I am going about the identification process in a way that seems very inefficient: I basically have variables that iterate through each possible combination of State, County and City (all of which I am currently typing in manually) to identify which names to extract, like below:
country = 'USA'  # this never needs to change
element = 'CT'
# These next two are actually in .txt files that I create once I am asked for
# a given breakdown, but I would like to not have to manually input these
middle_node = ['Fairfield', 'Bridgeport']
terminal_nodes = ['Stamford', 'Darian', 'New Haven', 'Milford']
names = []
for a in middle_node:
    for b in terminal_nodes:
        my_key = [country, key_of_interest, a, b]
        for s in names_and_places:
            for z in names_and_places[s]['Key']:
                if my_key == z:
                    names.append(s)
# Note: having "if my_key in names_and_places[s]['Key']:" was causing sporadic
# failures for some reason
display(names)
Output:
Edward, Brett, Dane, Claire
What I would like is to be able to input only the variable element, which can be a level 2 (State), 3 (County), or 4 (City) node. However, short of adding additional for loops and digging into the Key column, I don't know how to do this. The one benefit (for a novice like myself) is that the double for loops keep the bucketing intact and make it easier for people to see where names are coming from when that is also needed.
But is there a better way? For bonus points: is there a way to handle the case when the key_of_interest is 'NY' and values in the Key column can be like [USA, NY, NY, NY] or [USA, NY, NY, Queens]?
Edit: names_and_places is a dictionary with names as the index, so
display(names_and_places['Alfred'])
would be
Date,F_1,F_2,Key
4/1/2020,1,4,NAN
4/2/2020,2,5,NAN
4/3/2020,3,6,"[USA,NY,NY, NY]"
I do have the raw dataframe, which has the columns:
Date, Field Name, Value, Name
where Field Name is either F_1, F_2 or Key, and Value is the associated value of that field. I then pivot the data on Name with columns of Field Name to make my extraction easier.
Here's a somewhat more effective way to do it: start by building a single dataframe out of the dictionary, and then do the actual work on that dataframe.
import numpy as np
import pandas as pd

single_df = pd.concat([df.assign(name=k) for k, df in names_and_places.items()])
single_df["Key"] = single_df.Key.replace("NAN", np.NaN)
single_df.dropna(inplace=True)
# Since the location is a string, we have to parse it.
location_df = pd.DataFrame(single_df.Key.str.replace(r"[\[\]]", "", regex=True).str.split(",", expand=True))
location_df.columns = ["Country", "State", "County", "City"]
single_df = pd.concat([single_df, location_df], axis=1)
# this is where the actual query goes.
single_df[(single_df.Country == "USA") & (single_df.State == "CT")].name
The output is:
2 Brett
2 Claire
2 Dane
2 Edward
Name: name, dtype: object
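To accept a single element (state, county or city) as input without hard-coding its level, you can test all three location columns at once. A sketch built on the single_df above (the str.strip is there because the split can leave stray spaces around county and city names):
element = "CT"
levels = single_df[["State", "County", "City"]].apply(lambda col: col.str.strip())
print(single_df[(levels == element).any(axis=1)].name)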

Comparing four parameter in file [closed]

Most profitable element for every category: I must read my file and determine the most profitable element for every category in a range of dates given by user input.
File:
Date|Category|Name|Price
05/01/2016|category6|Name8|4200
06/01/2016|category1|Name1|1000
07/01/2016|category2|Name2|1200
07/01/2016|category3|Name1|1000
07/01/2016|category1|Name2|1200
07/01/2016|category3|Name2|1200
07/01/2016|category2|Name1|1000
07/01/2016|category2|Name2|1200
07/01/2016|category2|Name2|1200
08/01/2016|category2|Name1|1000
09/01/2016|category4|Name7|3100
My file will be a lot bigger; this is just an example.
Start Date : 07/01/2016
End Date: 07/01/2016
For every date in that range, the program will print the most profitable element for every category:
Category 1:
07/01/2016|category1|Name2|1200
Name2 = 1200
Comparing prices >>> Most profitable is: Name2
Category 2:
07/01/2016|category2|Name2|1200
07/01/2016|category2|Name1|1000
07/01/2016|category2|Name2|1200
07/01/2016|category2|Name2|1200
Name1 = 1000
Name2 = 3600
Comparing prices >>> Most profitable: Name2
Category 3:
07/01/2016|category3|Name1|1000
07/01/2016|category3|Name2|1200
Name1: 1000
Name2: 1200
Comparing prices >>> Most profitable: Name2
The problem is I don't know how to compare these prices across categories and names. Also, the dates will always be in ascending order. I'm using both dictionaries and lists.
INPUT AND OUTPUT:
Start Date : 07/01/2016
End Date: 07/01/2016
Category1; Most profitable is: Name2
Category2; Most profitable is: Name2
Category3; Most profitable is: Name2
In this case the most profitable is Name2 for every category.
The following is not exactly what you need but should give you a fair idea to get going. I keep track of the most profitable name and value for combinations of date and category:
date_cat_profit_dict = {}
with open('data.txt') as f:
    next(f)  # skip the "Date|Category|Name|Price" header line
    for line in f:
        # Split and store into variables.
        # You could skip processing the line here
        # if you are looking for a specific date.
        date, category, name, profit = line.split('|')
        # Convert to int for comparison
        profit = int(profit)
        # Key for storing into the dict
        composite_key = '{0}|{1}'.format(date, category)
        # _ because we don't need the name right now
        _, max_profit = date_cat_profit_dict.setdefault(composite_key, ('', 0))
        if max_profit < profit:
            date_cat_profit_dict[composite_key] = (name, profit)

for composite_key, (name, profit) in date_cat_profit_dict.items():
    print('Max for {0} : {1}, {2}'.format(composite_key, name, profit))
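The question also asks for a date range; one way to add that filter is to parse the date field and skip lines outside the range. A sketch, assuming the dates are DD/MM/YYYY (adjust fmt if they are actually MM/DD/YYYY):
from datetime import datetime

fmt = '%d/%m/%Y'  # assumption about the date format
start = datetime.strptime('07/01/2016', fmt)
end = datetime.strptime('07/01/2016', fmt)

# inside the `for line in f:` loop, right after splitting:
#     if not (start <= datetime.strptime(date, fmt) <= end):
#         continue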
