Imagine I have a DataFrame with one column, product_line. I want to create another column holding the product-line name (its category), based on the product_line number. The mapping is taken from a dictionary.
I have declared a dictionary mapping each category name to its product-line numbers:
categories = {
    'Apparel': [99],
    'Bikes': [32]}

# function that returns the category for a GL number
def get_cat(glnum, dic=categories):
    for cat, numbers in dic.items():
        if glnum in numbers:
            return cat
    return 0

data['category']=data['product_line'].apply(lambda x: get_cat(x))
This works.
However, I cannot get it to work using method chaining:
tt = (tt
      .assign(category=lambda d: get_cat(d.gl_product_line)))
I suspect the error is related to the Series, but I am unsure why this wouldn't work, since I expected the lambda to call get_cat once for each row of the DataFrame, which apparently does not happen.
Any ideas on how I could do this using .assign in method chaining?
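A likely explanation, offered as a sketch: inside .assign, the lambda is called once with the whole DataFrame, not once per row, so d.gl_product_line is a Series and get_cat receives that entire Series. Its `glnum in numbers` test then compares a Series with a number, and pandas refuses to reduce the resulting Series to a single True/False (a ValueError about the truth value of a Series being ambiguous). Applying get_cat element-wise inside the lambda keeps the chain intact. The sample DataFrame is hypothetical, the dictionary values are assumed to be lists of numbers, and the Series.map variant assumes each number belongs to exactly one category.

```python
import pandas as pd

# Assumed lookup: category name -> list of GL product-line numbers.
categories = {'Apparel': [99], 'Bikes': [32]}

def get_cat(glnum, dic=categories):
    """Return the category whose number list contains glnum, or 0 if none does."""
    for cat, numbers in dic.items():
        if glnum in numbers:
            return cat
    return 0

# Hypothetical sample data using the column name from the question.
tt = pd.DataFrame({'gl_product_line': [99, 32, 7]})

# Passing the whole Series reproduces the failure: `Series in list` needs a single bool.
try:
    get_cat(tt.gl_product_line)
    failed = False
except ValueError:
    failed = True

# Fix 1: .assign hands the lambda the whole DataFrame d, so apply get_cat element-wise.
tt = (tt
      .assign(category=lambda d: d.gl_product_line.apply(get_cat)))

# Fix 2: invert the dictionary once and use Series.map; unmatched numbers become NaN, then 0.
lookup = {num: cat for cat, numbers in categories.items() for num in numbers}
tt = tt.assign(category_mapped=lambda d: d.gl_product_line.map(lookup).fillna(0))
```

Both columns end up as Apparel, Bikes, 0. The map variant does one dictionary lookup per row instead of scanning every category, which matters mainly when the dictionary is large.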
I have defined a function that fetches the price history for a coin:
import pandas as pd

# binance is assumed to be a ccxt exchange instance, e.g. binance = ccxt.binance()
def get_price(pair):
    df = binance.fetch_ohlcv(pair, limit=258, timeframe="1d")
    df = pd.DataFrame(df).rename(columns={0: "date", 1: "open", 2: "high", 3: "low", 4: "close", 5: "volume"})
    df["date"] = pd.to_datetime(df["date"], unit="ms") + pd.Timedelta(hours=8)
    df.set_index("date", inplace=True)
    return df
Then I want to use zip to pair up two corresponding lists, so that I can easily apply the function to get the history data for each coin in the name list:
name=["btc","eth"]
symbol=["BTC/USDT","ETH/USDT"]
for name,pair in zip(name,symbol):
    name=get_price(pair)
eth
But when I type eth to get the DataFrame for "ETH/USDT", I get the error "NameError: name 'eth' is not defined". The reason I'm doing this is that if I have a list of more than 10 coin pairs, I don't want to call get_price for each of them one by one to get the history data for all of them. Can anyone help me fix this error? Thanks
You may need another function if you want to keep things the way you have them.
The first part (where you get the df) seems fine, since you have checked it. The issue is with the second part, where you try to run that function. A change like the one below may help.
For example, run this and it will print ETH/USDT as output:
def func(x):
    name = ["btc", "eth"]
    symbol = ["BTC/USDT", "ETH/USDT"]
    for n, pair in zip(name, symbol):
        if n == x:
            print(pair)

func('eth')
Similarly, if you want to run the function that gets the df, try something like this:
def func(x):
    name = ["btc", "eth"]
    symbol = ["BTC/USDT", "ETH/USDT"]
    for n, pair in zip(name, symbol):
        if n == x:
            return get_price(pair)  # return the DataFrame so the caller can keep it

eth = func('eth')
Use a dict to store the dataframes
Original Code:
name=["btc","eth"]
symbol=["BTC/USDT","ETH/USDT"]
for name,pair in zip(name,symbol):
    name=get_price(pair)
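The loop doesn't create a variable called eth. name=get_price(pair) simply rebinds the loop variable name (which held the string 'eth') to the returned DataFrame. It is not equivalent to 'eth' = pd.DataFrame(), which really would be a SyntaxError (can't assign to literal); that's why no SyntaxError appears.
After the loop, name holds only the last DataFrame, and eth was never defined.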
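To see what the original loop actually does, here is a quick check with a hypothetical stub in place of get_price, so it runs without a Binance connection. The stub and its return string are assumptions for illustration only.

```python
def get_price(pair):
    # Hypothetical stub standing in for the real get_price; returns a placeholder string.
    return f"DataFrame for {pair}"

name = ["btc", "eth"]
symbol = ["BTC/USDT", "ETH/USDT"]

# zip() grabs the original list before the loop starts, so iteration still works,
# but each assignment only rebinds the loop variable `name`.
for name, pair in zip(name, symbol):
    name = get_price(pair)

print(name)                # the list is gone; `name` holds only the last result
print("eth" in globals())  # checks whether a variable called eth exists (it doesn't)
```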
From the notebook, I can see a list of names and symbols, for which you're trying to create individual DataFrames.
The notebook shows that get_price returns a DataFrame when given a symbol.
Replacement Code:
The following code will create a dict of DataFrames, where each string in name is a key.
df_dict = dict()
for n, pair in zip(name, symbol):
    df_dict[n] = get_price(pair)

df_dict['eth']
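The same idea fits in a single dict comprehension. This is a sketch with a hypothetical offline stub for get_price (the real one needs a Binance connection); the list names names and symbols are also just illustrative, chosen so the loop variables never overwrite the lists.

```python
import pandas as pd

def get_price(pair):
    # Hypothetical stand-in for the ccxt-based get_price, so the sketch runs offline;
    # it returns a tiny DataFrame tagged with the pair.
    return pd.DataFrame({"close": [1.0, 2.0]}).assign(pair=pair)

names = ["btc", "eth"]
symbols = ["BTC/USDT", "ETH/USDT"]

# One DataFrame per coin, keyed by its short name.
df_dict = {n: get_price(p) for n, p in zip(names, symbols)}

eth = df_dict["eth"]  # look up a single coin's history by name
```

Adding an eleventh coin then only means extending the two lists; nothing else changes.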
I am supposed to create a new pandas column by comparing each value of the '% Renewable' column to the median of that same column; the results should make up the new column.
Of course I could use a for loop to do this, but even though I am only at the beginning of my learning, I want to make more use of map, lambda and similar tools.
Therefore I tried this:
def above(x, y):
    if x >= y:
        return 1
    else:
        return 0

def answer_ten():
    Top15 = answer_one()  # loads the DataFrame and formats it
    Median = Top15['% Renewable'].median()
    Top15['HighRenew'] = map(above, Top15['% Renewable'], Top15['% Renewable'].median())
    # one try: list(map(above, (Top15['% Renewable'], Top15['% Renewable'].median())))
    # one more try: [*map(above, (Top15['% Renewable'], Top15['% Renewable'].median()))]
    return Top15['HighRenew']
But instead of the values, I get an error: TypeError: 'float' object is not iterable
I tried the two alternatives listed in the comment lines, which I got from another post here: Getting a map() to return a list in Python 3.x
By now I have figured out a different, one-line solution:
Top15['HighRenew']=(Top15['% Renewable']>=Top15['% Renewable'].median()).astype('int')
But I would like to know how I could do this differently (more lengthily, of course) with lambda, map() or filter().
Could anyone point me towards an alternative solution?
Thanks.
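map takes a function plus one or more iterables and calls the function with one element from each, so every argument after the function must be something you can loop over. The median is a single float, and the error you get is because it cannot be looped over. Calling above(Top15['% Renewable'], Top15['% Renewable'].median()) directly won't work either: x >= y then yields a whole Series, which the if statement cannot turn into a single True/False. What you want is to run the comparison once per row, with the median held fixed.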
So you basically want something like this:
median = Top15['% Renewable'].median()  # compute the median once, not once per row
Top15['HighRenew'] = Top15.apply(lambda row: int(row['% Renewable'] >= median), axis=1)
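For comparison, here are a few ways to write the same column with map or a lambda, sketched on a hypothetical four-row stand-in for answer_one(). The key point in each variant is that the median is a single value supplied alongside every element, rather than passed to map as if it were a sequence.

```python
from itertools import repeat
import pandas as pd

# Hypothetical stand-in for answer_one(): a small frame with a '% Renewable' column.
Top15 = pd.DataFrame({'% Renewable': [10.0, 30.0, 20.0, 40.0]})

def above(x, y):
    """Return 1 if x >= y, else 0."""
    return 1 if x >= y else 0

median = Top15['% Renewable'].median()  # a single float, computed once

# map() needs an iterable for every argument: repeat() turns the float into one.
via_repeat = list(map(above, Top15['% Renewable'], repeat(median)))

# map() with a lambda that fixes the second argument.
via_lambda = list(map(lambda x: above(x, median), Top15['% Renewable']))

# Series.apply forwards extra positional arguments through `args`.
via_apply = Top15['% Renewable'].apply(above, args=(median,))

# Vectorized comparison, equivalent to the one-liner in the question.
vectorized = (Top15['% Renewable'] >= median).astype(int)

Top15['HighRenew'] = via_lambda
```

All four produce 0, 1, 0, 1 here (the median is 25.0). filter() is less suited to this task: it keeps or drops elements rather than producing a 0/1 value for each one.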
I need some help understanding how Python evaluates the arguments of the dictionary get method.
Let's suppose that we have a list of dictionaries and we need to do some data transformation (e.g. get all names from all dictionaries by the key 'name'). I also want to call a specific function func(data) if the key 'name' is not found in a particular dict.
def func(data):
    # do something with data that doesn't contain the 'name' key
    return some_data

def retrieve_data(value):
    return ', '.join([v.get('name', func(v)) for v in value])
This approach works rather well, but as far as I can see, func (in retrieve_data) is called every time, even when the key 'name' is present in the dictionary.
If you want to avoid calling func if the dictionary contains the value, you can use this:
def retrieve_data(value):
    return ', '.join([v['name'] if 'name' in v else func(v) for v in value])
func is called every time in your example because Python evaluates all arguments, including func(v), before get itself is called.
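A small experiment makes the eager evaluation visible. This sketch uses a hypothetical func that records each call and returns a placeholder name; the sample records are made up.

```python
calls = []

def func(data):
    # Hypothetical fallback: record the call and return a placeholder name.
    calls.append(data)
    return '<unnamed>'

records = [{'name': 'a'}, {'id': 2}, {'name': 'b'}]

# Eager: func(v) is an argument to get(), so it runs for every dict.
eager = ', '.join([v.get('name', func(v)) for v in records])
eager_calls = len(calls)

calls.clear()
# Lazy: the conditional expression only evaluates func(v) when 'name' is missing.
lazy = ', '.join([v['name'] if 'name' in v else func(v) for v in records])
lazy_calls = len(calls)
```

Both versions build the same string, but func runs three times in the first and only once in the second.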
I am practically repeating the same code in each function, with only one minor, but essential, change.
I have about 4 functions that look similar to this:
def list_expenses(self):
    # build (key, amount) tuples from the dictionary's keys and objects
    explist = [(key, item.amount) for key, item in self.expensedict.items()]
    # sort by amount, high to low
    sortedlist = sorted(explist, key=lambda ka: ka[1], reverse=True)
    for k, a in sortedlist:
        print(k, a)

def list_income(self):
    # build (key, amount) tuples from the dictionary's keys and objects
    inclist = [(key, item.amount) for key, item in self.incomedict.items()]
    # sort by amount, high to low
    sortedlist = sorted(inclist, key=lambda ka: ka[1], reverse=True)
    for k, a in sortedlist:
        print(k, a)
I believe this is what they call violating "DRY"; however, I have no idea how to make this more DRY, since I have two separate dictionaries (expensedict and incomedict) that I need to work with.
I did some Google searching and found something called decorators. I have a very basic understanding of how they work, but no clue how I would apply them here.
So my question is: is this a candidate for a decorator, and if a decorator is necessary, could I get a hint as to what it should do? Pseudocode is fine. I don't mind struggling; I just need something to start with.
What do you think about using a separate function (as a private method) for list processing? For example, you may do the following:
def __list_processing(self, dictionary):
    # do the generic processing of the dictionary here
    ...

def list_expenses(self):
    return self.__list_processing(self.expensedict)

def list_income(self):
    return self.__list_processing(self.incomedict)
This is cleaner because all the processing lives in a single place; list_expenses, list_income, etc. are just thin wrappers around it.
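A runnable sketch of that suggestion, assuming the stored objects only need an amount attribute. The Budget class, the Entry record type and the sample data are hypothetical. It uses a single leading underscore instead of the double underscore above, which skips name mangling; either works inside the class. No decorator is needed: the only thing that varies is which dictionary is processed, and a plain parameter covers that.

```python
from collections import namedtuple

# Hypothetical record type standing in for the objects stored in the dicts.
Entry = namedtuple('Entry', 'amount')

class Budget:
    def __init__(self, expensedict, incomedict):
        self.expensedict = expensedict
        self.incomedict = incomedict

    def _list_sorted(self, dictionary):
        """Print (key, amount) pairs from dictionary, highest amount first, and return them."""
        pairs = sorted(((key, item.amount) for key, item in dictionary.items()),
                       key=lambda ka: ka[1], reverse=True)
        for k, a in pairs:
            print(k, a)
        return pairs

    def list_expenses(self):
        return self._list_sorted(self.expensedict)

    def list_income(self):
        return self._list_sorted(self.incomedict)

b = Budget({'rent': Entry(900), 'food': Entry(300)},
           {'salary': Entry(2500), 'bonus': Entry(500)})
expenses = b.list_expenses()
```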
I need to sort the catalog results by multiple fields.
In my case, first sort by year, then by month. The year and month field are included in my custom content type (item_publication_year and item_publication_month respectively).
However, I'm not getting the results that I want: the year and month are not ordered at all. They should appear in descending order, i.e. 2006, 2005, 2004, etc.
Below is my code:
def queryItemRepository(self):
    """
    Perform a search returning items matching the criteria
    """
    query = {}
    portal_catalog = getToolByName(self, 'portal_catalog')
    folder_path = '/'.join(self.context.getPhysicalPath())
    query['portal_type'] = "MyContentType"
    query['path'] = {'query': folder_path, 'depth': 2}
    results = portal_catalog.searchResults(query)
    # convert the results to a python list so we can use the sort function
    results = list(results)
    results.sort(lambda x, y: cmp((y['item_publication_year'], y['item_publication_year']),
                                   (x['item_publication_month'], x['item_publication_month'])
                                   ))
    return results
Anyone care to help?
A better bet is to use the key parameter for sorting, with reverse=True for the descending order you want:
results.sort(key=lambda b: (b.item_publication_year, b.item_publication_month), reverse=True)
You can also use the sorted() built-in function instead of list(); it returns a new sorted list for you, and it's the same amount of work for Python as calling list() on the results and then sorting:
results = portal_catalog.searchResults(query)
results = sorted(results, key=lambda b: (b.item_publication_year, b.item_publication_month), reverse=True)
Naturally, both item_publication_year and item_publication_month need to be present in the catalog metadata.
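The tuple key is what gives the year-then-month ordering, since tuples compare element by element. This sketch uses hypothetical namedtuple stand-ins for catalog brains so it runs outside Plone. The mixed-direction variant negates the year and therefore assumes the metadata values are numbers; if they are stored as strings, convert them (e.g. with int()) first.

```python
from collections import namedtuple

# Hypothetical stand-in for catalog brains: only the two metadata columns matter here.
Brain = namedtuple('Brain', 'item_publication_year item_publication_month')

results = [Brain(2004, 3), Brain(2006, 1), Brain(2005, 12), Brain(2006, 7)]

# Year first, then month; reverse=True makes both keys descending (newest first).
newest_first = sorted(results,
                      key=lambda b: (b.item_publication_year, b.item_publication_month),
                      reverse=True)

# Mixed directions: year descending (negated), month ascending within each year.
year_desc_month_asc = sorted(results,
                             key=lambda b: (-b.item_publication_year, b.item_publication_month))
```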
You can also get multi-field sorting straight from the catalog search using an advanced query; see its official docs.