I'm trying to use this create_df() function in Streamlit to gather a list of user-provided URLs called "recipes" and loop through each URL to return a df I've labeled "res" towards the end of the function. I've tried several approaches with the Streamlit syntax but I just cannot get this to work as I'm getting this error message:
recipe_scrapers._exceptions.WebsiteNotImplementedError: recipe-scrapers exception: Website (h) not supported.
Have a look at my entire repo here. The main.py script works just fine once you've installed all requirements locally, but when I try running the same script with Streamlit syntax in the streamlit.py script I get the above error. Once you run streamlit run streamlit.py in your terminal and have a look at the UI I've create it should be quite clear what I'm aiming at, which is providing the user with a csv of all ingredients in the recipe URLs they provided for a convenient grocery shopping list.
Any help would be greatly appreciated!
def create_df(recipes):
"""
Description:
Creates one df with all recipes and their ingredients
Arguments:
* recipes: list of recipe URLs provided by user
Comments:
Note that ingredients with qualitative amounts e.g., "scheutje melk", "snufje zout" have been ommitted from the ingredient list
"""
df_list = []
for recipe in recipes:
scraper = scrape_me(recipe)
recipe_details = replace_measurement_symbols(scraper.ingredients())
recipe_name = recipe.split("https://www.hellofresh.nl/recipes/", 1)[1]
recipe_name = recipe_name.rsplit('-', 1)[0]
print("Processing data for "+ recipe_name +" recipe.")
for ingredient in recipe_details:
try:
df_temp = pd.DataFrame(columns=['Ingredients', 'Measurement'])
df_temp[str(recipe_name)] = recipe_name
ing_1 = ingredient.split("2 * ", 1)[1]
ing_1 = ing_1.split(" ", 2)
item = ing_1[2]
measurement = ing_1[1]
quantity = float(ing_1[0]) * 2
df_temp.loc[len(df_temp)] = [item, measurement, quantity]
df_list.append(df_temp)
except (ValueError, IndexError) as e:
pass
df = pd.concat(df_list)
print("Renaming duplicate ingredients e.g., Kruimige aardappelen, Voorgekookte halve kriel met schil -> Aardappelen")
ingredient_dict = {
'Aardappelen': ('Dunne frieten', 'Half kruimige aardappelen', 'Voorgekookte halve kriel met schil',
'Kruimige aardappelen', 'Roodschillige aardappelen', 'Opperdoezer Ronde aardappelen'),
'Ui': ('Rode ui'),
'Kipfilet': ('Kipfilet met tuinkruiden en knoflook'),
'Kipworst': ('Gekruide kipworst'),
'Kipgehakt': ('Gemengd gekruid gehakt', 'Kipgehakt met Mexicaanse kruiden', 'Half-om-halfgehakt met Italiaanse kruiden',
'Kipgehakt met tuinkruiden'),
'Kipshoarma': ('Kalkoenshoarma')
}
reverse_label_ing = {x:k for k,v in ingredient_dict.items() for x in v}
df["Ingredients"].replace(reverse_label_ing, inplace=True)
print("Assigning ingredient categories")
category_dict = {
'brood': ('Biologisch wit rozenbroodje', 'Bladerdeeg', 'Briochebroodje', 'Wit platbrood'),
'granen': ('Basmatirijst', 'Bulgur', 'Casarecce', 'Cashewstukjes',
'Gesneden snijbonen', 'Jasmijnrijst', 'Linzen', 'Maïs in blik',
'Parelcouscous', 'Penne', 'Rigatoni', 'Rode kidneybonen',
'Spaghetti', 'Witte tortilla'),
'groenten': ('Aardappelen', 'Aubergine', 'Bosui', 'Broccoli',
'Champignons', 'Citroen', 'Gele wortel', 'Gesneden rodekool',
'Groene paprika', 'Groentemix van paprika, prei, gele wortel en courgette',
'IJsbergsla', 'Kumato tomaat', 'Limoen', 'Little gem',
'Paprika', 'Portobello', 'Prei', 'Pruimtomaat',
'Radicchio en ijsbergsla', 'Rode cherrytomaten', 'Rode paprika', 'Rode peper',
'Rode puntpaprika', 'Rode ui', 'Rucola', 'Rucola en veldsla', 'Rucolamelange',
'Semi-gedroogde tomatenmix', 'Sjalot', 'Sperziebonen', 'Spinazie', 'Tomaat',
'Turkse groene peper', 'Veldsla', 'Vers basilicum', 'Verse bieslook',
'Verse bladpeterselie', 'Verse koriander', 'Verse krulpeterselie', 'Wortel', 'Zoete aardappel'),
'kruiden': ('Aïoli', 'Bloem', 'Bruine suiker', 'Cranberrychutney', 'Extra vierge olijfolie',
'Extra vierge olijfolie met truffelaroma', 'Fles olijfolie', 'Gedroogde laos',
'Gedroogde oregano', 'Gemalen kaneel', 'Gemalen komijnzaad', 'Gemalen korianderzaad',
'Gemalen kurkuma', 'Gerookt paprikapoeder', 'Groene currykruiden', 'Groentebouillon',
'Groentebouillonblokje', 'Honing', 'Italiaanse kruiden', 'Kippenbouillonblokje', 'Knoflookteen',
'Kokosmelk', 'Koreaanse kruidenmix', 'Mayonaise', 'Mexicaanse kruiden', 'Midden-Oosterse kruidenmix',
'Mosterd', 'Nootmuskaat', 'Olijfolie', 'Panko paneermeel', 'Paprikapoeder', 'Passata',
'Pikante uienchutney', 'Runderbouillonblokje', 'Sambal', 'Sesamzaad', 'Siciliaanse kruidenmix',
'Sojasaus', 'Suiker', 'Sumak', 'Surinaamse kruiden', 'Tomatenblokjes', 'Tomatenblokjes met ui',
'Truffeltapenade', 'Ui', 'Verse gember', 'Visbouillon', 'Witte balsamicoazijn', 'Wittewijnazijn',
'Zonnebloemolie', 'Zwarte balsamicoazijn'),
'vlees': ('Gekruide runderburger', 'Half-om-half gehaktballetjes met Spaanse kruiden', 'Kipfilethaasjes', 'Kipfiletstukjes',
'Kipgehaktballetjes met Italiaanse kruiden', 'Kippendijreepjes', 'Kipshoarma', 'Kipworst', 'Spekblokjes',
'Vegetarische döner kebab', 'Vegetarische kaasschnitzel', 'Vegetarische schnitzel'),
'zuivel': ('Ei', 'Geraspte belegen kaas', 'Geraspte cheddar', 'Geraspte grana padano', 'Geraspte oude kaas',
'Geraspte pecorino', 'Karnemelk', 'Kruidenroomkaas', 'Labne', 'Melk', 'Mozzarella',
'Parmigiano reggiano', 'Roomboter', 'Slagroom', 'Volle yoghurt')
}
reverse_label_cat = {x:k for k,v in category_dict.items() for x in v}
df["Category"] = df["Ingredients"].map(reverse_label_cat)
col = "Category"
first_col = df.pop(col)
df.insert(0, col, first_col)
df = df.sort_values(['Category', 'Ingredients'], ascending = [True, True])
print("Merging ingredients by row across all recipe columns using justify()")
gp_cols = ['Ingredients', 'Measurement']
oth_cols = df.columns.difference(gp_cols)
arr = np.vstack(df.groupby(gp_cols, sort=False, dropna=False).apply(lambda gp: justify(gp.to_numpy(), invalid_val=np.NaN, axis=0, side='up')))
# Reconstruct DataFrame
# Remove entirely NaN rows based on the non-grouping columns
res = (pd.DataFrame(arr, columns=df.columns)
.dropna(how='all', subset=oth_cols, axis=0))
res = res.fillna(0)
res['Total'] = res.drop(['Ingredients', 'Measurement'], axis=1).sum(axis=1)
res=res[res['Total'] !=0] #To drop rows that are being duplicated with 0 for some reason; will check later
print("Processing complete!")
return res
Your function create_df needs a list as an argument but st.text_input returs always a string.
In your streamlit.py, replace this df_download = create_df(recs) by this df_download = create_df([recs]). But if you need to handle multiple urls, you should use str.split like this :
def create_df(recipes):
recipes = recipes.split(",") # <--- add this line to make a list from the user-input
### rest of the code ###
if download:
df_download = create_df(recs)
# Output :
I am attempting to create a new column based on conditional logic from another column. I've tried searching and haven't been able to find anything that addresses my issue.
I have imported a CSV to a pandas dataframe, it is structured like this. I edited a few of the descriptions for this post, but other than that everything is the same:
#code used to load dataframe:
df = pd.read_csv(r"C:\filepath\filename.csv")
#output from print(type(df)):
#class 'pandas.core.frame.DataFrame'
#output from print(df.columns.values):
#['Type' 'Trans Date' 'Post Date' 'Description' 'Amount']
#output from print(df.columns):
Index(['Type', 'Trans Date', 'Post Date', 'Description', 'Amount'], dtype='object')
#output from print
Type Trans Date Post Date Description Amount
0 Sale 01/25/2018 01/25/2018 DESC1 -13.95
1 Sale 01/25/2018 01/26/2018 AMAZON MKTPLACE PMTS -6.99
2 Sale 01/24/2018 01/25/2018 SUMMIT BISTRO -5.85
3 Sale 01/24/2018 01/25/2018 DESC3 -9.13
4 Sale 01/24/2018 01/26/2018 DYNAMIC VENDING INC -1.60
I then write the following code:
def criteria(row):
if row.Description.find('SUMMIT BISTRO')>0:
return 'Lunch'
elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
return 'Amazon'
elif row.Description.find('Aldi')>0:
return 'Groceries'
else:
return 'NotWorking'
df['Category'] = df.apply(criteria, axis=0)
Errors:
Traceback (most recent call last):
File "C:\Users\Test_BankReconcile2.py", line 44, in <module>
df['Category'] = df.apply(criteria, axis=0)
File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
ignore_failures=ignore_failures)
File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
results[i] = func(v)
File "C:\Users\OneDrive\Documents\finance\Test_BankReconcile2.py", line 35, in criteria
if row.Description.find('SUMMIT BISTRO')>0:
File "C:\Users\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3081, in __getattr__
return object.__getattribute__(self, name)
AttributeError: ("'Series' object has no attribute 'Description'", 'occurred at index Type')
I'm able to successfully execute this same sort of command on a very similar csv file from a different bank (this example is from my credit card), so I don't know what is going on but possibly I need to define the dataframe in some way that I'm not doing? Or possibly something else that is very obvious that I'm not seeing? Thank you all in advance for helping me solve this.
Yes, your problem is that you need to pass axis=1 to .apply:
In [52]: df
Out[52]:
Type Trans Date Post Date Description Amount
0 Sale 01/25/2018 01/25/2018 DESC1 -13.95
1 Sale 01/25/2018 01/26/2018 AMAZON MKTPLACE PMTS -6.99
2 Sale 01/24/2018 01/25/2018 SUMMIT BISTRO -5.85
3 Sale 01/24/2018 01/25/2018 DESC3 -9.13
4 Sale 01/24/2018 01/26/2018 DYNAMIC VENDING INC -1.60
In [53]: def criteria(row):
...: if row.Description.find('SUMMIT BISTRO')>0:
...: return 'Lunch'
...: elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
...: return 'Amazon'
...: elif row.Description.find('Aldi')>0:
...: return 'Groceries'
...: else:
...: return 'NotWorking'
...:
In [54]: df.apply(criteria, axis=1)
Out[54]:
0 NotWorking
1 NotWorking
2 NotWorking
3 NotWorking
4 NotWorking
dtype: object
The second problem is you have a logic error, instead of .find(x) > 0 you want .find(x) >= 0, or better yet, some_string in some_other_string
For more general solution omit Description in loop and instead use df['Description'].apply(criteria) with Series.apply.
Also for check substring in string use in.
def criteria(row):
if 'SUMMIT BISTRO' in row:
return 'Lunch'
elif 'AMAZON MKTPLACE PMTS' in row:
return 'Amazon'
elif 'Aldi' in row:
return 'Groceries'
else:
return 'NotWorking'
df['Category'] = df['Description'].apply(criteria)
I have the pandas df below, with a few columns, one of which is ip_addresses
df.head()
my_id someother_id created_at ip_address state
308074 309115 2859690 2014-09-26 22:55:20 67.000.000.000 rejected
308757 309798 2859690 2014-09-30 04:16:56 173.000.000.000 approved
309576 310619 2859690 2014-10-02 20:13:12 173.000.000.000 approved
310347 311390 2859690 2014-10-05 04:16:01 173.000.000.000 approved
311784 312827 2859690 2014-10-10 06:38:39 69.000.000.000 approved
For each ip_address I'm trying to return the description, city, country
I wrote a function below and tried to apply it
from ipwhois import IPWhois
def returnIP(ip) :
obj = IPWhois(str(ip))
result = obj.lookup_whois()
description = result["nets"][len(result["nets"]) - 1 ]["description"]
city = result["nets"][len(result["nets"]) - 1 ]["city"]
country = result["nets"][len(result["nets"]) - 1 ]["country"]
return [description, city, country]
# ---
suspect['ipwhois'] = suspect['ip_address'].apply(returnIP)
My problem is that this returns a list, I want three separate columns.
Any help is greatly appreciated. I'm new to Pandas/Python so if there's a better way to write the function and use Pandas would be very helpful.
from ipwhois import IPWhois
def returnIP(ip) :
obj = IPWhois(str(ip))
result = obj.lookup_whois()
description = result["nets"][len(result["nets"]) - 1 ]["description"]
city = result["nets"][len(result["nets"]) - 1 ]["city"]
country = result["nets"][len(result["nets"]) - 1 ]["country"]
return (description, city, country)
suspect['description'], suspect['city'], suspect['country'] = \
suspect['ip_address'].apply(returnIP)
I was able to solve it with another stackoverflow solution
for n,col in enumerate(cols):
suspect[col] = suspect['ipwhois'].apply(lambda ipwhois: ipwhois[n])
If there's a more elegant way to solve this, please share!
I would like to write a function in python3 to parse a string based on the input list element. The following function works but is there a better way to do it?
def func(oStr, s_s):
if not oStr:
return s_s
elif '' in s_s:
return [oStr]
else:
for x in s_s:
st = oStr.find(x)
end = st + len(x)
res.append(oStr[st:end])
oStr = oStr.replace(x, '')
if oStr:
res.append(oStr)
return res
case 1
o_str = 'ABCNew York - Address'
s_str = ['ABC']
return ['ABC', 'New York - Address']
case 2
o_str = 'New York Friend Add | NumberABCNewYork Name | FirstName Last Name | time : Jan-31-2017'
s_str = ['New York Friend Add | Number', 'ABC', 'NewYork Name | FirstName Last Name | time: Jan-31-2017']
return ['New York Friend Add | Number', 'ABC', 'NewYork Name | FirstName Last Name | time: Jan-31-2017']
case 3
o_str = '-'
s_str = ['']
return ['-']
case 4
o_str = '1'
s_str = ['']
return ['1']
case 5
o_str = '1234Family-Name'
s_str = ['1234']
return ['1234', 'Family-Name']
case 6
o_str = ''
s_str = ['12345667', 'name']
return ['12345667', 'name']
To use a string like an array, you would just program it in the same way. For example
myStr="Hello, World!"
myString.insert(len(myString),"""Your character here""")
For your purposes .append() would work exactly the same way. Hope I helped.