Cascading dropdown shows dictionary/array - Python (Jupyter Notebook)

I am trying to create a series of dropdown menus on Jupyter Notebook.
The first few dropdown lists work fine, but the last one goes wonky: its output is read as the whole dictionary instead of a single value.
Code as follows:
#Initialise GUI
from ipywidgets import interact,Dropdown
import ipywidgets as widgets
#Initialise Dictionaries
NAICSd = {"21-Mining,Quarrying,and oil and gas extraction(8)":0.08,
"11-Agriculture,forestry,fishing and hunting(9)":0.09,
"55-Management of companies and enterprises(10)":0.08,
"62-Healthcare and social assistance(10)":0.1,
"22-Utilities(14)":0.14,
"92-Public Administration(15)":0.15,
"54-Professional,scientific and technical services(19)":0.19,
"42-Wholesale trade(19)":0.19,
"31-Manufacturing(19)":0.19,
"32-Manufacturing(16)":0.16,
"33-Manufacturing(14)":0.14,
"81-Other Services Except Public Administration(20)":0.2,
"71-Arts,Entertainment and Recreation(21)":0.21,
"72-Accommodation and Food Services(22)":0.22,
"44-Retail Trade(22)":0.22,
"45-Retail Trade(23)":0.23,
"23-Construction(23)":0.23,
"56-Administrative/Support & Waste Management/Remediation Service(24)":0.24,
"61-Educational Services(24)":0.24,
"51-Information (25)":0.25,
"48-Transportation and warehousing(27)":0.27,
"49-Transportation and warehousing(23)":0.23,
"52-Finance and Insurance(28)":0.28,
"53-Real Estate and rental and leasing(29)":0.29}
Stated = {"Undefined(28)":0.28,
"Alaska(10)":0.1,
"Alabama(18)":0.18,
"Arkansas(18)":0.18,
"Arizona(20)":0.2,
"California(20)":0.2,
"Colorado(18)":0.18,
"Connecticut(13)":0.13,
"D.C.(8)":0.08,
"Delaware(18)":0.18,
"Florida(28)":0.28,
"Georgia(27)":0.27,
"Hawaii(16)":0.16,
"Iowa(13)":0.13,
"Idaho(15)":0.15,
"Illinois(23)":0.23,
"Indiana(18)":0.18,
"Kansas(13)":0.13,
"Kentucky(20)":0.20,
"Louisiana(18)":0.18,
"Massachusetts(15)":0.15,
"Maryland(20)":0.20,
"Maine(9)":0.09,
"Michigan(23)":0.23,
"Minnesota(13)":0.13,
"Missouri(15)":0.15,
"Mississippi(15)":0.15,
"Montana(8)":0.08,
"North Carolina(20)":0.2,
"North Dakota(8)":0.08,
"Nebraska(15)":0.15,
"New Hampshire(10)":0.1,
"New Jersey(20)":0.2,
"New Mexico(10)":0.1,
"Nevada(23)":0.23,
"New York(20)":0.2,
"Ohio(18)":0.18,
"Oklahoma(13)":0.13,
"Oregon(17)":0.17,
"Pennsylvania(15)":0.15,
"Rhode Island(8)":0.08,
"South Carolina(20)":0.2,
"South Dakota(8)":0.08,
"Tennessee(23)":0.23,
"Texas(20)":0.2,
"Utah(18)":0.18,
"Virginia(19)":0.19,
"Vermont(8)":0.08,
"Washington(13)":0.13,
"Wisconsin(15)":0.15,
"West Virginia(15)":0.15,
"Wyoming(8)":0.08
}
Businessd = {"New Business <2 years(18.98)":0.1898,
"Normal Business >= 2 years (17.36)":0.1736}
BackedRealEstated = {"Yes(1.64)":0.0164,
"No(21.16)":0.2116}
IsRecessiond ={"Yes(32.21)":0.3221,
"No(16.63)":0.1663}
SBARatiod = {"More than 50%(25)":0.25,
"50% or less(40)":0.4}
NAICSList = Dropdown(options = NAICSd)
StateList = Dropdown(options = Stated)
BusinessList = Dropdown(options = Businessd)
BackedRealEstateList = Dropdown(options = BackedRealEstated)
IsRecessionList = Dropdown(options = IsRecessiond)
SBARatioList = Dropdown(options = SBARatiod)
@interact(Sector = NAICSList, US_State=StateList, New_Business=BusinessList, Real_Estate_Backed=BackedRealEstateList,
          Recession = IsRecessionList, Guarantee_Ratio = SBARatioList)
def print_dropdown(Sector, US_State, New_Business, Real_Estate_Backed, Recession, Guarantee_Ratio):
    NAICSList.options = NAICSd
    StateList.options = Stated
    BusinessList = Businessd
    BackedRealEstateList = BackedRealEstated
    IsRecessionList = IsRecessiond
    Guarantee_Ratio = SBARatiod
    print(Sector, US_State, New_Business, Real_Estate_Backed, Recession, Guarantee_Ratio)
Sector, US_State, New_Business, Real_Estate_Backed and Recession all return a float, which is what I want. But Guarantee_Ratio returns the whole dictionary: {'More than 50%(25)': 0.25, '50% or less(40)': 0.4}

I found my problem: I had assigned the dictionary to the Guarantee_Ratio parameter instead of to the SBARatioList widget.
def print_dropdown(Sector, US_State, New_Business, Real_Estate_Backed, Recession, Guarantee_Ratio):
    NAICSList = NAICSd
    StateList = Stated
    BusinessList = Businessd
    BackedRealEstateList = BackedRealEstated
    IsRecessionList = IsRecessiond
    SBARatioList = SBARatiod
    print(Sector, US_State, New_Business, Real_Estate_Backed, Recession, Guarantee_Ratio)
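The confusion can be reproduced without Jupyter at all: rebinding a parameter name inside the function discards whatever value interact passed in. A minimal plain-Python sketch (using a trimmed copy of the SBARatiod dictionary):

```python
# Rebinding the parameter name to the dictionary shadows the float that
# interact would have passed in; returning the parameter untouched is the fix.
SBARatiod = {"More than 50%(25)": 0.25, "50% or less(40)": 0.4}

def buggy(Guarantee_Ratio):
    Guarantee_Ratio = SBARatiod   # shadows the selected value with the dict
    return Guarantee_Ratio

def fixed(Guarantee_Ratio):
    return Guarantee_Ratio        # the selected value passes through

print(buggy(0.25))   # prints the whole dict
print(fixed(0.25))   # prints 0.25
```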

Related

Trying to create a streamlit app that uses user-provided URLs to scrape and return a downloadable df

I'm trying to use this create_df() function in Streamlit to gather a list of user-provided URLs called "recipes" and loop through each URL to return a df I've labeled "res" towards the end of the function. I've tried several approaches with the Streamlit syntax but I just cannot get this to work as I'm getting this error message:
recipe_scrapers._exceptions.WebsiteNotImplementedError: recipe-scrapers exception: Website (h) not supported.
Have a look at my entire repo here. The main.py script works just fine once you've installed all requirements locally, but when I try running the same script with Streamlit syntax in the streamlit.py script I get the above error. Once you run streamlit run streamlit.py in your terminal and have a look at the UI I've created, it should be quite clear what I'm aiming at: providing the user with a CSV of all ingredients in the recipe URLs they provided, for a convenient grocery shopping list.
Any help would be greatly appreciated!
def create_df(recipes):
    """
    Description:
        Creates one df with all recipes and their ingredients
    Arguments:
        * recipes: list of recipe URLs provided by user
    Comments:
        Note that ingredients with qualitative amounts e.g., "scheutje melk",
        "snufje zout" have been omitted from the ingredient list
    """
    df_list = []
    for recipe in recipes:
        scraper = scrape_me(recipe)
        recipe_details = replace_measurement_symbols(scraper.ingredients())
        recipe_name = recipe.split("https://www.hellofresh.nl/recipes/", 1)[1]
        recipe_name = recipe_name.rsplit('-', 1)[0]
        print("Processing data for " + recipe_name + " recipe.")
        for ingredient in recipe_details:
            try:
                df_temp = pd.DataFrame(columns=['Ingredients', 'Measurement'])
                df_temp[str(recipe_name)] = recipe_name
                ing_1 = ingredient.split("2 * ", 1)[1]
                ing_1 = ing_1.split(" ", 2)
                item = ing_1[2]
                measurement = ing_1[1]
                quantity = float(ing_1[0]) * 2
                df_temp.loc[len(df_temp)] = [item, measurement, quantity]
                df_list.append(df_temp)
            except (ValueError, IndexError):
                pass
    df = pd.concat(df_list)
    print("Renaming duplicate ingredients e.g., Kruimige aardappelen, Voorgekookte halve kriel met schil -> Aardappelen")
    # Note: single-element tuples need a trailing comma, otherwise ('Rode ui')
    # is just a string and the dict comprehension below iterates its characters
    ingredient_dict = {
        'Aardappelen': ('Dunne frieten', 'Half kruimige aardappelen', 'Voorgekookte halve kriel met schil',
                        'Kruimige aardappelen', 'Roodschillige aardappelen', 'Opperdoezer Ronde aardappelen'),
        'Ui': ('Rode ui',),
        'Kipfilet': ('Kipfilet met tuinkruiden en knoflook',),
        'Kipworst': ('Gekruide kipworst',),
        'Kipgehakt': ('Gemengd gekruid gehakt', 'Kipgehakt met Mexicaanse kruiden',
                      'Half-om-halfgehakt met Italiaanse kruiden', 'Kipgehakt met tuinkruiden'),
        'Kipshoarma': ('Kalkoenshoarma',)
    }
    reverse_label_ing = {x: k for k, v in ingredient_dict.items() for x in v}
    df["Ingredients"].replace(reverse_label_ing, inplace=True)
    print("Assigning ingredient categories")
    category_dict = {
        'brood': ('Biologisch wit rozenbroodje', 'Bladerdeeg', 'Briochebroodje', 'Wit platbrood'),
        'granen': ('Basmatirijst', 'Bulgur', 'Casarecce', 'Cashewstukjes',
                   'Gesneden snijbonen', 'Jasmijnrijst', 'Linzen', 'Maïs in blik',
                   'Parelcouscous', 'Penne', 'Rigatoni', 'Rode kidneybonen',
                   'Spaghetti', 'Witte tortilla'),
        'groenten': ('Aardappelen', 'Aubergine', 'Bosui', 'Broccoli',
                     'Champignons', 'Citroen', 'Gele wortel', 'Gesneden rodekool',
                     'Groene paprika', 'Groentemix van paprika, prei, gele wortel en courgette',
                     'IJsbergsla', 'Kumato tomaat', 'Limoen', 'Little gem',
                     'Paprika', 'Portobello', 'Prei', 'Pruimtomaat',
                     'Radicchio en ijsbergsla', 'Rode cherrytomaten', 'Rode paprika', 'Rode peper',
                     'Rode puntpaprika', 'Rode ui', 'Rucola', 'Rucola en veldsla', 'Rucolamelange',
                     'Semi-gedroogde tomatenmix', 'Sjalot', 'Sperziebonen', 'Spinazie', 'Tomaat',
                     'Turkse groene peper', 'Veldsla', 'Vers basilicum', 'Verse bieslook',
                     'Verse bladpeterselie', 'Verse koriander', 'Verse krulpeterselie', 'Wortel', 'Zoete aardappel'),
        'kruiden': ('Aïoli', 'Bloem', 'Bruine suiker', 'Cranberrychutney', 'Extra vierge olijfolie',
                    'Extra vierge olijfolie met truffelaroma', 'Fles olijfolie', 'Gedroogde laos',
                    'Gedroogde oregano', 'Gemalen kaneel', 'Gemalen komijnzaad', 'Gemalen korianderzaad',
                    'Gemalen kurkuma', 'Gerookt paprikapoeder', 'Groene currykruiden', 'Groentebouillon',
                    'Groentebouillonblokje', 'Honing', 'Italiaanse kruiden', 'Kippenbouillonblokje', 'Knoflookteen',
                    'Kokosmelk', 'Koreaanse kruidenmix', 'Mayonaise', 'Mexicaanse kruiden', 'Midden-Oosterse kruidenmix',
                    'Mosterd', 'Nootmuskaat', 'Olijfolie', 'Panko paneermeel', 'Paprikapoeder', 'Passata',
                    'Pikante uienchutney', 'Runderbouillonblokje', 'Sambal', 'Sesamzaad', 'Siciliaanse kruidenmix',
                    'Sojasaus', 'Suiker', 'Sumak', 'Surinaamse kruiden', 'Tomatenblokjes', 'Tomatenblokjes met ui',
                    'Truffeltapenade', 'Ui', 'Verse gember', 'Visbouillon', 'Witte balsamicoazijn', 'Wittewijnazijn',
                    'Zonnebloemolie', 'Zwarte balsamicoazijn'),
        'vlees': ('Gekruide runderburger', 'Half-om-half gehaktballetjes met Spaanse kruiden', 'Kipfilethaasjes', 'Kipfiletstukjes',
                  'Kipgehaktballetjes met Italiaanse kruiden', 'Kippendijreepjes', 'Kipshoarma', 'Kipworst', 'Spekblokjes',
                  'Vegetarische döner kebab', 'Vegetarische kaasschnitzel', 'Vegetarische schnitzel'),
        'zuivel': ('Ei', 'Geraspte belegen kaas', 'Geraspte cheddar', 'Geraspte grana padano', 'Geraspte oude kaas',
                   'Geraspte pecorino', 'Karnemelk', 'Kruidenroomkaas', 'Labne', 'Melk', 'Mozzarella',
                   'Parmigiano reggiano', 'Roomboter', 'Slagroom', 'Volle yoghurt')
    }
    reverse_label_cat = {x: k for k, v in category_dict.items() for x in v}
    df["Category"] = df["Ingredients"].map(reverse_label_cat)
    col = "Category"
    first_col = df.pop(col)
    df.insert(0, col, first_col)
    df = df.sort_values(['Category', 'Ingredients'], ascending=[True, True])
    print("Merging ingredients by row across all recipe columns using justify()")
    gp_cols = ['Ingredients', 'Measurement']
    oth_cols = df.columns.difference(gp_cols)
    arr = np.vstack(df.groupby(gp_cols, sort=False, dropna=False).apply(
        lambda gp: justify(gp.to_numpy(), invalid_val=np.NaN, axis=0, side='up')))
    # Reconstruct DataFrame and remove entirely NaN rows based on the non-grouping columns
    res = (pd.DataFrame(arr, columns=df.columns)
           .dropna(how='all', subset=oth_cols, axis=0))
    res = res.fillna(0)
    res['Total'] = res.drop(['Ingredients', 'Measurement'], axis=1).sum(axis=1)
    res = res[res['Total'] != 0]  # To drop rows that are being duplicated with 0 for some reason; will check later
    print("Processing complete!")
    return res
Your function create_df needs a list as an argument, but st.text_input always returns a string.
In your streamlit.py, replace df_download = create_df(recs) with df_download = create_df([recs]). If you need to handle multiple URLs, use str.split like this:
def create_df(recipes):
    recipes = recipes.split(",")  # <--- add this line to make a list from the user input
    ### rest of the code ###

if download:
    df_download = create_df(recs)
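A minimal sketch of just the parsing step (the helper name is made up): since st.text_input hands back one string, split it on commas and strip whitespace before passing the result to create_df:

```python
# Turn a single user-input string like "url1, url2" into a clean list of URLs.
def parse_recipe_urls(raw: str) -> list:
    return [u.strip() for u in raw.split(",") if u.strip()]

print(parse_recipe_urls("https://a.example/1, https://a.example/2"))
```

Filtering out empty pieces also makes a trailing comma or an empty input harmless.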

BeautifulSoup Data Scraping : Unable to fetch correct information from the page

I am trying to scrape data from:
https://www.canadapharmacy.com/
Below are a few pages that I need to scrape:
https://www.canadapharmacy.com/products/abilify-tablet
https://www.canadapharmacy.com/products/accolate
https://www.canadapharmacy.com/products/abilify-mt
I need all the information from the page. I wrote the code below:
base_url = 'https://www.canadapharmacy.com'
data = []
for i in tqdm(range(len(medicine_url))):
    r = requests.get(base_url + medicine_url[i])
    soup = BeautifulSoup(r.text, 'lxml')
    # Medicine name
    try:
        main_name = soup.find('h1', {"class": "mn"}).text.strip()
    except:
        main_name = None
    try:
        sec_name = soup.find('div', {"class": "product-name"}).find('h3').text.strip()
    except:
        sec_name = None
    try:
        generic_name = soup.find('div', {"class": "card product generic strength equal"}).find('div').find('h3').text.strip()
    except:
        generic_name = None
    # Description
    try:
        des1 = soup.find('div', {"class": "answer expanded"}).find_all('p')[1].text
    except:
        des1 = ''
    try:
        des2 = soup.find('div', {"class": "answer expanded"}).find('ul').text
    except:
        des2 = ''
    try:
        des3 = soup.find('div', {"class": "answer expanded"}).find_all('p')[2].text
    except:
        des3 = ''
    desc = (des1 + des2 + des3).replace('\n', ' ')
    # Directions
    try:
        dir1 = soup.find('div', {"class": "answer expanded"}).find_all('h4')[1].text
    except:
        dir1 = ''
    try:
        dir2 = soup.find('div', {"class": "answer expanded"}).find_all('p')[5].text
    except:
        dir2 = ''
    try:
        dir3 = soup.find('div', {"class": "answer expanded"}).find_all('p')[6].text
    except:
        dir3 = ''
    try:
        dir4 = soup.find('div', {"class": "answer expanded"}).find_all('p')[7].text
    except:
        dir4 = ''
    directions = dir1 + dir2 + dir3 + dir4
    # Ingredients
    try:
        ing = soup.find('div', {"class": "answer expanded"}).find_all('p')[9].text
    except:
        ing = None
    # Cautions
    try:
        c1 = soup.find('div', {"class": "answer expanded"}).find_all('h4')[3].text
    except:
        c1 = None
    try:
        c2 = soup.find('div', {"class": "answer expanded"}).find_all('p')[11].text
    except:
        c2 = ''
    try:
        c3 = soup.find('div', {"class": "answer expanded"}).find_all('p')[12].text
    except:
        c3 = ''
    try:
        c4 = soup.find('div', {"class": "answer expanded"}).find_all('p')[13].text
    except:
        c4 = ''
    try:
        c5 = soup.find('div', {"class": "answer expanded"}).find_all('p')[14].text
    except:
        c5 = ''
    try:
        c6 = soup.find('div', {"class": "answer expanded"}).find_all('p')[15].text
    except:
        c6 = ''
    caution = (c1 + c2 + c3 + c4 + c5 + c6).replace('\xa0', '')
    # Side effects
    try:
        se1 = soup.find('div', {"class": "answer expanded"}).find_all('h4')[4].text
    except:
        se1 = ''
    try:
        se2 = soup.find('div', {"class": "answer expanded"}).find_all('p')[18].text
    except:
        se2 = ''
    try:
        se3 = soup.find('div', {"class": "answer expanded"}).find_all('ul')[1].text
    except:
        se3 = ''
    try:
        se4 = soup.find('div', {"class": "answer expanded"}).find_all('p')[19].text
    except:
        se4 = ''
    try:
        se5 = soup.find('div', {"class": "post-author-bio"}).text
    except:
        se5 = ''
    se = (se1 + se2 + se3 + se4 + se5).replace('\n', ' ')
    for j in soup.find('div', {"class": "answer expanded"}).find_all('h4'):
        if 'Product Code' in j.text:
            prod_code = j.text
    #prod_code = soup.find('div',{"class":"answer expanded"}).find_all('h4')[5].text
    pharma = {"primary_name": main_name,
              "secondary_name": sec_name,
              "Generic_Name": generic_name,
              "Description": desc,
              "Directions": directions,
              "Ingredients": ing,
              "Caution": caution,
              "Side_Effects": se,
              "Product_Code": prod_code}
    data.append(pharma)
But each page has these tags in different positions, so this approach does not give correct data. So I tried:
soup.find('div',{"class":"answer expanded"}).find_all('h4')
which gives me this output:
[<h4>Description </h4>,
<h4>Directions</h4>,
<h4>Ingredients</h4>,
<h4>Cautions</h4>,
<h4>Side Effects</h4>,
<h4>Product Code : 5513 </h4>]
I want to create a data frame where Description contains all of the description text and Directions contains all of the directions text from the web page.
for i in soup.find('div', {"class": "answer expanded"}).find_all('h4'):
    if 'Description' in i.text:
        print(soup.find('div', {"class": "answer expanded"}).findAllNext('p'))
but it prints everything after soup.find('div',{"class":"answer expanded"}).find_all('h4'), while I want only the tags that give the description of the medicine and no others.
Can anyone suggest how to do this? Also, how do I scrape the rate table from the page? It currently gives me the values in an unstructured fashion.
You can try the following working example:
import requests
from bs4 import BeautifulSoup
import pandas as pd

data = []
r = requests.get('https://www.canadapharmacy.com/products/abilify-tablet')
soup = BeautifulSoup(r.text, "lxml")
try:
    card = ''.join([x.get_text(' ', strip=True) for x in soup.select('div.answer.expanded')])
    des = card.split('Directions')[0].replace('Description', '')
    #print(des)
    drc = card.split('Directions')[1].split('Ingredients')[0]
    #print(drc)
    ingre = card.split('Directions')[1].split('Ingredients')[1].split('Cautions')[0]
    #print(ingre)
    cau = card.split('Directions')[1].split('Ingredients')[1].split('Cautions')[1].split('Side Effects')[0]
    #print(cau)
    se = card.split('Directions')[1].split('Ingredients')[1].split('Cautions')[1].split('Side Effects')[1]
    #print(se)
except:
    pass
data.append({
    'Description': des,
    'Directions': drc,
    'Ingredients': ingre,
    'Cautions': cau,
    'Side Effects': se
})
print(data)
# df = pd.DataFrame(data)
# print(df)
Output:
[{'Description': " Abilify Tablet (Aripiprazole) Abilify (Aripiprazole) is a medication prescribed to treat or manage different conditions, including: Agitation associated with schizophrenia or bipolar mania (injection formulation only) Irritability associated with autistic disorder Major depressive disorder , adjunctive treatment Mania and mixed episodes associated with Bipolar I disorder Tourette's disorder Schizophrenia Abilify works by activating different neurotransmitter receptors located in brain cells. Abilify activates D2 (dopamine) and 5-HT1A (serotonin) receptors and blocks 5-HT2A (serotonin) receptors. This combination of receptor activity is responsible for the treatment effects of Abilify. Conditions like schizophrenia, major depressive disorder, and bipolar disorder are caused by neurotransmitter imbalances in the brain. Abilify helps to correct these imbalances and return the normal functioning of neurons. ", 'Directions': ' Once you are prescribed and buy Abilify, then take Abilify exactly as prescribed by your
doctor. The dose will vary based on the condition that you are treating. The starting dose of Abilify ranges from 2-15 mg once daily, and the recommended dose for most conditions is between 5-15 mg once daily. The maximum dose is 30 mg once daily. Take Abilify with or without food. ', 'Ingredients': ' The active ingredient in Abilify medication is aripiprazole . ', 'Cautions': ' Abilify and other antipsychotic medications have been associated with an increased risk of death in elderly patients with dementia-related psychosis. When combined with other dopaminergic agents, Abilify can increase the risk of neuroleptic malignant syndrome. Abilify can cause metabolic changes and in some cases can induce high blood sugar in people with and without diabetes . Abilify can also weight gain and increased risk of dyslipidemia. Blood glucose should be monitored while taking Abilify. Monitor for low blood pressure and heart rate while taking Abilify; it can cause orthostatic hypertension which may lead to dizziness or fainting. Use with caution in patients with a history of seizures. ', 'Side Effects': ' The side effects of Abilify vary greatly depending
on what condition is being treated, what other medications are being used concurrently, and what dose is being taken. Speak with your doctor or pharmacist for a full list of side effects that apply to you. Some of the most common side effects include: Akathisia Blurred vision Constipation Dizziness Drooling Extrapyramidal disorder Fatigue Headache Insomnia Nausea Restlessness Sedation Somnolence Tremor Vomiting Buy Abilify online from Canada Pharmacy . Abilify can be purchased online with a valid prescription from a doctor. About Dr. Conor Sheehy (Page Author) Dr. Sheehy (BSc Molecular Biology, PharmD) works a clinical pharmacist specializing in cardiology, oncology, and ambulatory care. He’s a board-certified pharmacotherapy specialist (BCPS), and his experience working one-on-one with patients to fine tune their medication and therapy plans for optimal results makes him a valuable subject matter expert for our pharmacy. Read More.... IMPORTANT NOTE: The above information is intended to increase awareness of health information
and does not suggest treatment or diagnosis. This information is not a substitute for individual medical attention and should not be construed to indicate that use of the drug is safe, appropriate, or effective for you. See your health care professional for medical advice and treatment. Product Code : 5513'}]
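Splitting on literal heading words works for this page, but it can break if a heading word also appears in the body text. A more structural alternative, sketched here on a made-up miniature of the page layout, is to walk each h4's following siblings and collect text until the next h4:

```python
from bs4 import BeautifulSoup

# Inline HTML standing in for the real page; the real structure may differ.
html = """
<div class="answer expanded">
  <h4>Description</h4><p>About the drug.</p><p>More detail.</p>
  <h4>Directions</h4><p>Take once daily.</p>
  <h4>Product Code : 5513</h4>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

sections = {}
for h4 in soup.select("div.answer.expanded h4"):
    parts = []
    for sib in h4.find_next_siblings():
        if sib.name == "h4":  # stop at the next section heading
            break
        parts.append(sib.get_text(" ", strip=True))
    sections[h4.get_text(strip=True)] = " ".join(parts)

print(sections)
```

This keys each section by its heading, so the order of p tags on a given page no longer matters.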

Python float print in real time

Is there any way to print the numbers in real time instead of printing them one by one? I have 6 different countries:
china = 1399746872
india = 1368138206
USA = 327826334
Japan = 12649000
Russia = 146804372
Sweden = 10379295
I change these numbers in the script, but how do I print them so I can see them change?
EDIT:
I want to overwrite this list every time it prints, so I can see the numbers go up:
Countries = []
china = 1399746872
india = 1368138206
USA = 327826334
Japan = 12649000
Russia = 146804372
Sweden = 10379295
Countries.append(china)
Countries.append(india)
Countries.append(USA)
Countries.append(Japan)
Countries.append(Russia)
Countries.append(Sweden)
print(Countries)
You could use os.system("cls") to clear the console (cls is the Windows command; on Linux/macOS use os.system("clear")).
I made a little demo:
import time, os
from random import randint

vals = {
    "china": 1399746872,
    "india": 1368138206,
    "USA": 327826334,
    "Japan": 12649000,
    "Russia": 146804372,
    "Sweden": 10379295
}
for _ in range(100):
    # clear console
    os.system("cls")
    # print values
    [print(f"{k}: {v}") for k, v in vals.items()]
    # renew values with randomly generated integers
    vals = {k: randint(0, 1000000) for k in vals}
    # sleep 5s
    time.sleep(5)
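If clearing the whole screen feels heavy-handed, another option is to rewrite a single status line in place with a carriage return. A minimal sketch (the +1 update rule is made up for the demo):

```python
import sys
import time

def format_line(vals):
    # Render all counters on one line, e.g. "china: 1 | india: 2"
    return " | ".join(f"{k}: {v}" for k, v in vals.items())

vals = {"china": 1399746872, "india": 1368138206}
for _ in range(3):
    sys.stdout.write("\r" + format_line(vals))  # \r returns to line start
    sys.stdout.flush()
    vals = {k: v + 1 for k, v in vals.items()}  # simulate the numbers changing
    time.sleep(0.1)
print()
```

This stays on one line and works the same on Windows and Unix terminals, though it only suits output that fits on a single line.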

How do I add a list of brand names to data frame or if it does not appear add 'None'?

brand_names = ["Tommy Hilfiger", "Tommy Jeans", "Hugo", "Hugo Boss", "Boss", "HUGO", "Lacoste", "lacoste",
"Adidas",
"adidas", "Armani", "The North Face", "Paul Smith", "Vivienne Westwood", "Levis", "Kent And Curwen",
"Nike", "BOSS", "Calvin Klein", "Kent and Curwen",
"Pretty Green", "Lyle And Scott", "Moschino", "Converse", "Timberland", "Ralph Lauren", "Fred Perry",
"True Religion",
"Luke 1977", "Belstaff", "Paul And Shark", "CP Company", "Money Tri Wheel", "Money Sig", "Gant","Versace"]
image = []
title = []
price = []
link = []
shop = []
brand = []
mainline_t_shirt(soup, brand_names)
mainline = pd.DataFrame({
'Images': image,
'Titles': title,
'Prices': price,
'link': link,
'Website': 'mainlinemenswear',
'brand': brand
})
# Image
(code) 63 elements- code working
# Title
(code) 63 elements- code working
# Price
(code) 63 elements- code working
# link
(code) 63 elements- code working
# website
(code) 63 elements- code working
#brand
for container5 in title_div:
    for temp in brand_names_in:
        if temp in container5.text:
            print(temp)
            brand.append(temp)
        if temp not in container5.text:
            brand.append("None")
The data frame 'mainline' has 63 rows. The issue is the 'brand' column. Every time I run this code I get this error:
raise ValueError("arrays must all be same length")
ValueError: arrays must all be same length
This is because not all the brands (Nike, Adidas, etc.) appear in container.text. How can I add the string "None" to the row instead of the brand when there is no match?
The code that needs to be changed is the nested loop at the end.
The problem is that for each container5, you loop over all your brands. Out of the 20 or so brands, only one (if any) will match container5.text. Every other brand mismatches, so brand.append("None") is executed, about 20 × len(title_div) times in total. That makes the brand list far too long, with lots of "None"s (which you can see if you print(brand) inside or directly after the loop).
You can use a for-else here:
for container5 in title_div:
    for temp in brand_names_in:
        if temp in container5.text:
            brand.append(temp)
            break
    else:  # didn't break out of the inner for-loop, so there was no match
        brand.append("None")
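To see the pattern in action, here is a tiny self-contained run with made-up titles and brands; note that exactly one value is appended per title:

```python
# for-else: the else branch runs only when the inner loop finishes
# without hitting break, i.e. when no brand matched the title.
brand_names_in = ["Nike", "Adidas", "Hugo Boss"]
titles = ["Nike Air hoodie", "Plain white tee", "Adidas joggers"]

brand = []
for title in titles:
    for temp in brand_names_in:
        if temp in title:
            brand.append(temp)
            break
    else:
        brand.append("None")

print(brand)  # -> ['Nike', 'None', 'Adidas']
```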

Looping through tree to create a dictionary_NLTK

I'm new to Python and trying to solve a problem looping through a tree in NLTK. I'm stuck on the final output; it is not entirely correct.
I want to create a dictionary with 2 keys, and if there is no quantity then use the value 1.
This is the desired final output:
[{'quantity': 1, 'food': 'pizza'}, {'quantity': 1, 'food': 'coke'},
 {'quantity': 2, 'food': 'beers'}, {'quantity': 1, 'food': 'sandwich'}]
Here is my code, any help is much appreciated!
import nltk as nltk
nltk.download()

grammar = r""" Food:{<DT>?<VRB>?<NN.*>+}
}<>+{
Quantity: {<CD>|<JJ>|<DT>}
"""
rp = nltk.RegexpParser(grammar)

def RegPar(menu):
    grammar = r"""Food:{<DT>?<VRB>?<NN.*>+}
}<>+{
Quantity: {<CD>|<JJ>|<DT>}
"""
    rp = nltk.RegexpParser(grammar)
    output = rp.parse(menu)
    return output

Sentences = ['A pizza margherita', 'one coke y 2 beers', 'Sandwich']
tagged_array = []
output_array = []
for s in Sentences:
    tokens = nltk.word_tokenize(s)
    tags = nltk.pos_tag(tokens)
    tagged_array.append(tags)
    output = rp.parse(tags)
    output_array.append(output)
    print(output)

dat = []
tree = RegPar(output_array)
for subtree in tree.subtrees():
    if subtree.label() == 'Food' or subtree.label() == 'Quantity':
        dat.append({(subtree.label(), subtree.leaves()[0][0])})
print(dat)
## [{('Food', 'A')}, {('Quantity', 'one')}, {('Food', 'coke')}, {('Quantity', '2')}, {('Food', 'beers')}, {('Food', 'Sandwich')}]
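One way to get from (label, word) pairs like those to the desired dictionaries is a small post-processing pass that holds each Quantity until the next Food and defaults to 1 when none is pending. This is only a sketch: the pairs below are idealized (assuming the chunker yields the food word rather than the determiner), and the word-to-number table is minimal and made up:

```python
# Pair each Quantity chunk with the Food chunk that follows it;
# a Food with no pending Quantity defaults to quantity 1.
pairs = [('Food', 'pizza'), ('Quantity', 'one'), ('Food', 'coke'),
         ('Quantity', '2'), ('Food', 'beers'), ('Food', 'Sandwich')]

WORD_NUMBERS = {'one': 1, 'two': 2, 'a': 1, 'A': 1}  # extend as needed

def to_int(tok):
    return WORD_NUMBERS.get(tok, int(tok) if tok.isdigit() else 1)

result, pending = [], None
for label, word in pairs:
    if label == 'Quantity':
        pending = to_int(word)
    else:  # Food
        result.append({'quantity': pending if pending is not None else 1,
                       'food': word.lower()})
        pending = None

print(result)
```

This yields one dictionary per food item, with quantities attached where the grammar found them.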
