Printing a list in dataframe format - python

I am trying to print a two-dimensional list in the same layout that pandas uses for a DataFrame.
[Image: the data printed as a pandas DataFrame]
My code:
cols = ["prod_id", "description", "cost"]
data = [["p01", "Domaxx Geniune Leather RFID Blocking Trifold Wallets-Made Genuine Soft Leather Large Classic Pocket Wallet,Holding 9 Cards Photo ID Coin Pocket and 2 Note compartments-Black Surface/Orange Inner", "10.00"],
["p02","Neck Wallet, Passport Holder with RFID Blocking Anti-Theft Travel Pouch Security Wallet for Credit Cards and Passport - Silver","15.00"]]
temp_str = ''
for item in cols:
    temp_str += "\t " + item
print(temp_str)
i = 0
for row in data:
    print(str(i) + "\t" + row[0] + "\t" + row[1] + "\t" + row[2])
    i += 1
[Image: the output of the code above, printed from a plain list]

I am not sure if I understood your input, but if you want this:
prod_id description cost
0 p01 Domaxx Geniune Leather RFID Blocking Trifold W... 10.00
1 p02 Neck Wallet, Passport Holder with RFID Blockin... 15.00
from:
cols = ["prod_id", "description", "cost"]
data = [["p01", "Domaxx Geniune Leather RFID Blocking Trifold Wallets-Made Genuine Soft Leather Large Classic Pocket Wallet,Holding 9 Cards Photo ID Coin Pocket and 2 Note compartments-Black Surface/Orange Inner", "10.00"], ["p02","Neck Wallet, Passport Holder with RFID Blocking Anti-Theft Travel Pouch Security Wallet for Credit Cards and Passport - Silver","15.00"]]
Just do this:
import pandas as pd
df = pd.DataFrame(data, columns=cols)
print(df)
EDIT
Here is an example with shorter description fields:
data = [["p01", "Domaxx Geniune Leather RFID Blocking Trifold Wallets-Made", "10.00"],
        ["p02", "Neck Wallet, Passport Holder with RFID Blocking Anti-Theft Travel Pouch Security Wallet f", "15.00"]]
fmt = '{:<4}{:<10}{:<100}{}'
print(fmt.format('', "prod_id", "description", "cost"))  # your columns here
# Each row is already [prod_id, description, cost], so it can be unpacked directly.
for i, (x, y, z) in enumerate(data):
    print(fmt.format(i, x, y, z))
You can adjust the widths in the format string until the layout suits you.
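If you would rather not hard-code the widths, they can be computed from the data. This is only a minimal sketch: `format_table` and its `max_width` parameter are names I made up for illustration, and the 30-character limit in the usage lines is arbitrary.

```python
import textwrap

def format_table(cols, rows, max_width=40):
    """Return the rows as an aligned text table with a leading index column.

    format_table and max_width are illustrative names, not part of pandas.
    """
    # Shorten long cells (this also collapses runs of whitespace) so one
    # wide value does not stretch the whole table.
    cells = [[textwrap.shorten(str(v), width=max_width, placeholder="...") for v in row]
             for row in rows]
    # Each column is as wide as its header or its longest cell.
    widths = [max([len(col)] + [len(r[i]) for r in cells]) for i, col in enumerate(cols)]
    index_width = len(str(max(len(cells) - 1, 0)))
    header = " " * index_width + "  " + "  ".join(c.ljust(w) for c, w in zip(cols, widths))
    lines = [header.rstrip()]
    for i, row in enumerate(cells):
        body = "  ".join(v.ljust(w) for v, w in zip(row, widths))
        lines.append((str(i).rjust(index_width) + "  " + body).rstrip())
    return "\n".join(lines)

cols = ["prod_id", "description", "cost"]
data = [["p01", "Domaxx Geniune Leather RFID Blocking Trifold Wallets", "10.00"],
        ["p02", "Neck Wallet, Passport Holder with RFID Blocking", "15.00"]]
print(format_table(cols, data, max_width=30))
```

Unlike pandas, this left-aligns every column; swap `ljust` for `rjust` if you prefer the pandas look.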

Related

BeautifulSoup Data Scraping : Unable to fetch correct information from the page

I am trying to scrape data from:
https://www.canadapharmacy.com/
Below are a few of the pages that I need to scrape:
https://www.canadapharmacy.com/products/abilify-tablet
https://www.canadapharmacy.com/products/accolate
https://www.canadapharmacy.com/products/abilify-mt
I need all the information from each page. I wrote the code below:
base_url = 'https://www.canadapharmacy.com'
data = []
for i in tqdm(range(len(medicine_url))):
    r = requests.get(base_url+medicine_url[i])
    soup = BeautifulSoup(r.text,'lxml')
    # Scraping medicine Name
    try:
        main_name = (soup.find('h1',{"class":"mn"}).text.lstrip()).rstrip()
    except:
        main_name = None
    try:
        sec_name = (soup.find('div',{"class":"product-name"}).find('h3').text.lstrip()).rstrip()
    except:
        sec_name = None
    try:
        generic_name = (soup.find('div',{"class":"card product generic strength equal"}).find('div').find('h3').text.lstrip()).rstrip()
    except:
        generic_name = None
    # Description
    try:
        des1 = soup.find('div',{"class":"answer expanded"}).find_all('p')[1].text
    except:
        des1 = ''
    try:
        des2 = soup.find('div',{"class":"answer expanded"}).find('ul').text
    except:
        des2 = ''
    try:
        des3 = soup.find('div',{"class":"answer expanded"}).find_all('p')[2].text
    except:
        des3 = ''
    desc = (des1+des2+des3).replace('\n',' ')
    #Directions
    try:
        dir1 = soup.find('div',{"class":"answer expanded"}).find_all('h4')[1].text
    except:
        dir1 = ''
    try:
        dir2 = soup.find('div',{"class":"answer expanded"}).find_all('p')[5].text
    except:
        dir2 = ''
    try:
        dir3 = soup.find('div',{"class":"answer expanded"}).find_all('p')[6].text
    except:
        dir3 = ''
    try:
        dir4 = soup.find('div',{"class":"answer expanded"}).find_all('p')[7].text
    except:
        dir4 = ''
    directions = dir1+dir2+dir3+dir4
    #Ingredients
    try:
        ing = soup.find('div',{"class":"answer expanded"}).find_all('p')[9].text
    except:
        ing = None
    #Cautions
    try:
        c1 = soup.find('div',{"class":"answer expanded"}).find_all('h4')[3].text
    except:
        c1 = None
    try:
        c2 = soup.find('div',{"class":"answer expanded"}).find_all('p')[11].text
    except:
        c2 = ''
    try:
        c3 = soup.find('div',{"class":"answer expanded"}).find_all('p')[12].text #//div[@class='answer expanded']//p[2]
    except:
        c3 = ''
    try:
        c4 = soup.find('div',{"class":"answer expanded"}).find_all('p')[13].text
    except:
        c4 = ''
    try:
        c5 = soup.find('div',{"class":"answer expanded"}).find_all('p')[14].text
    except:
        c5 = ''
    try:
        c6 = soup.find('div',{"class":"answer expanded"}).find_all('p')[15].text
    except:
        c6 = ''
    caution = (c1+c2+c3+c4+c5+c6).replace('\xa0','')
    #Side Effects
    try:
        se1 = soup.find('div',{"class":"answer expanded"}).find_all('h4')[4].text
    except:
        se1 = ''
    try:
        se2 = soup.find('div',{"class":"answer expanded"}).find_all('p')[18].text
    except:
        se2 = ''
    try:
        se3 = soup.find('div',{"class":"answer expanded"}).find_all('ul')[1].text
    except:
        se3 = ''
    try:
        se4 = soup.find('div',{"class":"answer expanded"}).find_all('p')[19].text
    except:
        se4 = ''
    try:
        se5 = soup.find('div',{"class":"post-author-bio"}).text
    except:
        se5 = ''
    se = (se1 + se2 + se3 + se4 + se5).replace('\n',' ')
    for j in soup.find('div',{"class":"answer expanded"}).find_all('h4'):
        if 'Product Code' in j.text:
            prod_code = j.text
    #prod_code = soup.find('div',{"class":"answer expanded"}).find_all('h4')[5].text #//div[@class='answer expanded']//h4
    pharma = {"primary_name":main_name,
              "secondary_name":sec_name,
              "Generic_Name":generic_name,
              "Description":desc,
              "Directions":directions,
              "Ingredients":ing,
              "Caution":caution,
              "Side_Effects":se,
              "Product_Code":prod_code}
    data.append(pharma)
But each page has the tags in different positions, so this does not give correct data. So I tried:
soup.find('div',{"class":"answer expanded"}).find_all('h4')
which gives me the output:
[<h4>Description </h4>,
<h4>Directions</h4>,
<h4>Ingredients</h4>,
<h4>Cautions</h4>,
<h4>Side Effects</h4>,
<h4>Product Code : 5513 </h4>]
I want to create a data frame in which the Description column contains all the description text from the web page, the Directions column contains all the directions text, and so on. I tried:
for i in soup.find('div',{"class":"answer expanded"}).find_all('h4'):
    if 'Description' in i.text:
        print(soup.find('div',{"class":"answer expanded"}).findAllNext('p'))
but it prints all the <p> tags that come after soup.find('div',{"class":"answer expanded"}), whereas I want only the tags that hold the description of the medicine and no others.
Can anyone suggest how to do this? Also, how can I scrape the rate table from the page? It currently gives me the values in an unusable layout.
You can try the following working example:
import requests
from bs4 import BeautifulSoup
import pandas as pd
data = []
r = requests.get('https://www.canadapharmacy.com/products/abilify-tablet')
soup = BeautifulSoup(r.text,"lxml")
try:
    # Join the visible text of the expanded section, then cut it at each heading.
    card = ''.join([x.get_text(' ',strip=True) for x in soup.select('div.answer.expanded')])
    des = card.split('Directions')[0].replace('Description','')
    #print(des)
    drc = card.split('Directions')[1].split('Ingredients')[0]
    #print(drc)
    ingre = card.split('Directions')[1].split('Ingredients')[1].split('Cautions')[0]
    #print(ingre)
    cau = card.split('Directions')[1].split('Ingredients')[1].split('Cautions')[1].split('Side Effects')[0]
    #print(cau)
    se = card.split('Directions')[1].split('Ingredients')[1].split('Cautions')[1].split('Side Effects')[1]
    #print(se)
except IndexError:
    # A heading is missing on this page, so skip it instead of storing partial data.
    pass
else:
    data.append({
        'Description':des,
        'Directions':drc,
        'Ingredients':ingre,
        'Cautions':cau,
        'Side Effects':se
    })
print(data)
# df = pd.DataFrame(data)
# print(df)
Output:
[{'Description': " Abilify Tablet (Aripiprazole) Abilify (Aripiprazole) is a medication prescribed to treat or manage different conditions, including: Agitation associated with schizophrenia or bipolar mania (injection formulation only) Irritability associated with autistic disorder Major depressive disorder , adjunctive treatment Mania and mixed episodes associated with Bipolar I disorder Tourette's disorder Schizophrenia Abilify works by activating different neurotransmitter receptors located in brain cells. Abilify activates D2 (dopamine) and 5-HT1A (serotonin) receptors and blocks 5-HT2A (serotonin) receptors. This combination of receptor activity is responsible for the treatment effects of Abilify. Conditions like schizophrenia, major depressive disorder, and bipolar disorder are caused by neurotransmitter imbalances in the brain. Abilify helps to correct these imbalances and return the normal functioning of neurons. ", 'Directions': ' Once you are prescribed and buy Abilify, then take Abilify exactly as prescribed by your
doctor. The dose will vary based on the condition that you are treating. The starting dose of Abilify ranges from 2-15 mg once daily, and the recommended dose for most conditions is between 5-15 mg once daily. The maximum dose is 30 mg once daily. Take Abilify with or without food. ', 'Ingredients': ' The active ingredient in Abilify medication is aripiprazole . ', 'Cautions': ' Abilify and other antipsychotic medications have been associated with an increased risk of death in elderly patients with dementia-related psychosis. When combined with other dopaminergic agents, Abilify can increase the risk of neuroleptic malignant syndrome. Abilify can cause metabolic changes and in some cases can induce high blood sugar in people with and without diabetes . Abilify can also weight gain and increased risk of dyslipidemia. Blood glucose should be monitored while taking Abilify. Monitor for low blood pressure and heart rate while taking Abilify; it can cause orthostatic hypertension which may lead to dizziness or fainting. Use with caution in patients with a history of seizures. ', 'Side Effects': ' The side effects of Abilify vary greatly depending
on what condition is being treated, what other medications are being used concurrently, and what dose is being taken. Speak with your doctor or pharmacist for a full list of side effects that apply to you. Some of the most common side effects include: Akathisia Blurred vision Constipation Dizziness Drooling Extrapyramidal disorder Fatigue Headache Insomnia Nausea Restlessness Sedation Somnolence Tremor Vomiting Buy Abilify online from Canada Pharmacy . Abilify can be purchased online with a valid prescription from a doctor. About Dr. Conor Sheehy (Page Author) Dr. Sheehy (BSc Molecular Biology, PharmD) works a clinical pharmacist specializing in cardiology, oncology, and ambulatory care. He’s a board-certified pharmacotherapy specialist (BCPS), and his experience working one-on-one with patients to fine tune their medication and therapy plans for optimal results makes him a valuable subject matter expert for our pharmacy. Read More.... IMPORTANT NOTE: The above information is intended to increase awareness of health information
and does not suggest treatment or diagnosis. This information is not a substitute for individual medical attention and should not be construed to indicate that use of the drug is safe, appropriate, or effective for you. See your health care professional for medical advice and treatment. Product Code : 5513'}]
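The chained `split` calls above assume every heading appears exactly once and in a fixed order. Here is a minimal sketch of the same idea as a reusable helper. `split_sections` is a hypothetical name, the sample text is invented, and it assumes the heading names do not also occur inside the body text, which may not hold on every product page.

```python
def split_sections(text, headings):
    """Map each heading found in text to the text that follows it.

    A section ends where the next found heading begins. Headings that are
    missing from the text are skipped instead of raising an error.
    """
    # Locate the first occurrence of every heading that is present.
    found = sorted((text.find(h), h) for h in headings if h in text)
    sections = {}
    for n, (start, heading) in enumerate(found):
        end = found[n + 1][0] if n + 1 < len(found) else len(text)
        sections[heading] = text[start + len(heading):end].strip()
    return sections

# Invented sample text standing in for the joined page text (`card` above).
card = "Description Treats X. Directions Take daily. Cautions See a doctor."
print(split_sections(card, ["Description", "Directions", "Ingredients", "Cautions"]))
```

A page without an Ingredients heading then simply has no `Ingredients` key, so the other columns still line up when the dictionaries are passed to `pd.DataFrame`.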

Storing keyvalue as header and value text as rows using data frame in python using beautiful soup

for imo in imos:
    ...
    ...
    keys_div = soup.find_all("div", {"class": "col-4 keytext"})
    values_div = soup.find_all("div", {"class": "col-8 valuetext"})
    for key, value in zip(keys_div, values_div):
        print(key.text + ": " + value.text)
    ...
Output:
Ship Name: MAERSK ADRIATIC
Shiptype: Chemical/Products Tanker
IMO/LR No.: 9636632
Gross: 23,297
Call Sign: 9V3388
Deadweight: 37,538
MMSI No.: 566429000
Year of Build: 2012
Flag: Singapore
Status: In Service/Commission
Operator: Handytankers K/S
Shipbuilder: Hyundai Mipo Dockyard Co Ltd
ShipType: Chemical/Products Tanker
Built: 2012
GT: 23,297
Deadweight: 37,538
Length Overall: 184.000
Length (BP): 176.000
Length (Reg): 177.460
Bulbous Bow: Yes
Breadth Extreme: 27.430
Breadth Moulded: 27.400
Draught: 11.500
Depth: 17.200
Keel To Mast Height: 46.900
Displacement: 46565
T/CM: 45.0
This is the output for one IMO. I want to store this output in a DataFrame and write it to a CSV file, with the key text as the header and the value text as the rows, one row per IMO. Please help me with how to do this.
All you have to do is append each result to a list and then build a DataFrame from that list.
import pandas as pd
filepath = r"C:\users\test\test_file.csv"
output_data = []
for imo in imos:
    # ... fetch the page for this IMO and build `soup`, as in the question ...
    keys_div = [i.text for i in soup.find_all("div", {"class": "col-4 keytext"})]
    values_div = [i.text for i in soup.find_all("div", {"class": "col-8 valuetext"})]
    dict1 = dict(zip(keys_div, values_div))
    output_data.append(dict1)
df = pd.DataFrame(output_data)
df.to_csv(filepath, index=False)
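One caveat with `dict(zip(...))`: the sample output lists `Deadweight` twice, and a dictionary keeps only the last value for a repeated key. Below is a minimal standard-library sketch that suffixes repeated keys and writes one CSV row per IMO. `pairs_to_row` and `rows_to_csv` are hypothetical helper names, and the second record is made up for illustration.

```python
import csv
import io

def pairs_to_row(pairs):
    """Build one row dict from (key, value) pairs, keeping repeated keys.

    A repeated key gets a numeric suffix ("Deadweight", "Deadweight_2", ...).
    """
    row, seen = {}, {}
    for key, value in pairs:
        key, value = key.strip(), value.strip()
        seen[key] = seen.get(key, 0) + 1
        name = key if seen[key] == 1 else f"{key}_{seen[key]}"
        row[name] = value
    return row

def rows_to_csv(rows, out):
    """Write row dicts to a file-like object; the header is the union of all keys."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)

rows = [
    pairs_to_row([("Ship Name", "MAERSK ADRIATIC"), ("Deadweight", "37,538"), ("Deadweight", "37,538")]),
    pairs_to_row([("Ship Name", "EXAMPLE SHIP"), ("Flag", "Singapore")]),  # made-up second record
]
buffer = io.StringIO()
rows_to_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, pass `open(filepath, "w", newline="")` instead of the `StringIO` buffer.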

How do I perform a search on a text file from a users input

The question I am trying to answer is
Query if a book title is available and present option of (a) increasing stock level or (b) decreasing the stock level, due to a sale. If the stock level is decreased to zero indicate to the user that the book is currently out of stock.
This is the text file
#Listing showing sample book details
#AUTHOR, TITLE, FORMAT, PUBLISHER, COST?, STOCK, GENRE
P.G. Wodehouse, Right Ho Jeeves, hb, Penguin, 10.99, 5, fiction
A. Pais, Subtle is the Lord, pb, OUP, 12.99, 2, biography
A. Calaprice, The Quotable Einstein, pb, PUP, 7.99, 6, science
M. Faraday, The Chemical History of a Candle, pb, Cherokee, 5.99, 1, science
C. Smith, Energy and Empire, hb, CUP, 60, 1, science
J. Herschel, Popular Lectures, hb, CUP, 25, 1, science
C.S. Lewis, The Screwtape Letters, pb, Fount, 6.99, 16, religion
J.R.R. Tolkein, The Hobbit, pb, Harper Collins, 7.99, 12, fiction
C.S. Lewis, The Four Loves, pb, Fount, 6.99, 7, religion
E. Heisenberg, Inner Exile, hb, Birkhauser, 24.95, 1, biography
G.G. Stokes, Natural Theology, hb, Black, 30, 1, religion
And this is the code I have so far:
def Task5():
    again = 'y'
    while again == 'y':
        desc = input('Enter the title of the book you would like to search for: ')
        for bookrecord in book_list:
            if desc in book_list:
                print('Book found')
            else:
                print('Book not found')
                break
        again = input('\nWould you like to search again(press y for yes)').lower()
I already have a function which reads from the text file:
book_list = []
def readbook():
    infile = open('book_data_file.txt')
    for row in infile:
        start = 0 # used to start at the beginning of each line
        string_builder = []
        if not(row.startswith('#')):
            for index in range(len(row)):
                if row[index] ==',' or index ==len(row)-1:
                    string_builder.append(row[start:index])
                    start = index+1
            book_list.append(string_builder)
    infile.close()
Does anyone have an idea how I can complete this task? :)
Get the titles from the book_list variable.
titles = [data[1].strip() for data in book_list]
Remove any white-space from the desc variable.
desc = desc.strip()
For instance, if I am searching for the Popular Lectures book but type the title with extra spaces around it, the search will not find it in book_list. Therefore you should strip the whitespace from the input.
If the book is available, get the book title and stock value from book_list:
info = [(title, int(book_list[idx][5].strip())) for idx, title in enumerate(titles) if desc in title][0]
bk_nm, stock = info
Print the current situation:
if stock == 0:
    print("{} is currently not avail".format(bk_nm))
else:
    print("{} is avail w/ stock {}".format(bk_nm, stock))
Example:
Enter the title of the book you would like to search for: Popular Lectures
Popular Lectures is avail w/ stock 1
Would you like to search again(press y for yes)y
Enter the title of the book you would like to search for: Chemical
The Chemical History of a Candle is avail w/ stock 1
Would you like to search again(press y for yes)n
Code:
book_list = []
def task5():
    titles = [data[1].strip() for data in book_list]
    again = 'y'
    while again == 'y':
        desc = input('Enter the title of the book you would like to search for: ')
        desc = desc.strip() # Remove any white space
        # Collect (title, stock) for every title that contains the search text
        matches = [(title, int(book_list[idx][5].strip())) for idx, title in enumerate(titles) if desc in title]
        if not matches:
            print("{} was not found".format(desc))
        else:
            bk_nm, stock = matches[0]
            if stock == 0:
                print("{} is currently not avail".format(bk_nm))
            else:
                print("{} is avail w/ stock {}".format(bk_nm, stock))
        again = input('\nWould you like to search again(press y for yes)').lower()
def read_txt():
    infile = open('book_data_file.txt')
    for row in infile:
        start = 0 # used to start at the beginning of each line
        string_builder = []
        if not (row.startswith('#')):
            for index in range(len(row)):
                if row[index] == ',' or index == len(row) - 1:
                    string_builder.append(row[start:index])
                    start = index + 1
            book_list.append(string_builder)
    infile.close()
if __name__ == '__main__':
    read_txt()
    task5()
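The task also asks for the stock level to be increased or decreased, which the code above does not cover. Here is a rough sketch of that step, assuming records shaped like the rows in `book_list` (title at index 1, stock at index 5). `adjust_stock` is a hypothetical helper, not something from your code.

```python
def adjust_stock(books, query, delta):
    """Change the stock of the first book whose title contains query.

    books is a list of records such as
    ['C. Smith', ' Energy and Empire', ' hb', ' CUP', ' 60', ' 1', ' science'].
    Returns (title, new_stock), or None when no title matches.
    The stock never drops below zero.
    """
    query = query.strip().lower()
    for record in books:
        title = record[1].strip()
        if query in title.lower():
            new_stock = max(int(record[5].strip()) + delta, 0)
            record[5] = str(new_stock)  # update the record in place
            return title, new_stock
    return None

books = [
    ["C. Smith", " Energy and Empire", " hb", " CUP", " 60", " 1", " science"],
    ["C.S. Lewis", " The Four Loves", " pb", " Fount", " 6.99", " 7", " religion"],
]
result = adjust_stock(books, "energy", -1)  # one copy sold
if result is None:
    print("Book not found")
else:
    title, stock = result
    print(f"{title} is out of stock" if stock == 0 else f"{title}: {stock} in stock")
```

Pass a positive `delta` to increase the stock. Writing the changed records back to the text file is left out here.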

Search in List; Display names based on search input

I have looked through different articles here about searching data in a list, but nothing seems to work or fit what I am supposed to implement.
I have this pre-created module with over 500 records (it is one string, but it is treated as a list once it is split; see the code below) of names, cities, emails, etc. The following is just a chunk of it.
empRecords="""Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,6/14/1965,32114,386-248-4118,386-208-6976,joles@gmail.com,http://www.paganophilipgesq.com,;
Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,202-646-7516,202-276-6826,alesia_hixenbaugh@hixenbaugh.org,http://www.kwikprint.com,;
Lai,Harabedian,1933 Packer Ave #2,Novato,Marin,CA,1/5/2000,94945,415-423-3294,415-926-6089,lai@gmail.com,http://www.buergimaddenscale.com,;
Brittni,Gillaspie,67 Rv Cent,Boise,Ada,ID,11/28/1974,83709,208-709-1235,208-206-9848,bgillaspie@gillaspie.com,http://www.innerlabel.com,;
Raylene,Kampa,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,12/19/2001,46514,574-499-1454,574-330-1884,rkampa@kampa.org,http://www.hermarinc.com,;
Flo,Bookamer,89992 E 15th St,Alliance,Box Butte,NE,12/19/1957,69301,308-726-2182,308-250-6987,flo.bookamer@cox.net,http://www.simontonhoweschneiderpc.com,;
Jani,Biddy,61556 W 20th Ave,Seattle,King,WA,8/7/1966,98104,206-711-6498,206-395-6284,jbiddy@yahoo.com,http://www.warehouseofficepaperprod.com,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804,407-413-4842,407-557-8857,chauncey_motley@aol.com,http://www.affiliatedwithtravelodge.com
"""
a = empRecords.strip().split(";")
And I have the following code for searching:
import empData as x
def seecity():
    empCitylist = list()
    for ct in x.a:
        empCt = ct.strip().split(",")
        empCitylist.append(empCt)
    t = sorted(empCitylist, key=lambda x: x[3])
    for c in t:
        city = (c[3])
        print(city)
        live_city = input("Enter city: ")
        for cy in city:
            if live_city in cy:
                print(c[1])
                # print("Name: "+ c[1] + ",", c[0], "| Current City: " + c[3])
Forgive my idiotic approach, as I am new to Python. What I am trying to do is this: the user inputs a city, and the results should display the last name and first name of the employees who live in that city (I don't know if that makes sense, lol).
By the way, the code above doesn't return any results. It just loops back to the input prompt.
Thank you for helping. Lovelots. <3
PS: the format of the empData is: first name, last name, address, city, country, birthday, zip, phone, and email
You can use the csv module to easily read a file of comma-separated values:
import csv
with open('test.csv', newline='') as csvfile:
    records = list(csv.reader(csvfile))
def search(data, elem, index):
    out = list()
    for row in data:
        if row[index] == elem:
            out.append(row)
    return out
#test
print(search(records, 'Orlando', 3))
Based on your original code, you can do it like this:
# Make list of list records, sorted by city
t = sorted((ct.strip().split(",") for ct in x.a), key=lambda x: x[3])
# List cities
print("Cities in DB:")
for c in t:
    city = (c[3])
    print("-", city)
# Define search function
def seecity():
    live_city = input("Enter city: ")
    for c in t:
        if live_city == c[3]:
            print("Name: "+ c[1] + ",", c[0], "| Current City: " + c[3])
seecity()
Then, after you understand what's going on, do as @Hoxha Alban suggested and use the csv module.
The beauty of Python lies in list comprehensions.
empRecords="""Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,6/14/1965,32114,386-248-4118,386-208-6976,joles@gmail.com,http://www.paganophilipgesq.com,;
Alesia,Hixenbaugh,9 Front St,Washington,District of Columbia,DC,3/3/2000,20001,202-646-7516,202-276-6826,alesia_hixenbaugh@hixenbaugh.org,http://www.kwikprint.com,;
Lai,Harabedian,1933 Packer Ave #2,Novato,Marin,CA,1/5/2000,94945,415-423-3294,415-926-6089,lai@gmail.com,http://www.buergimaddenscale.com,;
Brittni,Gillaspie,67 Rv Cent,Boise,Ada,ID,11/28/1974,83709,208-709-1235,208-206-9848,bgillaspie@gillaspie.com,http://www.innerlabel.com,;
Raylene,Kampa,2 Sw Nyberg Rd,Elkhart,Elkhart,IN,12/19/2001,46514,574-499-1454,574-330-1884,rkampa@kampa.org,http://www.hermarinc.com,;
Flo,Bookamer,89992 E 15th St,Alliance,Box Butte,NE,12/19/1957,69301,308-726-2182,308-250-6987,flo.bookamer@cox.net,http://www.simontonhoweschneiderpc.com,;
Jani,Biddy,61556 W 20th Ave,Seattle,King,WA,8/7/1966,98104,206-711-6498,206-395-6284,jbiddy@yahoo.com,http://www.warehouseofficepaperprod.com,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL,3/1/2000,32804,407-413-4842,407-557-8857,chauncey_motley@aol.com,http://www.affiliatedwithtravelodge.com
"""
rows = empRecords.strip().split(";")
data = [ r.strip().split(",") for r in rows ]
Then you can use any condition to filter the list, for example:
print ( [ "Name: " + emp[1] + "," + emp[0] + "| Current City: " + emp[3] for emp in data if emp[3] == "Washington" ] )
['Name: Hixenbaugh,Alesia| Current City: Washington']
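Exact comparison with `==` fails if the user types `orlando` or adds a trailing space. Here is a small sketch that ignores case and surrounding whitespace, assuming the field order shown in the sample records (first name at index 0, last name at index 1, city at index 3). `find_by_city` is a hypothetical helper, and the two records are shortened copies of the sample data.

```python
def find_by_city(records, city):
    """Return "Last, First" for every record whose city matches, ignoring case."""
    wanted = city.strip().lower()
    return [
        f"{rec[1]}, {rec[0]}"
        for rec in records
        if len(rec) > 3 and rec[3].strip().lower() == wanted
    ]

# Shortened sample records in the same "comma fields, semicolon rows" layout.
raw = """Jovita,Oles,8 S Haven St,Daytona Beach,Volusia,FL,;
Chauncey,Motley,63 E Aurora Dr,Orlando,Orange,FL
"""
# Skip empty chunks so a trailing ";" cannot produce a blank record.
records = [chunk.strip().split(",") for chunk in raw.strip().split(";") if chunk.strip()]
print(find_by_city(records, " orlando "))
```

The `len(rec) > 3` guard skips malformed rows instead of raising an `IndexError`.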

Why does my else statement execute even after if statement condition has been met?

I'm using a for loop to search a text for a value, with an if/else statement inside the loop. Even when my search condition has been met, my else block is also executed.
This is the text I am using to search for the value:
SKU Product Desc. Pack/ Size QtyUOM Price Extension
1 WL140.111 Clam Tuatua Medium NZ / 20-34 pcs per kilogram / 30.00 KG HK$109.25 HK$3,277.50
Locations: KIT - Butchery (8%) 30.00 Edit Line Edit Alloc
This is my code:
whole_details = re.compile(r'Item([\$\w\s\.\/\-\,:()%]+)(?:Sub Total)')
wd = whole_details.search(text)
wd_text = wd.group(1)
products = ["Yoghurt Passionfruit Organic", "Yoghurt Plain Organic Vegan", "Clam Tuatua Medium 20-", "Clam Tuatua Medium NZ /", "Oyster Pacific NZ /"]
for product in products:
    if wd_text.find(product) != -1:
        re_qty = re.compile(rf'{product}\s([\d.]+)')
        qty_search = re_qty.search(wd_text)
        qty = qty_search.group(1)
        print("Product Description : " + product)
        print("Quantity : " + qty)
    else:
        print("No product")
This is the output I am getting now:
Product Description : Clam Tuatua Medium NZ /
Quantity : 20
No products
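One likely explanation, offered as a guess because the full output is not shown: the `if`/`else` runs once per entry in `products`, so every product that is absent from the text prints the `else` message even when another product matched. Below is a minimal sketch that collects the matches first and prints "No product" only when nothing matched at all. `find_quantities` is a hypothetical helper, and the sample text and product list are shortened from the question.

```python
import re

def find_quantities(text, products):
    """Return {product: quantity} for each product followed by a number in text."""
    found = {}
    for product in products:
        # re.escape keeps characters such as "/" or "." in the name literal.
        match = re.search(rf"{re.escape(product)}\s([\d.]+)", text)
        if match:
            found[product] = match.group(1)
    return found

text = "1 WL140.111 Clam Tuatua Medium NZ / 20-34 pcs per kilogram / 30.00 KG"
products = ["Yoghurt Plain Organic Vegan", "Clam Tuatua Medium NZ /", "Oyster Pacific NZ /"]
found = find_quantities(text, products)
if found:
    for product, qty in found.items():
        print("Product Description : " + product)
        print("Quantity : " + qty)
else:
    print("No product")
```

Note that the captured number is the first one after the product name (`20` from `20-34` here), the same as in the original pattern.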
