I have some code that scrapes data from a specific website, https://vancouver.craigslist.org/search/ela. My problem is that when I execute it, I get the error 'list' object has no attribute 'get_attribute' on the line asdf = images.get_attribute("src"). I am using the Selenium library to scrape the data. What I want is to insert the image URLs (from the list named images) into my table, but I cannot. What is wrong with my code? I am not yet familiar with Python, which is why I am asking. Thanks a lot for your consideration.
Current code
x = driver.find_elements_by_class_name('hdrlnk')
y = driver.find_elements_by_xpath('//p[@class="result-info"]/span[@class="result-meta"]//span[@class="result-price"]')
images = driver.find_elements_by_xpath('//*[@id="sortable-results"]/ul/li/a/img')
for img in images:
    print(img.get_attribute('src'))
for i in range(len(x)):
    asdf = images.get_attribute("src")
    prod = (x[i].text)
    price = (y[i].text)
    image = asdf
    sql = """INSERT INTO products (name,price,image) VALUES (%s,%s,%s)"""
    mycursor.execute(sql,(prod,price,image))
    mydb.commit()
When I comment out these lines
for img in images:
    print(img.get_attribute('src'))
and remove the asdf and image variables, I am able to insert the data. And when I comment out this block of code and keep only the print for the images,
#for i in range(len(x)):
#    asdf = images.get_attribute("src")
#    prod = (x[i].text)
#    price = (y[i].text)
#    image = asdf
#    sql = """INSERT INTO products (name,price,image) VALUES (%s,%s,%s)"""
#    mycursor.execute(sql,(prod,price,image))
#    mydb.commit()
I get the result I want, which looks like this:
https://images.craigslist.org/00z0z_4cqgwC5PIXs_300x300.jpg
https://images.craigslist.org/00J0J_f6AnAonGjXd_300x300.jpg
https://images.craigslist.org/00606_mtKNjKREOO_300x300.jpg
https://images.craigslist.org/00U0U_l5t0QnjZEPt_300x300.jpg
https://images.craigslist.org/00505_gIXt1C8aeqk_300x300.jpg
https://images.craigslist.org/00N0N_6P1GmSiL2vI_300x300.jpg
Sample data for the x and y variables in the i loop:
x = Spigen Magnetic Car Phone Mount
y = $20
What do I need to do in order to insert the image URL together with the product name and price in a single row? TIA.
EDIT: I tried @terahertz's answer and rewrote my code like this:
x = driver.find_elements_by_class_name('hdrlnk')
y = driver.find_elements_by_xpath('//p[@class="result-info"]/span[@class="result-meta"]//span[@class="result-price"]')
images = driver.find_elements_by_xpath('//*[@id="sortable-results"]/ul/li/a/img')
for img in images:
    # print(img.get_attribute('src'))
    for i in range(len(x)):
        asdf = img.get_attribute("src")
        prod = (x[i].text)
        price = (y[i].text)
        image = asdf
        sql = """INSERT INTO products (name,price,image) VALUES (%s,%s,%s)"""
        mycursor.execute(sql,(prod,price,image))
        mydb.commit()
Current DB data:
+-----+------------------------------------------------------------------------+--------+-------------------------------------------------------------+
| id | name | price | image |
+-----+------------------------------------------------------------------------+--------+-------------------------------------------------------------+
| 1 | Spigen Magnetic Car Phone Mount | $20 | https://images.craigslist.org/00i0i_7PvHxDMvR2o_300x300.jpg |
| 2 | Netgear Nighthawk x6 r8000 wireless router | $120 | https://images.craigslist.org/00i0i_7PvHxDMvR2o_300x300.jpg |
| 3 | iPod Touch 8gb 2nd generation - Loaded with Classic Rock | $60 | https://images.craigslist.org/00i0i_7PvHxDMvR2o_300x300.jpg |
| 4 | 3 plug 3.1A fast USB wallplugs | $10 | https://images.craigslist.org/00i0i_7PvHxDMvR2o_300x300.jpg |
| 5 | Audio and Video Cables | $3 | https://images.craigslist.org/00i0i_7PvHxDMvR2o_300x300.jpg |
| 6 | Like New Samsung 50" HD TV ForSale | $400 | https://images.craigslist.org/00i0i_7PvHxDMvR2o_300x300.jpg |
| 7 | SONY Alarm Clock | $20 | https://images.craigslist.org/00i0i_7PvHxDMvR2o_300x300.jpg |
| 8 | Bowers & Wilkins P7 Wireless MINT | $450 | https://images.craigslist.org/00i0i_7PvHxDMvR2o_300x300.jpg |
+-----+------------------------------------------------------------------------+--------+-------------------------------------------------------------+
Now I can insert into my database, BUT the problem is that the image column has the same value in every row. It's as if only one image URL was inserted. And when I visit the link, the product name and the image don't match.
Change asdf = images.get_attribute("src") to asdf = img.get_attribute("src").
Your outer loop accesses each item in the images list through the variable img, but in your inner loop you are accessing the images list itself.
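A likely reason every row ends up with the same image is the nested loop in the edited code: it walks through every product once for every image, so each image gets paired with all the products in turn. A minimal sketch that pairs them by position instead, assuming the i-th title, price and thumbnail found on the page belong to the same listing (a listing without a price or a photo would shift the lists, which is why the helper refuses lists of different lengths). The helper name build_product_rows is hypothetical; it only relies on the .text attribute and the get_attribute() method of Selenium WebElements. Note also that Selenium 4.3 and later no longer provide the find_elements_by_* helpers; the equivalent there is driver.find_elements(By.XPATH, ...).

```python
def build_product_rows(titles, prices, images):
    """Pair the i-th title, price and image element into one (name, price, image) row.

    The arguments are the lists returned by the three find_elements calls.
    Assumes they line up one-to-one, i.e. every listing has exactly one
    title, one price and one thumbnail.
    """
    if not (len(titles) == len(prices) == len(images)):
        raise ValueError(
            f"lists do not line up: {len(titles)} titles, "
            f"{len(prices)} prices, {len(images)} images"
        )
    # zip walks the three lists in lockstep: one row per listing,
    # instead of one row per (image, product) combination.
    return [
        (title.text, price.text, img.get_attribute("src"))
        for title, price, img in zip(titles, prices, images)
    ]


# Usage with the variables from the question (not executed here):
# rows = build_product_rows(x, y, images)
# sql = "INSERT INTO products (name, price, image) VALUES (%s, %s, %s)"
# mycursor.executemany(sql, rows)
# mydb.commit()
```

executemany() is part of the standard Python DB-API cursor interface, so all rows can be inserted and committed once, outside any loop.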
I'm working with a car accidents dataset, and I want to create a folium heatmap with time, that shows all the casualties with their location. Here's a sample of the data called df_weather_casualties:
| Date2      | Latitude  | Longitude | Number_of_Casualties | Weather_Details       |
|------------|-----------|-----------|----------------------|-----------------------|
| 2005-01-04 | 51.489096 | -0.191170 | 1                    | Raining no high winds |
| 2005-01-05 | 51.520075 | -0.211708 | 1                    | Fine no high winds    |
| 2005-01-06 | 51.525301 | -0.206458 | 1                    | Fine no high winds    |
| 2005-01-06 | 51.482442 | -0.173862 | 1                    | Fine no high winds    |
| 2005-01-10 | 51.495752 | -0.156618 | 1                    | Fine no high winds    |
Because there are duplicate dates (as there are multiple casualties in a day) I've created a time index with:
time_index = list(df_weather_casualties['Date2'].sort_values().astype('str').unique())
I've created a list of lists, in which each element holds one [Latitude, Longitude, Number_of_Casualties, Weather_Details] entry per accident on a given date, with the following code:
# Sort the whole DataFrame by date (assigning a sorted column back to itself
# realigns on the index, so it would not reorder anything)
df_weather_casualties = df_weather_casualties.sort_values('Date2')
weather_casualties_data = []
for _, d in df_weather_casualties.groupby('Date2'):
    weather_casualties_data.append([[row['Latitude'], row['Longitude'], row['Number_of_Casualties'], row['Weather_Details']] for _, row in d.iterrows()])
So the first element looks like this:
[[51.516575, -0.08126, 1, 'Fine no high winds'],
[51.512515, -0.130576, 1, 'Fine no high winds'],
[51.542651, -0.148234, 1, 'Raining no high winds']]
I've created a folium map with the following code:
import folium
from folium.plugins import HeatMapWithTime

hmt = folium.Map(location=[55.000, -2.0000],
                 tiles='cartodbdark_matter',
                 zoom_start=5,
                 control_scale=True)
HeatMapWithTime(
    weather_casualties_data,
    index=time_index,
    auto_play=False,
    blur=1.0,
    radius=8,
    max_opacity=0.4
).add_to(hmt)
hmt
How can I add a filter menu to the map that filters by the different Weather_Details values, showing only the accidents for the selected weather condition? I've seen people do it, but I haven't been able to figure it out.
This code does what I want with a normal map, but I'm trying to get it to work with the time element:
import folium
from folium.plugins import HeatMap

hm = folium.Map(location=[55.000, -2.0000],
                tiles='cartodbdark_matter',
                zoom_start=5,
                control_scale=True)
for weather in df_weather_casualties["Weather_Details"].unique():
    weather_group = folium.FeatureGroup(name=weather)
    HeatMap(data=df_weather_casualties[df_weather_casualties["Weather_Details"] == weather][["Latitude", "Longitude"]], blur=1.0, radius=8, max_opacity=0.4).add_to(weather_group)
    weather_group.add_to(hm)
folium.LayerControl().add_to(hm)
hm
Any help or pointers would be great.
Thanks.
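A possible approach, offered as a sketch: mirror the FeatureGroup version by creating one HeatMapWithTime layer per Weather_Details value, giving each layer a name, and adding folium.LayerControl so the weather values appear as toggles in the layer menu. The code below only prepares the data: for each weather value it returns one list of [Latitude, Longitude, Number_of_Casualties] points per date in time_index, with an empty list for dates that had no accidents in that weather, so every layer shares the same timeline. The function name frames_by_weather is hypothetical, and it expects plain dicts whose Date2 values are the same strings used in time_index. How several HeatMapWithTime layers behave together (for example, whether they share one time slider) may depend on the folium version, so treat the commented folium usage as a starting point.

```python
from collections import defaultdict


def frames_by_weather(records, time_index):
    """Split casualty records into one HeatMapWithTime data list per weather value.

    records: iterable of dicts with the keys Date2, Latitude, Longitude,
             Number_of_Casualties and Weather_Details (Date2 as a string
             that also appears in time_index).
    Returns {weather: [points_for_date_0, points_for_date_1, ...]}.
    """
    # weather -> date -> list of [lat, lon, weight] points; every weather value
    # starts with an empty list for every date so all layers share one timeline.
    per_weather = defaultdict(lambda: {date: [] for date in time_index})
    for r in records:
        per_weather[r["Weather_Details"]][r["Date2"]].append(
            [r["Latitude"], r["Longitude"], r["Number_of_Casualties"]]
        )
    # Order each layer's frames exactly like time_index.
    return {
        weather: [by_date[date] for date in time_index]
        for weather, by_date in per_weather.items()
    }


# Possible usage with the question's objects (not executed here):
# import folium
# from folium.plugins import HeatMapWithTime
# records = df_weather_casualties.assign(
#     Date2=df_weather_casualties["Date2"].astype(str)
# ).to_dict("records")
# for weather, data in frames_by_weather(records, time_index).items():
#     HeatMapWithTime(data, index=time_index, name=weather, auto_play=False,
#                     blur=1.0, radius=8, max_opacity=0.4).add_to(hmt)
# folium.LayerControl().add_to(hmt)
```

The weather string is dropped from each point because the layer itself now represents the weather value; the third number is used as the point's weight.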
I have a model with several columns; two of them are equipment_id (a CharField) and date_saved (a DateTimeField).
I have multiple rows with the same equipment_id but different date_saved values (each time the user saves the record, I store the current date and time).
I want to retrieve the record that has a specific equipment_id and is the latest saved, i.e.:
| Equipment_id | Date_saved           |
|--------------|----------------------|
| 1061a        | 26-DEC-2020 10:10:23 |
| 1061a        | 26-DEC-2020 10:11:52 |
| 1061a        | 26-DEC-2020 10:22:03 |
| 1061a        | 26-DEC-2020 10:31:15 |
| 1062a        | 21-DEC-2020 10:11:52 |
| 1062a        | 25-DEC-2020 10:22:03 |
| 1073a        | 20-DEC-2020 10:31:15 |
For example, I want to retrieve the latest record for equipment_id='1061a'.
I have tried various approaches without success:
prg = Program.objects.filter(equipment_id=id)
program = Program.objects.latest('date_saved')
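When I use program, I get the latest saved record overall, unrelated to the previous filter, because the second query starts again from Program.objects instead of from prg.
You can chain latest() onto the filtered queryset: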
result = Program.objects.filter(equipment_id=id).latest('date_saved')
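One caveat with latest(): if no row matches the filter, it raises Program.DoesNotExist. A minimal sketch that returns None instead, using order_by() with a leading minus for descending order followed by first(). The helper name latest_for_equipment is hypothetical; it takes the model class as an argument so it can be called with Program from the question.

```python
def latest_for_equipment(model, equipment_id):
    """Return the most recently saved row for equipment_id, or None if there is none.

    Picks the newest row like model.objects.filter(...).latest("date_saved"),
    but first() returns None on an empty queryset instead of raising
    model.DoesNotExist.
    """
    return (
        model.objects.filter(equipment_id=equipment_id)
        .order_by("-date_saved")  # newest first
        .first()
    )


# Usage (inside a Django project):
# program = latest_for_equipment(Program, "1061a")
# if program is None:
#     ...  # no record saved yet for this equipment
```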
I need to create a tree showing the calculation of a formula at each level. I receive the data as dictionaries:
level1={'Operating Cash Flow':['Total Revenue','Operating Expenses']}
level2={'Total Revenue':['Net Income','Change in Asset'],'Operating Expenses':['Non Cash Expense','XYZ']}
And so on. The idea is to create complex trackers using publicly available data, and then output them (in Excel) in a hierarchical format that shows how each value is calculated, something like this:
+-------------------------------------------------------+
| Operating |
| Cash Flow |
+-------------------------------------------------------+
| 60 |
+-------------------------------------------------------+
| Total Revenue | Operating Expense |
+------------------------------+------------------------+
| 10 | 50 |
+------------------------------+------------------------+
| Net Income | Change in asset | Non cash Expense | XYZ |
+------------+-----------------+------------------+-----+
| 20 | -10 | 40 | 10 |
+------------+-----------------+------------------+-----+
So I tried to make a tree-like structure for this:
class Node(object):
    def __init__(self):
        self.score = 0
        self.fields = []
        self.weights = 1
        self.children = []

    def add_child(self, obj):
        self.children.append(obj)
        self.weights = self.weights + obj.weights - 1

    def enter_score(self, value):
        self.score = value
where Node.fields stores the field names and Node.children points to their values. Since the field names in the dicts are strings, I am not sure how to build the complete tree. Any help on how to output the tree to Excel once it is built would also be appreciated.
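One way to handle the string field names is to merge all the level dicts into a single lookup from a parent name to its child names and then build the tree recursively, starting from the root name. The sketch below uses a simplified, hypothetical variant of the Node class above: each node keeps its name, an optional score and its children, and computes how many leaf columns it spans. A second helper turns the tree into (row, first column, last column, value) cells matching the Excel diagram, two rows per level; the commented lines show how those cells could be written with openpyxl's cell() and merge_cells(). The names build_tree, merge_levels and layout, and the scores dict, are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    score: float = 0
    children: list = field(default_factory=list)

    @property
    def width(self):
        """Number of leaf columns this node spans in the output (1 for a leaf)."""
        return sum(child.width for child in self.children) or 1


def merge_levels(*levels):
    """Combine level1, level2, ... into one {parent name: [child names]} lookup."""
    formulas = {}
    for level in levels:
        formulas.update(level)
    return formulas


def build_tree(name, formulas, scores=None):
    """Recursively build the Node for `name`; names with no formula become leaves."""
    scores = scores or {}
    node = Node(name, scores.get(name, 0))
    for child_name in formulas.get(name, []):
        node.children.append(build_tree(child_name, formulas, scores))
    return node


def layout(node, row=1, col=1):
    """Yield (row, first_col, last_col, value) cells: the name, then the score below it."""
    last_col = col + node.width - 1
    yield row, col, last_col, node.name
    yield row + 1, col, last_col, node.score
    for child in node.children:
        yield from layout(child, row + 2, col)
        col += child.width  # next sibling starts right after this subtree


# Writing the cells with openpyxl (not executed here):
# from openpyxl import Workbook
# wb = Workbook()
# ws = wb.active
# for r, first, last, value in layout(root):
#     ws.cell(row=r, column=first, value=value)
#     if last > first:
#         ws.merge_cells(start_row=r, start_column=first, end_row=r, end_column=last)
# wb.save("tree.xlsx")
```

Computing width from the children on demand avoids keeping a separate weights counter in sync every time a child is added.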
I am writing Python code to show the items in a store. As I am still learning, I want to know how to make a table that looks exactly like one made with Texttable.
My code is:
Goods = ['Book', 'Gold']
Itemid = [711001, 711002]
Price = [200, 50000]
Count = [100, 2]
Category = ['Books', 'Jewelry']
titles = ['', 'Item Id', 'Price', 'Count', 'Category']
data = [titles] + list(zip(Goods, Itemid, Price, Count, Category))
for i, d in enumerate(data):
    line = '|'.join(str(x).ljust(12) for x in d)
    print(line)
    if i == 0:
        print('=' * len(line))
My Output:
            |Item Id     |Price       |Count       |Category
================================================================
Book        |711001      |200         |100         |Books
Gold        |711002      |50000       |2           |Jewelry
Output I want:
+------+---------+-------+-------+-----------+
| | Item Id | Price | Count | Category |
+======+=========+=======+=======+===========+
| Book | 711001 | 200 | 100 | Books |
+------+---------+-------+-------+-----------+
| Gold | 711002 | 50000 | 2 | Jewelry |
+------+---------+-------+-------+-----------+
Your code builds its output by hand, using str.join(). You can do it that way, but it is very tedious. Use string formatting instead.
To help you along, here is one line:
content_format = "| {Goods:4.4s} | {ItemId:<7d} | {Price:<5d} | {Count:<5d} | {Category:9s} |"
output_line = content_format.format(Goods="Book",ItemId=711001,Price=200,Count=100,Category="Books")
Texttable adjusts its cell widths to fit the data. If you want to do the same, then you will have to put computed field widths in content_format instead of using numeric literals the way I have done in the example above. Again, here is one example to get you going:
content_format = "| {Goods:4.4s} | {ItemId:<7d} | {Price:<5d} | {Count:<5d} | {Category:{CategoryWidth}s} |"
output_line = content_format.format(Goods="Book",ItemId=711001,Price=200,Count=100,Category="Books",CategoryWidth=9)
But if you already know how to do this using Texttable, why not use it? Your comment says it's not available in Python. That's not true: I just downloaded version 0.9.0 using pip.
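To get a grid like the desired output without Texttable, the column widths can be computed from the data and plugged into nested format specs, as suggested above. A minimal sketch; render_table is a hypothetical name, all cells are left-aligned, and each column is exactly as wide as its longest value, so the Category column comes out one character narrower than in the hand-drawn example.

```python
def render_table(header, rows):
    """Render a Texttable-style grid whose column widths fit the data."""
    rows = [list(row) for row in rows]  # accept tuples or a zip() iterator
    table = [header] + rows
    # Width of each column = length of its longest value.
    widths = [max(len(str(row[i])) for row in table) for i in range(len(header))]

    def rule(ch):
        # e.g. +------+---------+  (two extra characters for the padding spaces)
        return "+" + "+".join(ch * (w + 2) for w in widths) + "+"

    def line(row):
        # Nested format spec: {w} supplies the computed width for each cell.
        return "|" + "|".join(f" {str(cell):<{w}} " for cell, w in zip(row, widths)) + "|"

    out = [rule("-"), line(header), rule("=")]
    for row in rows:
        out += [line(row), rule("-")]
    return "\n".join(out)


Goods = ['Book', 'Gold']
Itemid = [711001, 711002]
Price = [200, 50000]
Count = [100, 2]
Category = ['Books', 'Jewelry']
titles = ['', 'Item Id', 'Price', 'Count', 'Category']
print(render_table(titles, zip(Goods, Itemid, Price, Count, Category)))
```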
I have a database with rows of image locations, and I also have a list of image locations that is generated when I run the script. I want to keep the database in sync with the generated list. I would just drop the table, but it holds vote information that I need to keep. If I delete an image, I don't want its entry to remain, but if I add an image, I want to keep the votes for all of the other images.
Example:
[db]
Name | Path | vote_count
image1 | path/to/image1.jpg | 1
image2 | path/to/image2.jpg | 4
image3 | path/to/image3.jpg | 2
[list]
path/to/image1.jpg
path/to/image2.jpg
path/to/image3.jpg
path/to/image4.jpg
I want to compare the list to the database and, if there is an added image, I want the db to end up like this:
[db]
Name | Path | vote_count
image1 | path/to/image1.jpg | 1
image2 | path/to/image2.jpg | 4
image3 | path/to/image3.jpg | 2
image4 | path/to/image4.jpg | 0
What is a good way to accomplish this?
I have this so far:
import os

def ScanImages(request):
    files = []
    fileRoots = []
    for root, directories, filenames in os.walk('/web/static/web/images/'):
        for filename in filenames:
            files.append(os.path.join(root, filename))
            fileRoots.append(root)
Assuming you have a Django model VotableImage, you can get a list of one of its fields by calling db_path_list = VotableImage.objects.values_list('Path', flat=True), and then check each value for presence in the files list that your script created.
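Building on that answer, a minimal sketch of the whole sync: compare the two sets of paths, insert rows only for new files and delete rows only for files that are gone, so existing rows and their vote_count are never touched. This assumes both sides store paths in the same form (the example DB uses relative paths such as path/to/image1.jpg while os.walk returns paths under /web/static/web/images/, so one side may need os.path.relpath). plan_sync and image_name are hypothetical helpers, and the commented Django calls assume the VotableImage model with the Name, Path and vote_count fields from the question.

```python
import os


def plan_sync(db_paths, disk_paths):
    """Return (to_add, to_delete): paths to insert and paths whose rows should go.

    Paths present in both collections are left alone, so their votes are kept.
    """
    db, disk = set(db_paths), set(disk_paths)
    return sorted(disk - db), sorted(db - disk)


def image_name(path):
    """Hypothetical naming rule: 'path/to/image4.jpg' -> 'image4'."""
    return os.path.splitext(os.path.basename(path))[0]


# Possible Django usage (not executed here):
# db_paths = VotableImage.objects.values_list('Path', flat=True)
# to_add, to_delete = plan_sync(db_paths, files)
# VotableImage.objects.filter(Path__in=to_delete).delete()
# VotableImage.objects.bulk_create(
#     [VotableImage(Name=image_name(p), Path=p, vote_count=0) for p in to_add]
# )
```

Using sets keeps each membership check constant-time instead of scanning the files list once per database row.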