Attempting to grab certain Elements - python

I am new to the lxml module in Python.
I am trying to parse data from a website: https://weather.com/weather/tenday/l/USCA1037:1:US
I am trying to grab the text of:
<span classname="narrative" class="narrative">
Cloudy. Low 49F. Winds WNW at 10 to 20 mph.
</span>
However, I am getting my XPath all mixed up.
To be exact, the location of this element is
//*[@id="twc-scrollabe"]/table/tbody/tr[4]/td[2]/span
This is what I have attempted:
import requests
import lxml.html
from lxml import etree
html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
element_object = lxml.html.fromstring(html.content) # htmlelement object returns bytes
# element_object has root of <html>
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]
day_of_week = table.xpath('.//span[@class="date-time"]/text()') # returns list of items from "date-time"
dates = table.xpath('.//span[@class="day-detail clearfix"]/text()')
td = table.xpath('.//tbody/tr/td/span[contains(@class, "narrative")]')
print(td)
# print(td) displays an empty list.
I would like my program to also parse "Cloudy. Low 49F. Winds WNW at 10 to 20 mph."
Please help...

Some <td> elements have a title attribute that holds the description:
import requests
import lxml.html
html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
element_object = lxml.html.fromstring(html.content)
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]
td = table.xpath('.//tr/td[@class="twc-sticky-col"]/@title')
print(td)
Result
['Mostly cloudy skies early, then partly cloudy after midnight. Low 48F. Winds SSW at 5 to 10 mph.',
'Mainly sunny. High 66F. Winds WNW at 5 to 10 mph.',
'Sunny. High 71F. Winds NW at 5 to 10 mph.',
'A mainly sunny sky. High 69F. Winds W at 5 to 10 mph.',
'Some clouds in the morning will give way to mainly sunny skies for the afternoon. High 67F. Winds WSW at 5 to 10 mph.',
'Considerable clouds early. Some decrease in clouds later in the day. High 67F. Winds WSW at 5 to 10 mph.',
'Partly cloudy. High near 65F. Winds WSW at 5 to 10 mph.',
'Cloudy skies early, then partly cloudy in the afternoon. High 61F. Winds WSW at 10 to 20 mph.',
'Sunny skies. High 62F. Winds WNW at 10 to 20 mph.',
'Mainly sunny. High 61F. Winds WNW at 10 to 20 mph.',
'Sunny along with a few clouds. High 64F. Winds WNW at 10 to 15 mph.',
'Mostly sunny skies. High around 65F. Winds WNW at 10 to 15 mph.',
'Mostly sunny skies. High 66F. Winds WNW at 10 to 20 mph.',
'Mainly sunny. High around 65F. Winds WNW at 10 to 20 mph.',
'A mainly sunny sky. High around 65F. Winds WNW at 10 to 20 mph.']
There is no <tbody> in the HTML source. The browser may show one in DevTools because it inserts it while building the DOM, so don't use tbody in the XPath.
Some of the text is in <span></span>, but some is in <span><span></span></span>, so use td//span to reach both:
import requests
import lxml.html
html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
element_object = lxml.html.fromstring(html.content)
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]
td = table.xpath('.//tr/td//span/text()')
print(td)
Result
['Tonight', 'APR 21', 'Partly Cloudy', '--', '48', '10', '%', 'SSW 7 mph ', '85', '%',
'Mon', 'APR 22', 'Sunny', '66', '51', '10', '%', 'WNW 9 mph ', '67', '%',
'Tue', 'APR 23', 'Sunny', '71', '53', '0', '%', 'NW 8 mph ', '59', '%',
'Wed', 'APR 24', 'Sunny', '69', '52', '10', '%', 'W 9 mph ', '71', '%',
'Thu', 'APR 25', 'Partly Cloudy', '67', '51', '10', '%', 'WSW 9 mph ', '71', '%',
'Fri', 'APR 26', 'Partly Cloudy', '67', '51', '10', '%', 'WSW 9 mph ', '69', '%',
'Sat', 'APR 27', 'Partly Cloudy', '65', '50', '10', '%', 'WSW 9 mph ', '71', '%',
'Sun', 'APR 28', 'AM Clouds/PM Sun', '61', '49', '20', '%', 'WSW 13 mph ', '75', '%',
'Mon', 'APR 29', 'Sunny', '62', '48', '10', '%', 'WNW 14 mph ', '63', '%',
'Tue', 'APR 30', 'Sunny', '61', '49', '0', '%', 'WNW 14 mph ', '61', '%',
'Wed', 'MAY 1', 'Mostly Sunny', '64', '50', '0', '%', 'WNW 12 mph ', '60', '%',
'Thu', 'MAY 2', 'Mostly Sunny', '65', '50', '0', '%', 'WNW 12 mph ', '61', '%',
'Fri', 'MAY 3', 'Mostly Sunny', '66', '51', '0', '%', 'WNW 13 mph ', '61', '%',
'Sat', 'MAY 4', 'Sunny', '65', '51', '0', '%', 'WNW 14 mph ', '62', '%',
'Sun', 'MAY 5', 'Sunny', '65', '51', '0', '%', 'WNW 14 mph ', '63', '%']
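
The points from this answer (attributes are selected with @, the parsed tree has no <tbody>, and td//span reaches nested spans) can be tried offline. This is only a minimal sketch: the HTML fragment below is made up to mimic the structure described here, and the real page may well differ.

```python
import lxml.html

# Hypothetical fragment shaped like the forecast table; NOT the real page.
SAMPLE = """
<html><body>
<div class="twc-table-scroller">
  <table>
    <tr>
      <td class="twc-sticky-col" title="Cloudy. Low 49F. Winds WNW at 10 to 20 mph.">
        <span>Cloudy</span>
      </td>
    </tr>
    <tr>
      <td class="twc-sticky-col" title="Sunny. High 71F. Winds NW at 5 to 10 mph.">
        <span><span>Sunny</span></span>
      </td>
    </tr>
  </table>
</div>
</body></html>
"""

root = lxml.html.fromstring(SAMPLE)
table = root.xpath('//div[@class="twc-table-scroller"]')[0]

# Attribute values are selected with @name.
titles = table.xpath('.//tr/td[@class="twc-sticky-col"]/@title')

# The source has no <tbody> and lxml does not add one, so this matches nothing.
with_tbody = table.xpath('.//tbody/tr/td')

# td//span/text() collects text from <span> and from nested <span><span>.
labels = [t.strip() for t in table.xpath('.//tr/td//span/text()') if t.strip()]

print(titles)
print(with_tbody)
print(labels)
```

If an XPath copied from DevTools returns an empty list, removing the tbody step is a cheap first thing to try.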

If you want to grab text like "Sunny. High 66F. Winds WNW at 5 to 10 mph.", you can get it from the title attribute of each <td>.
This should work:
td = table.xpath('.//tbody/tr/td[@class="description"]/@title')

Related

Appending item from a list to a Pandas df with rules

I'm creating a random meal generator. The idea is that it randomly selects meals from a dictionary and applies a day of the week in which to have that meal. It will just pick meals for a week so there will only be 7 recipes chosen. Here is my code so far:
import random
import pandas as pd

meals = {'Recipe': ['Chicken Kebab', 'Chicken Balti', 'Chicken Stir Fry', 'Chicken Curry', 'Cola Chicken', 'Chicken Fajita Pie',
                    'Chicken in Black Bean Sauce', 'Stuffed Meatballs', 'Pesto Pasta'],
         'Book': ['1', '1', '1', '1', '1', '1', '1', '1', '1'],
         'Page': ['48', '50', '52', '58', '72', '74', '80', '87', '108'],
         'Category': ['Normal', 'Curry', 'Asian', 'Curry', 'Normal', 'Normal', 'Asian', 'Pasta', 'Pasta'],
         'Vegetarian': ['No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes']}
df = pd.DataFrame(meals)
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
print(df)
mealdays = []
# Pad the day list with blanks so there is one entry per recipe
while len(meals['Recipe']) != len(days):
    days.append(" ")
# Hand out a random day (or blank) to each recipe
while len(meals['Recipe']) != len(mealdays):
    day = random.choice(days)
    mealdays.append(day)
    days.remove(day)
    random.shuffle(days)
df['Days'] = mealdays
print(df)
This works great for picking the meals at random, however what I want to do now is introduce some rules so that certain 'Category's aren't selected multiple times. For example, if a 'Curry' category recipe has been randomly selected for 'Mon', then I wouldn't want another 'Curry' Category recipe for the rest of the week. I'm assuming this would be an If statement within the second While loop, but I'm not sure what it would be.

If I understand correctly, you can use sample to shuffle your df, then find and drop the duplicates of Curry and Pasta:
cat = ["Curry", "Pasta"]
df = df.sample(frac=1)
s = df[df["Category"].isin(cat)].drop_duplicates("Category", keep="last").index
print(df[~df.index.isin(s)])
                        Recipe Book Page Category Vegetarian
6  Chicken in Black Bean Sauce    1   80    Asian         No
4                 Cola Chicken    1   72   Normal         No
2             Chicken Stir Fry    1   52    Asian         No
1                Chicken Balti    1   50    Curry         No
8                  Pesto Pasta    1  108    Pasta        Yes
0                Chicken Kebab    1   48   Normal         No
5           Chicken Fajita Pie    1   74   Normal         No

I have modified your data so that you have at least 7 categories; otherwise it wouldn't be possible to apply your constraint. In any case, you could group by the category and take one sample from each group, then take a random sample of 7 of those and map the days to them (GroupBy.sample needs pandas 1.1 or later).
import pandas as pd

meals = {'Recipe': ['Chicken Kebab', 'Chicken Balti', 'Chicken Stir Fry', 'Chicken Curry', 'Cola Chicken', 'Chicken Fajita Pie',
                    'Chicken in Black Bean Sauce', 'Stuffed Meatballs', 'Pesto Pasta'],
         'Book': ['1', '1', '1', '1', '1', '1', '1', '1', '1'],
         'Page': ['48', '50', '52', '58', '72', '74', '80', '87', '108'],
         'Category': ['Normal', 'Curry', 'Asian', 'Curry', 'American', 'Diet', 'Asian', 'Pasta', 'Frozen'],
         'Vegetarian': ['No', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes']}
df = pd.DataFrame(meals)
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# One random recipe per category, then 7 of those in random order
df.loc[df.groupby('Category').sample(1).sample(7).index, 'Days'] = days
df.fillna('', inplace=True)
Output
                        Recipe Book Page Category Vegetarian Days
0                Chicken Kebab    1   48   Normal         No  Sun
1                Chicken Balti    1   50    Curry         No
2             Chicken Stir Fry    1   52    Asian         No
3                Chicken Curry    1   58    Curry         No  Fri
4                 Cola Chicken    1   72 American         No  Tue
5           Chicken Fajita Pie    1   74     Diet         No  Sat
6  Chicken in Black Bean Sauce    1   80    Asian         No  Wed
7            Stuffed Meatballs    1   87    Pasta         No  Mon
8                  Pesto Pasta    1  108   Frozen        Yes  Thu

I came up with this. It works because there are not that many days in a week, but Henry's answer looks much more efficient.
certain_categories = ['Curry']
pool = df
picked = []
for day in days:
    meal = pool.sample(1)
    df.loc[meal.index, 'Days'] = day
    # Remember restricted categories that have already been used
    if meal['Category'].item() in certain_categories:
        picked.append(meal['Category'].item())
    # Next pick: meals without a day whose category is not used up
    pool = df[df['Days'].isnull() & ~df['Category'].isin(picked)]
df.fillna('', inplace=True)
print(df)
                        Recipe Book Page Category Vegetarian Days
0                Chicken Kebab    1   48   Normal         No
1                Chicken Balti    1   50    Curry         No
2             Chicken Stir Fry    1   52    Asian         No  Sun
3                Chicken Curry    1   58    Curry         No  Thu
4                 Cola Chicken    1   72   Normal         No  Tue
5           Chicken Fajita Pie    1   74   Normal         No  Fri
6  Chicken in Black Bean Sauce    1   80    Asian         No  Wed
7            Stuffed Meatballs    1   87    Pasta         No  Mon
8                  Pesto Pasta    1  108    Pasta        Yes  Sat
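
For comparison, the same rule can be written without pandas. This is only a sketch under assumptions: pick_week is a hypothetical helper name, the recipes are reduced to (name, category) pairs taken from the question, and only 'Curry' is treated as a restricted category.

```python
import random


def pick_week(recipes, days, restricted, rng=random):
    """Assign one recipe per day, using each restricted category at most once."""
    pool = list(recipes)
    rng.shuffle(pool)              # random order, so the picks are random
    used = set()                   # restricted categories already on the plan
    remaining_days = list(days)
    plan = {}
    for name, category in pool:
        if not remaining_days:
            break
        if category in restricted:
            if category in used:
                continue           # skip a second meal of a restricted category
            used.add(category)
        plan[remaining_days.pop(0)] = (name, category)
    if remaining_days:
        raise ValueError("not enough recipes satisfy the rules")
    return plan


recipes = [
    ("Chicken Kebab", "Normal"), ("Chicken Balti", "Curry"),
    ("Chicken Stir Fry", "Asian"), ("Chicken Curry", "Curry"),
    ("Cola Chicken", "Normal"), ("Chicken Fajita Pie", "Normal"),
    ("Chicken in Black Bean Sauce", "Asian"), ("Stuffed Meatballs", "Pasta"),
    ("Pesto Pasta", "Pasta"),
]
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

plan = pick_week(recipes, days, restricted={"Curry"}, rng=random.Random(0))
for day in days:
    print(day, *plan[day])
```

Passing a seeded random.Random makes a run repeatable; leaving rng at its default gives a different plan each time.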

How can I scrape data from a URL on the network?

I would like to create a program that scrapes data from https://www.futbin.com/21/player/560/aubameyang. At the bottom of the page are the daily and hourly graph sections; the hourly graph is the one I want. Its data can be found in the Network tab of the browser's inspector, as a request to https://www.futbin.com/21/playerPrices?player=188567&rids=84074647&_=1608811830598. This gives me the recent sales history for all platforms (ps, xbox, pc) under keys such as LCPrice, LCPrice2, etc. That is what I'd like to scrape/extract.
Each player is also identified by an id; in this example the id is 188567, found via the Network tab, and the request returns a list of prices for it.
My current code is below. It doesn't print anything; any help would be appreciated.
import requests
from datetime import datetime
player_ids = {
'Arturo Vidal': 181872,
'Pierre-Emerick Aubameyang': 188567,
'Robert Lewandowski': 188545,
'Jerome Boateng': 183907,
'Sergio Ramos': 155862,
'Antoine Griezmann': 194765,
'David Alaba': 197445,
'Paulo Dybala': 211110,
'Radja Nainggolan': 178518
}
for (name,id) in player_ids.items():
    r = requests.get('https://www.futbin.com/21/playerPrices?player={0}'.format(id))
    data = r.json()
    print(name)
    print("-"*20)
    #Change ps to xbox or pc to get other prices
    for price in data['ps']:
        price = price[1]
        print(price)

The question could be clearer, but as I understand it, you are looking for something like the following example.
What makes the difference
Accessing the data for the player and the console the right way. The response is keyed by the player id (as a string), then 'prices', then the platform:
data[str(id)]['prices']['ps'].values()
Example:
import requests
from datetime import datetime
player_ids = {
'Arturo Vidal': 181872,
'Pierre-Emerick Aubameyang': 188567,
'Robert Lewandowski': 188545,
'Jerome Boateng': 183907,
'Sergio Ramos': 155862,
'Antoine Griezmann': 194765,
'David Alaba': 197445,
'Paulo Dybala': 211110,
'Radja Nainggolan': 178518
}
for (name,id) in player_ids.items():
    r = requests.get('https://www.futbin.com/21/playerPrices?player={0}'.format(id))
    data = r.json()
    print(name)
    print("-"*20)
    psPrices = list(data[str(id)]['prices']['ps'].values())
    print(psPrices)
    xboxPrices = list(data[str(id)]['prices']['xbox'].values())
    print(xboxPrices)
Output:
Arturo Vidal
--------------------
['0', '0', '0', '0', '0', '10 weeks ago', '3,600', '65,000', '0']
['0', '0', '0', '0', '0', '10 weeks ago', '2,100', '37,500', '100']
Pierre-Emerick Aubameyang
--------------------
['59,000', '59,000', '0', '0', '0', '13 mins ago', '12,250', '230,000', '21']
['57,000', '57,500', '58,000', '58,000', '58,000', '14 mins ago', '11,000', '210,000', '23']
Robert Lewandowski
--------------------
['72,500', '72,500', '72,500', '72,500', '72,500', '14 mins ago', '6,000', '110,000', '63']
['73,500', '73,500', '73,500', '73,500', '73,500', '2 mins ago', '7,400', '140,000', '49']
Jerome Boateng
--------------------
['1,400', '1,400', '1,400', '1,400', '1,400', '15 mins ago', '700', '10,000', '7']
['1,300', '1,300', '1,300', '1,300', '1,300', '4 mins ago', '700', '10,000', '6']
Sergio Ramos
--------------------
['50,000', '50,500', '50,500', '50,500', '50,500', '19 mins ago', '8,000', '150,000', '29']
['51,000', '51,000', '51,000', '51,000', '0', '15 mins ago', '7,200', '140,000', '32']
Antoine Griezmann
--------------------
['29,250', '29,250', '29,250', '29,250', '29,250', '35 mins ago', '2,800', '50,000', '56']
['32,750', '32,750', '33,000', '33,000', '33,000', '37 mins ago', '2,900', '55,000', '57']
David Alaba
--------------------
['0', '0', '0', '0', '0', '14 mins ago', '700', '10,000', '100']
['0', '0', '0', '0', '0', '16 mins ago', '700', '11,000', '100']
Paulo Dybala
--------------------
['36,000', '36,000', '36,000', '36,250', '36,500', '19 mins ago', '3,600', '65,000', '52']
['37,500', '37,500', '37,500', '38,000', '38,000', '1 min ago', '3,100', '55,000', '66']
Radja Nainggolan
--------------------
['2,100', '2,100', '2,100', '2,100', '2,100', '21 mins ago', '700', '10,000', '15']
['1,900', '1,900', '1,900', '1,900', '1,900', '32 mins ago', '700', '10,000', '12']
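
To see why data['ps'] finds nothing while data[str(id)]['prices']['ps'] works, here is a small offline sketch. The payload is hypothetical: only the nesting (player id, then "prices", then platform) and the LCPrice/LCPrice2 names come from this thread, while the "updated" key is an assumed name for illustration.

```python
import json

# Hypothetical response body, shaped like the access path used in the answer.
raw = """
{
  "188567": {
    "prices": {
      "ps":   {"LCPrice": "59,000", "LCPrice2": "59,000", "updated": "13 mins ago"},
      "xbox": {"LCPrice": "57,000", "LCPrice2": "57,500", "updated": "14 mins ago"}
    }
  }
}
"""
data = json.loads(raw)
player_id = 188567

# The top-level key is the player id, and JSON object keys are always strings.
print("ps" in data)            # False
print(str(player_id) in data)  # True

prices = data[str(player_id)]["prices"]


def to_int(text):
    """Turn a price such as '59,000' into the integer 59000."""
    return int(text.replace(",", ""))


lowest = {platform: to_int(values["LCPrice"]) for platform, values in prices.items()}
print(lowest)                  # {'ps': 59000, 'xbox': 57000}
```

Converting the comma-formatted strings to integers is optional, but it makes the prices sortable and comparable.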

Python, Take Multiple Lists and Putting into pd.Dataframe

I have seen a variety of answers to this question (like this one), and have had no success in getting my lists into one dataframe. I have one header list (meant to be column headers), and then a variable that has multiple records in it:
list1 = ['Rank', 'Athlete', 'Distance', 'Runs', 'Longest', 'Avg. Pace', 'Elev. Gain']
list2 = (['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m']
['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m']
['3', 'Kelsey', '32.6 km', '2', '21.3 km', '5:46 /km', '141 m'])
When I try something like:
df = pd.DataFrame(list(zip(['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m'],
                           ['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m'])))
It lists all the attributes as their own rows, like so:
          0         1
0         1         2
1      Jack      Jill
2   57.4 km   34.0 km
3         4         2
4   21.7 km   17.9 km
5  5:57 /km  5:27 /km
6     994 m     152 m
How do I get this into a frame that has list1 as the headers, and the rest of the data neatly squared away?

Given
list1 = ['Rank', 'Athlete', 'Distance', 'Runs', 'Longest', 'Avg. Pace', 'Elev. Gain']
list2 = (['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m'],
['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m'],
['3', 'Kelsey', '32.6 km', '2', '21.3 km', '5:46 /km', '141 m'])
do
pd.DataFrame(list2, columns=list1)
which returns
   Rank Athlete Distance Runs Longest Avg. Pace Elev. Gain
0     1    Jack  57.4 km    4 21.7 km  5:57 /km      994 m
1     2    Jill  34.0 km    2 17.9 km  5:27 /km      152 m
2     3  Kelsey  32.6 km    2 21.3 km  5:46 /km      141 m

Change your second list into a list of lists and then
df = pd.DataFrame(columns = list1, data = list2)

Python - TypeError: expecting string or bytes object

After much research I cannot figure out why I receive this error in my code.
I'm trying to export a Pandas Dataframe to my Oracle table. I have successfully done this hundreds of times on other data tables but this one keeps producing errors.
Here is my Dataframe, which I read in with pd.read_excel and appended three of my own columns with simple df['column_name'] = variable commands:
S USTAINABLE H ARVEST S ECTOR| QUOTA LISTING APRIL 16 2013 Unnamed: 1 \
1 DATE TRADE ID
2 04/02/13 130014
3 0 0
4 0 0
5 0 0
6 FY13 QUOTA – TO BUY 0
7 DATE TRADE ID
8 3/26/13 130006
9 4/9/13 130012
10 3/26/13 130007
11 3/26/13 130001
12 3/26/13 130009
13 4/9/13 130013
14 3/26/13 130010
15 3/26/13 130008
16 3/26/13 130011
17 1 0
Unnamed: 2 Unnamed: 3 Unnamed: 4 email_year \
1 AVAILABLE STOCK AMOUNT BUY PRICE 2013
2 WINTER SNE 12000 TRADE IN RETURN FOR 2013
3 0 0 HADDOCK GOM, 2013
4 0 0 YELLOWTAIL GOM, OR 2013
5 0 0 WITCH - OFFERS 2013
6 0 0 0 2013
7 DESIRED STOCK AMOUNT BUY PRICE 2013
8 COD GBE ANY OFFERS 2013
9 COD GBW UP TO 100,000 0.3 2013
10 COD GBW ANY OFFERS 2013
11 COD GOM INQUIRE 1.5 2013
12 WINTER GB ANY OFFERS 2013
13 WINTER SNE UP TO 100,000 0.3 2013
14 WINTER SNE ANY OFFERS 2013
15 YELLOWTAIL GB ANY OFFERS 2013
16 YELLOWTAIL GOM ANY TRADE FOR GB STOCKS -\nOFFERS 2013
17 0 0 0 2013
email_month email_day
1 4 16
2 4 16
3 4 16
4 4 16
5 4 16
6 4 16
7 4 16
8 4 16
9 4 16
10 4 16
11 4 16
12 4 16
13 4 16
14 4 16
15 4 16
16 4 16
17 4 16
My code fails on the export line cursor.executemany(sql_query, exported_data) with the error:
Traceback (most recent call last):
File "Z:\Code\successful_excel_pdf_code.py", line 74, in <module>
cursor.executemany(sql_query, exported_data)
TypeError: expecting string or bytes object
Here is my relevant code:
df = pd.read_excel(file_path)
df = df.fillna(0)
df = df.ix[1:]
cursor = con.cursor()
exported_data = [tuple(x) for x in df.values]
#exported_data = [str(x) for x in df.values]
#print("exported_data:", exported_data)
sql_query = ("INSERT INTO FISHTABLE(date_posted, stock_id, species, pounds, advertised_price, email_year, email_month, email_day, sector_name, ask)" "VALUES(:1, :2, :3, :4, :5, :6, :7, :8, 'Sustainable Harvest Sector', '1')")
cursor.executemany(sql_query, exported_data)
con.commit() #commit to database
cursor.close()
con.close()
Here is a printout of exported_data:
[('DATE', 'TRADE ID', 'AVAILABLE STOCK', 'AMOUNT', 'BUY PRICE', '2013', '4', '16'), ('04/02/13', 130014, 'WINTER SNE', 12000, 'TRADE IN RETURN FOR', '2013', '4', '16'), (0, 0, 0, 0, 'HADDOCK GOM,', '2013', '4', '16'), (0, 0, 0, 0, 'YELLOWTAIL GOM, OR', '2013', '4', '16'), (0, 0, 0, 0, 'WITCH - OFFERS', '2013', '4', '16'), ('FY13 QUOTA – TO BUY', 0, 0, 0, 0, '2013', '4', '16'), ('DATE', 'TRADE ID', 'DESIRED STOCK', 'AMOUNT', 'BUY PRICE', '2013', '4', '16'), ('3/26/13', 130006, 'COD GBE', 'ANY', 'OFFERS', '2013', '4', '16'), ('4/9/13', 130012, 'COD GBW', 'UP TO 100,000', 0.3, '2013', '4', '16'), ('3/26/13', 130007, 'COD GBW', 'ANY', 'OFFERS', '2013', '4', '16'), ('3/26/13', 130001, 'COD GOM', 'INQUIRE', 1.5, '2013', '4', '16'), ('3/26/13', 130009, 'WINTER GB', 'ANY', 'OFFERS', '2013', '4', '16'), ('4/9/13', 130013, 'WINTER SNE', 'UP TO 100,000', 0.3, '2013', '4', '16'), ('3/26/13', 130010, 'WINTER SNE', 'ANY', 'OFFERS', '2013', '4', '16'), ('3/26/13', 130008, 'YELLOWTAIL GB', 'ANY', 'OFFERS', '2013', '4', '16'), ('3/26/13', 130011, 'YELLOWTAIL GOM', 'ANY', 'TRADE FOR GB STOCKS -\nOFFERS', '2013', '4', '16'), (1, 0, 0, 0, 0, '2013', '4', '16')]
1) I thought the error could be from a lot of NaNs being scattered throughout the Dataframe, so I replaced them with 0's and it still fails.
2) I then thought the error could be from trying to export the first couple rows which held no valuable information, so I deleted the first row with df = df.ix[1:] but it still fails.
3) I also thought it could be failing because of the values in my email_year/month/day columns, so I changed them all to strings before putting them into my Dataframe, but it still fails.
4) I tried changing the exported_data command to a str instead of a tuple but that only changed the error to cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number. Also, it has always worked fine as a tuple when exporting other Dataframes.
5) I thought the error could be from my Oracle columns not allowing either numbers or letters, but they are all set to VARCHAR2, so that isn't the cause of the error either.
I'd appreciate any help solving this, thanks.

Based on the export data noted above, the problem you are experiencing is due to the fact that the data in one row is not the same type as the data in subsequent rows. In your case, in one row you have the value '04/02/13' (as a string) and in the next row you have the value 0 (as an integer). You will need to make sure that the data type is consistent for the column across all rows.
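
One way to check for and remove the inconsistency is sketched below. It assumes that storing every value as text is acceptable, which seems plausible here because the question says all target columns are VARCHAR2. column_types and as_text are hypothetical helper names, and the three rows are copied from the printout of exported_data.

```python
# Three rows copied from the exported_data printout in the question.
rows = [
    ('04/02/13', 130014, 'WINTER SNE', 12000, 'TRADE IN RETURN FOR', '2013', '4', '16'),
    (0, 0, 0, 0, 'HADDOCK GOM,', '2013', '4', '16'),
    ('4/9/13', 130012, 'COD GBW', 'UP TO 100,000', 0.3, '2013', '4', '16'),
]


def column_types(rows):
    """For each column position, return the set of Python type names seen."""
    return [{type(value).__name__ for value in column} for column in zip(*rows)]


def as_text(rows):
    """Convert every value to str so each column has one consistent type."""
    return [tuple(str(value) for value in row) for row in rows]


# Columns with more than one type name are the ones that mix types.
mixed = [i for i, names in enumerate(column_types(rows)) if len(names) > 1]
print("mixed-type columns:", mixed)

exported_data = as_text(rows)
print("mixed after conversion:",
      [i for i, names in enumerate(column_types(exported_data)) if len(names) > 1])
```

The converted exported_data can then be passed to cursor.executemany(sql_query, exported_data) as in the question. Whether placeholder zeros such as '0' belong in a date column is a separate data-cleaning decision.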

Appending objects to list in a loop - not what I expected

I am appending an object to a list like this:
json_object = []
nodes = soup.findAll(params["node_name"])
for node in nodes:
    obj = tags
    for element in node:
        if element.name != None:
            obj[element.name] = str(element.text)
    print obj
    json_object.append(obj)
    print json_object
Here is the output of the first two iterations:
{'sl_no': '1', 'sl_runs': '98', 'sl_name': 'Khumalo S', 'sl_wins': '12', 'sl_level': '-19.30', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '12', 'sl_place': '8', 'sl_second': '16', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '12.24', 'sl_winplace': '51.02', 'date_to': '20 November 2013', 'sl_fourth': '10', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R1 018 300'}
[{'sl_no': '1', 'sl_runs': '98', 'sl_name': 'Khumalo S', 'sl_wins': '12', 'sl_level': '-19.30', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '12', 'sl_place': '8', 'sl_second': '16', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '12.24', 'sl_winplace': '51.02', 'date_to': '20 November 2013', 'sl_fourth': '10', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R1 018 300'}]
{'sl_no': '2', 'sl_runs': '41', 'sl_name': 'Marcus A', 'sl_wins': '12', 'sl_level': '-8.70', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '3', 'sl_place': '2', 'sl_second': '3', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '29.27', 'sl_winplace': '48.78', 'date_to': '20 November 2013', 'sl_fourth': '2', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R690 750'}
[{'sl_no': '2', 'sl_runs': '41', 'sl_name': 'Marcus A', 'sl_wins': '12', 'sl_level': '-8.70', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '3', 'sl_place': '2', 'sl_second': '3', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '29.27', 'sl_winplace': '48.78', 'date_to': '20 November 2013', 'sl_fourth': '2', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R690 750'}, {'sl_no': '2', 'sl_runs': '41', 'sl_name': 'Marcus A', 'sl_wins': '12', 'sl_level': '-8.70', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '3', 'sl_place': '2', 'sl_second': '3', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '29.27', 'sl_winplace': '48.78', 'date_to': '20 November 2013', 'sl_fourth': '2', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R690 750'}]
As you can see, it prints the first object with sl_no 1, then adds it to the list.
Then it prints the object with sl_no 2, but now both objects in the list have sl_no 2, not 1 and 2 as I would have expected. So at the end of the loop, the whole list contains only the last object, repeated once per iteration.
Why is this happening?

The problem is that obj is the same object each time. You append it to the list json_object several times, so the list ends up containing several references to the same object. obj changes over time, so when you print the list you see the same object printed several times.
Using
obj = tags.copy()
instead makes obj a new object each time (not a mere reference to the same object as tags, but a reference to a new dictionary with the same contents). So changes to this obj only affect this obj.
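
The difference is easy to reproduce in isolation. A minimal sketch, assuming tags is a flat dict of shared fields (its real contents are not shown in the question):

```python
tags = {"country": "SA"}   # hypothetical base fields shared by every record

shared = []
for number in ("1", "2"):
    obj = tags             # alias: obj and tags are the very same dict
    obj["sl_no"] = number
    shared.append(obj)

tags = {"country": "SA"}
copied = []
for number in ("1", "2"):
    obj = tags.copy()      # a new dict on every iteration
    obj["sl_no"] = number
    copied.append(obj)

print([d["sl_no"] for d in shared])   # ['2', '2']
print([d["sl_no"] for d in copied])   # ['1', '2']
print(shared[0] is shared[1])         # True
```

Note that dict.copy() is a shallow copy. If tags held nested lists or dicts that get modified per record, copy.deepcopy(tags) would be the safer choice.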

This behavior happens because of
obj = tags
You are actually editing the contents of tags on every iteration, which results in a list of identical rows.
To solve the problem, create a new object from tags on each iteration.
For example:
obj = dict(tags)
