How can I scrape data from a URL on the network?


I would like to write a program that scrapes data from https://www.futbin.com/21/player/560/aubameyang. At the bottom of the page are the daily and hourly graph sections, and the hourly graph is the one I want. Its data can be found in the Network tab of the browser's inspector, as the request https://www.futbin.com/21/playerPrices?player=188567&rids=84074647&_=1608811830598. This returns the recent sales history for all platforms (ps, xbox, pc) under keys such as LCPrice, LCPrice2, etc. That is what I'd like to scrape/extract.
Each player is identified by an id; in this example the player's id is 188567, which I found via the Network tab. My current code is below.
It doesn't print or return anything. Any help would be appreciated.
import requests
from datetime import datetime

player_ids = {
    'Arturo Vidal': 181872,
    'Pierre-Emerick Aubameyang': 188567,
    'Robert Lewandowski': 188545,
    'Jerome Boateng': 183907,
    'Sergio Ramos': 155862,
    'Antoine Griezmann': 194765,
    'David Alaba': 197445,
    'Paulo Dybala': 211110,
    'Radja Nainggolan': 178518
}

for (name,id) in player_ids.items():
    r = requests.get('https://www.futbin.com/21/playerPrices?player={0}'.format(id))
    data = r.json()
    print(name)
    print("-"*20)
    #Change ps to xbox or pc to get other prices
    for price in data['ps']:
        price = price[1]
        print(price)

The question could be clearer, but as I understand it you are looking for something like the following example.
What makes the difference
The data has to be accessed by player id first and then by console:
data[str(id)]['prices']['ps'].values()
Example:
import requests
from datetime import datetime

player_ids = {
    'Arturo Vidal': 181872,
    'Pierre-Emerick Aubameyang': 188567,
    'Robert Lewandowski': 188545,
    'Jerome Boateng': 183907,
    'Sergio Ramos': 155862,
    'Antoine Griezmann': 194765,
    'David Alaba': 197445,
    'Paulo Dybala': 211110,
    'Radja Nainggolan': 178518
}

for (name,id) in player_ids.items():
    r = requests.get('https://www.futbin.com/21/playerPrices?player={0}'.format(id))
    data = r.json()
    print(name)
    print("-"*20)
    psPrices = list(data[str(id)]['prices']['ps'].values())
    print(psPrices)
    xboxPrices = list(data[str(id)]['prices']['xbox'].values())
    print(xboxPrices)
Output:
Arturo Vidal
--------------------
['0', '0', '0', '0', '0', '10 weeks ago', '3,600', '65,000', '0']
['0', '0', '0', '0', '0', '10 weeks ago', '2,100', '37,500', '100']
Pierre-Emerick Aubameyang
--------------------
['59,000', '59,000', '0', '0', '0', '13 mins ago', '12,250', '230,000', '21']
['57,000', '57,500', '58,000', '58,000', '58,000', '14 mins ago', '11,000', '210,000', '23']
Robert Lewandowski
--------------------
['72,500', '72,500', '72,500', '72,500', '72,500', '14 mins ago', '6,000', '110,000', '63']
['73,500', '73,500', '73,500', '73,500', '73,500', '2 mins ago', '7,400', '140,000', '49']
Jerome Boateng
--------------------
['1,400', '1,400', '1,400', '1,400', '1,400', '15 mins ago', '700', '10,000', '7']
['1,300', '1,300', '1,300', '1,300', '1,300', '4 mins ago', '700', '10,000', '6']
Sergio Ramos
--------------------
['50,000', '50,500', '50,500', '50,500', '50,500', '19 mins ago', '8,000', '150,000', '29']
['51,000', '51,000', '51,000', '51,000', '0', '15 mins ago', '7,200', '140,000', '32']
Antoine Griezmann
--------------------
['29,250', '29,250', '29,250', '29,250', '29,250', '35 mins ago', '2,800', '50,000', '56']
['32,750', '32,750', '33,000', '33,000', '33,000', '37 mins ago', '2,900', '55,000', '57']
David Alaba
--------------------
['0', '0', '0', '0', '0', '14 mins ago', '700', '10,000', '100']
['0', '0', '0', '0', '0', '16 mins ago', '700', '11,000', '100']
Paulo Dybala
--------------------
['36,000', '36,000', '36,000', '36,250', '36,500', '19 mins ago', '3,600', '65,000', '52']
['37,500', '37,500', '37,500', '38,000', '38,000', '1 min ago', '3,100', '55,000', '66']
Radja Nainggolan
--------------------
['2,100', '2,100', '2,100', '2,100', '2,100', '21 mins ago', '700', '10,000', '15']
['1,900', '1,900', '1,900', '1,900', '1,900', '32 mins ago', '700', '10,000', '12']
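
The nested lookup described in the answer can be tried offline. The following is only a sketch: the sample payload is hypothetical, shaped after the access path `data[str(id)]['prices']['ps']` from the answer and the LCPrice / LCPrice2 keys named in the question. The `updated` key name and the helper `lowest_prices` are assumptions made for illustration, not part of the site's documented API.

```python
import json

# Hypothetical response body, shaped like the structure the answer navigates.
# Only LCPrice and LCPrice2 are named in the question; "updated" is an assumed key.
SAMPLE = """
{"188567": {"prices": {
    "ps":   {"LCPrice": "59,000", "LCPrice2": "59,000", "updated": "13 mins ago"},
    "xbox": {"LCPrice": "57,000", "LCPrice2": "57,500", "updated": "14 mins ago"}
}}}
"""


def lowest_prices(data, player_id, platform):
    """Return the LCPrice* values for one player and platform as integers."""
    # The top-level key is the player id as a string, not the platform name.
    prices = data[str(player_id)]["prices"][platform]
    return [
        int(value.replace(",", ""))  # "59,000" -> 59000
        for key, value in prices.items()
        if key.startswith("LCPrice")
    ]


data = json.loads(SAMPLE)
ps_prices = lowest_prices(data, 188567, "ps")
xbox_prices = lowest_prices(data, 188567, "xbox")
print(ps_prices)
print(xbox_prices)
```

With a payload of this shape, `data['ps']` raises a KeyError because the platform dictionaries sit two levels below the player id.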

Related

Python, Take Multiple Lists and Putting into pd.Dataframe

I have seen a variety of answers to this question (like this one), and have had no success in getting my lists into one dataframe. I have one header list (meant to be column headers), and then a variable that has multiple records in it:
list1 = ['Rank', 'Athlete', 'Distance', 'Runs', 'Longest', 'Avg. Pace', 'Elev. Gain']
list2 = (['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m']
['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m']
['3', 'Kelsey', '32.6 km', '2', '21.3 km', '5:46 /km', '141 m'])
When I try something like:
df = pd.DataFrame(list(zip(['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m'],
                           ['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m'])))
It lists all the attributes as their own rows, like so:
          0         1
0         1         2
1      Jack      Jill
2   57.4 km   34.0 km
3         4         2
4   21.7 km   17.9 km
5  5:57 /km  5:27 /km
6     994 m     152 m
How do I get this into a frame that has list1 as the headers, and the rest of the data neatly squared away?

Given
list1 = ['Rank', 'Athlete', 'Distance', 'Runs', 'Longest', 'Avg. Pace', 'Elev. Gain']
list2 = (['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m'],
['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m'],
['3', 'Kelsey', '32.6 km', '2', '21.3 km', '5:46 /km', '141 m'])
do
pd.DataFrame(list2, columns=list1)
which returns
   Rank  Athlete  Distance  Runs  Longest  Avg. Pace  Elev. Gain
0     1     Jack   57.4 km     4  21.7 km   5:57 /km       994 m
1     2     Jill   34.0 km     2  17.9 km   5:27 /km       152 m
2     3   Kelsey   32.6 km     2  21.3 km   5:46 /km       141 m

Change your second list into a list of lists (separate the inner lists with commas) and then
df = pd.DataFrame(columns = list1, data = list2)
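
Why the zip attempt in the question came out sideways can be seen without pandas. This is a sketch in plain Python, not a replacement for the DataFrame call: each inner list is one row, zipping the rows together yields columns, and pairing the header list with a single row yields one labelled record. The names `columns` and `records` are illustrative only.

```python
list1 = ['Rank', 'Athlete', 'Distance', 'Runs', 'Longest', 'Avg. Pace', 'Elev. Gain']
list2 = [['1', 'Jack', '57.4 km', '4', '21.7 km', '5:57 /km', '994 m'],
         ['2', 'Jill', '34.0 km', '2', '17.9 km', '5:27 /km', '152 m'],
         ['3', 'Kelsey', '32.6 km', '2', '21.3 km', '5:46 /km', '141 m']]

# Zipping the rows together groups values by position, i.e. it transposes
# rows into columns, which is why each attribute ended up on its own row.
columns = list(zip(*list2))

# Pairing the headers with one row at a time keeps each athlete on one row.
records = [dict(zip(list1, row)) for row in list2]

print(columns[1])
print(records[0]['Athlete'], records[0]['Distance'])
```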

Attempting to grab certain Elements

I am new to the lxml module in Python.
I am trying to parse data from a website: https://weather.com/weather/tenday/l/USCA1037:1:US
I am trying to grab the text of:
<span classname="narrative" class="narrative">
Cloudy. Low 49F. Winds WNW at 10 to 20 mph.
</span>
However, I am getting my XPath all mixed up.
To be exact, the location of this line is
//*[@id="twc-scrollabe"]/table/tbody/tr[4]/td[2]/span
I've attempted the following:
import requests
import lxml.html
from lxml import etree
html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
element_object = lxml.html.fromstring(html.content) # htmlelement object returns bytes
# element_object has root of <html>
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]
day_of_week = table.xpath('.//span[@class="date-time"]/text()') # returns list of items from "dates-time"
dates = table.xpath('.//span[@class="day-detail clearfix"]/text()')
td = table.xpath('.//tbody/tr/td/span[contains(@class, "narrative")]')
print td
# print td displays an empty list.
I would like my program to also parse "Cloudy. Low 49F. Winds WNW at 10 to 20 mph."
Please help...

Some <td> elements have a title= attribute that holds the description:
import requests
import lxml.html
html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
element_object = lxml.html.fromstring(html.content)
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]
td = table.xpath('.//tr/td[@class="twc-sticky-col"]/@title')
print(td)
Result
['Mostly cloudy skies early, then partly cloudy after midnight. Low 48F. Winds SSW at 5 to 10 mph.',
'Mainly sunny. High 66F. Winds WNW at 5 to 10 mph.',
'Sunny. High 71F. Winds NW at 5 to 10 mph.',
'A mainly sunny sky. High 69F. Winds W at 5 to 10 mph.',
'Some clouds in the morning will give way to mainly sunny skies for the afternoon. High 67F. Winds WSW at 5 to 10 mph.',
'Considerable clouds early. Some decrease in clouds later in the day. High 67F. Winds WSW at 5 to 10 mph.',
'Partly cloudy. High near 65F. Winds WSW at 5 to 10 mph.',
'Cloudy skies early, then partly cloudy in the afternoon. High 61F. Winds WSW at 10 to 20 mph.',
'Sunny skies. High 62F. Winds WNW at 10 to 20 mph.',
'Mainly sunny. High 61F. Winds WNW at 10 to 20 mph.',
'Sunny along with a few clouds. High 64F. Winds WNW at 10 to 15 mph.',
'Mostly sunny skies. High around 65F. Winds WNW at 10 to 15 mph.',
'Mostly sunny skies. High 66F. Winds WNW at 10 to 20 mph.',
'Mainly sunny. High around 65F. Winds WNW at 10 to 20 mph.',
'A mainly sunny sky. High around 65F. Winds WNW at 10 to 20 mph.']
There is no <tbody> in the HTML source, but the web browser may display one in DevTools, so don't use tbody in the XPath.
Some of the text is in <span></span> and some is in <span><span></span></span>:
import requests
import lxml.html
html = requests.get("https://weather.com/weather/tenday/l/USCA1037:1:US")
element_object = lxml.html.fromstring(html.content)
table = element_object.xpath('//div[@class="twc-table-scroller"]')[0]
td = table.xpath('.//tr/td//span/text()')
print(td)
Result
['Tonight', 'APR 21', 'Partly Cloudy', '--', '48', '10', '%', 'SSW 7 mph ', '85', '%',
'Mon', 'APR 22', 'Sunny', '66', '51', '10', '%', 'WNW 9 mph ', '67', '%',
'Tue', 'APR 23', 'Sunny', '71', '53', '0', '%', 'NW 8 mph ', '59', '%',
'Wed', 'APR 24', 'Sunny', '69', '52', '10', '%', 'W 9 mph ', '71', '%',
'Thu', 'APR 25', 'Partly Cloudy', '67', '51', '10', '%', 'WSW 9 mph ', '71', '%',
'Fri', 'APR 26', 'Partly Cloudy', '67', '51', '10', '%', 'WSW 9 mph ', '69', '%',
'Sat', 'APR 27', 'Partly Cloudy', '65', '50', '10', '%', 'WSW 9 mph ', '71', '%',
'Sun', 'APR 28', 'AM Clouds/PM Sun', '61', '49', '20', '%', 'WSW 13 mph ', '75', '%',
'Mon', 'APR 29', 'Sunny', '62', '48', '10', '%', 'WNW 14 mph ', '63', '%',
'Tue', 'APR 30', 'Sunny', '61', '49', '0', '%', 'WNW 14 mph ', '61', '%',
'Wed', 'MAY 1', 'Mostly Sunny', '64', '50', '0', '%', 'WNW 12 mph ', '60', '%',
'Thu', 'MAY 2', 'Mostly Sunny', '65', '50', '0', '%', 'WNW 12 mph ', '61', '%',
'Fri', 'MAY 3', 'Mostly Sunny', '66', '51', '0', '%', 'WNW 13 mph ', '61', '%',
'Sat', 'MAY 4', 'Sunny', '65', '51', '0', '%', 'WNW 14 mph ', '62', '%',
'Sun', 'MAY 5', 'Sunny', '65', '51', '0', '%', 'WNW 14 mph ', '63', '%']
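
The point above about <tbody> can be checked on a small document. The sketch below swaps in the standard library's xml.etree.ElementTree instead of lxml (it supports only a subset of XPath), and the markup is a hypothetical fragment loosely modelled on the table in the question, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: like much raw HTML, the source contains no <tbody>.
MARKUP = """
<div class="twc-table-scroller">
  <table>
    <tr>
      <td class="twc-sticky-col" title="Sunny. High 71F."><span>Tue</span></td>
      <td class="description"><span><span>Sunny</span></span></td>
    </tr>
  </table>
</div>
""".strip()

root = ET.fromstring(MARKUP)

# A path that insists on <tbody> matches nothing in the raw source.
with_tbody = root.findall(".//tbody/tr/td")

# Dropping tbody from the path finds both cells.
without_tbody = root.findall(".//tr/td")

# The description is stored in the title attribute of some cells.
titles = [td.get("title") for td in root.findall(".//tr/td[@title]")]

# itertext() collects text regardless of how deeply the spans are nested.
cell_texts = ["".join(td.itertext()).strip() for td in without_tbody]

print(len(with_tbody), len(without_tbody))
print(titles)
print(cell_texts)
```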

If you want to grab text like "Sunny. High 66F. Winds WNW at 5 to 10 mph.", you can get it from the title attributes of the <td> elements.
This should work.
td = table.xpath('.//tbody/tr/td[@class="description"]/@title')

Sorting a list of strings with a specific method

Let's say I have a list of strings like this:
L = ['5', '3', '4', '1', '2', '2 3 5', '2 4 8', '5 22 1 37', '5 22 1 22', '5 22 1 23', ....]
How can I sort this list so that I would have something like this:
L = ['1', '2', '3', '4', '5', '2 3 5', '2 4 8', '5 22 1 22', '5 22 1 23', '5 22 1 37', ...]
Basically, I need to order the list based on the first number that differs between two strings.

You could sort using a tuple:
L = ['5', '3', '4', '1', '2', '2 3 5', '2 4 8', '5 22 1 37', '5 22 1 22', '5 22 1 23']
result = sorted(L, key=lambda x: (len(x.split()),) + tuple(map(int, x.split())))
print(result)
Output
['1', '2', '3', '4', '5', '2 3 5', '2 4 8', '5 22 1 22', '5 22 1 23', '5 22 1 37']
The idea is to use as the key a tuple whose first element is the count of numbers in the string and whose remaining elements are the numbers themselves. For example, for '2 3 5' the key is (3, 2, 3, 5).
As suggested by @PM2Ring, you could use a def function instead of a lambda:
def key(x):
    numbers = tuple(map(int, x.split()))
    return (len(numbers),) + numbers

A slightly different approach from @Daniel's.
idx = sorted(range(len(L)), key=lambda i: int(''.join(L[i].split())))
L = [L[i] for i in idx]
output
['1',
'2',
'3',
'4',
'5',
'2 3 5',
'2 4 8',
'5 22 1 22',
'5 22 1 23',
'5 22 1 37']
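
The two keys shown in the answers agree on the question's list, but they are not equivalent in general. As a sketch, the sample list below is made up to show where they diverge; `tuple_key` and `concat_key` are illustrative names for the tuple key and the digit-joining key respectively.

```python
def tuple_key(s):
    """Count of numbers first, then the numbers themselves."""
    numbers = tuple(map(int, s.split()))
    return (len(numbers),) + numbers


def concat_key(s):
    """Join all digits into one integer, e.g. '5 22 1' -> 5221."""
    return int("".join(s.split()))


# Made-up input chosen so that the two keys disagree.
sample = ['2 3', '100', '1 23', '12 3']

by_tuple = sorted(sample, key=tuple_key)
by_concat = sorted(sample, key=concat_key)

print(by_tuple)
print(by_concat)
```

Joining the digits turns '2 3' into 23, which sorts before 100 even though '100' holds fewer numbers, and it maps '1 23' and '12 3' to the same value. The tuple key avoids both problems because tuples compare element by element.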

How to sort a list of strings which contain space-separated numbers? [duplicate]

For example, given the list:
['14', '15 20 1', '17', '10 25 40 3', '8']
The list must be sorted in ascending order of length and, for strings of the same length, in increasing order of the first number in which they differ.
This is the list I expect:
['8', '14', '17', '15 20 1', '10 25 40 3']
How can I sort this?
I tried to transform the list of strings into a list of lists, but to no avail:
l.sort(key=lambda x: (len(x),len(x[0]),x))
The problem is when I have a list like this:
['10 11 12 13 4','10 11 12 13 2']
The length is the same, but the last number of the second string is smaller.

def cf(k):
    t = tuple(map(int, k.split()))
    return len(t), t

x = ['14', '15 20 1', '17', '10 25 40 3', '8']
x.sort(key=cf)
Output:
['8', '14', '17', '15 20 1', '10 25 40 3']

You could use sorted with a key so that items are ordered first by how many numbers they contain and then by the numbers themselves:
sorted(s, key = lambda x: (len(x.split()), list(map(int,x.split()))))
['8', '14', '17', '15 20 1', '10 25 40 3']
A clearer example:
s = ['12 1 3', '1', '0', '10 2', '10 3', '12 3 1 ', '12 1 2']
sorted(s, key = lambda x: (len(x.split()), list(map(int,x.split()))))
['0', '1', '10 2', '10 3', '12 1 2', '12 1 3', '12 3 1 ']

Appending objects to list in a loop - not what I expected

I am appending an object to a list like this:
json_object = []
nodes = soup.findAll(params["node_name"])
for node in nodes:
    obj = tags
    for element in node:
        if element.name != None:
            obj[element.name] = str(element.text)
    print obj
    json_object.append(obj)
    print json_object
Here is the output of the first two iterations:
{'sl_no': '1', 'sl_runs': '98', 'sl_name': 'Khumalo S', 'sl_wins': '12', 'sl_level': '-19.30', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '12', 'sl_place': '8', 'sl_second': '16', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '12.24', 'sl_winplace': '51.02', 'date_to': '20 November 2013', 'sl_fourth': '10', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R1 018 300'}
[{'sl_no': '1', 'sl_runs': '98', 'sl_name': 'Khumalo S', 'sl_wins': '12', 'sl_level': '-19.30', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '12', 'sl_place': '8', 'sl_second': '16', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '12.24', 'sl_winplace': '51.02', 'date_to': '20 November 2013', 'sl_fourth': '10', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R1 018 300'}]
{'sl_no': '2', 'sl_runs': '41', 'sl_name': 'Marcus A', 'sl_wins': '12', 'sl_level': '-8.70', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '3', 'sl_place': '2', 'sl_second': '3', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '29.27', 'sl_winplace': '48.78', 'date_to': '20 November 2013', 'sl_fourth': '2', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R690 750'}
[{'sl_no': '2', 'sl_runs': '41', 'sl_name': 'Marcus A', 'sl_wins': '12', 'sl_level': '-8.70', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '3', 'sl_place': '2', 'sl_second': '3', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '29.27', 'sl_winplace': '48.78', 'date_to': '20 November 2013', 'sl_fourth': '2', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R690 750'}, {'sl_no': '2', 'sl_runs': '41', 'sl_name': 'Marcus A', 'sl_wins': '12', 'sl_level': '-8.70', 'country': 'SA', 'date_from': '01 November 2013', 'sl_third': '3', 'sl_place': '2', 'sl_second': '3', 'stat_desc': u'Top Jockeys in South Africa ONLY 01 November 2013 to 20 November 2013', 'sl_wins_pc': '29.27', 'sl_winplace': '48.78', 'date_to': '20 November 2013', 'sl_fourth': '2', 'stat_type': u'Jockeys', 'region': 'South Africa ONLY', 'sl_stake_earned': 'R690 750'}]
As you can see, it prints the first object with sl_no 1 and then adds it to the list.
Then it prints the object with sl_no 2, but now both objects in the list have sl_no 2, not 1 and 2 as I would have expected. So at the end of the loop, the whole list contains only the last object, repeated once per iteration.
Why is this happening?

The problem is that obj is the same object each time. You append it to the list json_object several times, so the list ends up containing several references to the same object. obj changes over time, so when you print the list you see the same object printed several times.
Using
obj = tags.copy()
instead makes obj a new object each time (not a mere reference to the same object as tags, but a reference to a new dictionary with the same contents). So changes to this obj only affect this obj.
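
The difference between the shared reference and the copy can be reproduced in a few lines. This is a minimal sketch: the contents of `tags` and `rows` are hypothetical stand-ins for the parsed nodes, and the two helper functions exist only for the comparison.

```python
def collect_shared(tags, rows):
    """Mimics the question: every list entry is the same dict object."""
    result = []
    for row in rows:
        obj = tags          # no copy, just another name for tags
        obj.update(row)
        result.append(obj)
    return result


def collect_copied(tags, rows):
    """Mimics the fix: each entry is a fresh dict with the same contents."""
    result = []
    for row in rows:
        obj = tags.copy()   # shallow copy, independent of tags
        obj.update(row)
        result.append(obj)
    return result


# Hypothetical data standing in for the parsed nodes.
rows = [{"sl_no": "1"}, {"sl_no": "2"}]

shared = collect_shared({"country": "SA"}, rows)
copied = collect_copied({"country": "SA"}, rows)

print([o["sl_no"] for o in shared])
print([o["sl_no"] for o in copied])
```

Note that `copy()` is shallow, which is enough here because the values are plain strings.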

This behavior happens because of
obj = tags
You are actually editing the contents of tags on every iteration, which results in a list of duplicated rows.
To solve the problem, create a new object from tags on each iteration.
For example:
obj = dict(tags)
