get length of list per scraped link

I am quite new to Python and would appreciate your advice.
What I want in the end is the length of injury_list per player. The player links are stored in playerLinks:
playerLinks = ['https://www.transfermarkt.de/Serge Gnabry/verletzungen/spieler/159471',
'https://www.transfermarkt.de/Jamal Musiala/verletzungen/spieler/580195',
'https://www.transfermarkt.de/Douglas Costa/verletzungen/spieler/75615',
'https://www.transfermarkt.de/Joshua Kimmich/verletzungen/spieler/161056',
'https://www.transfermarkt.de/Alexander Nübel/verletzungen/spieler/195778',
'https://www.transfermarkt.de/Kingsley Coman/verletzungen/spieler/243714',
'https://www.transfermarkt.de/Christopher Scott/verletzungen/spieler/503162',
'https://www.transfermarkt.de/Corentin Tolisso/verletzungen/spieler/190393',
'https://www.transfermarkt.de/Leon Goretzka/verletzungen/spieler/153084',
'https://www.transfermarkt.de/Javi Martínez/verletzungen/spieler/44017',
'https://www.transfermarkt.de/Tiago Dantas/verletzungen/spieler/429987',
'https://www.transfermarkt.de/Robert Lewandowski/verletzungen/spieler/38253',
'https://www.transfermarkt.de/Lucas Hernández/verletzungen/spieler/281963',
'https://www.transfermarkt.de/Josip Stanisic/verletzungen/spieler/483046',
'https://www.transfermarkt.de/Thomas Müller/verletzungen/spieler/58358',
'https://www.transfermarkt.de/Benjamin Pavard/verletzungen/spieler/353366',
'https://www.transfermarkt.de/Bouna Sarr/verletzungen/spieler/190685',
'https://www.transfermarkt.de/Leroy Sané/verletzungen/spieler/192565',
'https://www.transfermarkt.de/Manuel Neuer/verletzungen/spieler/17259',
'https://www.transfermarkt.de/David Alaba/verletzungen/spieler/59016',
'https://www.transfermarkt.de/Niklas Süle/verletzungen/spieler/166601',
'https://www.transfermarkt.de/Tanguy Nianzou/verletzungen/spieler/538996',
'https://www.transfermarkt.de/Ron-Thorben Hoffmann/verletzungen/spieler/317444',
'https://www.transfermarkt.de/Jérôme Boateng/verletzungen/spieler/26485',
'https://www.transfermarkt.de/Alphonso Davies/verletzungen/spieler/424204',
'https://www.transfermarkt.de/Eric Maxim Choupo-Moting/verletzungen/spieler/45660',
'https://www.transfermarkt.de/Marc Roca/verletzungen/spieler/336869']
injury_list = []
name_list = []
With the code below I get a list of all injuries for all links in playerLinks.
However, the lists are not of equal size, and I need the name of each player next to that player's injuries.
I tried the code below. However, the length of injury_list then ends up as a seemingly random number rather than the number of injuries per player.
How do I get the length of injury_list per player instead, so that I have the correct name next to each injury?
for p in range(len(playerLinks)):
    page = playerLinks[p]
    response = requests.get(page, headers={'User-Agent': 'Custom5'})
    print(response.status_code)
    injury_data = response.text
    soup = BeautifulSoup(injury_data, 'html.parser')
    table = soup.find(id="yw1")
    injurytypes = table.select("td[class='hauptlink']")
    for j in range(len(injurytypes)):
        all_injuries = [injury.text for injury in injurytypes]
        injury_list.extend(all_injuries)
    image = soup.find("div", {"class": "dataBild"})
    for j in range(len(image)):
        names = image.find("img").get("title")
        name_list.append(''.join(names))
name_list_def = name_list * len(injury_list)
Through the img tag I get the names of the players.
Do you have any advice?
Thanks a lot!

import requests
from bs4 import BeautifulSoup

player_inj_numb = []
for url in playerLinks:
    # The player name is the first path segment of the URL
    player_name = url.split('/')[3]
    response = requests.get(url, headers={'User-Agent': 'Custom5'})
    print(response.status_code)
    soup = BeautifulSoup(response.text, 'html.parser')
    table = soup.find(id="yw1")
    # One table row per injury, plus one header row
    nbInjury = len(table.find_all("tr"))
    player_inj_numb.append((player_name, nbInjury - 1))
print(player_inj_numb)
which outputs:
[('Serge Gnabry', 15), ('Jamal Musiala', 0), ('Douglas Costa', 15), ('Joshua Kimmich', 12), ('Alexander Nübel', 2), ('Kingsley Coman', 15), ('Christopher Scott', 0), ('Corentin Tolisso', 14), ('Leon Goretzka', 15), ('Javi Martínez', 15), ('Tiago Dantas', 0), ('Robert Lewandowski', 15), ('Lucas Hernández', 15), ('Josip Stanisic', 8), ('Thomas Müller', 13), ('Benjamin Pavard', 9), ('Bouna Sarr', 8), ('Leroy Sané', 11), ('Manuel Neuer', 15), ('David Alaba', 15), ('Niklas Süle', 13), ('Tanguy Nianzou', 4), ('Ron-Thorben Hoffmann', 4), ('Jérôme Boateng', 15), ('Alphonso Davies', 3), ('Eric Maxim Choupo-Moting', 15), ('Marc Roca', 7)]
I took the name from the URL, since it is already there; no additional scraping is needed.
The number of injuries equals the number of rows in the table minus the first row, which is the table header.
Please note that some players have more than 15 injuries, so you will need to fetch the subsequent pages in those cases.
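To keep each player's name next to that player's injuries, one option is to build (name, injury) rows per link instead of extending one flat list. A minimal sketch, assuming the injury names for each link have already been scraped; the helper names, the sample links and the injury strings below are made up for illustration and are not real data:

```python
def player_name_from_url(url):
    """Return the player name, which is the first path segment of the link."""
    return url.split('/')[3]


def rows_per_player(injuries_by_url):
    """Turn {url: [injury, ...]} into a flat list of (name, injury) rows."""
    rows = []
    for url, injuries in injuries_by_url.items():
        name = player_name_from_url(url)
        rows.extend((name, injury) for injury in injuries)
    return rows


# Hypothetical, hand-written sample; not scraped data.
sample = {
    'https://www.transfermarkt.de/Player A/verletzungen/spieler/1': ['injury 1', 'injury 2'],
    'https://www.transfermarkt.de/Player B/verletzungen/spieler/2': [],
}
rows = rows_per_player(sample)
counts = {player_name_from_url(url): len(injuries) for url, injuries in sample.items()}
print(rows)
print(counts)
```

Because every row carries its own name, the two columns can never drift out of step, and the per-player count is simply the length of that player's list.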

Related

Python Reportlab - Wordwrap on Table is splitting words rather than at spaces

I created a PDF in reportlab using a canvas:
self.pdf = canvas.Canvas(f'{file_name}.pdf', pagesize=A4)
I build the document from tables nested within tables, but one of my tables is not wrapping the way I expect. Rather than breaking lines at spaces, it splits words in the middle.
The code below is what I used to create the table. It is a bit long because I made sure that all the cells I pass into Table() are Paragraph() objects.
def _discount_table(self, width_list):
    # Table Name
    table_name = Paragraph('DISCOUNTS', self.header_style_grey)
    # Create Header
    header = [Paragraph('NAME', self.table_header_style_left)]
    header += [Paragraph(x, self.table_header_style_right) for x in self.unique_discount_list]
    header += [Paragraph('TOTAL', self.table_header_style_right)]
    # Process Data
    discount_data = [[Paragraph(cell, self.table_style2) for cell in row] for row in self.discount_data]
    data = [[child_row[0]] + disc for child_row, disc in zip(self.fees_data, discount_data)]
    # Create Footer
    table_footer = [Paragraph('') for _ in range(len(header) - 2)]
    table_footer += [Paragraph('TOTAL', self.table_header_style_right),
                     Paragraph(f'{self.discount_total:,.2f}', self.table_header_style_right)]
    # Create Table
    bg_color = self.header_style_grey.textColor
    table = Table([header] + data + [table_footer], colWidths=width_list)
    table.setStyle([
        ('GRID', (0, 0), (-1, -1), 1, 'black'),
        ('BACKGROUND', (0, 0), (-1, 0), bg_color),
        ('TEXTCOLOR', (0, 0), (-1, 0), 'white'),
        ('BACKGROUND', (-2, -1), (-1, -1), bg_color),
        ('TEXTCOLOR', (-2, -1), (-1, -1), 'white'),
        ('FONTNAME', (-2, -1), (-1, -1), 'Helvetica-Bold'),
        ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
        ('ALIGN', (1, 0), (-1, -1), 'RIGHT'),
        ('ROWBACKGROUNDS', (0, 1), (-1, -2), ['lightgrey', 'white']),
    ])
    return [table_name, table]
(Note that child_row[0], used in the data = ... line above, is already a Paragraph.)
The styling I used is imported from another python file as follows:
self.table_style2 = ParagraphStyle('table_style')
self.table_style2.wordWrap = 'CJK'
self.table_style2.alignment = TA_RIGHT
self.table_style = ParagraphStyle('table_style')
self.table_style.wordWrap = 'CJK'
self.table_header_style_right = ParagraphStyle('table_header_style', self.table_style)
self.table_header_style_right.textColor = colors.HexColor('#FFFFFF')
self.table_header_style_right.fontName = 'Helvetica-Bold'
self.table_header_style_right.alignment = TA_RIGHT
self.table_header_style_right.wordWrap = 'CJK'
self.table_header_style_left = ParagraphStyle('table_header_style', self.table_style)
self.table_header_style_left.textColor = colors.HexColor('#FFFFFF')
self.table_header_style_left.fontName = 'Helvetica-Bold'
self.table_header_style_left.wordWrap = 'CJK'
So I am really lost and need help. Why is the table not wrapping correctly?

I was able to fix the wrapping issue by removing the wordWrap = 'CJK' lines. A Paragraph() already wraps at spaces by default. As far as I can tell, 'CJK' selects the wrapping meant for Chinese, Japanese and Korean text, which may break a line between any two characters, and that is why the words were being split:
self.table_style2 = ParagraphStyle('table_style')
# self.table_style2.wordWrap = 'CJK'
self.table_style2.alignment = TA_RIGHT
self.table_style = ParagraphStyle('table_style')
# self.table_style.wordWrap = 'CJK'
self.table_header_style_right = ParagraphStyle('table_header_style', self.table_style)
self.table_header_style_right.textColor = colors.HexColor('#FFFFFF')
self.table_header_style_right.fontName = 'Helvetica-Bold'
self.table_header_style_right.alignment = TA_RIGHT
# self.table_header_style_right.wordWrap = 'CJK'
self.table_header_style_left = ParagraphStyle('table_header_style', self.table_style)
self.table_header_style_left.textColor = colors.HexColor('#FFFFFF')
self.table_header_style_left.fontName = 'Helvetica-Bold'
# self.table_header_style_left.wordWrap = 'CJK'
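The difference between the two wrapping behaviours can be illustrated without ReportLab. The following is only a plain-Python sketch of the idea (cutting at any character versus breaking at spaces), not ReportLab's actual algorithm; the function names and the sample text are made up for illustration:

```python
import textwrap


def wrap_at_any_character(text, width):
    """Cut the text every `width` characters, ignoring word boundaries."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def wrap_at_spaces(text, width):
    """Break only at whitespace, keeping words whole where they fit."""
    return textwrap.wrap(text, width)


sample = "SIBLING DISCOUNT"
by_char = wrap_at_any_character(sample, 10)
by_space = wrap_at_spaces(sample, 10)
print(by_char)   # the second word is cut in the middle
print(by_space)  # each word stays intact
```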

Issue when finding element from a zipped list will automatic empty the zipped list

I am trying to search for a typed item in a QListWidget; when an item is selected, I then look it up in the zipped list.
The first time I run this, it finds the match. The second time I select an item, zip_cmsdata gives me an empty list.
For example:
First, I search for "Hello" and then select "Hello" in the QListWidget.
output:
item click
[('Hello', 1, 'US', None)]
If I click (select) Hello in the QListWidget again, or search for other items, I get the following output:
item click
[]
I am guessing the cmsdata was only added once, so by the second time it had been cleared. If this is true, how can I solve it?
Please find my code below:
from asyncio.windows_events import NULL
from re import A
import sys
from PyQt5.uic import loadUi
from PyQt5 import QtWidgets, QtCore
from PyQt5.QtWidgets import QDialog, QApplication, QStackedWidget, QWidget, QTabWidget, QTableView, QListView
from PyQt5.QtGui import QPixmap
from PyQt5.QtCore import QStringListModel
import requests
import images
import json
cmsdata = []
cmsdisplayname = []
cid = []
cmscountry = []
parentId = []
zip_cmsdata = []
class PerfectConsoleSearch(QDialog):
    def __init__(self):
        super(PerfectConsoleSearch, self).__init__()
        loadUi("pfsearch.ui",self)
        with open('cmsdata.json') as json_file:
            data = json.load(json_file)
            global cmsdata
            cmsdata = data
            #print(data)
            #print(cmsdata)
            #print(type(cmsdata))
        #self.login_button.clicked.connect(self.gotologin)
        #self.cms_display_list.setStringList(self.cmsdata)
        #self.cms_display_list.setModel(main.cmsdata)
        self.load_cmslist()
        print(self.customer_searchbar.text())
        self.customer_searchbar.textChanged.connect(lambda: self.Search(self.customer_searchbar.text()))
        self.cms_display_list.itemClicked.connect(self.show_select_item)
    def load_cmslist(self):
        n = 0
        a=[]
        #print(cmsdata['customerList'][0]['displayName'])
        while n < len(cmsdata['customerList']):
            #print(cmsdata['customerList'][n]['displayName'])
            cmsdisplay_name = cmsdata['customerList'][n]['displayName']
            cms_cid = cmsdata['customerList'][n]['cid']
            cms_country = cmsdata['customerList'][n]['country']
            cms_parentId = cmsdata['customerList'][n]['parentId']
            a.append(cmsdisplay_name)
            global cmsdisplayname, cid, cmscountry, parentId
            cmsdisplayname.append(cmsdisplay_name)
            cid.append(cms_cid)
            cmscountry.append(cms_country)
            parentId.append(cms_parentId)
            self.cms_hidden_list.addItem(str(cmsdisplay_name))
            n=n+1
        global zip_cmsdata
        zip_cmsdata = zip(cmsdisplayname,cid,cmscountry,parentId)
    def Search(self, text):
        #print(text)
        #print(cmsdisplayname)
        #print(self.cms_display_list.count)
        self.cms_display_list.clear()
        items = self.cms_hidden_list.findItems(text, QtCore.Qt.MatchFlag.MatchContains)
        #print(items)
        #print(type(items))
        for i in items:
            #print(i.text())
            self.cms_display_list.addItem(i.text())
    def show_select_item(self):
        print('item click')
        item = self.cms_display_list.selectedItems()[0]
        print(item.text())
        #self.label.setText(text_0 + item.text())
        #Find account info
        print([i for i in zip_cmsdata if item.text() in i])

A zip object is an iterator, which means that it can only be iterated over once. Since your zip_cmsdata is a global that is created once and never rebuilt, it is exhausted after the first search and yields nothing afterwards.
a = [i for i in range(0,10)]
b = [i for i in range(20,30)]
Z = zip(a,b)
print("first run")
for i in Z:
    print(i)
print("second run")
for i in Z:
    print(i)
Output:
first run
(0, 20)
(1, 21)
(2, 22)
(3, 23)
(4, 24)
(5, 25)
(6, 26)
(7, 27)
(8, 28)
(9, 29)
second run
You can fix this by regenerating your zip object:
a = [i for i in range(0,10)]
b = [i for i in range(20,30)]
print("first run")
Z = zip(a,b)
for i in Z:
    print(i)
print("second run")
Z = zip(a,b)
for i in Z:
    print(i)
Output:
first run
(0, 20)
(1, 21)
(2, 22)
(3, 23)
(4, 24)
(5, 25)
(6, 26)
(7, 27)
(8, 28)
(9, 29)
second run
(0, 20)
(1, 21)
(2, 22)
(3, 23)
(4, 24)
(5, 25)
(6, 26)
(7, 27)
(8, 28)
(9, 29)
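Another option, if the data does not change after loading, is to materialise the zip into a list once; a list can be iterated any number of times. A minimal sketch with made-up sample values (the column lists here only mimic the ones in the question):

```python
names = ['Hello', 'World']
ids = [1, 2]
countries = ['US', 'DE']

# list() consumes the zip iterator once and stores the resulting tuples.
records = list(zip(names, ids, countries))


def find(text):
    """Return every record that contains `text` as one of its fields."""
    return [record for record in records if text in record]


first = find('Hello')
second = find('Hello')  # same result: the list is not exhausted
print(first)
print(second)
```

In the question's code this would correspond to assigning list(zip(...)) to zip_cmsdata instead of the bare zip(...).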

Collecting places using Python and Google Places API

I want to collect the places around my city, Pekanbaru, at lat/long (0.507068, 101.447777), and turn them into a dataset with the columns place_name, place_id, lat, long and type.
Below is the script that I tried.
import json
import urllib.request as url_req
import time
import pandas as pd
NATAL_CENTER = (0.507068,101.447777)
API_KEY = 'API'
API_NEARBY_SEARCH_URL = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json'
RADIUS = 30000
PLACES_TYPES = [('airport', 1), ('bank', 2)] ## TESTING
# PLACES_TYPES = [('airport', 1), ('bank', 2), ('bar', 3), ('beauty_salon', 3), ('book_store', 1), ('cafe', 1), ('church', 3), ('doctor', 3), ('dentist', 2), ('gym', 3), ('hair_care', 3), ('hospital', 2), ('pharmacy', 3), ('pet_store', 1), ('night_club', 2), ('movie_theater', 1), ('school', 3), ('shopping_mall', 1), ('supermarket', 3), ('store', 3)]
def request_api(url):
    response = url_req.urlopen(url)
    json_raw = response.read()
    json_data = json.loads(json_raw)
    return json_data
def get_places(types, pages):
    location = str(NATAL_CENTER[0]) + "," + str(NATAL_CENTER[1])
    mounted_url = ('%s'
                   '?location=%s'
                   '&radius=%s'
                   '&type=%s'
                   '&key=%s') % (API_NEARBY_SEARCH_URL, location, RADIUS, types, API_KEY)
    results = []
    next_page_token = None
    if pages == None: pages = 1
    for num_page in range(pages):
        if num_page == 0:
            api_response = request_api(mounted_url)
            results = results + api_response['results']
        else:
            page_url = ('%s'
                        '?key=%s'
                        '&pagetoken=%s') % (API_NEARBY_SEARCH_URL, API_KEY, next_page_token)
            api_response = request_api(str(page_url))
            results += api_response['results']
        if 'next_page_token' in api_response:
            next_page_token = api_response['next_page_token']
        else: break
        time.sleep(1)
    return results
def parse_place_to_list(place, type_name):
    # Using name, place_id, lat, lng, rating, type_name
    return [
        place['name'],
        place['place_id'],
        place['geometry']['location']['lat'],
        place['geometry']['location']['lng'],
        type_name
    ]
def mount_dataset():
    data = []
    for place_type in PLACES_TYPES:
        type_name = place_type[0]
        type_pages = place_type[1]
        print("Getting into " + type_name)
        result = get_places(type_name, type_pages)
        result_parsed = list(map(lambda x: parse_place_to_list(x, type_name), result))
        data += result_parsed
    dataframe = pd.DataFrame(data, columns=['place_name', 'place_id', 'lat', 'lng', 'type'])
    dataframe.to_csv('places.csv')
mount_dataset()
But the script returned an empty DataFrame.
How can I solve this and get the right dataset?

I am afraid that scraping the data and storing it is prohibited by the Terms of Service of Google Maps Platform.
Have a look at the Terms of Service before proceeding with the implementation. Paragraph 3.2.4, 'Restrictions Against Misusing the Services', reads:
(a) No Scraping. Customer will not extract, export, or otherwise scrape Google Maps Content for use outside the Services. For example, Customer will not: (i) pre-fetch, index, store, reshare, or rehost Google Maps Content outside the services; (ii) bulk download Google Maps tiles, Street View images, geocodes, directions, distance matrix results, roads information, places information, elevation values, and time zone details; (iii) copy and save business names, addresses, or user reviews; or (iv) use Google Maps Content with text-to-speech services.
source: https://cloud.google.com/maps-platform/terms/#3-license
Sorry to be the bearer of bad news.
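Independently of the licensing point, an empty result set is easier to diagnose if the script looks at the status field that Nearby Search responses carry alongside results (for example OK, ZERO_RESULTS or REQUEST_DENIED) instead of reading results directly. A minimal sketch, assuming the response has already been decoded into a dict; the helper name and the sample responses are made up for illustration:

```python
def results_or_raise(api_response):
    """Return the 'results' list, or raise if the API reported a problem."""
    status = api_response.get('status')
    if status == 'OK':
        return api_response.get('results', [])
    if status == 'ZERO_RESULTS':
        return []
    # error_message is optional in the response, so fall back to a default.
    message = api_response.get('error_message', 'no error message given')
    raise RuntimeError(f'Places API returned {status}: {message}')


# Hypothetical decoded responses, written by hand for this example.
ok_response = {'status': 'OK', 'results': [{'name': 'Some place'}]}
denied_response = {'status': 'REQUEST_DENIED', 'error_message': 'The provided API key is invalid.'}

places = results_or_raise(ok_response)
try:
    results_or_raise(denied_response)
    error_text = None
except RuntimeError as exc:
    error_text = str(exc)
print(places)
print(error_text)
```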

Compare datetime with Django birthday objects

I have a question about my script. I want to find all people in my database who are older than 16. I want to check this when the user triggers the function.
I have this function:
def Recensement_array(request) :
    date = datetime.now().year
    print date # I get year from now
    birthday = Identity.objects.values_list('birthday', flat=True) # Return list with all birthday values
    for element in birthday :
        if date - element < 117 :
            print "ok < 117"
        else :
            print "ok > 117"
From print date I get :
2017
From print birthday I get :
<QuerySet [datetime.date(1991, 12, 23), datetime.date(1900, 9, 12), datetime.date(1900, 9, 12), datetime.date(1900, 9, 12), datetime.date(1900, 9, 12), datetime.date(1089, 9, 22), datetime.date(1900, 9, 12), datetime.date(1900, 9, 12), datetime.date(1089, 9, 22), datetime.date(1089, 9, 22), datetime.date(1089, 9, 22), datetime.date(1089, 9, 22), datetime.date(1990, 12, 12)]>
So my goal is to subtract birthday from date and, if date - birthday = 16 years, print the element; otherwise print nothing.
I have two problems:
How do I extract only the year from birthday?
So far the comparison is between an int and a datetime.date. If I could extract only the year from birthday, it should work, right?
Thank you
EDIT:
For example, I want to get all people who have turned 16 since the beginning of this year or will turn 16 before the end of the year:
def Recensement_array(request) :
    today = datetime.now()
    age_16 = (today - relativedelta(years=16))
    result = Identity.objects.filter(birthday__range=[age_16, today]).order_by('lastname')
    paginator = Paginator(result, 3)
    page = request.GET.get('page', 1)
    try:
        result = paginator.page(page)
    except PageNotAnInteger:
        result = paginator.page(1)
    except EmptyPage:
        result = paginator.page(paginator.num_pages)
    context = {
        "Identity":Identity,
        "age_16":age_16,
        "datetime" : datetime,
        "result" : result,
        "PageNotAnInteger":PageNotAnInteger,
    }
    return render(request, 'Recensement_resume.html', context)

If you need to filter records by a specific year, you can use the __year lookup on the date field:
age_16 = (today - relativedelta(years=16))
result = Identity.objects.filter(birthday__year=age_16.year).order_by('lastname')
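For the first part of the question, a date object exposes its year as the .year attribute, so date - element.year works with plain integers. Comparing years alone ignores whether the birthday has already passed this year, though. A small sketch of an exact age calculation in plain Python; the function name and the reference dates are made up for illustration, and the birthdays are taken from the QuerySet printed in the question:

```python
from datetime import date


def age_on(birthday, today):
    """Return the age in full years that someone born on `birthday` has on `today`."""
    years = today.year - birthday.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years


reference_day = date(2017, 6, 1)
age_a = age_on(date(1991, 12, 23), reference_day)
age_b = age_on(date(1900, 9, 12), reference_day)
age_c = age_on(date(1990, 12, 12), date(2017, 12, 12))
print(age_a, age_b, age_c)
```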

Django models using conditions

My models.py looks like following:
class Exercise(models.Model):
    #Field for storing exercise type
    EXERCISE_TYPE_CHOICES = (
        (1, 'Best stretch'),
        (2, 'Butterfly reverse'),
        (3, 'Squat row'),
        (4, 'Plank'),
        (5, 'Push up'),
        (6, 'Side plank'),
        (7, 'Squat'),
    )
    exercise_type = models.IntegerField(choices=EXERCISE_TYPE_CHOICES)
    #Field for storing exercise name
    -- Here comes the logic --
    #Field for storing intensity level
    INTENSITY_LEVEL_CHOICES = (
        (1, 'Really simple'),
        (2, 'Rather Simple'),
        (3, 'Simple'),
        (4, 'Okay'),
        (5, 'Difficult'),
        (6, 'Rather Difficult'),
        (7, 'Really Difficult'),
    )
    intensity_level = models.IntegerField(choices=INTENSITY_LEVEL_CHOICES)
    #Field for storing video url for a particular exercise
    video_url = models.URLField()
    #Field for storing description of the exercise
    description = models.CharField(max_length=500)
I want to have a field called 'exercise_name' on the Exercise class, defined as follows:
For exercise_type=1 it should be 'Best stretch'
For exercise_type=2 it should be 'Butterfly reverse', and so on.
How can I achieve this? Or, if not this way, is there a better way to do it?
Bottom line: my Exercise should have the following fields: Type, Name, Description, Video_url

If you want the string representation for exercise_type, simply use get_exercise_type_display(). It returns the label defined in EXERCISE_TYPE_CHOICES for the stored value.
ex = Exercise(exercise_type=1, intensity_level=1)
ex.get_exercise_type_display() # => 'Best stretch'
ex.get_intensity_level_display() # => 'Really simple'
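Conceptually, the display method is a lookup of the stored value in the choices tuple. The following plain-Python sketch mirrors that idea without Django (it is an illustration, not Django's implementation); the class name and the exercise_name property are hypothetical names chosen to match the question:

```python
EXERCISE_TYPE_CHOICES = (
    (1, 'Best stretch'),
    (2, 'Butterfly reverse'),
    (3, 'Squat row'),
)


class ExerciseSketch:
    """Stand-in for the model; only holds the integer exercise type."""

    def __init__(self, exercise_type):
        self.exercise_type = exercise_type

    @property
    def exercise_name(self):
        # dict() turns the (value, label) pairs into a value -> label mapping.
        # Unknown values fall back to the raw value itself.
        return dict(EXERCISE_TYPE_CHOICES).get(self.exercise_type, self.exercise_type)


name_1 = ExerciseSketch(1).exercise_name
name_2 = ExerciseSketch(2).exercise_name
print(name_1)
print(name_2)
```

In the real model, a property like this could simply return self.get_exercise_type_display(), so no separate database field for the name is needed.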
