Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 months ago.
I'm building a scraping spider and I would like some help extracting the right information out of each response in Python.
response.css(".print-acta-temp::text").get()
'TEMPORADA 2021-2022'
I would like to know how to collect only the 2021-2022. Should I use the str command?
response.css(".print-acta-data::text").get()
'Data: 14-05-2022, 19:00h'
I need to extract only the date into one variable and the time into another variable.
response.css(".print-acta-comp::text").get()
' CADET PRIMERA DIVISIÓ - GRUP 2'
I need to collect the data before the first space, the data collected between the 2 spaces and finally the number into another variable.
response.css(".print-acta-jornada::text").get()
'Jornada 28'
I need to collect the data after the first space.
If you trust the website to always put 'TEMPORADA ' right before the data you want, you can use
tu_string = 'TEMPORADA 2021-2022'
nueva_string = tu_string.replace('TEMPORADA ', '')
print(nueva_string)
There's also regex and all of that, but you can worry about learning that later, tbh.
I need to collect the data before the first space, the data collected
between the 2 spaces and finally the number into another variable.
A simple way to do this is to split:
teva_string = 'CADET PRIMERA DIVISIÓ - GRUP 2'
teva_lista = teva_string.split(' ')
print(teva_lista)
Any decision on how to parse a string is going to depend on one's assumptions about what form the strings are going to take. In the particular case of 'TEMPORADA 2021-2022', doing my_string.split(' ')[1] will get the years. 'Data: 14-05-2022, 19:00h'.split(' ') will get the list ['Data:', '14-05-2022,', '19:00h'], while 'Data: 14-05-2022, 19:00h'.split(',') will get ['Data: 14-05-2022', ' 19:00h']. You can also use datetime libraries or regular expressions, with the latter allowing for more customization if the form of your data varies.
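As a rough sketch of how split, a regular expression and datetime could be combined on the four strings from the question: the variable names are made up for illustration, and the reading of the competition string (category, then division, then group number) is an assumption about what the asker wants. It only holds while the site keeps producing strings in exactly this shape.

```python
import re
from datetime import datetime

# Sample strings copied from the question.
season_text = 'TEMPORADA 2021-2022'
date_text = 'Data: 14-05-2022, 19:00h'
comp_text = ' CADET PRIMERA DIVISIÓ - GRUP 2'
round_text = 'Jornada 28'

# Everything after the first space: '2021-2022'
season = season_text.split(' ', 1)[1]

# Pull the date and the time out with one regular expression.
match = re.search(r'(\d{2}-\d{2}-\d{4}), (\d{2}:\d{2})h', date_text)
match_date, match_time = match.group(1), match.group(2)

# Optional: turn both into a single datetime object.
kickoff = datetime.strptime(f'{match_date} {match_time}', '%d-%m-%Y %H:%M')

# Assumed layout: '<category> <division> - GRUP <number>'
category, rest = comp_text.strip().split(' ', 1)
division, group = rest.split(' - ')
group_number = int(group.split()[-1])

# The number after the first space.
round_number = int(round_text.split(' ', 1)[1])
```

If the page ever returns None or a differently shaped string, re.search returns None and the split unpacking raises ValueError, so a real spider would want to check for that.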
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I wanted to scrape some data from a JSON response. Here is the link
I need the values in lessonTypes. I want to export all the values separated by commas.
So Theorieopleidingen has 4 values, Beroepsopleidingen has 8, and so on.
I want to scrape dynamically, so that even if the number of values changes, it always scrapes all of them, comma-separated.
Sorry if my explanation is weak.
Since it's a JSON object, why don't you just use requests and do whatever you want with the data?
For example:
import requests
url = "https://www.cbr.nl/web/show?id=289168&langid=43&channel=json&cachetimeout=-1&elementHolder=289170&ssiObjectClassName=nl.gx.webmanager.cms.layout.PagePart&ssiObjectId=285674&contentid=3780&examtype=B"
for value in requests.get(url).json()['lessonTypes'].values():
    print(value)
Output:
['Motor', 'Auto', 'Bromfiets', 'Tractor']
['Bus', 'Aanhangwagen achter bus', 'Vrachtauto', 'Aanhangwagen achter vrachtauto', 'Heftruck', 'ADR', 'Taxi', 'Tractor']
['Aangepaste auto', 'Automaat personenauto']
['Motor', 'Auto', 'Aanhangwagen achter auto', 'Bromfiets', 'Brommobiel']
EDIT:
To access individual keys and their values you might want to try this for example:
import requests
url = "https://www.cbr.nl/web/show?id=289168&langid=43&channel=json&cachetimeout=-1&elementHolder=289170&ssiObjectClassName=nl.gx.webmanager.cms.layout.PagePart&ssiObjectId=285674&contentid=3780&examtype=B"
lesson_types = requests.get(url).json()['lessonTypes']
print(list(lesson_types.keys()))
print("\n".join(lesson_types['Theorieopleidingen']))
Output:
['Theorieopleidingen', 'Beroepsopleidingen', 'Bijzonderheden', 'Praktijkopleidingen']
Motor
Auto
Bromfiets
Tractor
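To get the comma-separated form the question asks for, a minimal sketch could look like the following. It uses a hard-coded sample in place of the live response so it runs offline. The sample values are copied from the output shown above, and the pairing of 'Bijzonderheden' with its list is inferred from the order of that output.

```python
# Stand-in for requests.get(url).json()['lessonTypes'] (sample data only).
lesson_types = {
    'Theorieopleidingen': ['Motor', 'Auto', 'Bromfiets', 'Tractor'],
    'Bijzonderheden': ['Aangepaste auto', 'Automaat personenauto'],
}

# One comma-separated line per key, however many values each key holds.
lines = [
    f"{name}: {', '.join(values)}"
    for name, values in lesson_types.items()
]

for line in lines:
    print(line)
```

Because the loop iterates over whatever the dictionary contains, it keeps working if keys or values are added or removed.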
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I am having trouble querying my database; maybe someone could give me a hand.
I am using a Django application, so I guess the database is SQLite3, and the output I would like to get is the score value.
b = Answer.objects.get(id=23)
which gives me an output of:
<Answer: Answer to questionID '4' : AnswerID '23'>
When I do:
b.values
I get a dict in the form:
['{
"1)Long descriptive text":Score,
"2)Long descriptive text":Score,
"3)Long descriptive text":Score,
"4)Long descriptive text":Score
}']
with Score being an integer from 0 to 100, so for example "Long descriptive text":85
I need to extract the score using a query, but I can't manage to do it.
Normally for a Dict[key:value] I would do Dict[key], but here I do not know how to do it.
Could you give me a hand?
Thank you very much.
This looks suspiciously like Django. If so:
The output of b = Answer.objects.get(id = 23) is not truly that; what you are seeing is the string representation of the Answer when you print it. Because you used .get rather than .filter, you get the object itself rather than a QuerySet (which you can think of as being a list).
Basically, I suspect you shouldn't be using values, but accessing the data directly, something like
b = Answer.objects.get(id=..)
b.score
or if you wanted to loop over other answers:
answers = Answer.objects.filter(...)
for a in answers:
    a.score
For what .score actually is, look in your models.py file and see which fields the model has (things looking like score = models.IntegerField() etc.; then you would use a.score).
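If it turns out that values really is a field holding the JSON-style text shown in the question, rather than there being a separate score field, one possible sketch is to decode it with the standard json module. The raw_values list below is a hypothetical stand-in for b.values, and the scores 85 and 40 are made up.

```python
import json

# Hypothetical stand-in for b.values: a list holding one JSON string.
raw_values = ['{"1)Long descriptive text": 85, "2)Long descriptive text": 40}']

# Decode the string into a regular dict, then index it by key as usual.
scores = json.loads(raw_values[0])
first_score = scores["1)Long descriptive text"]

# Or collect every score without knowing the long keys in advance.
all_scores = list(scores.values())
```

Whether this applies depends on how the field is defined in models.py, which the question does not show.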
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I have a problem I'd like to know if it's worth spending the time trying to solve with Python. I have a large CSV file of scientific names of fishes. I would like to cross-reference that CSV file with a large database of fish morphology information (www.fishbase.ca) and have the code return the maximum length of each fish. Basically, I need to create code that will search the fishbase website for each fish, then find the maximum length info on the page and return it to me in a CSV file. The last two parts are relatively straightforward, but the first part is where I'm stuck. Thanks in advance.
It looks like you can generate the URL directly from the genus and species, i.e.
rainbow trout (oncorhynchus mykiss) becomes
http://www.fishbase.ca/summary/Oncorhynchus-mykiss.html
so something like
def make_url(genus, species):
    return (
        "http://www.fishbase.ca/summary/{}-{}.html"
        .format(genus.title(), species.lower())
    )
Looking at the page source, the html is severely unsemantic; while parsing html with regular expressions is evil and awful, I really think it's the easiest method in this case:
import re
# raw string so that \d is passed to the regex engine unchanged
fishlength = re.compile(r"max length : ([\d.]+) ([cm]{1,2})", re.I).search
def get_length_in_cm(html):
    m = fishlength(html)
    if m:  # match found
        value = float(m.group(1))
        unit = m.group(2)
        if unit == "cm":
            return value
        elif unit == "m":
            return value * 100.
        else:
            raise ValueError("Unknown unit: {}".format(unit))
    else:
        raise ValueError("Length not found")
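To see what the pattern captures, here is a small standalone check against a made-up snippet. The sample_html string is hypothetical and only imitates the "Max length : ..." wording the regex expects; the real page markup may differ.

```python
import re

# Same pattern as above, written as a raw string.
pattern = re.compile(r"max length : ([\d.]+) ([cm]{1,2})", re.I)

# Hypothetical fragment, not copied from the real site.
sample_html = "<span>Max length : 122 cm TL male/unsexed</span>"

match = pattern.search(sample_html)
value = float(match.group(1))  # numeric part
unit = match.group(2)          # unit part
```

Note that [cm]{1,2} also accepts strings such as "mm" or "c", which is why the function above keeps an explicit "Unknown unit" branch.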
then grabbing each page,
import csv
import requests
from time import sleep
DELAY = 2
GENUS_COL = 4
SPECIES_COL = 5
with open("fish.csv") as inf:
    next(inf)  # skip header row
    for row in csv.reader(inf):
        url = make_url(row[GENUS_COL], row[SPECIES_COL])
        # should add error handling, in case
        # that page doesn't exist
        html = requests.get(url).text
        length = get_length_in_cm(html)
        # now store the length value somewhere
        # be nice, don't pound their site
        sleep(DELAY)
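For the "store the length value somewhere" step, one option is to collect the results and write them back out with csv.writer. The sketch below is only an illustration: write_lengths, the column names and the sample rows are all made up, and it writes to an in-memory buffer so it can run without touching the disk.

```python
import csv
import io

def write_lengths(results, out_file):
    """Write (genus, species, length_cm) tuples as CSV rows.

    A length of None (page missing or length not found) becomes an empty cell.
    """
    writer = csv.writer(out_file)
    writer.writerow(["genus", "species", "max_length_cm"])
    for genus, species, length in results:
        writer.writerow([genus, species, "" if length is None else length])

# Placeholder rows: the 100.0 is a made-up value, not real FishBase data.
results = [
    ("Oncorhynchus", "mykiss", 100.0),
    ("Examplegenus", "examplespecies", None),
]

buffer = io.StringIO()
write_lengths(results, buffer)
print(buffer.getvalue())
```

In the loop above, you would append (genus, species, length) to a list on each iteration, using None when get_length_in_cm raises ValueError, and pass a real file opened with open("lengths.csv", "w", newline="") instead of the buffer.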
So, in order to use the information in other web applications you will need to use an API to get hold of their data.
Fishbase.ca (or .org) does not have an official public-facing API. There is some chat in 2013 about creating a RESTful API which would be just the ticket for what you need, but this hasn't happened yet (don't hold your breath).
An alternative is taking the name of the fish you need to look up, dropping that into the URI (e.g. www.fishbase.ca/fish/Rainbow+Trout) and then using XQuery or similar to drill down the DOM to find the maximum length.
Unfortunately, fishbase does not have the sort of URIs needed for this method either: the URI for Rainbow Trout uses an ID rather than a name, so it cannot easily be looked up by name.
I would suggest looking for another data provider that offers either of these two options.
Regarding the second method: the site owners may not appreciate you using their website in this manner. Ask them beforehand if you can.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question appears to be off-topic because it lacks sufficient information to diagnose the problem. Describe your problem in more detail or include a minimal example in the question itself.
Closed 8 years ago.
mother_dict = {'son_dict':{'Name':'Jason','Age':26},'daughter_dict':{'Name':'Emma','Age':19}}
father_dict = {}
father_dict['Child'] = mother_dict['son_dict']
I need a way to replace father_dict['Child'] with a dictionary from mother_dict based on input.
I've tried deleting the contents of father_dict and replacing them with the contents of mother_dict with .update(), but that of course adds the whole dictionary, I've tried using input() to ask the user for a child, so if they said 'Jason' it would replace 'Child' with son_dict, but when I got into families with ten or so kids there would need to be ten functions, and if the children's names changed then both the functions and the dictionaries would need to be re-written. I'm hung up on using input to grab a specific dictionary from mother_dict and copying it to father_dict.
Maybe something like the following?
choice = ''
mother_dict = {'son_dict':{'Name':'Jason','Age':26},'daughter_dict':{'Name':'Emma','Age':19}}
father_dict = {}
while choice not in mother_dict:
    # Python 3; on Python 2 use raw_input() instead of input()
    choice = input('Which dict do you want? ')
father_dict[choice] = mother_dict[choice]
This code gets input until the input is valid (it is in mother_dict), and then it adds that input to father_dict.
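If the user should type the child's name (e.g. 'Jason') instead of the dictionary key, one way to avoid writing a function per child is to search the inner dictionaries by their 'Name' value. This is only a sketch: find_child is a made-up helper, and the name is hard-coded where input() would normally supply it.

```python
mother_dict = {
    'son_dict': {'Name': 'Jason', 'Age': 26},
    'daughter_dict': {'Name': 'Emma', 'Age': 19},
}

def find_child(family, name):
    """Return the first inner dict whose 'Name' matches, or None."""
    for child in family.values():
        if child['Name'] == name:
            return child
    return None

father_dict = {}
requested_name = 'Jason'  # would come from input() in an interactive script
father_dict['Child'] = find_child(mother_dict, requested_name)
```

Adding or renaming children then only means editing mother_dict; the lookup code stays the same. Note that father_dict['Child'] refers to the same dictionary object as the one inside mother_dict; use dict(child) if an independent copy is needed.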
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Hey guys, I am new to Python development and am learning it on my own.
I have just tested a simple piece of code that combines two variables on a single line.
Here is my snippet:
name = 1
somevariable = "hellow am new to python"
print(somevariable[name])
And I got the output "e".
I did not understand what it means. I just tried out a random example. Is it allowed to do this in Python, or is it only for arrays? Please help me find an appropriate answer. Any help would be appreciated.
EDIT
Can we store one variable's information in another variable in Python?
For example:
name = 1
age = 2
string = "yeah am a man"
name[age] = stringname = 1
My question is: can we store the value 1 in age? I am new to Python. Sorry for the bad question.
First of all, you need to read up on the basics of Python, because your snippet clearly shows that you don't know what mutable and immutable objects are in Python.
As for your question, name[age] = stringname = 1 is not allowed.
If age is not defined, you will first get a NameError; once it is defined, you will get TypeError: 'int' object does not support item assignment, because name is an int.
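A short sketch of both behaviours, reusing the variable names from the question: indexing a string with an integer is fine, while item assignment on an integer is what raises the error.

```python
somevariable = "hellow am new to python"
name = 1

# Strings can be indexed: position 1 is the second character.
second_char = somevariable[name]

# Integers cannot be indexed or assigned into.
try:
    name[2] = "yeah am a man"
except TypeError as exc:
    error_message = str(exc)

print(second_char)
print(error_message)
```

So the "e" in the original snippet is simply the character at index 1 of the string.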
About list:
About Dictionary:
I'm not quite sure what you're trying to achieve, but it sounds a bit like you're trying to store multiple attributes (e.g. name and age). If so, you could use a dict, e.g.
# initialise the dict
user = {}
# Add some data
user["name"] = "User"
user["age"] = 1
To retrieve the variables, just use e.g. user["name"]