Dynamically scrape JSON values [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I want to scrape some data from a JSON response. Here is the link.
I need the values in lessonTypes, exported as comma-separated lists.
For example, Theorieopleidingen has 4 values, Beroepsopleidingen has 8, and so on.
I want to scrape them dynamically, so that even if the number of values changes, it always returns all of them, comma-separated.
Sorry if my explanation is weak.

Since it's a JSON object, why not just use requests and do whatever you want with the data?
For example:
import requests
url = "https://www.cbr.nl/web/show?id=289168&langid=43&channel=json&cachetimeout=-1&elementHolder=289170&ssiObjectClassName=nl.gx.webmanager.cms.layout.PagePart&ssiObjectId=285674&contentid=3780&examtype=B"
for value in requests.get(url).json()['lessonTypes'].values():
    print(value)
Output:
['Motor', 'Auto', 'Bromfiets', 'Tractor']
['Bus', 'Aanhangwagen achter bus', 'Vrachtauto', 'Aanhangwagen achter vrachtauto', 'Heftruck', 'ADR', 'Taxi', 'Tractor']
['Aangepaste auto', 'Automaat personenauto']
['Motor', 'Auto', 'Aanhangwagen achter auto', 'Bromfiets', 'Brommobiel']
EDIT:
To access individual keys and their values you might want to try this for example:
import requests
url = "https://www.cbr.nl/web/show?id=289168&langid=43&channel=json&cachetimeout=-1&elementHolder=289170&ssiObjectClassName=nl.gx.webmanager.cms.layout.PagePart&ssiObjectId=285674&contentid=3780&examtype=B"
lesson_types = requests.get(url).json()['lessonTypes']
print(list(lesson_types.keys()))
print("\n".join(lesson_types['Theorieopleidingen']))
Output:
['Theorieopleidingen', 'Beroepsopleidingen', 'Bijzonderheden', 'Praktijkopleidingen']
Motor
Auto
Bromfiets
Tractor
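Since the question asks for comma-separated output, here is a small follow-up sketch. The dictionary is sample data copied from the output shown above (keys and lists paired in the order they were printed) and stands in for requests.get(url).json()['lessonTypes']. The helper name join_values is made up for illustration.

```python
# Sample data standing in for requests.get(url).json()['lessonTypes'].
lesson_types = {
    'Theorieopleidingen': ['Motor', 'Auto', 'Bromfiets', 'Tractor'],
    'Beroepsopleidingen': ['Bus', 'Aanhangwagen achter bus', 'Vrachtauto',
                           'Aanhangwagen achter vrachtauto', 'Heftruck',
                           'ADR', 'Taxi', 'Tractor'],
    'Bijzonderheden': ['Aangepaste auto', 'Automaat personenauto'],
    'Praktijkopleidingen': ['Motor', 'Auto', 'Aanhangwagen achter auto',
                            'Bromfiets', 'Brommobiel'],
}


def join_values(mapping, separator=', '):
    """Return a dict mapping each key to its values joined into one string."""
    return {key: separator.join(values) for key, values in mapping.items()}


# Works for any number of keys and any number of values per key.
for name, joined in join_values(lesson_types).items():
    print(f'{name}: {joined}')
```

Because the loop iterates over whatever keys and lists the response contains, nothing needs to change when the number of values changes.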

Related

python soup response parsing header and value [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed yesterday.
I have a soup response text with multiple groups and subgroups.
I want to get all groups and their values automatically.
How can I do it?
In the end, I want to get the title and the value for each group. Ideally, each group would have its values listed separately.
OrderedDict([('#id', 'boic'),
             ('mc:id', 'boic'),
             ('mc:ocb-conditions',
              OrderedDict([('mc:rule-deactivated', 'true'),
                           ('mc:international', 'true')])),
             ('mc:cb-actions',
              OrderedDict([('mc:allow', 'false')]))])
My goal is to get to a state where I get the following output:
'#id','boic'
'mc:id','boic'
'mc:ocb-conditions'
'mc:rule-deactivated','true'
'mc:international', 'true'
'mc:cb-actions'
'mc:allow','false'
I tried to use
' '.join(BeautifulSoup(soup_response, "html.parser").findAll(text=True))
and got all the values, but I'm missing the titles of the values.
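One possible approach, assuming the response is already available as the nested OrderedDict shown above: walk the mapping recursively, emitting a group title on its own line and a title/value pair for every leaf. The function name flatten is hypothetical, and the sample data is copied from the question.

```python
from collections import OrderedDict

# Sample data copied from the question.
response = OrderedDict([
    ('#id', 'boic'),
    ('mc:id', 'boic'),
    ('mc:ocb-conditions', OrderedDict([
        ('mc:rule-deactivated', 'true'),
        ('mc:international', 'true'),
    ])),
    ('mc:cb-actions', OrderedDict([
        ('mc:allow', 'false'),
    ])),
])


def flatten(mapping):
    """Yield one line per group title and one line per title/value pair."""
    for key, value in mapping.items():
        if isinstance(value, dict):
            # A nested mapping is a group: emit its title, then its contents.
            yield repr(key)
            yield from flatten(value)
        else:
            yield f'{key!r},{value!r}'


for line in flatten(response):
    print(line)
```

The recursion handles any nesting depth, so subgroups inside subgroups are printed the same way.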

String Operations and Manipulation in Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 months ago.
I'm building a scraping spider and I would like some help on how to extract the right information out of each response in Python
response.css(".print-acta-temp::text").get()
'TEMPORADA 2021-2022'
I would like to know how to collect only the 2021-2022. Should I use the str command?
response.css(".print-acta-data::text").get()
'Data: 14-05-2022, 19:00h'
I need to extract only the date into one variable and the time into another variable.
response.css(".print-acta-comp::text").get()
' CADET PRIMERA DIVISIÓ - GRUP 2'
I need to collect the data before the first space, the data collected between the 2 spaces and finally the number into another variable.
response.css(".print-acta-jornada::text").get()
'Jornada 28'
I need to collect the data after the first space.

If you trust the website to always put 'TEMPORADA ' right before the data you want, you can use
tu_string = 'TEMPORADA 2021-2022'
nueva_string = tu_string.replace('TEMPORADA ', '')
print(nueva_string)
There's regex and all of that too, but you can worry about learning that later, tbh.
I need to collect the data before the first space, the data collected
between the 2 spaces and finally the number into another variable.
A simple way to do this is to split the string on spaces and pick the pieces you need from the resulting list:
teva_string = 'CADET PRIMERA DIVISIÓ - GRUP 2'
teva_lista = teva_string.split(' ')
print(teva_lista)
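As a rough sketch of how the split could be turned into separate variables: the version below assumes the competition string always looks like '<category> <division> - GRUP <number>', which is only a guess based on the single example in the question. Both function names are made up for illustration.

```python
def split_competition(text):
    """Split e.g. ' CADET PRIMERA DIVISIÓ - GRUP 2' into three parts.

    Assumes the layout '<category> <division> - GRUP <number>'.
    """
    cleaned = text.strip()                    # drop the leading space
    category, rest = cleaned.split(' ', 1)    # split on the first space only
    division, group = rest.split(' - ')       # separate division from group
    group_number = int(group.rsplit(' ', 1)[1])
    return category, division, group_number


def after_first_space(text):
    """Return everything after the first space, e.g. 'Jornada 28' -> '28'."""
    return text.strip().split(' ', 1)[1]


print(split_competition(' CADET PRIMERA DIVISIÓ - GRUP 2'))
print(after_first_space('Jornada 28'))
print(after_first_space('TEMPORADA 2021-2022'))
```

Passing 1 as the second argument to split limits it to a single split, so a division name containing spaces stays in one piece.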

Any decision on how to parse a string is going to depend on one's assumptions about what form the strings are going to take. In the particular case of 'TEMPORADA 2021-2022', doing my_string.split(' ')[1] will get the years. 'Data: 14-05-2022, 19:00h'.split(', ') will get the list ['Data: 14-05-2022', '19:00h'], while 'Data: 14-05-2022, 19:00h'.split(',') will get ['Data: 14-05-2022', ' 19:00h'] (note the leading space in the second item). You can also use datetime libraries or regular expressions, with the latter allowing for more customization if the form of your data varies.
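To illustrate the regular expression and datetime route, here is a minimal sketch. It assumes the text always contains a date as DD-MM-YYYY followed by a comma and a time as HH:MM with a trailing 'h', as in the single example from the question. The name parse_match_datetime is made up.

```python
import re
from datetime import datetime

# Date as DD-MM-YYYY, a comma, optional whitespace, then HH:MM followed by 'h'.
DATE_TIME = re.compile(r'(\d{2}-\d{2}-\d{4}),\s*(\d{2}:\d{2})h')


def parse_match_datetime(text):
    """Return (date, time) objects extracted from the scraped text."""
    match = DATE_TIME.search(text)
    if match is None:
        raise ValueError(f'no date and time found in {text!r}')
    date_text, time_text = match.groups()
    moment = datetime.strptime(f'{date_text} {time_text}', '%d-%m-%Y %H:%M')
    return moment.date(), moment.time()


match_date, match_time = parse_match_datetime('Data: 14-05-2022, 19:00h')
print(match_date.isoformat(), match_time.isoformat())
```

Going through datetime.strptime also validates the values, so an impossible date such as 31-02-2022 raises an error instead of passing through silently.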

I want to print names of employees who have both work number and mobile number.below is my json body [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I want to print the first name of employees who have both a work number and a mobile number. Below is my JSON body. I am having difficulty getting inside the phoneNumbers attribute. My final output should be: "Adhi as Adhi has both work and mobile numbers".
I am not able to iterate over the inner dictionary of the phoneNumbers attribute. Can you please help me with this?
This is my Python code:
for i in Data['users']:
    for j in i['phoneNumbers']:
        for i in range(len(j)):
            if j['type']=="work" and j['type']=="mobile":
                print("Firstname",i['firstName'])

You can loop over the users and check if the work and mobile number are present:
for user in Data['users']:
    has_mobile_number = False
    has_work_number = False
    for phonenumber in user['phoneNumbers']:
        if phonenumber['type'] == 'work':
            has_work_number = True
        if phonenumber['type'] == 'mobile':
            has_mobile_number = True
    if has_work_number and has_mobile_number:
        print('Firstname', user['firstName'])
Also, I recommend not using i and j for anything other than indexes. In your code, i is a dict representing a user and j is a dict representing a phone number. I replaced them with user and phonenumber to make the code above clearer.
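A more compact variant of the same check collects each user's phone types into a set and tests whether the required types are a subset of it. The JSON body from the question is not shown in this thread, so the sample data below is invented, with only the key names (users, firstName, phoneNumbers, type) taken from the code above. The function name and the 'number' field are hypothetical.

```python
# Invented sample data; only the key names come from the question's code.
Data = {
    'users': [
        {'firstName': 'Adhi',
         'phoneNumbers': [{'type': 'work', 'number': '111'},
                          {'type': 'mobile', 'number': '222'}]},
        {'firstName': 'Ravi',
         'phoneNumbers': [{'type': 'work', 'number': '333'}]},
    ]
}


def first_names_with_types(users, required=('work', 'mobile')):
    """Return first names of users whose phone numbers cover all required types."""
    wanted = set(required)
    names = []
    for user in users:
        types = {phone.get('type') for phone in user.get('phoneNumbers', [])}
        if wanted <= types:  # every required type is present
            names.append(user['firstName'])
    return names


for name in first_names_with_types(Data['users']):
    print('Firstname', name)
```

Using get with a default keeps the loop from failing on a user that has no phoneNumbers entry at all.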

how to fetch only content from table by avoiding unwanted codes [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I am trying to fetch text content from a table. It works, but the result also prints unwanted code.
My code is here:
searchitem = searchme.objects.filter(face=after).values_list("tale", flat=True)
The contents are text.
The result I receive is "querySet Prabhakaran seachitem",
but I only want to get the result "Prabhakaran".
The model is this:
class searchme(models.Model):
    face = models.TextField()
    tale = models.TextField()

From the official Django documentation:
A common need is to get a specific field value of a certain model instance. To achieve that, use values_list() followed by a get() call:
So use:
searchme.objects.values_list('tale', flat=True).get(face=after)

Access online data using python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I am trying to access the Federal Reserve Bank data at https://fred.stlouisfed.org/series/FEDFUNDS
What code can I write to access this database and then put it in a dictionary? Or do I have to download the file first and save it on my computer?

The easiest way to pull that data in would be to download and parse the CSV file listed under the "Download" button.
You can use the Requests library to download the file, then parse it with Python's built-in csv module.
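A minimal sketch of the parsing half, assuming the CSV text has already been downloaded (for example with requests.get(url).text). The sample text and the column names DATE and FEDFUNDS are taken from the output quoted elsewhere in this thread and may not match the header of the file you actually download, so check it first. The function name csv_to_dict is made up.

```python
import csv
import io

# Sample text standing in for the downloaded file; in practice this would
# be the body of the HTTP response.
sample_csv = """DATE,FEDFUNDS
1954-07-01,0.80
1954-08-01,1.22
1954-09-01,1.06
"""


def csv_to_dict(text, key_column='DATE', value_column='FEDFUNDS'):
    """Map each date string to its rate as a float."""
    reader = csv.DictReader(io.StringIO(text))  # first line is the header
    return {row[key_column]: float(row[value_column]) for row in reader}


rates = csv_to_dict(sample_csv)
print(rates['1954-08-01'])
```

Nothing has to be saved to disk: io.StringIO lets the csv module read straight from the downloaded text.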

See https://stackoverflow.com/a/32400969/9214517 for how to do it.
Assuming you are fine with keeping the data in a pandas DataFrame (as the link above does), this is the code:
import pandas as pd
import requests
import io
url = "https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=on&txtcolor=%23444444&ts=12&tts=12&width=968&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=FEDFUNDS&scale=left&cosd=1954-07-01&coed=2018-10-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Monthly&fam=avg&fgst=lin&fgsnd=2009-06-01&line_index=1&transformation=lin&vintage_date=2018-11-28&revision_date=2018-11-28&nd=1954-07-01"
s = requests.get(url).content.decode("utf-8")
df = pd.read_csv(io.StringIO(s))
Then your df will be:
         DATE  FEDFUNDS
0  1954-07-01      0.80
1  1954-08-01      1.22
2  1954-09-01      1.06
3  1954-10-01      0.85
4  1954-11-01      0.83
....
And if you insist on a dict, use this instead of the last line above to convert your CSV data s:
mydict = dict([line.split(",") for line in s.splitlines()])
The key is how to get the URL: Hit the download button on the page you quoted, and copy the link to CSV.
