How to convert gzip.GzipFile to dictionary? - python

I have a gz-format file. The file is very big, and its first line is as follows:
{"originaltitle":"Leasing Specialist - WPM Real Estate Management","workexperiences":[{"company":"Home Properties","country":"US","customizeddaterange":"","daterange":{"displaydaterange":"","startdate":null,"enddate":null},"description":"Responsibilities: Inspect tour routes, models and show apartments daily to ensure cleanliness. Greeting prospective residents; determining the needs and preferences of the prospect and professionally present specific apartments while providing information regarding features and benefits. Answering incoming calls in a cheerful and professional manner. Handle each call accordingly whether it is a prospect call or an irate resident that just moved in. Develop and maintain Resident relations through the courtesy of on-site personnel, promptness of maintenance calls, and knowledge of community policies. Learn to develop professional sales and closing techniques. Accompany prospects to model apartments and discusses size and layout of rooms, available facilities, such as swimming pool and saunas, location of shopping centers, services available, and terms of lease. Demonstrate thorough knowledge and use of lead tracking system. Make follow-up calls to prospective Residents who did not fill out an application. Compile and update listings of available rental units.","location":"Baltimore, MD","normalizedtitle":"leasing specialist","title":"Leasing Specialist"},{"company":"WPM Real Estate Management","country":"US","customizeddaterange":"1 year, 3 months","daterange":{"displaydaterange":"July 2017 to October 2018","startdate":{"displaydate":"July 2017","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"October 2018","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Inspect tour routes, models and show apartments daily to ensure cleanliness. 
Greeting prospective residents; determining the needs and preferences of the prospect and professionally present specific apartments while providing information regarding features and benefits. Answering incoming calls in a cheerful and professional manner. Handle each call accordingly whether it is a prospect call or an irate resident that just moved in. Develop and maintain Resident relations through the courtesy of on-site personnel, promptness of maintenance calls, and knowledge of community policies. Learn to develop professional sales and closing techniques. Accompany prospects to model apartments and discusses size and layout of rooms, available facilities, such as swimming pool and saunas, location of shopping centers, services available, and terms of lease. Demonstrate thorough knowledge and use of lead tracking system. Make follow-up calls to prospective Residents who did not fill out an application. Compile and update listings of available rental units.","location":"Baltimore, MD","normalizedtitle":"leasing specialist","title":"Leasing Specialist"},{"company":"Westminster Management","country":"US","customizeddaterange":"1 year","daterange":{"displaydaterange":"June 2016 to June 2017","startdate":{"displaydate":"June 2016","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"June 2017","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Tour vacant units and model with future prospects.Process applications. Answer emails and incoming phone calls. Prepare lease agreement for signing. Collect all monies that is due on dateof move-in. Enter resident repair orders for resident. Walk vacant units to ensure that the unit is ready for show. Complete residency and employment verifications. 
Income qualify all applicants.","location":"Baltimore, MD","normalizedtitle":"leasing consultant","title":"Leasing Consultant"},{"company":"MARYLAND MANAGEMENT COMPANY","country":"US","customizeddaterange":"1 year, 1 month","daterange":{"displaydaterange":"April 2015 to May 2016","startdate":{"displaydate":"April 2015","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"May 2016","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Lease apartments, sign lease agreements, complete residence maintenance repairrequest, answer phones, customer service, processed prospects applications, opened and closedinventory, responded to Level One emails Accomplishments: I was able to successfully finish FairHousing requirements. The first month I was able to properly and accurately process a application and move-in documents. Skills Used: The skills I used while at Americana were strong team work, strongcommunication, interpersonal, and leadership.","location":"Glen Burnie, MD","normalizedtitle":"leasing agent","title":"Leasing Agent"},{"company":"Amazon.com","country":"US","customizeddaterange":"1 year, 5 months","daterange":{"displaydaterange":"September 2014 to February 2016","startdate":{"displaydate":"September 2014","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"February 2016","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: I assure customers are receiving the correct merchandise in a timely fashion.And evaluate inventoryAccomplishments:I exceeded Amazon expectations of receiving 2800 items per hour, which allowed me to train otherassociates, building confidence and skills.Skills Used:The skills i used while performing my task were strong leadership, strong communications, and beingdetailed orientated.","location":"Baltimore, MD","normalizedtitle":"customer service representative","title":"Customer Service Representative"},{"company":"Carmax 
Superstore","country":"US","customizeddaterange":"1 year, 2 months","daterange":{"displaydaterange":"February 2014 to April 2015","startdate":{"displaydate":"February 2014","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"April 2015","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities:Greet customersSearch for the right vehicle that best suits the customers needs and wantsSubmit financial applicationsAssist customer with the purchasing process and document signingEnter customers information for appraisal offerAssist customer with purchasing Car Max extended warrantiesConducted follow- up on a daily, weekly, and monthy basisAccomplishments:I was acknowledged by the district for having 100% in Car Max extended warranties. Also I wasacknowledged by the district for having one of the highest Voice Of Customer survey scores. I passedthe 6 week training, obtaining my sales licenseSkills Used:I demonstrate strong communication, interpersonal and listening skills. 
I also have strongorganizational skills.","location":"Nottingham, MD","normalizedtitle":"sales consultant","title":"Certified Sales Consultant"},{"company":"rue21","country":"US","customizeddaterange":"1 year, 8 months","daterange":{"displaydaterange":"June 2011 to February 2013","startdate":{"displaydate":"June 2011","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"February 2013","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Managed profit goals on a daily basisCustomer ServiceReceived Incoming shipmentDelivered daily bank depsoitsMaintained store appearanceOverlooked sales associates performanceCreated daily goals for each sales associateAccomplishments:The impact that I was able to have during my time at Rue21, I was able to build a strong team of individuals who were scored top in the region for Customer Service.Skills Used:I demonstrated strong leadership and verbal communication.","location":"Dundalk, MD","normalizedtitle":"assistant store manager","title":"Assistant Store Manager"},{"company":"Shaws Jewelers","country":"US","customizeddaterange":"1 year, 5 months","daterange":{"displaydaterange":"November 2009 to April 2011","startdate":{"displaydate":"November 2009","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"April 2011","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Customer serviceGeneral office( typing, faxing, )Made outgoing calls to valued customersCleaned and maintained show cases and lunch roomPrepared jewlery repair tickets for outgoing shipmentAccomplishments:During my time at Shaws Jewelers I was able to demonstrate excellent customer service.Also I wasable to achieve personal profit goals and credit application goals on a daily basis. I was acknowledged and rewarded by my DM for excellent team participation and over achieving the 6 standards on a dailybasis.Skills Used:I demonstrated strong verbal and listening skills. 
Also I have excellent interpersonal skills.","location":"Dundalk, MD","normalizedtitle":"sales associate","title":"Sales Associate"}],"skillslist":[{"monthsofexperience":0,"text":"yardi"},{"monthsofexperience":0,"text":"marketing"},{"monthsofexperience":0,"text":"outlook"},{"monthsofexperience":0,"text":"receptionist"},{"monthsofexperience":0,"text":"management"}],"url":"/r/Lashannon-Felton/1062d3b8cbb13886","additionalinfo":""}\n'
I am not familiar with the gzip.GzipFile format.
Is there a way to turn it into a dictionary?

You will want to make use of the json module and the gzip module in Python, both of which are part of the Python Standard Library.
The gzip module provides the GzipFile class, as well as the open(),
compress() and decompress() convenience functions. The GzipFile class
reads and writes gzip-format files, automatically compressing or
decompressing the data so that it looks like an ordinary file object.
To read the compressed file, you can call gzip.open().
Opening the file in the default 'rb' mode returns a gzip.GzipFile object; calling its read() method returns the decompressed contents as bytes.
Then json.loads() converts those raw bytes into a usable Python object, in this case a dictionary.
The snippet below is a simple demonstration of this in action:
import gzip
import json
with gzip.open('gzipped_file.json.gz', 'rb') as f:
    raw_json = f.read()
data = json.loads(raw_json)
print(type(data))
# Prints <class 'dict'>
print(data)
# Prints {'originaltitle': 'Leasing Specialist - WPM Real Estate Management', 'workexperience ...
print(data['workexperiences'][0]['company'])
# Prints Home Properties
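The snippet above treats the whole file as one JSON document. The question's first line ends with \n and the file is described as very big, which suggests (an assumption, not confirmed in the question) that it holds one JSON object per line, the JSON Lines layout. In that case json.loads(f.read()) raises json.JSONDecodeError ("Extra data") as soon as a second line follows, and it also loads the entire file into memory. A minimal sketch that streams the file and parses each line separately; the helper name iter_records, the file name, and the two sample records are hypothetical:

```python
import gzip
import json
import os
import tempfile


def iter_records(path):
    """Yield one dict per non-empty line of a gzip-compressed JSON Lines file."""
    # Mode 'rt' makes gzip.open return a text stream, so iterating yields str lines.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                yield json.loads(line)


# Hypothetical two-record sample written to a temporary file, one JSON object per line.
sample_path = os.path.join(tempfile.gettempdir(), "sample_resumes.json.gz")
records = [
    {"originaltitle": "Leasing Specialist", "workexperiences": [{"company": "Home Properties"}]},
    {"originaltitle": "Sales Associate", "workexperiences": []},
]
with gzip.open(sample_path, "wt", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Parsing the whole file as a single document fails once there is a second line.
with gzip.open(sample_path, "rb") as f:
    try:
        json.loads(f.read())
    except json.JSONDecodeError as exc:
        print("whole-file parse failed:", exc.msg)  # whole-file parse failed: Extra data

# Parsing line by line yields one dictionary per record.
for record in iter_records(sample_path):
    print(type(record).__name__, record["originaltitle"])
# dict Leasing Specialist
# dict Sales Associate
```

With the real file, `for record in iter_records('gzipped_file.json.gz'):` would process one resume at a time without holding the whole file in memory.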

Related

can't commit data to the database due to unknown characters in python

I am scraping some websites and storing the data in my database. Sometimes I get a "character maps to <undefined>" error, which I think is caused by non-ASCII characters. Since I am scraping many websites with text in different languages, I haven't found a general, efficient way to solve this.
An example error:
Message: 'commit exception GRANTS.GOV'
Arguments: (UnicodeEncodeError('charmap', 'The Embassy of the United States in Nur-Sultan and the Consulate General of the United States in Almaty announces an open competition for past participants (“alumni”) of U.S. government-funded and U.S. government-sponsored exchange programs to submit applications to the 2021 Alumni Engagement Innovation Fund (AEIF) 2021.\xa0\xa0We seek proposals from teams of at least two alumni that meet all program eligibility requirements below. Exchange alumni interested in participating in AEIF 2021 should submit proposals to KazakhstanAlumni#state.gov\xa0by March 31, 2021, 18:00 Nur-Sultan time.\xa0\nAEIF provides alumni of U.S. sponsored and facilitated exchange programs with funding to expand on skills gained during their exchange experience to design and implement innovative solutions to global challenges facing their community. Since its inception in 2011, AEIF has funded nearly 500 alumni-led projects around the world through a competitive global competition.\n\nThis year, the U.S. Mission to Kazakhstan will accept proposals managed by teams of at least two (2) alumni that support the following theme:\n\u25cf\xa0\xa0\xa0\xa0\xa0\xa0Mental health awareness, promotion of mental wellbeing and resiliency.\nGoals. Projects may support one or more of the following goals:\nGoal 1: Increase in public understanding of mental health issues,\xa0its signs and strategies for providing timely help;\nGoal 2: Increase in public understanding of resources, methods, and tools that promote mental health and resiliency, especially among at-risk audiences; American best practices to promote mental health.\nGoal 3: Combatting stigma around mental health issues and dispelling common myths.\n\nFor full package of required forms please Related Documents section.', 1098, 1099, 'character maps to <undefined>'),)
My code:
title = '..............'
description = '......'
op = Op(
    website='',
    op_link='',
    title=title,  # may be a long text coming from websites
    description=description,  # may be a long text coming from websites
    organization_id=org_id,
    close_date='',
    checksum=singleData['checksum'],
    published_date='',
    language_id=lang_id,
    is_open=1)
try:
    session.add(op)
    session.commit()
    session.flush()
    ....
    ....
Please note: it has to work on Linux; my MySQL database runs on a Linux system.
I mostly face the issue with title and description, which can be in many languages and of any length. How can I encode them correctly so that I don't get any errors when committing to the database?
Thank you
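A hedged sketch of what the error itself means. 'charmap' is the codec name Python reports for single-byte code pages such as cp1252, which cannot represent many characters, for example '\u25cf' (BLACK CIRCLE) in the scraped text above, whereas UTF-8 can encode every Unicode character. MySQL's own latin1 character set is actually cp1252, so a plausible (unconfirmed) cause is a connection or table using latin1; switching to utf8mb4 would avoid it. The code reproduces the error with the standard library only; the helper try_encode and the connection URL at the end are hypothetical illustrations, and the URL is just a string in SQLAlchemy's mysql+pymysql format, not executed.

```python
# Text containing U+25CF (BLACK CIRCLE), which appears in the scraped description above.
text = "Goals:\n\u25cf\xa0Mental health awareness"


def try_encode(s, codec):
    """Return the encoded bytes, or the UnicodeEncodeError if the codec cannot represent s."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError as exc:
        return exc


# cp1252 is a single-byte code page; Python reports its failures as the 'charmap' codec.
err = try_encode(text, "cp1252")
print(type(err).__name__, err.encoding, "->", err.reason)
# UnicodeEncodeError charmap -> character maps to <undefined>

# UTF-8 can represent every Unicode code point, so the same text round-trips.
encoded = try_encode(text, "utf-8")
print(encoded.decode("utf-8") == text)  # True

# Hypothetical connection URL: with SQLAlchemy and PyMySQL, the charset query
# parameter selects the connection character set (utf8mb4 = full Unicode in MySQL).
DB_URL = "mysql+pymysql://user:password@localhost/scraper_db?charset=utf8mb4"
```

The table and column character sets matter too: a utf8mb4 connection still cannot store characters that a latin1 column has no slot for.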

Python API request - For Loop causing Index errors

Fairly new to Python. I'm struggling with the for loop in my code, specifically the assignment of the key 'topic_title'.
I keep getting a "list index out of range" error. The "solicitation_topics" value in the JSON response is nested, so I believe I need to pass an index. Accessing it that way works directly in the Python terminal, but inside the function I keep getting the error. Any help would be greatly appreciated.
import requests, json
def get_solicitations():
    # api-endpoint
    URL = "https://www.sbir.gov/api/solicitations.json"
    # defining a params dict for the parameters to be sent to the API
    PARAMS = {"keyword": 'sbir'}
    # sending get request and saving the response as response object
    r = requests.get(url = URL, params = PARAMS)
    # extracting data in json format
    api_data = r.json()
    # storing selected json data in a list of dicts
    solicitations = []
    for data in api_data:
        temp = {
            'solicitation_title': data['solicitation_title'],
            'program': data['program'],
            'agency': data['agency'],
            'branch': data['branch'],
            'close_date': data['close_date'],
            'solicitation_link': data['sbir_solicitation_link'],
            'topic_title': data['solicitation_topics'][0]['topic_title'],
        }
        solicitations.append(temp)
    return (solicitations)
A snippet of the JSON response looks like this:
[
{
"solicitation_title": "Interactive Digital Media STEM Resources for Pre- College and Informal Science Education Audiences (SBIR) (R43/R44 Clinical Trial Not Allowed) ",
"solicitation_number": "PAR-20-244 ",
"program": "SBIR",
"phase": "BOTH",
"agency": "Department of Health and Human Services",
"branch": "National Institutes of Health",
"solicitation_year": "2020",
"release_date": "2020-06-25",
"open_date": "2020-08-04",
"close_date": "2022-09-03",
"application_due_date": [
"2020-09-04",
"2021-09-03",
"2022-09-02"
],
"occurrence_number": null,
"sbir_solicitation_link": "https://www.sbir.gov/node/1703169",
"solicitation_agency_url": "https://grants.nih.gov/grants/guide/pa-files/PAR-20-244.html",
"current_status": "open",
"solicitation_topics": [
{
"topic_title": "Interactive Digital Media STEM Resources for Pre-College and Informal Science Education Audiences (SBIR) (R43/R44 Clinical Trial Not Allowed) ",
"branch": "National Institutes of Health",
"topic_number": "PAR-20-244 ",
"topic_description": "The educational objective of this FOA is to provide opportunities for eligible SBCs to submit NIH SBIR grant applications to develop IDM STEM products that address student career choice and health and medicine topics for: (1) pre-kindergarten to grade 12 (P-12) students and teachers or (2) informal science education (ISE) audiences. The second educational objective is to inform the American public that their quality of health is defined by lifestyle. If this message is understood, people can begin to live longer and reduce the healthcare burden to society. Therefore, this FOA also encourages IDM STEM products that will increase public health literacy and stimulate behavioral changes towards a healthier lifestyle. The research objective of this FOA is the development of new educational products that will advance our understanding of how IDM STEM-based gaming can improve student learning. It is anticipated that increasing underserved and minority student achievement in STEM fields through IDM STEM resources will encourage these students to pursue health-related careers that will increase their economic and social opportunities. A diverse health care workforce will help to expand health care access for the underserved, foster research in neglected areas of societal need, and enrich the pool of managers and policymakers to meet the needs of a diverse population.\r\n\r\nIDM is a bridge technology that converts game-based activities from a social pastime to a powerful educational tool that challenges students with problem solving, conceptual reasoning and goal-oriented decision making. Well-designed IDM products mimic successful teacher pedagogy and exploit student interest in games for learning. IDM STEM products also integrate imbedded learning, e.g., what the student knows and new knowledge gained in the gaming process, into problem solving skills. IDM products provide real time student assessment. 
Unlike standardized classroom testing where student achievement is a pass or fail process, IDM-based assessment is interactive, does not punish the student, and provides feedback on how to move to the next level of play. IDM products are intended to generate long-term changes in student performance, educational outcomes and career choices.\r\n\r\nThis FOA also encourages IDM STEM products that will increase public health literacy and stimulate behavioral changes towards a healthier lifestyle. Types of applications submitted to this FOA may vary with the target audience, scientific content, educational purpose and method of delivery. IDM STEM products may include but are not limited to: game-based curricula, resources that promote attitude changes toward learning, new skills development, teamwork and group activities, public participation in scientific research (citizen science) projects, and behavioral changes in lifestyle and health. IDM STEM products designed to increase the number of underserved students, e.g., American Indian, Alaska Native, Pacific Islanders, African American, Hispanic, disabled, or otherwise underrepresented individuals considering careers in basic, behavioral or clinical research are encouraged.\r\n\r\nIDM STEM products may be designed for use in-classroom or out-of-classroom settings, e.g., as supplements to existing classroom curricula, for after-school science clubs, libraries, hospital waiting rooms and science museums. IDM products may target children in group settings or individually, with or without adult or teacher participation or supervision.\r\n\r\nThe proposed project may use any IDM gaming technology or platform but the platform chosen should be accessible to the target group.\r\n\r\n",
"sbir_topic_link": "https://www.sbir.gov/node/1703171",
"subtopics": []
}
]
},
]
When I replicated your code, it turned out that solicitation_topics can be an empty list. I added this line to your function:
print(f"title = {data['solicitation_title']}, topics: {data['solicitation_topics']}")
It printed this (one of several empty ones):
title = PHS 2020 Omnibus Solicitation of the NIH, CDC and FDA for Small Business Innovation Research Grant Applications (Parent SBIR [R43/R44] Clinical Trial Not Allowed), topics: []
You will need to guard against that case.
If you want to skip the empty ones, you could put a continue at the top of the loop:
if not data['solicitation_topics']:
    continue
Or, if you want to keep the solicitations that have no topics, compute the title before building temp, and then use it there ('topic_title': topic_title):
if data['solicitation_topics']:
    topic_title = data['solicitation_topics'][0]['topic_title']
else:
    topic_title = 'Not Supplied'
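Putting the guard into the loop: a minimal sketch of the whole loop with the "keep and label" option applied. It is written as a hypothetical build_solicitations() helper that takes the already-decoded JSON list, so it can be tried without calling the API; the sample records are hypothetical and only mimic the shape of the response shown in the question.

```python
def build_solicitations(api_data):
    """Flatten the decoded JSON list into dicts, tolerating empty topic lists."""
    solicitations = []
    for data in api_data:
        topics = data.get('solicitation_topics') or []  # a missing key or [] both mean "no topics"
        topic_title = topics[0]['topic_title'] if topics else 'Not Supplied'
        solicitations.append({
            'solicitation_title': data['solicitation_title'],
            'program': data['program'],
            'agency': data['agency'],
            'branch': data['branch'],
            'close_date': data['close_date'],
            'solicitation_link': data['sbir_solicitation_link'],
            'topic_title': topic_title,
        })
    return solicitations


# Hypothetical sample shaped like the API response: one record with a topic, one without.
with_topic = {
    'solicitation_title': 'Solicitation with a topic',
    'program': 'SBIR',
    'agency': 'Department of Health and Human Services',
    'branch': 'National Institutes of Health',
    'close_date': '2022-09-03',
    'sbir_solicitation_link': 'https://www.sbir.gov/node/1703169',
    'solicitation_topics': [{'topic_title': 'Interactive Digital Media STEM Resources'}],
}
without_topic = dict(with_topic, solicitation_title='Solicitation without topics',
                     solicitation_topics=[])

result = build_solicitations([with_topic, without_topic])
print([r['topic_title'] for r in result])
# ['Interactive Digital Media STEM Resources', 'Not Supplied']
```

With the real endpoint, `build_solicitations(requests.get(URL, params=PARAMS).json())` would take the place of the loop inside get_solicitations().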

Financial data Python API

I'm developing a financial app, and I need a financial data API that provides historical end-of-day prices for both American and European stocks, plus news, dividend history, and sector and industry information.
I'm having a hard time finding such an API or data provider. Since I'm just starting out, I'm looking for a provider that is free or reasonably priced; I'm not interested in paying thousands of dollars for the data.
Does anyone here have any experience with such an API and any recommendations?
It's worth checking out:
https://iexcloud.io/docs/api/
https://finnhub.io/
Both are available for free.
I like these packages:
https://github.com/alvarobartt/investpy – for anything on stock exchanges.
https://github.com/man-c/pycoingecko – for anything crypto.
Consider using https://github.com/ymyke/tessa; it unifies and simplifies access to investpy and pycoingecko, and it also takes care of caching and rate limiting. (Disclosure: I'm the author of tessa.)

PayPal Adaptive Payments vs Braintree Merchants

Which is better in this situation: using PayPal Adaptive Payments for parallel payment processing and integration, or using a Braintree merchant account for payment processing and syncing with a third party for payment integration? Can PayPal Adaptive Payments do it all?
Suppose you have a business called Sewing Bliss & Co. It has two founders. One founder lives in NYC and creates baby clothing, the other founder lives in CA and creates quilts. They both sell their products through their ecommerce web app called www.sewingbliss.co.
But, they want their income to be organized into two separate streams - whenever a visitor purchases a baby item it goes into the NYC founder's PayPal account. And whenever a quilt is purchased it goes to the CA founder's PayPal account. If a customer purchases both a baby item and a quilt in one transaction, the transaction goes through and is divided up respectively. To the customer, the transaction appears seamless.
I am a web developer (mostly Python/Django). I am curious how this would best be implemented. Any insights?
So I imagine you would have two different pages, one for the quilts and one for the baby items. You could then set up a PayPal button on each page, with each button sending money to a different account. I would suggest checking out django-paypal.
The documentation on django-paypal can be a little confusing so read carefully.
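For the mixed-cart case in the question (one order containing both a baby item and a quilt), the order has to be grouped by receiving account before any payment is created. A minimal, framework-free sketch of that grouping step; the category names, email addresses, and helper name are hypothetical, and it calls no PayPal or Braintree API. Each resulting (receiver, total) pair is what a per-account button or a parallel-payment request would then be built from.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical mapping from product category to the founder's PayPal account email.
RECEIVER_BY_CATEGORY = {
    "baby": "nyc-founder@example.com",
    "quilt": "ca-founder@example.com",
}


def split_cart_by_receiver(cart):
    """Group cart lines by receiving account and total each group.

    `cart` is a list of (item_name, category, price) tuples; prices are Decimals
    so the per-receiver totals are exact.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at Decimal('0')
    for item_name, category, price in cart:
        totals[RECEIVER_BY_CATEGORY[category]] += price
    return dict(totals)


cart = [
    ("Onesie", "baby", Decimal("18.00")),
    ("Bib", "baby", Decimal("6.50")),
    ("Star quilt", "quilt", Decimal("120.00")),
]
print(split_cart_by_receiver(cart))
# {'nyc-founder@example.com': Decimal('24.50'), 'ca-founder@example.com': Decimal('120.00')}
```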

Is there any API for Oneworld Alliance?

Does anyone know if there is an API for the Oneworld Alliance? The idea is to write a traveling-salesman program in Python that visits all airports in the alliance's network based on actual available flight connections.
I don't think the Oneworld Alliance itself, or other airlines or alliances, have their own APIs. I'm not sure whether this question is on-topic for SO.
Try the search engines and booking sites: Travelocity, Expedia, Hotwire, cheaptickets...
For example, a Google search turns up the Expedia Affiliate Network.
Kayak apparently used to have a beta API but it was pulled due to misuse.
I'm not sure how easy it is to scrape Oneworld's site or timetables; I wouldn't start there.
Remember the airlines have a negative incentive to allow their data to be scraped, whereas the search engines have a positive incentive (within reasonable limits). So start with the latter.
When you say "based on actual available flight connections", I presume you only want to check whether airline X has a route connecting city A to city B, not look at actual seat inventory on specific dates and times, which seems needless. Do you need durations and frequencies?
Btw, there are 900 hits on SO on "Traveling Salesman", you might be able to reuse someone else's data.
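Once route data is available from one of those sources, the search itself can start simple. A minimal sketch, assuming a hypothetical hand-written route map (airport code to directly connected airports, not real Oneworld data): breadth-first search finds the fewest-flight path between two airports, and a greedy nearest-neighbour heuristic chains those paths into one itinerary that visits every airport. The heuristic is fast but not guaranteed to find the shortest tour, and it may pass through hubs more than once.

```python
from collections import deque

# Hypothetical route map: airport -> airports reachable by a direct flight.
ROUTES = {
    "JFK": {"LHR", "MAD"},
    "LHR": {"JFK", "MAD", "HKG"},
    "MAD": {"JFK", "LHR"},
    "HKG": {"LHR", "SYD"},
    "SYD": {"HKG"},
}


def shortest_path(routes, start, goal):
    """Breadth-first search: fewest-flight path from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        airport = queue.popleft()
        if airport == goal:
            path = []
            while airport is not None:
                path.append(airport)
                airport = previous[airport]
            return path[::-1]
        for nxt in sorted(routes[airport]):  # sorted for deterministic results
            if nxt not in previous:
                previous[nxt] = airport
                queue.append(nxt)
    return None


def greedy_tour(routes, start):
    """Nearest-neighbour heuristic: repeatedly fly to the closest unvisited airport."""
    itinerary = [start]
    unvisited = set(routes) - {start}
    current = start
    while unvisited:
        candidates = [shortest_path(routes, current, a) for a in sorted(unvisited)]
        candidates = [p for p in candidates if p is not None]
        if not candidates:
            break  # the remaining airports are unreachable
        best = min(candidates, key=len)  # fewest flights; ties go to the first in sorted order
        itinerary.extend(best[1:])
        unvisited -= set(best)  # airports passed through on the way also count as visited
        current = best[-1]
    return itinerary


print(greedy_tour(ROUTES, "JFK"))
# ['JFK', 'LHR', 'HKG', 'SYD', 'HKG', 'LHR', 'MAD']
```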
