I am scraping some websites and storing the data in my database. Sometimes I get a "character maps to <undefined>" error, which I think is caused by non-ASCII characters. Since I am scraping many websites with text in different languages, I have not been able to solve the issue in a general and efficient way.
An example of the error:
Message: 'commit exception GRANTS.GOV'
Arguments: (UnicodeEncodeError('charmap', 'The Embassy of the United States in Nur-Sultan and the Consulate General of the United States in Almaty announces an open competition for past participants (“alumni”) of U.S. government-funded and U.S. government-sponsored exchange programs to submit applications to the 2021 Alumni Engagement Innovation Fund (AEIF) 2021.\xa0\xa0We seek proposals from teams of at least two alumni that meet all program eligibility requirements below. Exchange alumni interested in participating in AEIF 2021 should submit proposals to KazakhstanAlumni#state.gov\xa0by March 31, 2021, 18:00 Nur-Sultan time.\xa0\nAEIF provides alumni of U.S. sponsored and facilitated exchange programs with funding to expand on skills gained during their exchange experience to design and implement innovative solutions to global challenges facing their community. Since its inception in 2011, AEIF has funded nearly 500 alumni-led projects around the world through a competitive global competition.\n\nThis year, the U.S. Mission to Kazakhstan will accept proposals managed by teams of at least two (2) alumni that support the following theme:\n\u25cf\xa0\xa0\xa0\xa0\xa0\xa0Mental health awareness, promotion of mental wellbeing and resiliency.\nGoals. Projects may support one or more of the following goals:\nGoal 1: Increase in public understanding of mental health issues,\xa0its signs and strategies for providing timely help;\nGoal 2: Increase in public understanding of resources, methods, and tools that promote mental health and resiliency, especially among at-risk audiences; American best practices to promote mental health.\nGoal 3: Combatting stigma around mental health issues and dispelling common myths.\n\nFor full package of required forms please Related Documents section.', 1098, 1099, 'character maps to <undefined>'),)
My code:
title = '..............'  # may be long text scraped from a website
description = '......'  # may be long text scraped from a website
op = Op(
    website='',
    op_link='',
    title=title,
    description=description,
    organization_id=org_id,
    close_date='',
    checksum=singleData['checksum'],
    published_date='',
    language_id=lang_id,
    is_open=1)
try:
    session.add(op)
    session.commit()
    session.flush()
    ....
    ....
Please note: it should work on a Linux system; my database (Mysql) is in a Linux system.
I mostly face the issue with title and description, which can be in many languages and of any length. How can I encode them correctly so that I don't get an error while committing to the database?
Thank you
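One way to narrow this down, as a minimal sketch: the 'charmap' codec name in the traceback is what Python reports for legacy single-byte encodings such as cp1252, so the failing step is probably encoding the text with one of those rather than with UTF-8. The helper below (first_unencodable is a hypothetical name, and cp1252 is an assumed codec, not something the error message confirms) finds the first character a given encoding cannot represent. In the error above the likely culprit is the \u25cf bullet, since \xa0 does exist in cp1252.

```python
def first_unencodable(text, encoding):
    """Return (index, character) of the first character that
    `encoding` cannot represent, or None if the text encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start, text[exc.start]
    return None

# Shortened stand-in for the scraped description in the error above.
sample = "theme:\n\u25cf\xa0Mental health awareness"

print(first_unencodable(sample, "cp1252"))  # the bullet is reported
print(first_unencodable(sample, "utf-8"))   # None: UTF-8 can encode any str
```

If the text encodes cleanly as UTF-8 but the commit still fails, the place to look is whichever layer is using the legacy codec (for example the database connection's character set, or the stream the logger writes to), not the scraped strings themselves.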
I have a gz format file. The file is very big and the first line is as follows:
{"originaltitle":"Leasing Specialist - WPM Real Estate Management","workexperiences":[{"company":"Home Properties","country":"US","customizeddaterange":"","daterange":{"displaydaterange":"","startdate":null,"enddate":null},"description":"Responsibilities: Inspect tour routes, models and show apartments daily to ensure cleanliness. Greeting prospective residents; determining the needs and preferences of the prospect and professionally present specific apartments while providing information regarding features and benefits. Answering incoming calls in a cheerful and professional manner. Handle each call accordingly whether it is a prospect call or an irate resident that just moved in. Develop and maintain Resident relations through the courtesy of on-site personnel, promptness of maintenance calls, and knowledge of community policies. Learn to develop professional sales and closing techniques. Accompany prospects to model apartments and discusses size and layout of rooms, available facilities, such as swimming pool and saunas, location of shopping centers, services available, and terms of lease. Demonstrate thorough knowledge and use of lead tracking system. Make follow-up calls to prospective Residents who did not fill out an application. Compile and update listings of available rental units.","location":"Baltimore, MD","normalizedtitle":"leasing specialist","title":"Leasing Specialist"},{"company":"WPM Real Estate Management","country":"US","customizeddaterange":"1 year, 3 months","daterange":{"displaydaterange":"July 2017 to October 2018","startdate":{"displaydate":"July 2017","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"October 2018","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Inspect tour routes, models and show apartments daily to ensure cleanliness. 
Greeting prospective residents; determining the needs and preferences of the prospect and professionally present specific apartments while providing information regarding features and benefits. Answering incoming calls in a cheerful and professional manner. Handle each call accordingly whether it is a prospect call or an irate resident that just moved in. Develop and maintain Resident relations through the courtesy of on-site personnel, promptness of maintenance calls, and knowledge of community policies. Learn to develop professional sales and closing techniques. Accompany prospects to model apartments and discusses size and layout of rooms, available facilities, such as swimming pool and saunas, location of shopping centers, services available, and terms of lease. Demonstrate thorough knowledge and use of lead tracking system. Make follow-up calls to prospective Residents who did not fill out an application. Compile and update listings of available rental units.","location":"Baltimore, MD","normalizedtitle":"leasing specialist","title":"Leasing Specialist"},{"company":"Westminster Management","country":"US","customizeddaterange":"1 year","daterange":{"displaydaterange":"June 2016 to June 2017","startdate":{"displaydate":"June 2016","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"June 2017","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Tour vacant units and model with future prospects.Process applications. Answer emails and incoming phone calls. Prepare lease agreement for signing. Collect all monies that is due on dateof move-in. Enter resident repair orders for resident. Walk vacant units to ensure that the unit is ready for show. Complete residency and employment verifications. 
Income qualify all applicants.","location":"Baltimore, MD","normalizedtitle":"leasing consultant","title":"Leasing Consultant"},{"company":"MARYLAND MANAGEMENT COMPANY","country":"US","customizeddaterange":"1 year, 1 month","daterange":{"displaydaterange":"April 2015 to May 2016","startdate":{"displaydate":"April 2015","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"May 2016","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Lease apartments, sign lease agreements, complete residence maintenance repairrequest, answer phones, customer service, processed prospects applications, opened and closedinventory, responded to Level One emails Accomplishments: I was able to successfully finish FairHousing requirements. The first month I was able to properly and accurately process a application and move-in documents. Skills Used: The skills I used while at Americana were strong team work, strongcommunication, interpersonal, and leadership.","location":"Glen Burnie, MD","normalizedtitle":"leasing agent","title":"Leasing Agent"},{"company":"Amazon.com","country":"US","customizeddaterange":"1 year, 5 months","daterange":{"displaydaterange":"September 2014 to February 2016","startdate":{"displaydate":"September 2014","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"February 2016","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: I assure customers are receiving the correct merchandise in a timely fashion.And evaluate inventoryAccomplishments:I exceeded Amazon expectations of receiving 2800 items per hour, which allowed me to train otherassociates, building confidence and skills.Skills Used:The skills i used while performing my task were strong leadership, strong communications, and beingdetailed orientated.","location":"Baltimore, MD","normalizedtitle":"customer service representative","title":"Customer Service Representative"},{"company":"Carmax 
Superstore","country":"US","customizeddaterange":"1 year, 2 months","daterange":{"displaydaterange":"February 2014 to April 2015","startdate":{"displaydate":"February 2014","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"April 2015","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities:Greet customersSearch for the right vehicle that best suits the customers needs and wantsSubmit financial applicationsAssist customer with the purchasing process and document signingEnter customers information for appraisal offerAssist customer with purchasing Car Max extended warrantiesConducted follow- up on a daily, weekly, and monthy basisAccomplishments:I was acknowledged by the district for having 100% in Car Max extended warranties. Also I wasacknowledged by the district for having one of the highest Voice Of Customer survey scores. I passedthe 6 week training, obtaining my sales licenseSkills Used:I demonstrate strong communication, interpersonal and listening skills. 
I also have strongorganizational skills.","location":"Nottingham, MD","normalizedtitle":"sales consultant","title":"Certified Sales Consultant"},{"company":"rue21","country":"US","customizeddaterange":"1 year, 8 months","daterange":{"displaydaterange":"June 2011 to February 2013","startdate":{"displaydate":"June 2011","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"February 2013","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Managed profit goals on a daily basisCustomer ServiceReceived Incoming shipmentDelivered daily bank depsoitsMaintained store appearanceOverlooked sales associates performanceCreated daily goals for each sales associateAccomplishments:The impact that I was able to have during my time at Rue21, I was able to build a strong team of individuals who were scored top in the region for Customer Service.Skills Used:I demonstrated strong leadership and verbal communication.","location":"Dundalk, MD","normalizedtitle":"assistant store manager","title":"Assistant Store Manager"},{"company":"Shaws Jewelers","country":"US","customizeddaterange":"1 year, 5 months","daterange":{"displaydaterange":"November 2009 to April 2011","startdate":{"displaydate":"November 2009","granularity":"MONTH","isodate":{"date":null}},"enddate":{"displaydate":"April 2011","granularity":"MONTH","isodate":{"date":null}}},"description":"Responsibilities: Customer serviceGeneral office( typing, faxing, )Made outgoing calls to valued customersCleaned and maintained show cases and lunch roomPrepared jewlery repair tickets for outgoing shipmentAccomplishments:During my time at Shaws Jewelers I was able to demonstrate excellent customer service.Also I wasable to achieve personal profit goals and credit application goals on a daily basis. I was acknowledged and rewarded by my DM for excellent team participation and over achieving the 6 standards on a dailybasis.Skills Used:I demonstrated strong verbal and listening skills. 
Also I have excellent interpersonal skills.","location":"Dundalk, MD","normalizedtitle":"sales associate","title":"Sales Associate"}],"skillslist":[{"monthsofexperience":0,"text":"yardi"},{"monthsofexperience":0,"text":"marketing"},{"monthsofexperience":0,"text":"outlook"},{"monthsofexperience":0,"text":"receptionist"},{"monthsofexperience":0,"text":"management"}],"url":"/r/Lashannon-Felton/1062d3b8cbb13886","additionalinfo":""}\n'
I am not familiar with gzip.GzipFile format.
Is there a way to make it a dictionary?
You will want to make use of the json module and the gzip module in Python, both of which are part of the Python Standard Library.
The gzip module provides the GzipFile class, as well as the open(),
compress() and decompress() convenience functions. The GzipFile class
reads and writes gzip-format files, automatically compressing or
decompressing the data so that it looks like an ordinary file object.
To read the compressed file, you can call gzip.open().
Opening the file in the default rb mode returns a gzip.GzipFile object, from which you can obtain a bytes object by calling read().
Then, using json.loads(), you can convert the raw data into a usable Python object -- a dictionary.
The snippet below is a simple demonstration of this in action:
import gzip
import json
with gzip.open('gzipped_file.json.gz', 'rb') as f:
    raw_json = f.read()
data = json.loads(raw_json)
print(type(data))
# Prints <class 'dict'>
print(data)
# Prints {'originaltitle': 'Leasing Specialist - WPM Real Estate Management', 'workexperience ...
print(data['workexperiences'][0]['company'])
# Prints Home Properties
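The question mentions that the file is very big and shows only its first line, which suggests (this is an assumption, not something the question confirms) that the file holds one JSON object per line. In that case f.read() followed by a single json.loads() would load everything into memory and then fail on the second object. A sketch of a line-by-line alternative follows; iter_records and the sample file are made up for the demonstration.

```python
import gzip
import json
import os
import tempfile

def iter_records(path):
    """Yield one parsed JSON object per non-empty line of a gzipped file."""
    # 'rt' decompresses and decodes to text, so each line is a str.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Build a tiny two-line sample file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "sample.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write(json.dumps({"originaltitle": "Leasing Specialist"}) + "\n")
    f.write(json.dumps({"originaltitle": "Sales Associate"}) + "\n")

records = list(iter_records(path))
print(records[0]["originaltitle"])
```

Because iter_records is a generator, a real file can be processed one record at a time in a for loop instead of being collected into a list.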
Fairly new to Python.... struggling with the for loop in my code, specifically the assignment of Key: 'topic_title'.
I keep receiving a "list index out of range" error. The JSON response at the "solicitation_topics" is nested so I believe I need to pass the index and this works when trying to access directly from the python terminal, however within the function I keep getting the error. Any help would be greatly appreciated.
import requests, json
def get_solicitations():
    # api-endpoint
    URL = "https://www.sbir.gov/api/solicitations.json"
    # defining a params dict for the parameters to be sent to the API
    PARAMS = {"keyword": 'sbir'}
    # sending get request and saving the response as response object
    r = requests.get(url = URL, params = PARAMS)
    # extracting data in json format
    api_data = r.json()
    # storing selected json data into a list of dicts
    solicitations = []
    for data in api_data:
        temp = {
            'solicitation_title': data['solicitation_title'],
            'program': data['program'],
            'agency': data['agency'],
            'branch': data['branch'],
            'close_date': data['close_date'],
            'solicitation_link': data['sbir_solicitation_link'],
            'topic_title': data['solicitation_topics'][0]['topic_title'],
        }
        solicitations.append(temp)
    return (solicitations)
A snippet of the JSON response looks like this:
[
{
"solicitation_title": "Interactive Digital Media STEM Resources for Pre- College and Informal Science Education Audiences (SBIR) (R43/R44 Clinical Trial Not Allowed) ",
"solicitation_number": "PAR-20-244 ",
"program": "SBIR",
"phase": "BOTH",
"agency": "Department of Health and Human Services",
"branch": "National Institutes of Health",
"solicitation_year": "2020",
"release_date": "2020-06-25",
"open_date": "2020-08-04",
"close_date": "2022-09-03",
"application_due_date": [
"2020-09-04",
"2021-09-03",
"2022-09-02"
],
"occurrence_number": null,
"sbir_solicitation_link": "https://www.sbir.gov/node/1703169",
"solicitation_agency_url": "https://grants.nih.gov/grants/guide/pa-files/PAR-20-244.html",
"current_status": "open",
"solicitation_topics": [
{
"topic_title": "Interactive Digital Media STEM Resources for Pre-College and Informal Science Education Audiences (SBIR) (R43/R44 Clinical Trial Not Allowed) ",
"branch": "National Institutes of Health",
"topic_number": "PAR-20-244 ",
"topic_description": "The educational objective of this FOA is to provide opportunities for eligible SBCs to submit NIH SBIR grant applications to develop IDM STEM products that address student career choice and health and medicine topics for: (1) pre-kindergarten to grade 12 (P-12) students and teachers or (2) informal science education (ISE) audiences. The second educational objective is to inform the American public that their quality of health is defined by lifestyle. If this message is understood, people can begin to live longer and reduce the healthcare burden to society. Therefore, this FOA also encourages IDM STEM products that will increase public health literacy and stimulate behavioral changes towards a healthier lifestyle. The research objective of this FOA is the development of new educational products that will advance our understanding of how IDM STEM-based gaming can improve student learning. It is anticipated that increasing underserved and minority student achievement in STEM fields through IDM STEM resources will encourage these students to pursue health-related careers that will increase their economic and social opportunities. A diverse health care workforce will help to expand health care access for the underserved, foster research in neglected areas of societal need, and enrich the pool of managers and policymakers to meet the needs of a diverse population.\r\n\r\nIDM is a bridge technology that converts game-based activities from a social pastime to a powerful educational tool that challenges students with problem solving, conceptual reasoning and goal-oriented decision making. Well-designed IDM products mimic successful teacher pedagogy and exploit student interest in games for learning. IDM STEM products also integrate imbedded learning, e.g., what the student knows and new knowledge gained in the gaming process, into problem solving skills. IDM products provide real time student assessment. 
Unlike standardized classroom testing where student achievement is a pass or fail process, IDM-based assessment is interactive, does not punish the student, and provides feedback on how to move to the next level of play. IDM products are intended to generate long-term changes in student performance, educational outcomes and career choices.\r\n\r\nThis FOA also encourages IDM STEM products that will increase public health literacy and stimulate behavioral changes towards a healthier lifestyle. Types of applications submitted to this FOA may vary with the target audience, scientific content, educational purpose and method of delivery. IDM STEM products may include but are not limited to: game-based curricula, resources that promote attitude changes toward learning, new skills development, teamwork and group activities, public participation in scientific research (citizen science) projects, and behavioral changes in lifestyle and health. IDM STEM products designed to increase the number of underserved students, e.g., American Indian, Alaska Native, Pacific Islanders, African American, Hispanic, disabled, or otherwise underrepresented individuals considering careers in basic, behavioral or clinical research are encouraged.\r\n\r\nIDM STEM products may be designed for use in-classroom or out-of-classroom settings, e.g., as supplements to existing classroom curricula, for after-school science clubs, libraries, hospital waiting rooms and science museums. IDM products may target children in group settings or individually, with or without adult or teacher participation or supervision.\r\n\r\nThe proposed project may use any IDM gaming technology or platform but the platform chosen should be accessible to the target group.\r\n\r\n",
"sbir_topic_link": "https://www.sbir.gov/node/1703171",
"subtopics": []
}
]
},
]
Replicating your code, it looks like solicitation_topics can be an empty list. I added this line to your function:
print(f"title = {data['solicitation_title']}, topics: {data['solicitation_topics']}")
And I found this (one of several) empties:
title = PHS 2020 Omnibus Solicitation of the NIH, CDC and FDA for Small Business Innovation Research Grant Applications (Parent SBIR [R43/R44] Clinical Trial Not Allowed), topics: []
You will need to figure out how to guard against that.
If you want to skip the empty ones you could put a continue at the top of the loop:
if not data['solicitation_topics']:
    continue
Or if you want to still preserve the solicitations with no topics, you should generate the title you want above, and then use that in your temp:
if data['solicitation_topics']:
    topic_title = data['solicitation_topics'][0]['topic_title']
else:
    topic_title = 'Not Supplied'
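The second option can also be wrapped in a small helper so the loop body stays flat. This is only a sketch: first_topic_title is a hypothetical name, and the two sample records are cut-down stand-ins for the API response, not real data.

```python
def first_topic_title(solicitation, default="Not Supplied"):
    """Return the first topic's title, or `default` when there are no topics."""
    # `or []` also covers a missing key or an explicit null in the JSON.
    topics = solicitation.get("solicitation_topics") or []
    if not topics:
        return default
    return topics[0].get("topic_title", default)

# Cut-down stand-ins for two API records: one with a topic, one without.
sample = [
    {"solicitation_title": "A", "solicitation_topics": [{"topic_title": "Topic A"}]},
    {"solicitation_title": "B", "solicitation_topics": []},
]

titles = [first_topic_title(s) for s in sample]
print(titles)  # ['Topic A', 'Not Supplied']
```

Inside get_solicitations() the temp dict would then use 'topic_title': first_topic_title(data) in place of the direct [0] lookup.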
I'm having a hard time understanding why the following output, which is supposed to be in JSON format and is generated from a database query, isn't parsable.
dbresponse = """
(('{"stopApprovalInd":true,"nonPersonalizedCardLine3":"Valued Cardholder","productRoutingBins":[{"requestType":"DIGITAL_ACCOUNT_REQUEST","bin":"342010002"}],"holdtimeSeconds":0,"defaultUpcInd":"false","updatedTimestamp":"2019-01-23T18:53:26.261Z","productName":"PB EGIFT (CAG) MAYO\'S $100","reloadMaxAmount":0,"issuerCompanyCode":"BKPB","updaterId":"IIKIFFFFFFPDZ0PDNQBGWMVGMRVTK9DM","sellStartDate":"2018-08-20T00:00:00.000+0000","taxIncludedInd":false,"taxPercent":0,"maxValueAmount":100,"proxyCardLength":19,"inventorySource":"BLAST","postReversalInd":false,"distributionChannel":"DIGITAL","multicardFlag":"N","companyCode":"BKPB","indentDataType":"0","activateOnShipment":"false","productRedemptionMethods":[],"upc":"07675030446","productFees":[{"feeAmount":0,"feeType":"PURCHASE"},{"feeAmount":0,"feeType":"TRANSACTION"},{"feeAmount":0,"feeType":"CUSTOMIZATION"}],"cpDivisionId":"2X2A9M5KRLXQWQ1SB","subGroupId":"YVAAN33PV7MZ3Z49V0F0J","defaultProductConfigurationId":"Y56X4RXNCY2R9ACSDH","isContentEnabled":false,"productFulfillments":[{"fulfillmentMethod":"EMAIL","fulfillmentType":"PRINT_ON_DEMAND"}],"taxAmount":0,"redeemLocationInd":false,"taxableInd":false,"productRedemptionLocations":[],"generationType":"0","serviceCode":"121","processorCompanyCode":"HP","productDisplayname":"MAYO\'s $100 eGift","baseValueAmount":100,"creatorId":"IIKIFFFFFFPDZ0PDNQBGWMVGMRVTK9DM","subsequentActivation":false,"generateProxyCardNumberInd":false,"productCategory":"CLOSELOOP","exclusionInd":false,"reloadableInd":false,"reversibleInd":false,"productLocale":[{"redemptionInstructions":"<p>Your E-gift Card is redeemable online at mayo.com and in \'s stores nationwide.<\\/p>","productTemplates":[{"templateType":"PRODUCTION","templateId":"Z6WSTWN9HABX8","templatePath":"https://blahblah.net/gcmimages/View/WGNX8/index.html"}],"localeCode":"en_US","inStoreInstructions":"<p>In Store: Print this entire page, and present it to a Mayo\'s Associate at checkout.<\\/p>","onlineInstructions":"<p>Online: 
Enter your E-Gift Card Number at checkout in the PAY WITH GIFT CARD box. If you have any questions, or to check your balance, please call 1-800-511-2752.<br>You may also scan your printed barcode at a price checker terminal in stores.<br>Your Mayo\'s E-Gift Card number is required for all inquiries.<\\/p>","productDescription":"Mayo\'s, the largest retail brand of Mayo\'s, Inc. (NYSE:M), delivers fashion and affordable luxury to customers at approximately 670 locations in 45 states, the District of Columbia, Puerto Rico and Guam, as well as to customers in the U.S. and more than 100 international destinations through its leading online store at macys.com. Via its stores, e-commerce site, mobile and social platforms, Mayo\'s offers distinctive assortments including the most desired family of exclusive and fashion brands for him, her and home. Mayo\'s is known for such epic events as Mayo\'s 4th of July Fireworks\xae and the Mayo\'s Thanksgiving Day Parade\xae. Building on a more than 150-year tradition, and with the collective support of customers and employees, Mayo\'s helps strengthen communities by supporting local and national charities giving more than $69 million each year to help make a difference in the lives of our customers","termsAndConditions":"Your Mayo\'s E-Gift Card number may be used to purchase any merchandise on-line at macys.com or in-store by following the instructions in the E-Gift Card email. You may not add value back onto the E-Gift Card, nor redeem it for cash or apply it as payment or credit to your credit card account. When you make a purchase with your E-Gift Card number, the value of your purchase plus any shipping/handling fees and sales tax, if applicable, will be automatically deducted from your \\"open to buy.\\" You may check any remaining value via the online Balance Inquiry function, or in-store by scanning the barcode at a price checker terminal or by calling 1-800-511-2752. Please safeguard your Mayo\'s E-Gift Card number. 
The bearer is responsible for its loss or theft. If your E-Gift Card is lost or stolen, and you have proof of purchase, we will issue you a replacement for the balance shown on our records. Your macys.com E-Gift Card number is required for all inquiries."}],"cardExpirationType":"0","productBarcodes":[{"barcodeType":"1D-CA128"}],"feeStrippingInd":false,"productLineId":"3G2D6R3YG85WZ570","createdTimestamp":"2018-08-20T18:53:46.435Z","inventoryLoadType":"HOT","variableInd":false,"isForcedResponseProduct":false,"reloadMinAmount":0,"entityId":"6APACRYFLXTLSKS","itemBuyerGroupId":"FASHION","provisioningType":"DIGITAL","productFulfillmentPartners":[],"currencyCode":"USD","proxyBin":"0"}',),)
"""
I can't treat it like JSON because when I call type(dbresponse) on the data (directly from the database), Python tells me it is a tuple. But I know that in order to access the values, Python needs to see the data as a dict.
I have tried converting it in several different ways, all to no avail.
str(dbresponse)
eval(dict(dbresponse))
str(list(dbresponse)) # tried to convert it to a list, then to a str, in the hope of then converting it to a dictionary. Didn't work.
I don't know much about JSON.
My question is: is there a foolproof way to convert the above data from a tuple to a dictionary? And once that is done, how do I iterate through all the keys/values in the data to see what is available?
I tried pprint (pretty print), but the output isn't very clear to me (I could be using it incorrectly).
If your response really is a string as shown here, and always has the same format, you could remove the ((' at the start and ',),) at the end. The rest is a valid JSON string:
import json
dbresponse = """(('{"stopApprovalInd":true,"nonPersonalizedCardLine3":"Valued Cardholder","productRoutingBins":[{"requestType":"DIGITAL_ACCOUNT_REQUEST","bin":"342010002"}],"holdtimeSeconds":0,"defaultUpcInd":"false","updatedTimestamp":"2019-01-23T18:53:26.261Z","productName":"PB EGIFT (CASHSTAR) MACY\'S $100","reloadMaxAmount":0,"issuerCompanyCode":"BKPB","updaterId":"IIKIFFFFFFPDZ0PDNQBGWMVGMRVTK9DM","sellStartDate":"2018-08-20T00:00:00.000+0000","taxIncludedInd":false,"taxPercent":0,"maxValueAmount":100,"proxyCardLength":19,"inventorySource":"BLAST","postReversalInd":false,"distributionChannel":"DIGITAL","multicardFlag":"N","companyCode":"BKPB","indentDataType":"0","activateOnShipment":"false","productRedemptionMethods":[],"upc":"07675030446","productFees":[{"feeAmount":0,"feeType":"PURCHASE"},{"feeAmount":0,"feeType":"TRANSACTION"},{"feeAmount":0,"feeType":"CUSTOMIZATION"}],"cpDivisionId":"2X2A9M5KRLXQWQ1SB","subGroupId":"YVAAN33PV7MZ3Z49V0F0J","defaultProductConfigurationId":"Y56X4RXNCY2R9ACSDH","isContentEnabled":false,"productFulfillments":[{"fulfillmentMethod":"EMAIL","fulfillmentType":"PRINT_ON_DEMAND"}],"taxAmount":0,"redeemLocationInd":false,"taxableInd":false,"productRedemptionLocations":[],"generationType":"0","serviceCode":"121","processorCompanyCode":"HP","productDisplayname":"Macy\'s $100 eGift","baseValueAmount":100,"creatorId":"IIKIFFFFFFPDZ0PDNQBGWMVGMRVTK9DM","subsequentActivation":false,"generateProxyCardNumberInd":false,"productCategory":"CLOSELOOP","exclusionInd":false,"reloadableInd":false,"reversibleInd":false,"productLocale":[{"redemptionInstructions":"<p>Your E-gift Card is redeemable online at macys.com and in Macy\'s stores nationwide.<\\/p>","productTemplates":[{"templateType":"PRODUCTION","templateId":"Z6WSTWN9HABX8","templatePath":"https://blahblah.net/gcmimages/View/WGNX8/index.html"}],"localeCode":"en_US","inStoreInstructions":"<p>In Store: Print this entire page, and present it to a Macy\'s Associate at 
checkout.<\\/p>","onlineInstructions":"<p>Online: Enter your E-Gift Card Number at checkout in the PAY WITH GIFT CARD box. If you have any questions, or to check your balance, please call 1-800-511-2752.<br>You may also scan your printed barcode at a price checker terminal in stores.<br>Your Macy\'s E-Gift Card number is required for all inquiries.<\\/p>","productDescription":"Macy\'s, the largest retail brand of Macy\'s, Inc. (NYSE:M), delivers fashion and affordable luxury to customers at approximately 670 locations in 45 states, the District of Columbia, Puerto Rico and Guam, as well as to customers in the U.S. and more than 100 international destinations through its leading online store at macys.com. Via its stores, e-commerce site, mobile and social platforms, Macy\'s offers distinctive assortments including the most desired family of exclusive and fashion brands for him, her and home. Macy\'s is known for such epic events as Macy\'s 4th of July Fireworks\xae and the Macy\'s Thanksgiving Day Parade\xae. Building on a more than 150-year tradition, and with the collective support of customers and employees, Macy\'s helps strengthen communities by supporting local and national charities giving more than $69 million each year to help make a difference in the lives of our customers","termsAndConditions":"Your Macy\'s E-Gift Card number may be used to purchase any merchandise on-line at macys.com or in-store by following the instructions in the E-Gift Card email. You may not add value back onto the E-Gift Card, nor redeem it for cash or apply it as payment or credit to your credit card account. When you make a purchase with your E-Gift Card number, the value of your purchase plus any shipping/handling fees and sales tax, if applicable, will be automatically deducted from your \\"open to buy.\\" You may check any remaining value via the online Balance Inquiry function, or in-store by scanning the barcode at a price checker terminal or by calling 1-800-511-2752. 
Please safeguard your Macy\'s E-Gift Card number. The bearer is responsible for its loss or theft. If your E-Gift Card is lost or stolen, and you have proof of purchase, we will issue you a replacement for the balance shown on our records. Your macys.com E-Gift Card number is required for all inquiries."}],"cardExpirationType":"0","productBarcodes":[{"barcodeType":"1D-CA128"}],"feeStrippingInd":false,"productLineId":"3G2D6R3YG85WZ570","createdTimestamp":"2018-08-20T18:53:46.435Z","inventoryLoadType":"HOT","variableInd":false,"isForcedResponseProduct":false,"reloadMinAmount":0,"entityId":"6APACRYFLXTLSKS","itemBuyerGroupId":"FASHION","provisioningType":"DIGITAL","productFulfillmentPartners":[],"currencyCode":"USD","proxyBin":"0"}',),)"""
json_str = dbresponse[3:-5]
data = json.loads(json_str)
print(data)
# {'stopApprovalInd': True, 'nonPersonalizedCardLine3': 'Valued Cardholder', 'productRoutingBins': [{'requestType': 'DIGITAL_ACCOUNT_REQUEST', 'bin': '342010002'}], 'holdtimeSeconds': 0, 'defaultUpcInd': 'false', 'updatedTimestamp': '2019-01-23T18:53:26.261Z', 'productName': "PB EGIFT (CASHSTAR) MACY'S $100", 'reloadMaxAmount': 0, 'issuerCompanyCode': ...
print(data['stopApprovalInd'])
# True
The inner content seems to be wrapped in single quotes ('), but it contains unescaped single quotes (e.g. "MAYO'S"). This is, of course, a syntax error if you try to evaluate it as Python.
Despite the several levels of escaping, it seems the code that generated the content still isn't able to get it right.
As Thierry found, however, removing the first 4 and last 6 characters leaves valid JSON (in this case, for the string as posted in the question, which begins and ends with a newline), and you can parse it with json.loads(dbresponse[4:-6]).
To support other kinds of input, we can use find and rfind to locate the outermost single quotes:
import json
# dbresponse = ...
begin = dbresponse.find("'")
end = dbresponse.rfind("'")
data = json.loads(dbresponse[begin+1:end])
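The question also says that type(dbresponse) reports a tuple when the data comes straight from the database. If that is the case (an assumption here: one row with one column holding the JSON text), no slicing is needed, because indexing into the tuple gives the JSON string directly. The sketch below uses a shortened stand-in row and a hypothetical walk helper to list every key path, which covers the "how do I see what is available" part of the question.

```python
import json

# Shortened stand-in for a driver result: one row, one column of JSON text.
rows = (('{"stopApprovalInd": true, "productName": "PB EGIFT", '
         '"productFees": [{"feeAmount": 0, "feeType": "PURCHASE"}]}',),)

# First row, first column is the JSON string itself.
data = json.loads(rows[0][0])

def walk(value, prefix=""):
    """Yield (path, leaf value) pairs for nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for i, child in enumerate(value):
            yield from walk(child, f"{prefix}[{i}]")
    else:
        yield prefix, value

for path, leaf in walk(data):
    print(path, "=", leaf)
```

Empty lists and dicts produce no output in this sketch, so keys such as productRedemptionMethods in the real data would need separate handling if they matter.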
I'm trying to extract text from a PDF (https://www.sec.gov/litigation/admin/2015/34-76574.pdf) using PyPDF2, and the only result I'm getting is the following string:
b''
Here is my code:
import PyPDF2
import urllib.request
import io
url = 'https://www.sec.gov/litigation/admin/2015/34-76574.pdf'
remote_file = urllib.request.urlopen(url).read()
memory_file = io.BytesIO(remote_file)
read_pdf = PyPDF2.PdfFileReader(memory_file)
number_of_pages = read_pdf.getNumPages()
page = read_pdf.getPage(1)
page_content = page.extractText()
print(page_content.encode('utf-8'))
This code worked correctly on a few of the PDFs I'm working with (e.g. https://www.sec.gov/litigation/admin/2016/34-76837-proposed-amended-distribution-plan.pdf), but the others like the file above didn't work. Any idea what's wrong?
I don't know why PyPDF2 can't extract the text from that PDF, but the pdftotext package can:
import pdftotext
from six.moves.urllib.request import urlopen
import io
url = 'https://www.sec.gov/litigation/admin/2015/34-76574.pdf'
remote_file = urlopen(url).read()
memory_file = io.BytesIO(remote_file)
pdf = pdftotext.PDF(memory_file)
# Iterate over all the pages
for page in pdf:
    print(page)
Extracted
UNITED STATES OF AMERICA
Before the
SECURITIES AND EXCHANGE COMMISSION
SECURITIES EXCHANGE ACT OF 1934
Release No. 76574 / December 7, 2015
ADMINISTRATIVE PROCEEDING
File No. 3-16987
ORDER INSTITUTING CEASE-AND-DESIST
In the Matter of PROCEEDINGS, PURSUANT TO SECTION
21C OF THE SECURITIES EXCHANGE ACT
KEFEI WANG OF 1934, MAKING FINDINGS, AND
IMPOSING REMEDIAL SANCTIONS AND A
Respondent. CEASE-AND-DESIST ORDER
I.
The Securities and Exchange Commission (“Commission”) deems it appropriate and in the
public interest that cease-and-desist proceedings be, and hereby are, instituted pursuant to 21C of
the Securities Exchange Act of 1934 (“Exchange Act”) against Kefei Wang (“Respondent”).
II.
In anticipation of the institution of these proceedings, Respondent has submitted an Offer
of Settlement (the “Offer”) which the Commission has determined to accept. Solely for the
purpose of these proceedings and any other proceedings brought by or on behalf of the
Commission, or to which the Commission is a party, and without admitting or denying the findings
herein, except as to the Commission’s jurisdiction over him and the subject matter of these
proceedings, which are admitted, and except as provided herein in Section V, Respondent consents
to the entry of this Order Instituting Cease-and-Desist Proceedings, Pursuant to Section 21C of the
Securities Exchange Act of 1934, Making Findings, and Imposing Remedial Sanctions and a
Cease-and-Desist Order (“Order”), as set forth below.
III.
On the basis of this Order and Respondent’s Offer, the Commission finds1 that:
Summary
1. Respondent violated Section 15(a)(1) of the Exchange Act by acting as an
unregistered broker-dealer in connection with his representation of clients who were seeking U.S.
residency through the Immigrant Investor Program. Respondent helped effect certain individuals’
securities purchases in an EB-5 Regional Center. Respondent received a commission from that
Regional Center for each investment he facilitated.
Respondent
2. Kefei Wang, age 39, is a resident of China. During the relevant time period, he was
a U.S. resident and an owner of Nautilus Global Capital, LLC , a now defunct entity that was based
in Fremont, California.
Background
3. The United States Congress created the Immigrant Investor Program, also known as
“EB-5,” in 1990 to stimulate the U.S. economy through job creation and capital investment by
foreign investors. The Program offers EB-5 visas to individuals who invest $1 million in a new
commercial enterprise that creates or preserves at least 10 full-time jobs for qualifying U.S.
workers (or $500,000 in an enterprise located in a rural area or an area of high unemployment). A
certain number of EB-5 visas are set aside for investors in approved Regional Centers. A Regional
Center is defined as “any economic unit, public or private, which is involved with the promotion of
economic growth, including increased export sales, improved regional productivity, job creation,
and increased domestic capital investment.” 8 C.F.R. § 204.6(e) (2015).
4. Typical Regional Center investment vehicles are offered as limited partnership
interests. The partnership interests are securities, usually offered pursuant to one or more
exemptions from the registration requirements of the U.S. securities laws. The Regional Centers
are often managed by a person or entity which acts as a general partner of the limited partnership.
The Regional Centers, the investment vehicles, and the managers are collectively referred to herein
as “EB-5 Investment Offerers.”
5. Various EB-5 Investment Offerers paid commissions to anyone who successfully
sold limited partnership interests to new investors.
1
The findings herein are made pursuant to Respondent’s Offer of Settlement and are not
binding on any other person or entity in this or any other proceeding.
2
Respondent Received Commissions for His Clients’ EB-5 Investments
6. From at least January 2010 through May 2014, Respondent received a portion of
commissions from one EB-5 Investment Offerer totaling $40,000. The commissions constituted
his portion of the commissions that were paid pursuant to a written Agency Agreement between
Nautilus Global Capital and the EB-5 Investment Offerer. On one or more occasions the
commission was paid to a foreign bank account identified by the Respondent despite the fact that
the Respondent was U.S.-based during the relevant time period.
7. Respondent performed activities necessary to effectuate the transaction, including
recommending the specific EB-5 Investment Offerer referenced in paragraph 6 to his clients;
acting as a liaison between the EB-5 Investment Offerer and the investors; and facilitating the
transfer and/or documentation of investment funds to the EB-5 Investment Offerer. Respondent
received his portion of transaction-based commissions due to Nautilus Global Capital for its
services from that EB-5 Investment Offerer.
8. As a result of the conduct described above, Respondent violated Section 15(a)(1) of
the Exchange Act which makes it unlawful for any broker or dealer which is either a person other
than a natural person or a natural person not associated with a broker or dealer to make use of the
mails or any means or instrumentality of interstate commerce “to effect any transactions in, or to
induce or attempt to induce the purchase or sale of, any security” unless such broker or dealer is
registered in accordance with Section 15(b) of the Exchange Act.
IV.
In view of the foregoing, the Commission deems it appropriate to impose the sanctions
agreed to in Respondent Kefei Wang’s Offer.
Accordingly, pursuant to Section 21C of the Exchange Act, it is hereby ORDERED that:
A. Respondent shall cease and desist from committing or causing any violations and
any future violations of Section 15(a)(1) of the Exchange Act.
B. Respondent shall, within ten (10) days of the entry of this Order, pay disgorgement
of $40,000, prejudgment interest of $1,590, and a civil money penalty of $25,000 to the Securities
and Exchange Commission for transfer to the general fund of the United States Treasury in
accordance with Exchange Act Section 21F(g)(3). If timely payment of disgorgement and
prejudgment interest is not made, additional interest shall accrue pursuant to SEC Rule of Practice
600 [17 C.F.R. § 201.600]. If timely payment of the civil money penalty is not made, additional
interest shall accrue pursuant to 31 U.S.C. § 3717. Payment must be made in one of the following
ways:
(1) Respondent may transmit payment electronically to the Commission, which will
provide detailed ACH transfer/Fedwire instructions upon request;
3
(2) Respondent may make direct payment from a bank account via Pay.gov through the
SEC website at http://www.sec.gov/about/offices/ofm.htm; or
(3) Respondent may pay by certified check, bank cashier’s check, or United States
postal money order, made payable to the Securities and Exchange Commission and
hand-delivered or mailed to:
Enterprise Services Center
Accounts Receivable Branch
HQ Bldg., Room 181, AMZ-341
6500 South MacArthur Boulevard
Oklahoma City, OK 73169
Payments by check or money order must be accompanied by a cover letter identifying
Kefei Wang as a Respondent in these proceedings, and the file number of these proceedings; a
copy of the cover letter and check or money order must be sent to Stephen L. Cohen, Associate
Director, Division of Enforcement, Securities and Exchange Commission, 100 F St., NE,
Washington, DC 20549-5553.
V.
It is further Ordered that, solely for purposes of exceptions to discharge set forth in Section
523 of the Bankruptcy Code, 11 U.S.C. § 523, the findings in this Order are true and admitted by
Respondent, and further, any debt for disgorgement, prejudgment interest, civil penalty or other
amounts due by Respondent under this Order or any other judgment, order, consent order, decree
or settlement agreement entered in connection with this proceeding, is a debt for the violation by
Respondent of the federal securities laws or any regulation or order issued under such laws, as set
forth in Section 523(a)(19) of the Bankruptcy Code, 11 U.S.C. § 523(a)(19).
By the Commission.
Brent J. Fields
Secretary
4
I think there might be an issue with how you are extracting the pages. Try looping over the pages and extracting each one separately, like so:
for i in range(number_of_pages):
    page = read_pdf.getPage(i)
    print(page.extractText())
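A per-page loop also makes it easy to see which pages come back empty, which is the symptom described in the question. The sketch below is only an illustration: extract_pages is a hypothetical helper, and it assumes the legacy (pre-3.0) PyPDF2 method names used in the question (getNumPages, getPage, extractText). FakeReader and FakePage are stand-ins so the example runs without PyPDF2 or a download; in practice you would pass the real reader object instead.

```python
def extract_pages(reader):
    """Return (texts, empty_pages) for a legacy PyPDF2-style reader.

    Assumes the reader exposes getNumPages() and getPage(i), and that
    each page exposes extractText(), as in the question's code.
    """
    texts = []
    empty_pages = []
    for i in range(reader.getNumPages()):
        text = reader.getPage(i).extractText() or ""
        texts.append(text)
        if not text.strip():
            empty_pages.append(i)  # zero-based index of a page with no text
    return texts, empty_pages


# Stand-in objects (assumption) so the sketch is runnable on its own.
class FakePage:
    def __init__(self, text):
        self._text = text

    def extractText(self):
        return self._text


class FakeReader:
    def __init__(self, pages):
        self._pages = [FakePage(p) for p in pages]

    def getNumPages(self):
        return len(self._pages)

    def getPage(self, index):
        return self._pages[index]


texts, empty_pages = extract_pages(FakeReader(["UNITED STATES OF AMERICA", "", "  \n"]))
print(empty_pages)
```

If every page index ends up in empty_pages, looping alone will not help, and a different extractor is probably needed for that file.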