Error when reading yahoofinancials JSON: Unexpected character found when decoding 'NaN' - python

I am trying to read JSON that I get from the Python package 'yahoofinancials' (it pulls the data from Yahoo Finance):
import numpy as np
import pandas as pd
from yahoofinancials import YahooFinancials
yahoo_financials = YahooFinancials(ticker)
cash_statements = yahoo_financials.get_financial_stmts('annual', 'income')
cash_statements
pd.read_json(str(cash_statements).replace("'", '"'), orient='records')
However I get the error:
Unexpected character found when decoding 'NaN'

The problem is this expression: str(cash_statements).replace("'", '"').
You tried to convert a Python dictionary into a JSON string by replacing single quotes with double quotes, which does not work reliably: str() emits Python literals such as None, True and False, which are not valid JSON.
Use json.dumps(cash_statements) to convert your dictionary object into a JSON string.
Updated Code:
import numpy as np
import pandas as pd
from yahoofinancials import YahooFinancials
# ADJUSTMENT 1 - import json
import json
# just some sample data for testing
ticker = ['AAPL', 'MSFT', 'INTC']
yahoo_financials = YahooFinancials(ticker)
cash_statements = yahoo_financials.get_financial_stmts('annual', 'income')
# ADJUSTMENT 2 - dict to json
cash_statements_json = json.dumps(cash_statements)
pd.read_json(cash_statements_json, orient='records')
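To see why the quote replacement fails while json.dumps works, here is a small sketch using only the standard library. The dictionary is made-up sample data, not real yahoofinancials output, and its field names are assumptions for illustration.

```python
import json

# Made-up sample; the keys are hypothetical, not actual yahoofinancials fields.
data = {"AAPL": {"netIncome": None, "restated": True, "note": "Apple's 10-K"}}

# Naive approach: the repr still contains None/True, and the apostrophe
# inside the string value gets turned into a stray double quote.
naive = str(data).replace("'", '"')
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# Proper approach: json.dumps writes null/true and escapes strings correctly.
proper = json.dumps(data)
round_trip = json.loads(proper)

print(naive_ok)            # whether the naive string parsed
print(round_trip == data)  # whether the json.dumps string round-trips
```

The same json.dumps output is what the updated code above passes to pd.read_json.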

Check that the file exists and that the file name is correct. I got the same error when reading a .json file that was not in the expected folder but located somewhere else.

Related

Problem reading CSV file from URL in pandas python

I'm trying to read csv file in pandas from this url:
https://www.dropbox.com/s/uh7o7uyeghqkhoy/diabetes.csv
By doing this:
url = "https://www.dropbox.com/s/uh7o7uyeghqkhoy/diabetes.csv"
c = pd.read_csv(url)
Or by doing this:
import pandas as pd
import io
import requests
url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
s=requests.get(url).content
c=pd.read_csv(io.StringIO(s.decode('utf-8')))
And I still get the same error message:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
Simply:
import pandas as pd
df = pd.read_csv("https://www.dropbox.com/s/uh7o7uyeghqkhoy/diabetes.csv?dl=1")
print(df)
You require a ?dl=1 at the end of your link.
Add ?dl=1 to the end of the URL
import pandas as pd
url = "https://www.dropbox.com/s/uh7o7uyeghqkhoy/diabetes.csv?dl=1"
c = pd.read_csv(url)
print(c)
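Without dl=1, Dropbox typically serves an HTML preview page rather than the raw file, which is likely why the CSV parser fails. As a rough sketch, the parameter can be set programmatically so that an existing dl=0 is overwritten instead of duplicated. The helper name force_dropbox_download is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_dropbox_download(url):
    """Return the URL with its dl query parameter set to 1."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query["dl"] = "1"  # overwrite dl=0 or add the parameter if missing
    return urlunsplit(parts._replace(query=urlencode(query)))


print(force_dropbox_download("https://www.dropbox.com/s/uh7o7uyeghqkhoy/diabetes.csv"))
```

The returned string can then be passed to pd.read_csv as in the answers above.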

I am trying to make a function that iterates through names and assigns a serial number in a certain pattern, then saves it in CSV and JSON files

I am trying to build a function that iterates over the names in a CSV file I supply, reads the last serial number written to a JSON file, adds one for each name, and puts the serial number beside each name in the CSV. What I get instead is that the function generates the first serial number successfully and saves it in the JSON file, but it fails to add it to the CSV via pandas and fails to update the number in the JSON file.
This is the code of the function:
from docx import Document
import pandas as pd
from datetime import datetime
import time
import os
from docx2pdf import convert
import json
date=datetime.date(datetime.now())
strdate=date.strftime("%d-%m-%Y")
year=date.strftime("%Y")
month=date.strftime("%m")
def genrateserial(a):
    jsonFile1 = open("data_file.json", "r")
    lastserial = jsonFile1.read()
    jsonFile1.close()
    for d in range(len(lastserial)):
        if lastserial[d]=="\"":
            lastserial[d].replace("\"","")
    jsonFile1.close()
    if strdate=="01" or (month[1]!=lastserial[8]):
        num=1
        last=f"JO/{year}{month}{num}"
        data=f"{last}"
        jsonstring=json.dumps(data)
        jsonfile2=open("data_file.json", "w")
        jsonfile2.write(jsonstring)
        jsonfile2.close()
    database = pd.read_csv(a)
    df = pd.DataFrame(database)
    df = df.dropna(axis=0)
    for z in range(len(df.Name)):
        newentry=f"JO/{year}{month}{num+1}"
        jsonstring1=json.dumps(newentry)
        jsonfile3=open("data_file.json","w")
        jsonfile3.write(jsonstring1)
        jsonfile3.close()
        df.iloc[[z],3]=newentry
genrateserial('database.csv')
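A few things in the function above look likely to cause the described behaviour: str.replace returns a new string and its result is discarded, num + 1 is recomputed from the same num on every iteration, and the DataFrame is never written back to the CSV. Below is a minimal sketch of the counting logic using only the standard library. It assumes the serial format JO/<year><month><counter> from the question and that data_file.json stores the last serial as a JSON string. The function names next_serials and assign_serials are hypothetical.

```python
import json
from datetime import date
from pathlib import Path


def next_serials(last_serial, count, today):
    """Return `count` serials continuing after `last_serial` for this month."""
    prefix = f"JO/{today:%Y%m}"
    counter = last_serial[len(prefix):] if last_serial else ""
    # Continue counting only if the stored serial belongs to the current month.
    if last_serial and last_serial.startswith(prefix) and counter.isdigit():
        start = int(counter)
    else:
        start = 0
    return [f"{prefix}{start + i}" for i in range(1, count + 1)]


def assign_serials(names, state_path, today=None):
    """Pair each name with a new serial and persist the last serial used."""
    today = today or date.today()
    path = Path(state_path)
    # json.loads strips the surrounding quotes, so no manual replace is needed.
    last = json.loads(path.read_text()) if path.exists() else None
    serials = next_serials(last, len(names), today)
    if serials:
        path.write_text(json.dumps(serials[-1]))
    return list(zip(names, serials))
```

With pandas, the serials could be written into a column (for example a hypothetical "Serial" column) and the frame saved with df.to_csv(a, index=False), which the original function never calls.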

Convert CKAN data API call from bytes into Pandas DataFrame

I am trying to access data.gov.au datasets through their CKAN data API.
Unfortunately, the data API instructions are slightly outdated and do not seem to work. Instructions found here.
So far, I've worked out that I am meant to query the dataset using urllib.request.
import urllib.request
req = urllib.request.Request('https://data.sa.gov.au/data/api/3/action/datastore_search?resource_id=86d35483-feff-42b5-ac05-ad3186ac39de')
with urllib.request.urlopen(req) as response:
    data = response.read()
This produces an object of type bytes that looks like a dictionary data structure, where the dataset seems to be stored in "records:".
I'm wondering how I can convert the data records into a Pandas DataFrame. I've tried converting the bytes object into a string and reading that as a json file, but the output is wrong.
# code that did not work
result = str(data, 'utf-8')
rdata = StringIO(result)
df = pd.read_json(rdata)
df
The output I would like to return looks like this:
Thanks!
Here is a solution that works:
import numpy as np
import pandas as pd
import requests
import json
url = "https://data.sa.gov.au/data/api/3/action/datastore_search?resource_id=86d35483-feff-42b5-ac05-ad3186ac39de"
JSONContent = requests.get(url).json()
content = json.dumps(JSONContent, indent = 4, sort_keys=True)
print(content)
df = pd.read_json(content)
df.to_csv("output.csv")
df = pd.json_normalize(df['result']['records'])
You were actually close to the solution. The only step you were missing is the last one: df = pd.json_normalize(df['result']['records']).
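As an alternative sketch, the bytes object returned by response.read() can be decoded directly with the standard library, without going through pd.read_json first. The payload below is made-up sample data shaped like the "result" / "records" structure described in the question. Its field names are assumptions.

```python
import json

# Made-up sample standing in for the bytes returned by response.read().
data = b'{"success": true, "result": {"records": [{"_id": 1, "suburb": "Adelaide"}, {"_id": 2, "suburb": "Glenelg"}]}}'

payload = json.loads(data)              # json.loads accepts bytes directly
records = payload["result"]["records"]  # list of dicts, one per row
columns = sorted({key for row in records for key in row})

print(columns)
print(len(records))
```

The records list can then be passed to pd.DataFrame(records) or pd.json_normalize(records) to build the DataFrame.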

Read_csv from URL into Jupyter

Hi, I am unable to read a CSV file from a URL using
import pandas as pd
import numpy as np
data_url = 'https://data.baltimorecity.gov/Financial/Real-Property-Taxes/27w9-urtv.csv'
df = pd.read_csv(data_url)
df.head()
I got an error: "not acceptable"
I also tried different codes importing "requests" but none of them worked. How do I fix this?
Your URL wasn't correct. This should work:
import pandas as pd
data_url = 'https://data.baltimorecity.gov/resource/27w9-urtv.csv'
df = pd.read_csv(data_url)
df.head()

Loading Data Set with Breaks

I'm trying to load a dataset with breaks in it and am looking for an intelligent way to make this work. I got started with the code I included below.
As you can see, the data in the file posted on the public FTP site starts at line 11, ends at line 23818, then starts again at line 23823 and ends at line 45630.
import pandas as pd
import numpy as np
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen
url = urlopen("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/10_Portfolios_Prior_12_2_Daily_CSV.zip")
#Download Zipfile and create pandas DataFrame
zipfile = ZipFile(BytesIO(url.read()))
df = pd.read_csv(zipfile.open('10_Portfolios_Prior_12_2_Daily.CSV'), header = 0,
names = ['asof_dt','1','2','3','4','5','6','7','8','9','10'], skiprows=10).dropna()
df['asof_dt'] = pd.to_datetime(df['asof_dt'], format = "%Y%m%d")
I would ideally like the first set to have a version number "1", the second to have "2", etc.
Any help would be greatly appreciated. Thank you.
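One possible approach, sketched here with the standard library only, is to scan the file text line by line and start a new version number whenever a run of data rows begins. It assumes every data row starts with an 8-digit date (as the "%Y%m%d" format above suggests) and that anything else is a header or a break. The function name tag_blocks and the sample text are hypothetical.

```python
import csv
import io


def tag_blocks(text):
    """Return data rows prefixed with a 1-based block (version) number."""
    rows = []
    version = 0
    in_block = False
    for fields in csv.reader(io.StringIO(text)):
        first = fields[0].strip() if fields else ""
        is_data = len(first) == 8 and first.isdigit()
        if is_data:
            if not in_block:  # first data row after a break starts a new block
                version += 1
                in_block = True
            rows.append([version] + [f.strip() for f in fields])
        else:
            in_block = False  # header text or blank line ends the current block
    return rows


# Made-up miniature file with two blocks separated by a break.
sample = (
    "Header text\n\n,1,2\n"
    "19630701, 0.1, 0.2\n19630702, 0.3, 0.4\n"
    "\nSecond table\n,1,2\n"
    "19630701, 1.1, 1.2\n"
)
for row in tag_blocks(sample):
    print(row)
```

The resulting rows could be passed to pd.DataFrame with a column list such as ['version', 'asof_dt', '1', ..., '10'], after decoding the zip member's bytes to text.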
