Pandas function explode does not work on this DataSeries - python

The pandas explode function doesn't drop the object elements into rows like it should.
import pandas as pd
import requests
import io
from pandas.io.json import json_normalize
response = requests.request("GET", url, headers=headers, data = payload)
response_text = response.text.encode('utf8')
fundingRate = pd.read_json(response_text,orient='columns',typ='frame')
fundingC = pd.DataFrame(fundingRate['data'])
fundingC = fundingC.T
fundingC = fundingC.astype(object)
fundingdataMap = fundingC['dataMap']
fundingdataMap = fundingdataMap.astype(str)
fundingdataMap = fundingdataMap.str.slice(start=10)
fundingdataMap.explode()
fundingdataMap DataSeries
https://www.pythonanywhere.com/user/armaniallie93/files/home/armaniallie93/fundingdataMap.txt
output
data [0.07280400000000001, 0.013058, 0.01, 0.01, 0....
Name: dataMap, dtype: object
After casting the column elements to strings and slicing out the portion I want, I get no error, but explode still doesn't expand the values into rows. Any insight into why?

The reason is quite simple: the cell you are trying to explode holds a dictionary rather than a list, so explode will not split it into rows the way you expect.
#Removing the first row with dictionary
df.iloc[1:].explode('data')
#Without removing first row
df.explode('data')
You will have to take a call on how you want to convert this dictionary into a list. That would require a lambda function.
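As a minimal sketch of that conversion, assume the cell ends up holding the string form of a list, as it would after the astype(str) and slice steps in the question. The string can then be parsed back into a real list before calling explode. The sample values and the use of ast.literal_eval are illustrative assumptions, not taken from the original data.

```python
import ast

import pandas as pd

# Hypothetical one-row Series standing in for fundingdataMap:
# the cell is a *string* that merely looks like a list.
funding = pd.Series(["[0.0728, 0.013058, 0.01]"], index=["data"], name="dataMap")

# explode() treats a string as a scalar, so the Series comes back unchanged.
unchanged = funding.explode()

# Parse each string into a real Python list, then explode it
# into one row per element (the index label is repeated).
parsed = funding.apply(lambda text: ast.literal_eval(text))
exploded = parsed.explode()

print(len(unchanged), len(exploded))
```

If the cell holds an actual dictionary instead, the lambda would pick out the list to keep (for example one of the dictionary's values) before the explode call.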

Related

ValueError: DataFrame constructor not properly called! when converting dictionaries within list to pandas dataframe

I want to convert a list of dictionaries to a pandas dataframe, however, I got ValueError: DataFrame constructor not properly called!
Below is an example and how I got the data:
import requests
import pandas as pd
# Send an HTTP GET request to the URL
response = requests.get(url)
# Decode the JSON data into a dictionary
scrapped_data = response.text
Content of response.text is:
[{"id":123456,"date":"12-12-2022","value":37},{"id":123456,"date":"13-12-2022","value":38}]
I want to convert it to a dataframe format like the following:
id      date        value
123456  12-12-2022  37
123456  13-12-2022  38
I tried the following methods:
df = pd.DataFrame(scrapped_data)
df = pd.DataFrame_from_dict(scrapped_data)
df = pd.DataFrame(scrapped_data, orient='columns')
All of them raised the same ValueError.
I also tried:
df = pd.json_normalize(scrapped_data)
but got NotImplementedError
scrapped_data is of type str.
Thanks for your help, let me know if you have any questions
One reason for getting this error from pandas is passing a str as data. I think your data arrives as a str; if that is the case, try this:
import json
import pandas as pd
original_data='[{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}]'
scraped_data = json.loads(original_data)
df = pd.DataFrame(data=scraped_data)
df
As you said, scrapped_data is a string, so you need to convert it into a list of dictionaries first (with the loads method from the json library, for example).
Once scrapped_data = [{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}] (a list, not a string),
then you can just do df = pd.DataFrame(scrapped_data).
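To make the difference concrete, here is a small sketch that uses the exact payload quoted in the question as a stand-in for response.text. The helper name frame_from_json_text is hypothetical.

```python
import json

import pandas as pd

# Stand-in for response.text: a JSON array encoded as a str
scrapped_data = (
    '[{"id":123456,"date":"12-12-2022","value":37},'
    '{"id":123456,"date":"13-12-2022","value":38}]'
)


def frame_from_json_text(text):
    """Parse a JSON array of objects and build a DataFrame from it."""
    records = json.loads(text)  # str -> list of dicts
    if not isinstance(records, list):
        raise TypeError("expected a JSON array of objects")
    return pd.DataFrame(records)


df = frame_from_json_text(scrapped_data)
print(df)
```

When the data comes from requests, response.json() performs the same parsing step and returns the list directly.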

How do I get the groupby operation to work?

Can't get Pandas Groupby operation to work.
I suspect I need to convert the data to a pandas dataframe first? However, I can't seem to get that to work either.
import requests
import json
import pandas as pd
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
print(data)
def get_number_of_jobs(technology):
    number_of_jobs = 0
    number_of_jobs=data.groupby('technology').sum().loc[technology,:][0]
    return technology,number_of_jobs
print(get_number_of_jobs('python'))
Thanks
data is a list of dictionaries, not a DataFrame, so it doesn't have groupby. You don't really need groupby anyway: you can create the DataFrame using the values of the first record in the JSON response as column names in place of A and B, then look up 'Python' there, since it's already a single entry.
baseurl = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/module%201/datasets/githubposting.json"
response = requests.get(baseurl)
data = response.json()
df = pd.DataFrame(columns=list(data[0].values()), data=[d.values() for d in data[1:]])
number_of_jobs = df.loc[df['technology'] == 'Python', 'number of job posting'].iloc[0]
print(number_of_jobs) # 51
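If the goal is still to use groupby, the list has to become a DataFrame first. The sketch below assumes the response has the shape implied by the answer above: a header record followed by data records under the keys A and B. The sample rows are made up for illustration, and the counts are assumed to arrive as strings.

```python
import pandas as pd

# Made-up sample mirroring the assumed structure of response.json()
data = [
    {"A": "technology", "B": "number of job posting"},  # header record
    {"A": "Python", "B": "51"},
    {"A": "Java", "B": "92"},
]

# First record supplies the column names, the rest supply the rows
header = list(data[0].values())
rows = [list(record.values()) for record in data[1:]]
df = pd.DataFrame(rows, columns=header)
df["number of job posting"] = df["number of job posting"].astype(int)


def get_number_of_jobs(technology):
    """Return (technology, total postings) using groupby on the DataFrame."""
    totals = df.groupby("technology")["number of job posting"].sum()
    return technology, int(totals.loc[technology])


print(get_number_of_jobs("Python"))
```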

Convert Pandas Data Table to Background Gradient

I have an issue with trying to apply a background gradient style to my data table. Every time I run the script, the table is not styled. I think it may be because some of the values have the wrong data type and are not converted correctly. Does anyone know how to help me with this issue?
try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    print("Wrong version")
import json
def get_jsonparsed_data(url):
    """
    Receive the content of ``url``, parse it as JSON and return the object.
    Parameters
    ----------
    url : str
    Returns
    -------
    dict
    """
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)
url = ("https://financialmodelingprep.com/api/v3/income-statement/AAPL?apikey=*******************")
print(get_jsonparsed_data(url))
data = get_jsonparsed_data(url)
import pandas as pd
import numpy as np
# Sets the pandas dataframe wide for visualization
desired_width=1000
pd.set_option('display.width', desired_width)
np.set_printoptions(linewidth=desired_width)
pd.set_option('display.max_columns',100)
# Gradient color
df = pd.DataFrame(data)
df.info()
df.style.background_gradient(cmap='Blues',
low=0,
high=0,
axis=0,
subset=None,
text_color_threshold=0.408,
vmin=None,
vmax=None)
print(df)
Calling .style.* doesn't convert anything; it returns a new object and leaves df unchanged.
Because the result of the call is never stored or displayed, it is evaluated and then discarded, and the print(df) at the end shows the plain DataFrame.
If you want to keep the styled version (you can't actually convert the DataFrame itself), assign it to a new variable:
df_styled = df.style.background_gradient(...)
But note that
df is a DataFrame,
df_styled is a Styler object that renders an HTML representation of the DataFrame.
The two are very different.

python pandas dataframe misplaced

My pandas DataFrame does not place items in the correct columns when I append a new row to it.
I use a function to make appending easier.
When I append without the function, it works fine.
code:
from emailsender import email_send
import pandas as pd
import numpy as np
try:
    file = pd.read_csv("customers.csv")
except:
    pass
customers = {"name":["name"],
             "last":["last"],
             "age_range":[0],
             "emails":["namelast#gmail.com"]}
df_customers = pd.DataFrame(customers)
def add_customer(df,name=np.nan,last=np.nan,age=np.nan,email=np.nan):
    return df.append({"name":name,
                      "last":last,
                      "age_range":age,
                      "emails":email},ignore_index=True)
df_customers = (df_customers,"mohamed","miboun","mohamedwapana#gmail.com")
print(df_customers)
Try wrapping the new values in a one-row DataFrame and appending that instead of the bare dictionary:
...
# your code before append
...
    df_to_add = pd.DataFrame({"name":[name], "last":[last], "age_range":[age], "emails":[email]})
    return df.append(df_to_add, ignore_index=True)
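Two further points may be worth checking, offered as a sketch rather than a definitive fix. DataFrame.append was removed in pandas 2.0, so the version below uses pd.concat instead. It also passes the values by keyword, on the assumption that the original call was add_customer(df_customers, "mohamed", "miboun", "mohamedwapana#gmail.com"): with positional arguments the third value fills age, which would put the email in the age_range column.

```python
import numpy as np
import pandas as pd

df_customers = pd.DataFrame(
    {"name": ["name"], "last": ["last"], "age_range": [0], "emails": ["namelast#gmail.com"]}
)


def add_customer(df, name=np.nan, last=np.nan, age=np.nan, email=np.nan):
    """Return a new DataFrame with one customer row added at the end."""
    new_row = pd.DataFrame(
        {"name": [name], "last": [last], "age_range": [age], "emails": [email]}
    )
    return pd.concat([df, new_row], ignore_index=True)


# Keyword arguments keep each value in its intended column;
# age is omitted here, so age_range is left as NaN for the new row.
df_customers = add_customer(
    df_customers, name="mohamed", last="miboun", email="mohamedwapana#gmail.com"
)
print(df_customers)
```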

How to create a dataframe from urlopen (csv)

My code:
# parse json returned from the API to Pandas DF
openUrl = urlopen(url)
r = openUrl.read()
openUrl.close()
#d = json.loads(r.decode())
#df = pd.DataFrame(d, index=[0])
df = pd.DataFrame(r, index=[0])
The error:
ValueError: DataFrame constructor not properly called!
Help would be appreciated.
The DataFrame constructor requires an nd-array like input (or dict, iterable).
You can use pandas.read_csv if you want to directly input a csv and get a DataFrame.
Try printing r to see what is actually inside the response.
pandas.read_csv has a lot of option parameters to handle different types of csv, which of course depends on what you're getting from the url.
This snippet might help you.
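import urllib.request
import pandas as pd
# Download the raw response body (bytes)
r = urllib.request.urlopen('HERE GOES YOUR LINK')
x = r.read()
print(type(x))
# Decode the bytes to text (assuming UTF-8) before storing them in a DataFrame
y = x.decode('utf-8')
df = pd.DataFrame([y], columns=['string_values'])
print(df)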
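For the CSV case specifically, a sketch of handing the downloaded bytes to pandas.read_csv might look like this. The bytes literal stands in for what openUrl.read() would return; its contents are made up.

```python
import io

import pandas as pd

# Made-up stand-in for r = openUrl.read(), which returns bytes
r = b"id,value\n1,10\n2,20\n"

# read_csv accepts a file-like object, so wrap the bytes in BytesIO
df = pd.read_csv(io.BytesIO(r))
print(df)
```

pandas.read_csv also accepts a URL string directly, so pd.read_csv(url) may be enough when no custom request handling is needed.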
