Here is an example DataFrame:
import pandas as pd
from prettytable import PrettyTable
df = pd.DataFrame()
df["name"] = ["Nick","Bob", "George", "Jason","Death"]
df["Restaurant Manager"] = ["Sam","Mason", "Sam", "Mason","Mason"]
df["Score"] = [1,5, 7, 2,10]
df['Percentile Rank'] = [0,50,80,20,100]
df["Restaurant Name"] = "Elise"
What I am trying to do is recreate this table (see screenshot) that we have in Excel, since we are in the process of automating our reporting system.
I managed to get something a bit similar, but I am stuck. Here is my code so far:
tb1 = PrettyTable()
# Add headers
column_names = ["Rank", "Employee Name", "Score", "Percentile"]
# Add columns
tb1.add_column(column_names[0], [1, 2])
tb1.add_column(column_names[1], ["Nick", "George"])
tb1.add_column(column_names[2], [1, 7])
tb1.add_column(column_names[3], [0, 80])
tb1.title = "Elise"
print(tb1)
I would like to replicate the table fully, as in the image shared above. I would also like to create similar tables for each restaurant name and place them side by side, if possible.
Create it from a dictionary:
df = pd.DataFrame({
"name": ["Nick","Bob", "George", "Jason","Death"],
"Restaurant Manager": ["Sam","Mason", "Sam", "Mason","Mason"],
"Score": [1,5, 7, 2,10],
'Percentile Rank': [0,50,80,20,100],
"Restaurant Name": ["Elise", "Elise", "Elise", "Elise", "Elise"]
})
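For the side-by-side part of the question, here is a minimal sketch that uses pandas alone rather than PrettyTable: each group is rendered with to_string and the rendered lines are joined row by row. The helper name side_by_side and the gap width are assumptions of this sketch. The demo groups by "Restaurant Manager" only because every sample row has the same "Restaurant Name"; the Rank column simply numbers the rows in their existing order, as in the question's example.

```python
from itertools import zip_longest

import pandas as pd

df = pd.DataFrame({
    "name": ["Nick", "Bob", "George", "Jason", "Death"],
    "Restaurant Manager": ["Sam", "Mason", "Sam", "Mason", "Mason"],
    "Score": [1, 5, 7, 2, 10],
    "Percentile Rank": [0, 50, 80, 20, 100],
    "Restaurant Name": ["Elise"] * 5,
})


def side_by_side(frame, group_col, gap=4):
    """Render one text table per group and join the tables horizontally."""
    blocks = []
    for key, group in frame.groupby(group_col):
        table = group[["name", "Score", "Percentile Rank"]].rename(
            columns={"name": "Employee Name", "Percentile Rank": "Percentile"}
        )
        # Number the rows of this group 1..n in their existing order.
        table.insert(0, "Rank", range(1, len(table) + 1))
        lines = table.to_string(index=False).splitlines()
        width = max(len(line) for line in lines)
        # Title line (the group key) centred above the rendered table.
        blocks.append([str(key).center(width)] + [line.ljust(width) for line in lines])
    widths = [len(block[0]) for block in blocks]
    rows = []
    # Shorter tables are padded with blank cells so the rows stay aligned.
    for parts in zip_longest(*blocks, fillvalue=""):
        padded = [part.ljust(w) for part, w in zip(parts, widths)]
        rows.append((" " * gap).join(padded).rstrip())
    return "\n".join(rows)


report = side_by_side(df, "Restaurant Manager")
print(report)
```

Passing "Restaurant Name" as group_col would give one table per restaurant once the data contains more than one.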
I am trying to export a DataFrame to an image, and I am using the dataframe_image library to do it.
import pandas as pd
import dataframe_image as dfi
data = [
{
"name": "John",
"gender": "Male"
},
{
"name": "Martin",
"gender": "Female"
}
]
df = pd.json_normalize(data)
dfi.export(df, 'table.png')
The exported image looks like this:
I want to remove the index column from it. How can I do that?
You can set the style to hide the index:
dfi.export(df.style.hide(axis='index'), 'table.png')
So I have this data that I scraped
[
{
"id": 4321069,
"points": 52535,
"name": "Dennis",
"avatar": "",
"leaderboardPosition": 1,
"rank": ""
},
{
"id": 9281450,
"points": 40930,
"name": "Dinh",
"avatar": "https://uploads-us-west-2.insided.com/koodo-en/icon/90x90/aeaf8cc1-65b2-4d07-a838-1f078bbd2b60.png",
"leaderboardPosition": 2,
"rank": ""
},
{
"id": 1087209,
"points": 26053,
"name": "Sophia",
"avatar": "https://uploads-us-west-2.insided.com/koodo-en/icon/90x90/c3e9ffb1-df72-46e8-9cd5-c66a000e98fa.png",
"leaderboardPosition": 3,
"rank": ""
And so on... The full leaderboard has 20 people.
I scraped it with this code:
import json
import requests
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
# print for all time:
data = requests.get(url_all_time).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for item in data:
    print(item['name'], item['points'])
And I want to be able to create a table that resembles this:
Every time I scrape the data, I want to update the table with the current points under a new date stamp as the header. So basically, I was thinking that my index would be the usernames and the headers would be the dates. The problem is, I can't even manage to create a CSV file with just the NAME/POINTS columns.
The only thing I have managed so far is writing ALL the data to a CSV file. I haven't been able to pick out just the fields I want, the way the print command does.
EDIT: After reading what @Shijith posted, I succeeded in transferring the data to .csv, but given what I have in mind (adding more data over time), I was wondering whether I should write the code with an index or without.
WITH
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, index=['name'], columns=['points','name'])
table.to_csv('products.csv', index=True, encoding='utf-8')
WITHOUT
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, columns=['points','name'])
table.to_csv('products.csv', index=False, encoding='utf-8')
Have you tried just reading the json directly into a pandas dataframe? From here it should be pretty easy to transform it like you want. You could add a column for today's date and pivot it.
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
df = pd.read_json(url_all_time)
df['date'] = pd.Timestamp.today().strftime('%m-%d-%Y')
df.pivot(index='name', columns='date', values='points')
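To keep adding one dated column per scrape, something along these lines might work. It is only a sketch: add_snapshot is a hypothetical helper, and the two small lists are made-up stand-ins for what requests.get(url_all_time).json() returns on different days.

```python
import pandas as pd


def add_snapshot(history, records, label):
    """Return `history` with one extra column of points named `label`."""
    snapshot = pd.DataFrame(records).set_index("name")["points"].rename(label)
    if history is None:
        # First run: the snapshot becomes the whole table.
        return snapshot.to_frame()
    # Outer join keeps users who appear in only one of the snapshots.
    return history.join(snapshot, how="outer")


# Made-up data standing in for two separate scrapes.
day1 = [{"name": "Dennis", "points": 52535}, {"name": "Dinh", "points": 40930}]
day2 = [{"name": "Dennis", "points": 52600}, {"name": "Sophia", "points": 26053}]

history = add_snapshot(None, day1, "01-01-2021")
history = add_snapshot(history, day2, "01-02-2021")
print(history)
```

Between runs the table could be saved with history.to_csv('products.csv') and reloaded with pd.read_csv('products.csv', index_col='name'), which is one argument for keeping name as the index. Note that join raises an error if a column with the same label already exists, so scraping twice on the same day would need separate handling.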
I am reading API data from a cloud server in JSON format, as shown here.
How can I convert this JSON to a DataFrame, and how do I write code to store that DataFrame in a database?
The required output format is shown in table 2.
Here is an example:
from requests import request
import json
import pandas as pd
response = request(url="http://api.open-notify.org/astros.json", method='get')  # API source
data = json.loads(response.text)['people']  # pick the 'people' records from the JSON
df = pd.DataFrame(data)  # convert to a pandas DataFrame
Let me know if it works.
You can try this one, tell me if it works!
import requests
import pandas as pd
response = requests.get('http://google.com')  # Assuming the url; replace with your API endpoint
res = response.json()  # parse the JSON body
df = pd.DataFrame(res)
Good luck, mate!
Check out the docs here:
import pandas as pd
df = pd.read_json("http://some.com/blah.json")
As for storing it in a database, you will need to know some things about your database connection (docs here):
tablename = "my_tablename"
connection_values = <your sql alchemy connection here>
df.to_sql(name=tablename, con=connection_values)
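As a concrete illustration of the to_sql call, here is a small sketch that uses an in-memory SQLite database from the standard library instead of a SQLAlchemy connection. The two rows are made-up sample data; the table name is the one used above.

```python
import sqlite3

import pandas as pd

# Made-up sample rows standing in for the parsed API response.
df = pd.DataFrame([
    {"name": "John", "craft": "ISS"},
    {"name": "Martin", "craft": "ISS"},
])

# In-memory SQLite database; swap in your own connection or SQLAlchemy engine.
con = sqlite3.connect(":memory:")
df.to_sql(name="my_tablename", con=con, index=False, if_exists="replace")

# Read the table back to confirm that the rows were stored.
stored = pd.read_sql("SELECT * FROM my_tablename", con)
print(stored)
con.close()
```

For databases other than SQLite, pandas expects a SQLAlchemy engine or connection for con.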
To help others also, here is an example with nested json. I have tried to make this similar to the example you showed in your question.
import json
import pandas as pd
jsondata = '''
{
  "source": { "id": "2480300" },
  "time": "2013-07-02T16:32:30.152+02:00",
  "type": "huawei_E3131SignalStrength",
  "c8y_SignalStrength": {
    "rssi": { "value": -53, "unit": "dBm" },
    "ber": { "value": 0.14, "unit": "%" }
  }
}
'''
data = json.loads(jsondata)  # parse the JSON string into a dict
df = pd.json_normalize(data)  # flatten nested keys into dotted column names
df
Result (the column order may differ between pandas versions):
c8y_SignalStrength.ber.unit c8y_SignalStrength.ber.value c8y_SignalStrength.rssi.unit c8y_SignalStrength.rssi.value source.id time type
% 0.14 dBm -53 2480300 2013-07-02T16:32:30.152+02:00 huawei_E3131SignalStrength
Currently I'm trying to load a JSON file from a web scrape into Python in order to search it, reorder some of the columns, remove some text such as the (\n), etc. I'm having some issues with the JSON file: pd.read_json() works (kind of), but it returns a DataFrame with a single column titled 'Default'. My current code is below and runs without errors.
I also tried the native JSON interpreter, but I receive an error because of some stylized characters.
import pandas as pd
from tkinter import filedialog

def main():
    file_path = filedialog.askopenfilename()  # let the user pick the JSON file
    df = pd.read_json(file_path)
    print(df)
The JSON file is valid and formatted like this:
{
"Default": [{
"ItemID": "11111",
"Title": "A super captivating title",
"Date": "July 22, 2019",
"URL": "www.someurl.com",
"BodyText": "some text."
}, {
"ItemID": "22222",
"Title": "Even more captivating title",
"Date": "July 12, 2019",
"URL": "www.differenturl.com",
"BodyText": "different text"
}]
}
Now I understand that "Default" is being interpreted as the JSON object, and why it is being used as the column. I experimented with several different orient values for read_json() but got more or less the same result.
I'm hoping to have ItemID, Title, Date, URL, and BodyText as the columns, with their values placed in the corresponding rows. Any help is appreciated. I couldn't find a similar question, but if it has been answered before, please point me in the right direction.
There is no read_json orient that will do it. What you need is to pass the "Default" content to the DataFrame constructor:
import json
import pandas as pd
with open('temp.txt') as fh:
    df = pd.DataFrame(json.load(fh)['Default'])
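An equivalent route, sketched here under a few assumptions, is pd.json_normalize with record_path, followed by the cleanup steps mentioned in the question. The io.StringIO object only stands in for the file on disk, the trailing newline in the first BodyText is added for illustration, and the chosen column order is arbitrary.

```python
import io
import json

import pandas as pd

# Stand-in for the open file; io.StringIO behaves like a text file handle.
fh = io.StringIO('''
{
  "Default": [
    {"ItemID": "11111", "Title": "A super captivating title",
     "Date": "July 22, 2019", "URL": "www.someurl.com",
     "BodyText": "some text.\\n"},
    {"ItemID": "22222", "Title": "Even more captivating title",
     "Date": "July 12, 2019", "URL": "www.differenturl.com",
     "BodyText": "different text"}
  ]
}
''')

raw = json.load(fh)
# record_path names the key that holds the list of row dictionaries.
df = pd.json_normalize(raw, record_path="Default")

# Replace embedded newlines with spaces, then trim surrounding whitespace.
df["BodyText"] = df["BodyText"].str.replace("\n", " ", regex=False).str.strip()

# Reorder the columns (this particular order is just an example).
df = df[["Date", "Title", "ItemID", "URL", "BodyText"]]
print(df)
```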
I'm currently working on a project that analyzes multiple data sources. The other data sources are fine, but I am having a lot of trouble with JSON and its sometimes deeply nested structure. I have tried turning the JSON into a Python dictionary, but without much luck, because that approach starts to struggle as the structure gets more complicated. For example, with this sample JSON file:
{
"Employees": [
{
"userId": "rirani",
"jobTitleName": "Developer",
"firstName": "Romin",
"lastName": "Irani",
"preferredFullName": "Romin Irani",
"employeeCode": "E1",
"region": "CA",
"phoneNumber": "408-1234567",
"emailAddress": "romin.k.irani#gmail.com"
},
{
"userId": "nirani",
"jobTitleName": "Developer",
"firstName": "Neil",
"lastName": "Irani",
"preferredFullName": "Neil Irani",
"employeeCode": "E2",
"region": "CA",
"phoneNumber": "408-1111111",
"emailAddress": "neilrirani#gmail.com"
}
]
}
After converting it to a dictionary, dict.keys() returns only "Employees".
I then switched to a pandas DataFrame, and I could achieve what I wanted by calling json_normalize(dict['Employees'], sep="_"). My problem is that the solution must work for ALL JSON files, and looking at the data beforehand is not an option, so normalizing this way will not always work. Is there some way I could write a function that would take in any JSON and convert it into a nice pandas DataFrame? I have searched for about 2 weeks for answers, but with no luck regarding my specific problem. Thanks.
I've had to do that in the past (flatten out a big nested JSON). This blog was really helpful. Would something like this work for you?
Note: as the others have stated, making this work for EVERY JSON is a tall task. I'm merely offering a way to get started if you have a wider range of JSON objects. I'm assuming they will be relatively CLOSE to what you posted as an example, hopefully with similar structures.
jsonStr = '''{
"Employees" : [
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani#gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani#gmail.com"
}]
}'''
It flattens the entire JSON into a single row, which you can then put into a DataFrame. In this case it creates 1 row with 18 columns. It then iterates through those columns, using the numeric indices within the column names to reconstruct multiple rows. If you had a differently nested JSON, I think it should theoretically work, but you'll have to test it out.
import json
import pandas as pd
import re
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
jsonObj = json.loads(jsonStr)
flat = flatten_json(jsonObj)
results = pd.DataFrame()
columns_list = list(flat.keys())
for item in columns_list:
    row_idx = re.findall(r'\_(\d+)\_', item)[0]
    column = item.replace('_' + row_idx + '_', '_')
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value
print(results)
Output:
Employees_userId ... Employees_emailAddress
0 rirani ... romin.k.irani#gmail.com
1 nirani ... neilrirani#gmail.com
[2 rows x 9 columns]
d={
"Employees" : [
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani#gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani#gmail.com"
}]
}
import pandas as pd
df=pd.DataFrame([x.values() for x in d["Employees"]],columns=d["Employees"][0].keys())
print(df)
Output
userId jobTitleName firstName ... region phoneNumber emailAddress
0 rirani Developer Romin ... CA 408-1234567 romin.k.irani#gmail.com
1 nirani Developer Neil ... CA 408-1111111 neilrirani#gmail.com
[2 rows x 9 columns]
For the particular JSON data given, my approach, which uses only the pandas package, is as follows:
import pandas as pd
# json as python's dict object
jsn = {
"Employees" : [
{
"userId":"rirani",
"jobTitleName":"Developer",
"firstName":"Romin",
"lastName":"Irani",
"preferredFullName":"Romin Irani",
"employeeCode":"E1",
"region":"CA",
"phoneNumber":"408-1234567",
"emailAddress":"romin.k.irani#gmail.com"
},
{
"userId":"nirani",
"jobTitleName":"Developer",
"firstName":"Neil",
"lastName":"Irani",
"preferredFullName":"Neil Irani",
"employeeCode":"E2",
"region":"CA",
"phoneNumber":"408-1111111",
"emailAddress":"neilrirani#gmail.com"
}]
}
# get the main key, here 'Employees' with index '0'
emp = list(jsn.keys())[0]
# when you have several keys at this level, i.e. 'Employers' for example
# .. you need to handle all of them too (your task)
# get all the sub-keys of the main key[0]
all_keys = jsn[emp][0].keys()
# build dataframe
result_df = pd.DataFrame() # init a dataframe
for key in all_keys:
    col_vals = []
    for ea in jsn[emp]:
        col_vals.append(ea[key])
    # add a new column to the dataframe using sub-key as its header
    # it is possible that the values here are nested object(s)
    # .. such as dict, list, json
    result_df[key] = col_vals
print(result_df.to_string())
Output:
userId lastName jobTitleName phoneNumber emailAddress employeeCode preferredFullName firstName region
0 rirani Irani Developer 408-1234567 romin.k.irani#gmail.com E1 Romin Irani Romin CA
1 nirani Irani Developer 408-1111111 neilrirani#gmail.com E2 Neil Irani Neil CA
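For JSON whose layout is not known in advance, one possible starting point is to search the parsed object for lists of dictionaries and normalize each one. This is a rough sketch, not a general solution: find_record_lists and json_to_frames are hypothetical helper names, the nested "contact" key in the sample is made up to show the flattening, and lists nested inside the records themselves are left as plain list cells.

```python
import pandas as pd


def find_record_lists(obj, path=()):
    """Yield (path, list) for every non-empty list of dicts in parsed JSON."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_record_lists(value, path + (key,))
    elif isinstance(obj, list) and obj and all(isinstance(x, dict) for x in obj):
        yield path, obj


def json_to_frames(obj, sep="_"):
    """Map each discovered path to a flattened DataFrame."""
    return {
        sep.join(path) or "root": pd.json_normalize(records, sep=sep)
        for path, records in find_record_lists(obj)
    }


# Shortened sample; the "contact" sub-dictionary is an invented nested field.
sample = {
    "Employees": [
        {"userId": "rirani", "firstName": "Romin", "contact": {"region": "CA"}},
        {"userId": "nirani", "firstName": "Neil", "contact": {"region": "CA"}},
    ]
}

frames = json_to_frames(sample)
print(frames["Employees"])
```

Each key of the returned dictionary is the path to one list of records, so a file with both "Employees" and, say, "Employers" would yield two separate DataFrames.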