Scraped json data want to output CSV file - python

So I have this data that I scraped
[
{
"id": 4321069,
"points": 52535,
"name": "Dennis",
"avatar": "",
"leaderboardPosition": 1,
"rank": ""
},
{
"id": 9281450,
"points": 40930,
"name": "Dinh",
"avatar": "https://uploads-us-west-2.insided.com/koodo-en/icon/90x90/aeaf8cc1-65b2-4d07-a838-1f078bbd2b60.png",
"leaderboardPosition": 2,
"rank": ""
},
{
"id": 1087209,
"points": 26053,
"name": "Sophia",
"avatar": "https://uploads-us-west-2.insided.com/koodo-en/icon/90x90/c3e9ffb1-df72-46e8-9cd5-c66a000e98fa.png",
"leaderboardPosition": 3,
"rank": ""
And so on... it is a big leaderboard of 20 people.
I scraped it with this code:
import json
import requests
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
# print for all time:
data = requests.get(url_all_time).json()
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
for item in data:
    print(item['name'], item['points'])
And I want to be able to create a table that resembles this
Every time I scrape data, I want it to update the table with the number of points, with the new date stamped as the header. So basically what I was thinking is that my index = usernames and the header = date. The problem is, I can't even manage to make a CSV file with those NAME/POINTS columns.
The only thing I have succeeded in doing so far is writing ALL the data into a CSV file. I haven't been able to pick out just the data I want, like in the print command.
EDIT: After reading what @Shijith posted, I succeeded in transferring the data to .csv, but with what I have in mind (adding more data as time goes by), I was asking myself whether I should write the code with an index or without.
WITH
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, index=['name'], columns=['points','name'])
table.to_csv('products.csv', index=True, encoding='utf-8')
WITHOUT
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
data = pd.read_json(url_all_time)
table = pd.DataFrame.from_records(data, columns=['points','name'])
table.to_csv('products.csv', index=False, encoding='utf-8')

Have you tried just reading the json directly into a pandas dataframe? From here it should be pretty easy to transform it like you want. You could add a column for today's date and pivot it.
import pandas as pd
url_all_time = 'https://community.koodomobile.com/widget/pointsLeaderboard?period=allTime&maxResults=20&excludeRoles='
df = pd.read_json(url_all_time)
df['date'] = pd.Timestamp.today().strftime('%m-%d-%Y')
df.pivot(index='name', columns='date', values='points')
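To keep adding one dated column per scrape, one option is to store the pivoted table as a CSV indexed by name and join each new scrape onto it. The following is only a sketch under assumptions: the `add_snapshot` helper, the sample records and the date labels are made up for illustration, and an in-memory buffer stands in for the CSV file on disk.

```python
import io
import pandas as pd

def add_snapshot(history, records, date_label):
    """Return history with one more column of points, labelled by date.

    history    -- DataFrame indexed by name (empty on the first run)
    records    -- list of dicts with at least 'name' and 'points' keys
    date_label -- string used as the new column header
    """
    snapshot = pd.DataFrame(records).set_index("name")["points"].rename(date_label)
    if history.empty:
        return snapshot.to_frame()
    # Outer join keeps users who drop off or newly enter the leaderboard
    return history.join(snapshot, how="outer")

# Hypothetical data standing in for two separate scrapes
day1 = [{"name": "Dennis", "points": 52535}, {"name": "Dinh", "points": 40930}]
day2 = [{"name": "Dennis", "points": 52600}, {"name": "Sophia", "points": 26053}]

table = add_snapshot(pd.DataFrame(), day1, "01-01-2021")
table = add_snapshot(table, day2, "01-02-2021")

# Round-trip through CSV text, as would happen between two runs
buffer = io.StringIO()
table.to_csv(buffer, index_label="name")
buffer.seek(0)
restored = pd.read_csv(buffer, index_col="name")
print(restored)
```

Users missing from a given scrape end up with an empty cell for that date. Reusing a date label that is already a column would make the join fail, so this assumes at most one run per label.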

Related

Programmatically create tables in Python

Here is an example Data Frame
import pandas as pd
from prettytable import PrettyTable
df = pd.DataFrame()
df["name"] = ["Nick","Bob", "George", "Jason","Death"]
df["Restaurant Manager"] = ["Sam","Mason", "Sam", "Mason","Mason"]
df["Score"] = [1,5, 7, 2,10]
df['Percentile Rank'] = [0,50,80,20,100]
df["Restaurant Name"] = "Elise"
What I am trying to do is to recreate this table (see screenshot) that we have in excel since we are in the process of trying to automate our reporting system.
I managed to get something a bit similar to this but I am stuck... Here is the following code:
tb1 = PrettyTable()
# Add headers
column_names = ["Rank", "Employee Name", "Score", "Percentile"]
# Add columns
tb1.add_column(column_names[0], [1, 2])
tb1.add_column(column_names[1], ["Nick", "George"])
tb1.add_column(column_names[2], [1, 7])
tb1.add_column(column_names[3], [0, 80])
tb1.title = "Elise"
print(tb1)
Not only would I like to replicate the table fully as in the image shared above, but I would also like to create multiple similar tables, one for each restaurant name, and place them side by side if possible.
Create it from a dictionary:
df = pd.DataFrame({
"name": ["Nick","Bob", "George", "Jason","Death"],
"Restaurant Manager": ["Sam","Mason", "Sam", "Mason","Mason"],
"Score": [1,5, 7, 2,10],
'Percentile Rank': [0,50,80,20,100],
"Restaurant Name": ["Elise", "Elise", "Elise", "Elise", "Elise"]
})

How to read first array object from JSON using Python?

I have a JSON file that I need to convert into CSV. The file contains a JSON object whose "leads" key holds an array, and all my attributes are inside that array. The code I am trying turns each object in the array into a single value, but what I actually want is all the attributes from those JSON objects.
JSON file content
{
"leads": [
{
"id": "31Y2V29CH0X82",
"product_type": "prelist"
},
{
"id": "2N649TAJBA50Z",
"product_type": "prelist"
}
],
"has_next_page": true,
"next_cursor": "2022-07-27T20:02:13.856000-07:00"
}
Python code
import pandas as pd
df = pd.read_json (r'C:\Users\Ron\Desktop\Test\Product_List.json')
df.to_csv (r'C:\Users\Ron\Desktop\Test\New_Products.csv', index = None)
The output I am getting is as following
And the output I want
I want the attributes as CSV content with headers?
I think you'll have to do this row by row.
import pandas as pd
data = {"leads": [{"id": "31Y2V29CH0X82", "product_type": "prelist"}, {"id": "2N649TAJBA50Z", "product_type": "prelist"}], "has_next_page": True,
        "next_cursor": "2022-07-27T20:02:13.856000-07:00"}
# Everything except "leads" is repeated on every row
headers = data.copy()
del headers['leads']
rows = []
for row in data['leads']:
    row.update(headers)
    rows.append(row)
df = pd.DataFrame(rows)
print(df)
Output:
              id product_type  has_next_page                       next_cursor
0  31Y2V29CH0X82      prelist           True  2022-07-27T20:02:13.856000-07:00
1  2N649TAJBA50Z      prelist           True  2022-07-27T20:02:13.856000-07:00
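If only the attributes inside "leads" are wanted as CSV columns, the standard library may be enough. This is a minimal sketch, assuming every lead has the same keys. It writes to an in-memory buffer instead of the file paths from the question.

```python
import csv
import io
import json

raw = '''
{
  "leads": [
    {"id": "31Y2V29CH0X82", "product_type": "prelist"},
    {"id": "2N649TAJBA50Z", "product_type": "prelist"}
  ],
  "has_next_page": true,
  "next_cursor": "2022-07-27T20:02:13.856000-07:00"
}
'''

leads = json.loads(raw)["leads"]  # the list of row dicts

# Use the keys of the first lead as the CSV header
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(leads[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(leads)

csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, the buffer can be swapped for `open(path, "w", newline="")`.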

JSON format to Data Frame in Python

I am reading API data from a cloud server in JSON format, as shown here
How do I write code to store this data as a data frame in a database? How do I convert the JSON format to a DataFrame?
The required output format is shown in table2
Here is an example:
from requests import request
import json
import pandas as pd
response = request(url="http://api.open-notify.org/astros.json", method='get')# API source
data=json.loads(response.text)['people']# pick the 'people' data source from json
pd.DataFrame(data) # convert to pandas dataframe
let me know if it works.
You can try this one, tell me if it works!
import requests
import pandas as pd
response = requests.get('http://google.com')  # Assuming the url
res = response.json()  # parse the JSON body of the response
df = pd.DataFrame(res)
Good luck, mate!
check out the docs here
import pandas as pd
df = pd.read_json("http://some.com/blah.json")
and as for storing it to a database you will need to know some things about your database connection. docs here
tablename = "my_tablename"
connection_values = <your sql alchemy connection here>
df.to_sql(name=tablename, con=connection_values)
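As a concrete but hedged illustration of the `to_sql` step, the sketch below uses an in-memory SQLite database from the standard library in place of a SQLAlchemy connection. The rows in the DataFrame are made up, and the table name is the same placeholder as above.

```python
import sqlite3
import pandas as pd

# Made-up rows standing in for the data read from the API
df = pd.DataFrame({"name": ["Alice", "Bob"], "craft": ["ISS", "ISS"]})

# An in-memory SQLite database; a real setup would use its own connection
con = sqlite3.connect(":memory:")
df.to_sql(name="my_tablename", con=con, index=False, if_exists="replace")

# Read the rows back to check what was stored
rows = con.execute("SELECT name, craft FROM my_tablename ORDER BY name").fetchall()
print(rows)
con.close()
```

`index=False` keeps the DataFrame index out of the table, and `if_exists="replace"` overwrites an existing table of the same name. Whether that is the right behaviour depends on the database in question.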
To help others also, here is an example with nested json. I have tried to make this similar to the example you showed in your question.
import json
import pandas as pd
jsondata = '''
{
"source": { "id": "2480300" },
"time": "2013-07-02T16:32:30.152+02:00",
"type": "huawei_E3131SignalStrength",
"c8y_SignalStrength": {
"rssi": { "value": -53, "unit": "dBm" },
"ber": { "value": 0.14, "unit": "%" }
}
}
'''
data = json.loads(jsondata)  # load
df = pd.json_normalize(data)  # normalise (pd.json_normalize needs pandas 1.0 or later)
df
Result (column order may differ between pandas versions):
c8y_SignalStrength.ber.unit c8y_SignalStrength.ber.value c8y_SignalStrength.rssi.unit c8y_SignalStrength.rssi.value source.id time type
% 0.14 dBm -53 2480300 2013-07-02T16:32:30.152+02:00 huawei_E3131SignalStrength

pd.read_json() returning dataframe with 1 column

Currently I'm trying to load a JSON file from a web scrape into Python in order to search it, reorder some of the columns, remove some text such as the (\n), etc. I'm having some issues with the JSON file: pd.read_json() works (kind of). It returns a dataframe with 1 column titled 'Default'. My current code is below and runs without errors.
I tried the native JSON interpreter, but I receive an error due to some stylized characters.
from tkinter import filedialog  # assumed: filedialog is tkinter's file dialog
import pandas as pd
def main():
    file_path = filedialog.askopenfilename()
    df = pd.read_json(file_path)
    print(df)
Json file is valid and formatted as so:
{
"Default": [{
"ItemID": "11111",
"Title": "A super captivating title",
"Date": "July 22, 2019",
"URL": "www.someurl.com",
"BodyText": "some text."
}, {
"ItemID": "22222",
"Title": "Even more captivating title",
"Date": "July 12, 2019",
"URL": "www.differenturl.com",
"BodyText": "different text"
}]
}
Now I understand that the "Default" is being interpreted as the JSON object and why it's using it as the column. I experimented with several different orients of the read_json() but received more or less the same result.
I'm hoping to have ItemID, Title, Date, URL, and BodyText be the columns and their values being appropriately designated into rows. Any help is appreciated, I couldn't find a similar question but if it has been answered before please point me in the right direction.
There is no read_json orient that will do it. What you need is to pass the "Default" content to the DataFrame constructor:
import json
import pandas as pd
with open('temp.txt') as fh:
    df = pd.DataFrame(json.load(fh)['Default'])
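Once the "Default" list is in a DataFrame, the cleanup mentioned in the question (reordering columns, removing "\n") could look roughly like this. The sketch parses the JSON from a string instead of a file, and the newline characters in BodyText are an assumption added to the sample so there is something to strip.

```python
import json
import pandas as pd

# Sample from the question, with "\n" added to BodyText for illustration
raw = '''
{
  "Default": [
    {"ItemID": "11111", "Title": "A super captivating title", "Date": "July 22, 2019",
     "URL": "www.someurl.com", "BodyText": "some text.\\n"},
    {"ItemID": "22222", "Title": "Even more captivating title", "Date": "July 12, 2019",
     "URL": "www.differenturl.com", "BodyText": "different\\ntext"}
  ]
}
'''

df = pd.DataFrame(json.loads(raw)["Default"])

# Replace embedded newlines with spaces, then trim leading/trailing whitespace
df["BodyText"] = df["BodyText"].str.replace("\n", " ", regex=False).str.strip()

# Reorder the columns by selecting them in the desired order
df = df[["ItemID", "Date", "Title", "URL", "BodyText"]]
print(df)
```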

Generate csv from nested json python

I have the following nested JSON file, which I need to convert into a pandas dataframe. The main problem is that there is only one unique item in the whole JSON and it is very deeply nested.
I tried to solve this problem with the code further below, but it gives repeating output.
[{
"questions": [{
"key": "years-age",
"responseKey": null,
"responseText": "27",
"responseKeys": null
},
{
"key": "gender",
"responseKey": "male",
"responseText": null,
"responseKeys": null
}
],
"transactions": [{
"accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
"tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
"catId": "21001000",
"tType": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
"name": "Online Transfer FROM CHECKING 1200454623",
"category": [
"Transfer",
"Acc Transfer"
]
}
],
"institutions": [{
"InstName": "Citizens company",
"InstId": "inst_1",
"accounts": [{
"pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
"pAccType": "depo",
"pAccSubtype": "check",
"_id": "5ad38837e806efaa90da4849"
}]
}]
}]
I need to convert this to a pandas dataframe as follows:
id                        pAccId                                 tId
5ad38837e806efaa90da4849  v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ  80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53
The main problem I am facing is with the "id", as it is very deeply nested and is the only unique key in the JSON.
Here is my code:
import pandas as pd
import json
with open('sub.json') as f:
    data = json.load(f)
csv = ''
for k in data:
    for t in k.get("institutions"):
        csv += k['institutions'][0]['accounts'][0]['_id']
        csv += "\t"
        csv += k['institutions'][0]['accounts'][0]['pAccId']
        csv += "\t"
        csv += k['transactions'][0]['tId']
        csv += "\t"
        csv += "\n"
text_file = open("new_sub.csv", "w")
text_file.write(csv)
text_file.close()
I hope the above code makes sense, as I am new to Python.
Read the JSON file and create a dictionary that maps each account's pAccId to the account.
Build the list of transactions as well.
import json
with open('sub.json', 'r') as file:
    records = json.load(file)
# pAccId -> account, across all records and institutions
accounts = {
    account['pAccId']: account
    for record in records
    for institution in record['institutions']
    for account in institution['accounts']
}
# Lazily yields every transaction of every record
transactions = (
    transaction
    for record in records
    for transaction in record['transactions']
)
Open a CSV file. For each transaction, look up its account in the accounts dictionary.
with open('new_sub.csv', 'w') as file:
    # No spaces after the commas, so read_csv gets clean column names
    file.write('id,pAccId,tId\n')
    for transaction in transactions:
        pAccId = transaction['accId']
        account = accounts[pAccId]
        _id = account['_id']
        tId = transaction['tId']
        file.write(f"{_id},{pAccId},{tId}\n")
Finally, read the CSV file into a pandas.DataFrame.
import pandas as pd
df = pd.read_csv('new_sub.csv')
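The same lookup can also be done with the standard library `csv` module, which takes care of delimiters and quoting. The sketch below is one possible variant, not the only way to do it: it uses a trimmed copy of the record from the question instead of reading sub.json, silently skips transactions whose account is unknown (an assumption), and writes to an in-memory buffer.

```python
import csv
import io

# Trimmed copy of the record from the question (only the keys used here)
records = [{
    "transactions": [{
        "accId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
        "tId": "80o4V19Kd9SqqN80qDXZuoov4rDob8crDaE53",
    }],
    "institutions": [{
        "InstId": "inst_1",
        "accounts": [{
            "pAccId": "v1BN3o9Qy9izz4Jdz0M6C44Oga0qjohkOV3EJ",
            "_id": "5ad38837e806efaa90da4849",
        }],
    }],
}]

# pAccId -> account, as in the answer above
accounts = {
    account["pAccId"]: account
    for record in records
    for institution in record["institutions"]
    for account in institution["accounts"]
}

# One output row per transaction that has a known account
rows = [
    {"id": accounts[t["accId"]]["_id"], "pAccId": t["accId"], "tId": t["tId"]}
    for record in records
    for t in record["transactions"]
    if t["accId"] in accounts
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "pAccId", "tId"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If the goal is only a DataFrame, `rows` can be passed directly to `pd.DataFrame(rows)` without going through a file.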
