pandas - read json file with only a list - python

I have a .json file that was saved as a plain list (I guess it is not proper JSON),
as follows:
users.json:
[ "user1", "user2" ]
I would like to read it into a pandas DataFrame. I tried different values for the orient argument, for example:
import pandas as pd
nodes = pd.read_json('users.json', orient='split')
I would like the results to look like this:
desired_df = pd.DataFrame({'col1': ["user1", "user2"]})
The closest SO question I found was this one.
Any help would be great. Thanks in advance!

The code below creates the DataFrame for you.
By the way, the file is valid JSON: a top-level array is allowed.
import pandas as pd
import json
with open('users.json') as f:
    data = json.load(f)
desired_df = pd.DataFrame({'col1':data})
print(desired_df)
Output:
    col1
0  user1
1  user2
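For a self-contained check of this approach, here is a minimal sketch. It assumes nothing beyond the two-user file from the question, which it recreates in a temporary directory so the snippet can run anywhere; the names users_path and users_df are only illustrative.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Assumption: recreate the question's users.json in a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
users_path = tmp_dir / "users.json"
users_path.write_text('[ "user1", "user2" ]', encoding="utf-8")

# A top-level JSON array loads as a plain Python list.
with open(users_path, encoding="utf-8") as f:
    users = json.load(f)

# One flat list becomes one named column.
users_df = pd.DataFrame(users, columns=["col1"])
print(users_df)
```

Passing columns=["col1"] is equivalent to the dict form used above; either way the list ends up as a single column.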

Related

How to read a json data into a dataframe using pandas

I have json data which is in the structure below:
{"Text1": 4, "Text2": 1, "TextN": 123}
I want to read the JSON file into a DataFrame in which each key-value pair is a row, with the headers "Sentence" and "Label". I tried lines=True, but it returns all the key-value pairs in one row.
data_df = pd.read_json(PATH_TO_DATA, lines = True)
What is the correct way to load such json data?
You can use:
import json
import pandas as pd
with open('json_example.json') as json_data:
    data = json.load(json_data)
df = pd.DataFrame.from_dict(data, orient='index').reset_index().rename(columns={'index': 'Sentence', 0: 'Label'})
Easy way that I remember
import pandas as pd
import json
with open("./data.json", "r") as f:
    data = json.load(f)
# list() turns the dict views into plain lists before building the frame
df = pd.DataFrame({"Sentence": list(data.keys()), "Label": list(data.values())})
With read_json
To read straight from the file using read_json, you can use something like:
pd.read_json("./data.json", lines=True)\
.T\
.reset_index()\
.rename(columns={"index": "Sentence", 0: "Label"})
Explanation
A little dirty, but as you probably noticed, lines=True alone isn't sufficient: it puts all the key-value pairs in a single row. The code above therefore transposes the result so that you have
(index)  0
Text1    4
Text2    1
TextN    123
Resetting the index then moves it into a column named "index", and the rename gives both columns their final names.
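As another option, a small sketch that skips the transpose step: it builds the rows directly from the dictionary's items. The JSON is inlined here as a string (an assumption made so the snippet is self-contained); with a real file you would use json.load on the open file instead.

```python
import json

import pandas as pd

# Assumption: the question's JSON, inlined instead of read from disk.
raw = '{"Text1": 4, "Text2": 1, "TextN": 123}'
data = json.loads(raw)

# Each (key, value) pair becomes one row with two named columns.
pairs_df = pd.DataFrame(list(data.items()), columns=["Sentence", "Label"])
print(pairs_df)
```

This keeps the key order of the file and avoids having to rename an "index" column afterwards.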

How to convert list data to xml using python

I have been trying to convert list data to an XML file,
but I get the error below: ValueError: Invalid tag name '0'
This is my header : 'Name,Job Description,Course'
Code:
import pandas as pd
lst = [ 'Name,Job Description,Course' ,
'Bob,Backend Developer,MCA',
'Raj,Business Analyst,BMS',
'Alice,FullStack Developer,CS' ]
df = pd.DataFrame(lst)
with open('output.xml', 'w') as myfile:
    myfile.write(df.to_xml())
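The DataFrame you created is not in the shape to_xml needs.
Each list element is a single string, so 'Name,Job Description,Course' is treated as one value rather than as a header. The frame ends up with a single column named 0, and saving it to XML fails because 0 is not a valid tag name.
To save a DataFrame as XML, the column names become tags, so they must be valid tag names (for example, no spaces).
The solution below works. I hope this is what you are trying to achieve.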
import pandas as pd
lst = [ ['Name','Job_Description','Course'] ,
['Bob','Backend Developer','MCA'],
['Raj','Business Analyst','BMS'],
['Alice','FullStack Developer','CS'] ]
df = pd.DataFrame(lst[1:], columns=lst[0])
print(df)
df.to_xml('./output.xml')
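If the data has to stay in the original comma-separated form, a sketch along these lines may help: it splits each string on commas and replaces spaces in the header so every column name is a valid XML tag. It assumes no field contains a comma of its own, and it passes parser="etree" (available in pandas 1.3 and later) so the snippet does not depend on lxml being installed.

```python
import pandas as pd

lst = [
    'Name,Job Description,Course',
    'Bob,Backend Developer,MCA',
    'Raj,Business Analyst,BMS',
    'Alice,FullStack Developer,CS',
]

# Assumption: no field contains a comma, so a plain split is enough.
rows = [line.split(',') for line in lst]

# XML tag names cannot contain spaces, so sanitize the header row.
header = [name.replace(' ', '_') for name in rows[0]]

df = pd.DataFrame(rows[1:], columns=header)

# With no path given, to_xml returns the document as a string.
xml_text = df.to_xml(index=False, parser='etree')
print(xml_text)
```

For input with quoted fields or embedded commas, the csv module would be a safer way to split the lines.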

how to insert data from list into excel in python

For example, I exported this data from a log file:
data= ["101","am1","123450","2015-01-01 11:19:00","test1 test1".....]
["102","am2","123451","2015-01-01 11:20:00","test2 test3".....]
["103","am3","123452","2015-01-01 11:21:00","test3 test3".....]
Output result (screenshot): https://i.stack.imgur.com/7uTOE.png
The pandas module has a DataFrame.to_excel() method that does that.
import pandas as pd
data= [["101","am1","123450","2015-01-01 11:19:00","test1 test1"],
["102","am2","123451","2015-01-01 11:20:00","test2 test3"],
["103","am3","123452","2015-01-01 11:21:00","test3 test3"]]
df = pd.DataFrame(data)
df.to_excel('my_data.xlsx')
That should do it.

Removing Values from Pandas Read Excel

I am trying to read values from an Excel file and convert them to JSON to use in my API.
I am getting:
{"Names":{"0":"Tom","1":"Bill","2":"Sally","3":"Cody","4":"Betty"}}
I only want to see the values. What I would like to get is this:
{"Names":{"Tom", "Bill", "Sally", "Cody", "Betty"}}
I haven't figured out how to remove the numbers before the values.
The code I am using is as follows:
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\Names.xlsx')
json_str = df.to_json()
print(json_str)
As mentioned in the comments, your desired result is not valid JSON.
Maybe you can do this instead:
import json
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\Names.xlsx')
json_str = df.to_json()
temp = json.loads(json_str)
temp['Names'] = list(temp['Names'].values())
print(json.dumps(temp))
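A shorter route to the same list-valued result, sketched under the assumption that the sheet has a single Names column: DataFrame.to_dict(orient="list") drops the row labels before serializing. The frame is built inline here to stand in for the read_excel call, so the snippet runs without the Excel file.

```python
import json

import pandas as pd

# Assumption: stand-in for pd.read_excel(...) with the question's names.
df = pd.DataFrame({"Names": ["Tom", "Bill", "Sally", "Cody", "Betty"]})

# orient="list" maps each column name to a plain list of its values.
json_str = json.dumps(df.to_dict(orient="list"))
print(json_str)
```

The values end up in a JSON array (square brackets), which is the valid counterpart of the brace-delimited form asked for in the question.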

Reading dictionary stored on text file and convert to pandas dataframe [duplicate]

This question already has answers here:
Pandas read nested json
(3 answers)
Closed 4 years ago.
I have a text file that contains a series of data in the form of a dictionary.
I would like to read it and store it as a DataFrame in pandas.
How would I read it?
I tried pd.read_csv, but it does not give me the DataFrame.
Can anyone help me with that?
You can download the text file Here
Thanks,
Zep,
The problem is you have a nested json. Try using json_normalize instead:
import requests #<-- requests library helps us handle http-requests
import pandas as pd
id_ = '1DbfQxBJKHvWO2YlKZCmeIN4al3xG8Wq5'
url = 'https://drive.google.com/uc?authuser=0&id={}&export=download'.format(id_)
r = requests.get(url)
df = pd.json_normalize(r.json())  # pd.io.json.json_normalize is the deprecated pre-1.0 spelling
print(df.columns)
Or read it from the hard drive. Note that json_normalize expects a dictionary object, not a path, so load the file first:
import pandas as pd
import json
with open('myfile.json') as f:
    jsonstr = json.load(f)
df = pd.json_normalize(jsonstr)
Returns:
Index(['average.accelerations', 'average.aerialDuels', 'average.assists',
'average.attackingActions', 'average.backPasses', 'average.ballLosses',
'average.ballRecoveries', 'average.corners', 'average.crosses',
'average.dangerousOpponentHalfRecoveries',
...
'total.successfulLongPasses', 'total.successfulPasses',
'total.successfulPassesToFinalThird', 'total.successfulPenalties',
'total.successfulSmartPasses', 'total.successfulThroughPasses',
'total.successfulVerticalPasses', 'total.throughPasses',
'total.verticalPasses', 'total.yellowCards'],
dtype='object', length=171)
Another idea would be to store each nested object in a Series (and let a dictionary hold those Series).
dfs = {k: pd.Series(v) for k,v in r.json().items()}
print(dfs.keys())
# dict_keys(['average', 'seasonId', 'competitionId', 'positions', 'total', 'playerId', 'percent'])
print(dfs['percent'])
Returns:
aerialDuelsWon 23.080
defensiveDuelsWon 18.420
directFreeKicksOnTarget 0.000
duelsWon 33.470
fieldAerialDuelsWon 23.080
goalConversion 22.581
headShotsOnTarget 0.000
offensiveDuelsWon 37.250
penaltiesConversion 0.000
shotsOnTarget 41.940
...
yellowCardsPerFoul 12.500
dtype: float64
The data only has one entry though.
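To see what the flattening does without downloading the file, here is a small sketch on a made-up record. The field names echo a few of the columns listed above, but the values are invented for illustration and are not taken from the real data.

```python
import pandas as pd

# Hypothetical record: names mirror the question's data, values are invented.
record = {
    "playerId": 1,
    "average": {"assists": 0.5, "corners": 1.2},
    "total": {"assists": 3, "yellowCards": 2},
}

# Nested keys are joined with "." to form the flat column names.
flat_df = pd.json_normalize(record)
print(sorted(flat_df.columns))
```

A single dictionary produces a single row, which matches the remark that the data only has one entry.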
You can read the file as a string, parse it as JSON, and then use pandas' json_normalize to convert the result to a DataFrame.
Example:
import json
import pandas as pd
# open for reading; "w+" would truncate the file before it is read
with open("file.txt", "r") as f:
    contents = f.read()
contents = contents.replace("\n", "")
json_data = json.loads(contents)  # parse once; json_data is already a dict
df = pd.json_normalize(json_data)
You should have your data as a dataframe after that.
Hope this helps!
