Compare JSON with Dictionary and put Key-Value in JSON in Python - python

I have a dictionary and a JSON file. I want to check whether values from the JSON exist in the dictionary and, where they match, add the corresponding key-value pair to the JSON.
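import pandas as pd

df = pd.read_csv("Iris2.csv", encoding='ISO-8859-1')
df.head()
# Build a dict from the two-column CSV: first column -> key, second column -> value.
# .squeeze("columns") replaces the squeeze=True argument, which was removed in pandas 2.0.
dict_from_csv = pd.read_csv('Iris2.csv', encoding='ISO-8859-1', header=None, index_col=0).squeeze("columns").to_dict()
print(dict_from_csv)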
And then I read the JSON attribute
import pandas as pd

# Named json_df so it does not shadow the standard-library json module
json_df = pd.read_json(r'C:/Users/IT City/Downloads/data.json')
print(json_df)
# Expand the venue_info dicts and keep only the venue_name values
df.venue_info = pd.DataFrame(json_df.venue_info.values.tolist())['venue_name']
print(df.venue_info)
Now I have the dictionary built from the CSV file, "dict_from_csv", and the JSON attribute, "df.venue_info".
I first compared each JSON venue_name with the dictionary and got the expected matches, which gives me the "Lat" value. Now I want to add this new "Lat" attribute to the JSON file: where a venue matches, it should hold the matched value; otherwise it should be an empty attribute.
for x in df.venue_info:
    if x in dict_from_csv:
        # print(x)
        # print(dict_from_csv[x])
        # str() guards against non-string values read from the CSV
        Lat = x + ":" + str(dict_from_csv[x])
        print(Lat)
    else:
        print("Not found ")
Please help me in this regard
Thank you
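One possible way to write the matched value back is to work on the parsed JSON records directly rather than on a DataFrame. The sketch below assumes each record is an object with a "venue_info" object that contains "venue_name"; the real file may be shaped differently. The helper name add_lat and the sample data are hypothetical.

```python
import json


def add_lat(records, lat_by_venue):
    """Add a "Lat" key to each record's venue_info; empty string when there is no match."""
    for record in records:
        # setdefault attaches an empty venue_info if the record has none
        venue = record.setdefault("venue_info", {})
        name = venue.get("venue_name")
        venue["Lat"] = lat_by_venue.get(name, "")
    return records


# Hypothetical sample data standing in for data.json and dict_from_csv
records = [
    {"venue_info": {"venue_name": "Stadium A"}},
    {"venue_info": {"venue_name": "Stadium B"}},
]
lat_by_venue = {"Stadium A": "31.52"}

updated = add_lat(records, lat_by_venue)
print(json.dumps(updated, indent=2))
```

To persist the result, the updated list could be written out with json.dump to a new file, which leaves the original file untouched.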

Related

A DataFrame object does not have an attribute select

In Palantir Foundry, I am trying to read all XML files from a dataset. Then, in a for loop, I parse the XML files.
Up to the second-to-last line, the code runs fine without errors.
from transforms.api import transform, Input, Output
from transforms.verbs.dataframes import sanitize_schema_for_parquet
from bs4 import BeautifulSoup
import pandas as pd
import lxml  # parser backend used by BeautifulSoup's 'xml' mode


@transform(
    output=Output("/Spring/xx/datasets/mydataset2"),
    source_df=Input("ri.foundry.main.dataset.123"),
)
def read_xml(ctx, source_df, output):
    df = pd.DataFrame()
    filesystem = source_df.filesystem()
    hadoop_path = filesystem.hadoop_path
    files = [f"{hadoop_path}/{f.path}" for f in filesystem.ls()]
    for i in files:
        with open(i, 'r') as f:
            file = f.read()
        soup = BeautifulSoup(file, 'xml')
        data = []
        for e in soup.select('offer'):
            data.append({
                'meldezeitraum': e.find_previous('data').get('meldezeitraum'),
                'id': e.get('id'),
                'parent_id': e.get('parent_id'),
            })
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, pd.DataFrame(data)])
    output.write_dataframe(sanitize_schema_for_parquet(df))
However, as soon as I add the last line:
output.write_dataframe(sanitize_schema_for_parquet(df))
I get this error:
Missing transform attribute
A DataFrame object does not have an attribute select. Please check the spelling and/or the datatype of the object.
/transforms-python/src/myproject/datasets/mydataset.py
output.write_dataframe(sanitize_schema_for_parquet(df))
What am I doing wrong?
You have to convert your pandas DataFrame to a Spark DataFrame. Even though they share a name, they are two different object types in Python.
The easiest way to do that is:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
df_spark = spark.createDataFrame(df)
You can then pass df_spark to the output.write_dataframe() function.

ValueError: DataFrame constructor not properly called! when converting dictionaries within list to pandas dataframe

I want to convert a list of dictionaries to a pandas DataFrame; however, I got ValueError: DataFrame constructor not properly called!
Below is an example and how I got the data:
import requests
import pandas as pd
# Send an HTTP GET request to the URL
response = requests.get(url)
# response.text is the raw response body as a str; it is not parsed yet
scrapped_data = response.text
Content of response.text is:
[{"id":123456,"date":"12-12-2022","value":37},{"id":123456,"date":"13-12-2022","value":38}]
I want to convert it to a dataframe format like the following:
id      date        value
123456  12-12-2022  37
123456  13-12-2022  38
I tried the following methods:
df = pd.DataFrame(scrapped_data)
df = pd.DataFrame_from_dict(scrapped_data)
df = pd.DataFrame(scrapped_data, orient='columns')
All of them raised the same ValueError.
I also tried:
df = pd.json_normalize(scrapped_data)
but got NotImplementedError
scrapped_data is of type str.
Thanks for your help, let me know if you have any questions
One reason for receiving this error from pandas is passing a str as data. I think your data arrives as a str. If that is the case, try this:
import json
import pandas as pd

original_data = '[{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}]'
# Parse the JSON text into a list of dictionaries before building the DataFrame
scraped_data = json.loads(original_data)
df = pd.DataFrame(data=scraped_data)
df
As you said, scrapped_data is a string, so you need to parse it into Python objects first (with the loads method from the json library, for example).
Once scrapped_data = [{"id":"123456","date":"12-12-2022","value":"37"}, {"id":"123456","date":"13-12-2022","value":"38"}] is a list of dictionaries rather than a string,
you can just do df = pd.DataFrame(scrapped_data).

Removing Values from Pandas Read Excel

I am trying to read values from an Excel file and convert them to JSON to use in my API.
I am getting:
{"Names":{"0":"Tom","1":"Bill","2":"Sally","3":"Cody","4":"Betty"}}
I only want to see the values. What I would like to get is this:
{"Names":{"Tom", "Bill", "Sally", "Cody", "Betty"}}
I haven't figured out how to remove the numbers before the values.
The code I am using is as follows:
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\Names.xlsx')
json_str = df.to_json()
print(json_str)
As mentioned in the comments, your desired result is not valid JSON: braces hold key-value pairs, so a bare list of values needs square brackets.
Maybe you can do this:
import json
import pandas as pd
df = pd.read_excel(r'C:\Users\User\Desktop\Names.xlsx')
json_str = df.to_json()
temp = json.loads(json_str)
temp['Names'] = list(temp['Names'].values())
print(json.dumps(temp))
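A shorter route that may also work is to skip the intermediate JSON string and ask pandas for a column-to-list mapping with DataFrame.to_dict(orient="list"). The sketch below uses a small in-memory DataFrame as a hypothetical stand-in for the Excel sheet, assuming it has a single "Names" column.

```python
import json

import pandas as pd

# Hypothetical stand-in for pd.read_excel(...); assumes one "Names" column
df = pd.DataFrame({"Names": ["Tom", "Bill", "Sally", "Cody", "Betty"]})

# orient="list" maps each column name to a plain list of its values,
# so the row index never appears in the output
json_str = json.dumps(df.to_dict(orient="list"))
print(json_str)
```

The result is a JSON object whose "Names" value is an array, which is the closest valid JSON to the output asked for.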

How to make a DataFrame from the nested JSON dictionary

I am trying to make a DataFrame with all values from this address: https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics. But the DataFrame I get is very messy and does not provide all the information contained in the JSON dictionary. This is the code I am using:
import pandas as pd
import requests

url = 'https://www.ebi.ac.uk/pdbe/api/pisa/interfacecomponent/3gcb/0/1/energetics'
# .json() already returns parsed Python objects, so no dumps/loads round trip is needed
data = requests.get(url).json()
# pd.json_normalize is the public name since pandas 1.0; pd.io.json.json_normalize is deprecated
df = pd.json_normalize(data)
print(df)
Can someone help please?
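When a nested response contains a list of records, json_normalize usually gives a tidier table if it is pointed at that list with record_path, with parent fields carried along through meta. The payload below is entirely hypothetical (the key names "interface_id", "residues", "name" and "energy" are invented for illustration); the real API response would need to be inspected first to find the equivalent keys.

```python
import pandas as pd

# Hypothetical nested payload; the real response may be shaped differently
data = {
    "3gcb": {
        "interface_id": 1,
        "residues": [
            {"name": "ALA", "energy": -0.5},
            {"name": "GLY", "energy": 0.1},
        ],
    }
}

entry = data["3gcb"]
# One row per item in "residues"; "interface_id" is repeated on every row
df = pd.json_normalize(entry, record_path="residues", meta=["interface_id"])
print(df)
```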

Reading dictionary stored on text file and convert to pandas dataframe [duplicate]

This question already has answers here:
Pandas read nested json
(3 answers)
Closed 4 years ago.
I have a text file that contains a series of data in the form of a dictionary.
I would like to read it and store it as a DataFrame in pandas.
How should I read it?
I tried pd.read_csv, but it does not give me the DataFrame I expect.
Can anyone help me with that?
You can download the text file Here
Thanks,
Zep,
The problem is that you have nested JSON. Try using json_normalize instead:
import requests  # requests handles the HTTP request for us
import pandas as pd

id_ = '1DbfQxBJKHvWO2YlKZCmeIN4al3xG8Wq5'
url = 'https://drive.google.com/uc?authuser=0&id={}&export=download'.format(id_)
r = requests.get(url)
# pd.json_normalize is the public name since pandas 1.0; pd.io.json.json_normalize is deprecated
df = pd.json_normalize(r.json())
print(df.columns)
Or read it from disk. Note that json_normalize expects a dictionary object, not a path, so load the file first:
import pandas as pd
import json

with open('myfile.json') as f:
    # json.load returns the parsed dictionary, despite the variable name
    jsonstr = json.load(f)
df = pd.json_normalize(jsonstr)
Returns:
Index(['average.accelerations', 'average.aerialDuels', 'average.assists',
'average.attackingActions', 'average.backPasses', 'average.ballLosses',
'average.ballRecoveries', 'average.corners', 'average.crosses',
'average.dangerousOpponentHalfRecoveries',
...
'total.successfulLongPasses', 'total.successfulPasses',
'total.successfulPassesToFinalThird', 'total.successfulPenalties',
'total.successfulSmartPasses', 'total.successfulThroughPasses',
'total.successfulVerticalPasses', 'total.throughPasses',
'total.verticalPasses', 'total.yellowCards'],
dtype='object', length=171)
Another idea would be to store each nested object in a Series (and let a dictionary hold those Series).
dfs = {k: pd.Series(v) for k, v in r.json().items()}
print(dfs.keys())
# dict_keys(['average', 'seasonId', 'competitionId', 'positions', 'total', 'playerId', 'percent'])
print(dfs['percent'])
Returns:
aerialDuelsWon 23.080
defensiveDuelsWon 18.420
directFreeKicksOnTarget 0.000
duelsWon 33.470
fieldAerialDuelsWon 23.080
goalConversion 22.581
headShotsOnTarget 0.000
offensiveDuelsWon 37.250
penaltiesConversion 0.000
shotsOnTarget 41.940
...
yellowCardsPerFoul 12.500
dtype: float64
The data only has one entry though.
You can read the file as a string, parse it with json.loads(), and then use json_normalize() to convert the result to a DataFrame.
Example:
import json
import pandas as pd

# Open for reading; "w+" would truncate the file before it is read
with open("file.txt", "r") as f:
    contents = f.read()
contents = contents.replace("\n", "")
json_data = json.loads(contents)
df = pd.json_normalize(json_data)
You should have your data as a dataframe after that.
Hope this helps!
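If the text file turns out to hold a Python dict literal (single-quoted keys, for instance) rather than strict JSON, json.loads will reject it. That is only an assumption about the file, since its contents are not shown here, but in that case ast.literal_eval from the standard library can parse it safely. The sample text below is hypothetical.

```python
import ast

import pandas as pd

# Hypothetical file contents: a Python dict literal, not valid JSON
text = "{'playerId': 7, 'average': {'assists': 0.2, 'corners': 1.5}}"

# literal_eval only evaluates literals, so it does not execute arbitrary code
record = ast.literal_eval(text)

# Nested keys are flattened into dotted column names such as "average.assists"
df = pd.json_normalize(record)
print(df.columns.tolist())
```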
