Hey so I have some hash ids in a csv file like
XbRPhe65YbC+xtgGQ8ukeZEr9xFOC4MEs9Z0wUidGSec=
XbRPhe65YbC+xtgGQ8uksrqSUJ/HhTPj1d2pL0/vuGrHM=
and I want to read them into Python, wrap each one in some additional structure like
{"id" :"XbRPshe65YbC+xtGQ8ukqR2u2btfNeNe2gtcs72QbxPA=", "timestamp":"20150831"},
and then wrap all of that in some JSON syntax. This is then sent as a POST request. The problem is that I cannot seem to make it valid JSON: everything seems to be ordered wrong and I am getting extra backslashes (\).
import os
import pandas as pd
from pprint import pprint
df=pd.read_csv('test.csv',sep=',',header=None)
df[0] = '{"id" :"' + df[0].astype(str) + '", "timestamp":"20150831"}, '
df = df[:-1] # removes last comma
test = 'hello'
data = [
    {
        "ids": [df[0]],
        "attributes": [
            {
                "name": "girl"
            },
            {
                "name": "size"
            }
        ]
    }
]
json1 = data.to_json()
print(json1)
I agree that pandas doesn't seem to be the simplest tool for the job here. The built-in libraries will work great:
import csv
import json

with open('test.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    data = {
        "ids": [{"id": row[0], "timestamp": "20150831"} for row in csvreader],
        "attributes": [
            {"name": "girl"},
            {"name": "size"}
        ]
    }
json1 = json.dumps(data)
print(json1)
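To check the result without touching the file system, the same idea can be wrapped in a small function. This is only a sketch: build_payload is a hypothetical helper name, the timestamp is the one from the question, and io.StringIO stands in for test.csv. Because the structure is built from plain dicts and lists and serialized once with json.dumps, no extra backslashes appear.

```python
import csv
import io
import json


def build_payload(csvfile, timestamp="20150831"):
    """Build the request body from a file-like object with one hash id per row."""
    reader = csv.reader(csvfile)
    return {
        # Skip blank rows so an empty trailing line does not raise IndexError
        "ids": [{"id": row[0], "timestamp": timestamp} for row in reader if row],
        "attributes": [{"name": "girl"}, {"name": "size"}],
    }


# In-memory stand-in for test.csv, using the two ids from the question
sample = io.StringIO(
    "XbRPhe65YbC+xtgGQ8ukeZEr9xFOC4MEs9Z0wUidGSec=\n"
    "XbRPhe65YbC+xtgGQ8uksrqSUJ/HhTPj1d2pL0/vuGrHM=\n"
)
payload = build_payload(sample)
body = json.dumps(payload)
print(body)
```

If the request is sent with the requests library, passing the dict itself through its json= parameter serializes it for you, so you should not call json.dumps a second time.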
I have 100 JSON files, and each one contains a dict in the following kind of format.
I would like to create a CSV file and write only
{
"label": "image",
"confidence": 1.0
}
this part into a prediction column of the CSV file, along with the JSON file name. How would I do it?
If I understand correctly, you want only the first item of the predictions list from each of several files in some directory, and then to write all of these as rows in a CSV. You can do something like:
import csv
import json
from os import listdir
from os.path import isfile, join

path = 'path/to/dir'
result = []
for file_name in listdir(path):
    full_path = join(path, file_name)
    # Skip subdirectories and anything that is not a JSON file
    if not isfile(full_path) or not file_name.endswith('.json'):
        continue
    with open(full_path, 'r') as f:
        data = json.load(f)
    first = data['predictions'][0]
    result.append([first['label'], first['confidence']])

with open('path/to/result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['label', 'confidence'])  # Comment out this line if you don't want a header row
    writer.writerows(result)
Replace 'path/to/dir' with the path of the directory containing the JSON files, and 'path/to/result.csv' with the path of the resulting CSV file.
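The question also asks for the JSON file name next to each prediction. A minimal sketch of that variant follows, assuming every file has a non-empty "predictions" list; collect_first_predictions and the file_name column are names I made up for illustration, and the demo writes two throwaway files to a temporary directory so it can run on its own.

```python
import csv
import json
import tempfile
from pathlib import Path


def collect_first_predictions(directory):
    """Return [file_name, label, confidence] rows, one per JSON file in directory."""
    rows = []
    for json_path in sorted(Path(directory).glob("*.json")):
        with open(json_path) as f:
            data = json.load(f)
        first = data["predictions"][0]  # assumes a non-empty "predictions" list
        rows.append([json_path.name, first["label"], first["confidence"]])
    return rows


# Demo with two throwaway files; real code would pass the actual directory
with tempfile.TemporaryDirectory() as tmp:
    for name, label in [("a.json", "image"), ("b.json", "empty")]:
        content = {"predictions": [{"label": label, "confidence": 1.0}]}
        Path(tmp, name).write_text(json.dumps(content))
    rows = collect_first_predictions(tmp)
    out_path = Path(tmp, "result.csv")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "label", "confidence"])
        writer.writerows(rows)
    csv_text = out_path.read_text()

print(csv_text)
```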
Assuming you already know how to read the JSON file as a dictionary in Python, you could do this:
import pandas as pd

json_data = {
    "predictions": [
        {
            "label": "empty",
            "confidence": 1.0
        },
        {
            "label": "filled",
            "confidence": 9.40968867750501e-25
        },
        {
            "label": "no-checkbox",
            "confidence": 1.7350328516351668e-28
        }
    ]
}
output_df = pd.DataFrame(json_data['predictions'])
print(output_df)

         label    confidence
0        empty  1.000000e+00
1       filled  9.409689e-25
2  no-checkbox  1.735033e-28
https://github.com/Asabeneh/30-Days-Of-Python/blob/ff24ab221faaec455b664ad5bbdc6e0de76c3caf/data/countries_data.json
How can I loop through this countries_data.json file (see link above) to get 'languages'?
I have tried:
import json
f = open("countries_data.json")
file = f.read()
# print(file)
for item in file:
    print(item)
Your setup is mostly right, but you never parsed the JSON: f.read() returns one big string, so the for loop iterates over its characters. Also there is a double space in "f = open". Passing the read mode "r" explicitly is optional, since it is the default for open.
Correct code:
import json

f = open("countries_data.json", "r")
file = json.loads(f.read())
for item in file:
    print(item)
Hope this helped, always double check your code.
You already import the json module at the beginning, so you might as well use it.
If you go to the documentation, you will see a function (json.load) that reads this file directly.
In the end you just get a list of dictionaries, so the code can be summarized as follows.
import json

with open("test/countries_data.json") as file:
    data = json.load(file)
for item in data:
    print(item["languages"])
You are missing one essential step, which is parsing the JSON data to Python datastructures.
import json

# read file
f = open("countries.json")
# parse JSON to Python datastructures
countries = json.load(f)
# now you have a list of countries
print(type(countries))
# loop through list of countries
for country in countries:
    # you can access languages with country["languages"]; JSON objects are Python dictionaries now
    print(type(country))
    for language in country["languages"]:
        print(language)
f.close()
Expected output:
<class 'list'>
<class 'dict'>
Pashto
Uzbek
Turkmen
...
You can use the json built-in package to deserialize the content of that file.
A sample of usage
import json

data = """[
{
"name": "Afghanistan",
"capital": "Kabul",
"languages": [
"Pashto",
"Uzbek",
"Turkmen"
],
"population": 27657145,
"flag": "https://restcountries.eu/data/afg.svg",
"currency": "Afghan afghani"
},
{
"name": "Åland Islands",
"capital": "Mariehamn",
"languages": [
"Swedish"
],
"population": 28875,
"flag": "https://restcountries.eu/data/ala.svg",
"currency": "Euro"
}]"""
# deserializing
print(json.loads(data))
For more complex content, have a look at JSONDecoder in the json module documentation.
EDIT:
import json

path = "countries_data.json"  # replace with the path to your file
with open(path, 'r') as fd:
    # iterate over the dictionaries
    for d in json.load(fd):
        print(d['languages'])
EDIT: extra - top 10 languages
import json
import itertools as it

path = "countries_data.json"  # replace with the path to your file
with open(path, 'r') as fd:
    text = fd.read()

# flatten the per-country language lists into one list
languages_from_file = list(it.chain(*(d['languages'] for d in json.loads(text))))
# get unique "list" of languages
languages_all = set(languages_from_file)
# count the repeated languages
languages_count = {l: languages_from_file.count(l) for l in languages_all}
# order them per descending value
top_ten_languages = sorted(languages_count.items(), key=lambda k: k[1], reverse=True)[:10]
print(top_ten_languages)
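As an alternative for the top-10 count, collections.Counter from the standard library does the tallying and the sorting in one go, which avoids calling list.count once per language. A small sketch follows; the inline sample is only a stand-in for countries_data.json (the third entry is added by me so that one language appears twice).

```python
import json
from collections import Counter

# Small inline sample in the same shape as countries_data.json
text = """[
  {"name": "Afghanistan", "languages": ["Pashto", "Uzbek", "Turkmen"]},
  {"name": "Åland Islands", "languages": ["Swedish"]},
  {"name": "Sweden", "languages": ["Swedish"]}
]"""

countries = json.loads(text)
# Counter tallies every language in a single pass over the data
language_counts = Counter(
    language for country in countries for language in country["languages"]
)
# most_common(10) returns up to ten (language, count) pairs, highest count first
top_ten_languages = language_counts.most_common(10)
print(top_ten_languages)
```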
Example: Desired output
{
    "id": "",
    "data": {
        "package": ""
    }
}
Here is the little script I have put together
import pandas as pd

df = pd.read_csv('example.csv')
df1 = df[['request', 'text']]
dfnew = df1.rename(columns={'request': 'id', 'text': 'package'})
with open('something.json', 'w') as f:
    f.write(dfnew.to_json(orient='records', lines=True))
Output I receive after running the script
{"id":"","package":}
I'll start with a mock dfnew since the code above it does not affect your problem.
If Pandas does not have a built-in method to export exactly what you want, you can manually manipulate the JSON before dumping it to file:
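import json
import pandas as pd

dfnew = pd.DataFrame({
    'id': [''],
    'package': ['']
})

with open('something.json', 'w') as f:
    # With lines=True each row becomes a separate JSON object on its own line,
    # so json.loads on the whole string only works for a single-row frame
    jsonString = dfnew.to_json(orient='records', lines=True)
    jsonObject = json.loads(jsonString)
    package = jsonObject.pop('package')
    jsonObject['data'] = {
        'package': package
    }
    json.dump(jsonObject, f, indent=4)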
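If dfnew has more than one row, parsing the whole lines=True string with json.loads fails, because it holds one JSON object per line. A sketch of a version that handles any number of rows is below. It works on a plain list of dicts, which is what dfnew.to_dict(orient='records') would give you; nest_package is a hypothetical helper name and the sample values are made up.

```python
import json


def nest_package(record):
    """Move the flat 'package' value under a 'data' key."""
    return {"id": record["id"], "data": {"package": record["package"]}}


# Stand-in for dfnew.to_dict(orient='records'); the values are made up
records = [
    {"id": "req-1", "package": "first text"},
    {"id": "req-2", "package": "second text"},
]
nested = [nest_package(r) for r in records]
# One JSON array holding every reshaped record
output = json.dumps(nested, indent=4)
print(output)
```

Writing output to something.json then gives a single valid JSON document instead of one object per line.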
I am trying to convert my CSV email list to a JSON format to mass email via an API. This is my code thus far, but I am having trouble with the output: nothing is printed in my VS Code editor.
import csv
import json

def make_json(csvFilePath, jsonFilePath):
    data = {}
    with open(csvFilePath, encoding='utf-8') as csvf:
        csvReader = csv.DictReader(csvf)
        for rows in csvReader:
            key = rows['No']
            data[key] = rows
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonf.write(json.dumps(data, indent=4))

csvFilePath = r'/data/csv-leads.csv'
jsonFilePath = r'Names.json'
make_json(csvFilePath, jsonFilePath)
Here is my desired JSON format
{
"EmailAddress": "hello#youngstowncoffeeseattle.com",
"Name": "Youngstown Coffee",
"ConsentToTrack": "Yes"
},
Here's my CSV list
No,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen & Bakery,catering#zylberschtein.com,Yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,Yes
It looks like you could use a csv.DictReader to make this easier.
If I have data.csv that looks like this:
Name,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen,catering#zylberschtein.com,yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,yes
I can convert it into JSON like this:
>>> import csv
>>> import json
>>> fd = open('data.csv')
>>> reader = csv.DictReader(fd)
>>> print(json.dumps(list(reader), indent=2))
[
  {
    "Name": "Zylberschtein's Delicatessen",
    "EmailAddress": "catering#zylberschtein.com",
    "ConsentToTrack": "yes"
  },
  {
    "Name": "Youngstown Coffee",
    "EmailAddress": "hello#youngstowncoffeeseattle.com",
    "ConsentToTrack": "yes"
  }
]
Here I've assumed the headers in the CSV can be used verbatim. I'll update this with an example if you need to modify key names (e.g. convert "No" to "Name").
If you need to rename a column, it might look more like this:
import csv
import json

with open('data.csv') as fd:
    reader = csv.DictReader(fd)
    data = []
    for row in reader:
        row['Name'] = row.pop('No')
        data.append(row)
print(json.dumps(data, indent=2))
Given this input:
No,EmailAddress,ConsentToTrack
Zylberschtein's Delicatessen,catering#zylberschtein.com,yes
Youngstown Coffee,hello#youngstowncoffeeseattle.com,yes
This will output:
[
  {
    "EmailAddress": "catering#zylberschtein.com",
    "ConsentToTrack": "yes",
    "Name": "Zylberschtein's Delicatessen"
  },
  {
    "EmailAddress": "hello#youngstowncoffeeseattle.com",
    "ConsentToTrack": "yes",
    "Name": "Youngstown Coffee"
  }
]
And to print in my editor, is it simply print(json.dumps(list(reader), indent=2))?
I'm not really familiar with your editor; print is how you generate console output in Python.
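For the email-list conversion above, if the key order of the desired format matters (EmailAddress, Name, ConsentToTrack), each dict can be built explicitly instead of renaming in place. This is a sketch under the assumption that the "No" column holds the name, as in the question's CSV; io.StringIO stands in for the real file and subscribers is just an illustrative variable name.

```python
import csv
import io
import json

# In-memory stand-in for the CSV from the question
csv_text = (
    "No,EmailAddress,ConsentToTrack\n"
    "Zylberschtein's Delicatessen & Bakery,catering#zylberschtein.com,Yes\n"
    "Youngstown Coffee,hello#youngstowncoffeeseattle.com,Yes\n"
)

subscribers = [
    {
        "EmailAddress": row["EmailAddress"],
        "Name": row["No"],  # assumption: the "No" column actually holds the name
        "ConsentToTrack": row["ConsentToTrack"],
    }
    for row in csv.DictReader(io.StringIO(csv_text))
]
print(json.dumps(subscribers, indent=2))
```

Dicts keep insertion order in Python 3.7+, so json.dumps writes the keys in the order they are listed here.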
I'm new to python.
I'm trying to extract data from a data.json file.
How can I get "File_Names" and "project_name"?
Also, how can I clean up the data? The "XX\XX\X" part is an extra string.
Desired output:
File_Names = ih/1/2/3.java
ihh/11/22/33.java.java
Project_name = android/hello
File_Names = hi/1/2/3.java
hih/11/22/33.java.java
Project_name = android/helloworld
data.json
{
    "changed": [
        {
            "prev_revision": "a09936ea19ddc9f69ed00a7929ea81234af82b95",
            "added_commits": [
                {
                    "Lines_Deleted": 28,
                    "File_Names": [
                        "1\t3\tih/1/2/3.java",
                        "1\t1\tihh/11/22/33.java.java"
                    ],
                    "Files_Modified": 8,
                    "Lines_Inserted": 90
                }
            ],
            "project_name": "android/hello"
        },
        {
            "prev_revision": "a09936ea19ddc9f69ed00a7929ea81234af82b95",
            "added_commits": [
                {
                    "Lines_Deleted": 28,
                    "File_Names": [
                        "14\t3\thi/1/2/3.java",
                        "1\t1\thih/11/22/33.java.java"
                    ],
                    "Files_Modified": 8,
                    "Lines_Inserted": 90
                }
            ],
            "project_name": "android/helloworld"
        }
    ]
}
Import json, then use json.load(open('data.json')) to read the file. It will be loaded as a nested hierarchy of Python objects (dictionaries, lists, ints, strings, floats), which you can traverse accordingly.
Here's something to spark your imagination and communicate the concept.
import json

x = json.load(open('data.json'))
for sub_dict in x['changed']:
    print('project_name', sub_dict['project_name'])
    for entry in sub_dict['added_commits']:
        print(entry['File_Names'])
You can use this approach
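import json

with open('data.json') as json_file:
    data = json.load(json_file)  # json.load reads from a file object
for item in data['changed']:
    # added_commits is a list, so read File_Names from each commit in it
    for commit in item['added_commits']:
        print(item['project_name'], commit['File_Names'])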
You can use something like this with json module
import json

f = open("file_name.json", "r")
data = f.read()
jsondata = json.loads(data)
print(jsondata)  # the whole JSON file
print(jsondata["changed"])  # the list stored under the "changed" key
print(jsondata["changed"][0])  # everything in the first entry of "changed"
f.close()
From here you can take it further with whatever elements you want from the json.
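None of the snippets above removes the extra prefix from the file names. A possible sketch follows, assuming each entry has the form number, tab, number, tab, path and that only the path is wanted. The JSON is a trimmed copy of data.json inlined so the example runs on its own, and summary is just an illustrative variable for collecting the results.

```python
import json

# Trimmed copy of data.json from the question, inlined so the sketch runs alone.
# The raw string keeps \t as a JSON escape, which json.loads turns into a tab.
raw = r"""
{"changed": [
  {"added_commits": [{"File_Names": ["1\t3\tih/1/2/3.java", "1\t1\tihh/11/22/33.java.java"]}],
   "project_name": "android/hello"},
  {"added_commits": [{"File_Names": ["14\t3\thi/1/2/3.java", "1\t1\thih/11/22/33.java.java"]}],
   "project_name": "android/helloworld"}
]}
"""

data = json.loads(raw)
summary = []
for project in data["changed"]:
    file_names = [
        # Split on tabs and keep only the last field, which is the path
        entry.split("\t")[-1]
        for commit in project["added_commits"]
        for entry in commit["File_Names"]
    ]
    summary.append((project["project_name"], file_names))
    print("File_Names =", "\n".join(file_names))
    print("Project_name =", project["project_name"])
```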