JSON to CSV with Leading Zeros - python

I'm writing code to convert JSON to CSV, and I need to retain the leading zeros.
I have the file emp.json, which has numeric values in one tag (e.g. 000, 001) along with other tags.
import pandas as pd
df = pd.read_json('emp.json')
df.to_csv('test1.csv', index= False)
I get the CSV file, but the leading zeros in the column are removed.

Convert the data type to string:
import pandas as pd
df = pd.read_json('emp.json',dtype=str)
df.to_csv('test1.csv', index= False)
Another way to do it
import json
import pandas as pd
jsondata = '[{"Code":"001","Description":"Afghanistan"},{"Code":"002","Description":"Albania"}]'
jdata = json.loads(jsondata)
df = pd.DataFrame(jdata)
print (df.T)
df.to_csv('test1.csv', index= False)
Code: https://repl.it/repls/BurdensomeCompassionateCommercialsoftware

Maybe pass object as the dtype argument:
import pandas as pd
df = pd.read_json('emp.json',dtype=object)
df.to_csv('test1.csv', index= False)
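object is the dtype pandas uses for columns of Python strings, so it has the same effect here (strictly speaking it is not a synonym of str).
Or you can use str: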
import pandas as pd
df = pd.read_json('emp.json',dtype=str)
df.to_csv('test1.csv', index= False)
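The dtype suggestion in the answers above can be checked without a file on disk. This is only a minimal sketch: the inline JSON is made-up sample data standing in for emp.json, and it assumes the codes are stored as quoted strings in the JSON.

```python
from io import StringIO

import pandas as pd

# Made-up sample standing in for emp.json (codes are quoted strings)
raw = '[{"Code":"001","Description":"Afghanistan"},{"Code":"002","Description":"Albania"}]'

# dtype=str asks read_json to keep the values as text instead of inferring numbers
df = pd.read_json(StringIO(raw), dtype=str)

# With no path given, to_csv returns the CSV as a string
csv_text = df.to_csv(index=False)
print(csv_text)
```

If only one column needs protecting, read_json also accepts a dict such as dtype={"Code": str}, which leaves type inference on for the other columns.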

Related

print rows from read_csv object with conditions

I'm trying to print records from a .csv file using two conditions on two columns (Geography and Comment). It works when I put one condition on the Geography column, but it does not work when I put conditions on both the Geography and Comment columns. Is this a syntax mistake? Thanks!
Works fine:
import pandas as pd
dt = pd.read_csv("data.csv", low_memory=False)
print(dt)
print(list(dt))
geo_ont = dt[dt.Geography=="Ontario"]
print(geo_ont)
Does not Work:
import pandas as pd
dt = pd.read_csv("data.csv", low_memory=False)
print(dt)
print(list(dt))
geo_ont = dt[dt.Geography=="Ontario" & dt.Comment=="TRUE"]
print(geo_ont)
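The second version fails because & binds more tightly than ==, so each comparison needs its own parentheses. I also believe the Comment column is Boolean, so either convert it to string or test it directly as True or 1. Here is the code: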
import pandas as pd
dt = pd.read_csv("test.csv", low_memory=False)
print(dt.Comment.dtype)
geo_ont = dt[(dt.Geography=="Ontario") & (dt.Comment)]
#OR
#geo_ont = dt[(dt.Geography=="Ontario") & (dt.Comment==True)]
#OR
#geo_ont = dt[(dt.Geography=="Ontario") & (dt.Comment==1)]
print(geo_ont)
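For anyone who wants to try this without the original file, here is a small sketch of the same filter. The DataFrame contents are invented for illustration; only the column names Geography and Comment come from the question.

```python
import pandas as pd

# Invented rows standing in for data.csv
dt = pd.DataFrame({
    "Geography": ["Ontario", "Ontario", "Quebec"],
    "Comment": [True, False, True],
})

# Each comparison sits in its own parentheses because & binds tighter than ==
geo_ont = dt[(dt["Geography"] == "Ontario") & (dt["Comment"])]
print(geo_ont)

# If the column held the text "TRUE" rather than a Boolean, compare to the string
dt_text = dt.assign(Comment=dt["Comment"].map({True: "TRUE", False: "FALSE"}))
geo_ont_text = dt_text[(dt_text["Geography"] == "Ontario") & (dt_text["Comment"] == "TRUE")]
print(geo_ont_text)
```

Printing dt.Comment.dtype first, as the answer does, tells you which of the two forms applies to your data.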

Read json with meaningless keys into pandas data.frame with correct Dtype

In a project, I receive json that I need to read into a pandas data.frame.
The format looks like the one below (with more columns and rows):
{ "a;b;c;d":{
"1":"100;14/09/2020;0.5;XK3",
"2":"NA;17/09/2020;0.95;NA",
"3":"102;NA;NA;KZ2"}}
I'm able to split the strings, but my types are not what I want. Is there an automated way to convert the columns in u?
from io import StringIO
import pandas as pd
TESTDATA = StringIO("""
{ "a;b;c;d":{
"1":"100;14/09/2020;0.5;XK3",
"2":"NA;17/09/2020;0.95;NA",
"3":"102;NA;NA;KZ2"}}
""")
df = pd.read_json(TESTDATA)
df.head(10)
vnames = df.columns[0].split(';')
u = (df[df.columns[0]].str.split(';', expand=True)
.set_axis(vnames, axis=1)).convert_dtypes()
print(u.head(10))
print(u.info())
I want the Dtype to be int64, datetime64, float64, str.
You could do the following:
from io import StringIO
import pandas as pd
import numpy as np
TESTDATA = StringIO("""
{ "a;b;c;d":{
"1":"100;14/09/2020;0.5;XK3",
"2":"NA;17/09/2020;0.95;NA",
"3":"102;NA;NA;KZ2"}}
""")
df = pd.read_json(TESTDATA)
df.head(10)
vnames = df.columns[0].split(';')
u = (df[df.columns[0]].str.split(';', expand=True)
.set_axis(vnames, axis=1))
u = u.apply(lambda x: x.str.strip()).replace('NA', np.nan)
# round-trip through JSON so that pandas re-infers the column types
u = u.to_json()
u = pd.read_json(StringIO(u)).convert_dtypes()
print(u.head(10))
print(u.info())
Try explicitly typecasting the string values before creating the DataFrame, like in this example:
import json
import pandas as pd
s_src = '''{ "a;b;c;d":{
"1":"100;14/09/2020;0.5;XK3",
"2":"NA;17/09/2020;0.95;NA",
"3":"102;NA;NA;KZ2"}}'''
s = json.loads(s_src)
# per-column type conversion
typeconv = [int, pd.to_datetime, float, str]
for k1, subd in s.items():
    cols = k1.split(';')
    rows = []
    for k, v in subd.items():
        row = v.split(';')
        conv_row = []
        for cvt, r in zip(typeconv, row):
            # screen for missing values
            if r == 'NA':
                conv_row.append(None)
            else:
                # apply the conversion function for this column
                conv_row.append(cvt(r))
        rows.append(conv_row)
    df = pd.DataFrame(rows, columns=cols)
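Another option, sketched here under the assumption that the column order and intended types are known in advance (int, date in day/month/year form, float, string), is to split first and then convert each column with the pandas converters. The nullable Int64 and string dtypes are my choice for holding missing values, not something the question asked for.

```python
import pandas as pd

raw = {"a;b;c;d": {
    "1": "100;14/09/2020;0.5;XK3",
    "2": "NA;17/09/2020;0.95;NA",
    "3": "102;NA;NA;KZ2"}}

key = next(iter(raw))
cols = key.split(";")

# Split each row and turn the literal text "NA" into a real missing value
rows = [[None if x == "NA" else x for x in v.split(";")]
        for v in raw[key].values()]
u = pd.DataFrame(rows, index=list(raw[key]), columns=cols, dtype=object)

# Convert column by column; Int64 is the nullable integer dtype
u["a"] = pd.to_numeric(u["a"]).astype("Int64")
u["b"] = pd.to_datetime(u["b"], format="%d/%m/%Y")
u["c"] = pd.to_numeric(u["c"]).astype("float64")
u["d"] = u["d"].astype("string")

print(u.dtypes)
```

A plain int64 column cannot hold missing values, which is why the sketch uses Int64 for column a.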

Pandas cuts off empty columns from csv file

I have a CSV file that has columns with no content, just headers. I want them to be included in the resulting DataFrame, but pandas cuts them off by default. Is there any way to solve this using read_csv rather than read_excel?
IIUC, you need header=None:
from io import StringIO
import pandas as pd
data = """
not_header_1,not_header_2
"""
df = pd.read_csv(StringIO(data), sep=',')
print(df)
OUTPUT:
Empty DataFrame
Columns: [not_header_1, not_header_2]
Index: []
Now, with header=None
df = pd.read_csv(StringIO(data), sep=',', header=None)
print(df)
OUTPUT:
              0             1
0  not_header_1  not_header_2
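As a quick check of the case in the question, the sketch below reads a made-up CSV in which two columns have a header but no values. In this small test read_csv keeps them and fills them with NaN; the data and column names are invented.

```python
from io import StringIO

import pandas as pd

# Invented data: columns b and c have headers but no values
data = "a,b,c\n1,,\n2,,\n"

df = pd.read_csv(StringIO(data))
print(df)
print(list(df.columns))
```

If columns still go missing with a real file, it may be worth checking the delimiter and whether the header line really contains the expected separators.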

Formatting of JSON file

Can we convert the highlighted INTEGER values to STRING values (see the link below)?
https://i.stack.imgur.com/3JbLQ.png
CODE
filename = "newsample2.csv"
jsonFileName = "myjson2.json"
import pandas as pd
df = pd.read_csv ('newsample2.csv')
df.to_json('myjson2.json', indent=4)
print(df)
Try doing something like this.
import pandas as pd
filename = "newsample2.csv"
jsonFileName = "myjson2.json"
df = pd.read_csv ('newsample2.csv')
df['index'] = df.index
df.to_json('myjson2.json', indent=4)
print(df)
This will take indices of your data and store them in the index column, so they will become a part of your data.
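The screenshot is not reproduced here, so this is a guess: if the highlighted integers are values in a column rather than the index, casting that column to str before writing should make them appear as quoted strings. The column names id and name are hypothetical.

```python
import json

import pandas as pd

# Hypothetical data; "id" and "name" are made-up column names
df = pd.DataFrame({"id": [101, 102], "name": ["a", "b"]})

# Cast the integer column to strings so to_json writes quoted values
df["id"] = df["id"].astype(str)

# With no path given, to_json returns the JSON text
out = df.to_json(orient="records", indent=4)
print(out)

# Parse it back to confirm the values are now strings
parsed = json.loads(out)
print(type(parsed[0]["id"]))
```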

Unable to convert JSON file to CSV using Python

I was trying to convert the below JSON file into a csv file.
JSON file
[{
"SubmitID":1, "Worksheet":3, "UserID":65,
"Q1":"395",
"Q2":"2178",
"Q3":"2699",
"Q4":"1494"},{
"SubmitID":2, "Worksheet":3, "UserID":65,
"Q4":"1394"},{
"SubmitID":3, "Worksheet":4, "UserID":65,
"Q1":"1629",
"Q2":"1950",
"Q3":"0117",
"Q4":"1816",
"Empty":" "}]
However, my Python code below gives the error message "TypeError: Expected String or Unicode". How should I modify my program to make it work?
import json
import pandas as pd
f2 = open('temp.json')
useful_input = json.load(f2)
df=pd.read_json(useful_input)
print(df)
df.to_csv('results.csv')
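json.load() has already parsed the file into a Python list, but pd.read_json() expects a path, a file-like object, or JSON text. You just need to pass the file path as a string to pd.read_json():
df=pd.read_json("temp.json")
You don't need to use the json module.
Try: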
import pandas as pd
df=pd.read_json("temp.json")
print(df)
df.to_csv('results.csv')
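To try the path-based call end to end, the sketch below writes a shortened copy of the question's records to a temporary file and reads it back. The temporary directory and the trimmed records are only there to keep the example self-contained.

```python
import json
import os
import tempfile

import pandas as pd

# Shortened copy of the records from the question
records = [
    {"SubmitID": 1, "Worksheet": 3, "UserID": 65, "Q1": "395", "Q4": "1494"},
    {"SubmitID": 2, "Worksheet": 3, "UserID": 65, "Q4": "1394"},
    {"SubmitID": 3, "Worksheet": 4, "UserID": 65, "Q1": "1629", "Q4": "1816"},
]

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "temp.json")
    csv_path = os.path.join(tmp, "results.csv")
    with open(json_path, "w") as fh:
        json.dump(records, fh)

    # read_json takes the path itself, not the list returned by json.load
    df = pd.read_json(json_path)
    df.to_csv(csv_path, index=False)

    with open(csv_path) as fh:
        header = fh.readline().strip()

print(header)
print(df)
```

Records that lack a key, such as Q1 in the second record, end up as missing values in that column.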
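To write only selected columns, pass them via the columns argument (the column names below do not appear in the JSON above):
import pandas as pd
df = pd.read_json('data.json')
df.to_csv('data.csv', index=False, columns=['title', 'subtitle', 'date', 'description'])
The reverse direction, CSV to JSON:
import pandas as pd
df = pd.read_csv("data.csv")
df = df[df.columns[:4]]
# dropna returns a new DataFrame, so assign the result
df = df.dropna(how='all')
df.to_json('data.json', orient='records')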
