Pandas to_json in separate lines - python

How can I get JSON output from pandas where each row is separated by a new line? For example, if I have a dataframe like:
import pandas as pd
data = [{'a': 1, 'b': 2},
{'a': 3, 'b': 4}]
df = pd.DataFrame(data)
print("First:\n", df.to_json(orient="records"))
print("Second:\n", df.to_json(orient="records", lines=True))
Output:
First:
[{"a":1,"b":2},{"a":3,"b":4}]
Second:
{"a":1,"b":2}
{"a":3,"b":4}
But I really want an output like so:
[{"a":1,"b":2},
{"a":3,"b":4}]
or
[
{"a":1,"b":2},
{"a":3,"b":4}
]
I really just want each record on its own line, but still valid JSON that can be read back. I know I can use to_json with lines=True and then split on newlines and .join, but I'm wondering if there is a more straightforward/faster solution using just pandas.
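For reference, the split/join workaround mentioned above might look like this (a small sketch built on the question's df, not a pandas-only solution):

# join the lines=True output with ",\n" and wrap it in brackets
body = ",\n".join(df.to_json(orient="records", lines=True).splitlines())
print("[" + body + "]")
# [{"a":1,"b":2},
# {"a":3,"b":4}]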

Here you go:
import json
list(json.loads(df.to_json(orient="index")).values())
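Note that this gives you a Python list of dicts rather than a JSON string. If the goal is the newline-separated but still valid JSON text from the question, a possible follow-up (an assumption about what you want to do with that list):

# continuing from the snippet above; json is already imported
records = list(json.loads(df.to_json(orient="index")).values())
print("[" + ",\n".join(json.dumps(r) for r in records) + "]")
# [{"a": 1, "b": 2},
# {"a": 3, "b": 4}]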

Use the indent parameter:
import pandas as pd
data = [{'a': 1, 'b': 2},
{'a': 3, 'b': 4}]
df = pd.DataFrame(data)
print(df.to_json(orient="records", indent=1))
# output:
[
 {
  "a":1,
  "b":2
 },
 {
  "a":3,
  "b":4
 }
]

Why don't you just add the brackets? One caveat: with lines=True the records are only newline-separated, so you also need to insert commas before the newlines for the result to remain valid JSON:
body = df.to_json(orient="records", lines=True).strip().replace("\n", ",\n")
print(f"First:\n[{body}]")
print(f"Second:\n[\n{body}\n]")

Related

ParserError: unable to convert txt file to df due to json format and delimiter being the same

I'm fairly new to dealing with .txt files that have a dictionary within them. I'm trying to pd.read_csv and create a dataframe in pandas. I get thrown the error Error tokenizing data. C error: Expected 4 fields in line 2, saw 11. I believe I found the root problem: the file is difficult to read because each row contains a dict whose key-value pairs are separated by commas, which in this case is also the delimiter.
Data (store.txt)
id,name,storeid,report
11,JohnSmith,3221-123-555,{"Source":"online","FileFormat":0,"Isonline":true,"comment":"NAN","itemtrack":"110", "info": {"haircolor":"black", "age":53}, "itemsboughtid":[],"stolenitem":[{"item":"candy","code":1},{"item":"candy","code":1}]}
35,BillyDan,3221-123-555,{"Source":"letter","FileFormat":0,"Isonline":false,"comment":"this is the best store, hands down and i will surely be back...","itemtrack":"110", "info": {"haircolor":"black", "age":21},"itemsboughtid":[1,42,465,5],"stolenitem":[{"item":"shoe","code":2}]}
64,NickWalker,3221-123-555, {"Source":"letter","FileFormat":0,"Isonline":false, "comment":"we need this area to be fixed, so much stuff is everywhere and i do not like this one bit at all, never again...","itemtrack":"110", "info": {"haircolor":"red", "age":22},"itemsboughtid":[1,2],"stolenitem":[{"item":"sweater","code":11},{"item":"mask","code":221},{"item":"jack,jill","code":001}]}
How would I read this csv file and create new columns based on the key-value pairs? In addition, what if there are more key-value pairs in other data... for example > 11 keys within the dictionary.
Is there an efficient way of creating a df from the example above?
My code when trying to read as csv:
df = pd.read_csv('store.txt', header=None)
I tried to import json and use a converter, but it did not work and converted all the commas to a |:
import json
df = pd.read_csv('store.txt', converters={'report': json.loads}, header=0, sep="|")
In addition I also tried to use:
import pandas as pd
import json
df=pd.read_csv('store.txt', converters={'report':json.loads}, header=0, quotechar="'")
I was also thinking of adding a quote at the beginning and end of the dictionary to make it a string, but thought it would be too tedious to find the closing brackets.
I think adding quotes around the dictionaries is the right approach. You can use regex to do so and use a different quote character than " (I used § in my example):
from io import StringIO
import re
import json
import pandas as pd

with open("store.txt", "r") as f:
    csv_content = re.sub(r"(\{.*})", r"§\1§", f.read())

df = pd.read_csv(StringIO(csv_content), skipinitialspace=True, quotechar="§", engine="python")
df_out = pd.concat([
    df[["id", "name", "storeid"]],
    pd.DataFrame(df["report"].apply(json.loads).values.tolist())
], axis=1)
print(df_out)
Note: the very last value in your csv isn't valid JSON: "code":001. It should be either "code":"001" or "code":1.
Output:
id name storeid Source ... itemtrack info itemsboughtid stolenitem
0 11 JohnSmith 3221-123-555 online ... 110 {'haircolor': 'black', 'age': 53} [] [{'item': 'candy', 'code': 1}, {'item': 'candy...
1 35 BillyDan 3221-123-555 letter ... 110 {'haircolor': 'black', 'age': 21} [1, 42, 465, 5] [{'item': 'shoe', 'code': 2}]
2 64 NickWalker 3221-123-555 letter ... 110 {'haircolor': 'red', 'age': 22} [1, 2] [{'item': 'sweater', 'code': 11}, {'item': 'ma...
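A possible variation on the answer above (not part of the original answer): once the report column parses as JSON, pd.json_normalize can also flatten the nested info dict into its own columns, which helps when different rows carry more or fewer keys. This assumes the "code":001 issue noted above has been fixed:

import json
import pandas as pd

# df is the dataframe produced by the read_csv call above
reports = pd.json_normalize(df["report"].apply(json.loads).tolist())
df_out = pd.concat([df[["id", "name", "storeid"]], reports], axis=1)
print(df_out.columns.tolist())  # includes "info.haircolor" and "info.age"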

creating an altair chart based on a dictionary : issue passing pandas dataframe

I'm trying to generate different Altair charts programmatically.
I want to base those different chart setups on dictionaries passed to alt.Chart.from_dict().
I've reverse engineered the overall configuration of the charts from an existing chart with chart.to_dict(), but this method serializes the data into JSON, whereas my data is hosted in pandas dataframes, and I'm struggling to find the right syntax in the dictionary to pass the dataframe.
I've tried a few variations of the below :
d_chart_config = {
    "data": df,  # or df.to_dict()
    "config": {
        "view": {"continuousWidth": 400, "continuousHeight": 300},
        "title": {"anchor": "start", "color": "#4b5c65", "fontSize": 20},
    },
    "mark": {"type": "bar", "size": 40},
    ....}
but I haven't managed to figure out how or where to insert the dataframe in the dictionary, either as a dataframe directly or as a df.to_dict().
Please help if you've managed something similar.
The pure pandas way to generate a Vega-Lite data field is {"values": df.to_dict(orient="records")}, but this has problems in some cases (namely handling of datetimes, categoricals, and non-standard numeric & string types).
Altair has utilities to work around these issues that you can use directly, namely the altair.utils.data.to_values function.
For example:
import pandas as pd
from altair.utils.data import to_values
df = pd.DataFrame({'a': [1, 2, 3], 'b': pd.date_range('2012', freq='Y', periods=3)})
print(to_values(df))
# {'values': [{'a': 1, 'b': '2012-12-31T00:00:00'},
# {'a': 2, 'b': '2013-12-31T00:00:00'},
# {'a': 3, 'b': '2014-12-31T00:00:00'}]}
You can use this directly within a dictionary containing a vega-lite specification and generate a valid chart:
import altair as alt

alt.Chart.from_dict({
    "data": to_values(df),
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "quantitative"},
        "y": {"field": "b", "type": "ordinal", "timeUnit": "year"},
    },
})
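Tying this back to the dictionary layout from the question, a minimal sketch (the dataframe and the encoding fields here are made-up placeholders, not from the original post):

import altair as alt
import pandas as pd
from altair.utils.data import to_values

# placeholder data standing in for the question's dataframe
df = pd.DataFrame({"category": ["x", "y"], "value": [10, 20]})

d_chart_config = {
    "data": to_values(df),  # instead of df or df.to_dict()
    "config": {
        "view": {"continuousWidth": 400, "continuousHeight": 300},
        "title": {"anchor": "start", "color": "#4b5c65", "fontSize": 20},
    },
    "mark": {"type": "bar", "size": 40},
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "value", "type": "quantitative"},
    },
}
chart = alt.Chart.from_dict(d_chart_config)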

Python list of dictionaries aggregate values

Here is an example input:
[{'name': 'susan', 'wins': 1, 'team': 'team1'},
 {'name': 'jack', 'wins': 1, 'team': 'team2'},
 {'name': 'susan', 'wins': 1, 'team': 'team1'}]
Desired output
[{'name': 'susan', 'wins': 2, 'team': 'team1'},
 {'name': 'jack', 'wins': 1, 'team': 'team2'}]
I have lots of these dictionaries and want to add up only the 'wins' value, grouped by the 'name' value, while keeping the 'team' values.
I've tried to use Counter, but the result was
{'name': 'all the names added together',
 'wins': 'all the wins added together'}
I was able to use defaultdict which seemed to work
from collections import defaultdict

result = defaultdict(int)
for d in data:
    result[d['name']] += d['wins']
but the result was something like
{'susan': 2, 'jack':1}
Here it added the values correctly but didn't keep the 'team' key.
I guess I'm confused about defaultdict and how it works.
Any help is very appreciated.
Did you consider using pandas?
import pandas as pd

dicts = [
    {'name': 'susan', 'wins': 1, 'team': 'team1'},
    {'name': 'jack', 'wins': 1, 'team': 'team2'},
    {'name': 'susan', 'wins': 1, 'team': 'team1'},
]

agg_by = ["name", "team"]
df = pd.DataFrame(dicts)
df = df.groupby(agg_by)["wins"].sum()
df = df.reset_index()
aggregated_dict = df.to_dict("records")
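If you'd rather stay in plain Python (closer to the defaultdict attempt in the question), a minimal sketch that keeps the whole record while summing the wins:

data = [
    {'name': 'susan', 'wins': 1, 'team': 'team1'},
    {'name': 'jack', 'wins': 1, 'team': 'team2'},
    {'name': 'susan', 'wins': 1, 'team': 'team1'},
]

merged = {}
for d in data:
    if d['name'] not in merged:
        merged[d['name']] = dict(d)  # copy the first record seen for this name
    else:
        merged[d['name']]['wins'] += d['wins']

result = list(merged.values())
# [{'name': 'susan', 'wins': 2, 'team': 'team1'}, {'name': 'jack', 'wins': 1, 'team': 'team2'}]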

Dataframe to json using python

I have a dataframe of the below format.
I want to send each row separately as below:
{'timestamp': 'A',
 'tags': {
     'columnA': '1',
     'columnB': '11',
     'columnC': '21',
     ...
 }}
The columns vary and I cannot hard code them. I then need to send each row in the above format to a Firestore collection, then the second row in the same format, and so on.
How can I do this?
And please don't mark the question as a duplicate without comparing the questions.
I am not clear on the firebase part, but I think this might be what you want
import json
import pandas as pd

# data frame to work with
x = pd.DataFrame(data={'timestamp': 'A', 'ca': 1, 'cb': 2, 'cc': 3}, index=[0])
x = pd.concat([x, x], ignore_index=True)  # duplicate the row
# rearranging so the timestamp column comes first
x = x[['timestamp', 'ca', 'cb', 'cc']]

def new_json(row):
    # everything after the first column becomes the "tag" dict
    return json.dumps(
        dict(timestamp=row['timestamp'],
             tag=dict(zip(row.index[1:], row[row.index[1:]].values.tolist()))))

print(x.apply(new_json, raw=False, axis=1))
Output
The output is a pandas Series, with each entry being a str in the JSON format needed:
0 '{"timestamp": "A", "tag": {"cc": 3, "cb": 2, "ca": 1}}'
1 '{"timestamp": "A", "tag": {"cc": 3, "cb": 2, "ca": 1}}'

Python DataFrames to JSON

I want to store my Python script output in the JSON format below:
[{
    "ids": "1",
    "result1": ["1","2","3","4"],
    "result2": ["4","5","6","1"]
},
{
    "ids": "2",
    "result1": ["3","4"],
    "result2": ["4","5","6","1"]
}]
My code is as follows
for i in df.id.unique():
    ids = i
    results1 = someFunction(i)
    results2 = someFunction2(i)
    df_result_int = ["ids : %s" % ids, "results1 : %s" % results1, "results2 : %s" % results2]
    df_results.append(df_result_int)

jsonData = json.dumps(df_results)
with open('JSONData.json', 'w') as f:
    json.dump(jsonData, f)
someFunction() and someFunction2() return a list.
Thank you in advance.
You should not manually transform your lists to strings; json.dumps does that for you. Use dictionaries instead. Here is an example:
import json

df_results = []
results1 = [1, 1]
results2 = [2, 2]
df_result_int = {"ids": 1, "results1": results1, "results2": results2}
df_results.append(df_result_int)
json.dumps(df_results)
This will result in:
[{"results2": [2, 2], "ids": 1, "results1": [1, 1]}]
There is a method on pandas dataframes that allows you to dump the dataframe directly to a JSON file. Here is the link to the documentation:
to_json
You could use something like
df.reset_index().to_json(orient='records')
where df is the dataframe on which you have done some manipulation beforehand (for example applying the functions you mention); it depends on the info you want in the JSON file and how it is arranged.
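Putting the accepted suggestion together with the loop from the question, a sketch that produces the exact structure asked for (someFunction, someFunction2 and df are the question's own placeholders; the str() calls assume you really want string values as in the desired output):

import json

df_results = []
for i in df.id.unique():
    df_results.append({
        "ids": str(i),
        "result1": [str(v) for v in someFunction(i)],
        "result2": [str(v) for v in someFunction2(i)],
    })

with open('JSONData.json', 'w') as f:
    json.dump(df_results, f)  # dump the list itself, not an already-dumped string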
