I have different pandas dataframes, which I put in a list.
I want to save this list in json (or any other format) which can be read by R.
import pandas as pd

def create_df_predictions(extra_periods):
    """
    Make an empty df for predictions.
    params: extra_periods = how many predictions into the future the user wants
    """
    df = pd.DataFrame({'model': ['a'], 'name_id': ['a']})
    for col in range(1, extra_periods + 1):
        name_col = 'forecast' + str(col)
        df[name_col] = 0
    return df

df1 = create_df_predictions(9)
df2 = create_df_predictions(12)
list_df = [df1, df2]
The question is how to save list_df in a format that R can read. Note that df1 and df2 have a different number of columns!
I don't know pandas DataFrames in detail, so maybe this won't work. But in case a DataFrame behaves like a traditional dict, you should be able to use the json module.
df1 = create_df_predictions(9)
df2 = create_df_predictions(12)
list_df = [df1, df2]
You can write it to a file, using json.dumps(list_df), which will convert your list of dicts to a valid json representation.
import json
with open("my_file", 'w') as outfile:
    outfile.write(json.dumps(list_df))
Edit: as commented by DaveR, DataFrames aren't JSON serializable. You can convert each one to a dict and then dump the list to json.
import json
with open("my_file", 'w') as outfile:
    outfile.write(json.dumps([df.to_dict() for df in list_df]))
Alternatively pd.DataFrame and pd.Series have a to_json() method, maybe have a look at those as well.
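As a minimal sketch of the to_json() route: the frames below are small hypothetical stand-ins for df1 and df2, and the file name list_df.json is an assumption. Each frame is serialized with orient="records" and the results are combined into one JSON array, so frames with different columns can sit side by side.

```python
import json
import pandas as pd

# Hypothetical stand-ins for df1 and df2 from the question (different column counts).
df1 = pd.DataFrame({"model": ["a"], "name_id": ["a"], "forecast1": [0]})
df2 = pd.DataFrame({"model": ["a"], "name_id": ["a"], "forecast1": [0], "forecast2": [0]})
list_df = [df1, df2]

# to_json() returns one JSON string per frame; json.loads turns it back into
# plain Python objects so the frames can be combined into a single JSON array.
payload = [json.loads(df.to_json(orient="records")) for df in list_df]

with open("list_df.json", "w") as outfile:
    json.dump(payload, outfile, indent=2)
```

The file then holds an array with one element per DataFrame, each element being an array of row objects. On the R side, a JSON reader such as the jsonlite package should be able to load this, though that part is not tested here.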
To export the list of DataFrames to a single json file, you should convert the list into a DataFrame and then use the to_json() function as shown below:
df_to_export = pd.DataFrame(list_df)
json_output = df_to_export.to_json()
with open("output.txt", 'w') as outfile:
    outfile.write(json_output)
This will export the full dataset to a single json string and export it to a file.
I have json data which is in the structure below:
{"Text1": 4, "Text2": 1, "TextN": 123}
I want to read the json file into a dataframe in which each key-value pair is a row, with the headers "Sentence" and "Label". I tried using lines = True, but it returns all the key-value pairs in one row.
data_df = pd.read_json(PATH_TO_DATA, lines = True)
What is the correct way to load such json data?
You can use:
import json
import pandas as pd

with open('json_example.json') as json_data:
    data = json.load(json_data)
df = pd.DataFrame.from_dict(data, orient='index').reset_index().rename(columns={'index': 'Sentence', 0: 'Label'})
Easy way that I remember
import pandas as pd
import json

with open("./data.json", "r") as f:
    data = json.load(f)
df = pd.DataFrame({"Sentence": list(data.keys()), "Label": list(data.values())})
With read_json
To read straight from the file using read_json, you can use something like:
pd.read_json("./data.json", lines=True)\
    .T\
    .reset_index()\
    .rename(columns={"index": "Sentence", 0: "Label"})
Explanation
This is a little dirty, but as you probably noticed, lines=True alone isn't sufficient: it puts all the key-value pairs in one row. The above therefore transposes the result so that you have:
(index)  0
Text1    4
Text2    1
TextN    123
Resetting the index then moves the index into a column named "index", and rename gives the two columns their final names.
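The same index-to-column idea can also be sketched with a pd.Series instead of read_json. This is only an alternative under the assumption that the JSON is a flat object of key/value pairs as in the question; the inline JSON string stands in for the file contents.

```python
import json
import pandas as pd

# Stand-in for the file contents; real code would call json.load on the open file.
data = json.loads('{"Text1": 4, "Text2": 1, "TextN": 123}')

df = (
    pd.Series(data)               # keys become the index, values the data
    .rename_axis("Sentence")      # name the index so reset_index uses that name
    .reset_index(name="Label")    # index -> "Sentence" column, values -> "Label"
)
print(df)
```

Naming the index before resetting it avoids the separate rename step.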
I would like to create a pandas dataframe out of a list variable.
With pd.DataFrame() I am not able to declare a delimiter, which leads to just one column per list entry.
If I use pd.read_csv() instead, I of course receive the following error:
ValueError: Invalid file path or buffer object type: <class 'list'>
Is there a way to use pd.read_csv() with my list without first saving the list to a csv and reading the csv file in a second step?
I also tried pd.read_table(), which also needs a file or buffer object.
Example data (separated by tab stops):
Col1 Col2 Col3
12 Info1 34.1
15 Info4 674.1
test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]
Current workaround:
with open(f'{filepath}tmp.csv', 'w', encoding='UTF8') as f:
    for line in consolidated_file:
        f.write(line + "\n")
df = pd.read_csv(f'{filepath}tmp.csv', sep='\t', index_col=1)
import pandas as pd
df = pd.DataFrame([x.split('\t') for x in test])
print(df)
And if you want the first row as the header, then:
df.columns = df.iloc[0]
df = df[1:]
It seems simpler to convert it to a nested list, like in the other answer:
import pandas as pd
test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]
data = [line.split('\t') for line in test]
df = pd.DataFrame(data[1:], columns=data[0])
But you can also convert it back to a single string (or get it directly from a file or socket/network as a single string), and then you can use io.BytesIO or io.StringIO to simulate a file in memory.
import pandas as pd
import io
test = ["Col1\tCol2\tCol3", "12\tInfo1\t34.1","15\tInfo4\t674.1"]
single_string = "\n".join(test)
file_like_object = io.StringIO(single_string)
df = pd.read_csv(file_like_object, sep='\t')
or shorter
df = pd.read_csv(io.StringIO("\n".join(test)), sep='\t')
This method is popular when you get data from the network (socket, web API) as a single string.
I'm writing a very small Pandas dataframe to a JSON file. In fact, the Dataframe has only one row with two columns.
To build the dataframe:
import pandas as pd
df = pd.DataFrame.from_dict(dict({'date': '2020-10-05', 'ppm': 411.1}), orient='index').T
print(df)
prints
         date    ppm
0  2020-10-05  411.1
The desired json output is as follows:
{
  "date": "2020-10-05",
  "ppm": 411.1
}
but when writing the json with pandas, I can only print it as an array with one element, like so:
[
  {
    "date":"2020-10-05",
    "ppm":411.1
  }
]
I've currently hacked my code to convert the Dataframe to a dict, and then use the json module to write the file.
import json
data = df.to_dict(orient='records')
data = data[0]  # keep the only element
with open('data.json', 'w') as fp:
    json.dump(data, fp, indent=2)
Is there a native way with pandas' .to_json() to keep the only dictionary item if there is only one?
I am currently using .to_json() like this, which incorrectly prints the array with one dictionary item.
df.to_json('data.json', orient='index', indent = 2)
Python 3.8.6
Pandas 1.1.3
If you want to export only one row, use iloc:
print (df.iloc[0].to_dict())
#{'date': '2020-10-05', 'ppm': 411.1}
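Building on the iloc idea, here is a minimal sketch that stays within pandas for the JSON step. It assumes the single row is selected with iloc[0], which gives a Series, and relies on Series.to_json, whose default orientation maps index labels to values. The frame is rebuilt here from a list of records only to keep the example self-contained.

```python
import json
import pandas as pd

# Same one-row frame as in the question, built directly from a list of records.
df = pd.DataFrame([{"date": "2020-10-05", "ppm": 411.1}])

# df.iloc[0] is a Series; its default JSON orientation maps index labels to
# values, so the result is a single object rather than a one-element array.
json_text = df.iloc[0].to_json(indent=2)

with open("data.json", "w") as fp:
    fp.write(json_text)
```

If the frame could hold more than one row, the caller would still need to decide which row to keep, so this is not a general replacement for orient='records'.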
I am trying to convert a pandas DataFrame to a JSON file. The following image shows my data:
[Image: screenshot of the dataset in MS Excel]
I am using the following code:
import os
import pandas as pd
os.chdir("G:\\My Drive\\LEC dashboard\\EnergyPlus simulation files\\DEC\\Ahmedabad\\Adaptive set point\\CSV")
df = pd.read_csv('Adap_40-_0_0.1_1.5_0.6.csv')
df2 = df.filter(like='[C](Hourly)', axis=1)
df3 = df.filter(like='[C](Hourly:ON)', axis=1)
df4 = df.filter(like='[%](Hourly)', axis=1)
df5 = df.filter(like='[%](Hourly:ON)', axis=1)
df6 = pd.concat([df2, df3, df4, df5], axis=1)
df6.to_json("123.json", orient='columns')
In the output, I am getting a dictionary of values. However, I need a list as the value.
The output I am getting: [Image: the JSON output produced by the code above]
The desired output: [Image: the desired JSON output]
I have tried different orientations of json but nothing works.
There might be other ways of doing this, but this is one way.
import json
import pandas as pd
test = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6]})
with open('test.json', 'w') as f:
    json.dump(test.to_dict(orient='list'), f)
The resulting file will look like this: {"a": [1, 2, 3, 4, 5, 6]}
There is a built-in function of pandas called to_json:
df.to_json(r'Path_to_file\file_name.json')
Take a look at the documentation if you need more specifics: https://pandas.pydata.org/pandas-docs/version/0.24/reference/api/pandas.DataFrame.to_json.html
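To make the difference between the two answers concrete, the sketch below compares what to_json(orient="columns") and to_dict(orient="list") produce for the same data. The small frame is a hypothetical stand-in for df6 from the question.

```python
import json
import pandas as pd

# Hypothetical two-column frame standing in for df6 from the question.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.5, 5.5, 6.5]})

# to_json(orient="columns") nests each column as {row label: value} ...
as_columns = json.loads(df.to_json(orient="columns"))
# ... while to_dict(orient="list") gives each column as a plain list of values.
as_list = df.to_dict(orient="list")

print(as_columns)
print(json.dumps(as_list))
```

The second form is the one the question asks for, which is why the json module is used for the final write in the answer above.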
I have a CSV file with 100K+ lines of data in this format:
"{'foo':'bar' , 'foo1':'bar1', 'foo3':'bar3'}"
"{'foo':'bar' , 'foo1':'bar1', 'foo4':'bar4'}"
The quotes are there before the curly braces because my data came in a CSV file.
I want to extract the key value pairs in all the lines to create a dataframe like so:
Column Headers: foo, foo1, foo3, foo...
Rows: bar, bar1, bar3, bar...
I've tried implementing something similar to what's explained here ( Python: error parsing strings from text file with Ast module).
I've gotten the ast.literal_eval function to work on my file to convert the contents into a dict but now how do I get the DataFrame function to work? I am very much a beginner so any help would be appreciated.
import pandas as pd
import ast

with open('file_name.csv') as f:
    for string in f:
        parsed = ast.literal_eval(string.rstrip())
        print(parsed)
pd.DataFrame(???)
You can turn a dictionary into a pandas dataframe using pd.DataFrame.from_dict, but it will expect each value in the dictionary to be in a list.
for key, value in parsed.items():
    parsed[key] = [value]
df = pd.DataFrame.from_dict(parsed)
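You can do this iteratively by adding each parsed row to your dataframe. DataFrame.append returned a new frame instead of modifying df in place, so its result had to be assigned; it was removed in pandas 2.0, so pd.concat is used here.
df = pd.DataFrame()
with open('file_name.csv') as f:
    for string in f:
        parsed = ast.literal_eval(string.rstrip())
        for key, value in parsed.items():
            parsed[key] = [value]
        df = pd.concat([df, pd.DataFrame.from_dict(parsed)], ignore_index=True)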
parsed is a dictionary; you make a dataframe from each one, then join all the frames together:
df = []
with open('file_name.csv') as f:
    for string in f:
        parsed = ast.literal_eval(string.rstrip())
        if not isinstance(parsed, dict):
            continue
        subDF = pd.DataFrame(parsed, index=[0])
        df.append(subDF)
df = pd.concat(df, ignore_index=True, sort=False)
Calling pd.concat on a list of dataframes is faster than calling DataFrame.append repeatedly. sort=False means that pd.concat will not sort the column names when it encounters a new one, like foo4 on the second row.
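A further variation, sketched under assumptions: collect the parsed dicts in a plain list and build the frame once at the end. The lines list below is a hypothetical in-memory stand-in for file_name.csv. The second literal_eval call assumes each line keeps its outer double quotes, as shown in the question, in which case the first pass yields a string rather than a dict.

```python
import ast
import pandas as pd

# Hypothetical in-memory stand-in for the lines of file_name.csv.
lines = [
    "\"{'foo':'bar' , 'foo1':'bar1', 'foo3':'bar3'}\"\n",
    "\"{'foo':'bar' , 'foo1':'bar1', 'foo4':'bar4'}\"\n",
]

rows = []
for line in lines:
    parsed = ast.literal_eval(line.strip())
    # With the outer quotes present, the first pass yields a string,
    # so parse once more to reach the dict itself.
    if isinstance(parsed, str):
        parsed = ast.literal_eval(parsed)
    if isinstance(parsed, dict):
        rows.append(parsed)

# A list of dicts maps directly to rows; keys missing from a row become NaN.
df = pd.DataFrame(rows)
print(df)
```

This avoids creating one small DataFrame per line, which may matter for a file with 100K+ lines.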