Convert pkl file to json file - python

I'm new to Stack Overflow.
I'm trying to convert a pkl file into a json file using Python. Below is my sample code:
import pickle
import pandas as pd
# Load pickle file
input_file = open('file.pkl', 'rb')
new_dict = pickle.load(input_file)
input_file()
# Create a Pandas DataFrame
data_frame = pd.DataFrame(new_dict)
# Copy DataFrame index as a column
data_frame['index'] = data_frame.index
# Move the new index column to the front of the DataFrame
index = data_frame['index']
data_frame.drop(labels=['index'], axis=1, inplace = True)
data_frame.insert(0, 'index', index)
# Convert to json values
json_data_frame = data_frame.to_json(orient='values', date_format='iso', date_unit='s')
with open('data.json', 'w') as js_file:
    js_file.write(json_data_frame)
When I run this code I get the error TypeError: '_io.TextIOWrapper' object is not callable. Similar questions (This one and This one) suggested using the write method with input_file() at line 7, but then I get the error io.UnsupportedOperation: write. That is probably because write is a writing method while I have the file open for reading, and I am unable to find a corresponding method for reading.
I also tried to read pickle file in following way
with open('file.pkl', 'rb') as input_file:
    new_dict = pickle.load(input_file)
and I'm getting this error
DataFrame constructor not properly called!.
How can I solve this problem?
Any suggestions about other tools that can perform this task would be appreciated. Thanks
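One possible approach, sketched under the assumption that the pickle holds plain Python containers (dicts, lists, strings, numbers) rather than a DataFrame, is to skip pandas and hand the unpickled object to the standard json module. The helper name pickle_to_json and the default=str fallback are illustrative choices, not part of the original code. Note that a file object is closed with input_file.close() or a with block, not by calling input_file().

```python
import json
import pickle


def pickle_to_json(pkl_path, json_path):
    """Load a pickle file and write its content as JSON (hypothetical helper)."""
    # The with block closes the file automatically, so no input_file() call is needed.
    with open(pkl_path, 'rb') as input_file:
        obj = pickle.load(input_file)

    # default=str is an assumption: it converts values that json cannot
    # serialize (for example date objects) to their string form.
    with open(json_path, 'w') as js_file:
        json.dump(obj, js_file, default=str)
    return obj
```

With the file names from the question, the call would be pickle_to_json('file.pkl', 'data.json').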

Related

convert excel to json file in python

I am new here and need some help with writing to a json file.
I have a dataframe with the values below, which is created by reading an Excel file.
I need to write this to a json file with the object as column dtls.
Output :
A similar task is considered in the question:
Converting Excel into JSON using Python
Different approaches are possible to solve this problem.
I hope, it works for your solution.
import pandas as pd
import json
df = pd.read_excel('./TfidfVectorizer_sklearn.xlsx')
df.to_json('new_file1.json', orient='records') # excel to json
# read json and then append details to it
with open('./new_file1.json', 'r') as json_file:
    a = {}
    data = json.load(json_file)
    a['details'] = data
# write new json with details in it
with open("./new_file1.json", "w") as jsonFile:
    json.dump(a, jsonFile)
JSON Output:

How to open .ndjson file in Python?

I have a 20GB .ndjson file that I want to open with Python. The file is too big, so I found a way to split it into 50 pieces with an online tool. This is the tool: https://pinetools.com/split-files
Now I get one file that has the extension .ndjson.000 (and I do not know what that is).
I'm trying to open it as json or as a csv file, to read it in pandas but it does not work.
Do you have any idea how to solve this?
import json
import pandas as pd
First approach:
df = pd.read_json('dump.ndjson.000', lines=True)
Error: ValueError: Unmatched ''"' when when decoding 'string'
Second approach:
with open('dump.ndjson.000', 'r') as f:
    my_data = f.read()
print(my_data)
Error: json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 104925061 (char 104925060)
I think the problem is that I have some emojis in my file, so I do not know how to encode them?
ndjson is now supported out of the box with argument lines=True
import pandas as pd
df = pd.read_json('/path/to/records.ndjson', lines=True)
# lines=True is only valid together with orient='records'
df.to_json('/path/to/export.ndjson', orient='records', lines=True)
I think the pandas.read_json cannot handle ndjson correctly.
According to this issue, you can do something like this to read it:
import ujson as json
import pandas as pd
records = map(json.loads, open('/path/to/records.ndjson'))
df = pd.DataFrame.from_records(records)
P.S: All credits for this code go to KristianHolsheimer from the Github Issue
ndjson (newline-delimited JSON) is a JSON Lines format, that is, each line is a complete JSON document. It is ideal for a dataset lacking rigid structure ('non-sql') where the file size is large enough to warrant multiple files.
You can use pandas:
import pandas as pd
data = pd.read_json('dump.ndjson.000', lines=True)
In case your json strings do not contain newlines, you can alternatively use:
import json
with open("dump.ndjson.000") as f:
    data = [json.loads(l) for l in f.readlines()]
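If the file was cut at arbitrary byte offsets, the last line of each piece (and the first line of the next) is probably an incomplete record, which would explain the "Unmatched" and "Unterminated string" errors in the question. That diagnosis is an assumption, not something the question confirms. A minimal sketch that reads a piece line by line and skips lines that fail to parse (read_ndjson_lenient is a hypothetical helper name):

```python
import json


def read_ndjson_lenient(path):
    """Parse an NDJSON file line by line, skipping lines that are not valid JSON."""
    records, skipped = [], 0
    # Explicit UTF-8 so emojis decode the same way on every platform;
    # errors='replace' keeps a multi-byte character cut in half by the
    # split from raising UnicodeDecodeError.
    with open(path, encoding='utf-8', errors='replace') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                skipped += 1  # most likely a record truncated by the split
    return records, skipped
```

The returned list can then be passed to pd.DataFrame.from_records(records), and the skipped count shows how many lines were dropped.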

"No columns to parse from file" when reading in dictionary

I'm trying to take a dictionary object in python, write it out to a csv file, and then read it back in from that csv file.
But it's not working. When I try to read it back in, it gives me the following error:
EmptyDataError: No columns to parse from file
I don't understand this for two reasons. Firstly, if I used pandas' very own to_csv method, it should be giving me the correct format for a csv. Secondly, when I print out the header values (by doing this: print(df.columns.values)) of the dataframe that I'm trying to save, it says I do in fact have headers ("one" and "two"). So if the object I was sending out had column names, I don't know why they wouldn't be found when I'm trying to read it back.
import pandas as pd
testing = {"one":1,"two":2 }
df = pd.DataFrame(testing, index=[0])
file = open('testing.csv','w')
df.to_csv(file)
new_df = pd.read_csv("testing.csv")
What am I doing wrong?
Thanks in advance for the help!
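pandas.DataFrame.to_csv accepts either a path or an open file object. The problem is that the file handle is never closed, so the data is still in the write buffer when read_csv opens the file and finds it empty. Just remove the file declaration and use the path directly; pass index=False to skip writing the index.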
import pandas as pd
testing = {"one":1,"two":2 }
df = pd.DataFrame(testing, index=[0])
df.to_csv('testing.csv', index = False)
new_df = pd.read_csv("testing.csv")
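If a file object is preferred over a path, the handle has to be closed (or flushed) before the file is read back. A minimal sketch of that variant using a with block; newline='' is added on the assumption that it avoids extra blank lines on Windows when a text-mode handle is passed to to_csv:

```python
import pandas as pd

testing = {"one": 1, "two": 2}
df = pd.DataFrame(testing, index=[0])

# The with block closes the handle, flushing buffered data to disk
# before read_csv opens the file.
with open('testing.csv', 'w', newline='') as file:
    df.to_csv(file, index=False)

new_df = pd.read_csv('testing.csv')
print(new_df)
```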

Writing value to given field in csv file using pandas or csv module

Is there any way to write a value to a specific place in a given .csv file using pandas or the csv module?
I have tried using csv_reader to read the file and find a line which fits my requirements, but I couldn't figure out a way to replace the value in the file with mine.
What I am trying to achieve is this: I have a spreadsheet of names and values. I am using JSON to update the values from the server, and after that I want to update my spreadsheet as well.
The latest solution I came up with was to create a separate sheet from which I will get the updated data, but this one is not working, since there is no fixed order in which the dict is written to the file.
def updateSheet(fileName, aValues):
    with open(fileName+".csv") as workingSheet:
        writer = csv.DictWriter(workingSheet,aValues.keys())
        writer.writeheader()
        writer.writerow(aValues)
I will appreciate any guidance and tips.
You can try working with the specified csv file this way:
import pandas as pd
a = ['one','two','three']
b = [1,2,3]
english_column = pd.Series(a, name='english')
number_column = pd.Series(b, name='number')
predictions = pd.concat([english_column, number_column], axis=1)
save = pd.DataFrame({'english':a,'number':b})
save.to_csv('b.csv',index=False,sep=',')
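For the original goal of changing one value in an existing file, a CSV file generally has to be rewritten rather than edited in place. A minimal sketch with the csv module, assuming a header row with columns called name and value (both hypothetical, as is the helper update_value):

```python
import csv


def update_value(file_name, name, new_value):
    """Rewrite a CSV file, replacing 'value' in the rows whose 'name' matches."""
    # Read all rows into memory first; the column order comes from the header row.
    with open(file_name, newline='') as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in rows:
        if row['name'] == name:
            row['value'] = new_value

    # Write everything back with the same column order.
    with open(file_name, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Passing fieldnames taken from the header keeps the columns in a fixed order, independent of the order of keys in any dict.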

Reading an excel data set saved as CSV file in pandas

There is a very similar question to the one I am about to ask posted here:
Reading an Excel file in python using pandas
Except when I attempt to use the solutions posted here I am countered with
AttributeError: 'DataFrame' object has no attribute 'read'
All I want to do is convert this excel sheet into the pandas format so that I can perform data analysis on some of the subjects of my table. I am super new to this so any information, advice, feedback or whatever that anybody could toss my way would be greatly appreciated.
Here's my code:
import pandas
file = pandas.read_csv('FILENAME.csv', 'rb')
# reads specified file name from my computer in Pandas format
print file.read()
By the way, I also tried running the same query with
file = pandas.read_excel('FILENAME.csv', 'rb') returning the same error.
Finally, when I try to resave the file as a .xlsx I am unable to open the document.
Cheers!
read_csv() returns a dataframe by itself, so there is no need to convert it; just store the result in a variable.
I think this should work
import pandas as pd #It is best practice to import package with as a short name. Makes it easier to reference later.
file = pd.read_csv('FILENAME.csv')
print (file)
Your error message means exactly what it says: AttributeError: 'DataFrame' object has no attribute 'read'
When you use pandas.read_csv you're actually reading the csv file into a dataframe. BTW, you don't need the 'rb'
df = pandas.read_csv('FILENAME.csv')
You can print (df) but you can not do print(df.read()) because the dataframe object doesn't have a .read() attribute. This is what's causing your error.
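A small sketch of what read_csv returns, using an in-memory CSV so it runs without FILENAME.csv (the column names and values are made up for illustration):

```python
import io

import pandas as pd

# In-memory stand-in for FILENAME.csv; the contents are illustrative only.
csv_text = io.StringIO("subject,score\nmath,90\nart,85\n")

df = pd.read_csv(csv_text)

print(type(df).__name__)    # the result is already a DataFrame
print(hasattr(df, 'read'))  # checks whether a read attribute exists
print(df.head())            # inspect the first rows directly
```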
