I'm attempting to convert a JSON file to an SQLite or CSV file so that I can manipulate the data with python. Here is where the data is housed: JSON File.
I found a few converters online, but those couldn't handle the quite large JSON file I was working with. I tried using a python module called sqlbiter but again, like the others, was never really able to output or convert the file.
I'm not. sure where to go now, if anyone has any recommendations or insights on how to get this data into a database, I'd really appreciate it.
Thanks in advance!
EDIT: I'm not looking for anyone to do it for me, I just need to be pointed in the right direction. Are there other methods I haven't tried that I could learn?
You can utilize pandas module for this data processing task as follows:
First, you need to read the JSON file using with, open and json.load.
Second, you need to change the format of your file a bit by changing the large dictionary that has a main key for every airport into a list of dictionaries instead.
Third, you can now utilize some pandas magic to convert your list of dictionaries into a DataFrame using pd.DataFrame(data=list_of_dicts).
Finally, you can utilize pandas's to_csv function to write your DataFrame as a CSV file into disk.
It would look something like this:
import pandas as pd
import json
with open('./airports.json.txt','r') as f:
j = json.load(f)
l = list(j.values())
df = pd.DataFrame(data=l)
df.to_csv('./airports.csv', index=False)
You need to load your json file and parse it to have all the fields available, or load the contents to a dictionary, then you could using pyodbc to write to the database these fields, or write them to the csv if you use import csv first.
But this is just a general idea. You need to study python and how to do every step.
For instance for writting to the database you could do something like:
for i in range(0,max_len):
sql_order = "UPDATE MYTABLE SET MYTABLE.MYFIELD ...."
cursor1.execute(sql_order)
cursor1.commit()
Related
I've been trying to normalize a JSON file and wanted a python(pandas) or pyspark script the more generic as possible that can extract data from a very nested mongodb JSON - it comes from a third party API and saved in MongoDB - and return it in a relational dataset so we can consume it from the datalake.
There are a lot of records and fields, so we can't do it in only one dataframe. Also, the layout does not have a pattern.
Could you please help us?
What is the best way to do this in best practices and, if possible, recursively?
Below is a chunk of the json file
https://raw.githubusercontent.com/migueelcruz/sample_json/main/sample.json
We expect multiple dataframes that link each other so we can consume data like a relational database. Also, the files must be like a database table.
Thanks a lot for your help!
A way to approach this problem would be using the json module to deserialize the data into a python dictionary.
# Get the data
import urllib.request as urllib
link = "https://raw.githubusercontent.com/migueelcruz/sample_json/main/sample.json"
f = urllib.urlopen(link)
myfile = f.read()
# Deserialize
import json
data = json.loads(myfile)
data
Now the way you would get the data is using python dictionaries syntax.
i.e if you want to get eventos which is under dados which is under eventos would be:
data["dados"]["nfe"]["eventos"]
I have a CSV file, diseases_matrix_KNN.csv which has excel table.
Now, I would like to store all the numbers from the row like:
Hypothermia = [0,-1,0,0,0,0,0,0,0,0,0,0,0,0]
For some reason, I am unable to find a solution to this. Even though I have looked. Please let me know if I can read this type of data in the chosen form, using Python please.
most common way to work with excel is use Pandas.
Here is example:
import pandas as pd
df = pd.read_excel(filename)
print (df.iloc['Hypothermia']). # gives you such result
I struggled with the following for a couple of hours yesterday. I figured out a workaround, but I'd like to understand a little more of what's going on in the background and, ideally, I'd like to remove the intermediate file from my code just for the sake of elegance. I'm using python, by the way and files_df starts off as a pandas dataframe.
Can you help me understand why the following code gives me an error.
files_json = files_df.to_json(orient='records')
for file_json in files_json:
print(file_json) #do stuff
But this code works?
files_json = files_df.to_json(orient='records')
with open('export_json.json', 'w') as f:
f.write(files_json)
with open('export_json.json') as data:
files_json = json.load(data)
for file_json in files_json:
print(file_json) #do stuff
Obviously, the export/import is converting the data somehow into a usable format. I would like to understand that a little better and know if there is some option within the pandas files_df.to_json command to perform the same conversion.
json.load is the opposite of json.dump, but you export from pandas data frames into file and than import again with standard library into some sort of python structure.
Try files_df.to_dict
I am working on django project.where user can upload a csv file and stored into database.Most of the csv file i saw 1st row contain header and then under the values but my case my header presents on column.like this(my csv data)
I did not understand how to save this type of data on my django model.
You can transpose your data. I think it is more appropriate for your dataset in order to do real analysis. Usually things such as id values would be the row index and the names such company_id, company_name, etc would be the columns. This will allow you to do further analysis (mean, std, variances, ptc_change, group_by) and use pandas at its fullest. Thus said:
import pandas as pd
df = pd.read_csv('yourcsvfile.csv')
df2 = df.T
Also, as #H.E. Lee pointed out. In order to save your model to your database, you can either use the method to_sql in your dataframe to save in mysql (e.g. your connection), if you're using mongodb you can use to_json and then import the data, or you can manually set your function transformation to your database.
You can flip it with the built-in CSV module quite easily, no need for cumbersome modules like pandas (which in turn requires NumPy...)... Since you didn't define the Python version you're using, and this procedure differs slightly between the versions, I'll assume Python 3.x:
import csv
# open("file.csv", "rb") in Python 2.x
with open("file.csv", "r", newline="") as f: # open the file for reading
data = list(map(list, zip(*csv.reader(f)))) # read the CSV and flip it
If you're using Python 2.x you should also use itertools.izip() instead of zip() and you don't have to turn the map() output into a list (it already is).
Also, if the rows are uneven in your CSV you might want to use itertools.zip_longest() (itertools.izip_longest() in Python 2.x) instead.
Either way, this will give you a 2D list data where the first element is your header and the rest of them are the related data. What you plan to do from there depends purely on your DB... If you want to deal with the data only, just skip the first element of data when iterating and you're done.
Given your data it may be best to store each row as a string entry using TextField. That way you can be sure not to lose any structure going forward.
Is there any way you can write value to specific place in given .csv file using pandas or csv module?
I have tried using csv_reader to read the file and find a line which fits my requirements though I couldn't figure out a way to switch value which is in the file to mine.
What I am trying to achieve here is that I have a spreadsheet of names and values. I am using JSON to update the values from the server and after that I want to update my spreadsheet also.
The latest solution which I came up with was to create separate sheet from which I will get updated data, but this one is not working, though there is no sequence in which the dict is written to the file.
def updateSheet(fileName, aValues):
with open(fileName+".csv") as workingSheet:
writer = csv.DictWriter(workingSheet,aValues.keys())
writer.writeheader()
writer.writerow(aValues)
I will appreciate any guidance and tips.
You can try this way to operate the specified csv file
import pandas as pd
a = ['one','two','three']
b = [1,2,3]
english_column = pd.Series(a, name='english')
number_column = pd.Series(b, name='number')
predictions = pd.concat([english_column, number_column], axis=1)
save = pd.DataFrame({'english':a,'number':b})
save.to_csv('b.csv',index=False,sep=',')