How to extract information from multiple JSON files using Python

I have multiple JSON files in a folder, and I've already implemented a way to collect only the .json files from it. My problem is that I need to extract some information contained in each of those files, but my attempt didn't work the way I expected. I need a way to get this information and convert it all into a pandas DataFrame.
The variable jsons_data contains the names of all the .json files:
import os
import json

jsons_data = json_import(path_to_json)
for index, js in enumerate(jsons_data):
    with open(os.path.join(path_to_json, js)) as json_file:
        # json.load (not json.loads) is needed to read from a file object
        data = json.load(json_file)
        print(data)

The problem in your code is that on every iteration you overwrite the contents of data.
I assume you want to create one big DataFrame from all the files; in that case you can do:
import os
import pandas as pd

dataframes = []
for js in jsons_data:
    dataframes.append(pd.read_json(os.path.join(path_to_json, js)))
df = pd.concat(dataframes)
See the documentation for read_json:
https://pandas.pydata.org/pandas-docs/version/1.3.0rc1/reference/api/pandas.read_json.html
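If you also need to know which file each row came from, one option (a small extra sketch, not part of the original answer) is to tag each frame before concatenating; source_file is just a hypothetical column name:

import os
import pandas as pd

dataframes = []
for js in jsons_data:
    frame = pd.read_json(os.path.join(path_to_json, js))
    frame['source_file'] = js  # hypothetical column recording the origin file
    dataframes.append(frame)
df = pd.concat(dataframes, ignore_index=True)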

Related

How to save dataframes that are stored inside a list as separate csv files

I have a list (named df_split) that stores 576 data frames, and each data frame has 50 rows.
I want to iterate through each data frame and save it as a separate CSV file inside a folder.
I tried the following code but it only saved the last data frame as a CSV file inside the location that I specified.
In this case, I assume I should also have generated a distinct file name for each data frame, something like file1.csv, file2.csv, etc., but my skills aren't there yet.
Can somebody kindly suggest some example solutions?
Here is the code that I tried:
for i in df_split:
    i.to_csv('./file.csv')
Use enumerate as a counter for the new file names:
for i, df in enumerate(df_split, 1):
    df.to_csv(f'file_{i}.csv')
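The second argument to enumerate (here 1) makes the counter start at 1 instead of 0, so the files come out as file_1.csv, file_2.csv, and so on.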
You could also build the counter into the filename without enumerate by keeping it yourself:
i = 0
for df in df_split:
    df.to_csv(f'file_{i}.csv')
    i += 1

Passing multiple JSON filenames to pandas and adding to a dataframe

I'm working on the following code, where I pass filenames from a variable "filelist" and then perform some pandas DataFrame operations, including extracting data and filtering columns.
import pandas as pd

filelist = filestring  # filestring is defined earlier in my code
li = []
merged_df = []
for file in filelist:
    # temp df
    df = pd.read_json(file)
    # filter columns
    df2 = pd.DataFrame(df[['a', 'e']])
    # find data from each JSON
    epf = df['s']['d']['Pf']
    ep = pd.json_normalize(epf, record_path=['u', 'D'], meta=['l', 'dn'])
    # append data from each JSON and add to li_dframe
    li.append(ep)
    frame = pd.concat(li, axis=0, ignore_index=True)
    # merge filtered columns and li_dframe
    merged_df = pd.concat([df2, frame.reset_index(drop=True)], axis=1).ffill()
    print(merged_df)
merged_df.head(n=40)
Where I am having trouble is the final merged data frame: I can only see data from one of the underlying JSON files, the second in the list. This is strange, as the print calls show that the code has correctly handled both JSON files and extracted the required values. Can anyone advise what I am missing here to ensure merged_df contains all the necessary info?
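A likely cause, offered here as a sketch rather than a confirmed fix: df2 is reassigned on every pass through the loop, so the final merged_df only ever merges the filtered columns of the last file processed. Collecting the filtered frames in a second list and merging once after the loop avoids that; filtered and df2_all are hypothetical names:

import pandas as pd

li = []        # normalized 'Pf' data from each file
filtered = []  # filtered ['a', 'e'] columns from each file
for file in filelist:
    df = pd.read_json(file)
    filtered.append(pd.DataFrame(df[['a', 'e']]))
    epf = df['s']['d']['Pf']
    li.append(pd.json_normalize(epf, record_path=['u', 'D'], meta=['l', 'dn']))

frame = pd.concat(li, axis=0, ignore_index=True)
df2_all = pd.concat(filtered, ignore_index=True)
merged_df = pd.concat([df2_all, frame], axis=1).ffill()
print(merged_df)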

How do I export JSON data to CSV using Python?

I'm building a site that, based on a user's input, sorts through JSON data and prints a schedule for them into an HTML table. I want to add the functionality that once their table is created, they can export the data to a CSV/Excel file, so we don't have to store their credentials (logins & schedules) in a database. Is this possible? If so, how can I do it, preferably using Python?
This is not the exact answer, but rather steps to follow in order to get to a solution (a short sketch follows the list):
1. Read the data from JSON: some_dict = json.loads(json_string)
2. Write the appropriate code to get the data out of the dictionary (sorting, conditions, etc.) and into a 2D list.
3. Save that list as CSV: https://realpython.com/python-csv/
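A minimal sketch of those three steps, assuming a hypothetical schedule JSON with day and event fields:

import csv
import json

json_string = '[{"day": "Mon", "event": "Math"}, {"day": "Tue", "event": "Chemistry"}]'

# 1. Read the data from JSON
some_dict = json.loads(json_string)

# 2. Reshape into a 2D list: a header row plus one row per entry
rows = [['day', 'event']]
for entry in some_dict:
    rows.append([entry['day'], entry['event']])

# 3. Save the list as CSV
with open('schedule.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)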
I'm pretty lazy and like to utilize pandas for things like this. It would be something along the lines of:
import json
import pandas as pd

file = 'data.json'
with open(file) as j:
    json_data = json.load(j)
# build the frame from the parsed data, not the file handle j
df = pd.DataFrame.from_dict(json_data, orient='index')
df.to_csv("data.csv")

How do I convert several large text files into one CSV file if they are too large to be converted individually?

I have several large .text files that I want to consolidate into one .csv file. However, each of the files is too large to import into Excel on its own, let alone all together.
I want to use pandas to analyze the data, but I don't know how to get the files all in one place.
How would I go about reading the data directly into Python, or into Excel as a .csv file?
The data in question is the 2019-2020 Contributions by individuals file on the FEC's website.
You can convert each of the files to CSV and then concatenate them to form one final CSV file:
import glob
import os
import pandas as pd

csv_path = 'pathtonewcsvfolder'  # use your path
text_path = 'path/to/textfiles'
all_files = os.listdir(text_path)
x = 0
for filename in all_files:
    df = pd.read_fwf(os.path.join(text_path, filename))
    # index=False keeps the row index from becoming a spurious column on re-read
    df.to_csv(os.path.join(csv_path, 'log' + str(x) + '.csv'), index=False)
    x += 1
all_csv_files = glob.iglob(os.path.join(csv_path, '*.csv'))
converted_df = pd.concat((pd.read_csv(f) for f in all_csv_files), ignore_index=True)
converted_df.to_csv('converted.csv', index=False)
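Since the files are too large for Excel anyway, you can also skip the intermediate per-file CSVs and stream the text files into one CSV in chunks. A sketch, assuming the FEC bulk files' pipe-delimited, headerless layout (check the file description page for the actual separator and columns):

import glob
import os
import pandas as pd

text_path = 'path/to/textfiles'
out_path = 'converted.csv'

first = True
for filename in glob.iglob(os.path.join(text_path, '*.txt')):  # adjust the pattern to your extension
    # read about 100k rows at a time so nothing has to fit in memory at once
    for chunk in pd.read_csv(filename, sep='|', header=None, chunksize=100_000):
        chunk.to_csv(out_path, mode='w' if first else 'a', header=False, index=False)
        first = False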

I have a few JSON files that are empty and are giving an exception when I try to loop through them. How do I make this work?

I'm doing some research on Cambridge Analytica and wanted to gather as many news articles as I can from certain news outlets.
I was able to scrape them and now have a bunch of JSON files in a folder.
Some of them have only this [] written in them while others have the data I need.
Using pandas, I used the following and got every webTitle in the file:
df = pd.read_json(json_file)
df['webTitle']
The thing is that whenever there's an empty file it won't even let me assign df['webTitle'] to a variable.
Is there a way for me to check if it is empty and if it is just go to the next file?
I want to make this into a spreadsheet with a few of the keys and columns and the values as rows for each news article.
My files are organized by day, and I've used The Guardian API to get the data.
I haven't written much yet, but just in case, here's the code as it is:
import os
import pandas as pd

def makePathToFile(path):
    pathToJson = []
    for root, sub, filenames in os.walk(path):
        for i in filenames:
            # join with root so files in subfolders also get a correct path
            pathToJson.append(os.path.join(root, i))
    return pathToJson

def readJsonAndWriteCSV(pathToJson):
    for json_file in pathToJson:
        df = pd.read_json(json_file)
Thanks!
You can set up a Google Alert for the news keywords you want, then scrape the results in Python using https://pypi.org/project/galerts/
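If you'd rather keep working with the files you already scraped, a simple guard for the empty [] files (a sketch, not from the original answers; titles is a hypothetical accumulator) could be:

import pandas as pd

titles = []
for json_file in pathToJson:
    df = pd.read_json(json_file)
    if df.empty:  # files containing only [] parse to an empty frame
        continue  # skip straight to the next file
    titles.extend(df['webTitle'])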
