So recently I've been using a Repl (replit.com) to host my Python code, but whenever I'm offline, any information stored in the JSON file is rolled back after a bit of time. I now know this is a Replit-specific problem after doing some research, but is there any way I can fix it? My code is quite a few lines long, so I would rather not switch to a completely different storage method.
To successfully store data in JSON files on replit.com, it's important to load and dump it the correct way.
An example of storing data in a JSON file:
with open("sample.json", "r") as file:
sample = json.load(file)
sample["item"] = "Value"
with open("sample.json", "w") as file:
json.dump(sample, file)
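For reference, here is a slightly fuller sketch (the file name sample.json is just an example) that creates the file on the first run and reads it back afterwards, so you can confirm the value really was written to disk:

import json
import os

FILENAME = "sample.json"   # example file name, adjust to your project

# create the file with an empty object the first time the script runs
if not os.path.exists(FILENAME):
    with open(FILENAME, "w") as file:
        json.dump({}, file)

# read, modify, write back
with open(FILENAME, "r") as file:
    sample = json.load(file)

sample["item"] = "Value"

with open(FILENAME, "w") as file:
    json.dump(sample, file)

# read the file again to confirm the change made it to disk
with open(FILENAME, "r") as file:
    print(json.load(file))   # expected: {'item': 'Value'}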
Let me know if you've already followed these steps.
I have a series of Python scripts doing Oracle SQL database checks and outputting the results into individual JSON files, one file per output. That's all working fine, albeit there are probably better ways to do it.
Then I'm using the following, which I found on here, to merge the JSON files into a single JSON file:
import json
import glob

result = []
for f in glob.glob("*.json"):
    with open(f, "rb") as infile:
        result.append(json.load(infile))

with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)
This works roughly how I wanted the output to be; the only issue I'm facing is the "names" of the result sets (I'm sure my lack of terminology knowledge is what's made this hard).
To visualize what I mean, this is the output result without the data (I realise this isn't the correct output format; this is from a GUI viewer of the file, to better see the formatting):
[]JSON
    {}0
        {}RMTID
        {}FLIGHTS
        {}HOURLY
        {}WEATHER
        {}CALIBRATION
        {}GAPS
    {}1
        {}RMTID
        {}FLIGHTS
        {}HOURLY
        {}WEATHER
        {}CALIBRATION
        {}GAPS
I'm looking to have the 0 and 1 be the labels of the result set, so that the output would look like:
[]JSON
    {}LAX
        {}RMTID
        {}FLIGHTS
        {}HOURLY
        {}WEATHER
        {}CALIBRATION
        {}GAPS
    {}LGW
        {}RMTID
        {}FLIGHTS
        {}HOURLY
        {}WEATHER
        {}CALIBRATION
        {}GAPS
Is that possible with Python? Or should I be looking at alternative solutions?
I've seen suggestions on similar questions to write the outputs directly into one file rather than merging multiple files; however, the results come from different database connections, and finding a solution for "queuing" database connections and storing their outputs has been above my skill level.
Thanks!
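One possible push in the right direction, as a minimal sketch rather than a drop-in solution: if each per-query output file is named after the label you want (e.g. LAX.json and LGW.json, which is only an assumption here), you can build a dictionary keyed by that name instead of appending to a list, and json.dump will then write the labels as the top-level keys:

import glob
import json
import os

result = {}
for path in glob.glob("*.json"):
    if os.path.basename(path) == "merged_file.json":
        continue   # skip the output of a previous run
    label = os.path.splitext(os.path.basename(path))[0]   # e.g. "LAX" from "LAX.json"
    with open(path, "r") as infile:
        result[label] = json.load(infile)

with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile, indent=2)

With input files named LAX.json and LGW.json, the viewer would then show {}LAX and {}LGW instead of {}0 and {}1.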
I don't need the entire code, but I want a push to help me on the way. I've been searching the internet for clues on how to start writing a function like this, but I haven't gotten any further than just the name of the function.
So I haven't got the slightest clue how to start with this; I don't know how to work with text files. Any tips?
These text files are CSV (Comma Separated Values). It is a simple file format used to store tabular data.
You may explore Python's built-in module called csv.
The following code snippet is an example of loading a .csv file in Python:
import csv

filename = 'us_population.csv'

with open(filename, 'r', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:    # each row is a list of column values
        print(row)
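If the file has a header row, csv.DictReader can be even more convenient, since each row comes back as a dictionary keyed by column name. A small sketch (the column names 'State' and 'Population' are only assumptions about us_population.csv):

import csv

filename = 'us_population.csv'

with open(filename, 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)      # uses the first row as the field names
    for row in reader:
        print(row['State'], row['Population'])   # assumed column names, adjust to the real header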
Update: Sorry, it seems my question wasn't asked properly. I am analyzing a transportation network consisting of more than 5000 links. All the data is included in a big CSV file. I have several JSON files, each consisting of a subset of this network. I am trying to loop through all the JSON files INDIVIDUALLY (i.e. not trying to concatenate them or anything), read the JSON file, extract the corresponding information from the CSV file, perform a calculation, and save the result along with the name of the file in a new dataframe. Something like this:
(screenshot of the desired output dataframe omitted)
This is the code I wrote, but I'm not sure if it's efficient enough.
import glob
import json
import os

import pandas as pd

# 'final' is assumed to be the dataframe already loaded from the big CSV file,
# e.g. final = pd.read_csv(...)

name = []
percent_of_truck = []
path_to_json = r'\\directory'   # placeholder path to the folder of JSON files

z = glob.glob(os.path.join(path_to_json, '*.json'))
for i in z:
    with open(i, 'r') as myfile:
        l = json.load(myfile)
    name.append(i)
    d_2019 = final.loc[final['LINK_ID'].isin(l)]   # retrieve this file's links from the main CSV data
    avg_m = (d_2019['AADTT16'] / d_2019['AADT16'] * d_2019['Length']).sum() / d_2019['Length'].sum()   # length-weighted truck share
    percent_of_truck.append(avg_m)

f = pd.DataFrame()
f['Name'] = name
f['% of truck'] = percent_of_truck
I'm assuming here you just want a dictionary of all the JSON. If so, use the json library (import json), and this code may be of use:
import json

def importSomeJSONFile(f):
    # use a with-block so the file handle is closed after loading
    with open(f) as infile:
        return json.load(infile)

# make sure the file exists in the same directory
example = importSomeJSONFile("example.json")
print(example)

# access a value within it, replacing "name" with whatever key you want
print(example["name"])
Since you haven't added any schema or any other specific requirements, you can follow this approach to solve your problem in any language you prefer (a minimal Python sketch of these steps follows the list):
1. Get the directory of the JSON files that need to be read
2. Get a list of all files present in that directory
3. For each file name returned in step 2:
   - read the file
   - parse the JSON from the string
   - perform the required calculation
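A minimal Python sketch of those steps (the directory name and the calculation are placeholders, not something taken from the question):

import glob
import json
import os

def perform_calculation(parsed):
    # placeholder for whatever calculation is actually required
    return len(parsed)

# step 1: directory holding the JSON files (hypothetical path)
json_dir = "data"

# step 2: list of all .json files in that directory
json_files = glob.glob(os.path.join(json_dir, "*.json"))

# step 3: read, parse, and calculate for each file
results = {}
for path in json_files:
    with open(path, "r") as fh:        # read the file
        parsed = json.load(fh)         # parse the JSON
    results[os.path.basename(path)] = perform_calculation(parsed)

print(results)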
I am trying to use the "requests" package to retrieve info from GitHub, like the Requests docs page explains:
import requests
r = requests.get('https://api.github.com/events')
And this:
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)
I have to say I don't understand the second code block.
filename - in what form do I provide the path to the file if created? where will it be saved if not?
'wb' - what is this variable? (shouldn't second parameter be 'mode'?)
the following two lines probably iterate over the data retrieved with the request and write it to the file
Python docs explanation also not helping much.
EDIT: What I am trying to do:
use Requests to connect to an API (Github and later Facebook GraphAPI)
retrieve data into a variable
write this into a file (later, as I get more familiar with Python, into my local MySQL database)
Filename
When using open, the path is relative to your current directory. So if you said open('file.txt','w'), it would create a new file named file.txt in whatever folder your Python script is in. You can also specify an absolute path, for example /home/user/file.txt on Linux. If a file by the name 'file.txt' already exists, its contents will be completely overwritten.
Mode
The 'wb' option is indeed the mode. The 'w' means write and the 'b' means bytes. You use 'w' when you want to write to (rather than read from) a file, and you use 'b' for binary files (rather than text files). It is actually a little odd to use 'b' in this case, as the content you are writing is text. Specifying 'w' would work just as well here. Read more on the modes in the docs for open.
The Loop
This part is using the iter_content method from requests, which is intended for use with large files that you may not want in memory all at once. This is unnecessary in this case, since the page in question is only 89 KB. See the requests library docs for more info.
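For contrast, here is roughly what the case iter_content is actually meant for looks like: a large download streamed to disk in chunks (the URL and file name below are placeholders):

import requests

url = 'https://example.com/large-file.zip'       # placeholder URL
r = requests.get(url, stream=True)               # stream=True avoids loading the whole body into memory

with open('large-file.zip', 'wb') as fd:
    for chunk in r.iter_content(chunk_size=8192):    # read the body roughly 8 KB at a time
        fd.write(chunk)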
Conclusion
The example you are looking at is meant to handle the most general case, in which the remote file might be binary and too big to fit in memory. However, we can make your code more readable and easier to understand if you are only accessing small webpages containing text:
import requests

r = requests.get('https://api.github.com/events')

with open('events.txt', 'w') as fd:
    fd.write(r.text)
filename is a string of the path you want to save it at. It accepts either a relative or an absolute path, so you can just have filename = 'example.html'
wb stands for WRITE & BYTES; learn more here
The for loop goes over the entire returned content (in chunks, in case it is too large for proper memory handling) and writes it out until there is no more. Useful for large files, but for a single webpage you could just do:
# just 'w' because we are not writing bytes anymore, just text
with open(filename, 'w') as fd:
    fd.write(r.text)   # r.text is the decoded text, which is what a text-mode file expects
I am using the 64-bit version of Enthought Python to process data across multiple HDF5 files. I'm using h5py version 1.3.1 (HDF5 1.8.4) on 64-bit Windows.
I have an object that provides a convenient interface to my specific data hierarchy, but testing h5py.File(fname, 'r') independently yields the same results. I am iterating through a long list (~100 files at a time) and attempting to pull out specific pieces of information from the files. The problem I'm having is that I'm getting the same information out of several files! My loop looks something like:
from glob import glob
import csv
import h5py as hdf5   # the snippet refers to the module as "hdf5"

files = glob(r'path\*.h5')
out_csv = csv.writer(open('output_file.csv', 'w', newline=''))

for filename in files:
    handle = hdf5.File(filename, 'r')
    data = extract_data_from_handle(handle)   # my own helper that pulls out the pieces I need
    for row in data:
        out_csv.writerow((filename, ) + row)
When I inspect the files using something like hdfview, I know the internals are different. However, the csv I get seems to indicate that all the files contain the same data. Has anyone seen this behavior before? Any suggestions where I could go to start debugging this issue?
I've concluded that this is a strange manifestation of "Perplexing assignment behavior with h5py object as instance variable". I re-wrote my code so that each file is handled within a function call and the variable is not reused. Using this approach, I don't see the same strange behavior and it seems to work much better. For clarity, the solution looks more like:
from glob import glob
import csv
import h5py as hdf5

files = glob(r'path\*.h5')
out_csv = csv.writer(open('output_file.csv', 'w', newline=''))

def extract_data_from_filename(filename):
    # open the file inside the function so the handle is not reused between iterations
    return extract_data_from_handle(hdf5.File(filename, 'r'))

for filename in files:
    data = extract_data_from_filename(filename)
    for row in data:
        out_csv.writerow((filename, ) + row)
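A further small change worth considering, sketched below on the assumption that a reasonably recent h5py is in use (where File objects work as context managers): open each file in a with block so the handle is guaranteed to be closed before the next file is opened.

for filename in files:
    with hdf5.File(filename, 'r') as handle:   # the file is closed as soon as the block exits
        data = extract_data_from_handle(handle)
        for row in data:
            out_csv.writerow((filename, ) + row)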