I have a huge txt file that I need to load into DynamoDB.
The file structure is:
223344|blue and orange|Red|16/12/2022
223344|blue and orange|Red|16/12/2022
...
The file has more than 200M lines.
I have tried to convert it to a JSON file using the code below:
import json

with open('mini_data.txt', 'r') as f_in:
    for line in f_in:
        line = line.strip().split('|')
        filename = 'smini_final_data.json'
        result = {"field1": line[0], "field2": line[1],
                  "field3": str(line[2]).replace(" ", ""), "field4": line[3]}
        with open(filename, "r") as file:
            data = json.load(file)
        data.append(result)
        with open(filename, "w") as file:
            json.dump(data, file)
But this isn't efficient, and it's only the first part of the job (converting the data to JSON); after that, I need to put the JSON into DynamoDB.
I have used this code for that part (it looks fine):
def insert(self):
    if not self.dynamodb:
        self.dynamodb = boto3.resource(
            'dynamodb', endpoint_url="http://localhost:8000")
    table = self.dynamodb.Table('fruits')
    json_file = open("final_data.json")
    orange = json.load(json_file, parse_float=decimal.Decimal)
    with table.batch_writer() as batch:
        for fruit in orange:
            field1 = fruit['field1']
            field2 = fruit['field2']
            field3 = fruit['field3']
            field4 = fruit['field4']
            batch.put_item(
                Item={
                    'field1': field1,
                    'field2': field2,
                    'field3': field3,
                    'field4': field4
                }
            )
So, does anyone have suggestions for processing this txt file more efficiently?
Thanks
The step of converting from delimited text to JSON seems unnecessary in this case. The way you've written it requires reopening and rewriting the JSON file for each line of your delimited text file. That I/O overhead repeated 200M times can really slow things down.
I suggest going straight from your delimited text to DynamoDB. It might look something like this:
dynamodb = boto3.resource(
    'dynamodb', endpoint_url="http://localhost:8000")
table = dynamodb.Table('fruits')

with table.batch_writer() as batch:
    with open('mini_data.txt', 'r') as f_in:
        for line in f_in:
            line = line.strip().split('|')
            batch.put_item(
                Item={
                    'field1': line[0],
                    'field2': line[1],
                    'field3': str(line[2]).replace(" ", ""),
                    'field4': line[3]
                }
            )
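Note that batch_writer already takes care of the batching details for you: it buffers items and sends them in BatchWriteItem calls of up to 25 items, automatically resending any unprocessed items, so no extra batching logic is needed even for a 200M-line load.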
My program takes a csv file as input and writes it out as a file in json format. On the final line, I use print to output the contents of the json file to the screen. However, it does not print the json file's contents, and I don't understand why.
Here is my code that I have so far:
import csv
import json

def jsonformat(infile, outfile):
    contents = {}
    csvfile = open(infile, 'r')
    reader = csvfile.read()
    for m in reader:
        key = m['No']
        contents[key] = m
    jsonfile = open(outfile, 'w')
    jsonfile.write(json.dumps(contents))
    csvfile.close()
    jsonfile.close()
    return jsonfile

infile = 'orders.csv'
outfile = 'orders.json'
output = jsonformat(infile, outfile)
print(output)
Your function returns the jsonfile variable, which is a file object, not the file's contents.
Try adding this:

jsonfile.close()
with open(outfile, 'r') as file:
    return file.read()
Your function returns a file handle to the file jsonfile that you then print. Instead, return the contents that you wrote to that file. Since you opened the file in w mode, any previous contents are removed before writing the new contents, so the contents of your file are going to be whatever you just wrote to it.
In your function, do:
def jsonformat(infile, outfile):
    ...
    # Instead of this:
    # jsonfile.write(json.dumps(contents))
    # do this:
    json_contents = json.dumps(contents, indent=4)  # indent=4 to pretty-print
    jsonfile.write(json_contents)
    ...
    return json_contents
Aside from that, you aren't reading the CSV file the correct way. If your file has a header, you can use csv.DictReader to read each row as a dictionary; then you'll be able to use for m in reader: key = m['No']. Change reader = csvfile.read() to reader = csv.DictReader(csvfile).
As of now, reader is a string that contains all the contents of your file. for m in reader makes m each character in this string, and you cannot access the "No" key on a character.
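For illustration, a minimal sketch of that change, assuming orders.csv has a header row containing a 'No' column:

import csv

with open('orders.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)  # yields one dict per row, keyed by the header
    for m in reader:
        key = m['No']  # works now: m is a dict, not a single character
        print(key, m)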
import json

a_file = open("sample.json", "r")
a_json = json.load(a_file)
pretty_json = json.dumps(a_json, indent=4)
a_file.close()
print(pretty_json)

Use this sample to print the contents of your json file. Have a good day.
I am working on a project where I need to use US States Zip Code Data. I want to merge two geojson files while preserving the data in those files. geojson-merge (https://github.com/mapbox/geojson-merge) does this, but I am hoping for a Python-based solution.
Each state has a separate *.json file. For example:
mt_montana_zip_codes_geo.min.json
nd_north_dakota_zip_codes_geo.min.json
import json

nd_boundary_file = r"C:\Data_ZipCodes_States\State-zip-code-GeoJSON-master" \
                   r"\nd_north_dakota_zip_codes_geo.min.json"
with open(nd_boundary_file, 'r') as f:
    nd_zipcode_boundary = json.load(f)

mt_boundary_file = r"C:\Data_ZipCodes_States\State-zip-code-GeoJSON-master" \
                   r"\mt_montana_zip_codes_geo.min.json"
with open(mt_boundary_file, 'r') as f:
    mt_zipcode_boundary = json.load(f)

# This overwrote the mt_zipcode_boundary keys with the nd_zipcode_boundary keys in merged
# merged = {**mt_zipcode_boundary, **nd_zipcode_boundary}

# This produced a file with two json objects, one 'mt' and the other 'nd'
data = {'mt': mt_zipcode_boundary, 'nd': nd_zipcode_boundary}

# This also overwrote mt_zipcode_boundary
mt_zipcode_boundary.update(nd_zipcode_boundary)
How would I write code to combine these two geojson files into a single file?
What about something like this?
import json

fc = {
    'type': 'FeatureCollection',
    'features': []
}

with open("mt_montana_zip_codes_geo.min.json") as json_file:
    obj = json.load(json_file)
    fc['features'].extend(obj['features'])

with open("nd_north_dakota_zip_codes_geo.min.json") as json_file:
    obj = json.load(json_file)
    fc['features'].extend(obj['features'])

with open("merged.json", "w") as outfile:
    json.dump(fc, outfile)
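If you later need more than two states, the same pattern extends to a loop; a sketch, assuming every file follows the *_zip_codes_geo.min.json naming convention shown above:

import glob
import json

fc = {'type': 'FeatureCollection', 'features': []}

# merge every state file matching the naming convention into one FeatureCollection
for path in sorted(glob.glob('*_zip_codes_geo.min.json')):
    with open(path) as json_file:
        fc['features'].extend(json.load(json_file)['features'])

with open('merged.json', 'w') as outfile:
    json.dump(fc, outfile)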
I have this simple Python code that converts a json file into a csv.
I would like to convert only the first four values of each key, but I couldn't figure out how to do it.
import csv
import json

# Opening the JSON file and loading the data
# into the variable data
with open('personas.json') as json_file:
    data = json.load(json_file)

employee_data = data['emp_details']

# now we will open a file for writing
data_file = open('data_file.csv', 'w')

# create the csv writer object
csv_writer = csv.writer(data_file)

# Counter variable used for writing
# headers to the CSV file
count = 0

for emp in employee_data:
    if count == 0:
        # Writing headers of CSV file
        header = emp.keys()
        csv_writer.writerow(header)
        count += 1
    # Writing data of CSV file
    csv_writer.writerow(emp.values())

data_file.close()
Here is an example of the json file's format:
{"emp_details":[
{
"DATAID":"6908443",
"FIRST_NAME":"Fernando",
"SECOND_NAME":"Fabbiano",
"THIRD_NAME":"Agustin",
"FOURTH_NAME":"",
"AGE": "21",
"EMAIL": "fer.fab#gmail.com"
}
]}
And as I said, I would like to convert only the fields DATAID, FIRST_NAME, SECOND_NAME, and THIRD_NAME.
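One way to restrict the output to those columns (a minimal sketch, assuming the four fields are always present under exactly those keys):

import csv
import json

# the four fields to keep, in output order
fields = ['DATAID', 'FIRST_NAME', 'SECOND_NAME', 'THIRD_NAME']

with open('personas.json') as json_file:
    employee_data = json.load(json_file)['emp_details']

with open('data_file.csv', 'w', newline='') as data_file:
    csv_writer = csv.writer(data_file)
    csv_writer.writerow(fields)  # header row
    for emp in employee_data:
        csv_writer.writerow(emp[f] for f in fields)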
Hi, I am trying to take the data from a json file, insert an id, and then perform a POST REST call.
My file data.json has:
{
    'name': 'myname'
}
and I would like to add an id so that the json data looks like:
{
    'id': 134,
    'name': 'myname'
}
So I tried:
import json
f = open("data.json","r")
data = f.read()
jsonObj = json.loads(data)
I can't get the json file to load.
What should I do so that I can convert the json file into a json object and add another id value?
Set item using data['id'] = ....
import json

with open('data.json', 'r+') as f:
    data = json.load(f)
    data['id'] = 134  # <--- add `id` value.
    f.seek(0)         # <--- reset file position to the beginning.
    json.dump(data, f, indent=4)
    f.truncate()      # remove the remaining part
falsetru's solution is nice, but has a little bug: suppose the original 'id' length was larger than 5 characters. When we then dump with the new 'id' (134, with only 3 characters), the string written from position 0 in the file is shorter than the original. Extra characters (such as '}') from the original content are left in the file.
I solved that by replacing the original file.
import json
import os

filename = 'data.json'

with open(filename, 'r') as f:
    data = json.load(f)

data['id'] = 134  # <--- add `id` value.

os.remove(filename)
with open(filename, 'w') as f:
    json.dump(data, f, indent=4)
I would like to present a modified version of Vadim's solution. It helps when dealing with asynchronous requests that write/modify the json file. I know it wasn't part of the original question, but it might be helpful for others.
In case of asynchronous file modification, os.remove(filename) will raise FileNotFoundError if requests arrive frequently. To overcome this problem, you can create a temporary file with the modified content and then rename it, atomically replacing the old version. This solution works fine for both synchronous and asynchronous cases.
import os, json, uuid

filename = 'data.json'

with open(filename, 'r') as f:
    data = json.load(f)

data['id'] = 134  # <--- add `id` value.
# add, remove, modify content

# create a randomly named temporary file to avoid
# interference with other threads/asynchronous requests
tempfile = os.path.join(os.path.dirname(filename), str(uuid.uuid4()))
with open(tempfile, 'w') as f:
    json.dump(data, f, indent=4)

# rename the temporary file, replacing the old file
# (os.replace does this atomically and, unlike os.rename,
# also works on Windows when the target already exists)
os.replace(tempfile, filename)
There are really quite a number of ways to do this, and all of the above are in one way or another valid approaches. Let me add a straightforward proposition. So, assuming your current existing json file looks like this:
{
    "name": "myname"
}
And you want to bring in this new json content (adding the key "id"):

{
    "id": "134",
    "name": "myname"
}
My approach has always been to keep the code extremely readable with easily traceable logic. So first, we read the entire existing json file into memory, assuming you are very well aware of your json's existing key(s).
import json

# first, get the absolute path to the json file
PATH_TO_JSON = 'data.json'  # assuming same directory (but you can work your magic here with os.)

# read the existing json into memory. you do this to preserve whatever existing data.
with open(PATH_TO_JSON, 'r') as jsonfile:
    json_content = json.load(jsonfile)  # this is now in memory! you can use it outside 'open'
Next, we use the with open() syntax again, this time with the 'w' option. 'w' is a write mode which lets us edit and write new information to the file. Here's the catch that works for us: any existing file with the same target name will be erased automatically.
So what we can do now is simply write to the same filename with the new data:
# add the id key-value pair (remember that it already has the "name" key-value)
json_content["id"] = "134"

with open(PATH_TO_JSON, 'w') as jsonfile:
    json.dump(json_content, jsonfile, indent=4)  # you decide the indentation level
And there you go!
data.json should be good to go for a good old POST request.
Try this script:

import json

with open("data.json") as f:
    data = json.load(f)

data["id"] = 134
json.dump(data, open("data.json", "w"), indent=4)
The result is:

{
    "name": "myname",
    "id": 134
}
Just the arrangement is different. You can solve the problem by converting the data to a list of pairs, arranging it as you wish, then rebuilding the dict and saving the file, like this:
import json

index_add = 0

with open("data.json") as f:
    data = json.load(f)

data_li = [[k, v] for k, v in data.items()]
data_li.insert(index_add, ["id", 134])

# dicts preserve insertion order in Python 3.7+, so rebuilding the
# dict from the rearranged list keeps the new key order
data = {data_li[i][0]: data_li[i][1] for i in range(0, len(data_li))}
json.dump(data, open("data.json", "w"), indent=4)
The result is:

{
    "id": 134,
    "name": "myname"
}
You can add an if condition so the key isn't repeated but updated in place, like this:
import json

index_add = 0
n_k = "id"
n_v = 134

with open("data.json") as f:
    data = json.load(f)

if n_k in data:
    data[n_k] = n_v
else:
    data_li = [[k, v] for k, v in data.items()]
    data_li.insert(index_add, [n_k, n_v])
    data = {data_li[i][0]: data_li[i][1] for i in range(0, len(data_li))}

json.dump(data, open("data.json", "w"), indent=4)
This implementation should suffice:
import json

with open(jsonfile, 'r') as file:
    data = json.load(file)

data['id'] = value

with open(jsonfile, 'w') as file:
    json.dump(data, file)

A context manager is used for opening the jsonfile. data holds the updated object, which is then dumped into the overwritten jsonfile in 'w' mode.
Not exactly your solution, but it might help some people solving this issue with keys.
I have a list of files in a folder, and I need to make JSON out of it with keys.
After many hours of trying, the solution is simple.
Solution:
import os

async def return_file_names():
    dir_list = os.listdir("./tmp/")
    # note: dir_list.index(value) assumes unique file names;
    # enumerate(dir_list) would avoid the repeated lookup
    json_dict = {"responseObj": [{"Key": dir_list.index(value), "Value": value}
                                 for value in dir_list]}
    print(json_dict)
    return json_dict
The response looks like this:
{
    "responseObj": [
        {
            "Key": 0,
            "Value": "bottom_mask.GBS"
        },
        {
            "Key": 1,
            "Value": "bottom_copper.GBL"
        },
        {
            "Key": 2,
            "Value": "copper.GTL"
        },
        {
            "Key": 3,
            "Value": "soldermask.GTS"
        },
        {
            "Key": 4,
            "Value": "ncdrill.DRD"
        },
        {
            "Key": 5,
            "Value": "silkscreen.GTO"
        }
    ]
}