How to merge multiple files in Python

Very simple question! I want to merge multiple JSON files. Check this out:
f1data = f2data = f3data = f4data = f5data = f6data = ""
with open('1.json') as f1:
    f1data = f1.read()
with open('2.json') as f2:
    f2data = f2.read()
with open('3.json') as f3:
    f3data = f3.read()
with open('4.json') as f4:
    f4data = f4.read()
with open('5.json') as f5:
    f5data = f5.read()
with open('6.json') as f6:
    f6data = f6.read()
f1data += "\n"
f1data += f2data += f3data += f4data += f5data += f6data
with open('merged.json', 'a') as f3:
    f3.write(f1data)
And the output should be like this:
[
    {
        "id": "1",
        "name": "John",
    },
    {
        "id": "2",
        "name": "Tom",
    }
]
The problem is that Visual Studio Code puts a red line under:
f1data += f2data += f3data += f4data += f5data += f6data
I have no idea why, and the code won't run! There is no error message, so I can't troubleshoot. Any advice?

There are several points to improve in this code. First, the highlighted line is invalid Python: augmented assignments like += are statements, not expressions, so they cannot be chained. Beyond that, you should consider doing it in a more "programmatic" way:
If you declare a list with the names of the JSON files you want to access, you can then do:
files_names = ["1", "2", "3", "4", "5", "6"]
data = ""
for file_name in files_names:
    with open(file_name + ".json", "r") as file_handle:
        temp_data = file_handle.read()
        data = data + temp_data
with open('merged.json', 'a') as file_handle:
    file_handle.write(data)
This is more concise, more Pythonic, and can easily be adapted if you ever need, say, 7 input files.
If your files are always numbered 1, 2, 3, 4, ..., you can also build the loop from the highest file number you want:
max_file_name = 6
# start the range at 1, assuming your file naming starts at 1 and not at 0;
# the upper bound gets + 1 because range() excludes it
for file_name in range(1, max_file_name + 1):
    file_to_open = str(file_name) + ".json"
To be sure your JSON is valid, you could use the json standard library. It takes a little more time, since each file is parsed instead of just dumped into the other one, but unless you have 100,000 files to merge you shouldn't see the difference, and it is worth it whenever you don't know for sure that the code creating your JSON files in the first place produces valid JSON. To use it, just do:
import json

max_file_name = 6
data = {}
for file_name in range(1, max_file_name + 1):
    with open(str(file_name) + ".json", "r") as file_handle:
        temp_data = json.load(file_handle)
        # ** "unpacks" every key/value pair of a dict at runtime,
        # as if you provided them one by one separated by commas:
        # data["key1"], data["key2"]...
        # Doing this for both JSON objects and putting them
        # into a new dictionary effectively merges them.
        data = {**data, **temp_data}
with open('merged.json', 'w') as file_handle:
    json.dump(data, file_handle)
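Note that ** unpacking only works when each file holds a JSON object (a dict after parsing). The expected output in the question is a JSON array, so if each file holds a list, plain concatenation does the merging instead. A minimal sketch along the same lines, using throwaway files to stand in for 1.json, 2.json, ...:

```python
import json
import os
import tempfile

def merge_json_arrays(paths):
    """Concatenate the JSON arrays stored in the given files into one list."""
    merged = []
    for path in paths:
        with open(path) as fh:
            merged += json.load(fh)  # list += list concatenates
    return merged

# demo: two throwaway files standing in for the numbered input files
tmp = tempfile.mkdtemp()
chunks = [[{"id": "1", "name": "John"}], [{"id": "2", "name": "Tom"}]]
for n, chunk in enumerate(chunks, start=1):
    with open(os.path.join(tmp, f"{n}.json"), "w") as fh:
        json.dump(chunk, fh)

merged = merge_json_arrays(os.path.join(tmp, f"{n}.json") for n in (1, 2))
print(merged)  # [{'id': '1', 'name': 'John'}, {'id': '2', 'name': 'Tom'}]
```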

You have several ways:

Open them all in a single with statement:
with open('a', 'w') as a, open('b', 'w') as b, ..:
    do_something()

Loop over a list of file names:
files_list = ['a', 'b', ..]
for file in files_list:
    with open(file, 'w')...

Use contextlib.ExitStack:
from contextlib import ExitStack
with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # Do something with "files"

The output to the merged file won't be formatted exactly as you've specified, but this shows the approach that I would personally use:
import json

alist = []
with open('/Users/andy/merged.json', 'w') as outfile:
    for k in range(6):
        with open(f'/Users/andy/{k+1}.json') as infile:
            alist.append(json.load(infile))
    outfile.write(str(alist))
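One caveat: str(alist) yields Python's repr (single quotes), which is not strictly valid JSON. If the merged file must itself be valid JSON, json.dump is a small change. A sketch using in-memory streams in place of the /Users/andy/*.json files:

```python
import io
import json

def merge_to_json(infiles, outfile):
    """Parse each input stream as JSON, collect into a list, dump as valid JSON."""
    alist = [json.load(f) for f in infiles]
    json.dump(alist, outfile, indent=4)

# in-memory streams standing in for the six numbered input files
sources = [io.StringIO('{"id": "1", "name": "John"}'),
           io.StringIO('{"id": "2", "name": "Tom"}')]
out = io.StringIO()
merge_to_json(sources, out)
merged_text = out.getvalue()  # valid, pretty-printed JSON text
```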

Related

Python: change the value of a specific line

I have 1000 JSON files, and I need to change the value of a specific line, with a numeric sequence, in all of them.
An example: the specific line is
"name": "carl 00",
and I need it to be like the following:
File 1
"name": "carl 1",
File 2
"name": "carl 2",
File 3
"name": "carl 3",
What is the right script to achieve the above using Python?
This should do the trick, but you're not very clear about how the data is stored in the actual JSON file, so I listed two different approaches. The first parses the JSON file into a Python dict, manipulates the data, turns it back into a character string, and saves it. The second is what I think you mean by "line": you can split the file's character string into a list, change the line you want, remake the full string, and save it again.
This also assumes your json files are in the same folder as the python script.
import os
import json

my_files = [name1, name2, name3, ...]  # ['file_name.json', ...]
folder_path = os.path.dirname(__file__)
for i, name in enumerate(my_files, start=1):
    path = f'{folder_path}/{name}'
    with open(path, 'r') as f:
        json_text = f.read()

    # if you know the key(s) in the json file...
    json_dict = json.loads(json_text)
    json_dict['name'] = json_dict['name'].replace('00', str(i))
    new_json_str = json.dumps(json_dict)

    # if you know the line number in the file...
    line_list = json_text.split('\n')
    line_list[line_number - 1] = line_list[line_number - 1].replace('00', str(i))
    new_json_str = '\n'.join(line_list)

    with open(path, 'w') as f:
        f.write(new_json_str)
Based on your edit, this is what you want:
import os
import json

my_files = [f'{i}.json' for i in range(1, 1001)]
folder_path = os.path.dirname(__file__)  # put this .py file in the same folder as the json files
for i, name in enumerate(my_files, start=1):
    path = f'{folder_path}/{name}'
    with open(path, 'r') as f:
        json_text = f.read()
    json_dict = json.loads(json_text)
    json_dict['name'] = f'carl {i}'
    # include these lines if you want "symbol" and "subtitle" changed too
    json_dict['symbol'] = f'carl {i}'
    json_dict['subtitle'] = f'carl {i}'
    new_json_str = json.dumps(json_dict)
    with open(path, 'w') as f:
        f.write(new_json_str)
Without knowing more, the loop below will accomplish the post's requirements:
name = 'carl'
for i in range(1, 1001):
    print(f'name: {name} {i}')

While reading txt file lines, why can't I "append list" or "update dictionary"?

I have a txt file like this:
"aroint" : "Lorem.",
"agama" : "Simply.",
"allantoidea" : "Dummy.",
"ampelopsis" : "Whiske"\"red.",
"zopilote" : "Vulture.\n\n",
"zooedendrium" : "Infusoria."
I tried to read the txt file, convert it to a Python dictionary, and then create the JSON file:
import json

dictionary = {}
with open('/Users/stackoverflowlover/Desktop/source.txt', "r") as f:
    for line in f:
        s = (line.replace("\"", "").replace("\n\n", "").replace("\n", "").strip().split(":"))
        xkey = (s[0])
        xvalue = (s[-1])
        zvalue = str(xvalue)
        value = zvalue[:0] + zvalue[0 + 1:]
        key = xkey.replace(' ', '', 1)
        dict = {'key1': 'stackoverflow'}
        dictadd = {key: value}
        (dict.update(dictadd))
        dictionary_list = []
        dictionary_list.append(key)
        dictionary_list.append(value)
        print(dictionary_list)

with open("/Users/stackoverflowlover/Desktop/test.json", 'w', encoding='utf8') as f3:
    json.dump(dict, f3, ensure_ascii=False, indent=1)
    print(json.dumps(dict, sort_keys=True, indent=4))
My output:
['zooedendrium', 'Infusoria.']
{
"key1": "stackoverflow",
"zooedendrium": "Infusoria."
}
When I read the lines I can see all of them, but afterwards the dictionary contains only the last line.
How can I fix that?
Here you go. The logic below will read your file and convert it to the JSON structure you are looking for as output:
import re
import json

data = {}
with open("/Users/stackoverflowlover/Desktop/source.txt", "r") as f:
    for line in f:
        line = line.split(":")
        line[0] = re.sub(r"[^\w]+", "", line[0])
        line[1] = re.sub(r"[^\w]+", "", line[1])
        data[line[0]] = line[1]
print(data)
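The snippet above only builds the dict in memory; to finish what the question asks for (a .json file on disk), json.dump from the standard library writes it out. A sketch, where the sample lines and the output path are stand-ins for the real source.txt and destination:

```python
import json
import os
import re
import tempfile

# a couple of sample lines standing in for source.txt
sample_lines = ['"aroint" : "Lorem.",\n', '"agama" : "Simply.",\n']

data = {}
for line in sample_lines:
    key, _, value = line.partition(":")
    # same cleanup as the answer above: strip everything but word characters
    data[re.sub(r"[^\w]+", "", key)] = re.sub(r"[^\w]+", "", value)

# hypothetical output path; replace with your own, e.g. .../Desktop/test.json
out_path = os.path.join(tempfile.mkdtemp(), "test.json")
with open(out_path, "w", encoding="utf8") as f3:
    json.dump(data, f3, ensure_ascii=False, indent=1)

with open(out_path) as f3:
    print(json.load(f3))  # {'aroint': 'Lorem', 'agama': 'Simply'}
```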

Python readlines function not reading first line in file

I am trying to search through a list of files and extract the lines starting with "id". This occurs many times in each file, often in the first line of text in the file.
The code I have written so far works; however, it seems to miss the first line in each file (the first occurrence of "id").
for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        for line in f:
            lines = f.readlines()
            a = 0
            while a < len(lines):
                temp_array = lines[a].rstrip().split(",")
                if temp_array[0] == "id":
                    game_id = temp_array[1]
Any suggestions on how I can include this first line of text in the readlines? I tried changing a to -1 so it would include the first line (where a = 0), but this didn't work.
EDIT:
I need to keep 'a' in my code as an index because I use it later on; the code I showed above was truncated. Here is more of it, for example. Any suggestions on how else I can remove "for line in f:"?
for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        for line in f:
            lines = f.readlines()
            a = 0
            while a < len(lines):
                temp_array = lines[a].rstrip().split(",")
                if temp_array[0] == "id":
                    game_id = temp_array[1]
                    for o in range(a + 1, a + 7, 1):
                        if lines[o].rstrip().split(",")[1] == "visteam":
                            awayteam = lines[o].rstrip().split(",")[2]
                        if lines[o].rstrip().split(",")[1] == "hometeam":
                            hometeam = lines[o].rstrip().split(",")[2]
                        if lines[o].rstrip().split(",")[1] == "date":
                            date = lines[o].rstrip().split(",")[2]
                        if lines[o].rstrip().split(",")[1] == "site":
                            site = lines[o].rstrip().split(",")[2]
Your "for line in f:" consumes the first line of the file before readlines() collects the rest, which is why the first "id" line is missed. Iterate over the file directly instead:
for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        for line in f:
            temp_array = line.rstrip().split(",")
            if temp_array[0] == "id":
                game_id = temp_array[1]
The above should work. It can also be made a bit faster, as there is no need to split every line into a list:
for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        for line in f:
            if line.startswith("id,"):
                temp_array = line.rstrip().split(",")
                game_id = temp_array[1]
You can use enumerate to keep track of the current line number. Here is another way, having seen your edit to the question:
for file2 in data_files2:
    with open(file2, 'r') as f:  # use context manager to open files
        lines = f.readlines()
        for n, line in enumerate(lines):
            if line.startswith("id,"):
                game_id = line.rstrip().split(",")[1]
                for o in range(n + 1, n + 7):
                    linedata = lines[o].rstrip().split(",")
                    spec = linedata[1]
                    if spec == "visteam":
                        awayteam = linedata[2]
                    elif spec == "hometeam":
                        hometeam = linedata[2]
                    elif spec == "date":
                        date = linedata[2]
                    elif spec == "site":
                        site = linedata[2]
You should also consider using the csv library for working with csv files.
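The csv suggestion as a minimal sketch: csv.reader handles the comma splitting (and any quoting) for you, so each row arrives as a ready-made list of fields. The sample data below is made up to mirror the field names used above:

```python
import csv
import io

# hypothetical stand-in for one of the game files; in practice you would
# pass an open file object to csv.reader instead
sample = io.StringIO(
    "id,ANA201904040\n"
    "info,visteam,TEX\n"
    "info,hometeam,ANA\n"
)

for row in csv.reader(sample):
    # each row is already a list of fields, no manual split(",") needed
    if row[0] == "id":
        game_id = row[1]
    elif row[0] == "info" and row[1] == "visteam":
        awayteam = row[2]

print(game_id, awayteam)  # ANA201904040 TEX
```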

How to save a dictionary as a JSON file?

I have some invoice items:
lista_items = {}
lineNumber = 0
for line in self.invoice_line_ids:
    lineNumber = lineNumber + 1
    print lineNumber
    lista_items["numeroLinea"] = [lineNumber]
    lista_items["cantidad"] = [line.quantity]
    lista_items["costo_total"] = [line.price_subtotal]
    lista_items["precioUnitario"] = [line.price_unit]
    lista_items["descripcion"] = [line.name]
    # for line_tax in line.invoice_line_tax_ids:
    #     print line_tax.amount
    #     print line_tax.id
    #     # print line.invoice_line_tax_ids
return lista_items
I need to save the items in a dictionary and after that save it to a JSON file.
How can I do it?
You can use json.dump() to save a dictionary to a file. For example:
import json

# 'w+' opens the file for reading and writing, creating output.json if it doesn't exist
with open('output.json', 'w+') as f:
    # this places the entire output on one line;
    # use json.dump(lista_items, f, indent=4) to "pretty-print" with four spaces per indent
    json.dump(lista_items, f)
In the following code, just replace the variable d with your dictionary and put your filename in place of 'json_out'. Take note of the parameter 'w+': it opens the file both for reading and writing and overwrites the existing file, if any.
Also note that there is a 'dumps' method in json which will give you a string representation of the dict.
import json

d = {'x': 2, 'y': 1}
out_file = open('json_out', 'w+')
json.dump(d, out_file)
out_file.close()  # flush the data to disk
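For completeness, the dumps method mentioned above returns the JSON text instead of writing a file:

```python
import json

d = {'x': 2, 'y': 1}
# sort_keys makes the output order deterministic
s = json.dumps(d, sort_keys=True)
print(s)  # {"x": 2, "y": 1}
```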
Just dump lista_items into a JSON file like:
import json

lista_items = {}
lineNumber = 0
for line in self.invoice_line_ids:
    lineNumber = lineNumber + 1
    lista_items["numeroLinea"] = [lineNumber]
    lista_items["cantidad"] = [line.quantity]
    lista_items["costo_total"] = [line.price_subtotal]
    lista_items["precioUnitario"] = [line.price_unit]
    lista_items["descripcion"] = [line.name]

with open('file.json', 'w') as fp:
    json.dump(lista_items, fp, indent=4)

Compare multiple text files, and save common values

My actual code:
import os, os.path

DIR_DAT = "dat"
DIR_OUTPUT = "output"
filenames = []

# in case the output folder doesn't exist
if not os.path.exists(DIR_OUTPUT):
    os.makedirs(DIR_OUTPUT)

# isolating empty values from different contracts
for roots, dir, files in os.walk(DIR_DAT):
    for filename in files:
        filenames.append("output/" + os.path.splitext(filename)[0] + ".txt")
        filename_input = DIR_DAT + "/" + filename
        filename_output = DIR_OUTPUT + "/" + os.path.splitext(filename)[0] + ".txt"
        with open(filename_input) as infile, open(filename_output, "w") as outfile:
            for line in infile:
                if not line.strip().split("=")[-1]:
                    outfile.write(line)

# creating a single file from all contracts; nb the values are those that are actually empty
with open(DIR_OUTPUT + "/all_agreements.txt", "w") as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

# final file with common empty data
# creating a single file
with open(DIR_OUTPUT + "/all_agreements.txt") as infile, open(DIR_OUTPUT + "/results.txt", "w") as outfile:
    seen = set()
    for line in infile:
        line_lower = line.lower()
        if line_lower in seen:
            outfile.write(line)
        else:
            seen.add(line_lower)

print("Psst, go check in the output folder ;)")
The last lines of my code check whether an element exists multiple times: whether it appears once, twice, three, or four times, it gets added to results.txt. But I want to save an element into results.txt only if it appears 4 times. Or, best-case scenario, compare the 4 .txt files and save the elements they have in common into results.txt. But I can't solve it.
Thanks for the help :)
To make it easier:
with open(DIR_OUTPUT + "/all_agreements.txt") as infile, open(DIR_OUTPUT + "/results.txt", "w") as outfile:
    seen = set()
    for line in infile:
        if line in seen:
            outfile.write(line)
        else:
            seen.add(line)
Where can I use the .count() function? Because I want to do something like xxx.count(line) == 4 and then save it into results.txt.
If your files are not super big, you can use set.intersection(a, b, c, d):
data = []
for fname in filenames:
    current = set()
    with open(fname) as infile:
        for line in infile:
            current.add(line)
    data.append(current)
results = set.intersection(*data)
You also don't need to create one single big file for this issue.
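The intersection idea end to end, with made-up contract lines standing in for the four files (in practice each inner list would come from reading one file):

```python
# in-memory stand-ins for the four contract files
file_contents = [
    ["a=\n", "b=\n", "c=\n"],
    ["a=\n", "b=\n", "d=\n"],
    ["a=\n", "b=\n", "e=\n"],
    ["a=\n", "b=\n", "f=\n"],
]

# one set of lines per file, then intersect them all;
# only lines present in every file survive
data = [set(lines) for lines in file_contents]
results = set.intersection(*data)

print(sorted(results))  # ['a=\n', 'b=\n']
```

Writing sorted(results) to results.txt then replaces the duplicate-counting pass entirely.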
Not sure what your input looks like or what output is expected, but maybe this can spark some ideas:
from io import StringIO
from collections import Counter

lines = ["""\
a=This
b=is
c=a Test
""", """\
a=This
b=is
c=a Demonstration
""", """\
a=This
b=is
c=another
d=example
""", """\
a=This
b=is
c=so much
d=fun
"""]

files = (StringIO(l) for l in lines)
C = Counter(line for f in files for line in f)

print([k for k, v in C.items() if v >= 4])
# Output: ['a=This\n', 'b=is\n']
