How to bulk rename JSON files in Python?

I am trying to rename thousands of JSON files, replacing a certain decimal number with its hexadecimal equivalent. Could you help me rename and replace them in bulk?
For instance, a file currently named 10 also contains 10 in its JSON content, and I
would like to rename/switch both to A.

Here is a theoretical solution:
import os

file_names = ["1_file.json", "10_file.json", "27_file.json", "44_file.json"]
for file in file_names:
    file_name_parts = file.split('_')
    # convert the leading decimal number to uppercase hex, e.g. 10 -> A
    file_name_parts[0] = hex(int(file_name_parts[0])).replace('0x', '').upper()
    renamed_file = '_'.join(file_name_parts)
    print(file, 'will be', renamed_file)
    os.rename(file, renamed_file)
Output:
1_file.json will be 1_file.json
10_file.json will be A_file.json
27_file.json will be 1B_file.json
44_file.json will be 2C_file.json
Anything beyond the renaming depends on details that you didn't describe.
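For the in-file part of the question, a rough sketch (assuming each file simply contains its decimal number as plain text, which the question only hints at) could rewrite the content with the same substitution before renaming:
import os

file_names = ["1_file.json", "10_file.json", "27_file.json", "44_file.json"]
for file in file_names:
    prefix = file.split('_')[0]
    hex_prefix = format(int(prefix), 'X')  # e.g. '10' -> 'A'
    # swap the decimal value inside the file for its hex equivalent
    with open(file) as fh:
        content = fh.read()
    with open(file, 'w') as fh:
        fh.write(content.replace(prefix, hex_prefix))
    # then rename the file itself, replacing only the leading prefix
    os.rename(file, file.replace(prefix, hex_prefix, 1))
Note that a bare str.replace can touch unrelated occurrences of the same digits; for real JSON it would be safer to load the data with the json module and edit the specific field.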

Related

SAX Parser in Python

I am parsing XML files in a folder using Python's SAX parser and writing the output to CSV using pandas, but I am only getting the data from the last file in the CSV.
I am new to Python and this is my first time trying SAX parsing.
File read:
for dirpath, dirs, files in os.walk(fp1):
    for filename in files:
        print(files)
        fname = os.path.join(dirpath, filename)
        if fname.endswith('.xml'):
            print(fname)
            #for count in files:
            parser.parse(fname)
def characters(self, content):
    rows = []
    cols = ["ReporterCite", "DecisionDate", "CaseName", "FileNum", "CourtLocation", "CourtName", "CourtAbbrv", "Judge", "CaseLength", "CourtCite", "ParallelCite", "CitedCount", "UCN"]
    rows.append({"ReporterCite": self.rc,
                 "DecisionDate": self.dd,
                 "CaseName": self.can,
                 "FileNum": self.fn,
                 "CourtLocation": self.loc,
                 "CourtName": self.cn,
                 "CourtAbbrv": self.ca,
                 "Judge": self.j,
                 "CaseLength": self.cl,
                 "CourtCite": self.cc,
                 "ParallelCite": self.pc,
                 "CitedCount": self.cd,
                 "UCN": self.rn})
    #print(rows)
    df = pd.DataFrame(rows, columns=cols)
    df.to_csv(fp2, index=False)
I assume you always overwrite your previous result; this is a pandas question, not a SAX question. You would like to append to the existing csv, right? If so, you have to use mode='a', like
df.to_csv('filename.csv',mode = 'a')
For more options, see the docs:
'w' open for writing, truncating the file first (default)
'x' open for exclusive creation, failing if the file already exists
'a' open for writing, appending to the end of the file if it exists
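When appending one CSV per parsed file in a loop, the header row would otherwise be repeated for every file, so a common pattern (a minimal sketch; xml_files and parse_to_df are placeholder names, not from the question) is to write the header only on the first iteration:
import pandas as pd

first = True
for fname in xml_files:  # hypothetical list of .xml paths
    df = parse_to_df(fname)  # hypothetical SAX-based parser returning a DataFrame
    # 'w' with a header for the first file, then 'a' without a header
    df.to_csv('output.csv', mode='w' if first else 'a', header=first, index=False)
    first = False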

Extracting files from zip file from GET request

I currently have a GET request to a URL that returns three things: a .zip file, a .zipsig file, and a .txt file.
I'm only interested in the .zip file, which contains dozens of .json files. I would like to extract all these .json files, preferably directly into a single pandas data frame, but extracting them into a folder also works.
Code so far, mostly stolen:
import io
import zipfile

import requests

license = requests.get(url, headers={'Authorization': "Api-Token " + 'blah'})
z = zipfile.ZipFile(io.BytesIO(license.content))
billingRecord = z.namelist()[0]
z.extract(billingRecord, path="C:\\Users\\Me\\Downloads\\Json license")
This extracts the entire .zip file to the path. I would like to extract the individual .json files from said .zip file to the path.
import io
import zipfile
import pandas as pd
import json

dfs = []
with zipfile.ZipFile(io.BytesIO(license.content)) as zfile:
    for info in zfile.infolist():
        if info.filename.endswith('.zip'):
            zfiledata = io.BytesIO(zfile.read(info.filename))
            with zipfile.ZipFile(zfiledata) as json_zips:
                for info in json_zips.infolist():
                    if info.filename.endswith('.json'):
                        json_data = pd.json_normalize(json.loads(json_zips.read(info.filename)))
                        dfs.append(json_data)

df = pd.concat(dfs, sort=False)
print(df)
I would do something like this. Obviously this is my test.zip file, but the steps are:
List the files in the archive using the .infolist() method on your z archive
Check if the filename ends with the json extension using .endswith('.json')
Extract that filename with .extract(info.filename, info.filename)
Obviously you've called your archive z while mine is archive, but that should get you started.
Example code:
import zipfile

with zipfile.ZipFile("test.zip", mode="r") as archive:
    for info in archive.infolist():
        print(info.filename)
        if info.filename.endswith('.png'):
            print('Match: ', info.filename)
            archive.extract(info.filename, info.filename)
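As a compact variant of the same idea (a sketch assuming the archive object above), ZipFile.extractall() accepts a members list, so the filter-and-extract loop collapses into a single call:
import zipfile

with zipfile.ZipFile("test.zip", mode="r") as archive:
    # keep only the .json members, then extract them all at once
    json_members = [name for name in archive.namelist() if name.endswith('.json')]
    archive.extractall(path="extracted_json", members=json_members)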

Save file name and extension based on another file with .to_csv()

I have an input file named file_a.xml.
I already created a function to parse the xml and return it as a df, and then I used df.to_csv
to save the output as file_a.csv.
Is there a way to derive the output filename and extension automatically?
I need to iterate over a folder with lots of .xml files, so I'd like to base the output filename and extension on the input xml file.
xml_file = open('file/path/dir/file_a.xml', 'r').read()

def XML_to_CSV(xml_file):
    ...code to parse out xml...
    return df

csv_data = df.to_csv('file/path/dir/file_a.csv', index=False)
Try something like this:
import os
from pathlib import Path

import pandas as pd

for file in os.listdir("your dir"):
    if file.endswith(".xml"):
        # ...parse the xml into df here...
        df.to_csv(Path(file).stem + '.csv', index=False)
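A slightly more self-contained sketch (assuming the XML_to_CSV function from the question returns a DataFrame) can do the whole walk with pathlib; with_suffix swaps .xml for .csv while keeping the directory and stem, so each output lands next to its input:
from pathlib import Path

for xml_path in Path("your dir").glob("*.xml"):
    df = XML_to_CSV(xml_path.read_text())  # the question's parser function
    df.to_csv(xml_path.with_suffix('.csv'), index=False)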

reading file names from a list and then appending them does not append files

I have a list containing the names of the files.
I want to append the content of all the files to the first file, and then copy that file (the first file, now appended) to a new path.
This is what I have done so far.
This is the appending part of the code (I have put a reproducible program at the end of my question, please have a look at that):
if len(appended) == 1:
    shutil.copy(os.path.join(path, appended[0]), out_path_tempappendedfiles)
else:
    with open(appended[0], 'a+') as myappendedfile:
        for file in appended:
            myappendedfile.write(file)
    shutil.copy(os.path.join(path, myappendedfile.name), out_path_tempappendedfiles)
This runs and copies successfully, but it does not append the files; it just keeps the content of the first file.
I have also tried this link; it did not raise an error, but it did not append the files either. So, the same code, except instead of using write I used shutil.copyfileobj:
with open(file, 'rb') as fd:
    shutil.copyfileobj(fd, myappendedfile)
The same thing happened.
Update 1
This is the whole code; even with the update it still does not append:
import os
import shutil

import pandas as pd

d = {'Clinic Number': [1, 1, 1, 2, 2, 3],
     'date': ['2015-05-05', '2015-05-05', '2015-05-05', '2015-05-05', '2016-05-05', '2017-05-05'],
     'file': ['1a.txt', '1b.txt', '1c.txt', '2.txt', '4.txt', '5.txt']}
df = pd.DataFrame(data=d)
df.sort_values(['Clinic Number', 'date'], inplace=True)
df['row_number'] = (df.date.ne(df.date.shift()) | df['Clinic Number'].ne(df['Clinic Number'].shift())).cumsum()

path = 'C:/Users/sari/Documents/fldr'
out_path_tempappendedfiles = 'C:/Users/sari/Documents/fldr/temp'

for rownumber in df['row_number'].unique():
    appended = df[df['row_number'] == rownumber]['file'].tolist()
    if len(appended) == 1:
        shutil.copy(os.path.join(path, appended[0]), out_path_tempappendedfiles)
    else:
        with open(appended[0], 'a') as myappendedfile:
            for file in appended:
                fd = open(file, 'r')
                myappendedfile.write('\n' + fd.read())
                fd.close()
        shutil.copy(os.path.join(path, myappendedfile.name), out_path_tempappendedfiles)
Would you please let me know what is the problem?
You can do it like this; if the files are too large to load, you can use readlines as instructed in Python append multiple files in given order to one big file:
import shutil

file_list = ['a.txt', 'a1.txt', 'a2.txt', 'a3.txt']
new_path = ...  # fill in your destination path

with open(file_list[0], "a") as content_0:
    for file_i in file_list[1:]:
        f_i = open(file_i, 'r')
        content_0.write('\n' + f_i.read())
        f_i.close()

shutil.copy(file_list[0], new_path)
So this is how I resolved it; it was a very silly mistake: not joining the base path.
I changed it to use shutil.copyfileobj for performance, but the problem was only resolved by this:
os.path.join(path, file)
Before adding this I was reading from the bare file name in the list, instead of joining the base path to read from the actual file:
for rownumber in df['row_number'].unique():
    appended = df[df['row_number'] == rownumber]['file'].tolist()
    print(appended)
    if len(appended) == 1:
        shutil.copy(os.path.join(path, appended[0]), new_path)
    else:
        with open(appended[0], "w+") as myappendedfile:
            for file in appended:
                with open(os.path.join(path, file), 'r+') as fd:
                    shutil.copyfileobj(fd, myappendedfile, 1024*1024*10)
                myappendedfile.write('\n')
        shutil.copy(appended[0], new_path)

get file list of files contained in a zip file

I have a zip archive, my_zip.zip. Inside it is one txt file, whose name I do not know. I was taking a look at Python's zipfile module (http://docs.python.org/library/zipfile.html), but couldn't make much sense of what I'm trying to do.
How would I do the equivalent of 'double-clicking' the zip file to get the txt file and then use the txt file so I can do:
>>> f = open('my_txt_file.txt','r')
>>> contents = f.read()
What you need is ZipFile.namelist(), which will give you a list of all the contents of the archive; you can then do zip.open('filename_you_discover') to get the contents of that file.
import zipfile
# zip file handler
zip = zipfile.ZipFile('filename.zip')
# list available files in the container
print(zip.namelist())
# extract a specific file from the zip container
f = zip.open("file_inside_zip.txt")
# save the extracted file
content = f.read()
f = open('file_inside_zip.extracted.txt', 'wb')
f.write(content)
f.close()
import zipfile

zip = zipfile.ZipFile('my_zip.zip')
f = zip.open('my_txt_file.txt')
contents = f.read()
f.close()
You can see the documentation here. In particular, the namelist() method will give you the names of the zip file members.
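Since the question says the txt file's name is unknown, a small combined sketch (using context managers so the handles close automatically, and assuming the archive holds exactly one .txt member) can discover it by extension:
import zipfile

with zipfile.ZipFile('my_zip.zip') as z:
    # namelist() returns every member; pick the first .txt entry
    txt_name = next(name for name in z.namelist() if name.endswith('.txt'))
    with z.open(txt_name) as f:
        contents = f.read().decode('utf-8')  # z.open yields bytes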
