Python Zip Files Read - python

I am new to Python. I have a .zip file that contains multiple sub-folders, and each sub-folder contains multiple .txt files. I am trying to read all the .txt files, but I want to store the contents per folder in separate variables, and I have not been able to do so.
For example:
"test.zip" contains three folders, "a", "b" and "c", each with many (>10,000) .txt files.
I want to read all files inside folder "a" and store them in a variable a_file, and likewise for folders "b" and "c".
I tried the following code:
for file in os.listdir():
    if file.endswith('test.zip'):
        zfile = zipfile.ZipFile(file)
        fnames = [f.filename for f in zfile.infolist()]
        for subfile in fnames:
            if fnames == "a":  # name of a folder
                if subfile.endswith('.txt'):
                    lines = zfile.open(subfile).read()
                    print(lines)
But the code goes through the files from all folders and does not display any output, maybe because of the if condition. It is not reading one specific folder and storing its contents.
Thank you in advance for helping.

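That happens because the zip file lists its entries with their full paths, like this:
a/a1.txt
a/a2.txt
b/b1.txt
b/b2.txt
So you need to separate the directory name from the file name using split('/'). Also, your condition compares fnames (the whole list) with "a", which is never true, so nothing is printed.
You could try this: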
import os
from zipfile import ZipFile
for file in os.listdir():
    if file.endswith('test.zip'):
        with ZipFile(file) as zfile:
            for subfile in zfile.namelist():
                # Entries look like 'a/a1.txt'; the first component is the folder.
                dir_name = subfile.split('/')[0]
                if dir_name == 'a' and subfile.endswith('.txt'):
                    lines = zfile.read(subfile)  # contents as bytes
                    print(lines)
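If the goal is one variable per folder, a dictionary keyed by folder name may be more convenient than separate a_file, b_file and c_file variables. The following is only a sketch: the function name read_txt_by_folder is hypothetical, and it assumes the .txt files are UTF-8 text and that only the top-level folder matters.

```python
import zipfile
from collections import defaultdict


def read_txt_by_folder(zip_path):
    """Return {top-level folder: {entry name: text}} for every .txt entry.

    Hypothetical helper; assumes the text files are UTF-8 encoded.
    """
    contents = defaultdict(dict)
    with zipfile.ZipFile(zip_path) as zfile:
        for name in zfile.namelist():
            # Skip directory entries and anything that is not a .txt file.
            if name.endswith('/') or not name.endswith('.txt'):
                continue
            folder, sep, _rest = name.partition('/')
            if not sep:
                continue  # file sits at the archive root, not in a folder
            contents[folder][name] = zfile.read(name).decode('utf-8')
    return dict(contents)
```

With this, a_file would be read_txt_by_folder('test.zip')['a']. With more than 10,000 files per folder, holding everything in memory may be heavy, so processing entries one at a time could be preferable.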

Related

Python - How can I extract from zip file, then rename extracted file the same as the zip file

Very new to programming. Cobbled some code together. Looked through many different Q&As and haven't found what I am looking for.
I am using Python to extract many zip files that all contain one Excel file. Each Excel file is named incorrectly and I want to rename it using the name of the file it was unzipped from. I know how to extract them, but not rename them.
Here is what I have so far:
import os
import zipfile
dir_name = r"C:\Users\Nuffanael\Documents\Python Scripts\UnZIP Test"
extension = ".zip"
xlsx = ".xlsx"
os.chdir(dir_name)  # change directory from working dir to dir with files
for item in os.listdir(dir_name):  # loop through items in dir
    if item.endswith(extension):  # check for ".zip" extension
        file_name = os.path.abspath(item)  # get full path of files
        zip_ref = zipfile.ZipFile(file_name)  # create zipfile object
        zip_ref.extractall(dir_name)  # extract file to dir
        zip_ref.close()  # close file
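One possible way to get the renaming, sketched under the assumption stated in the question that each archive holds exactly one file: instead of calling extractall and renaming afterwards, copy the single member directly to a file named after the zip. The function name extract_and_rename is hypothetical, and archives that do not contain exactly one file are skipped.

```python
import os
import shutil
import zipfile


def extract_and_rename(dir_name, extension=".zip"):
    """Extract the single file in each zip, naming it after the zip itself.

    Hypothetical helper; returns the list of paths that were written.
    """
    written = []
    for item in sorted(os.listdir(dir_name)):
        if not item.endswith(extension):
            continue
        zip_path = os.path.join(dir_name, item)
        with zipfile.ZipFile(zip_path) as zip_ref:
            # Ignore directory entries inside the archive.
            members = [m for m in zip_ref.namelist() if not m.endswith('/')]
            if len(members) != 1:
                continue  # assumption: exactly one file per archive
            member = members[0]
            # Keep the member's own extension (e.g. ".xlsx").
            member_ext = os.path.splitext(member)[1]
            target = os.path.join(dir_name, os.path.splitext(item)[0] + member_ext)
            # Stream the member to its new name without extracting it first.
            with zip_ref.open(member) as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

For example, a file wrong_name.xlsx inside report_1.zip would end up as report_1.xlsx next to the archive.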

Copying files with a specific name to another folder

I have a folder with 1500 txt files and another folder with 20000 jpg files. 1500 of the jpg files have the same names as the txt files. I need to move the jpg files whose names match a txt file to another folder.
To put it simply: the txt files are named from 1(1) to 5(500), and the jpg folder holds files named from 0(1) to 9(500). I need to move only the jpg files named from 1(1) to 5(500).
I tried to write a program in Python using shutil, but I don't understand how to enter all 1500 values so that only these files are moved.
Please tell me how to do this. Thanks in advance!
I collected all the names of the txt files and tried to copy the pictures with the same names from the other folder.
I need to copy only the images whose names are the same as the names of the txt files:
import os
import glob
import shutil
source_txt = r'C:\obj\TXT'
dir = r'C:\obj\Fixed_cars'
vendors = [r'C:\obj\car']
files_txt = [os.path.splitext(filename)[0] for filename in os.listdir(source_txt)]
for file in vendors:
    for f in glob.glob(file):
        if files_txt in f:  # if apple in name, move to new apple dir
            shutil.move(f, dir)
Well, using this answer on looping through the files of a folder, you can loop through the files in your folders.
import os
txt_directory = 'your_txt_path_in_string'
jpg_directory = 'path/to/current'
destination = 'path/to/new/destination/for'
# Collect the txt names without their extension.
# Note: str.rstrip('.txt') strips characters, not a suffix, so use os.path.splitext.
txt_files = set()
for filename in os.listdir(txt_directory):
    if filename.endswith(".txt"):
        txt_files.add(os.path.splitext(filename)[0])
for filename in os.listdir(jpg_directory):
    if filename.endswith(".jpg") and os.path.splitext(filename)[0] in txt_files:
        os.rename(os.path.join(jpg_directory, filename), os.path.join(destination, filename))
For the file-moving part, you can read this answer.
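The same idea can be sketched with pathlib and shutil.move, which also works when the destination is on a different drive. The function name move_matching_images and its three directory parameters are hypothetical, and the match is assumed to be on the exact file name without its extension.

```python
import shutil
from pathlib import Path


def move_matching_images(txt_dir, jpg_dir, dest_dir):
    """Move every .jpg whose name (without extension) matches a .txt file.

    Hypothetical helper; returns the names of the files that were moved.
    """
    # Names of the txt files without the extension, e.g. '1(1)'.
    txt_stems = {p.stem for p in Path(txt_dir).glob('*.txt')}
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for jpg in sorted(Path(jpg_dir).glob('*.jpg')):
        if jpg.stem in txt_stems:
            shutil.move(str(jpg), str(dest / jpg.name))
            moved.append(jpg.name)
    return moved
```

Using a set for the txt names keeps each lookup cheap, which helps a little with 20000 jpg files.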

Opening excels from different folders with python

Hi, I have a folder, and inside that folder I have n folders (400).
In each of those folders I have several documents, and one of them is an Excel file with a key name.
Is there any way to open those Excel files as df1, df2, ..., dfn?
Does anyone know how to write a for loop that opens each of those 400 folders?
Thanks!!
Assuming your Excel files have the extension '.xlsx'.
I use os.walk(path) from the os package. os.walk traverses all the subfolders.
Put the path to the parent folder in the path_to_parentfolder variable.
import os
import pandas as pd
path_to_parentfolder = 'Parent_Folder/'
files = []
for r, d, f in os.walk(path_to_parentfolder):
    for file in f:
        if file.endswith('.xlsx'):  # Enter the extension for your file type
            files.append(os.path.join(r, file))
# read_excel accepts a path directly; all your data is stored in the list
df_list = [pd.read_excel(file) for file in files]
Read about os.walk in its docs
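Since the question mentions that only one Excel file per folder carries a key name, it may help to collect just those files and key them by folder instead of numbering them df1, df2 and so on. This is a sketch under assumptions: the function name find_keyed_excels and the key argument are hypothetical, only the immediate subfolders of the parent are checked, and the first matching file in each folder wins.

```python
import os


def find_keyed_excels(parent_folder, key, extension='.xlsx'):
    """Return {subfolder name: path} for the Excel file containing `key`.

    Hypothetical helper; looks only at direct subfolders of parent_folder.
    """
    found = {}
    for entry in sorted(os.listdir(parent_folder)):
        folder = os.path.join(parent_folder, entry)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            if key in name and name.endswith(extension):
                found[entry] = os.path.join(folder, name)
                break  # assumption: one keyed file per folder
    return found
```

Each path in the result can then be passed to pd.read_excel, for instance in a dictionary comprehension, so that every DataFrame stays associated with the folder it came from.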

How to read csv files from multiple zip folders located in the same directory using python?

I have multiple zip folders named zip_folder_1, zip_folder_2, ..., zip_folder_n.
All these zip folders are located in the same directory. Each one of these zip folders contains a csv file named "selected_file.csv".
I need to read each one of the "selected_file.csv" located at each one of the zip folders and concatenate them into a single file
Could someone give me a hint on the required python code to solve this problem? I appreciate your help!
This should produce concatenated_data.csv in your working directory. It assumes that all files in my_data_dir are zip files with data in them, and, because it uses np.loadtxt, that the CSV files contain only numeric data with no header row.
import os
import zipfile
import numpy as np
def add_data_to_file(new_data, file_name):
    # Append if the output file already exists, otherwise create it.
    mode = 'ab' if os.path.isfile(file_name) else 'wb'
    with open(file_name, mode) as f:
        # savetxt accepts only 1-D or 2-D arrays; atleast_2d keeps a single row as one row.
        np.savetxt(f, np.atleast_2d(new_data), delimiter=',')
my_data_dir = 'C:/my/zip/data/dir/'
data_files = os.listdir(my_data_dir)
for data_file in data_files:
    full_path = os.path.join(my_data_dir, data_file)
    with zipfile.ZipFile(full_path, 'r') as zip_file:
        with zip_file.open('selected_file.csv', 'r') as selected_file:
            data = np.loadtxt(selected_file, delimiter=",")
    add_data_to_file(data, 'concatenated_data.csv')
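If the CSV files contain text columns or a header row, np.loadtxt will not read them. A possible alternative using only the standard library is sketched below. The function name concatenate_zipped_csvs is hypothetical, and it assumes every selected_file.csv is UTF-8 encoded and starts with the same header row, which is written to the output only once.

```python
import csv
import io
import os
import zipfile


def concatenate_zipped_csvs(zip_dir, out_path, member='selected_file.csv'):
    """Concatenate `member` from every zip in zip_dir into out_path.

    Hypothetical helper; returns the number of data rows written.
    """
    rows_written = 0
    header_written = False
    with open(out_path, 'w', newline='', encoding='utf-8') as out:
        writer = csv.writer(out)
        for name in sorted(os.listdir(zip_dir)):
            full_path = os.path.join(zip_dir, name)
            if not zipfile.is_zipfile(full_path):
                continue  # skip anything that is not a zip archive
            with zipfile.ZipFile(full_path) as zf:
                if member not in zf.namelist():
                    continue
                with zf.open(member) as raw:
                    # Wrap the binary stream so csv.reader receives text.
                    text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
                    reader = csv.reader(text)
                    header = next(reader, None)
                    if header is None:
                        continue  # empty file
                    if not header_written:
                        writer.writerow(header)
                        header_written = True
                    for row in reader:
                        writer.writerow(row)
                        rows_written += 1
    return rows_written
```

Rows are streamed one at a time, so the archives do not need to fit in memory together.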

Walking sub directories in Python and saving to same sub directory

First of all thanks for reading this. I am a little stuck with sub directory walking (then saving) in Python. My code below is able to walk through each sub directory in turn and process a file to search for certain strings, I then generate an xlsx file (using xlsxwriter) and post my search data to an Excel.
I have two problems...
The first problem I have is that I want to process a text file in each directory, but the text file name varies per sub directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt (would I use glob here?)
The second problem is that when I open/create an Excel I would like to save the file to the same sub directory where the .txt file has been found and processed. Currently my Excel is saving to the python script directory, and consequently gets overwritten each time a new sub directory is opened and processed. Would it be wiser to save the Excel at the end to the sub directory or can it be created with the current sub directory path from the start?
Here's my partially working code...
for root, subFolders, files in os.walk(dir_path):
    if 'Textfile.txt' in files:
        with open(os.path.join(root, 'Textfile.txt'), 'r') as f:
            #f = open(file, "r")
            searchlines = f.readlines()
            searchstringsFilter1 = ['Filter Used :']
            searchstringsFilter0 = ['Filter Used : 0']
            timestampline = None
            timestamp = None
            f.close()
        # Create a workbook and add a worksheet.
        workbook = xlsxwriter.Workbook('Excel.xlsx', {'strings_to_numbers': True})
        worksheetFilter = workbook.add_worksheet("Filter")
Thanks again for looking at this problem.
MikG
I will not solve your code completely, but here are some hints:
"the text file name varies per sub directory, so rather than specifying 'Textfile.txt' I'd like to do something like *.txt"
You can list all the files in the directory, then check each file's extension:
for filename in files:
    if filename.endswith('.txt'):
        # do stuff
Also, when creating the workbook, can you pass a path? You have root, right? Why not use it?
You don't want glob because you already have a list of files in the files variable. So, filter it to find all the text files:
import fnmatch
txt_files = fnmatch.filter(files, '*.txt')  # returns a list of the matching names
To save the file in the same subdirectory:
outfile = os.path.join(root, 'someoutfile.txt')
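Putting both hints together, the loop could look roughly like the sketch below. To keep it free of third-party packages it writes a CSV file with the csv module rather than an xlsx workbook; the same os.path.join(root, ...) path could be passed to xlsxwriter.Workbook instead. The function name summarize_filters and the output name summary.csv are hypothetical.

```python
import csv
import fnmatch
import os


def summarize_filters(dir_path, needle='Filter Used :', out_name='summary.csv'):
    """Search every .txt file per subdirectory and save matches alongside it.

    Hypothetical helper; returns the list of output files that were written.
    """
    written = []
    for root, _sub_folders, files in os.walk(dir_path):
        # `files` already lists this directory, so no glob is needed.
        txt_files = fnmatch.filter(files, '*.txt')
        if not txt_files:
            continue
        # Build the output path from root so it lands in the same subdirectory.
        out_path = os.path.join(root, out_name)
        with open(out_path, 'w', newline='') as out:
            writer = csv.writer(out)
            writer.writerow(['file', 'line'])
            for txt_name in sorted(txt_files):
                with open(os.path.join(root, txt_name), 'r') as f:
                    for line in f:
                        if needle in line:
                            writer.writerow([txt_name, line.strip()])
        written.append(out_path)
    return written
```

Because the output path is built from root at the start of each iteration, every subdirectory gets its own file and nothing is overwritten in the script directory.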
