Extract only jpg files from a .tar.gz file using Python

Problem Summary:
One of my folders contains a .tar.gz file, and I need to extract all the images (.jpg and .png) from it. I have to locate the archive by its .tar.gz extension (using the path to the directory) rather than the usual way of passing a fixed file name. I need this for one part of a Tkinter GUI in an image classification project.
Code I'm trying:
import os
import tarfile
def extractfile():
    os.chdir('GUI_Tkinter/PMC_downloads')
    with tarfile.open(os.path.join(os.environ['GUI_Tkinter/PMC_downloads'], f'Backup_{self.batch_id}.tar.gz'), "r:gz") as so:
        so.extractall(path=os.environ['GUI_Tkinter/PMC_downloads'])
The code does not raise any error, but it does not work. Please suggest another way to extract the archive by specifying the .tar.gz file extension.

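I think you can use this code.
import tarfile
import os
# Open the archive; mode 'r' lets tarfile detect the compression
with tarfile.open('example.tar.gz', 'r') as t:
    for member in t.getmembers():
        # Extract only members whose name ends in .jpg
        if member.name.lower().endswith(".jpg"):
            t.extract(member, "outdir")
print(os.listdir('outdir'))
Hope this helps. Thanks.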

Here is a generic way to extract one or more .tar.gz archives in a folder without specifying their file names: the archives are found by their extension and their location. (Zip archives would need the zipfile module instead of tarfile.) You can extract any type of file (.pdf, .nxml, .xml, .gif, etc.) from the archive by checking for the required extension in the member name. I needed all the images from the .tar.gz file extracted into one folder, so the code below checks for the .jpg and .png extensions and extracts the matching files into a folder named "Extracted_Images" in the current directory. You can change where the files are extracted by passing a different path,
for example "C:/Users/dell/project/histo_images" instead of "Extracted_Images".
import tarfile
import os
import glob
# Find every .tar.gz archive in the current directory
path = glob.glob("*.tar.gz")
for file in path:
    with tarfile.open(file, 'r') as t:
        for member in t.getmembers():
            # Keep only .jpg and .png members
            if member.name.lower().endswith((".jpg", ".png")):
                t.extract(member, "Extracted_Images")
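The same idea can be wrapped in a reusable function. This is only a sketch: the name extract_images and its parameters are illustrative and do not come from the answers above. It assumes it is acceptable to flatten the archive's folder structure, so every image lands directly in the output folder. Writing each member under its base name also avoids creating files outside the output folder when a member name contains "..".

```python
import glob
import os
import shutil
import tarfile


def extract_images(archive_dir, out_dir, extensions=(".jpg", ".png")):
    """Extract files with the given extensions from every .tar.gz in archive_dir.

    Hypothetical helper: members are written directly into out_dir under
    their base name, so files with the same name overwrite each other.
    Returns the list of paths that were written.
    """
    extensions = tuple(ext.lower() for ext in extensions)
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for archive in sorted(glob.glob(os.path.join(archive_dir, "*.tar.gz"))):
        with tarfile.open(archive, "r:gz") as tar:
            for member in tar.getmembers():
                # Skip directories, links and anything without a wanted extension
                if not member.isfile():
                    continue
                if not member.name.lower().endswith(extensions):
                    continue
                source = tar.extractfile(member)
                target = os.path.join(out_dir, os.path.basename(member.name))
                with source, open(target, "wb") as destination:
                    shutil.copyfileobj(source, destination)
                written.append(target)
    return written
```

For the folder in the question, a call might look like extract_images("GUI_Tkinter/PMC_downloads", "Extracted_Images").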

Related

Read RTF file using python

I am reading an RTF file using striprtf.
rtf_to_text is not able to read the URL. What changes do I need to make in the code?
Input
Get latest news update at abc#gmail.com
Output
Get latest news update at
Desired Output
Get latest news update at abc#gmail.com
Python code:
import os
from striprtf.striprtf import rtf_to_text
import pandas as pd
from os import path
path_of_the_directory= r'C:\Users\Documents\filename.rtf'
print("Files and directories in a specified path:")
for filename in os.listdir(path_of_the_directory):
    f = os.path.join(path_of_the_directory,filename)
    if os.path.isfile(f):
        print(f)
        open_rtf_file=open(f,'r')
        file_content_read=open_rtf_file.read()
        text_content=rtf_to_text(file_content_read)
        print(text_content)
It looks like you are treating a file as a directory. Your path_of_the_directory variable is actually the path to an RTF file. Without knowing what specific error you get at runtime, that looks like the problem to me. An easy fix is to check that the path is a directory before calling os.listdir, as in the example below.
import os
from striprtf.striprtf import rtf_to_text
path_of_the_directory= r'C:\Users\Documents\filename.rtf' #<--- this is a file
print("Files and directories in a specified path:")
if os.path.isdir(path_of_the_directory): # check if path is directory
    for filename in os.listdir(path_of_the_directory):
        f = os.path.join(path_of_the_directory,filename)
        if os.path.isfile(f):
            print(f)
            # the with block closes the file once it has been read
            with open(f,'r') as open_rtf_file:
                file_content_read=open_rtf_file.read()
            text_content=rtf_to_text(file_content_read)
            print(text_content)
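If the path might be either a single RTF file or a folder of them, one option is to normalise it into a list of files first. The helper below is a sketch; the name rtf_paths and the ".rtf" suffix filter are assumptions for illustration, not part of striprtf.

```python
import os


def rtf_paths(path, suffix=".rtf"):
    """Return the files to read for a path that may be a file or a directory."""
    if os.path.isfile(path):
        # A single file was given: read just that file
        return [path]
    if os.path.isdir(path):
        # A directory was given: keep the regular files with the wanted suffix
        candidates = (os.path.join(path, name) for name in sorted(os.listdir(path)))
        return [p for p in candidates
                if os.path.isfile(p) and p.lower().endswith(suffix)]
    # The path does not exist
    return []
```

Each returned path can then be opened and its content passed to rtf_to_text, as in the code above.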

Why using Image.save() (from Pillow in Python) with variables is not working?

For a Python course, I have to write a script that converts a bunch of images (in the wrong format, .tiff, and the wrong size) to .jpeg and saves them under the same name in another folder.
The problem is that it won't save to the directory I want (or even to the same directory) when I build the path from the directory path + the file variable + the image format. I tried os.path.join() too, but that did not work either. I managed to save a few times, but not in the directory I want. Can you give me some advice? Thank you!
#!/usr/bin/env python3
from PIL import Image
import os
files = os.listdir('/home/dani/images')
if not os.path.exists('/home/dani/images/opt/icons'):
    os.makedirs('/home/dani/images/opt/icons')
for file in files[1:]:
    if not os.path.isdir('/home/dani/images/'+ file):
        im = Image.open('/home/dani/images/'+file)
        im.convert('RGB').rotate(-90).resize((128,\
        128)).save('/home/dani/images/opt/icons/'+file +'.jpeg')
Saving sometimes fails because of permissions. Try saving the file in the same folder as your .py file.
If saving in the same folder works, then the problem is permissions. Look at os.umask(), which sets the permission mask applied to newly created files and directories.
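Two quick checks can narrow this down before Pillow is involved at all. The sketch below uses only the standard library; the helper names target_path and is_writable_dir are made up for illustration. The first builds the output file name from the source name without keeping the old .tiff extension, and the second reports whether the current user can create files in the target folder.

```python
import os
from pathlib import Path


def target_path(source_name, out_dir, suffix=".jpeg"):
    """Build out_dir/<stem><suffix>, so photo.tiff becomes photo.jpeg."""
    return Path(out_dir) / (Path(source_name).stem + suffix)


def is_writable_dir(directory):
    """Return True if directory exists and the current user may create files in it."""
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
```

If is_writable_dir('/home/dani/images/opt/icons') is False, the save call cannot succeed there regardless of how the path string is built.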

How to import folder in finder on Mac to python

I have a folder of 400 individual data files saved on my Mac at the path Users/path/to/file/data. I am trying to write code that iterates through each data file and plots the data inside it, but I am having trouble getting this folder of data into Python. Is there a way to import the entire folder so I can iterate through each file by writing something like
for file in folder:
    read data file
    plot data file
Any help is greatly appreciated. Thank you!
EDIT: I am using Spyder for this also
You can use the os module to list all files in a directory. os.listdir() takes a path and returns the names of all items located there, for example ['file1.py', 'file2.py']. If no argument is passed, it defaults to the current working directory. It returns bare names, not full paths, so join each name with the directory before opening it.
import os
path_to_files = "Users/path/to/file/data"
file_names = os.listdir(path_to_files)
for file_name in file_names:
    # os.listdir() returns names only, so build the full path first
    file_path = os.path.join(path_to_files, file_name)
    # reading file
    # "r" means that you open the file for *reading*
    with open(file_path, "r") as file:
        lines = file.readlines()
    # plot data....
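A variant with pathlib, as a rough sketch: the helper names data_files and read_lines are illustrative. It assumes the data files are plain text and that hidden entries (such as the .DS_Store file Finder creates on macOS) and subfolders should be skipped.

```python
from pathlib import Path


def data_files(folder):
    """Return the regular, non-hidden files in folder, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


def read_lines(path):
    """Read one data file and return its lines without trailing newlines."""
    with open(path, "r") as handle:
        return handle.read().splitlines()
```

The loop from the question then becomes "for path in data_files(folder): lines = read_lines(path)", followed by whatever plotting call is needed.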

How to read a CSV from a folder without file name in Python

I need to read a CSV file from a folder that is generated by another module. If that module fails, it won't create the folder that holds the CSV file.
Example:
path = 'c/files' (fixed path)
When the module runs successfully, it creates a folder called output with a file in it.
path = 'c/files/output/somename.csv'
But here is the catch: every time it generates the output folder, the CSV file has a different name.
First I need to check whether the output folder and a CSV file exist.
After that I need to read that CSV.
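The following checks for the existence of the output folder and of a CSV file, then reads the CSV file:
import os
import pandas as pd
if 'output' in os.listdir('c/files'):
    # collect the names that end in .csv (the list may be empty)
    csv_files = [i for i in os.listdir('c/files/output') if i.endswith('.csv')]
    if csv_files:
        # os.listdir() returns bare names, so join with the folder
        new_file = pd.read_csv(os.path.join('c/files/output', csv_files[0]))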
glob.glob can help.
glob.glob('c/files/output/*.csv') returns either an empty list or a list with (hopefully) the path to a single file
You may also try to get the latest file based on creation time, after checking that the directory exists (as in the answer above). Something like the following:
import glob
import os
list_of_files = glob.glob("c/files/output/*.csv")
latest_file = max(list_of_files, key=os.path.getctime)
latest_file is the most recently created file in the directory. (On Unix, os.path.getctime returns the time of the last metadata change rather than the creation time.)
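The existence check and the glob lookup can be combined into one function. This is a sketch under a few assumptions: the name find_latest_csv is illustrative, "latest" is taken to mean most recently modified (os.path.getmtime), and returning None is an acceptable way to signal that the other module did not produce a file.

```python
import glob
import os


def find_latest_csv(output_dir):
    """Return the most recently modified CSV in output_dir, or None."""
    if not os.path.isdir(output_dir):
        return None  # the other module did not create the folder
    candidates = glob.glob(os.path.join(output_dir, "*.csv"))
    if not candidates:
        return None  # the folder exists but holds no CSV
    return max(candidates, key=os.path.getmtime)
```

A caller would check the result before reading, for example "path = find_latest_csv('c/files/output')" followed by pd.read_csv(path) only when path is not None.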

How to load a text or CSV file which is in a different parallel directory using a Python script when I do not have access to the absolute path

I have some text files in a directory (say A/b1/b2/File.txt), while my script is in another directory (say A/m1/m2/Program.py). How do I load the text file from the Python script?
I am not looking to import a module or function from another Python script, but to load a non-Python file (like text or CSV) from a parallel location using my Python script.
You can just enter the full address:
# The r prefix keeps "\b" from being read as a backspace escape
File=open(r"C:\A\b1\b2\File.txt") #Assuming you're using windows.
If you don't know the full address, you can work it out from the script's own location using os.path:
from os.path import realpath
from os.path import dirname
Path=realpath(__file__)
# Walk up from the script until the folder named "A" is reached
while Path.split("\\")[-1]!="A":
    Path=dirname(Path)
Path+="\\b1\\b2\\File.txt"
File=open(Path)
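Just specify the path and open it.
import pandas as pd
# a relative path like this one is resolved against the current working directory
file_to_open = r'A/b1/b2/somefile.csv' # don't forget the 'r' here to prevent backslash issues
with open(file_to_open, 'r') as file:
    df1 = pd.read_csv(file)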
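A platform-independent way to do the same walk is pathlib. The helper below is a sketch; the name sibling_path and its levels_up parameter are illustrative. It assumes the folder layout from the question, where the shared folder A sits two levels above the folder containing Program.py.

```python
from pathlib import Path


def sibling_path(script_path, levels_up, *parts):
    """Go levels_up folders above the script's folder, then down through parts."""
    # parents[0] is the script's own folder, parents[1] its parent, and so on
    return Path(script_path).parents[levels_up].joinpath(*parts)
```

Inside Program.py this could be called as sibling_path(Path(__file__).resolve(), 2, "b1", "b2", "File.txt"), and the result can be passed directly to open().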
