Audio Detection for Python - python

I want to write a Python program that detects system audio. OpenCV's matchTemplate can detect a certain "characteristic" of an image and return a value that can be compared against a threshold. Is there a module that does the same for audio, so that I can call a function whenever the threshold is crossed? I am making this program for fishing in Minecraft. There is a mod which "pings" me (plays a sound) whenever a fish is hooked on the fishing rod. I want to record and save that sound in the program (which can easily be done using modules) and check whether the incoming audio passes a certain threshold value, then right-click the mouse (which can be done with the pyautogui module). I want the program to run indefinitely until I press a key (which can be done using keyboard.is_pressed("ctrl")).
I tried googling for a solution, but with next to no knowledge of sound processing I could not find one. My idea for the infinite loop is to change the input setting in Windows to system sounds, then record and save a WAV file every five seconds. As soon as a file is saved, the program tries (in a try/except block) to delete the previous file to save storage, and processes the current sound in the same loop.
The following is sample code I wrote with OpenCV for image detection:
(NOTE: You don't necessarily need to look into this code to solve my problem. It is just to give an idea; I want similar code, but with audio detection instead of image detection.)
import cv2 as cv
import os
import pyautogui
import time
from PIL import ImageGrab
import keyboard
path_1 = r"C:\Users\Aditya\AppData\Roaming\.minecraft\screenshots\2022-12-21_11.37.40 - Copy.png" #path_1 is the path of the "template" i.e. the image that I am trying to detect on the screen
threshold = 50672740 #this number was obtained after experimentally testing
path = "C:\\Users\\Aditya\\Documents\\fishing_pics\\"
#path variable is the path where the screenshots are being saved
n = len(os.listdir(path))+1
while keyboard.is_pressed("ctrl")==False:
    filename = path+str(n)+".png"
    prev2_filename = path+str(n-2)+".png"
    n+=1
    screenshot = ImageGrab.grab()
    screenshot.save(filename,"PNG")
    try:
        os.remove(prev2_filename)
    except:
        print (prev2_filename+" doesn't exist")
    haystack_img = cv.imread(filename,0)
    needle_img = cv.imread(path_1,0)
    haystack_img = cv.cvtColor(haystack_img,cv.COLOR_BGR2GRAY)
    needle_img = cv.cvtColor(needle_img,cv.COLOR_BGR2GRAY)
    result = cv.matchTemplate(needle_img,haystack_img,cv.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv.minMaxLoc(result)
    if max_val>=threshold:
        pyautogui.click(button='right')
        time.sleep(1)
        pyautogui.click(button='right')
        time.sleep(0.1)
for file_name in os.listdir(path):
    file = path + file_name
    os.remove(file)
#this is used to remove any file left in the folder where screenshots are saved in order to save storage
LOGIC >>> Ignore the variable "n"; it is only used to name the files 1.png, 2.png, etc.
I capture and save a picture of my screen using ImageGrab from Pillow (PIL), then compare it with the template image to get a match value. If that value is at least a threshold that I set, the program performs an action.
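One way to picture the audio equivalent of template matching is normalized cross-correlation: slide the saved "ping" over the recorded samples and compare the best score against a threshold. The following is only a minimal sketch under several assumptions. It uses the standard library and synthetic sine-wave samples instead of real audio, the function names, SAMPLE_RATE and THRESHOLD are hypothetical, and capturing system audio (which would need an audio library) is not shown.

```python
import math

def normalized_correlation(window, template):
    """Normalized cross-correlation of two equal-length sample lists, in [-1, 1]."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    dw = [w - mean_w for w in window]
    dt = [t - mean_t for t in template]
    denom = math.sqrt(sum(x * x for x in dw) * sum(x * x for x in dt))
    if denom == 0:  # silent or constant signal: no meaningful match
        return 0.0
    return sum(a * b for a, b in zip(dw, dt)) / denom

def best_match(recording, template):
    """Slide the template over the recording; return (best_score, offset)."""
    best_score, best_offset = -1.0, 0
    size = len(template)
    for offset in range(len(recording) - size + 1):
        score = normalized_correlation(recording[offset:offset + size], template)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset

# Synthetic stand-ins for real audio (assumed 8 kHz sample rate, 440 Hz ping)
SAMPLE_RATE = 8000
ping = [math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(200)]
# The ping appears at half volume, surrounded by silence
recording = [0.0] * 300 + [0.5 * s for s in ping] + [0.0] * 300
THRESHOLD = 0.8  # hypothetical value; would need tuning on real recordings

score, offset = best_match(recording, ping)
if score >= THRESHOLD:
    # In the real program this is where pyautogui.click(button='right') would go
    print(f"ping detected at sample {offset} (score {score:.2f})")
```

Because the score is normalized, it does not depend on playback volume, which is why a threshold between 0 and 1 can be used here. A pure-Python loop like this would likely be too slow for long recordings; an array library would be the usual choice there.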

Related

tf.image.decode_jpeg often taking forever to load file

The following code is part of a tf graph that reads images. When I use it to iterate through the data, the program gets stuck forever in tf.io.read_file(path) after a few hundred images and does nothing. The code can't even be interrupted, and I have to restart the session every time.
#tf.function()
def read_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image)
    return image
...
div8k_list=[os.path.join(div8k_save_path, x) for x in os.listdir(div8k_save_path)]
train_path = tf.data.Dataset.from_tensor_slices(div8k_list)
train_images = train_path.map(read_image, num_parallel_calls=tf.data.AUTOTUNE)
I first suspected that there were a few corrupted images or wrong paths in the data that were causing this problem and tested the following code.
for path in train_path:
    print(path)
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image)
Surprisingly, the paths at which the loop got stuck had nothing in common. The images themselves were not the problem either: the loop once got stuck at 1056.png, but when I loaded 1056.png explicitly, there was no problem.
What could be the cause of this problem?
edit: to summarize, the program is stuck at read_image forever, while I couldn't find a problem in the dataset.
My dataset is the DIV8K dataset and I am running in COLAB.
EDIT: The function that is slowing my code is decode_jpeg, because the following definition of read_image worked multiple times.
#tf.function()
def read_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image)
    return image
As mentioned in the comments, try the following function to decode the image file, since it can handle mixed file formats (jpg, png, etc.):
tf.io.decode_image(image, expand_animations = False)
However, decode_jpeg should now also be able to handle the .png file format. Without the image file, it's hard to pin down what is causing this. Most probably the file is somehow corrupted, or its contents do not match the extension it is named with.
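To check the "contents do not match the extension" possibility without TensorFlow, one option is to look at the first bytes of each file. This is a rough sketch, not a full validity check: it only inspects the standard JPEG and PNG signatures, the helper names are hypothetical, and the demo files are tiny stand-ins written to a temporary directory rather than real DIV8K images.

```python
import os
import tempfile

# Leading bytes ("magic numbers") that identify each format
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def sniff_image_format(path):
    """Return 'jpeg', 'png', or None based on the file's first bytes."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(PNG_MAGIC):
        return "png"
    return None

def find_mismatched(paths):
    """List (path, detected_format) for files whose extension disagrees with their content."""
    expected = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png"}
    mismatched = []
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        detected = sniff_image_format(path)
        if expected.get(ext) != detected:
            mismatched.append((path, detected))
    return mismatched

# Demo: one file with a PNG signature, and JPEG bytes misnamed as .png
demo_dir = tempfile.mkdtemp()
good = os.path.join(demo_dir, "good.png")
bad = os.path.join(demo_dir, "misnamed.png")
with open(good, "wb") as f:
    f.write(PNG_MAGIC + b"rest of file")
with open(bad, "wb") as f:
    f.write(JPEG_MAGIC + b"rest of file")

print(find_mismatched([good, bad]))
```

Running find_mismatched over the dataset's file list would flag misnamed files; a file that passes this check can still be truncated or corrupted further in.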

Defining a filename and calling the filename in various loops and functions

In short, I have written code that opens a file and makes a number of modifications to it. However, I don't want to go through my script and rename every file reference each time I want to open a new file.
I'm thinking of setting a variable early on that defines the filename, i.e.
A=filename('png1.png')
B=filename('png2.png')
However, I don't quite know how to implement this. This is my current code:
import os
from os import path
import numpy as np
from PIL import Image
from wordcloud import WordCloud, STOPWORDS
#d=path.dirname(_file_) if "_file_" in locals() else os.getcwd()
os.chdir('C:/Users/Sams PC/Desktop/Word_Cloud_Scripts/Dmitrys Papers/Word_Cloud_Dmitry')
Document=open('Dmitry_all_lower.txt', 'r', encoding='utf-8')
text=Document.read()
heart_mask=np.array(Image.open("**png1.png**"))
print (heart_mask)
split= str('**png1.png**').rsplit('.')
extension=split[len(split)-1]
if extension == "png":
    image = Image.open("**png1.png**")
    image.convert("RGBA") # Convert this to RGBA if possible
    canvas = Image.new('RGBA', image.size, (255,255,255,255)) # Empty canvas colour (r,g,b,a)
    canvas.paste(image, mask=image) # Paste the image onto the canvas, using it's alpha channel as mask
    #canvas.thumbnail([width, height], Image.ANTIALIAS)
    canvas.save('**png2.png**')
    from wand.image import Image
    with Image(filename='**png2.png**') as img:
        img.format='jpeg'
        img.save(filename='**png1.jpg**')
    from PIL import Image
    heart_mask=np.array(Image.open("**png1.jpg**"))
else:
    print ('')
print (heart_mask)
stopwords=set(STOPWORDS)
stopwords.update(["will", "us","protein","residue", "interaction","residues","using","proteins","thus","fig"])
wc= WordCloud(stopwords=stopwords, background_color="white",max_words=1000, mask=heart_mask, contour_width=3, contour_color='black')
print ('Generating Word Cloud')
wc.generate(text)
wc.to_file("Dmitry3.png")
import matplotlib.pyplot as plt
plt.figure()
plt.imshow(wc,interpolation="bilinear")
plt.axis("off")
print ('Generation Done')
plt.show()
I've put the entire thing just to see what's going on, but I've bolded (put stars next to), the files I'm trying to modify in my idea. As you can see, I have multiple calls to my file 'png1.png', and I also have calls to save a modified version of that file to 'png2.png' and later a jpeg version of it 'png1.jpg'. I don't want to have to go through my script each time and change each one individually. I was hoping to define them earlier such as A=png1, B=png2, C=jpg1 so that I can replace the calls in my loops with simply A B and C, and if I do choose a new image to upload, I simply change 1 or 2 lines rather than 5 or 6. I.E.
heart_mask=np.array(Image.open("A"))
split= str('A').rsplit('.')
image = Image.open("A")
canvas.save('B')
... so on and so forth
To make your task easier, perhaps you should establish a naming standard defining which files are to be modified, and which ones are already processed. Also, the images you are to process should have a dedicated directory for the purpose.
From what I understand in your code, PNG files are the ones getting processed, while the JPEG files are already done. You can use os.listdir() to traverse a list of files which have a .png extension, something similar to the one below:
for file in os.listdir("/dedicated_image_dir"):
    if file.endswith(".png"):
        # Process your PNG images here
        pass
That way, you wouldn't even need to change your code just to accommodate new PNG images with different filenames.
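If the goal is simply to name the file once and reuse it, the names can also be defined as variables at the top of the script and the rest derived from them. A minimal sketch with pathlib follows; the variable names and the "_flattened" suffix are hypothetical choices, not something from the original script. Note that the variable is passed without quotes: Image.open("A") would look for a file literally named A.

```python
from pathlib import Path

# Change only this line to process a different image
SOURCE = Path("png1.png")

# Derived names (the "_flattened" suffix is a hypothetical naming choice)
FLATTENED = SOURCE.with_name(SOURCE.stem + "_flattened.png")  # stands in for 'png2.png'
JPEG_COPY = SOURCE.with_suffix(".jpg")                         # stands in for 'png1.jpg'

def is_png(path):
    """Replacement for the manual rsplit('.') extension check."""
    return path.suffix.lower() == ".png"

print(SOURCE, FLATTENED, JPEG_COPY, is_png(SOURCE))
```

In the script, the calls would then read Image.open(SOURCE), canvas.save(FLATTENED), and so on, so switching images means editing one line.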

Lower the brightness of all RGB pixels by 20% in Python?

I have been trying to teach myself more advanced methods in Python but can't seem to find anything similar to this problem to base my code off of.
First question: Is installing Pillow the only way to display an image from the terminal? I would prefer not to, as I'm trying to then teach what I learn to a very beginner student. My image.show() function doesn't do anything.
Second question: What is the best way to go about lowering the brightness of all RGB pixels in an image by 20%? What I have below doesn't do anything to alter the brightness, even though it compiles without errors. I would prefer the simplest approach, importing as few libraries as possible.
Third question: How do I make a new picture instead of changing the original? (I.e., lower the brightness by 20%, so that "image-decreasedBrightness.jpg" is created from "image.jpg".)
Here is my code. Sorry it isn't formatted correctly; every time I tried to indent, it would tab down to the tags bar.
import Image
import ImageEnhance
fileToBeOpened = raw_input("What is the file name? Include file type.")
image = Image.open(fileToBeOpened)
def decreaseBrightness(image):
    image.show()
    image = image.convert('L')
    brightness = ImageEnhance.Brightness(image)
    image = brightness.enhance(20)
    image.show()
    return image
decreaseBrightness(image)
To save the image as a file, there's an example in the documentation:
from PIL import ImageFile
fp = open("lena.pgm", "rb")
p = ImageFile.Parser()
while 1:
    s = fp.read(1024)
    if not s:
        break
    p.feed(s)
im = p.close()
im.save("copy.jpg")
The key function is im.save.
For a more in-depth solution, get a nice beverage, find a comfortable place to sit and enjoy your read:
Pillow 3.4.x Documentation.
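On the second and third questions: in Pillow, ImageEnhance.Brightness(image).enhance(factor) treats 1.0 as "unchanged" and 0.0 as black, so a 20% reduction corresponds to a factor of 0.8 rather than 20, and convert('L') turns the image into grayscale. The arithmetic itself can be shown without any library. The sketch below is only an illustration under the assumption that pixels are available as plain (r, g, b) tuples; the function names are hypothetical.

```python
import os

def darken_pixel(pixel, factor=0.8):
    """Scale one (r, g, b) tuple; factor 0.8 lowers brightness by 20%."""
    return tuple(int(round(channel * factor)) for channel in pixel)

def darken_pixels(pixels, factor=0.8):
    """Return a new list of darkened pixels, leaving the input untouched."""
    return [darken_pixel(p, factor) for p in pixels]

def output_name(filename, suffix="-decreasedBrightness"):
    """Build the new file name, e.g. image.jpg -> image-decreasedBrightness.jpg."""
    root, ext = os.path.splitext(filename)
    return root + suffix + ext

original = [(255, 255, 255), (100, 50, 10), (0, 0, 0)]
darker = darken_pixels(original)
print(darker)
print(output_name("image.jpg"))
```

Saving the result under the name from output_name, instead of the original file name, is what keeps the source image unchanged.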

Python/Pygame Converting a .jpg to a string and back to a .jpg: Corruption Issue

I'm making a program in Python using Pygame that will load an image to the screen, open the raw data (as in, the characters you would see if you opened the jpg as a text file), throw some random characters in with the data, and then resave it as a jpg to load into pygame again. This results in a cool looking glitch effect.
I am not having any problems with the desired glitches, but I found that no matter what kind of random character was placed where, for certain images I ended up with a grey bar on the bottom of the image every time it went through my glitch function. I simplified my function so that all it did was load the image, open the image in binary read mode (even though I'm on a Mac), save a string of the raw data, write a new file based on this string and then load that file. The image was not purposefully glitched in any way, and the data was supposedly untouched, but I still encountered this grey bar.
Here is the relevant code:
def initializeScreen(x, y):
    pygame.display.set_mode((x,y))
    return pygame.display.get_surface()
def importImage(fileName):
    imgText = open(fileName, 'rb')
    imgTextStr = imgText.read()
    imgText.close()
    return imgTextStr
screenSurf = initializeScreen(800,600)
textOfImg = importImage('/Users/Amoeba/Desktop/GlitchDriving/Clouds.jpg')
newFile = open('/Users/Amoeba/Desktop/GlitchDriving/tempGlitchFile.jpg', 'wb')
newFile.write(textOfImg)
newimgSurf = pygame.image.load('/Users/Amoeba/Desktop/GlitchDriving/tempGlitchFile.jpg')
screenSurf.blit(newimgSurf, (0,0))
pygame.display.flip()
Here is an example of one of the images before and after passing through my function:
It is worth noting that the size of the grey bar depends on the picture. Some pictures even pass through my function visibly unchanged, as they should be. Also, if I open the new version of the jpg written by my program with image viewing software like preview, the grey bar does not appear. My suspicion is that it is a quirk of the pygame image load function or that there is some strange character (or possibly white space) that is being dropped in my conversion from jpg to string or vice-versa. I did compare two of the text files (one with grey bar and one without) and found no difference while using an online "difference finder".
This is my first post here, but I've lurked for answers dozens of times. Any help is greatly appreciated.
You never close the file object you create with open, so probably not all data gets written back (flushed) to your new file.
Either close the file object before trying to read the file again or, better, start using the with statement (which will close the file for you) whenever you deal with files:
def importImage(fileName):
    with open(fileName, 'rb') as imgText:
        return imgText.read()
screenSurf = initializeScreen(800,600)
textOfImg = importImage(r'path/to/file')
with open(r'path/to/otherfile', 'wb') as newFile:
    newFile.write(textOfImg)
newimgSurf = pygame.image.load(r'path/to/otherfile')
screenSurf.blit(newimgSurf, (0,0))
pygame.display.flip()
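The effect of closing the file can be checked without pygame by doing a byte-for-byte round trip. This is a small sketch under the assumption that random bytes in a temporary directory are an acceptable stand-in for the JPEG; copy_bytes and the file names are hypothetical.

```python
import os
import tempfile

def copy_bytes(src_path, dst_path):
    """Copy a file's raw bytes; the with blocks close (and flush) both files."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(data)
    return len(data)

# Stand-in for an image: arbitrary binary data, larger than a typical I/O buffer
payload = os.urandom(100_000)
work_dir = tempfile.mkdtemp()
src_path = os.path.join(work_dir, "source.bin")
dst_path = os.path.join(work_dir, "copy.bin")
with open(src_path, "wb") as f:
    f.write(payload)

written = copy_bytes(src_path, dst_path)
with open(dst_path, "rb") as f:
    round_trip = f.read()

# Size on disk and content should both match once the file has been closed
print(written, os.path.getsize(dst_path), round_trip == payload)
```

If the destination were read while its writer was still open, the tail of the data could still be sitting in the write buffer, which would fit the missing bottom part of the image.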

Python activity logging

I have a question regarding logging for somescript.py
The script performs some actions to find matches for words the user is looking for in some pages that have become unreadable due to re-formatting and printing of the pages.
Because of this, OCR techniques don't work for us anymore, so I've come up with a script that compares contours of words to find matches.
the script looks something like:
import cv2
from cv2 import *
import numpy as np
method = cv.CV_TM_SQDIFF_NORMED
template_name = "this.png"
image_name = "3.tif"
needle = cv2.imread(template_name)
haystack = cv2.imread(image_name)
# Convert to gray:
needle_g = cv2.cvtColor(needle, cv2.CV_32FC1)
haystack_g = cv2.cvtColor(haystack, cv2.CV_32FC1)
# Attempt match
d = cv2.matchTemplate(needle_g, haystack_g, cv2.cv.CV_TM_SQDIFF_NORMED)
#we want the minimum squared difference
mn,_,mnLoc,_ = cv2.minMaxLoc(d)
print mnLoc
# Draw the rectangle
MPx,MPy = mnLoc
trows,tcols = needle_g.shape[:2]
#Normed methods give better results, ie matchvalue = [1,3,5], others sometimes show errors
cv2.rectangle(haystack, (MPx,MPy),(MPx+tcols,MPy+trows),(0,0,255),2)
cv2.imshow('output',haystack)
cv2.waitKey(0)
import sys
sys.exit(0)
Now I want to log the various tasks that the script performs, like:
converting the image to grayscale
attempting a match
drawing the rectangle
I have seen a few scripts on Stack Overflow explaining how to log an entire script or its entire output, but I haven't found anything that logs just a few actions.
I would also like to add the date and time each activity was performed.
Furthermore, I have written a function that calculates an MD5 and a SHA1 hash of the input files, in this case 'this.png' and '3.tif'. I have yet to implement this piece of code, but would it be easy to log that as well?
I am a Python noob, so if the answers are obvious to you, you know why I couldn't figure it out myself.
I hope you can help me out on this one!
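One possible approach is the standard logging module, which can timestamp each message, combined with hashlib for the file hashes. The following is a sketch rather than a drop-in solution: the log file location, the logger name "somescript", and the helper names are assumptions, and the three log.info calls only mark where the corresponding steps of the script would be.

```python
import hashlib
import logging
import os
import tempfile

# Log file location is an assumption; any writable path would do
LOG_PATH = os.path.join(tempfile.gettempdir(), "activity.log")

log = logging.getLogger("somescript")
log.setLevel(logging.INFO)
handler = logging.FileHandler(LOG_PATH)
# %(asctime)s adds the date and time to every line
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
log.addHandler(handler)

def file_hashes(path):
    """Return (md5, sha1) hex digests of a file, read in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()

def log_input_file(path):
    """Log the hashes of one input file, e.g. 'this.png' or '3.tif'."""
    md5, sha1 = file_hashes(path)
    log.info("input %s md5=%s sha1=%s", path, md5, sha1)
    return md5, sha1

# One call per step that should appear in the log, placed next to that step
log.info("converting the image to grayscale")
log.info("attempting a match")
log.info("drawing the rectangle")
```

Only the steps that have an explicit log.info call end up in the file, which is the difference from redirecting the whole script's output.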
