I have below file contents
apples:100
books:100
pens:200
banana:300
I have below code to search string in file:
def search_string(file_search, search_string):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if search_string in line:
                search_output.append(line)
    return search_output
To search apples:
search_string("filename", "apples")
In some cases I have to search for two or three strings in the same file, depending on the requirement. Do I need to write separate functions, or can this be done in a single function? If it can, can anyone help?
For a two-string search I have the code below:

def search_string2(file_search, search_string1, search_string2):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if search_string1 in line or search_string2 in line:
                search_output.append(line)
    return search_output
You can achieve this in a single function with varargs: name a parameter with a preceding *, which collects all additional positional arguments into a tuple under that name. Then use any with a generator expression to generalize the test to cover an unknown number of search strings:
def search_string(file_search, *search_strings):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if any(searchstr in line for searchstr in search_strings):
                search_output.append(line)
    return search_output
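For example, with the sample data from the question written out to a file ("fruits.txt" is a hypothetical name used for illustration), one call now handles any number of search strings:

```python
# Same varargs function as above, shown with a small demo file.
def search_string(file_search, *search_strings):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if any(searchstr in line for searchstr in search_strings):
                search_output.append(line)
    return search_output

# "fruits.txt" is a hypothetical file name holding the sample data.
with open("fruits.txt", "w") as f:
    f.write("apples:100\nbooks:100\npens:200\nbanana:300\n")

print(search_string("fruits.txt", "apples"))            # one search string
print(search_string("fruits.txt", "apples", "banana"))  # two search strings
```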
This is fine for a smallish number of search strings, but if you need to handle a large volume of search strings, scanning a huge input or many small inputs, I'd suggest looking at more advanced optimizations, e.g. Aho-Corasick.
Just declare str2=None, then:

if str1 in line:
    ....
if str2 is not None and str2 in line:
    ....
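A runnable sketch of that optional-second-string idea (the function body is my illustration of the approach, and "demo.txt" is a made-up file name):

```python
# Optional second search string: pass it only when needed.
def search_string(file_search, str1, str2=None):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if str1 in line or (str2 is not None and str2 in line):
                search_output.append(line)
    return search_output

# "demo.txt" is a hypothetical file for the demo.
with open("demo.txt", "w") as f:
    f.write("apples:100\nbooks:100\npens:200\n")

print(search_string("demo.txt", "apples"))          # second string omitted
print(search_string("demo.txt", "apples", "pens"))  # both strings supplied
```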
def encrypt():
    while True:
        try:
            userinp = input("Please enter the name of a file: ")
            file = open(f"{userinp}.txt", "r")
            break
        except:
            print("That File Does Not Exist!")
    second = open("encoded.txt", "w")
    for line in file:
        reverse_word(line)

def reverse_word(line):
    data = line.read()
    data_1 = data[::-1]
    print(data_1)
    return data_1

encrypt()
I'm currently supposed to make a program that encrypts a text file in some way, and one method I'm trying is reversing the sequence of the lines in the text file. All of my other functions use "for line in file", where "line" is passed to each separate function and then changed for the purpose of encryption. But when I try to do the same thing here to reverse the order of the lines in the file, I get an error:
"str" object has no attribute "read"
I've tried using the same sequence as below but passing over the whole file instead, which works. However, I want it to work when I pass over individual lines from the file, as with the other functions I currently have (or, more simply put, having this function called inside the for loop).
Any Suggestions? Thanks!
Are you trying to reverse the order of the lines or the order of the words in each line?
Reversing the lines can be done by simply reading the lines and using the list's built-in reverse method:
lines = fp.readlines()
lines.reverse()
If you're trying to reverse the words (actual words, not just the string of characters in each line) you're going to need to do some regex to match on word boundaries.
Otherwise, simply reversing each line can be done like:
lines = fp.readlines()
for line in lines:
    chars = list(line)
    chars.reverse()
I think the bug you're referring to is in this function:
def reverse_word(line):
    data = line.read()
    data_1 = data[::-1]
    print(data_1)
    return data_1
You don't need to call read() on line because it's already a string; read() is called on file objects in order to turn them into strings. Just do:
def reverse_line(line):
    return line[::-1]
and it will reverse the entire line.
If you wanted to reverse the individual words in the line, while keeping them in the same order within the line (e.g. turn "the cat sat on a hat" to "eht tac tas no a tah"), that'd be something like:
def reverse_words(line):
    return ' '.join(word[::-1] for word in line.split())
If you wanted to reverse the order of the words but not the words themselves (e.g. turn "the cat sat on a hat" to "hat a on sat cat the"), that would be:
def reverse_word_order(line):
    return ' '.join(line.split()[::-1])
I got a csv file 'svclist.csv' which contains a single column list as follows:
pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1
pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs
I need to strip each line of everything except the PL5 directory and the two numbers in the last directory, so it should look like this:
PL5,00
PL5,01
I started the code as follow:
clean_data = []
with open('svclist.csv', 'rt') as f:
    for line in f:
        if line.__contains__('profile'):
            print(line, end='')
and I'm stuck here.
Thanks in advance for the help.
You can use the regular expression (PL5)[^/].{0,}([0-9]{2,2})

For an explanation, just copy the regex and paste it at https://regexr.com — it will explain how the regex works, and you can make any required changes there.
import re

test_string_list = ['pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1',
                    'pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs']

regex = re.compile("(PL5)[^/].{0,}([0-9]{2,2})")

result = []
for test_string in test_string_list:
    matchArray = regex.findall(test_string)
    result.append(matchArray[0])

with open('outfile.csv', 'w') as f:
    for row in result:
        f.write(f"{','.join(row)}\n")

In the above code, I've created one empty list to hold the tuples. When writing each tuple to the file, ','.join(row) joins its two elements with a comma, which produces exactly the PL5,00 format required.

Then I'm using a formatted string to write the content into 'outfile.csv'.
You can use regex for this (in general, when trying to extract a pattern this might be a good option):

import re

pattern = re.compile(r"pf=/usr/sap/PL5/SYS/profile/PL5_.*(\d{2})")
with open('svclist.csv', 'rt') as f:
    for line in f:
        if 'profile' in line:
            last_two_numbers = pattern.findall(line)[0]
            print(f'PL5,{last_two_numbers}')

This code goes over each line, checks whether "profile" is in the line (this is the same as __contains__), then extracts the last two digits according to the compiled pattern.
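Demonstrated on the two sample lines from the question (compiling the pattern first, since a plain string has no findall method):

```python
import re

# Compiled pattern: after the fixed prefix, the greedy .* backtracks so that
# the group captures the two digits before the final segment.
pattern = re.compile(r"pf=/usr/sap/PL5/SYS/profile/PL5_.*(\d{2})")

samples = [
    "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1",
    "pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs",
]
for line in samples:
    print(f"PL5,{pattern.findall(line)[0]}")  # PL5,00 then PL5,01
```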
I made the assumption that the number is always between the two underscores. You could run something similar to this within your for loop:

test_str = "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1"
test_list = test_str.split("_")  # splits the string at the underscores
output = test_list[1].strip(
    "abcdefghijklmnopqrstuvwxyz" + str.swapcase("abcdefghijklmnopqrstuvwxyz"))  # remove any letters
try:
    int(output)  # check whether any non-digit characters are left
    print(f"PL5,{output}")
except ValueError:
    print(f'Something went wrong! Output is PL5,{output}')
I am trying to read a file that has a list of numbers in each line. I want to take only the list of numbers and not the corresponding ID number and put it into a single list to later sort by frequencies in a dictionary.
I've tried adding the numbers to the list, and I'm able to extract just the numbers I need, but I can't get them added to the list correctly.

I have a function that reads the file and finds just the part of each line I want. I then try to add it to the list, but it keeps coming out like:
['23,43,56,', '67,87,34',]
And I want it to look like this:
[23, 43, 56, 67, 87, 34]
Here is my Code
def frequency():
    f = open('Loto4.txt', "r")
    list = []
    for line in f:
        line.strip('\n')
        start = line.find("[")
        end = line.find("]")
        line = line[start+1:end-1]
        list.append(line)
        print(line)
    print(list)

frequency()
This is the file that I am reading:
1:[36,37,38,9]
2:[3,5,28,25]
3:[10,14,15,9]
4:[23,9,31,41]
5:[5,2,21,9]
Try using a list comprehension on the line with append (I changed it to extend). Also, please don't name variables after Python builtins — list is one, so I renamed it to l; please do this on your own next time. See #MichaelButscher's comment as well:
def frequency():
    f = open('Loto4.txt', "r")
    l = []
    for line in f:
        line = line.strip('\n')
        start = line.find("[")
        end = line.find("]")
        line = line[start + 1:end]
        l.extend([int(i) for i in line.split(',')])
        print(line)
    print(l)

frequency()
The literal_eval method of the ast module can be used in this case:

from ast import literal_eval

def frequency():
    result_list = list()
    with open('Loto4.txt') as f:
        for line in f:
            # each line looks like "1:[36,37,38,9]"; only the part after
            # the colon is a valid Python literal
            result_list.extend(literal_eval(line.split(':', 1)[1]))
    print(result_list)
    return result_list
The literal_eval method of ast (abstract syntax tree) module is used to safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
This can be used for safely evaluating strings containing Python values from untrusted sources without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing.
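A one-line illustration on a sample line from the question's file (the "N:" prefix has to be split off first, since it is not part of the list literal):

```python
from ast import literal_eval

# "4:[23,9,31,41]" is not a valid literal, but "[23,9,31,41]" is.
line = "4:[23,9,31,41]"
numbers = literal_eval(line.split(":", 1)[1])
print(numbers)  # [23, 9, 31, 41]
```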
def frequency():
    f = open('Loto4.txt', "r")
    retval = []
    for line in f:
        line = line.strip('\n')
        start = line.find("[")
        end = line.find("]")
        line = line[start+1:end]
        retval.extend([int(x) for x in line.split(',')])
        print(line)
    print(retval)

frequency()

Note the slice ends at end, not end-1, so the last number isn't cut off.
I changed the name of the list to retval - since list is a builtin class.
I need to write a function get_specified_words(filename) to get a list of lowercase words from a text file. All of the following conditions must be applied:
- Include all lower-case character sequences, including those that contain a - or ' character and those that end with a ' character.
- Exclude words that end with a -.
- The function must only process lines between the start and end marker lines.
- Use this regular expression to extract the words from each relevant line of a file: valid_line_words = re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", line)
- Ensure that the line string is lower case before using the regular expression.
- Use the optional encoding parameter when opening files for reading; that is, your open file call should look like open(filename, encoding='utf-8'). This will be especially helpful if your operating system doesn't set Python's default encoding to UTF-8.
The sample text file testing.txt contains this:
That are after the start and should be dumped.
So should that
and that
and yes, that
*** START OF SYNTHETIC TEST CASE ***
Toby's code was rather "interesting", it had the following issues: short,
meaningless identifiers such as n1 and n; deep, complicated nesting;
a doc-string drought; very long, rambling and unfocused functions; not
enough spacing between functions; inconsistent spacing before and
after operators, just like this here. Boy was he going to get a low
style mark.... Let's hope he asks his friend Bob to help him bring his code
up to an acceptable level.
*** END OF SYNTHETIC TEST CASE ***
This is after the end and should be ignored too.
Have a nice day.
Here's my code:
import re

def stripped_lines(lines):
    for line in lines:
        stripped_line = line.rstrip('\n')
        yield stripped_line

def lines_from_file(fname):
    with open(fname, 'rt') as flines:
        for line in stripped_lines(flines):
            yield line

def is_marker_line(line, start='***', end='***'):
    min_len = len(start) + len(end)
    if len(line) < min_len:
        return False
    return line.startswith(start) and line.endswith(end)

def advance_past_next_marker(lines):
    for line in lines:
        if is_marker_line(line):
            break

def lines_before_next_marker(lines):
    valid_lines = []
    for line in lines:
        if is_marker_line(line):
            break
        valid_lines.append(re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+", line))
    for content_line in valid_lines:
        yield content_line

def lines_between_markers(lines):
    it = iter(lines)
    advance_past_next_marker(it)
    for line in lines_before_next_marker(it):
        yield line

def words(lines):
    text = '\n'.join(lines).lower().split()
    return text

def get_valid_words(fname):
    return words(lines_between_markers(lines_from_file(fname)))

# This must be executed
filename = "valid.txt"
all_words = get_valid_words(filename)
print(filename, "loaded ok.")
print("{} valid words found.".format(len(all_words)))
print("word list:")
print("\n".join(all_words))
Here's my output:
File "C:/Users/jj.py", line 45, in <module>
text = '\n'.join(lines).lower().split()
builtins.TypeError: sequence item 0: expected str instance, list found
Here's the expected output:
valid.txt loaded ok.
73 valid words found.
word list:
toby's
code
was
rather
interesting
it
had
the
following
issues
short
meaningless
identifiers
such
as
n
and
n
deep
complicated
nesting
a
doc-string
drought
very
long
rambling
and
unfocused
functions
not
enough
spacing
between
functions
inconsistent
spacing
before
and
after
operators
just
like
this
here
boy
was
he
going
to
get
a
low
style
mark
let's
hope
he
asks
his
friend
bob
to
help
him
bring
his
code
up
to
an
acceptable
level
I need help with getting my code to work. Any help is appreciated.
lines_between_markers(lines_from_file(fname))

gives you a sequence of lists of valid words (one list per line), so you just need to flatten it:
def words(lines):
    words_list = [w for line in lines for w in line]
    return words_list
Does the trick.
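The flattening behaves like this on a small nested list (made-up data, just to show the comprehension):

```python
# Flatten a sequence of word lists into one list, as in the answer above.
def words(lines):
    return [w for line in lines for w in line]

print(words([["toby's", "code"], ["was", "rather"]]))
```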
But I think you should review the design of your program: lines_between_markers should only yield lines between markers, but it does more. The regexp should be applied to the result of this function, not inside it.

What you didn't do:

- Ensure that the line string is lower case before using the regular expression.
- Use the optional encoding parameter when opening files for reading; that is, your open file call should look like open(filename, encoding='utf-8').
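As a rough illustration of that suggestion — a sketch only, not the asker's original code — the marker handling can yield raw lines while the lowercasing, the regex, and the encoding parameter live outside it:

```python
import re

# Sketch: marker logic only selects lines; lowercasing and the regex are
# applied to the selected lines, and the file is opened with encoding='utf-8'.
def get_specified_words(filename):
    words = []
    in_block = False
    with open(filename, encoding='utf-8') as f:
        for line in f:
            line = line.rstrip('\n')
            if line.startswith('***') and line.endswith('***'):
                if in_block:
                    break          # end marker: stop processing
                in_block = True    # start marker: begin processing
                continue
            if in_block:
                words.extend(
                    re.findall("[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+",
                               line.lower()))
    return words

# "marker_demo.txt" is a hypothetical file for the demo.
with open("marker_demo.txt", "w", encoding="utf-8") as f:
    f.write("ignored\n*** START ***\nToby's doc-string, n1!\n*** END ***\nignored\n")

print(get_specified_words("marker_demo.txt"))  # ["toby's", 'doc-string', 'n']
```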
I need to be able to import and manipulate multiple text files through the function's parameters. I figured using *args in the function parameter list would work, but I get an error about tuples and strings.
def open_file(*filename):
    file = open(filename,'r')
    text = file.read().strip(punctuation).lower()
    print(text)

open_file('Strawson.txt','BigData.txt')
ERROR: expected str, bytes or os.PathLike object, not tuple
How do I do this the right way?
When you use the *args syntax in a function parameter list, it allows you to call the function with multiple arguments that appear as a tuple inside your function. So to perform a process on each of those arguments, you need to create a loop. Like this:
from string import punctuation

# Make a translation table to delete punctuation
no_punct = dict.fromkeys(map(ord, punctuation))

def open_file(*filenames):
    for filename in filenames:
        print('FILE', filename)
        with open(filename) as file:
            text = file.read()
            text = text.translate(no_punct).lower()
            print(text)
            print()

#test
open_file('Strawson.txt', 'BigData.txt')
I've also included a dictionary no_punct that can be used to remove all punctuation from the text. And I've used a with statement so each file will get closed automatically.
If you want the function to "return" the processed contents of each file, you can't just put return into the loop because that tells the function to exit. You could save the file contents into a list, and return that at the end of the loop. But a better option is to turn the function into a generator. The Python yield keyword makes that simple. Here's an example to get you started.
def open_file(*filenames):
    for filename in filenames:
        print('FILE', filename)
        with open(filename) as file:
            text = file.read()
            text = text.translate(no_punct).lower()
            yield text

def create_tokens(*filenames):
    tokens = []
    for text in open_file(*filenames):
        tokens.append(text.split())
    return tokens

files = '1.txt', '2.txt', '3.txt'
tokens = create_tokens(*files)
print(tokens)
Note that I removed the word.strip(punctuation).lower() stuff from create_tokens: it's not needed because we're already removing all punctuation and folding the text to lower-case inside open_file.
We don't really need two functions here. We can combine everything into one:
def create_tokens(*filenames):
    for filename in filenames:
        #print('FILE', filename)
        with open(filename) as file:
            text = file.read()
            text = text.translate(no_punct).lower()
            yield text.split()

tokens = list(create_tokens('1.txt','2.txt','3.txt'))
print(tokens)