I need to import and manipulate multiple text files passed through a function parameter. I figured using *args in the function parameter would work, but I get an error about tuples and strings.
def open_file(*filename):
    file = open(filename, 'r')
    text = file.read().strip(punctuation).lower()
    print(text)

open_file('Strawson.txt', 'BigData.txt')
ERROR: expected str, bytes or os.PathLike object, not tuple
How do I do this the right way?
When you use the *args syntax in a function's parameter list, you can call the function with multiple arguments, and they will appear to your function as a tuple. So to perform a process on each of those arguments you need a loop. Like this:
from string import punctuation

# Make a translation table to delete punctuation
no_punct = dict.fromkeys(map(ord, punctuation))

def open_file(*filenames):
    for filename in filenames:
        print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        print(text)
        print()

# test
open_file('Strawson.txt', 'BigData.txt')
I've also included a dictionary no_punct that can be used to remove all punctuation from the text. And I've used a with statement so each file will get closed automatically.
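To see the translation table on its own, here's a minimal sketch of how str.translate deletes the punctuation characters (the sample sentence is just an illustration):

```python
from string import punctuation

# Map every punctuation character's code point to None,
# so translate() deletes those characters
no_punct = dict.fromkeys(map(ord, punctuation))

text = "Hello, world! It's me."
cleaned = text.translate(no_punct).lower()
print(cleaned)  # hello world its me
```

Because the table maps code points to None, translate removes the characters instead of substituting them.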
If you want the function to "return" the processed contents of each file, you can't just put return into the loop because that tells the function to exit. You could save the file contents into a list, and return that at the end of the loop. But a better option is to turn the function into a generator. The Python yield keyword makes that simple. Here's an example to get you started.
def open_file(*filenames):
    for filename in filenames:
        print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        yield text
def create_tokens(*filenames):
    tokens = []
    for text in open_file(*filenames):
        tokens.append(text.split())
    return tokens
files = '1.txt','2.txt','3.txt'
tokens = create_tokens(*files)
print(tokens)
Note that I removed the word.strip(punctuation).lower() stuff from create_tokens: it's not needed because we're already removing all punctuation and folding the text to lower-case inside open_file.
We don't really need two functions here. We can combine everything into one:
def create_tokens(*filenames):
    for filename in filenames:
        #print('FILE', filename)
        with open(filename) as file:
            text = file.read()
        text = text.translate(no_punct).lower()
        yield text.split()
tokens = list(create_tokens('1.txt','2.txt','3.txt'))
print(tokens)
I currently have the below code in Python 3.x:
lst_exclusion_terms = ['bob','jenny', 'michael']
file_list = ['1.txt', '2.txt', '3.txt']
for f in file_list:
    with open(f, "r", encoding="utf-8") as file:
        content = file.read()
    if any(entry in content for entry in lst_exclusion_terms):
        print(content)
What I am aiming to do is to review the content of each file in the list file_list. When reviewing the content, I then want to check to see if any of the entries in the list lst_exclusion_terms exists. If it does, I want to remove that entry from the list.
So, if 'bob' is within the content of 2.txt, this will be removed (popped) out of the list.
I am unsure how to replace my print(content) with the command to identify the current index number for the item being examined and then remove it.
Any suggestions? Thanks
You want to filter a list of files based on whether they contain some piece(s) of text.
There is a Python built-in function filter which can do that. filter takes a function that returns a boolean, and an iterable (e.g. a list), and returns an iterator over the elements from the original iterable for which the function returns True.
So first you can write that function:
def contains_terms(filepath, terms):
    with open(filepath) as f:
        content = f.read()
    return any(term in content for term in terms)
Then use it in filter, and construct a list from the result:
file_list = list(filter(lambda f: not contains_terms(f, lst_exclusion_terms), file_list))
The lambda is needed because contains_terms takes two arguments and returns True when the terms are in the file, which is the opposite of what you want here (though it arguably makes more sense from the function's own point of view). You could specialise the function to your use case and remove the need for the lambda.
def is_included(filepath):
    with open(filepath) as f:
        content = f.read()
    return all(term not in content for term in lst_exclusion_terms)
With this function defined, the call to filter is more concise:
file_list = list(filter(is_included, file_list))
I've run into this before, needing to delete a list item while iterating over it. The usual advice is to simply build a new list containing only the items you want to keep.
However, here is a quick and dirty approach that can remove the file from the list:
lst_exclusion_terms = ['bob','jenny', 'michael']
file_list = ['1.txt', '2.txt', '3.txt']
print("Before removing item:")
print(file_list)
flag = True
while flag:
    # Start each pass assuming nothing will be removed;
    # only rescan if we actually popped an item
    flag = False
    for i, f in enumerate(file_list):
        with open(f, "r", encoding="utf-8") as file:
            content = file.read()
        if any(entry in content for entry in lst_exclusion_terms):
            file_list.pop(i)
            flag = True
            break
print("After removing item")
print(file_list)
In this case, 3.txt was removed from the list since its contents matched entries in lst_exclusion_terms.
The following were the contents used in each file:
#1.txt
abcd
#2.txt
5/12/2021
#3.txt
bob
jenny
michael
Please consider the following list:
l = ['pfg022G', 'pfg022T', 'pfg068T', 'pfg130T', 'pfg181G', 'pfg181T', 'pfg424G', 'pfg424T']
and the file:
example.conf
"flowcell_unmapped_bams": ["/groups/cgsd/alexandre/gatk-workflows/src/ubam/pfg022G.unmapped.bam"],
"unmapped_bam_suffix": ".unmapped.bam",
"sample_name": "pfg022G",
"base_file_name": "pfg022G.GRCh38DH.target"
I would like to create a function that reads through every element of the list, searches the file for that pattern, and substitutes it with the next element of the list. For example, the first element of the list is pfg022G: read through example.conf, search for pfg022G, and once found replace it with pfg022T.
Two functions, for readability. You can surely combine them into one single one.
def replace_words(words, content):
    """The list of words is expected to have an even number
    of items. Items are consumed in pairs: find the first
    word in content and replace it with the next word.
    """
    _iterator = iter(words)
    for _find in _iterator:
        _replace = next(_iterator)
        content = content.replace(_find, _replace)
    return content
def rewrite_file(file, words):
    """Open the file to modify, read its content,
    then apply the replace_words() function. Once
    done, write the replaced content back to the
    file.
    """
    with open(file, 'r') as f:
        content = f.read()
    with open(file, 'w') as f:
        f.write(replace_words(words, content))
FILENAME = 'example.conf'
l = ['pfg022G', 'pfg022T', 'pfg068T', 'pfg130T', 'pfg181G', 'pfg181T', 'pfg424G', 'pfg424T']
rewrite_file(FILENAME, l)
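To see the pairwise find/replace in isolation, here is the same replace_words logic applied to a plain string (no file involved; the sample content line is made up for illustration):

```python
def replace_words(words, content):
    # Consume the list in pairs: (find, replace), (find, replace), ...
    _iterator = iter(words)
    for _find in _iterator:
        _replace = next(_iterator)
        content = content.replace(_find, _replace)
    return content

words = ['pfg022G', 'pfg022T', 'pfg068T', 'pfg130T']
content = '"sample_name": "pfg022G", "flowcell": "pfg068T"'
print(replace_words(words, content))
# "sample_name": "pfg022T", "flowcell": "pfg130T"
```

Note that replacements are applied sequentially, so a later pair can in principle match text produced by an earlier replacement; with IDs like these that shouldn't happen, but it's worth keeping in mind.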
I'm doing a Python decrypting program for a school project.
First of all, I have a function that takes a file as an argument. It must read the file line by line and return a tuple.
The file contains 3 things: a number (whatever it is), the decrypted text, and the encrypted text.
import sys

fileName = sys.argv[-1]

def load_data(fileName):
    tuple = ()
    data = open(fileName, 'r')
    content = data.readlines()
    for i in content:
        tuple += (i,)
    return tuple  # does nothing why?
    print(tuple)

load_data(fileName)
Output:
('13\n', 'mecanisme chiffres substituer\n', "'dmnucmnn gmnuaetiihmnunofrutfrmhamprmnunshusfua f ludmuaoccsfta rtofumruvosnu vmzul ur aemudmulmnudmaetiihmhulmnucmnn gmnuaetiihmnunofrudtnpoftblmnunosnul uiohcmudusfurmxrmuaofnrtrsmudmulmrrhmnuctfsnaslmnun fnu aamfrumrudmua h armhmnubl fanuvosnun vmzuqsmulmucma ftncmudmuaetiihmcmfrusrtltnmuaofntnrmu unsbnrtrsmhulmnua h armhmnudsucmnn gmudmudmp hrup hudu srhmnumfuhmnpmar frusfudtartoff thmudmuaetiihmcmfr'")
Output needed:
(13,'mecanisme chiffres substituer','dmnucmnn gmnuaetiihmnunofrutfrmhamprmnunshusfua f ludmuaoccsfta rtofumruvosnu vmzul ur aemudmulmnudmaetiihmhulmnucmnn gmnuaetiihmnunofrudtnpoftblmnunosnul uiohcmudusfurmxrmuaofnrtrsmudmulmrrhmnuctfsnaslmnun fnu aamfrumrudmua h armhmnubl fanuvosnun vmzuqsmulmucma ftncmudmuaetiihmcmfrusrtltnmuaofntnrmu unsbnrtrsmhulmnua h armhmnudsucmnn gmudmudmp hrup hudu srhmnumfuhmnpmar frusfudtartoff thmudmuaetiihmcmfr')
The tuple needs to be like this: (count, word_list, crypted), with 13 as count and so on.
If someone can help me it would be great. Sorry if I'm phrasing my question badly.
You could try this to avoid the '\n' characters at the end:
import sys

fileName = sys.argv[-1]

def load_data(fileName):
    tuple = ()
    data = open(fileName, 'r')
    content = data.readlines()
    for i in content:
        tuple += (i.strip(''' \n'"'''),)
    return tuple

print(load_data(fileName))
Note that a function ends whenever it reaches a return statement. If you want to see the value of tuple, print it before the return statement, or print the returned value at the call site.
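A tiny sketch of that behaviour: anything placed after the return never runs, so the print has to happen before returning, or on the returned value:

```python
def demo():
    return 'done'
    print('never reached')  # unreachable: the function has already returned

result = demo()
print(result)  # done
```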
I am a little confused about what the file in question looks like, but from what I could infer from the output you got the file appears to be something like this:
some number
decrypted text
encrypted text
If so, the most straightforward way to do this would be
with open('lines.txt', 'r') as f:
    all_the_text = f.read()
list_of_text = all_the_text.split('\n')
tuple_of_text = tuple(list_of_text)
print(tuple_of_text)
Explanation:
The open built-in function creates an object that lets you interact with the file. We call open with the argument 'r' to indicate we only want to read from the file. Doing this within a with statement ensures the file is closed properly when you are done with it. The as keyword followed by f places the file object into the variable f. f.read() reads all of the text in the file into one string. String objects in Python have a split method that breaks a string into a list on some delimiter, without including the delimiter in the separated pieces. To turn that list into a tuple, simply pass it to tuple.
I have below file contents
apples:100
books:100
pens:200
banana:300
I have below code to search string in file:
def search_string(file_search, search_string):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if search_string in line:
                search_output.append(line)
    return search_output
To search apples:
search_string("filename", "apples")
In some cases I have to search for two or three strings in the same file, depending on the requirement, so I would need to write separate functions. Can this be achieved in a single function instead? If so, can anyone help?
for two string search I have below code:
def search_string2(file_search, search_string1, search_string2):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if search_string1 in line or search_string2 in line:
                search_output.append(line)
    return search_output
You can achieve this in a single function with varargs, by naming an argument with a preceding *, which collects all additional positional arguments into a tuple under that name. Then you use any with a generator expression to generalize the test to cover an unknown number of search strings:
def search_string(file_search, *search_strings):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if any(searchstr in line for searchstr in search_strings):
                search_output.append(line)
    return search_output
This is fine for a smallish number of search strings, but if you need to handle a large volume of search strings, scanning a huge input or many small inputs, I'd suggest looking at more advanced optimizations, e.g. Aho-Corasick.
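A middle ground before reaching for Aho-Corasick is to compile all the terms into a single regular expression alternation, so each line is scanned once rather than once per term. This is a sketch built on the same function shape as above:

```python
import re

def search_string(file_search, *search_strings):
    # One compiled alternation; re.escape guards against
    # regex metacharacters appearing in the search terms
    pattern = re.compile('|'.join(map(re.escape, search_strings)))
    search_output = []
    with open(file_search) as f:
        for line in f:
            if pattern.search(line):
                search_output.append(line)
    return search_output
```

For a handful of terms the plain any() version is just as good; the regex approach starts to pay off as the number of terms grows.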
Just declare string2=None, then:
if str1 in line:
    ....
if str2 is not None and str2 in line:
    ....
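Put together, that idea looks something like this: the second search string defaults to None, so the same function handles one or two terms (a sketch based on the question's function):

```python
def search_string(file_search, search_string1, search_string2=None):
    search_output = []
    with open(file_search) as f:
        for line in f:
            if search_string1 in line:
                search_output.append(line)
            elif search_string2 is not None and search_string2 in line:
                search_output.append(line)
    return search_output
```

The is not None guard matters: without it, a None second argument would raise a TypeError on the in test.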
I have to define a function named correct(), which has two parameters: a list of strings to be tested for misspelled words (this parameter will be my file), and my dictionary, which I have already made, called mydictionary.
The function should test each word in the first list using the dictionary and the check_word() function. correct should return a list of words with all the misspelled words replaced.
For example, correct(['the', 'cheif', 'stopped', 'the', 'theif']) should return the list ['the', 'chief', 'stopped', 'the', 'thief']
Then I'm supposed to test the function in the main()
import string

# Makes a function to read a file, use empty {} to create an empty dict
# then reads the file by using a for loop
def make_dict():
    file = open("spellingWords.txt")
    dictionary = {}
    for line in file:
        # Splits the lines in the file as the keys and values, assigning
        # them as misspell and spell
        misspell, spell = string.split(line.strip())
        # Assigns the key to the value
        dictionary[misspell] = spell
    file.close()
    return dictionary

mydictionary = make_dict()
#print mydictionary

# Gets an input from the user
word = raw_input("Enter word")

# Uses the dictionary and the input as the parameters
def check_word(word, mydictionary):
    # Uses an if statement to check whether the misspelled word is in the
    # dictionary
    if word in mydictionary:
        return mydictionary[word]
    # If it is not in the dictionary then it will return the word
    else:
        return word

# Prints the function
print check_word(word, mydictionary)

def main():
    file2 = open("paragraph.txt")
    file2.read()
    thelist = string.split(file2)
    print thelist
    def correct(file2, mydictionary):
        return thelist
    paragraph = string.join(thelist)
    print paragraph

main()
All my other functions work besides my Correct() function and my Main() function. It is also giving me the error 'file object has no attribute split'. Can I get help with my mistakes?
So I corrected it, and now have this for my main function:
def main():
    file2 = open("paragraph.txt")
    lines = file2.read()
    thelist = string.split(lines)
    def correct(file2, mydictionary):
        while thelist in mydictionary:
            return mydictionary[thelist]
    paragraph = string.join(thelist)
    print correct(file2, mydictionary)
however I am now getting the error 'unhashable type: 'list''
You are misunderstanding how reading a file object works. You have to store the value that the read() method returns so that you can use it later.
open simply returns a file object, not the lines of the file. To read its contents, you use the file object's methods. See the docs.
To read the file:
lines = file.read()
# do something with lines - it's a super big string
Also, you don't have to import string; the split() method is already built into strings.
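For example, split is called directly on the string itself, with no module needed (the sample sentence just reuses the words from the question):

```python
lines = "the cheif stopped the theif"
words = lines.split()  # splits on any whitespace by default
print(words)           # ['the', 'cheif', 'stopped', 'the', 'theif']
```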