How to write a file and eliminate lines? - python

I want to read from a file various lines like this for example:
hello I live in London.
hello I study.
And then based on what is the first word I want to remove the line from the file.
Can I put each sentence in an array?

You can read the entire contents of the file into memory (into a list), choose which lines you wish to keep, and write those to a new file (you can replace the old one if you wish).
For example:
with open("input.txt", 'r') as f:
    old_lines = f.readlines()
new_lines = []
for line in old_lines:
    words = line.split()
    # if the first word is "hello", keep the line (blank lines have no words and are dropped)
    if words and words[0] == 'hello':
        new_lines.append(line)
with open("output.txt", 'w') as f:
    f.writelines(new_lines)
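If the goal is to drop the matching lines rather than keep them, the same idea can be wrapped in a small helper. This is only a minimal sketch: the name remove_lines_starting_with and its parameters are made up for illustration, and it assumes the file is small enough to fit in memory.

```python
def remove_lines_starting_with(path, first_word):
    """Rewrite the file at `path` without the lines whose first word is `first_word`."""
    with open(path, "r") as f:
        lines = f.readlines()

    kept = []
    for line in lines:
        words = line.split()
        # blank lines have no first word, so they are always kept
        if words and words[0] == first_word:
            continue
        kept.append(line)

    # overwrite the original file with the surviving lines
    with open(path, "w") as f:
        f.writelines(kept)
    return kept
```

The returned list is the "array" of remaining sentences, so you can keep working with it after the file has been rewritten.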

Related

how can i read each word of a line from a file separately?

The problem I came across is that I need to read each word of a line from a file one by one, and repeat that for the whole file. The words are separated from each other by the # sign, e.g.
2016/2017#Southeast_Kootenay#Mount_Baker_Secondary#STANDARD#COURSE_MARKS#99.0#71.0#88.0#49.0
After that I need to assign each value to the appropriate attribute of a class, for example:
school_years would be 2016/2017, district_name would be Southeast_Kootenay, and so on.
The thing is that I have no clue how to do it. I managed to extract the first word from a file but couldn't do it for the whole line, let alone the whole file. This is the code I used:
def word_return():
    for lines in file:
        for word in lines.split('#'):
            return word
any kind of help would be appreciated
You're returning a single word, because return exits the function on the very first iteration. Remove the inner for loop and return the entire list. If you only want the first line:
(Assuming file in your code is a list of lines resulting from file = open("file.txt", "r").readlines())
def word_return():
    with open("yourFile.txt", "r") as f:
        for line in f:
            # return the fields of the first line only
            return line.rstrip('\n').split('#')
If you want to return a list that contains one list of fields per line, use the following:
def word_return():
    allLines = []
    with open("yourFile.txt", "r") as f:
        for line in f:
            # strip the trailing newline so it doesn't end up in the last field
            allLines.append(line.rstrip('\n').split('#'))
    return allLines
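To go from the split list to class attributes, as the question asks, one option is a small dataclass. This is a sketch under assumptions: SchoolRecord, parse_line and read_records are hypothetical names, only the first two column names come from the question, and every line is assumed to have at least two #-separated fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchoolRecord:
    school_year: str
    district_name: str
    # everything after the first two fields is kept as-is, since the
    # remaining column names are not given in the question
    other_fields: List[str] = field(default_factory=list)


def parse_line(line):
    """Split one '#'-separated line into a SchoolRecord."""
    parts = line.rstrip("\n").split("#")
    return SchoolRecord(parts[0], parts[1], parts[2:])


def read_records(path):
    """Return one SchoolRecord per non-empty line of the file."""
    with open(path) as f:
        return [parse_line(line) for line in f if line.strip()]
```

Once the real column names are known, other_fields can be replaced by one named attribute per column.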

Random substitution

I have a txt file and a dictionary, where keys are adjectives, values are their synonyms. I need to replace the common adjectives from the dictionary which I meet in a given txt file with their synonyms - randomly! and save both versions - with changed and unchanged adjectives - line by line - in a new file(task3_edited_text). My code:
#get an English text as an additional input
filename_eng = sys.argv[2]
infile_eng = open(filename_eng, "r")
task3_edited_text = open("task3_edited_text.txt", "w")
#necessary for random choice
import random
#look for adjectives in English text
#line by line
for line in infile_eng:
    task3_edited_text.write(line)
    line_list = line.split()
    #for each word in line
    for word in line_list:
        #if we find common adjectives, change them into synonym, randomly
        if word in dict.keys(dictionary):
            word.replace(word, str(random.choice(list(dictionary.values()))))
        else:
            pass
    task3_edited_text.write(line)
The problem is that in the output the adjectives are not replaced by their values.
line_list = line.split()
...
task3_edited_text.write(line)
The issue is that you try to modify line_list, which you created from line. However, line_list is a new list built from the words in line; modifying it doesn't change line in the slightest. So writing line to the file writes the unmodified line and doesn't take your changes into account.
You probably want to build a line_to_write from line_list and write that to the file instead, like so:
line_to_write = " ".join(line_list) + "\n"  # split() dropped the newline, so add it back
task3_edited_text.write(line_to_write)
Also, line_list isn't even modified in your code. Strings are immutable, so replace returns a new string instead of changing word, and that result is thrown away; rebinding word would not change the list element either. You probably want to assign to line_list via the index of the element, like so:
for idx, word in enumerate(line_list):
    #if we find common adjectives, change them into synonym, randomly
    if word in dictionary:
        # note: this picks from the values of ALL keys, as in your code;
        # use dictionary[word] if you want a synonym of this particular word
        line_list[idx] = str(random.choice(list(dictionary.values())))
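Putting both fixes together, the whole task might look roughly like the sketch below. It rests on an assumption the question does not confirm: that each value in the dictionary is a non-empty list of synonyms for its key. The names substitute_adjectives and write_both_versions are invented for illustration.

```python
import random


def substitute_adjectives(line, synonyms, rng=random):
    """Return `line` with every word found in `synonyms` replaced by a random synonym."""
    words = line.split()
    for idx, word in enumerate(words):
        if word in synonyms:
            # assumption: synonyms[word] is a non-empty list of strings
            words[idx] = rng.choice(synonyms[word])
    return " ".join(words)


def write_both_versions(in_path, out_path, synonyms):
    """Write each original line followed by its edited version."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            # make sure the original line ends with a newline before writing it
            outfile.write(line if line.endswith("\n") else line + "\n")
            outfile.write(substitute_adjectives(line, synonyms) + "\n")
```

Note that split() and join() collapse runs of whitespace, and a word with punctuation attached (for example "big,") will not match a key unless the punctuation is stripped first.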

How to efficiently split a text file according to certain characters?

I have recently started learning Python 3, solely to improve efficiency at work, so this may be an extremely basic question.
I know that for strings we can use str.split to split the string into pieces according to a given character.
But how might I go about the following?
Given a file bigfile.txt with lines such as:
some intro lines xxxxxx
sdafiefisfhsaifdijsdjsia
dsafdsifdsiod
\item 12478621376321748324
sdfasfsdfafda
\item 23847328412834723
uduhfavfduhfu
sduhfhaiuesfhseuif
lots and other lines
\item 328347848732
pewprpewposdp
everthing up to and inclued this line
and the blank line too
some end lines dsahudfuha
dsfdsfdsf
What's of interest is each line starting with \item xxxxx together with the lines that follow it, up to the next \item xxxxx line.
How to efficiently split bigfile.txt so I have the following:
bigfile_part1.txt which contains
\item 12478621376321748324
sdfasfsdfafda
bigfile_part2.txt which contains
\item 23847328412834723
uduhfavfduhfu
sduhfhaiuesfhseuif
lots and other lines
bigfile_part3.txt which contains
\item 328347848732
pewprpewposdp
everthing up to and inclued this line
and the blank line too
ignoring the intro lines as well as the end lines.
Moreover, how can I apply this function to a batch of files, say
bigfile2.txt
bigfile3.txt
bigfile4.txt
in exactly the same way.
You can use itertools.groupby to carve up the file. groupby creates subiterators whenever a condition changes. In your case that's whether a line starts with "\item ".
import itertools
records = []
record = None
with open('thefile') as f:
    # raw string so the backslash in \item is taken literally
    for key, subiter in itertools.groupby(
            f, lambda line: line.startswith(r"\item ")):
        if key:
            # in a \item group, which has 1 line
            item_id = next(subiter).split()[1]
            record = {"item_id": item_id}
        else:
            # in the value subgroup
            if record:
                record["values"] = [line.strip() for line in subiter]
                records.append(record)
for record in records:
    print(record)
As for processing multiple files, you could put that into a function to be called once per file. Then it's a question of getting the file list, perhaps with glob.glob("some/path/big*.txt").
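A per-file function along those lines could look like the sketch below. It does not use groupby; it simply starts a new output file at every \item line. The names split_on_items and split_all, and the _partN.txt naming, are assumptions taken from the question's example. Nothing in the sample data marks where the end lines begin, so they stay in the last part.

```python
import glob
import os


def split_on_items(path, marker="\\item "):
    """Write each block that starts with `marker` to <name>_partN.txt and return the new paths."""
    base, _ext = os.path.splitext(path)
    out_paths = []
    out = None
    try:
        with open(path) as src:
            for line in src:
                if line.startswith(marker):
                    # a new block begins: close the previous part, open the next
                    if out:
                        out.close()
                    out_paths.append(f"{base}_part{len(out_paths) + 1}.txt")
                    out = open(out_paths[-1], "w")
                if out:
                    # lines before the first marker are skipped because out is None
                    out.write(line)
    finally:
        if out:
            out.close()
    return out_paths


def split_all(pattern):
    """Apply split_on_items to every file matching a glob pattern."""
    return {path: split_on_items(path) for path in sorted(glob.glob(pattern))}
```

Be careful with the pattern passed to split_all: something like "big*.txt" would also match the part files on a second run, so it may be safer to write the parts to a separate directory.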
Since it's a big file, instead of reading the entire file into a string, let us read it line by line.
import sys
def parseFromFile(filepath):
    parsedListFromFile = []
    with open(filepath) as fp:
        for line in fp:
            if r"\item" in line:
                # a new item starts at the \item marker
                parsedListFromFile.append(r"\item" + line.split(r"\item")[-1])
            elif parsedListFromFile:
                # continuation line of the current item
                parsedListFromFile[-1] += line
    # write each item of parsedListFromFile to its own file
    for index, item in enumerate(parsedListFromFile):
        with open(filepath + str(index) + ".txt", 'w') as out:
            out.write(item + '\n')
def main():
    # assuming you run the script like this: python split.py myfile1.txt myfile2.txt ...
    paths = sys.argv[1:]  # all CLI args after the script name
    for path in paths:
        parseFromFile(path)  # call the function for each file
if __name__ == "__main__":
    main()
*Assuming one line only has one \item in it.
*This doesn't ignore the end line. You can put an if or just manually remove it from the last file.
Another approach is to split on blank lines (two or more consecutive newlines), assuming each \item block is separated from its neighbours by one:
import re
# raw string so the backslash in \item is taken literally
text = r"""some intro lines xxxxxx
sdafiefisfhsaifdijsdjsia
dsafdsifdsiod
\item 12478621376321748324
sdfasfsdfafda
...
"""
# split on runs of two or more newline characters
for i, j in enumerate(re.split(r'\n{2,}', text)):
    if j.startswith(r"\item"):
        print(f"bigfile{i}.txt", j, sep="\n")  # dump to file here
bigfile1.txt
\item 12478621376321748324
sdfasfsdfafda
bigfile2.txt
\item 23847328412834723
uduhfavfduhfu
sduhfhaiuesfhseuif
lots and other lines
bigfile3.txt
\item 328347848732
pewprpewposdp
everthing up to and inclued this line
and the blank line too

Read each line from a file and if that line length is smaller than 9 add that line to an array

words = []
for line in f:
    if len(line) <= 9:
        words.append(line)
#words = f.readlines(250000)
f.close()
return words
I am trying to read each line from a text file in which every line contains one word. I want to compare the length of that word against a condition and, if it meets the condition, add it to a list, so that I end up with the words that are under 9 characters long. The code should go through the entire file, and the words that are under 9 characters should be added to the array called words. I tried using f.readlines() but I don't know how to filter the results, as this just gives all of the words in the file.
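You can iterate over the file (or over file.readlines()) like this:
words = []
with open('path/to/file') as f:
    for line in f:
        word = line.strip()  # drop the trailing newline before measuring
        if len(word) <= 9:
            words.append(word)
Note that your original check, len(line), also counts the newline character at the end of each line. Using a context manager to open the file is good practice: the file is closed automatically, so you don't need to call close() and can't forget to.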
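The same filter can be written as a reusable function with a list comprehension. This is a sketch only: short_words and max_len are hypothetical names, and the limit of 9 mirrors the <= 9 comparison in the question's code (use len(word) < max_len if "under 9" is meant strictly).

```python
def short_words(path, max_len=9):
    """Return the words (one per line in the file) that are at most `max_len` characters long."""
    with open(path) as f:
        # strip whitespace and newlines lazily, line by line
        stripped = (line.strip() for line in f)
        # skip empty lines; keep words within the length limit
        return [word for word in stripped if word and len(word) <= max_len]
```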

Appending Data In a Specific Line of Text in a File Python

Suppose I have a file like this:
words
words
3245, 3445,
345634, 345678
I am wondering if it is possible to add data onto the 4th line of the file so that the output is this:
words
words
3245, 3445, 67899
345634, 345678
I found a similar tutorial: appending a data in a specific line of a text file in Python?
but the problem is I don't want to use .startswith because the files will all have different beginnings.
Thanks for your help!
You can achieve that by doing this:
# define a function so you can re-use it for writing to other specific lines
def writetoendofline(lines, line_no, append_txt):
    lines[line_no] = lines[line_no].replace('\n', '') + append_txt + '\n'
# open the file in read mode to read the current contents into memory
with open('./text', 'r') as txtfile:
    lines = txtfile.readlines()
# in your case, write to line number 4 (remember, index is 3 for 4th line)
writetoendofline(lines, 3, ' 67899')
# write the edited content back to the file
# (the with block closes the file, so no explicit close() is needed)
with open('./text', 'w') as txtfile:
    txtfile.writelines(lines)
