Split text file into lines by keyword - python

I have a large text file I have imported into Python. I want to split it into lines by a keyword, then use those lines to pull the relevant information into a dataframe.
The data follows the same pattern for each line, but the lines won't have exactly the same number of characters, and some lines may have extra data.
So I have a text file such as:
{data: name:Mary, friends:2, cookies:10, chairs:4},{data: name:Gerald friends:2, cookies:10, chairs:4, outside:4},{data: name:Tom, friends:2, cookies:10, chairs:4, stools:1}
There is always the keyword data between lines. Is there any way I can split the text using this word as the beginning of each line (and then put it into a dataframe)?
I'm not sure where to begin so any help would be amazing

When you get the content of a .txt file like this...
with open("file.txt", 'r') as file:
    content = file.read()
...you have it as a string, so you can split it with the function str.split():
content = content.split(my_keyword)
You can do it with a function:
def splitter(path: str, keyword: str) -> list[str]:
    with open(path, 'r') as file:
        content = file.read()
    # str.split returns a list of the pieces between occurrences of keyword
    return content.split(keyword)
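that you can call this way (here file.txt contains just the sentence "I really like to write the word data, because I think it has a lot of meaning."):
>>> splitter("file.txt", "data")
['I really like to write the word ', ', because I think it has a lot of meaning.']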
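Splitting on the bare word still leaves braces, commas and the leading "{" in each piece, so getting to the dataframe the question asks for takes one more step. A minimal sketch, assuming every record has the shape {data: key:value, ...} with word-only values, as in the sample; the regex and the name parse_records are illustrative assumptions, not part of the answer above:

```python
import re

# One "key:value" pair; \s* tolerates an optional space after the colon.
# Assumes values are single words or numbers, as in the question's sample.
PAIR = re.compile(r"(\w+):\s*(\w+)")


def parse_records(text, keyword="data"):
    """Split text on 'keyword:' and turn each record into a dict."""
    rows = []
    # The piece before the first "data:" is just the opening "{", so skip it
    for chunk in text.split(keyword + ":")[1:]:
        row = {}
        for key, value in PAIR.findall(chunk):
            # Store counts as ints and everything else (e.g. names) as str
            row[key] = int(value) if value.isdigit() else value
        rows.append(row)
    return rows
```

Passing the result to pandas.DataFrame(rows) should give one row per record, with NaN in columns a record lacks (such as outside or stools). Note that the missing comma after Gerald does not matter, because each pair is matched on its own.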

Related

How to check each line of a text file for a string and print all lines with occurrences of that word to a new text file?

I am currently trying to write a script that reads a text file line by line and transfers all lines containing a specific string (for example, if the line contains 'apple' or 'Hello World') to a new text file.
fpath = open('Redshift_twb.txt', 'r')
lines = fpath.readlines()
fpath_write = open('Redshift_new.txt', 'w+')
print(lines[0:10])
fpath_write.write(lines[0:10])
fpath.close()
fpath_write.close()
Currently, I just have the basics set up, but I am stuck because when I use .readlines() it adds all the lines to a list at separate indexes. Then when I try to write back to a new file, I get an error message:
fpath_write.write(lines[0:10])
TypeError: write() argument must be str, not list
This is because .write() takes a single string, not a list.
I know this is a two part question, but any help at all would be greatly appreciated.
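1. Read the lines in the input file.
2. Filter the list to find the lines that you need to write to the output file.
3. Join those lines into a single string. Lines from readlines() already end in \n, so join them with an empty string; joining with \n would add blank lines.
4. Write that string to the output file.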
# replace 'apple' with whatever string you want to find
keyword = 'apple'

with open('Redshift_twb.txt', 'r') as fpath:
    # readlines() keeps the trailing '\n' on each line
    lines = fpath.readlines()

print(lines[0:10])

# filter the list: keep only the lines that contain the keyword
out_lines = [line for line in lines if keyword in line]

# the lines already end in '\n', so join them with an empty string
output = ''.join(out_lines)

# write it
with open('Redshift_new.txt', 'w') as fpath_write:
    fpath_write.write(output)
Alternatively, replace your fpath_write.write(lines[0:10]) with fpath_write.write(''.join(line for line in lines[0:10] if 'content1' in line)). This only checks the first ten lines; drop the [0:10] slice to search the whole file.
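The question also mentions matching either 'apple' or 'Hello World'. A minimal sketch that accepts several search strings and streams the input one line at a time instead of calling readlines(); the function name copy_matching_lines is hypothetical:

```python
def copy_matching_lines(src_path, dst_path, keywords):
    """Copy every line of src_path that contains any of keywords to dst_path.

    Returns the number of lines written.
    """
    count = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:  # iterating a file reads it lazily, line by line
            if any(k in line for k in keywords):
                dst.write(line)  # line still ends in '\n', so no join needed
                count += 1
    return count
```

For the question's file this would be called as copy_matching_lines('Redshift_twb.txt', 'Redshift_new.txt', ['apple', 'Hello World']).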

Read keywords from a txt file and print added text + keyword

I read many keywords from a txt file into Python using f = open().
I want to add text before each keyword.
For example:
(http://www.google.com/) + (abcdefg)
added text + imported keyword
I have tried this, but I can't get the result I want:
f = open("C:/abc/abc.txt", 'r')
data = f.read()
print("http://www.google.com/" + data)
f.close()
I also tried using a for loop, but I couldn't get it to work.
Please let me know the solution.
Many thanks.
Your original code has some flaws:
data = f.read() reads the whole file into a single string, so "http://www.google.com/" + data adds the text only once, in front of the first keyword. To handle each line separately, loop over the file with a for;
each line is a str that may contain more than one word, so you must split it into words using data.split()
To solve your problem, you need to read each line from the file, split the line into the words it has, then loop through the list with the words, add the desired text then the word itself.
The correct program is this:
with open("C:/abc/abc.txt", 'r') as f:
    for data in f:
        # split each line into its individual keywords
        words = data.split()
        for i in words:
            print("http://www.google.com/" + i)
with open('text.txt', 'r') as f:
    for line in f:
        # strip() removes the trailing newline so it isn't printed after the keyword
        print("http://www.google.com/" + line.strip())
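If the keywords are meant to become URLs, characters such as spaces or + are not URL-safe. A sketch, assuming whitespace-separated keywords and using the standard library's urllib.parse.quote to percent-encode each one (neither answer does this); prefixed_keywords is a hypothetical name:

```python
from urllib.parse import quote

BASE_URL = "http://www.google.com/"  # prefix taken from the question


def prefixed_keywords(lines, prefix=BASE_URL):
    """Yield prefix + keyword for every whitespace-separated word in lines."""
    for line in lines:
        for word in line.split():  # split() also drops the trailing '\n'
            # quote() percent-encodes characters that are not URL-safe
            yield prefix + quote(word)
```

With a real file you would pass the open file object, e.g. for url in prefixed_keywords(f): print(url) inside the with block.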

Extract chunks of text from document and write them to new text file

I have a large text file that I want to read several lines of, and write these lines out as one line to a text file. For instance, I want to start reading lines at a certain start word and stop at a lone parenthesis. So if my start word is 'CAR', I want to read from there until a lone parenthesis followed by a line break. The start and end words are to be kept as well.
What is the best way to achieve this? I have tried pattern matching, and also avoiding regex, but I don't think that is possible.
Code:
array = []
f = open('text.txt','r') as infile
w = open(r'temp2.txt', 'w') as outfile
for line in f:
    data = f.read()
    x = re.findall(r'CAR(.*?)\)(?:\\n|$)',data,re.DOTALL)
    array.append(x)
    outfile.write(x)
return array
What the text may look like
( CAR: *random info*
*random info* - could be many lines of this
)
Using regular expressions is fine for this type of problem. They fall short when the pattern needs recursion, such as getting the content of nested parentheses: ((text1)(text2)).
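You can use the following regular expression: (CAR[\s\S]*?(?=\))). It matches lazily from CAR up to, but not including, the first ")"; the (?=\)) lookahead checks for the parenthesis without consuming it.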
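To see what that lookahead pattern returns, a quick check on the sample text from the question, plus a made-up record whose info contains its own ")" (an assumption, added to show where the lazy match stops):

```python
import re

# Pattern from the answer above: lazy match from CAR, stopping just before ")"
PATTERN = r"(CAR[\s\S]*?(?=\)))"

sample = "( CAR: *random info*\n*random info* - could be many lines of this\n)\n"
print(re.findall(PATTERN, sample))
# ['CAR: *random info*\n*random info* - could be many lines of this\n']

# If the random info itself contains ")", the match stops there instead
tricky = "( CAR: price (usd) 5\n)\n"
print(re.findall(PATTERN, tricky))
# ['CAR: price (usd']
```

The second case is why anchoring on a parenthesis that sits alone on its line is safer when the info can contain parentheses.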
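We can match the text you're interested in using the regex pattern (CAR.*?)^\) with the flags re.DOTALL and re.MULTILINE (s and m; findall already returns every match, like the g flag). The lazy .*? stops at the first ")" that starts a line; a greedy .* would run on to the last ")" in the file and merge all the records into one match.
Then we just have to replace the newlines in each match with spaces and write the matches to a file.
import re

with open("text.txt", 'r') as f:
    # ^ matches at the start of any line because of re.MULTILINE
    matches = re.findall(r"(CAR.*?)^\)", f.read(), re.DOTALL | re.MULTILINE)
with open("output.txt", 'w') as f:
    for match in matches:
        # split() breaks on every run of whitespace, including newlines
        f.write(" ".join(match.split()))
        f.write('\n')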
The output file looks like this:
CAR: *random info* *random info* - could be many lines of this
EDIT:
updated code to put newline between matches in output file
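Since the question also asks about avoiding regex, here is a sketch of a plain line-by-line approach: it starts collecting at the first line containing the start word and stops at a line that is only ")". It assumes the start word and the closing parenthesis are on different lines and, unlike the regex answer, keeps the ")" because the question says the end marker should be kept; extract_chunks is a hypothetical name:

```python
def extract_chunks(lines, start="CAR", end=")"):
    """Join each block from `start` up to a line that is just `end` into one line.

    Both markers are kept. A block that is never closed is dropped.
    """
    chunks = []
    current = None  # None means "not inside a block"
    for line in lines:
        stripped = line.strip()
        if current is None:
            pos = line.find(start)
            if pos != -1:
                # Begin a block at the start word, dropping anything before it
                current = [line[pos:].strip()]
        else:
            current.append(stripped)
            if stripped == end:
                chunks.append(" ".join(current))
                current = None
    return chunks
```

To write the result as in the question, something like: with open('text.txt') as fin, open('temp2.txt', 'w') as fout: fout.write('\n'.join(extract_chunks(fin)) + '\n').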

Python: How to insert tab and format text file

I'm bringing in a text file that is formatted like this:
hammer#9.95
saw#20.15
shovel#35.40
I need to bring it into Python and format it so that it lines up with the headers printed by an existing snippet of code:
# display header line for items list
print('{0: <10}'.format('Item'), '{0: >17}'.format('Cost'), sep = '' )
The goal is for the text file to be in line with existing headers like so:
Item                   Cost
hammer                $9.95
saw                  $20.15
shovel                $35.4
I can read the text file into Python and replace the # sign with a $ sign:
file = open('Invoice.txt', 'r')
file_contents = file.read()
new_file_contents = file_contents.replace('#', '$')
Which gives me this output:
hammer$9.95
saw$20.15
shovel$35.40
but I'm having trouble with the formatting aspect. Any suggestions?
You can do something like this:
with open('Invoice.txt', 'rt', encoding='utf-8') as infile:
    for line in infile:
        # 'hammer#9.95\n' -> item='hammer', cost='9.95'
        item, cost = line.strip().split('#')
        print("{:<6} {}".format(item, "$" + cost))
The only problem is that the columns will be misaligned if an item name is longer than "hammer". I suggest finding the longest item name first, then using its length as the width in place of the fixed 6 in {:<6}.
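Following that suggestion, a sketch that measures the longest item name and cost first and passes those lengths as nested width fields to str.format; the name format_invoice and the two spaces of padding are illustrative assumptions. If you would rather keep the existing header's fixed widths, use {:<10} and {:>17} for every row instead.

```python
def format_invoice(lines, sep="#"):
    """Return the header plus one aligned row per 'item#cost' line."""
    rows = [line.strip().split(sep) for line in lines if line.strip()]
    # Name column: longest item name (or the header) plus two spaces of padding
    name_w = max([len("Item")] + [len(item) for item, _ in rows]) + 2
    # Cost column: longest cost, counting the "$" that gets prepended
    cost_w = max([len("Cost")] + [len(cost) + 1 for _, cost in rows])
    # "{0:<{1}}" left-aligns field 0 in a width taken from field 1
    out = ["{0:<{1}}{2:>{3}}".format("Item", name_w, "Cost", cost_w)]
    for item, cost in rows:
        out.append("{0:<{1}}{2:>{3}}".format(item, name_w, "$" + cost, cost_w))
    return out
```

Usage: with open('Invoice.txt') as f: print("\n".join(format_invoice(f))).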

(Pig-Latin converter read from .txt file) How to split contents of file into lines AND word by word? Python

I'm writing a Pig Latin converter in Python that takes a txt file, translates it line by line, and outputs the result to another text file. It works properly except with multiple lines: the output needs to keep exactly the same line breaks as the input.
Code that splits the file on whitespace:
def getWords(vowels, file):
    listOfW = file.read().split()
    return listOfW
Text inside the Notepad file:
if beast student
away
Converted: ['ifway', 'eastbay', 'tudentsay', 'awayway']
current output: (should be on two lines)
ifway eastbay tudentsay awayway
What it should look like:
ifway eastbay tudentsay
awayway
getWords just builds a list of words, which I then convert with another function.
thanks for any help!
Assuming you already have such a function:
def convertToPigLatin(word):
    # your existing conversion; returns the converted string
    ...
You can open both your reading and writing file. Then iterate over each line of the in-file, split the line on whitespace, and convert each word in a list comprehension. Then you can write out that line to the out-file.
# infile and outfile are the input and output file paths
with open(infile, 'r') as fIn, open(outfile, 'w') as fOut:
    for line in fIn:
        convertedWords = [convertToPigLatin(word) for word in line.split()]
        fOut.write(' '.join(convertedWords) + '\n')
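For a self-contained check, a sketch with a hypothetical converter that reproduces the question's sample output (vowel-initial words get "way"; otherwise only the first letter moves to the end before "ay"). The real convertToPigLatin may use different rules; the point is that splitting each line separately keeps the original line breaks:

```python
VOWELS = "aeiouAEIOU"


def convert_to_pig_latin(word):
    """Hypothetical stand-in for convertToPigLatin, matching the sample output."""
    if word[0] in VOWELS:
        return word + "way"  # if -> ifway, away -> awayway
    return word[1:] + word[0] + "ay"  # beast -> eastbay, student -> tudentsay


def translate_lines(lines):
    """Translate each line on its own so the input's line structure survives."""
    out = []
    for line in lines:
        words = line.split()  # split per line, not on the whole file
        out.append(" ".join(convert_to_pig_latin(w) for w in words))
    return out
```

Writing "\n".join(translate_lines(fIn)) to the output file gives one translated line per input line.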
