highlighting and saving multiple words in a single string using python - python

I'm pretty new to programming and I've been having a problem: I have a txt file containing a single long string (30,000+ chars) made of 4 letters (a DNA sequence). I need to search that file for certain repeats (for example 'TTAGGG'), highlight them, and save the result as a simple readable file. Obviously I can't save it as a txt file because there is no highlight option.
I tried html as well as docx but every search I try removes the previous highlights.
Does anyone have any suggestions?

Open it in MS Word, and do the following find-and-replace:
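If you would rather stay in Python, here is a minimal sketch of one way to keep earlier highlights from being lost: search for all the repeats in a single pass and write the result as HTML. The function names (highlight_motifs, write_html) and the choice of the <mark> tag are my own assumptions, not something from the question.

```python
import re


def highlight_motifs(sequence, motifs):
    """Wrap every occurrence of any motif in <mark> tags, in one pass."""
    # Longest motifs first, so a longer motif wins over a shorter one
    # that starts at the same position.
    ordered = sorted(motifs, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered))
    return pattern.sub(lambda m: "<mark>" + m.group(0) + "</mark>", sequence)


def write_html(sequence, motifs, path):
    """Save the highlighted sequence as a small standalone HTML page."""
    body = highlight_motifs(sequence, motifs)
    page = (
        "<html><body><p style='font-family: monospace; word-wrap: break-word'>"
        + body
        + "</p></body></html>"
    )
    with open(path, "w") as out:
        out.write(page)


demo = highlight_motifs("ACTTAGGGTTAGGGCA", ["TTAGGG"])
print(demo)  # AC<mark>TTAGGG</mark><mark>TTAGGG</mark>CA
```

Because every motif is part of one combined pattern, nothing is searched twice and no later search can overwrite an earlier highlight. Note that re.sub only finds non-overlapping matches, so overlapping repeats would need a different approach.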

Related

How to save data to a file on separate items instead of one long string?

I am having trouble simply saving items into a file for later reading. When I save the file, instead of listing the items as single items, it appends the data together as one long string. According to my Google searches, this should not be appending the items.
What am I doing wrong?
Code:
with open('Ped.dta','w+') as p:
    p.write(str(recnum)) # Add record number to top of file
    for x in range(recnum):
        p.write(dte[x]) # Write date
        p.write(str(stp[x])) # Write Steps number
Since you do not show your data or your output I cannot be sure. But it seems you are trying to use the write method like the print function, but there are important differences.
Most important, write does not follow its written characters with any separator (like space by default for print) or end (like \n by default for print).
Therefore there is no space between your date and steps number, and no break between the lines, because you did not write them and Python did not add them.
So add those. Try the lines
p.write(dte[x]) # Write date
p.write(' ') # space separator
p.write(str(stp[x])) # Write Steps number
p.write('\n') # line terminator
Note that I do not know the format of your "date" that is written, so you may need to convert that to text before writing it.
Now that I have the time, I'll implement @abarnert's suggestion (in a comment) and show you how to get the advantages of the print function and still write to a file. Just use the file= parameter in Python 3, or in Python 2 after executing the statement
from __future__ import print_function
Using print you can do my four lines above in one line, since print automatically adds the space separator and newline end:
print(dte[x], str(stp[x]), file=p)
This does assume that your date datum dte[x] is to be printed as text.
Try adding a newline ('\n') character at the end of your lines, as shown in the docs. This should solve the problem of 'listing the items as single items', but the file you create may still not be well structured.
For further searching, you may want to look into serialization, as well as the json and csv formats, both covered by the Python standard library.
Your question would have benefited from a very small example of the recnum variable. Also, the original f.close() is not necessary, since you use a with statement.
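To illustrate the csv suggestion, here is a rough sketch that writes one record per line and reads the records back. The sample dates and step counts are made up, and the helper names (save_records, load_records) are hypothetical. An in-memory buffer stands in for the file so the example runs on its own.

```python
import csv
import io

# Hypothetical sample data standing in for dte and stp from the question.
dates = ["2016-01-01", "2016-01-02"]
steps = [5400, 7210]


def save_records(fileobj, dates, steps):
    """Write one 'date,steps' row per record."""
    writer = csv.writer(fileobj)
    for date, count in zip(dates, steps):
        writer.writerow([date, count])


def load_records(fileobj):
    """Read the rows back as (date, steps) tuples."""
    return [(date, int(count)) for date, count in csv.reader(fileobj)]


buffer = io.StringIO()  # stands in for a real file
save_records(buffer, dates, steps)
buffer.seek(0)
records = load_records(buffer)
print(records)  # [('2016-01-01', 5400), ('2016-01-02', 7210)]
```

With a real file you would open it with open(path, "w", newline="") for writing, as the csv module documentation recommends. The record count no longer needs to be stored at the top, since the number of rows gives it.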

Saving my output code to a separate file?

identify the individual words in a sentence and store them in a list
create a list of positions for words in that list
save these lists as a single file or as separate files.
How would I save my output to a separate file?
This is my code
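with open("filename.txt", "w") as file:
    # write() takes a single string, so join the positions instead of unpacking them
    file.write(" ".join(str(position[word]) for word in words))  # replaces your last line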
By the way, don't share your code as an image; use Markdown code formatting like above.
I recognise the question you are trying to answer from GCSE Computing (UK).
Aside from your request as to how to store the words in a file, you use a confusing way to save the position of the words.
Using position[word] = len(position) + 1 is saving the position of the word in the list of words, not the position in the original sentence. In fact, the number stored against the word in position is not used, since you map the word against the words list to get the position.
Writing to a file is not difficult; a search of Stack Overflow will turn up plenty of examples.
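For what it's worth, the three steps of the task could be sketched roughly as below. The sample sentence, the output file name, and the choice to number each word by its first appearance are my assumptions, since the original code was not shown as text.

```python
import os
import tempfile

# Hypothetical sample sentence.
sentence = "ask not what your country can do for you ask what you can do"

# Step 1: the individual words, and each distinct word in order of first appearance.
words = sentence.split()
unique_words = []
for word in words:
    if word not in unique_words:
        unique_words.append(word)

# Step 2: for every word in the sentence, its 1-based position in unique_words.
positions = [unique_words.index(word) + 1 for word in words]

# Step 3: save both lists to a single file, one list per line.
path = os.path.join(tempfile.gettempdir(), "words_and_positions.txt")
with open(path, "w") as out:
    out.write(" ".join(unique_words) + "\n")
    out.write(" ".join(str(p) for p in positions) + "\n")

print(positions)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 3, 9, 6, 7]
```

The file goes to the system temp directory here only to keep the sketch harmless to run; any path works.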

Can i format txt files from within python

Now please don't get me wrong on this, but I'm just curious whether I can take a text file, find out how many lines within it have been written on, and then use that number to print selective data from every few lines. Also, could I use Python to find specific words within the text file that are evenly spaced apart? For example, if everything in the text file was written like this:
name:> Ben
Score:> 2
name:> Ethan
Score:> 8
name:> James
Score:> 0
would it be possible for me to search the text file for the string 'name:>' (and then save whatever comes in front of it, if possible, to a variable)? Or, seeing as they're all equally spaced, could I save the specific score of one person to a variable with their name (as everything in front would be equally spaced), without having to open the txt file at all?
If all of this sounds completely impossible, or if any of you have received any vague idea as to what I'm talking about (in which case I'm in awe of your abilities of comprehension from this badly worded example), please give me any thoughts or ideas on how to format text files to create variables.
If all the above seems too complex, could someone please just tell me whether it's possible to analyse how many lines within a text file have been written on? From there I've got a vague idea of how to create my program.
You can use a regular expression (RE) to search the text file as a string, find where the value you want to change is located, and then write it back.
https://docs.python.org/2/library/re.html
To do what you are asking, I would personally use the built-in re module, as follows:
import re

with open("foo.txt", "r") as foo:
    contents = foo.read()

# re.search returns None when nothing matches, so check before calling .group()
match = re.search("foo-bar", contents)
if match:
    print(match.group())
That should do what you are looking for.
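Applied to the name/score layout from the question, a sketch could look like this. The text is inlined so the example runs by itself; in practice it would come from reading the file (the file does have to be opened to get at its contents). The exact pattern is my assumption about the layout, based only on the six sample lines.

```python
import re

# Sample data copied from the question; normally: text = open("foo.txt").read()
text = """name:> Ben
Score:> 2
name:> Ethan
Score:> 8
name:> James
Score:> 0
"""

# Count the lines that actually have something written on them.
line_count = sum(1 for line in text.splitlines() if line.strip())

# Capture each name together with the score on the following line.
pairs = re.findall(r"name:>\s*(\w+)\s*\n\s*Score:>\s*(\d+)", text)
scores = {name: int(score) for name, score in pairs}

print(line_count)       # 6
print(scores["Ethan"])  # 8
```

A dictionary keyed by name is usually easier to work with than creating one variable per person.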

Python - Dividing a book in PDF form into individual text files that correspond with page numbers

I've converted my PDF file into a long string using PDFminer.
I'm wondering how I should go about dividing this string into smaller, individual strings/pages. Each page is divided by a certain series of characters (CRLF, FF, page number etc), and the string should be split and appended to a new text file according to these characters occurring.
I have no experience with regex, but is using the re module the best way to go about this?
My vague idea for implementation is that I have to iterate through the file using the re.search function, creating text files with each new form feed found. The only code I have is PDF > text conversion. Can anyone point me in the right direction?
Edit: I think the expression I should use is something like ^.*(?=(\d\n\n\d\n\n\f\bFavela\b)) (capture everything before the 2 digits, the line breaks and the book's title 'Favela', which appears at the top of each page).
Can I save these \d digits as variables? I want to use them as file names, as I iterate through the book and scoop up the portions of text divided by each appearance of \f\Favela.
I'm thinking the re.sub method would do it, looping through and replacing with an empty string as I go.
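One possible direction, sketched under assumptions: split the long string at each form feed, then look for the page number at the end of each chunk. I am assuming that every page ends with its number alone on a line, followed by blank lines and the form feed, which is only my reading of the layout described above. The function names and the page_N.txt naming are hypothetical.

```python
import os
import re

# Assumed layout: the page number sits alone on the last non-blank line of a page.
PAGE_NUMBER = re.compile(r"^(\d+)\s*\Z", flags=re.MULTILINE)


def split_pages(text):
    """Return {page_number: page_text} for a string with form-feed separated pages."""
    pages = {}
    for chunk in text.split("\f"):
        match = PAGE_NUMBER.search(chunk)
        if match is None:
            continue  # no trailing page number found, skip this chunk
        pages[int(match.group(1))] = chunk[:match.start()].strip()
    return pages


def write_pages(pages, directory):
    """Write each page to its own file, named after the page number."""
    for number, body in pages.items():
        with open(os.path.join(directory, "page_%d.txt" % number), "w") as out:
            out.write(body)


sample = "Favela\nFirst page text.\n\n12\n\n\fFavela\nSecond page text.\n\n13\n\n\f"
pages = split_pages(sample)
print(sorted(pages))  # [12, 13]
```

This uses str.split plus one small pattern rather than a single large lookahead expression, which tends to be easier to debug. The captured group is what answers the "can I save these digits as variables" part.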

Python: How to read text between two empty lines into a string

I'm a beginner at programming and Python, and I'm writing a script to do stuff with .srt subtitle files. My problem is that I don't know how to: read through a file, and analyze text first between the beginning of the text and the first empty line and then between that empty line and the next empty line till the end of the file ("analyze" by e.g. calculate the length of a part of it, convert another part to numbers etc.).
You can read about the .srt format specification and see an example here (type: Plain); there's an empty line at the end of the file. I want to compare the display time/duration of each subtitle against the number of characters in it. Starting from the beginning of the file, each subtitle (with its number, duration info and text) is separated from the next one by an empty line (a "\n", I can find them with sth like if "\n" in line and len(line) == 2:). The time codes always contain a "-->" and always end in three digits, so if I have that in a string, I can figure out where it is. The problem is, I need to somehow do the following:
Read the subtitle text, which can be 1-3 lines with line breaks, calculate its character length.
Read the duration, convert to duration in seconds.
Read the line number (to be able to output it somewhere with my results, e.g. "duration of line 44 is 4.54 s").
I can do the second easily, but I'm not sure how to go over the whole file and tell Python: find the end of each subtitle's text, calculate the length of characters in each line, add that, read the duration, divide these, output this with the line number, and do the same with the next subtitle until you reach the end of the file. If it was one subtitle, I could do it easily, but I'm not sure how to do that check on a single one and then seek the next one. I've been looking for 2 hours for this and can't find anything like that.
Regular Expressions can be a powerful tool to help solve this type of processing.
You can use a regular expression to match or parse a single record, or apply it against the entire file.
If you don't know about Regex in python, I highly recommend you do some tutorials on the topic... and that should give you plenty of ideas how it can be applied to your problem.
There are many great references on the topic, but here is just one: http://www.diveintopython.net/regular_expressions/
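As a starting point, here is a sketch of how the blank-line separation could be handled: split the whole text into blocks at empty lines, then treat each block as one subtitle. It assumes the usual three-part block (number, time code line, then one or more text lines) and three-digit milliseconds, as described in the question. The function names and the sample subtitles are made up.

```python
import re

TIMECODE = re.compile(
    r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)"
)


def to_seconds(hours, minutes, seconds, millis):
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(text):
    """Return a list of (subtitle_number, duration_in_seconds, character_count)."""
    results = []
    # Subtitles are separated by one or more empty lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # not a complete subtitle block
        match = TIMECODE.search(lines[1])
        if match is None:
            continue
        parts = match.groups()
        duration = to_seconds(*parts[4:]) - to_seconds(*parts[:4])
        chars = sum(len(line) for line in lines[2:])
        results.append((int(lines[0]), round(duration, 3), chars))
    return results


sample = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,000
Two lines
of text.
"""

for number, duration, chars in parse_srt(sample):
    print("duration of line %d is %.2f s for %d characters" % (number, duration, chars))
```

Splitting into blocks first means the "do the same with the next subtitle" part is just the for loop; each block can then be handled as if it were the only one.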
