Now please don't get me wrong on this, but I'm just curious whether I can take a text file, find out how many lines in it have been written on, and then use that number to print selected data from every few lines. Also, could I use Python to find specific words in the text file that are evenly spaced apart? For example, if everything in the text file was written like this:
name:> Ben
Score:> 2
name:> Ethan
Score:> 8
name:> James
Score:> 0
would it be possible for me to search the text file for the string 'name:>' (and then save whatever follows it to a variable, if possible)? Or, seeing as the entries are all equally spaced, could I save one person's score to a variable alongside their name, without having to open the txt file at all?
If all of this sounds completely impossible, or if any of you have even a vague idea of what I'm talking about (in which case I'm in awe of your powers of comprehension, given this badly worded example), please give me any thoughts or ideas on how to format text files to create variables.
If all of the above seems too complex, could someone please just tell me whether it's possible to work out how many lines in a text file have been written on? From there I've got a vague idea of how to create my program.
You can use a regular expression (RE) to search the contents of the text file as a string, locate the value you want to change, and then write the updated text back to the file.
https://docs.python.org/2/library/re.html
To do what you are asking, I would personally use the built-in re module, as follows:
import re

with open("foo.txt", "r") as foo:
    contents = foo.read()

# re.search returns None when nothing matches, so check before calling .group()
match = re.search("foo-bar", contents)
if match:
    print(match.group())
That should do what you are looking for.
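Applied to the name/score layout from the question, a minimal sketch might look like the following. The helper names count_written_lines and parse_scores are made up for illustration, and the sample text stands in for the result of reading the file (the file does have to be opened and read at some point to get its contents). It assumes every 'name:>' line is immediately followed by its 'Score:>' line.

```python
import re

def count_written_lines(text):
    # Count only the lines that contain something other than whitespace.
    return sum(1 for line in text.splitlines() if line.strip())

def parse_scores(text):
    # Capture the text after 'name:>' and the digits after 'Score:>' on the next line.
    pattern = re.compile(r"name:>\s*(.+?)\s*\n\s*Score:>\s*(\d+)")
    return {name: int(score) for name, score in pattern.findall(text)}

# Stand-in for: with open("scores.txt") as f: sample = f.read()
sample = "name:> Ben\nScore:> 2\nname:> Ethan\nScore:> 8\nname:> James\nScore:> 0\n"

print(count_written_lines(sample))
scores = parse_scores(sample)
print(scores["Ethan"])
```

Keeping the scores in a dictionary keyed by name is usually easier than creating one variable per person, because the names are only known once the file has been read.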
I've been stuck on this problem:
I am given a list of text files (e.g. 'ZchEuxJ9VJ.txt', 'QAih70niIq.txt') which each contain a randomly generated paragraph. I am to write a parser that will extract the individual words from each of the files listed in the folder. My end goal is to create a dictionary with keys that represent the individual words found in my files. The values are supposed to be a list of files that contain that word. For example, if I print ('happy:', search['happy']), the files that contain the word happy should be added as values to my dictionary. If the word is a "new" word I would have to set it up for the first time. If the word is not a "new" word I would have to update the list associated with that word. I also have to make sure that I don't add the same filename to the same word twice.
I've already created a new dictionary called search, visited each of the files and opened them for reading, then isolated each word using the split() method. The thing I am struggling with is how to "find" a word in a particular file and mark down which file a word can be found in. I've tried "searching" for a word in a file then adding the file to the dictionary, but that gets me an error message.
Instead of searching for words in files, you should be going about it the other way around. You know you need to index every word in every file eventually, so why not just go through every word in every file, in order? In pseudocode it might look something like this.
for each file:
    for each word in file:
        if not word in search:
            search[word] = []
        if not file in search[word]:
            search[word].append(file)
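In Python, the loop above could be sketched roughly as follows. The function name build_index, the ".txt" filter, and the two demo files are assumptions for illustration; the real folder and file contents will differ.

```python
import os
import tempfile

def build_index(folder):
    search = {}
    # sorted() only makes the order of filenames in each list predictable.
    for fname in sorted(os.listdir(folder)):
        if not fname.endswith(".txt"):
            continue
        with open(os.path.join(folder, fname), encoding="utf-8") as handle:
            words = handle.read().split()
        for word in words:
            if word not in search:
                search[word] = []          # first time this word is seen
            if fname not in search[word]:
                search[word].append(fname)  # never list a file twice
    return search

# Demo with two throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for fname, text in {"a.txt": "happy dog happy", "b.txt": "happy cat"}.items():
        with open(os.path.join(folder, fname), "w", encoding="utf-8") as handle:
            handle.write(text)
    search = build_index(folder)

print("happy:", search["happy"])  # happy: ['a.txt', 'b.txt']
```

Note that split() leaves punctuation attached to words ("happy." and "happy" count as different keys), so some cleanup may be needed depending on the paragraphs.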
This is homework, so I'm going to help you with the algorithm, not the code. You seem to have figured out most of the problem. The only thing you need help with is how to actually populate the dictionary.
Open the file (say fname), read its contents
Split the contents to separate each word
Iterate over each word. Say we call it fword.
    Is fword in the dictionary?
        No? Create the key with an empty list as the value
        Yes? Do nothing and move on
    Now you know that fword is a key in the search dictionary, and its value is a list. (Say we call this list fwlist)
    You also know that fword was found in the file fname
    Check if fname is already in fwlist
        No? Add fname to fwlist.
        Yes? Don't add it again. Do nothing.
Now, there are optimizations you can make, such as using a set instead of a list. This way you don't need to check if fname already exists in fwlist, because sets automatically discard duplicates, but this should be enough for what you need.
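For reference, the set-based variant might be sketched like this. The function name build_index_with_sets is hypothetical, and it takes a dictionary of filename to already-read contents to keep the example short; the file contents shown are invented.

```python
from collections import defaultdict

def build_index_with_sets(texts):
    # defaultdict(set) creates an empty set the first time a word is looked up,
    # and set.add() silently ignores a filename that is already present.
    search = defaultdict(set)
    for fname, text in texts.items():
        for fword in text.split():
            search[fword].add(fname)
    return search

texts = {
    "ZchEuxJ9VJ.txt": "happy birds sing happy songs",
    "QAih70niIq.txt": "happy days",
}
search = build_index_with_sets(texts)
print("happy:", sorted(search["happy"]))
```

Sets have no fixed order, which is why the example sorts before printing.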
Remember: before you start writing the program, it's helpful to sit down, think about the problem you're trying to solve, and plan out how you're going to attack the problem. Drawing a flowchart helps immensely when you're a novice programmer because it helps you organize your thoughts and figure out how your program is supposed to flow.
Debugging is also a crucial skill -- your code is useless if you can't fix errors. See "How to debug small programs" and "What is a debugger and how can it help me diagnose problems?"
I'm pretty new to programming and I've been having a problem: I have a txt file containing one long string (30,000+ chars) made of 4 letters (a DNA sequence), and I need to search that file for certain repeats (for example 'TTAGGG'), highlight them, and save the result as a simple readable file. Obviously I can't save it as a txt file, because there is no highlight option.
I tried HTML as well as docx, but every search I run removes the previous highlights.
Does anyone have any suggestions?
Open it in MS-Word, and do the following Find-and-replace:
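A scripted alternative, separate from the Word route, is to build the HTML in a single pass so that no search can overwrite an earlier highlight. This is only a sketch: the function name highlight_repeats and the output filename are made up, it assumes the sequence contains nothing but letters (so no HTML escaping is needed), and it highlights non-overlapping matches only.

```python
import re

def highlight_repeats(sequence, repeats):
    # Combine all repeats into one alternation, longest first, so every
    # repeat is wrapped in the same pass over the sequence.
    ordered = sorted(repeats, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(r) for r in ordered))
    body = pattern.sub(lambda m: "<mark>" + m.group(0) + "</mark>", sequence)
    return ('<html><body><p style="font-family: monospace; '
            'word-wrap: break-word">' + body + "</p></body></html>")

page = highlight_repeats("AATTAGGGCCTTAGGG", ["TTAGGG"])
print(page)

# To save it: with open("highlighted.html", "w") as out: out.write(page)
```

The resulting file opens in any browser, with each repeat shown in the browser's default highlight color.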
I'm currently working on my first Sublime Text 3 plugin. The idea is that it scans a certain line for a pattern. I already found the view.find() function, but as far as I can tell it scans the whole document.
The final aim is to convert a line with several patterns to a new line with contents from the previous line.
My input would be something like
Hello.MyNameIs("Paul", male)
and the output shall be something like:
MyNameIs = Paul
My idea is to use the find function to find the text within the quotes.
result = self.view.find(<pattern>, line.begin())
The question currently is: what kind of pattern do I need to store the name Paul in result?
find(pattern, fromPosition, flags)
You can find your text starting from a position, so you have to first determine the position at the beginning of the line.
You can use lines() or split_by_newlines() to divide the view into lines, and begin() will give you the start position of each line.
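As for the pattern itself, here is a sketch using Python's own re module rather than the Sublime API. As far as I know, view.find() returns a region rather than capture groups, so this assumes the line's text has already been pulled out as a plain string (for example with view.substr() on the line's region). CALL_PATTERN and convert_line are hypothetical names.

```python
import re

# A dot, the method name, an opening parenthesis, then the first quoted argument.
CALL_PATTERN = re.compile(r'\.(\w+)\(\s*"([^"]*)"')

def convert_line(line):
    match = CALL_PATTERN.search(line)
    if match is None:
        return None  # the line does not look like Something.Method("text", ...)
    return "{} = {}".format(match.group(1), match.group(2))

print(convert_line('Hello.MyNameIs("Paul", male)'))  # MyNameIs = Paul
```

Group 1 holds the method name and group 2 the text between the quotes, so both halves of the output line come from one match.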
I'm using the Win32Com module to automatically edit Word documents with Python, but I'm facing an annoying problem that you've probably seen before. I use the module's Find and Replace function to insert paragraphs into a template that I have, but sometimes I'd like to insert several paragraphs at once, separated by a line break. The Python string for these paragraphs looks something like this: text = "First paragraph.\nSecond paragraph."
The problem is that when I use the Find and Replace function with that kind of string, it doesn't produce a line break but something like First paragraph Second paragraph, which is obviously not what I want.
Does anyone have an idea of how to deal with that?
Thanks for the help, guys!
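One thing worth trying: Word's Find and Replace uses its own special codes in the replacement text, where ^p stands for a paragraph mark and ^^ for a literal caret. A small sketch of converting the Python string before handing it to the replace call (the helper name to_word_replace_text is made up, and the Win32Com call itself is left out because it depends on how the document is opened):

```python
def to_word_replace_text(text):
    # Normalize Windows line endings first, so "\r\n" becomes a single break.
    text = text.replace("\r\n", "\n")
    # Escape literal carets before adding Word's own caret codes.
    text = text.replace("^", "^^")
    # ^p is Word's replacement code for a paragraph mark.
    return text.replace("\n", "^p")

text = "First paragraph.\nSecond paragraph."
print(to_word_replace_text(text))  # First paragraph.^pSecond paragraph.
```

If a line break inside the same paragraph is wanted instead of a new paragraph, ^l is the corresponding code.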
I've converted my PDF file into a long string using PDFminer.
I'm wondering how I should go about dividing this string into smaller, individual strings/pages. Each page is divided by a certain series of characters (CRLF, FF, page number etc), and the string should be split and appended to a new text file according to these characters occurring.
I have no experience with regex, but is using the re module the best way to go about this?
My vague idea for implementation is that I have to iterate through the file using the re.search function, creating text files with each new form feed found. The only code I have is PDF > text conversion. Can anyone point me in the right direction?
Edit: I think the expression I should use is something like ^.*(?=(\d\n\n\d\n\n\f\bFavela\b)) (capture everything before 2 digits, the line breaks and the book's title 'Favela', which appears at the top of each page).
Can I save these \d digits as variables? I want to use them as file names as I iterate through the book and scoop up the portions of text divided by each appearance of \f\bFavela.
I'm thinking the re.sub method would do it, looping through and replacing with an empty string as I go.
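For what it's worth, a capture group plus re.finditer can do this without re.sub. The sketch below assumes a simplified delimiter (a page number on its own line, a blank line, then a form feed and the title 'Favela'), which will need adjusting to the real PDFminer output; PAGE_BREAK and split_pages are hypothetical names.

```python
import re

# Assumed delimiter: page number, blank line, form feed, running title.
PAGE_BREAK = re.compile(r"(\d+)\n\n\fFavela\n")

def split_pages(text):
    pages = {}
    start = 0
    for match in PAGE_BREAK.finditer(text):
        # group(1) is the captured page number; the page body is everything
        # between the previous delimiter and this one.
        pages[match.group(1)] = text[start:match.start()]
        start = match.end()
    return pages, text[start:]

book = "First page text.\n1\n\n\fFavela\nSecond page text.\n2\n\n\fFavela\nTrailing"
pages, rest = split_pages(book)
for number, body in pages.items():
    print(number, repr(body))
    # To save: with open(number + ".txt", "w") as out: out.write(body)
```

Each captured number is an ordinary string, so it can be used directly as a file name.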