I'm currently working on my first Sublime Text 3 plugin. The idea is that it scans a certain line for a pattern. I already found the view.find() function, but as far as I can tell it scans the whole document.
The final aim is to convert a line containing several patterns into a new line built from the contents of the previous line.
My input would be something like
Hello.MyNameIs("Paul", male)
and the output shall be something like:
MyNameIs = Paul
My idea is to use the find function to find the text within the quotes.
result = self.view.find(<pattern>, line.begin())
The question currently is: what kind of pattern do I need to store the name Paul in result?
find(pattern, fromPosition, flags)
You can find your text starting from a position, so you have to first determine the position at the beginning of the line.
You can use lines() or split_by_newlines() to divide the view into lines, and begin() will give you the start position of each line.
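As for the pattern itself, here is a minimal sketch in plain Python, assuming the input always has the shape shown in the question. The name convert_line and the regular expression are illustrative assumptions, not part of the Sublime Text API; inside a plugin the text of the line would first have to be read from the view, for example with view.substr(view.line(point)).

```python
import re

# Hypothetical pattern: a dot, a method name, an opening parenthesis,
# then the first double-quoted argument.
CALL_PATTERN = re.compile(r'\.(\w+)\("([^"]*)"')

def convert_line(line):
    """Return 'Method = value' for the first match, or None if nothing matches."""
    match = CALL_PATTERN.search(line)
    if match is None:
        return None
    method, value = match.groups()
    return "{} = {}".format(method, value)

print(convert_line('Hello.MyNameIs("Paul", male)'))
```

The group ([^"]*) captures everything between the first pair of double quotes, which is the part the question wants to keep.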
Related
I'm trying to remove a couple of lines from a text file that I imported from my Kindle. The text looks like:
Shall I come to you?
Nicholls David, One Day, loc. 876-876
Dexter looked up at the window of the flat where Emma used to live.
Nicholls David, One Day, loc. 883-884
I want to grab the bin bag and do a forensics
Sophie Kinsella, I've Got Your Number, loc. 64-64
The complete file is longer; this is just a piece of the document. The aim of my code is to remove all lines where "loc. " is written so that just the extracts remain. My target can also be seen as removing the line that comes just before each blank line.
My code so far looks like this:
f = open('clippings_export.txt','r', encoding='utf-8')
message = f.read()
line=message[0:400]
f.close()
key=["l","o","c","."," "]
for i in range(0,len(line)-5):
    if line[i]==key[0]:
        if line[i+1]==key[1]:
            if line[i + 2]==key[2]:
                if line[i + 3]==key[3]:
                    if line[i + 4]==key[4]:
                        print(i)  # index at which "loc. " starts
The last if finds exactly the position (index) of each "loc. " in the file. However, after this stage I do not know how to go back in the line so that the code finds where the line starts and the line can be completely removed. What could I do next? Would you recommend another way to remove this line?
Thanks in advance!
I think that the question might be a bit misleading!
Anyway, if you simply want to remove those lines, you need to check whether they contain the "loc." substring. Probably the easiest way is to use the in operator.
Instead of getting the whole file from the read() function, read the file line by line (using the readlines() function, for example). You can then check whether each line contains your key and omit it if it does.
Since the result is now a list of strings, you might want to merge it with str.join().
Here I used another list to store the desired lines; you can also use the "more pythonic" filter() or a list comprehension (example in the similar question I mention below).
f = open('clippings_export.txt','r', encoding='utf-8')
lines = f.readlines()
f.close()
filtered_lines = []
for line in lines:
    if "loc." in line:
        continue
    else:
        filtered_lines.append(line)
result = ""
result = result.join(filtered_lines)
By the way, I thought it might be a duplicate: here's a question about the opposite (that is, wanting the lines which contain the key).
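The list-comprehension variant mentioned above could look like the following sketch. The helper name strip_location_lines and the inline sample text are assumptions made for illustration; it takes the text as a string instead of reading clippings_export.txt so that it can be run on its own.

```python
def strip_location_lines(text, key="loc. "):
    """Drop every line that contains the key and return the remaining text."""
    kept = [line for line in text.splitlines(keepends=True) if key not in line]
    return "".join(kept)

sample = (
    "Shall I come to you?\n"
    "Nicholls David, One Day, loc. 876-876\n"
    "\n"
    "Dexter looked up at the window of the flat where Emma used to live.\n"
    "Nicholls David, One Day, loc. 883-884\n"
)
print(strip_location_lines(sample))
```

Keeping the line endings (keepends=True) means the joined result preserves the blank lines that separate the extracts.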
Now please don't get me wrong on this, but I'm just curious whether I can take a text file, find out how many lines within it have been written on, and then use that number to print selective data from every few lines. Also, could I use Python to find specific words within the text file that are evenly spaced apart? For example, if everything in the text file was written like this:
name:> Ben
Score:> 2
name:> Ethan
Score:> 8
name:> James
Score:> 0
Would it be possible for me to search the text file for the string 'name:>' (and then save whatever comes in front of it, if possible, to a variable)? Or, seeing as they're all equally spaced, could I save the specific score of one person to a variable with their name (as everything in front would be equally spaced), without having to open the txt file at all?
If all of this sounds completely impossible, or if any of you have a vague idea of what I'm talking about (in which case I'm in awe of your abilities of comprehension, given this badly worded example), please give me any thoughts or ideas on how to format text files to create variables.
If all the above seems too complex, could someone please just tell me whether it's possible to find out how many lines within a text file have been written on? From there I've got a vague idea of how to create my program.
You can use a regular expression (RE) to search the text file as a string, find where the existing value you want to change is located in the text file, and then write it.
https://docs.python.org/2/library/re.html
To do what you are asking, I would personally use the built-in re module, as follows:
import re
with open("foo.txt", "r") as foo:
    contents = foo.read()
match = re.search("foo-bar", contents)
# re.search returns None when the pattern is absent, so check before calling group()
if match:
    print(match.group())
That should do what you are looking for.
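Applied to the name/score layout from the question, a sketch might look like this. The function names parse_scores and count_written_lines and the exact regular expression are assumptions for illustration; the sample string stands in for the contents of the text file, which in practice still has to be opened and read.

```python
import re

def parse_scores(text):
    """Map each name to its score, pairing every 'name:>' line with the 'Score:>' line after it."""
    pairs = re.findall(r"name:>\s*(\S+)\s*\n\s*Score:>\s*(\d+)", text)
    return {name: int(score) for name, score in pairs}

def count_written_lines(text):
    """Count the lines that contain something other than whitespace."""
    return sum(1 for line in text.splitlines() if line.strip())

sample = "name:> Ben\nScore:> 2\nname:> Ethan\nScore:> 8\nname:> James\nScore:> 0\n"
scores = parse_scores(sample)
print(scores["Ethan"])
print(count_written_lines(sample))
```

A dictionary keyed by name is used here instead of one variable per person, so a score can be looked up without knowing the names in advance.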
I'm a beginner at programming and Python, and I'm writing a script to do stuff with .srt subtitle files. My problem is that I don't know how to read through a file and analyze the text first between the beginning of the file and the first empty line, then between that empty line and the next empty line, and so on until the end of the file ("analyze" meaning e.g. calculating the length of one part of it, converting another part to numbers, etc.).
You can read about the .srt format specification and see an example here (type: Plain); there's an empty line at the end of the file. I want to compare the display time/duration of each subtitle against the number of characters in it. Starting from the beginning of the file, each subtitle (with its number, duration info and text) is separated from the next one by an empty line (a "\n", I can find them with something like if "\n" in line and len(line) == 2:). The time codes always contain a "-->" and always end in three digits, so if I have that in a string, I can figure out where it is. The problem is, I need to somehow do the following:
Read the subtitle text, which can be 1-3 lines with line breaks, calculate its character length.
Read the duration, convert to duration in seconds.
Read the line number (to be able to output it somewhere with my results, e.g. "duration of line 44 is 4.54 s").
I can do the second easily, but I'm not sure how to go over the whole file and tell Python: find the end of each subtitle's text, calculate the length of characters in each line, add that, read the duration, divide these, output this with the line number, and do the same with the next subtitle until you reach the end of the file. If it was one subtitle, I could do it easily, but I'm not sure how to do that check on a single one and then seek the next one. I've been looking for 2 hours for this and can't find anything like that.
Regular Expressions can be a powerful tool to help solve this type of processing.
You can use a regular expression to match or parse a single record, or apply it to the entire file.
If you don't know about regex in Python, I highly recommend you do some tutorials on the topic... that should give you plenty of ideas on how it can be applied to your problem.
There are many great references on the topic, but here is just one: http://www.diveintopython.net/regular_expressions/
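To make the suggestion concrete, here is a minimal sketch that splits the file contents on blank lines and analyzes each block. The names to_seconds and analyze, the sample subtitles, and the assumption that every block is "number, time code line, one or more text lines" are mine, not taken from the question.

```python
import re

TIME_RE = re.compile(r"(\d+):(\d+):(\d+),(\d+)")

def to_seconds(stamp):
    """Convert an 'HH:MM:SS,mmm' time code to seconds."""
    hours, minutes, seconds, millis = map(int, TIME_RE.match(stamp).groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0

def analyze(srt_text):
    """Yield (number, duration_in_seconds, character_count) for each subtitle."""
    # Subtitles are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a subtitle
        start, end = (part.strip() for part in lines[1].split("-->"))
        chars = sum(len(line) for line in lines[2:])
        yield int(lines[0]), to_seconds(end) - to_seconds(start), chars

sample = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,000
Two lines
of text.
"""
for number, duration, chars in analyze(sample):
    print("duration of line {} is {:.2f} s for {} characters".format(number, duration, chars))
```

Working block by block avoids having to "seek" the next subtitle by hand: each iteration of the loop already holds exactly one subtitle.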
I'm new to Python and am trying to find a way to write to a file based on two conditions of a text file:
Out of a given text, one of the lines must match my search exactly. Position and value is always the same.
If condition one above is met and a value of X (to be defined \ can change) is also present in the text at a known location, print both the matching text from condition one and the value X together with the 10 immediately preceding characters, which never change.
So from the text given in another example I saw on this site:
textInput = """\
I'm trying to have my program grab every fifth word from a text file and
place it in a single string. For instance, if I typed "Everyone likes to
eat pie because it tastes so good plus it comes in many varieties such
as blueberry strawberry and lime" then the program should print out
"Everyone because plus varieties and." I must start with the very first
word and grab every fifth word after. I'm confused on how to do this.
Below is my code, everything runs fine except the last 5 lines."""
From this example, I would like to write to a file the following but only if both are present:
"place it in a single string. For instance, if I typed "Everyone likes to"
and
"blueberry strawberry and lime".
The word lime may change to an unknown, varying value.
What it comes down to is that I have a bunch of log files I'm going through. If an IP address is present at a particular location in the file, I want that IP (which is unknown) and the 10 preceding characters, along with a string of text that is always present a few lines up from the IP. Both of these are to be written to a file.
I figured out how to open \ close files and write entries etc. to a new file for a particular found phrase, but I am having problems sending entries to a file only if a specific combination of two or more conditions is met.
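I think the best approach would be to read a log file, then use regular expressions to find all IP addresses in your log.
import re
# No ^/$ anchors: with them the pattern only matches when the whole text is a single IP address
ipPattern = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b")
ip = ipPattern.findall(yourLogFile)  # yourLogFile holds the log contents as a string
Then you would use os to loop through each file in your folder of IPs already on file: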
import os
ipAlreadyOnFile = []
for root, dirs, files in os.walk(r'C:\yourDirectory'):
    for file in files:
        ipAlreadyOnFile.append(file)
then you could find the differences between the two lists:
newIp = list(set(ip) - set(ipAlreadyOnFile))
Now your newIp list contains nothing but new IP addresses, which you can either add to your directory or do something else with.
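For the two-condition check described in the question, a sketch could look like the following. Everything here is an assumption made for illustration: the function name find_marker_and_ip, the max_gap parameter (how many lines below the fixed text to look), the marker string, and the sample log.

```python
import re

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def find_marker_and_ip(log_text, marker, max_gap=5):
    """Return (marker_line, context) pairs where an IP follows the marker within max_gap lines.

    context is the IP address together with up to 10 characters preceding it.
    """
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if line.strip() != marker:      # condition one: exact line match
            continue
        for later in lines[index + 1:index + 1 + max_gap]:
            match = IP_RE.search(later)
            if match:                   # condition two: an IP a few lines below
                start = max(0, match.start() - 10)
                hits.append((line.strip(), later[start:match.end()]))
                break
    return hits

sample_log = """\
session opened for user admin
some unrelated line
remote addr: 192.168.0.17 connected
session opened for user admin
nothing to see here
"""
for marker_line, context in find_marker_and_ip(sample_log, "session opened for user admin"):
    print(marker_line)
    print(context)
```

Only pairs where both conditions hold end up in the returned list, so writing them to a file afterwards is a plain loop over the result.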
I have a bunch of PDF files that I have to search against a set of keywords. I have to extract the exact line where the keyword was found. I first used xpdf's pdf2text to convert each file to text. (I tried Solr but had a tough time tailoring the output/schema to suit my requirement.)
import sys
file_name = sys.argv[1]
searched_string = sys.argv[2]
result = [(line_number+1, line) for line_number, line in enumerate(open(file_name)) if searched_string.lower() in line.lower()]
#print result
for each in result:
    print each[0], each[1]
ThinkCode:~$ python find_string.py sample.txt "String Extraction"
The problem I have with this is that it fails in cases where the search string is broken across the end of a line:
If you are going to index large binary files, remember to change the
size limits. String
Extraction is a common problem
If I am searching for 'String Extraction', I will miss this keyword if I use the code presented above. What is the most efficient way of achieving this without making 2 copies of the text file (one for searching the keyword to extract the line number, and the other for removing line breaks and finding the keyword, to cover the case where the keyword spans 2 lines)?
Much appreciated guys!
Note: Some considerations without any code, but I think they belong to an answer rather than to a comment.
My idea would be to search only for the first keyword; if a match is found, search for the second. If the match is found at the end of a line, this lets you take the next line into consideration, and you do line concatenation only if a match is found in the first place*.
Edit:
Coded a simple example and ended up using a different algorithm; the basic idea behind it is this code snippet:
import re

def iterwords(fh):
    # Yield a (line_number, word) tuple for every word in the file
    for number, line in enumerate(fh):
        for word in re.split(r'\s+', line.strip()):
            yield number, word
It iterates over the file handle and produces a (line_number, word) tuple for each word in the file.
The matching afterwards becomes pretty easy; you can find my implementation as a gist on github. It can be run as follows:
python search.py 'multi word search string' file.txt
There is one main concern with the linked code; I didn't code a workaround, for both performance and complexity reasons. Can you figure it out? (Spoiler: try to search for a sentence whose first word appears two times in a row in the file)
* I didn't perform any testing on my own, but this article and the Python wiki suggest that string concatenation is not that efficient in Python (I don't know how current that information is).
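One way the matching step on top of iterwords could be written is sketched below. This is not the gist referred to above: find_phrase and its sliding window of recent words are my own assumptions, iterwords is repeated (with empty words skipped) so that the block runs on its own, and words are compared case-insensitively with any attached punctuation left in place.

```python
import io
import re

def iterwords(fh):
    """Yield a (line_number, word) tuple for every word in the file."""
    for number, line in enumerate(fh):
        for word in re.split(r'\s+', line.strip()):
            if word:
                yield number, word

def find_phrase(fh, phrase):
    """Return the 1-based line numbers on which a match of phrase starts."""
    target = phrase.lower().split()
    if not target:
        return []
    window = []   # the most recent len(target) words with their line numbers
    found = []
    for number, word in iterwords(fh):
        window.append((number, word.lower()))
        if len(window) > len(target):
            window.pop(0)
        if [w for _, w in window] == target:
            found.append(window[0][0] + 1)
    return found

text = (
    "If you are going to index large binary files, remember to change the\n"
    "size limits. String\n"
    "Extraction is a common problem\n"
)
print(find_phrase(io.StringIO(text), "String Extraction"))
```

Because the words are streamed one at a time, only the last few words are held in memory and no second copy of the file is needed.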
There may be a better way of doing it, but my suggestion would be to start by taking in two lines (let's call them line1 and line2), concatenating them into line3 or something similar, and then search that resultant line.
Then you'd assign line2 to line1, get a new line2, and repeat the process.
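The two-line window described above can be sketched as follows. The function name find_across_lines is hypothetical, the lines are joined with a single space, and the comparison is case-insensitive as in the question's code; a phrase that spans more than two lines would not be found by this approach.

```python
import io

def find_across_lines(fh, phrase):
    """Return 1-based line numbers where phrase starts, allowing one line break inside it."""
    phrase = phrase.lower()
    hits = []
    line1 = fh.readline()
    number = 1
    while line1:
        line2 = fh.readline()
        first = line1.strip().lower()
        joined = (first + " " + line2.strip().lower()) if line2 else first
        position = joined.find(phrase)
        # Report the hit only if it starts inside line1; a match lying wholly in
        # line2 is picked up on the next iteration.
        if 0 <= position < len(first):
            hits.append(number)
        line1, number = line2, number + 1
    return hits

text = (
    "If you are going to index large binary files, remember to change the\n"
    "size limits. String\n"
    "Extraction is a common problem\n"
)
print(find_across_lines(io.StringIO(text), "String Extraction"))
```

Only two lines are concatenated at any time, so the cost of string concatenation stays proportional to the line length and not to the file size.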
Use the flag re.MULTILINE when compiling your expressions: http://docs.python.org/library/re.html#re.MULTILINE
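Then use \s to represent all white space (including new lines). Note that \s matches newlines with or without the flag; re.MULTILINE only changes where ^ and $ match.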
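A sketch of the \s idea: build the pattern from the words of the search string, run it over the whole text, and count the newlines before each match to get the line number. The function name find_with_regex and the use of re.IGNORECASE are assumptions for illustration.

```python
import re

def find_with_regex(text, phrase):
    """Return 1-based line numbers where phrase starts, treating any whitespace run as a space."""
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        return []
    # Allow any whitespace, including a line break, between consecutive words.
    pattern = re.compile(r"\s+".join(words), re.IGNORECASE)
    # Count the newlines before each match to recover its line number.
    return [text.count("\n", 0, match.start()) + 1 for match in pattern.finditer(text)]

text = (
    "If you are going to index large binary files, remember to change the\n"
    "size limits. String\n"
    "Extraction is a common problem\n"
)
print(find_with_regex(text, "String Extraction"))
```

This reads the file contents into memory once and searches that single string, so no second, line-break-free copy is required.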