Issues using .startswith() for a specific location in a string - python

I have a text file, which has many lines of data in it. I need to check each line of this text file and process the data contained within the line accordingly (i.e. save to a separate, tabulated .txt for analysis)
The text file is in the following format:
Number 1 or 0 (denoting relevance of data)
An ID for each line (referring to what the data is)
The data itself (contained in rest of line)
So this is what two example lines may look like:
1 ID:K-95 list of data
0 ID:D-56 list of other data
Such that the first line had relevant data to ID K-95 and the second had irrelevant data to ID D-56.
I want to parse the text file, and sort the data contained within each line based on the relevance (0 or 1) and the data ID. I.e. save each line with the same ID in order of relevance (first all the lines with 1 and then with 0). Lines can have the same ID, but different data. Lines are also always of a fixed length.
To do this I came up with:
idtag = input('Enter ID:')
with open("example.txt", 'r') as f:
    for line in f.readlines():
        if line.startswith('1') and line.startswith(idtag, 5, 3):
            print line
I'm having trouble with this, specifically with the second condition after the and operator. I can print/select lines based on whether there is a 0 or 1, no problem. However, using the .startswith() method with a defined position seems to return nothing: no error, no printing. It simply executes and prints nothing.
Any ideas? Maybe a better way of parsing this data to meet my objective?

For str.startswith, start and end are both interpreted as absolute positions in the string (specifically: end is not a length relative to start):
str.startswith(prefix[, start[, end]])
Return True if string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test string beginning at that position. With optional end, stop comparing string at that position.
So instead of
line.startswith(idtag, 5, 3)
you need to use
line.startswith(idtag, 5, 5+4)
The two parameters are equivalent to slicing notation:
line[5: 5+4].startswith(idtag)
For example:
>>> a = 'abcdefg'
>>> a.startswith('c', 2, 1)
False
>>> a[2:1]
''
>>> a.startswith('c', 2)
True
>>> a[2:]
'cdefg'
>>> a.startswith('c', 2, 3)
True
>>> a[2:3]
'c'
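As a quick check against the sample line from the question, here is a minimal sketch. The helper name has_id_at and the default start position of 5 are assumptions for illustration (5 is where the ID begins in a line such as '1 ID:K-95 ...'); the end position is computed from the length of the tag instead of being hard-coded.

```python
def has_id_at(line, idtag, start=5):
    """Return True if idtag occupies line[start:start + len(idtag)]."""
    # end is an absolute position, so it has to be start + len(idtag)
    return line.startswith(idtag, start, start + len(idtag))


sample = "1 ID:K-95 list of data"
print(has_id_at(sample, "K-95"))        # the tag sits at position 5
print(sample.startswith("K-95", 5, 3))  # end < start: nothing is compared
```

Computing the end from len(idtag) keeps the check correct if the IDs ever change length.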

I realise there's already an answer, but as an alternative you could also just check if idtag exists in the line:
idtag = input('Enter ID:')
with open("example.txt", 'r') as f:
    for line in f.readlines():
        if line.startswith('1') and idtag in line:
            print(line)
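One caveat with a plain `in` test is that it also matches when the tag happens to appear in the data part of the line. A possible way to address the original goal (all lines for one ID, relevant ones first) is sketched below. The function name lines_for_id is hypothetical, and the sketch assumes the fields are separated by whitespace as in the two example lines.

```python
def lines_for_id(lines, idtag):
    """Return the lines whose ID field equals idtag, relevance 1 before 0."""
    matches = []
    for line in lines:
        fields = line.split(None, 2)  # relevance flag, 'ID:...', rest of line
        if len(fields) >= 2 and fields[1] == 'ID:' + idtag:
            matches.append(line)
    # The sort is stable, so lines with the same flag keep their file order.
    return sorted(matches, key=lambda l: l[0], reverse=True)


example = [
    "1 ID:K-95 list of data",
    "0 ID:D-56 list of other data",
    "0 ID:K-95 more data",
    "1 ID:K-95 even more data",
]
for line in lines_for_id(example, "K-95"):
    print(line)
```

Comparing the whole ID field avoids partial matches such as K-9 matching K-95.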

Related

Reading an nth line of a textfile in python determined from a list

I have a function gen_rand_index that generates a random group of numbers in list format, such as [3,1] or [3,2,1]
I also have a text file that reads something like this:
red $1
green $5
blue $6
How do I write a function so that once python generates this list of numbers, it automatically reads that # line in the text file? So if it generated [2,1], instead of printing [2,1] I would get "green $5, red $1" aka the second line in the text file and the first line in the text file?
I know that you can do print(line[2]) and commands like that, but this won't work in my case because each time I am getting a different random number of a line that I want to read, it is not a set line I want to read each time.
row = str(result[gen_rand_index]) #result[gen_rand_index] gives me the random list of numbers
file = open("Foodinventory.txt", 'r')
for line in file:
    print(line[row])
file.close()
I have this so far, but I am getting this
error: invalid literal for int() with base 10: '[4, 1]'
I also have gotten
TypeError: string indices must be integers
I have tried replacing str with int and many things like that, but I think the way I'm approaching this is just wrong. Can anyone help me? (I have only been coding for a couple of days now, so I apologize in advance if this question is really basic.)
Okay, let us first get some things out of the way.
Whenever you access something from a list, the thing you put inside the square brackets [] should be an integer, e.g. [5]. This tells Python that you want the element at index 5. It cannot be ["5"], because "5" is a string.
Therefore the line row = str(result[gen_rand_index]) should actually just be row = ... without the call to str. That call is what turns the index into a string and triggers the TypeError.
Secondly, as per your description, gen_rand_index returns a list of numbers.
So going by that, why don't you try this:
indices_to_pull = gen_rand_index()
with open("Foodinventory.txt", 'r') as file_handle:
    file_contents = file_handle.readlines() # If the file is small and simple this works fine
answer = []
for index in indices_to_pull:
    answer.append(file_contents[index-1])
Explanation
We get the indices of the file lines from gen_rand_index.
We read the entire file into memory using readlines().
Then we get the lines we want. Remember to subtract 1, as the list is indexed from 0.
The error you are getting is because you're trying to index a string variable (line) with a string index (row). Presumably row will contain something like '[2,3,1]'.
However, even if row was a numerical index, you're not indexing what you think you're indexing. The variable line is a string, and it contains (on any given iteration) one line of the file. Indexing this variable will give you a single character. For example, if line contains green $5, then line[2] will yield 'e'.
It looks like your intent is to index into a list of strings, which represent all the lines of the file.
If your file is not overly large, you can read the entire file into a list of lines, and then just index that array:
with open('file.txt') as fp:
    lines = fp.readlines()
print(lines[2])
In this case, lines[2] will yield the string 'blue $6\n'.
To discard the trailing newline, use lines[2].strip() instead.
I'll go line by line and raise some issues.
row = str(result[gen_rand_index]) #result[gen_rand_index] gives me the random list of numbers
Are you sure it is gen_rand_index and not gen_rand_index()? If gen_rand_index is a function, you should call the function. In the code you have, you are not calling the function, instead you are using the function directly as an index.
file = open("Foodinventory.txt", 'r')
for line in file:
    print(line[row])
file.close()
The correct python idiom for opening a file and reading line by line is
with open("Foodinventory.txt", "r") as f:
    for line in f:
        ...
This way you do not have to close the file; the with clause does this for you automatically.
Now, what you want to do is to print the lines of the file that correspond to the elements in your variable row. So what you need is an if statement that checks if the line number you just read from the file corresponds to the line number in your array row.
with open("Foodinventory.txt", "r") as f:
    for i, line in enumerate(f):
        if i == row[i]:
            print(line)
But this is wrong: it would work only if your list's elements are ordered. That is not the case in your question. So let's think a little bit. You could iterate over your file multiple times, and each time you iterate over it, print out one line. But this will be inefficient: it will take time O(nm) where n==len(row) and m == number of lines in your file.
A better solution is to read all the lines of the file and save them to an array, then print the corresponding indices from this array:
arr = []
with open("Foodinventory.txt", "r") as f:
    arr = list(f)
for i in row:
    print(arr[i - 1]) # lists are zero-indexed
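Putting the read-everything-then-index idea into one small helper might look like the sketch below. The name pick_lines is hypothetical, the indices are assumed to be 1-based as in the question, and the in-memory list stands in for what readlines() would return for the sample file.

```python
def pick_lines(lines, indices):
    """Return the lines at the given 1-based positions, in the order requested."""
    return [lines[i - 1].rstrip('\n') for i in indices]


# Stand-in for: with open("Foodinventory.txt") as f: inventory = f.readlines()
inventory = ["red $1\n", "green $5\n", "blue $6\n"]
print(", ".join(pick_lines(inventory, [2, 1])))
```

Note that an index of 0 would silently wrap around to the last line, so it may be worth validating the indices if they can ever be 0.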

Having problems with strings and arrays

I want to read a text file and copy text that is in between '~~~~~~~~~~~~~' into an array. However, I'm new in Python and this is as far as I got:
with open("textfile.txt", "r",encoding='utf8') as f:
    searchlines = f.readlines()
a=[0]
b=0
for i,line in enumerate(searchlines):
    if '~~~~~~~~~~~~~' in line:
        b=b+1
    if '~~~~~~~~~~~~~' not in line:
        if 's1mb4d' in line:
            break
        a.insert(b,line)
This is what I envisioned:
First I read all the lines of the text file,
then I declare 'a' as an array in which text should be added,
then I declare 'b' because I need it as an index. The number of lines in between the '~~~~~~~~~~~~~' is not even, that's why I use 'b' so I can put lines of text into one array index until a new '~~~~~~~~~~~~~' was found.
I check for '~~~~~~~~~~~~~', if found I increase 'b' so I can start adding lines of text into a new array index.
The text file ends with 's1mb4d', so once its found, the program ends.
And if '~~~~~~~~~~~~~' is not found in the line, I add text to the array.
But things didn't go well. Only 1 line of the entire text between those '~~~~~~~~~~~~~' is being copied to each array index.
Here is an example of the text file:
~~~~~~~~~~~~~
Text123asdasd
asdasdjfjfjf
~~~~~~~~~~~~~
123abc
321bca
gjjgfkk
~~~~~~~~~~~~~
You could use a regular expression. Give this a try:
import re
input_text = ['Text123asdasd asdasdjfjfjf','~~~~~~~~~~~~~','123abc 321bca gjjgfkk','~~~~~~~~~~~~~']
a = []
for line in input_text:
    my_text = re.findall(r'[^\~]+', line)
    if len(my_text) != 0:
        a.append(my_text)
It goes through the input line by line and collects every run of characters other than '~'. A line that consists only of '~' yields no match and is ignored; every line with text is appended to your list a.
And just because we can, a one-liner (excluding the import and the source, of course):
import re
lines = ['Text123asdasd asdasdjfjfjf','~~~~~~~~~~~~~','123abc 321bca gjjgfkk','~~~~~~~~~~~~~']
a = [re.findall(r'[^\~]+', line) for line in lines if len(re.findall(r'[^\~]+', line)) != 0]
In Python the solution to a large share of problems is to find the right function in the standard library. Here you should try using split instead; it should be much easier.
If I understand your goal correctly, you can do it like this:
joined_lines = ''.join(searchlines)
result = joined_lines.split('~~~~~~~~~~~~~')
The first line joins your list of lines into a single string, and the second one cuts that big string every time it encounters the '~~~~~~~~~~~~~' separator.
I tried to clean it up to the best of my knowledge, try this and let me know if it works. We can work together on this!:)
with open("textfile.txt", "r",encoding='utf8') as f:
    searchlines = f.readlines()
a = []
currentline = ''
for line in searchlines:
    if '~~~~~~~~~~~~~' in line:
        if currentline:  # skip the empty block before the first separator
            a.append(currentline)
        currentline = ''  # start collecting the next block
    elif 's1mb4d' in line:
        break
    else:
        currentline += line
Some notes:
You can use elif for your break condition.
append adds each finished block to the end of the list, so no index variable is needed.
currentline keeps accumulating text from each line as long as the line doesn't contain 's1mb4d' or the ~~~ separator, which I think is what you want.
import re
s = ['']
with open('path\\to\\sample.txt') as f:
    for l in f:
        a = l.strip().split("\n")
        s += a
a = []
for line in s:
    my_text = re.findall(r'[^\~]+', line)
    if len(my_text) != 0:
        a.append(my_text)
print(a)
>>> [['Text123asdasd asdasdjfjfjf'], ['123abc 321bca gjjgfkk']]
If you're willing to impose/accept the constraint that the separator should be exactly 13 ~ characters (actually '\n%s\n' % ( '~' * 13) to be specific) ...
then you could accomplish this for relatively normal sized files using just
#!/usr/bin/python
## (Should be #!/usr/bin/env python; but StackOverflow's syntax highlighter?)
separator = '\n%s\n' % ('~' * 13)
with open('somefile.txt') as f:
    results = f.read().split(separator)
# Use your results, a list of the strings separated by these separators.
Note that '~' * 13 is a way, in Python, of constructing a string by repeating some smaller string thirteen times. 'xx%sxx' % 'YY' is a way to "interpolate" one string into another. Of course you could just paste the thirteen ~ characters into your source code ... but I would consider constructing the string as shown to make it clear that the length is part of the string's specification --- that this is part of your file format requirements ... and that any other number of ~ characters won't be sufficient.
If you really want any line of any number of ~ characters to serve as a separator, then you'll want to use the split() function from the regular expressions module (re) rather than the .split() method provided by the built-in string objects.
Note that this snippet of code will return all of the text between your separator lines, including any newlines they include. There are other snippets of code which can filter those out. For example given our previous results:
# ... refine results by filtering out newlines (replacing them with spaces)
results = [' '.join(each.split('\n')) for each in results]
(You could also use the .replace() string method, but I prefer the join/split combination.) In this case we're using a list comprehension (a feature of Python) to iterate over each item in our results (which we're arbitrarily naming each), performing our transformation on it, and binding the resulting list back to the name results. I highly recommend learning and getting comfortable with list comprehensions if you're going to learn Python. They're commonly used and can be a bit exotic compared to the syntax of many other programming and scripting languages.
This should work on MS Windows as well as Unix (and Unix-like) systems because of how Python handles "universal newlines." To use these examples under Python 3 you might have to work a little on the encodings and string types. (I didn't need to for my Python3.6 installed under MacOS X using Homebrew ... but just be forewarned).
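The regular-expression variant mentioned above (any line made up only of ~ characters acts as a separator) could be sketched roughly as follows. The function name split_blocks is hypothetical, and the sample string is built from the example file in the question; this is one way to do it under those assumptions, not the only one.

```python
import re


def split_blocks(text):
    """Split text on lines consisting only of '~' and drop empty blocks."""
    # ^~+$ matches a whole line of one or more '~'; \n? also eats its newline.
    parts = re.split(r'^~+$\n?', text, flags=re.MULTILINE)
    return [part for part in parts if part.strip()]


sample = (
    "~~~~~~~~~~~~~\n"
    "Text123asdasd\n"
    "asdasdjfjfjf\n"
    "~~~~~~~~~~~~~\n"
    "123abc\n"
    "321bca\n"
    "gjjgfkk\n"
    "~~~~~~~~~~~~~\n"
)
for block in split_blocks(sample):
    print(repr(block))
```

Each block keeps its internal newlines, so the join/split refinement shown earlier can still be applied to the result.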

Python: Extract single line from file

Very new, please be nice and explain slowly and clearly. Thanks :)
I've tried searching how to extract a single line in python, but all the responses seem much more complicated (and confusing) than what I'm looking for. I have a file, it has a lot of lines, I want to pull out just the line that starts with #.
My file.txt:
"##STUFF"
"##STUFF"
#DATA 01 02 03 04 05
More lines here
More lines here
More lines here
My attempt at a script:
file = open("file.txt", "r")
splitdata = []
for line in file:
    if line.startswith['#'] = data
    splitdata = data.split()
    print splitdata
#expected output:
#splitdata = [#DATA, 1, 2, 3, 4, 5]
The error I get:
line.startswith['#'] = data
TypeError: 'builtin_function_or_method' object does not support item assignment
That seems to mean it doesn't like my "= data", but I'm not sure how to tell it that I want to take the line that starts with # and save it separately.
Correct the if statement and the indentation:
for line in file:
    if line.startswith('#'):
        print(line)
Although you're relatively new, you should start learning to use list comprehension, here is an example on how you can use it for your situation. I explained the details in the comments and the comments are matched to the corresponding order.
splitdata = [line.split() for line in file if line.startswith('#')]
# defines splitdata as a list because the comprehension is wrapped in []
# makes a for loop to iterate through file
# checks if the line "startswith" a '#'
# note: you should call functions/methods using () not []
# splits the line at whitespace if the if statement returns True
An if statement expects a condition (a predicate), not an assignment:
if line.startswith('#'):
startswith(...)
    S.startswith(prefix[, start[, end]]) -> bool
    Return True if S starts with the specified prefix, False otherwise.
    With optional start, test S beginning at that position.
    With optional end, stop comparing S at that position.
    prefix can also be a tuple of strings to try.
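To save the matching line separately, as the question asks, a minimal sketch could return the split fields of the first line that starts with '#'. The function name first_hash_line is hypothetical, and the list below stands in for the lines of file.txt.

```python
def first_hash_line(lines):
    """Return the whitespace-split fields of the first line starting with '#'."""
    for line in lines:
        if line.startswith('#'):
            return line.split()
    return None  # no such line found


# Stand-in for iterating over open("file.txt")
sample_lines = [
    '"##STUFF"\n',
    '"##STUFF"\n',
    '#DATA 01 02 03 04 05\n',
    'More lines here\n',
]
splitdata = first_hash_line(sample_lines)
print(splitdata)
```

The fields come back as strings ('01', '02', ...); converting them with int() would be a separate step if numbers are needed.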

How do I take a number from a text file and replace it with that number +1?

I want to take a number, in my case 0, and add 1, then replace it back into the file. This is what I have so far:
def caseNumber():
    caseNumber = open('caseNumber.txt', "r")
    lastCase = caseNumber.read().splitlines()[0]
    Case = []
    Case.append(lastCase)
    newCase = [ int(x)+1 for x in Case ]
    with open('caseNumber.txt', mode = 'a',
              encoding = 'utf-8') as my_file:
        my_file.write('{}'.format(newCase))
    print('Thankyou, your case number is {}, Write it down!'.format(newCase))
After this is run, this is what ends up in the file: 0000[1] (the number in the file was 0000 to start off with, but [1] was appended to it).
Basically, the part I am stuck on is adding 1 to the number without the brackets.
newCase is a list, which gets printed with its values enclosed in brackets. If you just want the value in the list to get written to the file, you'll need to say that.
You don't need to create a list comprehension since you only need 1 item.
Since you're converting list to string you get the list representation: with brackets.
Note that this is not the only problem: you're appending to your text file (mode 'a'), so you don't replace the number. You have to rewrite the file from scratch, and for that you have to keep the full file contents from the first read. My proposal:
with open("file.txt") as f:
    number,_,rest = f.read().partition(" ") # split left value only
number = str(int(number)+1) # increment and convert back to string
with open('file.txt',"w") as f:
    f.write(" ".join([number,rest])) # write back file fully
So if the file contains:
0 this is a file
hello
each time you run the code above, the leading number is incremented, but the trailing text is kept:
1 this is a file
hello
and so on...
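For the case in the question, where the file holds only a zero-padded number such as 0000, a minimal sketch might look like this. The function names are hypothetical, and keeping the original width with zfill is an assumption about the desired format.

```python
def next_case_number(text):
    """Return the number in text plus one, zero-padded to its original width."""
    digits = text.strip()
    return str(int(digits) + 1).zfill(len(digits))


def bump_case_file(path):
    """Read the number stored in path, overwrite it with number + 1, return it."""
    with open(path, encoding='utf-8') as f:
        current = f.read()
    new_number = next_case_number(current)
    # Mode 'w' truncates the file, so the old number is replaced, not appended to.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(new_number)
    return new_number


print(next_case_number("0000"))
```

Calling bump_case_file('caseNumber.txt') would then return a plain string, which can be passed to format() without any brackets appearing.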

How to take individual input in python?

Calculator Language is what the problem is called and I am to code it in python. The coding part is done but I am having trouble while reading the input file.
So the input file looks like this :
A = B = 4
C = (D = 2)*_2
#
What I would like to do is read each character, line by line (each line is an expression and has to be calculated), with characters as characters and integers as integers, since I push them onto stacks. There are two stacks: one for the characters and numbers, and the other for the operators.
Anyway this is what I have done with the input so far :
#!/usr/bin/python
a = open("testinput1.txt","r+")
wordList = [line.strip() for line in a];
print wordList[1]
And what i get is :
C = (D = 2)*_2
Also the end of file is reached when the file reader hits #.
Any sort of help or suggestions are welcome.
wordList is a list of lines; each element is one line, stripped of its trailing '\n'.
You should split each line to get its tokens.
Then for each token, check whether it is a string or an integer (using isdigit, for example).
Now that wordList[0] contains your first statement: in Python every string can be indexed directly, without creating a separate list for it.
For example, if wordList[0] contains "c=a+b", wordList[0][0] will directly give you 'c'.
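The tokenizing step described above can be sketched as follows. The names tokenize and read_expressions are hypothetical, and the token rules are assumptions based only on the sample input: variable names are single letters, numbers are runs of digits, and every other non-space character (including _) is kept as a one-character symbol for the caller to interpret.

```python
import re

# A run of digits, a single letter, or any other single non-space character.
TOKEN = re.compile(r'\d+|[A-Za-z]|[^\sA-Za-z\d]')


def tokenize(line):
    """Split one expression into tokens, converting digit runs to int."""
    return [int(tok) if tok.isdecimal() else tok for tok in TOKEN.findall(line)]


def read_expressions(lines):
    """Tokenize each non-empty line until a line containing only '#'."""
    expressions = []
    for line in lines:
        line = line.strip()
        if line == '#':
            break
        if line:
            expressions.append(tokenize(line))
    return expressions


# Stand-in for the lines of testinput1.txt
source = ["A = B = 4\n", "C = (D = 2)*_2\n", "#\n"]
for tokens in read_expressions(source):
    print(tokens)
```

With the tokens already typed as str or int, pushing them onto the operand and operator stacks becomes a matter of checking isinstance(token, int) or comparing against the known operator symbols.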
