Python: extract parts from a line when the delimiters differ

I am reading stdin line by line:
for line in sys.stdin:
    ...
Each line has the following format:
: 1631373881:0;echo
I need to extract the first number (the epoch time) and the command (the last part, after ';').
How can I extract these when the delimiter is not the same?

input_str = ": 1631373881:0;echo".split(";", 1)  # split on the first ';' only, so the command may contain ';'
command = input_str[1]
number = input_str[0].split(":")[1].strip()
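A minimal sketch of the same split-based approach wrapped in reusable helpers, assuming every well-formed line follows the ": <epoch>:<number>;<command>" layout shown in the question. It uses str.partition, which splits at the first ';' only, so a command such as "ls; pwd" stays intact, and it skips lines that do not fit the layout instead of raising. The names parse_history_line and parse_stream are hypothetical.

```python
def parse_history_line(line):
    """Split ': 1631373881:0;echo' into (1631373881, 'echo'), or return None.

    Hypothetical helper; assumes the ': <epoch>:<number>;<command>' layout.
    """
    # partition() splits at the first ';' only, so the command may contain ';'.
    meta, sep, command = line.rstrip("\n").partition(";")
    fields = meta.split(":")  # ': 1631373881:0' -> ['', ' 1631373881', '0']
    if not sep or len(fields) < 3 or not fields[1].strip().isdigit():
        return None  # the line does not follow the expected layout
    return int(fields[1].strip()), command


def parse_stream(lines):
    """Yield (epoch, command) for each well-formed line, e.g. parse_stream(sys.stdin)."""
    for line in lines:
        parsed = parse_history_line(line)
        if parsed is not None:
            yield parsed
```

For example, list(parse_stream([": 1631373881:0;echo\n"])) gives [(1631373881, 'echo')]; in the original script you would loop over parse_stream(sys.stdin) instead of sys.stdin.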

If you know the lines always have the same format, you can rely on a regular expression:
import re

MASK = re.compile(r': (\d+):\d+;(.+)')

def extract(line):
    matches = MASK.findall(line)
    return matches[0] if matches else None

def test():
    assert extract(": 1631373881:0;echo test") == ("1631373881", "echo test")

Related

What is the error in this code? I want to fill in the blanks of a word using words from a text file

I want to fill in the blanks of a word using words from a text file. For example,
given the line The Language Is _th_n !
it should return python, filling each _ from the words in a file such as text.txt.
I wrote this code; please check it.
import re

with open('data/text','r', encoding='utf8') as file:
    word_list = file.read()

def solve(message):
    hint = []
    for i in range(15,len(message) - 1):
        if message[i] != '\\':
            hint.append(message[i])
    hint_string = ''
    for i in hint:
        hint_string += i
    hint_replaced = hint_string.replace('_', '!')
    solution = re.findall('^'+hint_replaced+'$', word_list, re.MULTILINE)
    return solution

Print a line only if the regex match is surrounded by whitespace in Python

I have a file data.txt:
<tag,1>moon sunlightcream potato</tag>
<tag,2>dishes light jellybeans</tag>
and a python file match.py:
for LINE in open("data.txt"):
    STRING = "light"
    if STRING in LINE:
        print(LINE)
The output is:
<tag,1>moon sunlightcream potato</tag>
<tag,2>dishes light jellybeans</tag>
I want only:
dishes light jellybeans
How can I do that?
The larger context is:
TAG = QUERY.split("&")[1]
LIST = []
for LINE in open(DATA):
    STRING = "<tag,"
    if STRING in LINE:
        if TAG in LINE:
            print(LINE)
So it seems I can't just search for " light " (with the spaces), because "light" comes from a variable.
The regex option I tried was:
import re
def sub_list():
    TAG = "light"
    p_number = re.compile(r'<tag,.*,' + TAG + ',.*,>')
    for LINE in open(DATA):
        match = p_number.findall(LINE)
        if match:
            print(LINE)
But that doesn't help either.
But now it works with:
import re
TAG = "light"
for LINE in open(DATA):
    STRING = "<tag,"
    if STRING in LINE:
        if re.search(r'\b{}\b'.format(TAG), LINE):
            print(LINE)
You can use a regex as below. \b matches a word boundary, that is, only at the beginning or end of a word, so it won't match light when it is part of a longer word such as sunlightcream.
import re
LINES = ['moon sunlightcream potato', 'dishes light jellybeans']
match_tag = 'light'
for LINE in LINES:
    # you could also use re.search(r'\b' + match_tag + r'\b', LINE)
    if re.search(r'\b{}\b'.format(match_tag), LINE):
        print(LINE)
# only prints 'dishes light jellybeans'
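The \b answer treats any non-word character as a boundary, so light would still match in a line containing light-weight. If the requirement really is "surrounded by whitespace", as the title says, lookarounds express that directly. The sketch below also passes the variable through re.escape, so characters such as '.' or '+' in the tag are matched literally, and it returns only the text between the tags, which is the desired output. The helper name lines_with_word is hypothetical, and the <tag,N>...</tag> layout is assumed from data.txt.

```python
import re

def lines_with_word(lines, word):
    """Yield the tag content of each line in which word is a whitespace-delimited token.

    Hypothetical helper; assumes lines look like '<tag,N>text</tag>'.
    """
    # (?<!\S) and (?!\S) require whitespace or the edge of the string on each side;
    # re.escape stops characters in word from being read as regex syntax.
    word_re = re.compile(r"(?<!\S)" + re.escape(word) + r"(?!\S)")
    tag_re = re.compile(r"<tag,\d+>(.*?)</tag>")
    for line in lines:
        m = tag_re.search(line)
        if m and word_re.search(m.group(1)):
            yield m.group(1)
```

Used as for text in lines_with_word(open("data.txt"), TAG): print(text), it prints only dishes light jellybeans for the sample file.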

How do I remove the last newline character in a text in Python?

I have a program that decodes a Caesar cipher, and a few text files with multiple lines to decode.
According to my lecturer's code checker there is always a blank line after the text, but I don't see anything when I run the code myself.
Removing the last character only removes the last letter or number in the text and not the newline.
Here's my code:
import sys
import string
import collections

ciphertext_lines = sys.stdin.readlines()
ciphertext = ''
for i in ciphertext_lines:
    ciphertext += i

alphanum = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

def getShiftVal():
    string = ''
    for line in ciphertext:
        string = string + line
    most_common_letter = ((collections.Counter(string).most_common(2)[1])[0])
    shift_val = (alphanum.index(most_common_letter) - 4)
    return shift_val

def decrypt(ciphertext, n):
    alphabet_numbers = collections.deque(string.ascii_uppercase + string.digits)
    alphanum = ''.join(list(alphabet_numbers))
    alphabet_numbers.rotate(n)
    alphanum_rotated = ''.join(list(alphabet_numbers))
    return ciphertext.translate(str.maketrans(alphanum, alphanum_rotated))

def main():
    n = getShiftVal()
    decrypted = decrypt(ciphertext, n)
    print(decrypted)

if __name__ == '__main__':
    main()
print by default adds a newline after its output. Under Python 2, use print decrypted, (note the trailing comma) to suppress the trailing newline. Under Python 3, use print(decrypted, end=''). Or, alternatively, you can just use sys.stdout.write(decrypted) to write the output without any formatting.
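If the input file ends with a newline, sys.stdin.readlines() keeps it, translate() leaves it alone, and so decrypted already ends in '\n' before print() adds its own; the two together produce the blank line the checker sees. The sketch below captures print() output in an io.StringIO to make the trailing characters visible. The helper render and the sample text are made up for illustration.

```python
import io

def render(text, **print_kwargs):
    """Return exactly what print(text, **print_kwargs) would write."""
    buf = io.StringIO()
    print(text, file=buf, **print_kwargs)
    return buf.getvalue()

# Made-up decrypted text; like input read with readlines(), it ends with '\n'.
decrypted = "HELLO WORLD\n"

print(repr(render(decrypted)))               # 'HELLO WORLD\n\n' -> visible blank line
print(repr(render(decrypted, end="")))       # 'HELLO WORLD\n'   -> text written unchanged
print(repr(render(decrypted.rstrip("\n"))))  # 'HELLO WORLD\n'   -> exactly one newline
```

Either end='' or stripping the input's trailing newline first leaves a single newline at the end of the output.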

How do I return (not print) all matching lines as a str? The input is a long string separated by tabs and newlines

def Parser(string):
    string = string.split('\n')
    import re
    for line in string:
        line = re.search(r"\S+\t+(\S+\t+)\S+\t+\S+\t+(\S+)\t+\S+", line)
        return line.group(1)+line.group(2)
This is the code I was looking for, and I finally got it working. Thanks for the hints.
def Parser(string):
    string = string.split('\n')
    firstline = string.pop(0)
    import re
    matches = ''
    for line in string:
        line = re.search(r"\S+\t+(\S+\t+)\S+\t+\S+\t+(\S+)\t+\S+", line)
        if line:
            match = line.group(1) + line.group(2)+'\n'
            matches += match
    return matches
Assuming the rest of your code, including the regex, is correct:
def Parser(string):
    string = string.split('\n')
    import re
    matches = []
    for line in string:
        line = re.search(r"\S+\t+(\S+\t+)\S+\t+\S+\t+(\S+)\t+\S+", line)
        if line:  # re.search returns None for lines the regex does not match
            match = line.group(1) + line.group(2)
            matches.append(match)  # append, not extend: extend would add each character separately
    return matches
Consider using a parser for your input. Python comes with the csv module:
import csv
def Parser(string):
    output = []
    for fields in csv.reader(string.split('\n'), 'excel-tab'):
        if len(fields) >= 6:
            output.append(fields[1] + '\t' + fields[4])
    return output
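The csv answer returns a list, while the question asks for a single str. A minimal sketch that joins the selected fields with '\n', assuming (like the asker's final version, which pops firstline) that the first line is a header and that columns are separated by single tabs. The lowercase name parser is hypothetical; unlike the regex version, it does not add a newline after the last row.

```python
import csv

def parser(text):
    """Return fields 2 and 5 of each data row, one row per line, as a single str."""
    # Skip the first line, assumed to be a header (the asker pops it as firstline).
    rows = csv.reader(text.splitlines()[1:], dialect="excel-tab")
    # Rows with fewer than six fields, including blank lines (csv yields []), are skipped.
    return "\n".join(f[1] + "\t" + f[4] for f in rows if len(f) >= 6)
```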

Parsing blast output in .xml format

I have BLAST output in .xml format. I won't post an example here because it is huge, unless you really need it, so I'll go straight to my question. The script below works. The only thing is that I want to print the ID at the start of hit_def, which varies in length and is followed by a space. How do I modify the code to print it? If I specify [:8], it prints 8 characters, but the length might be 10, 15, etc. How can I improve this?
import re
import sys
base = sys.argv[1]
base = base.rstrip('xml')
if fasta_out == True:
    seq_out = open(base+'fasta', 'w')
read_def = set()
with open(sys.argv[1],'rb') as xml:
    for line in xml:
        if re.search('<Iteration_query-def>', line) != None:
            line = line.strip()
            line = line.rstrip()
            line = re.sub('<Iteration_query-def>', '', line)
            line = re.sub('</Iteration_query-def>', '', line)
            query_def = line
        if re.search('<Hit_def>', line) != None:
            line = line.strip()
            line = line.rstrip()
            line = re.sub('<Hit_def>', '', line)
            line = re.sub('</Hit_def>', '', line)
            hit_def = line[:8]
            if fasta_out == True:
                print >> seq_out, query_def+'\t'+hit_def+'\n'
if fasta_out == True:
    seq_out.close()
This is an example of what my hit_def values look like:
>MLOC_36586.11 pep:known chromosome:030312v2:1:9883453:9888834:-1 gene:MLOC_36586 transcript:MLOC_36586.11 description:"Uncharacterized protein "
>MLOC_36586.2 pep:known chromosome:030312v2:1:9883444:9888847:-1 gene:MLOC_36586 transcript:MLOC_36586.2 description:"Uncharacterized protein "
>MLOC_51.2 pep:known chromosome:030312v2:1:322147737:322148802:1 gene:MLOC_51 transcript:MLOC_51.2 description:"Predicted protein\x3b Uncharacterized protein "
>MLOC_217.1 pep:known chromosome:030312v2:4:519918111:519919326:1 gene:MLOC_217 transcript:MLOC_217.1 description:"Uncharacterized protein "
Desired hit_def values:
MLOC_36586.11
MLOC_36586.2
MLOC_51.2
MLOC_217.1
If you know it's always the first item in the string, you can do something like this:
hit_def = line[:line.index(' ')]
If it isn't necessarily first, you might go for a regex like this:
hit_def = re.findall(r'(MLOC_\d+\.\d+) ',line)[0]
I'm assuming that your hit_defs are all of the form MLOC_XXX.X, but you get the idea.
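If the ID is always the first whitespace-separated token, str.split also works and avoids the ValueError that line.index(' ') raises on a line with no space; lstrip('>') drops the leading '>' seen in the example values. A small sketch with a hypothetical helper name; in the script it would replace hit_def = line[:8] with hit_def = hit_id(line).

```python
def hit_id(hit_def):
    """Return the first whitespace-separated token of hit_def without a leading '>'."""
    tokens = hit_def.split(None, 1)  # split at the first run of whitespace only
    return tokens[0].lstrip(">") if tokens else ""

examples = [
    ">MLOC_36586.11 pep:known chromosome:030312v2:1:9883453:9888834:-1 gene:MLOC_36586",
    ">MLOC_51.2 pep:known chromosome:030312v2:1:322147737:322148802:1 gene:MLOC_51",
]
print([hit_id(e) for e in examples])  # ['MLOC_36586.11', 'MLOC_51.2']
```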
