how to convert a string to the desired form? - python

Good day! I'm trying to implement a module for testing knowledge. The user is given a task and writes a solution, which is sent to the server and executed there. My question is as follows. There is initial data stored in a file, for example:
a = 5
b = 7
There is a user solution that is stored in a string, for example:
s = a * b
p = a + b
print s,p
Now all of this is written to a separate file as a string:
'a = 5\n', 'b = 7', u's = a * b\r\np = a + b\r\nprint s,p'
How can I turn this into code that can be executed? It should end up looking like this:
a = 5
b = 7
s = a * b
p = a + b
print s,p
Here's my function, which creates the solution file and executes it:
def create_decision(user_decision, conditions):
    f1 = open('temp_decision.py', 'w')
    f = open(conditions, 'r+')
    contents = f.readlines()
    contents.append(user_decision)
    f1.write(str(contents))
    f1.close()
    output = []
    child_stdin, child_stdout, child_stderr = os.popen3("python temp_decision.py")
    output = child_stdout.read()
    return output
What am I doing wrong? Thanks!

You don't need to create a temp file; you can simply use exec. create_decision would then look like this:
A
def create_decision(user_decision, conditions):
    f = open(conditions, 'r+')
    contents = f.readlines()
    contents.append(user_decision)
    # join list entries with a newline between and return result as string
    # note: eval() accepts a single expression only, so statements such as
    # assignments or print raise a SyntaxError here
    output = eval('\n'.join(contents))
    return output
B
import sys
import StringIO
import contextlib
@contextlib.contextmanager
def stdoutIO(stdout=None):
    old = sys.stdout
    if stdout is None:
        stdout = StringIO.StringIO()
    sys.stdout = stdout
    yield stdout
    sys.stdout = old
def create_decision(user_decision, conditions):
    f = open(conditions, 'r+')
    contents = f.readlines()
    contents.append(user_decision)
    with stdoutIO() as output:
        #exec('\n'.join(contents)) # Python 3
        exec '\n'.join(contents) # Python 2
    return output.getvalue()
You should also use str.join() to make one string out of the list of lines; otherwise you can't write it to a file or execute it (the functions above already do this).
Edit: There was a bug in my code (exec doesn't return anything), so I've added a method with eval. That one won't work with print or with assignments, because eval evaluates and returns the result of a single expression (more info here). The second method captures the output of print; stdoutIO comes from another question/answer (here). It returns the printed output but is a bit more complicated.
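For readers on Python 3, the print-capturing idea of method B can be sketched with the standard library's contextlib.redirect_stdout instead of a hand-written context manager. This is only a minimal sketch: run_solution and its parameter names are made up for illustration, the user code is assumed to call print() as a function, and exec on code submitted by users is not a sandbox, so it should not be treated as safe on a public server.

```python
import contextlib
import io


def run_solution(conditions_source, user_solution):
    """Run the conditions followed by the user's code; return what it printed."""
    # Normalise Windows line endings that typically arrive from a web form.
    program = conditions_source.rstrip() + "\n" + user_solution.replace("\r\n", "\n")
    buffer = io.StringIO()
    # Everything print() writes inside this block goes into buffer.
    with contextlib.redirect_stdout(buffer):
        exec(program, {})  # fresh namespace, but NOT a security boundary
    return buffer.getvalue()


conditions = "a = 5\nb = 7"
solution = "s = a * b\r\np = a + b\r\nprint(s, p)"
print(run_solution(conditions, solution))
```

In a real service the conditions would be read from the file first and the call would probably run in a separate, restricted process with a timeout; that part is left out here.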

You can execute a string like so: exec("code")

Related

Load and modify YAML without breaking the indents?

Hi, I want to update the integer value of x to 8, but I can't work out how to do that without affecting the indentation inside sub in the YAML file. I want to read the YAML file, replace the string with x = 8, and save the file otherwise unchanged.
I am using Python for the modification; here is the sample file:
parent:
  -
    subchild: something
    subchild2: something
      - sub:
          y = 4;
          x = 6 # I wanted to replace this integer to 8
          z = 10
Note: x = 6 appears in multiple files, so I want to open the files one by one, make the modification (x = 8), and save each file.
Problem
I was able to rewrite the file, but the problem I am facing is that the result becomes this:
parent:
-
subchild: something
subchild2: something
- sub: y = 4; x = 8; z = 10;
What I want is the same indentation inside sub as in the original YAML file. I hope that makes the point clear.
[EDIT] Additional information:
This is how I am reading and saving the file.
f = open('test.yaml', 'r')
newf = f.read().replace('6', '8')
overrides = yaml.load(newf)
f.close()
with open('updated_test.yaml', 'w') as ff:
    yaml.dump(overrides, ff)
Just don't use the yaml module for this task at all, since you aren't doing anything that needs it! ;)
f = open("test.yaml", 'r')
newf = f.read().replace('6', '8')
f.close()
with open('updated_test.yaml', 'w') as ff:
    ff.write(newf)
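One caveat with a blanket replace('6', '8') is that it changes every 6 in the file; for example, it would turn z = 16 into z = 18. A hedged sketch that only touches the number after x =, using the standard re module (bump_x and the sample text are hypothetical, not taken from the question):

```python
import re


def bump_x(text, new_value):
    """Replace the number in 'x = <digits>' and leave everything else untouched."""
    # \b stops names such as 'max' from matching; group 1 keeps the original
    # spacing around '=' so the layout of the line is preserved.
    return re.sub(r"\b(x\s*=\s*)\d+", lambda m: m.group(1) + str(new_value), text)


sample = "- sub:\n    y = 4;\n    x = 6 # comment\n    z = 16\n"
print(bump_x(sample, 8))
```

For several files, the same function could be applied to each file's text in a loop, writing the result back with a plain write() so the indentation is never re-serialised.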
Your input is invalid YAML, as you cannot start an indented sequence after the second something.
Your output is invalid as well.
If you want something like:
y = 4;
x = 6 # I wanted to replace this integer to 8
z = 10
in your YAML document as a multiline string and preserve the multiple lines, you
need to use ruamel.yaml and specify the string as a literal block scalar (using |).
And since x = 6 is part of the string, you need to do a string replacement to
change the value. I would use a regex for that:
import sys
import re
import ruamel.yaml
yaml_str = """\
- sub: |
    y = 4;
    x = 6 # I wanted to replace this integer to 8
    z = 10
"""
yaml = ruamel.yaml.YAML()
# yaml.indent(mapping=4, sequence=4, offset=2)
# yaml.preserve_quotes = True
data = yaml.load(yaml_str)
target = data[0]['sub']
new_value = 8
data[0]['sub'] = type(target)(re.sub('x = [0-9]*', f'x = {new_value}', target))
yaml.dump(data, sys.stdout)
which gives:
- sub: |
    y = 4;
    x = 8 # I wanted to replace this integer to 8
    z = 10

Python text file manipulation, add delta time to each line in seconds

I am a beginner at Python and am trying to solve the problem below:
I have a text file in which each line starts like this:
<18:12:53.972>
<18:12:53.975>
<18:12:53.975>
<18:12:53.975>
<18:12:54.008>
etc
Instead of the above, I would like to add the elapsed time in seconds at the beginning of each line, but only if the line starts with '<':
<0.0><18:12:53.972>
<0.003><18:12:53.975>
<0.003><18:12:53.975>
<0.003><18:12:53.975>
<0.036><18:12:54.008>
etc
Here comes a try :-)
#import datetime
from datetime import timedelta
from sys import argv
#get filename as argument
run, input, output = argv
#get number of lines for textfile
nr_of_lines = sum(1 for line in open(input))
#read in file
f = open(input)
lines = f.readlines()
f.close()
#declarations
do_once = True
time = []
delta_to_list = []
i = 0
#read in and translate all timevalues from logfile to delta time.
while i < nr_of_lines:
    i += 1
    if lines[i-1].startswith('<'):
        get_lines = lines[i-1] #get one line
        get_time = (get_lines[1:13]) #get the time from that line
        h = int(get_time[0:2])
        m = int(get_time[3:5])
        s = int(get_time[6:8])
        ms = int(get_time[9:13])
        time = timedelta(hours = h, minutes = m, seconds = s, microseconds = 0, milliseconds = ms)
        sec_time = time.seconds + (ms/1000)
        if do_once:
            start_value = sec_time
            do_once = False
        delta = float("{0:.3f}".format(sec_time - start_value))
        delta_to_list.append(delta)
#write back values to logfile.
k=0
s = str(delta_to_list[k])
with open(output, 'w') as out_file:
    with open(input, 'r') as in_file:
        for line in in_file:
            if line.startswith('<'):
                s = str(delta_to_list[k])
                out_file.write("<" + s + ">" + line)
            else:
                out_file.write(line)
            k += 1
As it is now, it mostly works, but the last two lines are not written to the new file. It fails with: "s = str(delta_to_list[k]) IndexError: list index out of range".
First I would like to get my code working, and second I would welcome suggestions for improvement. Thank you!
First point: never read a full file into memory when you don't have to (especially when you don't know whether you have enough free memory).
Second point: learn to use Python's for loop and iteration protocol. The way to iterate over a list or any other iterable is:
for item in some_iterable:
    do_something_with(item)
This avoids messing with indexes and getting it wrong ;)
One of the nice things about Python file objects is that they are iterables themselves, so the simplest way to iterate over a file's lines is:
for line in my_opened_file:
    do_something_with(line)
Here's a simple yet working and mostly pythonic (nb: python 2.7.x) way to write your program:
# -*- coding: utf-8 -*-
import os
import sys
import datetime
import re
import tempfile
def totime(timestr):
    """ returns a datetime object for a "HH:MM:SS" string """
    # we actually need datetime objects for subtraction
    # so let's use the first available bogus date
    # notes:
    # `timestr.split(":")` will return a list `["HH", "MM", "SS"]`
    # `map(int, ...)` will apply `int()` on each item
    # of the sequence (second argument) and return
    # the resulting list, ie
    # `map(int, ["01", "02", "03"])` => `[1, 2, 3]`
    return datetime.datetime(1900, 1, 1, *map(int, timestr.split(":")))
def process(instream, outstream):
    # some may consider that regexps are not that pythonic
    # but as far as I'm concerned it seems like a sensible
    # use case.
    time_re = re.compile(r"^<(?P<time>\d{2}:\d{2}:\d{2})\.")
    first = None
    # iterate over our input stream lines
    for line in instream:
        # should we handle this line at all ?
        # (nb a bit redundant but faster than re.match)
        if not line.startswith("<"):
            continue
        # looks like a candidate, let's try and
        # extract the 'time' value from it
        match = time_re.search(line)
        if not match:
            # starts with '<' BUT not followed by 'HH:MM:SS.' ?
            # unexpected from the sample source but well, we
            # can't do much about it either
            continue
        # retrieve the captured "time" (HH:MM:SS) part
        current = totime(match.group("time"))
        # store the first occurrence so we can
        # compute the elapsed time
        if first is None:
            first = current
        # `(current - first)` yields a `timedelta` object
        # we now just have to retrieve its `seconds` attribute
        seconds = (current - first).seconds
        # inject the seconds before the line
        # and write the whole thing to our output stream
        newline = "{}{}".format(seconds, line)
        outstream.write(newline)
def usage(err=None):
    if err:
        print >> sys.stderr, err
    print >> sys.stderr, "usage: python retime.py <filename>"
    # unix standards process exit codes
    return 2 if err else 0
def main(*args):
    # our entry point...
    # gets the source filename, process it
    # (storing the results in a temporary file),
    # and if everything's ok replace the source file
    # by the temporary file.
    try:
        sourcename = args[0]
    except IndexError as e:
        return usage("missing <filename> argument")
    # `delete=False` prevents the tmp file from being
    # deleted on closing.
    dest = tempfile.NamedTemporaryFile(delete=False)
    with open(sourcename) as source:
        try:
            process(source, dest)
        except Exception as e:
            dest.close()
            # os.remove() needs the file's path, not the file object
            os.remove(dest.name)
            raise
    # ok done
    dest.close()
    os.rename(dest.name, sourcename)
    return 0
if __name__ == "__main__":
    # only execute main() if we are called as a script
    # (so we can also import this file as a module)
    sys.exit(main(*sys.argv[1:]))
It gives the expected results on your sample data (running on linux - but it should be ok on any other supported OS afaict).
Note that I wrote it to work like your original code (replace the source file with the processed one), but if it were my code I would instead either explicitly provide a destination filename or, as a default, write to sys.stdout (and redirect stdout to another file). The process function can deal with either of those solutions FWIW; it's only a matter of a couple of edits in main().
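The script above reports whole seconds and skips lines that don't start with '<'. If millisecond resolution and pass-through of the other lines are wanted, as in the question's expected output, a Python 3 sketch could look like the following. add_elapsed and TIME_RE are names invented here, the timestamp layout <HH:MM:SS.mmm> is assumed from the sample data, and a log that crosses midnight is not handled.

```python
import re
from datetime import timedelta

# Assumed layout, taken from the sample lines: <HH:MM:SS.mmm>
TIME_RE = re.compile(r"^<(\d{2}):(\d{2}):(\d{2})\.(\d{3})>")


def add_elapsed(lines):
    """Yield every line; timestamped lines get an <elapsed seconds> prefix."""
    first = None
    for line in lines:
        match = TIME_RE.match(line)
        if match is None:
            yield line  # not a timestamped line: pass it through unchanged
            continue
        h, m, s, ms = (int(part) for part in match.groups())
        current = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)
        if first is None:
            first = current
        elapsed = round((current - first).total_seconds(), 3)
        yield "<{}>{}".format(elapsed, line)


sample = ["<18:12:53.972> a\n", "note\n", "<18:12:53.975> b\n", "<18:12:54.008> c\n"]
for out in add_elapsed(sample):
    print(out, end="")
```

Because add_elapsed takes any iterable of lines, it can be fed an open file object directly, which keeps the "don't read the whole file into memory" advice intact.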

Python: Writing multiple variables to a file

I'm fairly new to Python and I've written a scraper that prints the data I scrape exactly the way I need it, but I'm having trouble writing the data to a file. I need it to look exactly the same and be in the same order as when it prints in IDLE.
import requests
import re
from bs4 import BeautifulSoup
year_entry = raw_input("Enter year: ")
week_entry = raw_input("Enter week number: ")
week_link = requests.get("http://sports.yahoo.com/nfl/scoreboard/?week=" + week_entry + "&phase=2&season=" + year_entry)
page_content = BeautifulSoup(week_link.content)
a_links = page_content.find_all('tr', {'class': 'game link'})
for link in a_links:
    r = 'http://www.sports.yahoo.com' + str(link.attrs['data-url'])
    r_get = requests.get(r)
    soup = BeautifulSoup(r_get.content)
    stats = soup.find_all("td", {'class':'stat-value'})
    teams = soup.find_all("th", {'class':'stat-value'})
    scores = soup.find_all('dd', {"class": 'score'})
    try:
        game_score = scores[-1]
        game_score = game_score.text
        x = game_score.split(" ")
        away_score = x[1]
        home_score = x[4]
        home_team = teams[1]
        away_team = teams[0]
        away_team_stats = stats[0::2]
        home_team_stats = stats[1::2]
        print away_team.text + ',' + away_score + ',',
        for stats in away_team_stats:
            print stats.text + ',',
        print '\n'
        print home_team.text + ',' + home_score +',',
        for stats in home_team_stats:
            print stats.text + ',',
        print '\n'
    except:
        pass
I am totally confused about how to get this written to a txt file the same way it prints in IDLE. The code is built to run only on completed weeks of the NFL season, so if you test it, I recommend year = 2014 and week = 12 (or earlier).
Thanks,
JT
To write to a file you need to build up each line as a string, then write that line to the file.
You'd use something like:
# Open/create a file for your output
with open('my_output_file.csv', 'wb') as csv_out:
    ...
    # Your BeautifulSoup code and parsing goes here
    ...
    # Then build up your output strings
    for link in a_links:
        # collect the fields in a list first, then join them once
        away_fields = [away_team.text, away_score]
        for stat in away_team_stats:
            away_fields.append(stat.text)
        away_line = ",".join(away_fields)
        home_fields = [home_team.text, home_score]
        for stat in home_team_stats:
            home_fields.append(stat.text)
        home_line = ",".join(home_fields)
        # Write your output strings to the file
        csv_out.write(away_line + '\n')
        csv_out.write(home_line + '\n')
This is a quick and dirty fix. To do it properly you probably want to look into the csv module (docs).
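As a rough idea of what the csv route might look like, here is a small Python 3 sketch (the thread itself uses Python 2). write_rows and the team rows are made up for illustration; only csv.writer and writerow come from the standard library. An in-memory buffer stands in for the output file.

```python
import csv
import io


def write_rows(out_file, rows):
    """Write each row (a list of strings) as one comma-separated line."""
    writer = csv.writer(out_file, lineterminator="\n")
    for row in rows:
        writer.writerow(row)  # quoting is handled if a field contains a comma


# Hypothetical data standing in for [team, score, stat, stat, ...]
rows = [["Away Team", "17", "350", "21"], ["Home Team", "24", "402", "25"]]
buffer = io.StringIO()
write_rows(buffer, rows)
print(buffer.getvalue())
```

With a real file on Python 3 the usual pattern is open(path, "w", newline="") so that the csv module controls the line endings itself.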
From the structure of your output I agree with Jamie that using CSV is a logical choice.
But since you're using Python 2, it's possible to use an alternate form of the print statement to print to a file.
From https://docs.python.org/2/reference/simple_stmts.html#the-print-statement
print also has an extended form, defined by the second portion of the
syntax described above. This form is sometimes referred to as “print
chevron.” In this form, the first expression after the >> must
evaluate to a “file-like” object, specifically an object that has a
write() method as described above. With this extended form, the
subsequent expressions are printed to this file object. If the first
expression evaluates to None, then sys.stdout is used as the file for
output.
Eg,
outfile = open("myfile.txt", "w")
print >>outfile, "Hello, world"
outfile.close()
However, this syntax is not supported in Python 3, so I guess it's probably not a good idea to use it. :) FWIW, I generally use the file write() method in my code when writing to files, except that I tend to use print >>sys.stderr for error messages.

Two regex functions together do not work

I am trying to get the index of the start of one tag and the end of another tag. When I use one regex search it works absolutely fine, but when I use two, the second one raises an error.
Please help me understand the reason.
The below code works fine:
import re
f = open('C:/Users/Jyoti/Desktop/PythonPrograms/try.xml','r')
opentag = re.search('<TEXT>',f.read())
begin = opentag.start()+6
print begin
But when I add another similar regex, it gives me the error
AttributeError: 'NoneType' object has no attribute 'start'
which I understand means that start() is being called on None, i.e. the second re.search() found no match.
Below is the code:
import re
f = open('C:/Users/Jyoti/Desktop/PythonPrograms/try.xml','r')
opentag = re.search('<TEXT>',f.read())
begin = opentag.start()+6
print begin
closetag = re.search('</TEXT>',f.read())
end = closetag.start() - 1
print end
Please suggest how I can get this working. Also, I am a newbie here, so please don't mind if I ask follow-up questions about the solution.
f.read() reads the whole file and leaves the file position at the end, which means there is nothing left to read when you call f.read() a second time.
If you need to search the same text again, save the output of f.read() and run both regular expression searches on it, as below:
import re
f = open('C:/Users/Jyoti/Desktop/PythonPrograms/try.xml','r')
text = f.read()
opentag = re.search('<TEXT>',text)
begin = opentag.start()+6
print begin
closetag = re.search('</TEXT>',text)
end = closetag.start() - 1
print end
f.read() reads the whole file. So there's nothing left to read on the second f.read() call.
See https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects
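The behaviour is easy to reproduce without the XML file. In this Python 3 sketch an in-memory io.StringIO object stands in for the opened file (the sample content is invented); it shows the empty second read, the seek(0) alternative, and both searches running on the saved text.

```python
import io
import re

# Stand-in for open('try.xml'); the content is made up for the demonstration.
f = io.StringIO("<DOC><TEXT>hello</TEXT></DOC>")

first = f.read()    # reads everything, position is now at end of file
second = f.read()   # nothing left, so this is an empty string

f.seek(0)           # rewind to the beginning
again = f.read()    # the full text once more

# Searching the saved text works any number of times.
text = first
begin = re.search(r"<TEXT>", text).end()     # index just after <TEXT>
end = re.search(r"</TEXT>", text).start()    # index where </TEXT> begins
body = text[begin:end]
print(repr(second), body)
```

Using match.end() avoids hard-coding the tag length (the +6 in the question).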
First of all, f.read() leaves the file position at EOF after reading, so calling f.read() again gives you an empty string ''. Secondly, you should put r before the string passed as the pattern to re.search. It marks a raw string, in which Python leaves backslashes alone, so sequences such as \d or \b reach the regex engine intact. (It does not escape regex special characters; these particular patterns contain no backslashes, so the r is optional here.) So you can do something like this:
import re
f = open('C:/Users/Jyoti/Desktop/PythonPrograms/try.xml','r')
data = f.read()
opentag = re.search(r'<TEXT>',data)
begin = opentag.start()+6
print begin
closetag = re.search(r'</TEXT>',data)
end = closetag.start() - 1
print end
gl & hf with Python :)

Python RegEx nested search and replace

I need to do a RegEx search and replace of all commas found inside quoted blocks.
i.e.
"thing1,blah","thing2,blah","thing3,blah",thing4
needs to become
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
my code:
inFile = open(inFileName,'r')
inFileRl = inFile.readlines()
inFile.close()
p = re.compile(r'["]([^"]*)["]')
for line in inFileRl:
    pg = p.search(line)
    # found comment block
    if pg:
        q = re.compile(r'[^\\],')
        # found comma within comment block
        qg = q.search(pg.group(0))
        if qg:
            # Here I want to reconstitute the line and print it with the replaced text
            #print re.sub(r'([^\\])\,',r'\1\,',pg.group(0))
I need to filter only the columns I want based on a RegEx, filter further,
then do the RegEx replace, then reconstitute the line back.
How can I do this in Python?
The csv module is perfect for parsing data like this, as csv.reader in the default dialect does not split on commas inside quotes. csv.writer reinserts the quotes because the fields contain commas. I used StringIO to give a file-like interface to a string.
import csv
import StringIO
s = '''"thing1,blah","thing2,blah","thing3,blah"
"thing4,blah","thing5,blah","thing6,blah"'''
source = StringIO.StringIO(s)
dest = StringIO.StringIO()
rdr = csv.reader(source)
wtr = csv.writer(dest)
for row in rdr:
    wtr.writerow([item.replace('\\,',',').replace(',','\\,') for item in row])
print dest.getvalue()
result:
"thing1\,blah","thing2\,blah","thing3\,blah"
"thing4\,blah","thing5\,blah","thing6\,blah"
General Edit
There was
"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4
in the question, and now it is not there anymore.
Moreover, I hadn't noticed r'[^\\],'.
So I completely rewrote my answer.
"thing1,blah","thing2,blah","thing3,blah",thing4
and
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
being displays of strings (I suppose)
import re
ss = '"thing1,blah","thing2,blah","thing3\,blah",thing4 '
regx = re.compile('"[^"]*"')
def repl(mat, ri = re.compile('(?<!\\\\),') ):
    # replace each comma not already preceded by a backslash with '\,'
    return ri.sub('\\\\,',mat.group())
print ss
print repr(ss)
print
print regx.sub(repl, ss)
print repr(regx.sub(repl, ss))
result
"thing1,blah","thing2,blah","thing3\,blah",thing4 
'"thing1,blah","thing2,blah","thing3\\,blah",thing4 '
"thing1\,blah","thing2\,blah","thing3\,blah",thing4 
'"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4 '
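You can try this regex.
>>> re.sub('(?<!"),(?!")', r"\\,",
...        '"thing1,blah","thing2,blah","thing3,blah",thing4')
#Gives "thing1\,blah","thing2\,blah","thing3\,blah",thing4
The logic behind this is to replace a , with \, only when it is neither immediately preceded nor immediately followed by a ". That skips the commas that separate the fields, but it would also skip a comma that sits right next to a quote inside a field.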
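An alternative that handles the quoted blocks explicitly, rather than relying on the characters next to each comma, is a substitution with a callback. The following is a Python 3 sketch of that idea; escape_commas_in_quotes, QUOTED and UNESCAPED_COMMA are names chosen here for illustration, and quoted blocks are assumed not to contain escaped double quotes.

```python
import re

QUOTED = re.compile(r'"[^"]*"')            # one "..." block
UNESCAPED_COMMA = re.compile(r'(?<!\\),')  # a comma not already preceded by a backslash


def escape_commas_in_quotes(line):
    """Return line with every unescaped comma inside double quotes turned into '\\,'."""
    def repl(match):
        # A function replacement is inserted literally, so no template escaping is needed.
        return UNESCAPED_COMMA.sub(lambda m: "\\,", match.group())
    return QUOTED.sub(repl, line)


line = '"thing1,blah","thing2,blah","thing3\\,blah",thing4'
print(escape_commas_in_quotes(line))
```

Commas between the quoted fields are never seen by the inner substitution, so they stay as they are, and a comma that is already escaped is left alone.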
I came up with an iterative solution using several regex functions:
finditer(), findall(), group(), start() and end()
There's a way to turn all this into a recursive function that calls itself.
Any takers?
outfile = open(outfileName,'w')
p = re.compile(r'["]([^"]*)["]')
q = re.compile(r'([^\\])(,)')
for line in outfileRl:
    pg = p.finditer(line)
    pglen = len(p.findall(line))
    if pglen > 0:
        mpgstart = 0;
        mpgend = 0;
        for i,mpg in enumerate(pg):
            if i == 0:
                outfile.write(line[:mpg.start()])
            qg = q.finditer(mpg.group(0))
            qglen = len(q.findall(mpg.group(0)))
            if i > 0 and i < pglen:
                outfile.write(line[mpgend:mpg.start()])
            if qglen > 0:
                for j,mqg in enumerate(qg):
                    if j == 0:
                        outfile.write( mpg.group(0)[:mqg.start()] )
                    outfile.write( re.sub(r'([^\\])(,)',r'\1\\\2',mqg.group(0)) )
                    if j == (qglen-1):
                        outfile.write( mpg.group(0)[mqg.end():] )
            else:
                outfile.write(mpg.group(0))
            if i == (pglen-1):
                outfile.write(line[mpg.end():])
            mpgstart = mpg.start()
            mpgend = mpg.end()
    else:
        outfile.write(line)
outfile.close()
Have you looked into str.replace()?
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the
first count occurrences are replaced.
Here is some documentation.
Hope this helps.
