Printing Python Output to a .txt file

I'm having a bit of an issue trying to print the output of my code into a .txt file.
I want to print the results of my code to a .txt file, then use the number printed as a variable for how many lines to read from the top of that file, separating that bit of text into its own .txt file.
I have attached a link to a small diagram to explain this better.
I've been researching for a while and can't seem to find an answer! Does anyone have any ideas on how to do this?
My code is the following:
from itertools import permutations
import re
import sys
# Enter new brand names in the following lists
brand_names = "8x8, AmazonAWS, Checkpoint, Cisco, Citrix, Commvault, Dell, Dell_EMC, Google, Google_Cloud, HPE, Hyland, IBM, IBM_Cloud, Microsoft, Microsoft_Azure, NetApp, Oracle, Pure_Storage, SAP, Thompson_Reuters, Veritas, VMware, Aerohive, Aramark, Bloomberg, BMC, Box, CompuCom, Cybera, Extreme, FireEye, GE, Globoforce, GPSI, Infor, Lux_Research, MetTel, Oracle_Cloud, PAR_Invista, Puppet, Rackspace, Ruckus, Salesforce, SonicWall, SPI, Splunk, Stratix, Supermicro, Tenable, Ultipro, US_Bank, Veeam, VIP"
for group in permutations(['8x8', 'AmazonAWS', 'Checkpoint', 'Cisco', 'Citrix', 'Commvault', 'Dell', 'Dell_EMC', 'Google', 'Google_Cloud', 'HPE', 'Hyland', 'IBM', 'IBM_Cloud', 'Microsoft', 'Microsoft_Azure', 'NetApp', 'Oracle', 'Pure_Storage', 'SAP', 'Thompson_Reuters', 'Veritas', 'VMware', 'Aerohive', 'Aramark', 'Bloomberg', 'BMC', 'Box', 'CompuCom', 'Cybera', 'Extreme', 'FireEye', 'GE', 'Globoforce', 'GPSI', 'Infor', 'Lux Research', 'MetTel', 'Oracle_Cloud', 'PAR_Invista', 'Puppet', 'Rackspace', 'Ruckus', 'Salesforce', 'SonicWall', 'SPI', 'Splunk', 'Stratix', 'Supermicro', 'Tenable', 'Ultipro', 'US Bank', 'Veeam', 'VIP'], 3):
    print('_'.join(group))
print()
print()
# Python3 code to demonstrate
# counting words in a string
# using regex (findall())
# printing original string
print("The original string is : ")
print()
print("'" + brand_names + "'")
# using regex (findall())
# to count words in string
res = len(re.findall(r'\w+', brand_names))
print()
print()
print()
# printing result
print("The number of brands are : " + str(res))
print()
print()
N = res
sys.stdout = open("file.txt", "w+")
print(group)

You'd use <file object>.write rather than print.
Redirecting stdout would be dangerous and might affect parts of your code which you hadn't considered - I'd recommend removing that part of your code completely.

You can also use the > operator
python trial.py > trial.txt
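A minimal sketch of the file-based approach (file names like results.txt and first_chunk.txt are placeholders): write each line with f.write, then use a count to split the first N lines off into a second file, which is what the question describes.

```python
# Minimal sketch: write results to a file, then split off the first N lines.
# File names (results.txt, first_chunk.txt) are placeholders.
lines_to_write = ["alpha", "beta", "gamma", "delta"]

with open("results.txt", "w") as f:
    for line in lines_to_write:
        f.write(line + "\n")   # write() does not add a newline for you

n = 2  # e.g. the brand count computed earlier in your script

with open("results.txt") as src, open("first_chunk.txt", "w") as dst:
    for _ in range(n):
        dst.write(src.readline())

with open("first_chunk.txt") as f:
    print(f.read())  # alpha\nbeta\n
```

`print(..., file=f)` would work just as well as `f.write(...)` here, without touching sys.stdout.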


Python - DocxTemplate - Not Printing "&" in final document

I am running a script that takes the names from a CSV file and populates them into individual Word documents from a template. I have that part working, but here is where I need a bit of help.
Some cells in the CSV file contain two names, such as "Bobby & Sammy." When I go check the populated Word document, it only has "Bobby Sammy." I know that "&" is a special character, but I am not sure what I have to do for it to populate the Word documents correctly.
Any and all help is appreciated.
Edit: Code
import time
import random
import pandas as pd
from docxtpl import DocxTemplate

csvfn = "Addresses.csv"
df = pd.read_csv('Addresses.csv')

def mkw(n):
    tpl = DocxTemplate('Envelope_Template.docx')
    df_to_doct = df.to_dict()
    x = df.to_dict(orient='records')
    context = x
    tpl.render(context[n])
    tpl.save("%s.docx" % str(n))
    wait = time.sleep(random.randint(1, 3))
~
print("There will be ", df2, "files")
~
for i in range(0, df2):
    print("Making file: ", f"{i},", "..Please Wait...")
    mkw(i)
print("Done! - Now check your files")
print("Done! - Now check your files")
~ Denotes new cell, I am using JupyterLab
The file is a standard CSV file. Without "&" the name prints fine; where "&" is supposed to be, there is just an empty space.
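No answer is included here, but a common suggestion (an assumption, not a confirmed fix for every docxtpl version) is that "&" must be XML-escaped before it is inserted into the document's XML. A stdlib sketch of escaping the context values before calling render:

```python
from xml.sax.saxutils import escape

# Hypothetical context dict, in the shape docxtpl's render() receives.
context = {"names": "Bobby & Sammy", "city": "Tacoma"}

# Escape &, < and > in every string value before rendering.
# Whether this is needed depends on your docxtpl version (newer versions
# can escape for you), so treat this pre-escaping step as an assumption.
safe_context = {k: escape(v) if isinstance(v, str) else v
                for k, v in context.items()}

print(safe_context["names"])  # Bobby &amp; Sammy
```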

How to extract text within a range of time

I have the text below. How can I extract the text between a time range? I already have code that extracts all the values.
s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a
00:00:17,039 --> 00:00:28,470
will come from an unexpected place
00:00:18,039 --> 00:00:19,470
00:00:20,039 --> 00:00:21,470
00:00:22,100 --> 00:00:30,119
binary numbers first I'm going to give
00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it
00:00:32,489 --> 00:00:37,170
like I said you have a thousand bottles'''
Can I extract the text from 00:00:17,039 --> 00:00:28,470 to 00:00:30,119 --> 00:00:35,430?
Here is my code to extract all the values:
import re

lines = s.split('\n')
subtitles = {}  # avoid shadowing the built-in dict
current_key = None
for line in lines:
    is_key_match_obj = re.search(r'([\d:,]{12})(\s-->\s)([\d:,]{12})', line)
    if is_key_match_obj:
        current_key = is_key_match_obj.group()
        print(current_key)
        continue
    if current_key:
        if current_key in subtitles:
            if not line:
                subtitles[current_key] += '\n'
            else:
                subtitles[current_key] += line
        else:
            subtitles[current_key] = line
print(subtitles.values())
Expected output, from 00:00:17,039 --> 00:00:28,470 to 00:00:30,119 --> 00:00:35,430:
dict_values(['will come from an unexpected place', '', '', "binary numbers first I'm going to give", 'puzzle and then you can try to solve it'])
No need to iterate line by line. Try the code below. It will give you a dictionary as you wanted.
import re
d = dict(re.findall(r'(\d{2}:\d{2}.*)\n(.*)', s))
print(d.values())
Output
dict_values(['a classic math problem a', 'will come from an unexpected place', '', '', "binary numbers first I'm going to give", 'puzzle and then you can try to solve it', 'like I said you have a thousand bottles'])
import re
line = re.sub(r'\d{2}[:,\d]+[ \n](-->)*', "", s)
print(line)
will print:
" a classic math problem a\n\n will come from an unexpected place\n\n
\n \n binary numbers first I'm going to give\n\n puzzle and then you
can try to solve it\n\n like I said you have a thousand bottles"
Explanation
\d{2}[:,\d]+ captures two-digit numbers followed by :, , or another digit - this captures both the start and end timestamps
[ \n] captures the space after the first timestamp and the line break after the end timestamp
(-->)* captures zero or more occurrences of -->
As someone else suggested in the comments, you might want to look at a parser that does this for you by building a parse tree; they are more foolproof. A Google search leads me to this srt Python library.
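Neither answer actually restricts extraction to a time range, which is what was asked. A stdlib sketch of one way to do it, parsing the SRT timestamps into timedeltas and keeping only the blocks that start inside the requested range (the range boundaries below are the ones from the question):

```python
import re
from datetime import timedelta

s = '''00:00:14,099 --> 00:00:19,100
a classic math problem a
00:00:17,039 --> 00:00:28,470
will come from an unexpected place
00:00:30,119 --> 00:00:35,430
puzzle and then you can try to solve it'''

def to_td(ts):
    """Parse an SRT timestamp like 00:00:17,039 into a timedelta."""
    h, m, rest = ts.split(':')
    sec, ms = rest.split(',')
    return timedelta(hours=int(h), minutes=int(m),
                     seconds=int(sec), milliseconds=int(ms))

start = to_td('00:00:17,039')
end = to_td('00:00:35,430')

result = {}
current = None
for line in s.split('\n'):
    m = re.match(r'(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})', line)
    if m:
        # keep this block only if its start timestamp falls inside the range
        current = m.group(0) if start <= to_td(m.group(1)) <= end else None
    elif current:
        result[current] = result.get(current, '') + line

print(result)
```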

How to load a dataframe from a file containing unwanted characters?

I'm in need of some knowledge on how to fix an error I have made while collecting data. The collected data has the following structure:
["Author", "Message"]
["littleblackcat", " There's a lot of redditors here that live in the area maybe/hopefully someone saw something. "]
["Kruse", "In other words, it's basically creating a mini tornado."]
I normally wouldn't have added "[" or "]" to the .txt file when writing the data to it, line by line. However, the mistake was made, and as a result pandas splits the file incorrectly when loading it.
Is there a way to load the data properly into pandas?
On the snippet that I can cut and paste from the question (which I named test.txt), I could successfully read a dataframe via
Purging square brackets (with sed on a Linux command line, but this can be done e.g. with a text editor, or in python if need be)
sed -i 's/^\[//g' test.txt # remove left square brackets assuming they are at the beginning of the line
sed -i 's/\]$//g' test.txt # remove right square brackets assuming they are at the end of the line
Loading the dataframe (in a python console)
import pandas as pd
pd.read_csv("test.txt", skipinitialspace = True, quotechar='"')
(not sure that this will work for the entirety of your file though).
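As the answer notes, the bracket-stripping step can also be done in Python rather than sed. A minimal sketch (the sample data and file name are stand-ins for the real file):

```python
# Create a small sample in the same shape as the question's file.
sample = '''["Author", "Message"]
["Kruse", "In other words, it's basically creating a mini tornado."]'''
with open("test.txt", "w") as f:
    f.write(sample)

# Strip a leading "[" and a trailing "]" from every line,
# mirroring what the two sed commands do.
with open("test.txt") as f:
    cleaned = [line.lstrip("[").rstrip("]") for line in f.read().splitlines()]

with open("test.txt", "w") as f:
    f.write("\n".join(cleaned) + "\n")

print(cleaned[0])  # "Author", "Message"
```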
Consider the code below, which reads the text in myfile.txt, which looks like this:
["Author", "Message"]
["littleblackcat", " There's a lot of redditors here that live in the area maybe/hopefully someone saw something. "]
["Kruse", "In other words ,it's basically creating a mini tornado."]
The code below removes [ and ] from the text and then splits every string in the list on ,, excluding the first string, which is the header. Some Message values contain ,, which would otherwise create an extra (NaN-filled) column, so the code joins those pieces back into one string, as intended.
Code:
import pandas as pd

with open('myfile.txt', 'r') as my_file:
    text = my_file.read()
text = text.replace("[", "")
text = text.replace("]", "")
df = pd.DataFrame({
    'Author': [i.split(',')[0] for i in text.split('\n')[1:]],
    'Message': [''.join(i.split(',')[1:]) for i in text.split('\n')[1:]]
}).applymap(lambda x: x.replace('"', ''))
Output:
Author Message
0 littleblackcat There's a lot of redditors here that live in the area maybe/hopefully someone saw something.
1 Kruse In other words it's basically creating a mini tornado.
Here are a few more options to add to the mix:
You could parse the lines yourself using ast.literal_eval, and then load them into a pd.DataFrame directly using an iterator over the lines:
import pandas as pd
import ast

with open('data', 'r') as f:
    lines = (ast.literal_eval(line) for line in f)
    header = next(lines)
    df = pd.DataFrame(lines, columns=header)
print(df)
Note, however, that calling ast.literal_eval once for each line may not be very fast, especially if your data file has a lot of lines. However, if the data file is not too big, this may be an acceptable, simple solution.
Another option is to wrap an arbitrary iterator (which yields bytes) in an IterStream. This very general tool (thanks to Mechanical snail) allows you to manipulate the contents of any file and then re-package it into a file-like object. Thus, you can fix the contents of the file, and yet still pass it to any function which expects a file-like object, such as pd.read_csv. (Note: I've answered a similar question using the same tool, here.)
import io
import pandas as pd

def iterstream(iterable, buffer_size=io.DEFAULT_BUFFER_SIZE):
    """
    http://stackoverflow.com/a/20260030/190597 (Mechanical snail)
    Lets you use an iterable (e.g. a generator) that yields bytestrings as a
    read-only input stream.
    The stream implements Python 3's newer I/O API (available in Python 2's io
    module).
    For efficiency, the stream is buffered.
    """
    class IterStream(io.RawIOBase):
        def __init__(self):
            self.leftover = None
        def readable(self):
            return True
        def readinto(self, b):
            try:
                l = len(b)  # We're supposed to return at most this much
                chunk = self.leftover or next(iterable)
                output, self.leftover = chunk[:l], chunk[l:]
                b[:len(output)] = output
                return len(output)
            except StopIteration:
                return 0    # indicate EOF
    return io.BufferedReader(IterStream(), buffer_size=buffer_size)

def clean(f):
    for line in f:
        yield line.strip()[1:-1] + b'\n'

with open('data', 'rb') as f:
    # https://stackoverflow.com/a/50334183/190597 (Davide Fiocco)
    df = pd.read_csv(iterstream(clean(f)), skipinitialspace=True, quotechar='"')
print(df)
A pure pandas option is to change the separator from , to ", " in order to have only 2 columns, and then strip the unwanted characters, which to my understanding are [, ], " and leading spaces:
import pandas as pd
import io
string = '''
["Author", "Message"]
["littleblackcat", " There's a lot of redditors here that live in the area maybe/hopefully someone saw something. "]
["Kruse", "In other words, it's basically creating a mini tornado."]
'''
df = pd.read_csv(io.StringIO(string),sep='\", \"', engine='python').apply(lambda x: x.str.strip('[\"] '))
# the \" instead of simply " is to make sure python does not interpret is as an end of string character
df.columns = [df.columns[0][2:],df.columns[1][:-2]]
print(df)
# Output (note the space before There's is also gone)
# Author Message
# 0 littleblackcat There's a lot of redditors here that live in t...
# 1 Kruse In other words, it's basically creating a mini...
For now the following solution was found:
sep = '[|"|]'
Using a multi-character separator allowed the brackets to be stored in separate columns of the pandas dataframe, which were then dropped. This avoids having to strip the characters line by line.

Extracting data by regex and writing to CSV, Python glob (pandas?)

I have a large list of varyingly dirty CSVs containing phone numbers in various formats. What I want is to comb through all of them and export to a single-column CSV of cleaned phone numbers in a simple format. So far, I have pieced together something to work, though it has some issues: (partial revision further below)
import csv
import re
import glob
import string
with open('phonelist.csv', 'wb') as out:
    seen = set()
    output = []
    out_writer = csv.writer(out)
    csv_files = glob.glob('CSVs\*.csv')
    for filename in csv_files:
        with open(filename, 'rbU') as ifile:
            read = csv.reader(ifile)
            for row in read:
                for column in row:
                    s1 = column.strip()
                    if re.match(r'\b\d\d\d\d\d\d\d\d\d\d\b', s1):
                        if s1 not in seen:
                            seen.add(s1)
                            output.append(s1)
                    elif re.search(r'\b\(\d\d\d\) \d\d\d-\d\d\d\d\b', s1):
                        s2 = filter(lambda x: x in string.digits, s1)
                        if s2 not in seen:
                            seen.add(s2)
                            output.append(s2)
    for val in output:
        out_writer.writerow([val])
I'm putting this together with no formal python knowledge, just piecing things I've gleaned on this site. Any advice regarding pythonic stylization or utilizing the pandas library for shortcuts would all be welcome.
First issue: What's the simplest way to filter down to just the matched values? I.e., I may get 9815556667 John Smith, but I just want the number.
Second issue: This takes forever. I assume it's the lambda part. Is there a faster or more efficient method?
Third issue: How do I glob *.csv in the directory of the program and the CSVs directory (as written)?
I know that's several questions at once, but I got myself halfway there. Any additional pointers are appreciated.
For examples, requested, this isn't from a file (these are multi-gigabyte files), but here's what I'm looking for:
John Smith, (981) 991-0987, 9987765543 extension 541, 671 Maple St 98402
(998) 222-0011, 13949811123, Foo baR Us, 2567 Appleberry Lane
office, www.somewebpage.com, City Group, Anchorage AK
9281239812
(345) 666-7777
Should become:
9819910987
9987765543
9982220011
3949811123
3456667777
(I forgot that I need to drop a leading 1 from 11-digit numbers, too)
EDIT: I've changed my current code to incorporate Shahram's advice, so now, from for column in row above, I have, instead of above:
for column in row:
    s1 = column.strip()
    result = re.match(
        r'.*(\+?[2-9]?[0-9]?[0-9]?-?\(?[0-9][0-9][0-9]\)? ?[0-9][0-9][0-9]-?[0-9][0-9][0-9][0-9]).*', s1) or re.match(
        r'.*(\+?[2-9]?[0-9]?[0-9]?-?\(?[0-9][0-9][0-9]\)?-?[0-9][0-9][0-9]-?[0-9][0-9][0-9][0-9]).*', s1)
    if result:
        tempStr = result.group(1)
        for ch in ['(', ')', '-', ' ']:
            tempStr = tempStr.replace(ch, '')
        if tempStr not in seen:
            seen.add(tempStr)
            output.append(tempStr)
This seems to work for my purposes, but I still don't know how to glob the current directory and subdirectory, and I still don't know if my code has issues that I'm unaware of because of my hodge-podge-ing. Also, in my larger directory, this is taking forever - as in, about a gig of CSVs is timing out for me (by my hand) at around 20 minutes. I don't know if it's hitting a snag, but judging by the speed at which python normally chews through any number of CSVs, it feels like I'm missing something.
About your first question: you can use the regular expression below to capture different formats of phone numbers:
result = re.match(r'.*(\+?[0-9]?[0-9]?[0-9]?-?\(?[0-9][0-9][0-9]\)?-?[0-9][0-9][0-9]-?[0-9][0-9][0-9][0-9]).*', s1)
if result:
    if result.group(1) not in seen:
        seen.add(result.group(1))
        output.append(result.group(1))
About your second question: you may want to look at str.replace. The above code can be changed to:
result = re.match(r'.*(\+?[0-9]?[0-9]?[0-9]?-?\(?[0-9][0-9][0-9]\)?-?[0-9][0-9][0-9]-?[0-9][0-9][0-9][0-9]).*', s1)
if result:
    if result.group(1) not in seen:
        tempStr = result.group(1)
        tempStr = tempStr.replace('-', '')  # replace() returns a new string
        seen.add(tempStr)
        output.append(tempStr)
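The third issue (globbing both the program's directory and the CSVs subdirectory) was never answered above. A sketch of one way to do it, together with a simpler digit-only normalization that also drops the leading 1 from 11-digit numbers (the directory layout is an assumption):

```python
import glob
import re

# Glob *.csv in both the current directory and the CSVs subdirectory.
# On Python 3.5+, a single recursive pattern also works:
#   glob.glob('**/*.csv', recursive=True)
csv_files = glob.glob('*.csv') + glob.glob('CSVs/*.csv')

def normalize(field):
    """Return a 10-digit phone number from a messy field, or None."""
    digits = re.sub(r'\D', '', field)      # strip every non-digit
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]                # drop a leading country code 1
    return digits if len(digits) == 10 else None

print(normalize('(981) 991-0987'))   # 9819910987
print(normalize('13949811123'))      # 3949811123
print(normalize('Foo baR Us'))       # None
```

Note this simple version rejects fields with extra trailing digits (e.g. "9987765543 extension 541"); for those, the capturing-group regex from the answer above is still needed first.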

Taking file name from list and opening it?

I have a list similar to this:
m = [['qw','wew','23','C:/xyz/s.wav'], ['qw','wew','23','C:/xyz/s2.wav'], ['qw','wew','23','C:/xyz/s1.wav']]
Now I want to open these files:
win = wave.open(m[0][3], 'rb')
It is giving an error. How can I use it this way? I want to take the file names from the list.
Please suggest.
do this:
m = [['qw','wew','23','C:/xyz/s.wav'],['qw','wew','23','C:/xyz/s2.wav'],['qw','wew','23','C:/xyz/s1.wav']]
fname = m[0][3]
print 'fname is', repr(fname)
win = wave.open(fname, 'rb')
and show us (by copy/pasting into an edit of your question) everything that is printed, especially:
(1) the result of print 'fname is', repr(fname)
(2) the ERROR MESSAGE
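The answer above is Python 2 (print is a statement). A Python 3 sketch of the same idea, with a tiny placeholder WAV generated on the fly so the open succeeds (the path stands in for the C:/xyz/... entries in the question):

```python
import wave

# Generate a small placeholder WAV so wave.open has something to read.
with wave.open("s.wav", "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 16-bit samples
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8)   # 8 silent frames

m = [['qw', 'wew', '23', 's.wav']]

fname = m[0][3]                 # the file name is the 4th element of the row
print('fname is', repr(fname))  # fname is 's.wav'

win = wave.open(fname, 'rb')
print(win.getnframes())         # 8
win.close()
```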
