Python print and write output end in "..." rather than the complete line

I have tried moving around the strings and variables I am concatenating, using while loops, moving the line where I open the outfile, etc. No matter what I do, my output prints/writes "curl " plus my url variable, and from there it ends in "...", e.g. curl "https://examplesite/...
Does this have something to do with a buffer or slicing problem? Thank you for any and all help. Full code below.
import pandas as pd
# file = open("output.txt","wt")
header_list = ["COLA", "COLB"]
df = pd.read_csv("curl_data.csv", names=header_list)
df_length = len(df)
iterator = 0
with open("output.txt", "w") as file:
    for row in df.iterrows():
        url = '"https://examplesite'
        lic = df.COLA  # use %20 instead of spaces
        name = df.COLB  # use %20 instead of spaces
        group = "example group"  # use %20 instead of spaces
        command = "curl " + url + "license=" + lic + "&name=" + name + "&group=" + group + '"'
        print(command)
        file.write(str(command))
        iterator += 1
        if iterator == 1:
            break
file.close()

Solved. As Imre Kerr suggested in the comments, the problem was with the length of the output.
I changed my for loop to for i in range(len(df)): so it only loops through the dataframe once (as per Barmar's suggestion), and changed the column references in my code from df.COLA to df.loc[i, "COLA"] so that it does not print the whole column every time. This fixed the problem of the lines being too long, and I was able to see the full line for each output string.
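For reference, a minimal sketch of what the corrected loop could look like (column names, file names, and the example URL are taken from the question; the str() conversion is an assumption in case the CSV columns are not read as strings):

import pandas as pd

header_list = ["COLA", "COLB"]
df = pd.read_csv("curl_data.csv", names=header_list)

with open("output.txt", "w") as file:
    for i in range(len(df)):
        # df.loc[i, "COLA"] is a single cell, not the whole column,
        # so the assembled command stays short enough to display in full
        lic = str(df.loc[i, "COLA"])
        name = str(df.loc[i, "COLB"])
        group = "example group"
        command = ('curl "https://examplesite'
                   + "license=" + lic + "&name=" + name + "&group=" + group + '"')
        print(command)
        file.write(command + "\n")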

Related

How to add a Google formula containing commas and quotes to a CSV file?

I'm trying to output a CSV file from Python and make one of the entries a Google Sheets formula.
This is what the formula var would look like:
strLink = "https://xxxxxxx.xxxxxx.com/Interact/Pages/Content/Document.aspx?id=" + strId + "&SearchId=0&utm_source=interact&utm_medium=general_search&utm_term=*"
strLinkCellFormula = "=HYPERLINK(\"" + strLink + "\", \"" + strTitle + "\")"
and then for each row of the CSV I have this:
strCSV = strCSV + strId + ", " + "\"" + strTitle + "\", " + strAuthor + ", " + strDate + ", " + strStatus + ", " + "\"" + strSection + "\", \"" + strLinkCellFormula +"\"\n"
This doesn't quite work; the hyperlink formula for Google Sheets looks like this:
=HYPERLINK(url, title)
and I can't seem to get that comma escaped. So in my sheet I end up with an additional column containing the title, and obviously the formula does not work. Any help would be appreciated.
Try using ; as the formula argument separator. It should work the same.
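For example, the question's formula line with ; instead of , (only the separator changes; nothing inside the cell needs escaping then):

strLinkCellFormula = "=HYPERLINK(\"" + strLink + "\"; \"" + strTitle + "\")"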
Instead of reinventing the wheel, you should write your CSV rows using the built-in csv.writer class. It takes care of escaping any commas and quotes in the data, so you don't need to build your own escape logic, and it avoids the mess of escaping in your strLinkCellFormula = ... and strCSV = strCSV + ... lines.
For example:
import csv

urls = ["https://google.com", "https://stackoverflow.com/", "https://www.python.org/"]
titles = ["Google", "Stack Overflow", "Python"]

with open("file.csv", "w") as fw:
    writer = csv.writer(fw)
    writer.writerow(["Company", "Website"])
    for u, t in zip(urls, titles):
        formula = f'=HYPERLINK("{u}", "Visit {t}")'
        row = [t, formula]
        writer.writerow(row)
Note that in the line formula = ... above, I used the f-string syntax to format the URL and title into the string. I also used apostrophes to define the string, since I knew that the string was going to contain quotation marks and I didn't want to bother escaping them.
This gives the following CSV:
Company,Website
Google,"=HYPERLINK(""https://google.com"", ""Visit Google"")"
Stack Overflow,"=HYPERLINK(""https://stackoverflow.com/"", ""Visit Stack Overflow"")"
Python,"=HYPERLINK(""https://www.python.org/"", ""Visit Python"")"
where the escaping of commas and quotes is already taken care of.
It is also read by Excel/Google Sheets correctly, since it conforms to the standard CSV format.
For your specific case, you'd write to your CSV file like so:
with open(filename, "w") as wf:
    writer = csv.writer(wf)
    writer.writerow(headers)  # if necessary
    for ...:
        strLink = f"https://xxxxxxx.xxxxxx.com/Interact/Pages/Content/Document.aspx?id={strId}&SearchId=0&utm_source=interact&utm_medium=general_search&utm_term=*"
        strLinkCellFormula = f'=HYPERLINK("{strLink}", "{strTitle}")'
        row = [strId, strTitle, strAuthor, strDate, strStatus, strSection, strLinkCellFormula]
        writer.writerow(row)

Python: Writing multiple variables to a file

I'm fairly new to Python and I've written a scraper that prints the data I scrape exactly the way I need it, but I'm having trouble writing the data to a file. I need it to look the same way, and be in the same order, as it does when it prints in IDLE.
import requests
import re
from bs4 import BeautifulSoup
year_entry = raw_input("Enter year: ")
week_entry = raw_input("Enter week number: ")
week_link = requests.get("http://sports.yahoo.com/nfl/scoreboard/?week=" + week_entry + "&phase=2&season=" + year_entry)
page_content = BeautifulSoup(week_link.content)
a_links = page_content.find_all('tr', {'class': 'game link'})
for link in a_links:
    r = 'http://www.sports.yahoo.com' + str(link.attrs['data-url'])
    r_get = requests.get(r)
    soup = BeautifulSoup(r_get.content)
    stats = soup.find_all("td", {'class':'stat-value'})
    teams = soup.find_all("th", {'class':'stat-value'})
    scores = soup.find_all('dd', {"class": 'score'})
    try:
        game_score = scores[-1]
        game_score = game_score.text
        x = game_score.split(" ")
        away_score = x[1]
        home_score = x[4]
        home_team = teams[1]
        away_team = teams[0]
        away_team_stats = stats[0::2]
        home_team_stats = stats[1::2]
        print away_team.text + ',' + away_score + ',',
        for stats in away_team_stats:
            print stats.text + ',',
        print '\n'
        print home_team.text + ',' + home_score +',',
        for stats in home_team_stats:
            print stats.text + ',',
        print '\n'
    except:
        pass
I am totally confused about how to get this to print to a txt file the same way it prints in IDLE. The code is built to only run on completed weeks of the NFL season, so if you test the code, I recommend year = 2014 and week = 12 (or earlier).
Thanks,
JT
To write to a file you need to build up the line as a string, then write that line to a file.
You'd use something like:
# Open/create a file for your output
with open('my_output_file.csv', 'wb') as csv_out:
    ...
    # Your BeautifulSoup code and parsing goes here
    ...
    # Then build up your output strings
    for link in a_links:
        away_line = ",".join([away_team.text, away_score])
        for stats in away_team_stats:
            away_line += "," + stats.text
        home_line = ",".join([home_team.text, home_score])
        for stats in home_team_stats:
            home_line += "," + stats.text
        # Write your output strings to the file
        csv_out.write(away_line + '\n')
        csv_out.write(home_line + '\n')
This is a quick and dirty fix. To do it properly you probably want to look into the csv module (see the docs).
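A rough sketch of the csv version, assuming the same variables as in the scraping loop above and an arbitrary output filename:

import csv

with open('my_output_file.csv', 'wb') as f:   # Python 2: open in binary mode for csv
    writer = csv.writer(f)
    for link in a_links:
        # ... scraping/parsing as in the original code ...
        writer.writerow([away_team.text, away_score] + [s.text for s in away_team_stats])
        writer.writerow([home_team.text, home_score] + [s.text for s in home_team_stats])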
From the structure of your output I agree with Jamie that using CSV is a logical choice.
But since you're using Python 2, it's possible to use an alternate form of the print statement to print to a file.
From https://docs.python.org/2/reference/simple_stmts.html#the-print-statement
print also has an extended form, defined by the second portion of the
syntax described above. This form is sometimes referred to as “print
chevron.” In this form, the first expression after the >> must
evaluate to a “file-like” object, specifically an object that has a
write() method as described above. With this extended form, the
subsequent expressions are printed to this file object. If the first
expression evaluates to None, then sys.stdout is used as the file for
output.
Eg,
outfile = open("myfile.txt", "w")
print >>outfile, "Hello, world"
outfile.close()
However, this syntax is not supported in Python 3, so I guess it's probably not a good idea to use it. :) FWIW, I generally use the file write() method in my code when writing to files, except that I tend to use print >>sys.stderr for error messages.
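For comparison, the Python 3 spelling of the same example passes the file object via print()'s file keyword argument:

with open("myfile.txt", "w") as outfile:
    print("Hello, world", file=outfile)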

Use grep on file in Python

I have searched the grep answers on here and cannot find an answer. They all seem to search for a single string in a file, not a list of strings from a file. I already have a search function that works, but grep does it WAY faster. I have a list of strings in a file sn.txt (one string per line, no delimiters). I want to search another file (Merge_EXP.exp) for lines that have a match and write them out to a new file. The file I am searching has half a million lines, so searching for a few thousand strings in there takes hours without grep.
When I run it from command prompt in windows, it does it in minutes:
grep --file=sn.txt Merge_EXP.exp > Merge_EXP_Out.exp
How can I call this same process from Python? I don't really want alternatives in Python because I already have one that works but takes a while. Unless you think you can significantly improve the performance of that:
def match_SN(serialnumb, Exp_Merge, output_exp):
    fout = open(output_exp,'a')
    f = open(Exp_Merge,'r')
    # skip first line
    f.readline()
    for record in f:
        record = record.strip().rstrip('\n')
        if serialnumb in record:
            fout.write(record + '\n')
    f.close()
    fout.close()

def main(Output_CSV, Exp_Merge, updated_exp):
    # create a blank output
    fout = open(updated_exp,'w')
    # copy header records
    f = open(Exp_Merge,'r')
    header1 = f.readline()
    fout.write(header1)
    header2 = f.readline()
    fout.write(header2)
    fout.close()
    f.close()
    f_csv = open(Output_CSV,'r')
    f_csv.readline()
    for rec in f_csv:
        rec_list = rec.split(",")
        sn = rec_list[2]
        sn = sn.strip().rstrip('\n')
        match_SN(sn, Exp_Merge, updated_exp)
Here is an optimized version in pure Python:
def main(Output_CSV, Exp_Merge, updated_exp):
    output_list = []
    # copy header records
    records = open(Exp_Merge,'r').readlines()
    output_list = records[0:2]
    serials = open(Output_CSV,'r').readlines()
    serials = [x.split(",")[2].strip().rstrip('\n') for x in serials]
    for s in serials:
        items = [x for x in records if s in x]
        output_list.extend(items)
    open(updated_exp, "w").write("".join(output_list))

main("sn.txt", "merge_exp.exp", "outx.txt")
Input
sn.txt:
x,y,0011
x,y,0002
merge_exp.exp:
Header1
Header2
0011abc
0011bcd
5000n
5600m
6530j
0034k
2000lg
0002gg
Output
Header1
Header2
0011abc
0011bcd
0002gg
Try this out and see how much time it takes...
When I used the full path to the grep executable it worked (I pass it grep_loc, Serial_List, and Export):
import os
Export_Dir = os.path.dirname(Export)
Export_Name = os.path.basename(Export)
Output = Export_Dir + "\Output_" + Export_Name
print "\nOutput: " + Output + "\n"
cmd = grep_loc + " --file=" + Serial_List + " " + Export + " > " + Output
print "grep usage: \n" + cmd + "\n"
os.system(cmd)
print "Output created\n"
I think you have not chosen the right title for your question: What you want to do is the equivalent of a database JOIN. You can use grep for that in this particular instance, because one of your files only has keys and no other information. However, I think it is likely (but of course I don't know your case) that in the future your sn.txt may also contain extra information.
So I would solve the generic case. There are multiple solutions:
import all data into a database, then do a LEFT JOIN (in sql) or equivalent
use a python large data tool
For the latter, you could try numpy or, recommended since you are working with strings, pandas. Pandas has an optimized merge routine, which is very fast in my experience (it uses Cython under the hood).
Here is pandas PSEUDO code to solve your problem. It is close to real code, but I would need to know the names of the columns that you want to match on. I assumed here that the one column in sn.txt is called key, and the matching column in merge_exp.exp is called sn. I also see you have two header lines in merge_exp.exp; read the docs for how to handle that.
# PSEUDO CODE (but close)
import pandas
left = pandas.read_csv('sn.txt')
right = pandas.read_csv('merge_exp.exp')
out = pandas.merge(left, right, left_on="key", right_on="sn", how='left')
out.to_csv("outx.txt")

python & pyparsing newb: how to open a file

Paul McGuire, the author of pyparsing, was kind enough to help a lot with a problem I'm trying to solve. We're on 1st down with a yard to goal, but I can't even punt it across the goal line. Confucius said that if he gave a student 1/4 of the solution and the student did not return with the other 3/4, then he would not teach that student again. So it is after almost a week of frustration and with great anxiety that I ask this...
How do I open an input file for pyparsing and print the output to another file?
Here is what I've got so far, but it's really all his work
from pyparsing import *

datafile = open( 'test.txt' )

# Backus-Naur Form
num = Word(nums)
accessionDate = Combine(num + "/" + num + "/" + num)("accDate")
accessionNumber = Combine("S" + num + "-" + num)("accNum")
patMedicalRecordNum = Combine(num + "/" + num + "-" + num + "-" + num)("patientNum")
gleason = Group("GLEASON" + Optional("SCORE:") + num("left") + "+" + num("right") + "=" + num("total"))
patientData = Group(accessionDate + accessionNumber + patMedicalRecordNum)
partMatch = patientData("patientData") | gleason("gleason")
lastPatientData = None

# PARSE ACTIONS
def patientRecord( datafile ):
    for match in partMatch.searchString(datafile):
        if match.patientData:
            lastPatientData = match
        elif match.gleason:
            if lastPatientData is None:
                print "bad!"
                continue
            print "{0.accDate}: {0.accNum} {0.patientNum} Gleason({1.left}+{1.right}={1.total})".format(
                lastPatientData.patientData, match.gleason
            )

patientData.setParseAction(lastPatientData)

# MAIN PROGRAM
if __name__=="__main__":
    patientRecord()
It looks like you need to call datafile.read() in order to read the contents of the file. Right now you are trying to call searchString on the file object itself, not the text in the file. You should really look at the Python tutorial (particularly this section) to get up to speed on how to read files, etc.
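In other words, something along these lines (names taken from the question's code):

datafile = open('test.txt')
text = datafile.read()      # read the file's contents into one string
datafile.close()

for match in partMatch.searchString(text):
    ...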
It seems like you need some help putting it together. The advice of #BrenBarn is spot-on: work with a problem of simpler complexity before you put it all together. I can help by giving you a minimal example of what you are trying to do, with a much simpler grammar. You can use this as a template to learn how to read/write a file in Python. Consider the input text file data.txt:
cat 3
dog 5
foo 7
Let's parse this file and output the results. To have some fun, let's multiply the second column by 2:
from pyparsing import *

# Read the input data
filename = "data.txt"
FIN = open(filename)
TEXT = FIN.read()

# Define a simple grammar for the text, multiply the number column by 2
digits = Word(nums)
digits.setParseAction(lambda x: int(x[0]) * 2)
blocks = Group(Word(alphas) + digits)
grammar = OneOrMore(blocks)

# Parse the results
result = grammar.parseString( TEXT )

# This gives a list of lists
# [['cat', 6], ['dog', 10], ['foo', 14]]

# Open up a new file for the output
filename2 = "data2.txt"
FOUT = open(filename2,'w')

# Walk through the results and write to the file
for item in result:
    print item
    FOUT.write("%s %i\n" % (item[0], item[1]))
FOUT.close()
This writes the following to data2.txt:
cat 6
dog 10
foo 14
Break each piece down until you understand it. From here, you can slowly adapt this minimal example to your more complex problem above. It's OK to read the file in (as long as it is relatively small) since Paul himself notes:
parseFile is really just a simple shortcut around parseString, pretty
much the equivalent of expr.parseString(open(filename).read()).
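So, for the example above, the read-and-parse step could presumably be collapsed to a single call:

result = grammar.parseFile("data.txt")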

Python RegEx nested search and replace

I need to do a RegEx search and replace of all commas found inside of quote blocks.
i.e.
"thing1,blah","thing2,blah","thing3,blah",thing4
needs to become
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
my code:
inFile = open(inFileName,'r')
inFileRl = inFile.readlines()
inFile.close()
p = re.compile(r'["]([^"]*)["]')
for line in inFileRl:
    pg = p.search(line)
    # found comment block
    if pg:
        q = re.compile(r'[^\\],')
        # found comma within comment block
        qg = q.search(pg.group(0))
        if qg:
            # Here I want to reconstitute the line and print it with the replaced text
            #print re.sub(r'([^\\])\,',r'\1\,',pg.group(0))
I need to filter only the columns I want based on a RegEx, filter further,
then do the RegEx replace, then reconstitute the line back.
How can I do this in Python?
The csv module is perfect for parsing data like this, since csv.reader in the default dialect ignores quoted commas, and csv.writer reinserts the quotes due to the presence of commas. I used StringIO to give a file-like interface to a string.
import csv
import StringIO

s = '''"thing1,blah","thing2,blah","thing3,blah"
"thing4,blah","thing5,blah","thing6,blah"'''

source = StringIO.StringIO(s)
dest = StringIO.StringIO()

rdr = csv.reader(source)
wtr = csv.writer(dest)
for row in rdr:
    wtr.writerow([item.replace('\\,',',').replace(',','\\,') for item in row])

print dest.getvalue()
result:
"thing1\,blah","thing2\,blah","thing3\,blah"
"thing4\,blah","thing5\,blah","thing6\,blah"
General Edit
There was
"thing1\\,blah","thing2\\,blah","thing3\\,blah",thing4
in the question, and now it is not there anymore.
Moreover, I hadn't noticed the r'[^\\],' pattern.
So, I have completely rewritten my answer.
"thing1,blah","thing2,blah","thing3,blah",thing4
and
"thing1\,blah","thing2\,blah","thing3\,blah",thing4
both being displayed representations of strings (I suppose):
import re

ss = '"thing1,blah","thing2,blah","thing3\,blah",thing4 '
regx = re.compile('"[^"]*"')

def repl(mat, ri = re.compile('(?<!\\\\),') ):
    return ri.sub('\\\\',mat.group())

print ss
print repr(ss)
print
print regx.sub(repl, ss)
print repr(regx.sub(repl, ss))
result
"thing1,blah","thing2,blah","thing3\,blah",thing4
'"thing1,blah","thing2,blah","thing3\\,blah",thing4 '
"thing1\blah","thing2\blah","thing3\,blah",thing4
'"thing1\\blah","thing2\\blah","thing3\\,blah",thing4 '
You can try this regex.
>>> re.sub('(?<!"),(?!")', r"\\,",
'"thing1,blah","thing2,blah","thing3,blah",thing4')
#Gives "thing1\,blah","thing2\,blah","thing3\,blah",thing4
The logic behind this is to substitute a , with \, if it is not immediately both preceded and followed by a "
I came up with an iterative solution using several regex functions:
finditer(), findall(), group(), start() and end()
There's a way to turn all this into a recursive function that calls itself.
Any takers?
outfile = open(outfileName,'w')
p = re.compile(r'["]([^"]*)["]')
q = re.compile(r'([^\\])(,)')
for line in outfileRl:
    pg = p.finditer(line)
    pglen = len(p.findall(line))
    if pglen > 0:
        mpgstart = 0
        mpgend = 0
        for i, mpg in enumerate(pg):
            if i == 0:
                outfile.write(line[:mpg.start()])
            qg = q.finditer(mpg.group(0))
            qglen = len(q.findall(mpg.group(0)))
            if i > 0 and i < pglen:
                outfile.write(line[mpgend:mpg.start()])
            if qglen > 0:
                for j, mqg in enumerate(qg):
                    if j == 0:
                        outfile.write( mpg.group(0)[:mqg.start()] )
                    outfile.write( re.sub(r'([^\\])(,)',r'\1\\\2',mqg.group(0)) )
                    if j == (qglen-1):
                        outfile.write( mpg.group(0)[mqg.end():] )
            else:
                outfile.write(mpg.group(0))
            if i == (pglen-1):
                outfile.write(line[mpg.end():])
            mpgstart = mpg.start()
            mpgend = mpg.end()
    else:
        outfile.write(line)
outfile.close()
Have you looked into str.replace()?
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old
replaced by new. If the optional argument count is given, only the
first count occurrences are replaced.
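For instance, a quick interactive check of the basic call (illustrative values only):

>>> "a,b,c".replace(",", "\\,")
'a\\,b\\,c'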
Here is some documentation.
Hope this helps.
