Given these two different string interpolation stanzas, neither one works. Both return the {n} and %(text)s inserts as plain, raw text in the output. What am I missing?
I've been using the %(string)s method forever in Python 2, and am now porting to Python 3.
bulk_string = (
    "bulk insert SomeDB.dbo.%(lru)s\n" +
    "from 'C:\\Someplace\\"
    "project\\%(filename)s'\n" +
    "with (\n" +
    " FIELDTERMINATOR = ',',\n" +
    " ROWTERMINATOR = '%(term)s'\n" +
    ");"
    % {
        'lru': lru,
        'filename': filename,
        'term': "\n"
    }
)
and:
bulk_string = (
    "bulk insert SomeDB.dbo.{0}\n" +
    "from 'C:\\Someplace\\"
    "project\\{1}'\n" +
    "with (\n" +
    " FIELDTERMINATOR = ',',\n" +
    " ROWTERMINATOR = '{2}'\n" +
    ");"
    .format(lru, filename, "\n")
)
Either .format or % applies only to the last of the strings you are adding together, because those operations bind more tightly than +. You could use """ (triple-quoted strings), or parenthesize the strings so the formatting applies to the whole expression (you did parenthesize, but not in the right place).
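A minimal illustration of the precedence issue (a made-up snippet, not the original query):

s = "a = {0}, " + "b = {1}".format(1, 2)
print(s)  # a = {0}, b = 2  -- only the last literal was formatted

With the closing parenthesis placed before .format, the whole concatenation gets formatted: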
bulk_string = (
    "bulk insert SomeDB.dbo.{0}\n" +
    "from 'C:\\Someplace\\"
    "project\\{1}'\n" +
    "with (\n" +
    " FIELDTERMINATOR = ',',\n" +
    " ROWTERMINATOR = '{2}'\n" +
    ");"
).format(lru, filename, "\\n")
or with triple quotes / a raw string / automatic format positioning:
bulk_string = r"""bulk insert SomeDB.dbo.{}
from 'C:\Someplace\project\{}'
with (
FIELDTERMINATOR = ',',
ROWTERMINATOR = '{}'
);""".format(lru, filename, "\\n")
Aside: the third argument of format should be "\\n" or r"\n" if you want to produce a literal \n in the generated SQL.
Here is the most readable way to do this:
In Python 3.6+, you can use literal string interpolation, or f-strings (see PEP 498) - just be careful to escape the backslashes properly, especially in front of a curly bracket, as in C:\Someplace\project\\{filename}
lru = "myTable"
filename = "myFile"
rt = "\\n"
bulk_string = f"""bulk insert SomeDB.dbo.{lru}
from 'C:\Someplace\project\\{filename}'
with ( FIELDTERMINATOR = ',',
ROWTERMINATOR = '{rt}');"""
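With the sample values above, print(bulk_string) produces something like:

bulk insert SomeDB.dbo.myTable
from 'C:\Someplace\project\myFile'
with ( FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n');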
Can someone explain to me what
nextLine().split("\\s+")
does, and how I would convert it to Python?
Thanks. I wanted to use it, but it's in Java.
split takes an argument, possibly a regular expression (as in your case), and uses it as a delimiter. Here, the regex is simply \s+ (the extra backslash escapes the backslash in the Java string literal), where \s denotes any sort of whitespace and + means "one or more". So if I have the string "Hello   world  !  ." the output will be ["Hello", "world", "!", "."].
In Python, you need to use the re library for this functionality:
re.split(r"\s+", input_str)
Or, just for this specific case (as #Kurt pointed out), input_str.split() will do the trick.
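A quick sketch comparing the two (the sample string is made up):

import re

input_str = "Hello   world  !  ."
print(re.split(r"\s+", input_str))  # ['Hello', 'world', '!', '.']
print(input_str.split())            # same result: split() with no argument splits on any run of whitespace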
nextLine() is used to read a line of input, and split("\\s+") will split it into a list of elements based on a delimiter; in this case the delimiter is the regex \\s+.
The equivalent in Python, using the re module, is this:
import re
s = input()
sub_s = re.split(r"\s+", s)
# hello and welcome everyone
# ['hello', 'and', 'welcome', 'everyone']
Code in Java:
import java.util.*;

public class MyClass {
    public static void main(String args[]) {
        String s = "Hello my Wonderful\nWorld!";
        // nextLine()
        Scanner scanner = new Scanner(s);
        System.out.println("'" + scanner.nextLine() + "'");
        System.out.println("'" + scanner.nextLine() + "'");
        scanner.close();
        // nextLine().split("\\s+")
        scanner = new Scanner(s);
        String str[] = scanner.nextLine().split("\\s+");
        System.out.println("*" + str[2] + "*");
        scanner.close();
    }
}
Python:
s = "Hello my Wonderful\nWorld!";
o = s.split("\n")
print ("'" + o[0] + "'")
print ("'" + o[1] + "'")
'''
resp. use of
i = s.find('\n')
print (s[:i])
print (s[i+1:])
e.g.
def get_lines(str):
start = 0
end = 0
sub = '\n'
while True:
end = str.find(sub, start)
if end==-1:
yield str[start:]
return
else:
yield str[start:end]
start = end + 1
i = iter(get_lines(s))
print ("'" + next (i) + "'")
print ("'" + next (i) + "'")
'''
o = s.split()
print ("*" + o[2] + "*")
Output:
'Hello my Wonderful'
'World!'
*Wonderful*
I am trying to convert a string value into a dictionary. I have to respect this syntax, but when I try to convert the string value into a dictionary I get an error: json.loads treats the nested braces as part of a string instead of as a nested dictionary.
Do you have any idea how to resolve this?
import json
dict_info = "{\"strategy\" : \"" + "ok" + "\", \"aa\" : \"" + "{\"strategy\" : \"" +
"strg" + "\"}" + "\"}"
json.loads(dict_info)
Thank you
This is because you are specifying that the value for the key aa is a string, by adding \" before and after the value,
i.e. this part: \"" + "{\"strategy\" : \"" + "strg" + "\"}" + "\"
If you remove those quotes, the code works:
Code:
import json
dict_info = "{\"strategy\" : \"" + "ok" + "\", \"aa\" : " + "{\"strategy\" : \"" + "strg" + "\"}" + "}"
d = json.loads(dict_info)
print(d)
print(d['aa'])
Output:
{'strategy': 'ok', 'aa': {'strategy': 'strg'}}
{'strategy': 'strg'}
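As an aside (a sketch, not part of the original answer): if the goal is just to produce that JSON string, it is usually easier to build the nested structure as Python dicts and let json.dumps handle all the quoting:

import json

info = {"strategy": "ok", "aa": {"strategy": "strg"}}
dict_info = json.dumps(info)  # '{"strategy": "ok", "aa": {"strategy": "strg"}}'
d = json.loads(dict_info)
print(d['aa'])                # {'strategy': 'strg'}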
In an attempt to utilize a foreign table for PostgreSQL log analysis, I set log_destination=csvlog. Because our application uses large JSON/XML data, which get logged in the messages, from day 1 we have been getting errors like the following, because the large contents in the message or detail field get truncated due to a system limit.
ERROR: extra data after last expected column
CONTEXT: COPY postgresql_log_1, line 268367: "2017-05-17 09:46:37.419 PDT,"user","dbname",75303,"10.181.55.93:50206",591549a8.12627,11820,"I..."
pgBadger similarly errors out on large contents, like the following; I have submitted an issue at https://github.com/dalibo/pgbadger/issues/342.
FATAL: cannot use CSV on /var/log/postgresql/postgresql-9.5-proddb.csv.1, EIQ - QUO character not allowed at line 766714
DETAIL: 2017-04-19 12:45:05.389 PDT,"user","dbname",56389,"10.181.55.94:50466",58f6f766.dc45,3870,"INSERT",2017-04-18 22:36:38 PDT,71/104232,3111351054,LOG,00000,"duration: 82.541 ms execute <unnamed>: insert into EVENT_NOTIFICATION (ACTOR_ID, CREATED, DATA, TYPE, UPDATED, ID) values ($1, $2, $3, $4, $5, $6)","parameters: $1 = NULL, $2 = '2017-04-19 12:45:05.245-07', $3 = '{""productID"":""093707231228"",""fileData"":{""name"":""EPSON010.JPG"",""mimeType"":""image/jpeg"",""content"":""/9j/4QEGRXhpZgAASUkqAAgAAAAIAA4BAgAQAA...
...AUUAF2017-04-19 12:45:11.174 PDT,"user","dbname",56389,"10.181.55.94:50466",58f6f766.dc45,3871,"SELECT",2017-04-18 22:36:38 PDT,71/104241,0,LOG,00000,"duration: 125.202 ms execute <unnamed>: select max(cast(version as integer)) from category where org = $1","parameters: $1 = 'ST'",,,,,,,,""
reset CSV parser
I ended up with a workaround: the following Python script repairs the bad CSV contents by dropping the few malformed CSV records from the log, so I can now use a foreign table to analyze my csvlog via SQL queries.
I figure this must be a common issue for a lot of Postgres folks out there, and I hope to find a more elegant approach. Kindly share your solution on this post if you have one.
Here is mine for now.
#!/usr/bin/env python
import csv
import os
import sys
import linecache as lc
import pyparsing as pp
from datetime import datetime
csv.field_size_limit(sys.maxsize) # Max out the limit for large CSV contents
filename = sys.argv[1]
headers = [ 'log_time', 'user_name', 'database_name', 'process_id', 'connection_from',
'session_id', 'session_line_num', 'command_tag', 'session_start_time',
'virtual_transaction_id', 'transaction_id', 'error_severity', 'sql_state_code',
'message', 'detail', 'hint', 'internal_query', 'internal_query_pos', 'context',
'query', 'query_pos', 'location', 'application_name' ]
'''
Identify ranges of lines containing invalid CSV data
'''
bad_ranges = []
l_start = 0
with open(filename) as f:
    reader = csv.DictReader(f, fieldnames=headers)
    for csv_dict in reader:
        # Extraneous columns beyond the predefined headers, keyed as None, indicate a bad csvlog.
        if None in csv_dict:
            bad_ranges += [(csv_dict, l_start, reader.line_num + 1,)]
        else:
            try:  # Validate datetime format on log_time.
                datetime.strptime(csv_dict['log_time'], '%Y-%m-%d %H:%M:%S.%f %Z')
            except ValueError:
                bad_ranges += [(csv_dict, l_start, reader.line_num + 1,)]
        l_start = reader.line_num + 1
    line_count = reader.line_num + 1
yyyy = pp.Word(pp.nums, exact=4).setName("yyyy")
mm = pp.Word(pp.nums, exact=2).setName("mm")
dd = pp.Word(pp.nums, exact=2).setName("dd")
HH24 = pp.Word(pp.nums, exact=2).setName("HH24")
MI = pp.Word(pp.nums, exact=2).setName("MI")
SS = pp.Word(pp.nums, exact=2).setName("SS")
TZ = pp.Word(pp.alphas.upper(), exact=3).setName("TZ")
date = yyyy + "-" + mm + "-" + dd
time = HH24 + ":" + MI + ":" + SS + pp.Optional("." + pp.Word(pp.nums, max=3)) + " " + TZ
timestamptz = pp.Combine(date + " " + time)
mlDblQuoteString = pp.QuotedString('"', escQuote='""', multiline=True)
slDblQuoteString = pp.QuotedString('"', escQuote='""', multiline=False)
comma = pp.Suppress(',')
validCSVLog = timestamptz("log_time") + comma \
    + slDblQuoteString('user_name') + comma \
    + slDblQuoteString('database_name') + comma \
    + pp.Word(pp.nums)('process_id') + comma \
    + slDblQuoteString('connection_from') + comma \
    + pp.Word(pp.hexnums + ".")('session_id') + comma \
    + pp.Word(pp.nums)('session_line_num') + comma \
    + slDblQuoteString('command_tag') + comma \
    + timestamptz('session_start_time') + comma \
    + pp.Combine(pp.Word(pp.nums) + pp.Literal("/")
                 + pp.Word(pp.nums))('virtual_transaction_id') + comma \
    + pp.Word(pp.nums)('transaction_id') + comma \
    + pp.Word(pp.alphas.upper())('error_severity') + comma \
    + pp.Word(pp.alphanums)('sql_state_code') + comma \
    + pp.Optional(mlDblQuoteString)('message') + comma \
    + pp.Optional(mlDblQuoteString)('detail') + comma \
    + pp.Optional(mlDblQuoteString)('hint') + comma \
    + pp.Optional(mlDblQuoteString)('internal_query') + comma \
    + pp.Optional(pp.Word(pp.nums))('internal_query_pos') + comma \
    + pp.Optional(mlDblQuoteString)('context') + comma \
    + pp.Optional(mlDblQuoteString)('query') + comma \
    + pp.Optional(pp.Word(pp.nums))('query_pos') + comma \
    + pp.Optional(slDblQuoteString)('location') + comma \
    + pp.Optional(slDblQuoteString)('application_name') + pp.LineEnd().suppress()
'''
1. Scan for any valid CSV data to salvage from the malformed contents.
2. Make a new copy without the malformed CSV rows.
'''
if bad_ranges:
    l_lower = 0
    with open(filename + '.new', 'w') as t:
        for bad_dict, l_start, l_end in bad_ranges:
            t.writelines([lc.getline(filename, l) for l in range(l_lower, l_start)])
            bad_csv = ''.join([lc.getline(filename, l) for l in range(l_start, l_end)])
            print("{0:>8}: line[{1}:{2}] log_time={log_time} malformed CSV row found".format(
                'NOTICE', l_start, l_end - 1, **bad_dict))
            for valid_dict, c_start, c_end in validCSVLog.scanString(bad_csv):
                print("{0:>8}: line[{1}:{2}] log_time={log_time} as valid CSV portion retained".format(
                    'INFO', l_start, l_end - 1, **valid_dict))
                good_csv = bad_csv[c_start:c_end]
                t.write(good_csv)
            l_lower = l_end
        t.writelines([lc.getline(filename, l) for l in range(l_lower, line_count)])
    # Back up old file as .bak
    backup = filename + '.bak'
    os.rename(filename, backup)
    print("{0:>8}: original file renamed to {1}".format('NOTICE', backup))
    os.rename(filename + '.new', filename)
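A quick sanity check that could be run after the repair (a sketch, not part of the original script; it reuses the filename and headers variables from above): every remaining row should now parse into exactly the predefined columns.

with open(filename) as f:
    for row in csv.DictReader(f, fieldnames=headers):
        assert None not in row, "row still has extra columns: %s" % row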
I am trying to build a string that needs to contain specific double and single quotation characters for executing a SQL expression.
I need my output to be formatted like this:
" "Full_Stree" = 'ALLENDALE RD' "
where the value ALLENDALE RD will come from a variable defined in a for loop. In the following code sample, the variable tOS is what I am trying to pass into the query variable.
tOS = "ALLENDALE RD"
query = '" "Full_Stree" = ' + "'" + tOS + "' " + '"'
and when I print the value of query variable I get this output:
'" "Full_Stree" = \'ALLENDALE RD\' "'
The slashes are causing my query to fail. I also tried using a modulus operator to pass the value of the tOS variable, but get the same results:
where = '" "Full_Stree" = \'%s\' "' % (tOS)
print where
'" "Full_Stree" = \'ALLENDALE RD\' "'
How can I get my string concatenated into the correct format, leaving the slashes out of the expression?
What you are seeing is the repr of your string.
>>> s = '" "Full_Stree" = \'ALLENDALE RD\' "'
>>> s # without print console displays the repr
'" "Full_Stree" = \'ALLENDALE RD\' "'
>>> print s # with print the string itself is displayed
" "Full_Stree" = 'ALLENDALE RD' "
Your real problem is the extra quotes at the beginning and end of your where-clause.
This
query = '" "Full_Stree" = ' + "'" + tOS + "' " + '"'
should be
query = '"Full_Stree" = ' + "'" + tOS + "'"
It is more clearly written as
query = """"Full_Stree" = '%s'""" % tOS
The ArcGIS docs recommend something more like this:
dataset = '/path/to/featureclass/shapefile/or/table'
field = arcpy.AddFieldDelimiters(dataset, 'Full_Stree')
whereclause = "%s = '%s'" % (field, tOS)
arcpy.AddFieldDelimiters makes sure that the field name includes the proper quoting style for the dataset you are using (some use double-quotes and some use square brackets).
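The resulting where clause can then be fed straight to a cursor. A sketch with a hypothetical dataset path and field value (assumes an arcpy environment):

import arcpy

dataset = 'C:/data/streets.shp'  # hypothetical path
field = arcpy.AddFieldDelimiters(dataset, 'Full_Stree')
whereclause = "%s = '%s'" % (field, 'ALLENDALE RD')
# Iterate only over the rows matching the where clause
with arcpy.da.SearchCursor(dataset, ['Full_Stree'], where_clause=whereclause) as cursor:
    for row in cursor:
        print(row[0])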
Somehow the way I already tried worked out:
where = '" "Full_Stree" = \'%s\' "' % (tOS)
print where
'" "Full_Stree" = \'ALLENDALE RD\' "'
Can't you just use triple quotes?
a=""" "Full_Street" = 'ALLENDALE RD' """
print a
"Full_Street" = 'ALLENDALE RD'
When I tried to parse a CSV file exported by an MS SQL 2005 Express Edition query, the string Python gives me is totally unexpected. For example, if the line in the CSV file is "aaa,bbb,ccc,dddd", then when Python parses it as a string it becomes something like "a a a a , b b b , c c c , d d d d". What is happening?
I tried to remove the spaces in the code, but that doesn't work.
import os
import random

f1 = open('a.txt', 'r')
f2 = open('dec_sql.txt', 'w')

text = 'abc'
while text != '':
    text = f1.readline()
    if text == '':
        break
    splited = text.split(',')
    for i in range(0, 32):
        splited[i] = splited[i].replace(' ', '')
    sql = 'insert into dbo.INBOUND_RATED_DEC2010 values ('
    sql += '\'' + splited[0] + '\', '
    sql += '\'' + splited[1] + '\', '
    sql += '\'' + splited[2] + '\', '
    sql += '\'' + splited[3] + '\', '
    sql += '\'' + splited[4] + '\', '
    sql += '\'' + splited[5] + '\', '
    sql += '\'' + splited[6] + '\', '
    sql += '\'' + splited[7] + '\', '
    sql += '\'' + splited[8] + '\', '
    sql += '\'' + splited[9] + '\', '
    sql += '\'' + splited[10] + '\', '
    sql += '\'' + splited[11] + '\', '
    sql += '\'' + splited[12] + '\', '
    sql += '\'' + splited[13] + '\', '
    sql += '\'' + splited[14] + '\', '
    sql += '\'' + splited[15] + '\', '
    sql += '\'' + splited[16] + '\', '
    sql += '\'' + splited[17] + '\', '
    sql += '\'' + splited[18] + '\', '
    sql += '\'' + splited[19] + '\', '
    sql += '\'' + splited[20] + '\', '
    sql += '\'' + splited[21] + '\', '
    sql += '\'' + splited[22] + '\', '
    sql += '\'' + splited[23] + '\', '
    sql += '\'' + splited[24] + '\', '
    sql += '\'' + splited[25] + '\', '
    sql += '\'' + splited[26] + '\', '
    sql += '\'' + splited[27] + '\', '
    sql += '\'' + splited[28] + '\', '
    sql += '\'' + splited[29] + '\', '
    sql += '\'' + splited[30] + '\', '
    sql += '\'' + splited[31] + '\', '
    sql += '\'' + splited[32] + '\' '
    sql += ')'
    print sql
    f2.write(sql + '\n')

f2.close()
f1.close()
Sounds to me like the output of the MS SQL 2005 query is a Unicode file. The Python 2 csv module cannot handle Unicode files, but there is some sample code in the documentation for the csv module describing how to work around the problem.
Alternatively, some text editors allow you to save a file with a different encoding. For example, I opened the results of an MS SQL 2005 query in Notepad++ and it told me the file was UCS-2 encoded, and I was able to convert it to UTF-8 from the Encoding menu.
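The same conversion can also be done in code rather than in an editor (a rough sketch; the file names are made up and it assumes the export really is UTF-16/UCS-2):

import codecs

src = codecs.open('query_result.csv', 'r', 'utf-16')       # decode the exported file
dst = codecs.open('query_result_utf8.csv', 'w', 'utf-8')   # re-encode as UTF-8
dst.write(src.read())
src.close()
dst.close()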
Try to open the file in notepad and use the replace all function to replace ' ' with ''
Your file is most likely encoded with a 2-byte character encoding - most likely UTF-16 (but it could be some other encoding).
To get the csv module reading it properly, you would open it with a codec so that it is decoded as it is read - that way you have Unicode objects (not str objects) inside your Python program.
So, instead of opening the file with
my_file = open ("data.dat", "rt")
Use:
import codecs
my_file = codecs.open("data.dat", "rt", "utf-16")
And then feed this to the CSV module, with:
import csv

reader = csv.reader(my_file)
first_line = True
for line in reader:
    if first_line:  # skip the header line
        first_line = False
        continue
    # assemble sql query and issue it
Another thing: constructing your "query" from 33 lines of repetitive code is not a nice thing to do when programming. Even in languages that lack rich string-processing facilities there are better ways to do it, but in Python you can simply write:
sql = 'insert into dbo.INBOUND_RATED_DEC2010 values (%s);' % ", ".join("'%s'" % value for value in splited )
instead of those 33 lines assembling your query. (I am telling it to insert a string inside the parentheses of the first string. After the % operator, the string ", " is used with the join method, which glues together all the elements of the sequence passed to join. That sequence is a generator producing, for each value in your splited list, the value enclosed in single quotes.)
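A tiny illustration of that join idiom with made-up values:

splited = ['1', 'foo', 'bar']
print(", ".join("'%s'" % value for value in splited))
# '1', 'foo', 'bar'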
It may help to use Python's built-in CSV reader. It looks like an issue with Unicode, a problem that frustrated me a lot.
import re
import csv
import tkFileDialog

ENCODING_REGEX_REPLACEMENT_LIST = [(re.compile('\xe2\x80\x99'), "'"),
                                   (re.compile('\xe2\x80\x94'), "--"),
                                   (re.compile('\xe2\x80\x9c'), '"'),
                                   (re.compile('\xe2\x80\x9d'), '"'),
                                   (re.compile('\xe2\x80\xa6'), '...')]
def correct_encoding(csv_row):
    for key in csv_row.keys():
        # if there is a value for the current key
        if csv_row[key]:
            try:
                csv_row[key] = unicode(csv_row[key], errors='strict')
            except ValueError:
                # we have a bad encoding, try iterating through all the known
                # bad encodings in the ENCODING_REGEX_REPLACEMENT_LIST and replace
                # everything, then try again
                for (regex, replacement) in ENCODING_REGEX_REPLACEMENT_LIST:
                    csv_row[key] = regex.sub(replacement, csv_row[key])
                print(csv_row)
                csv_row[key] = unicode(csv_row[key])
        # if there is NOT a value for the current key
        else:
            csv_row[key] = unicode('')
    return csv_row

filename = tkFileDialog.askopenfilename()
csv_reader = csv.DictReader(open(filename, "rb"), dialect='excel')  # assuming similar dialect
for csv_row in csv_reader:
    csv_row = correct_encoding(csv_row)
    # your application logic here