Trying to learn some geospatial Python. I'm more or less following the class notes here.
My Code
#!/usr/bin/python
# import modules
import ogr, sys, os
# set working dir
os.chdir('/home/jacques/misc/pythongis/data')
# create the text file we're writing to
file = open('data_export.txt', 'w')
# import the required driver for .shp
driver = ogr.GetDriverByName('ESRI Shapefile')
# open the datasource
data = driver.Open('road_surveys.shp', 1)
if data is None:
    print 'Error, could not locate file'
    sys.exit(1)
# grab the datalayer
layer = data.GetLayer()
# loop through the features
feature = layer.GetNextFeature()
while feature:
    # acquire attributes
    id = feature.GetFieldAsString('Site_Id')
    date = feature.GetFieldAsString('Date')
    # get coordinates
    geometry = feature.GetGeometryRef()
    x = str(geometry.GetX())
    y = str(geometry.GetY()
    # write to the file
    file.Write(id + ' ' + x + ' ' + y + ' ' + cover + '\n')
    # remove the current feature, and get a new one
    feature.Destroy()
    feature = layer.GetNextFeature()
# close the data source
datasource.Destroy()
file.close()
Running that gives me the following:
File "shape_summary.py", line 38
file.write(id + ' ' + x + ' ' + y + ' ' + cover + '\n')
^
SyntaxError: invalid syntax
Running Python 2.7.1
Any help would be fantastic!
The previous line is missing a closing parenthesis:
y = str(geometry.GetY())
Also, just a style comment: it's a good idea to avoid using the variable name file in Python, because it already has a meaning (it's a built-in type). Try opening a new Python session and running help(file).
1) write shouldn't be uppercase in your code (Python is case-sensitive)
2) make sure id is a string; if it isn't, use str(id) in your expression, and do the same for cover, x, and y
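Putting those fixes together, here's a minimal corrected version of the script. It's a sketch, not the canonical answer: it assumes the shapefile has a 'Cover' field to supply the cover variable (which the original never defines), renames file to outfile per the style note above, and calls Destroy() on data rather than the undefined datasource.
#!/usr/bin/python
import ogr, sys, os

os.chdir('/home/jacques/misc/pythongis/data')
# avoid shadowing the built-in name 'file'
outfile = open('data_export.txt', 'w')

driver = ogr.GetDriverByName('ESRI Shapefile')
data = driver.Open('road_surveys.shp', 1)
if data is None:
    print 'Error, could not locate file'
    sys.exit(1)

layer = data.GetLayer()
feature = layer.GetNextFeature()
while feature:
    id = feature.GetFieldAsString('Site_Id')
    date = feature.GetFieldAsString('Date')
    cover = feature.GetFieldAsString('Cover')    # assumed field name
    geometry = feature.GetGeometryRef()
    x = str(geometry.GetX())
    y = str(geometry.GetY())                     # closing parenthesis added
    outfile.write(id + ' ' + x + ' ' + y + ' ' + cover + '\n')  # lowercase write()
    feature.Destroy()
    feature = layer.GetNextFeature()

data.Destroy()
outfile.close()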
I am trying to get each entity to draw a certain type of graph using the Understand Python API. All the inputs are good and the database is opened, but the single item is not drawn. There is no error and no output file. The code is listed below. The Python code is called from a C# application, which calls upython.exe with the associated arguments.
The Python file receives the SciTools directory and opens the Understand database. The entity names are also passed in as an argument, a path to a temp JSON file. The outputPath references the directory where the SVG file will be placed. I can't seem to figure out why the item.draw method isn't working.
import os
import sys
import argparse
import re
import json

#parse arguments
parser = argparse.ArgumentParser(description='Creates butterfly or call by graphs from an Understand DB')
parser.add_argument('PathToSci', type=str, help="Path to Sci Understand libraries")
parser.add_argument('PathToDB', type=str, help="Path to the Database you want to create graphs from")
parser.add_argument('PathToOutput', type=str, help='Path to where the graphs should be outputted')
parser.add_argument('TypeOfGraph', type=str,
                    help="The type of graph you want to generate. Same names as in Understand GUI. IE 'Butterfly' 'Called By' 'Control Flow' ")
parser.add_argument("entities", help='Path to json list file of Entity long names you wish to create graphs for')
args, unknown = parser.parse_known_args()

# they may have entered a path with a space broken into multiple strings
if len(unknown) > 0:
    print("Unknown argument entered.\n Note: Individual arguments must be passed as a single string.")
    quit()

pathToSci = args.PathToSci
pathToDB = args.PathToDB
graphType = args.TypeOfGraph
entities = json.load(open(args.entities))
pathToOutput = args.PathToOutput

pathToSci = os.path.join(pathToSci, "Python")
sys.path.append(pathToSci)
import understand

db = understand.open(pathToDB)
count = 0
for name in entities:
    count += 1
    print("Completed: " + str(count) + "/" + str(len(entities)))
    #if it is an empty name don't make a graph
    if len(name) == 0:
        break
    pattern = re.compile((name + '$').replace("\\", "/"))
    print("pattern: " + str(pattern))
    sys.stdout.flush()
    ent = db.lookup(pattern)
    print("ent: " + str(ent))
    sys.stdout.flush()
    print("Type: " + str(type(ent[0])))
    sys.stdout.flush()
    for item in ent:
        try:
            filename = os.path.join(pathToOutput, item.longname() + ".svg")
            print("Graph Type: " + graphType)
            sys.stdout.flush()
            print("filename: " + filename)
            sys.stdout.flush()
            print("Item Kind: " + str(ent[0].kind()))
            sys.stdout.flush()
            item.draw(graphType, filename)
        except understand.UnderstandError:
            print("error creating graph")
            sys.stdout.flush()
        except Exception as e:
            print("Could not create graph for " + item.kind().longname() + ": " + item.longname())
            sys.stdout.flush()
            print(e)
            sys.stdout.flush()
db.close()
The output is below:
Completed: 1/1
pattern: re.compile('CSC03.SIN_COS$')
ent: [#lCSC03.SIN_COS#kacsc03.sin_cos(long_float,csc03.sctype)long_float#f./../../../IOSSP/Source_Files/OGP/OGP_71/csc03/csc03.ada]
Type: <class 'understand.Ent'>
Graph Type: Butterfly
filename: C:\Users\M73720\Documents\DFS\DFS-OGP-25-Aug-2022-11-24\SVGs\Entities\CSC03.SIN_COS.svg
Item Kind: Function
It turns out that it was a problem in the Understand API. The latest build corrected the problem. This was found by talking with the SciTools support group.
So I have a rather general question I was hoping to get some help with. I put together a Python program that runs through and automates workflows at the state level for all the different counties. The entire program was created for research at school, not actual state work. Anyway, I have two designs, shown below. The first is an updated version; it takes about 40 minutes to run. The second is the original work. Note that it is not a well-structured design, yet it runs the entire program in about five minutes. Could anybody give any insight into why there is such a difference between the two? The updated version is still ideal, as it is much more reusable (it can run and grab any dataset at the URL) and easier to understand. Furthermore, 40 minutes to get about a hundred workflows completed is still a plus. Also, this is still a work in progress; a couple of minor issues in the code still need to be addressed, but it is still a pretty cool program.
Updated Design
import os, sys, urllib2, urllib, zipfile, arcpy
from arcpy import env

path = os.getcwd()

def pickData():
    myCount = 1
    path1 = 'path2URL'
    response = urllib2.urlopen(path1)
    print "Enter the name of the files you need"
    numZips = raw_input()
    numZips2 = numZips.split(",")
    myResponse(myCount, path1, response, numZips2)

def myResponse(myCount, path1, response, numZips2):
    myPath = os.getcwd()
    for each in response:
        eachNew = each.split(" ")
        eachCounty = eachNew[9].strip("\n").strip("\r")
        try:
            myCountyDir = os.mkdir(os.path.expanduser(myPath + "\\counties" + "\\" + eachCounty))
        except:
            pass
        myRetrieveDir = myPath + "\\counties" + "\\" + eachCounty
        os.chdir(myRetrieveDir)
        myCount += 1
        response1 = urllib2.urlopen(path1 + eachNew[9])
        for all1 in response1:
            allNew = all1.split(",")
            allFinal = allNew[0].split(" ")
            allFinal1 = allFinal[len(allFinal)-1].strip(" ").strip("\n").strip("\r")
            numZipsIter = 0
            path8 = path1 + eachNew[9][0:len(eachNew[9])-2] + "/" + allFinal1
            downZip = eachNew[9][0:len(eachNew[9])-2] + ".zip"
            while numZipsIter < len(numZips2):
                if (numZips2[numZipsIter][0:3].strip(" ") == "NWI") and ("remap" not in allFinal1):
                    numZips2New = numZips2[numZipsIter].split("_")
                    if (numZips2New[0].strip(" ") in allFinal1 and numZips2New[1] != "remap" and numZips2New[2].strip(" ") in allFinal1) and (allFinal1[-3:] == "ZIP" or allFinal1[-3:] == "zip"):
                        urllib.urlretrieve(path8, allFinal1)
                        zip1 = zipfile.ZipFile(myRetrieveDir + "\\" + allFinal1)
                        zip1.extractall(myRetrieveDir)
                #maybe just have numzips2 (raw input) as the values before the county number
                #numZips2[numZipsIter][0:-7].strip(" ") in allFinal1 or numZips2[numZipsIter][0:-7].strip(" ").lower() in allFinal1) and (allFinal1[-3:]=="ZIP" or allFinal1[-3:]=="zip"
                elif (numZips2[numZipsIter].strip(" ") in allFinal1 or numZips2[numZipsIter].strip(" ").lower() in allFinal1) and (allFinal1[-3:] == "ZIP" or allFinal1[-3:] == "zip"):
                    urllib.urlretrieve(path8, allFinal1)
                    zip1 = zipfile.ZipFile(myRetrieveDir + "\\" + allFinal1)
                    zip1.extractall(myRetrieveDir)
                numZipsIter += 1

pickData()

#client picks shapefiles to add to map
#section for geoprocessing operations
# get the data frames
#add new data frame, title
#check spaces in ftp crawler
os.chdir(path)
env.workspace = path + "\\symbology\\"
zp1 = os.listdir(path + "\\counties\\")

def myGeoprocessing(layer1, layer2):
    #the code in this function is used for geoprocessing operations
    #it returns whatever output is generated from the tools used in the map
    try:
        arcpy.Clip_analysis(path + "\\symbology\\Stream_order.shp", layer1, path + "\\counties\\" + layer2 + "\\Streams.shp")
    except:
        pass
    streams = arcpy.mapping.Layer(path + "\\counties\\" + layer2 + "\\Streams.shp")
    arcpy.ApplySymbologyFromLayer_management(streams, path + '\\symbology\\streams.lyr')
    return streams

def makeMap():
    #original wetlands layers need to be entered as NWI_line or NWI_poly
    print "Enter the layer or layers you wish to include in the map"
    myInput = raw_input()
    counter1 = 1
    for each in zp1:
        print each
        print path
        zp2 = os.listdir(path + "\\counties\\" + each)
        for eachNew in zp2:
            #print eachNew
            if (eachNew[-4:] == ".shp") and ((myInput in eachNew[0:-7] or myInput.lower() in eachNew[0:-7]) or ((eachNew[8:12] == "poly" or eachNew[8:12] == 'line') and eachNew[8:12] in myInput)):
                print eachNew[0:-7]
                theMap = arcpy.mapping.MapDocument(path + '\\map.mxd')
                df1 = arcpy.mapping.ListDataFrames(theMap, "*")[0]
                #this is where we add our layers
                layer1 = arcpy.mapping.Layer(path + "\\counties\\" + each + "\\" + eachNew)
                if eachNew[7:11] == "poly" or eachNew[7:11] == "line":
                    arcpy.ApplySymbologyFromLayer_management(layer1, path + '\\symbology\\' + myInput + '.lyr')
                else:
                    arcpy.ApplySymbologyFromLayer_management(layer1, path + '\\symbology\\' + eachNew[0:-7] + '.lyr')
                # Assign legend variable for map
                legend = arcpy.mapping.ListLayoutElements(theMap, "LEGEND_ELEMENT", "Legend")[0]
                # add wetland layer to map
                legend.autoAdd = True
                try:
                    arcpy.mapping.AddLayer(df1, layer1, "AUTO_ARRANGE")
                    #geoprocessing steps
                    streams = myGeoprocessing(layer1, each)
                    # more geoprocessing options, add the layers to map and assign if they should appear in legend
                    legend.autoAdd = True
                    arcpy.mapping.AddLayer(df1, streams, "TOP")
                    df1.extent = layer1.getExtent(True)
                    arcpy.mapping.ExportToJPEG(theMap, path + "\\counties\\" + each + "\\map.jpg")
                    # Save map document to path
                    theMap.saveACopy(path + "\\counties\\" + each + "\\map.mxd")
                    del theMap
                    print "done with map " + str(counter1)
                except:
                    print "issue with map or already exists"
                counter1 += 1

makeMap()
Original Design
import os, sys, urllib2, urllib, zipfile, arcpy
from arcpy import env

response = urllib2.urlopen('path2URL')
path1 = 'path2URL'
myCount = 1
for each in response:
    eachNew = each.split(" ")
    myCount += 1
    response1 = urllib2.urlopen(path1 + eachNew[9])
    for all1 in response1:
        #print all1
        allNew = all1.split(",")
        allFinal = allNew[0].split(" ")
        allFinal1 = allFinal[len(allFinal)-1].strip(" ")
        if allFinal1[-10:-2] == "poly.ZIP":
            response2 = urllib2.urlopen('path2URL')
            zipcontent = response2.readlines()
            path8 = 'path2URL' + eachNew[9][0:len(eachNew[9])-2] + "/" + allFinal1[0:len(allFinal1)-2]
            downZip = str(eachNew[9][0:len(eachNew[9])-2]) + ".zip"
            urllib.urlretrieve(path8, downZip)

# Set the path to the directory where your zipped folders reside
zipfilepath = 'F:\Misc\presentation'
# Set the path to where you want the extracted data to reside
extractiondir = 'F:\Misc\presentation\counties'
# List all data in the main directory
zp1 = os.listdir(zipfilepath)
# Creates a loop which gives use each zipped folder automatically
# Concatinates zipped folder to original directory in variable done
for each in zp1:
    print each[-4:]
    if each[-4:] == ".zip":
        done = zipfilepath + "\\" + each
        zip1 = zipfile.ZipFile(done)
        extractiondir1 = extractiondir + "\\" + each[:-4]
        zip1.extractall(extractiondir1)

path = os.getcwd()
counter1 = 1
# get the data frames
# Create new layer for all files to be added to map document
env.workspace = "E:\\Misc\\presentation\\symbology\\"
zp1 = os.listdir(path + "\\counties\\")
for each in zp1:
    zp2 = os.listdir(path + "\\counties\\" + each)
    for eachNew in zp2:
        if eachNew[-4:] == ".shp":
            wetlandMap = arcpy.mapping.MapDocument('E:\\Misc\\presentation\\wetland.mxd')
            df1 = arcpy.mapping.ListDataFrames(wetlandMap, "*")[0]
            #print eachNew[-4:]
            wetland = arcpy.mapping.Layer(path + "\\counties\\" + each + "\\" + eachNew)
            #arcpy.Clip_analysis(path + "\\symbology\\Stream_order.shp", wetland, path + "\\counties\\" + each + "\\Streams.shp")
            streams = arcpy.mapping.Layer(path + "\\symbology\\Stream_order.shp")
            arcpy.ApplySymbologyFromLayer_management(wetland, path + '\\symbology\\wetland.lyr')
            arcpy.ApplySymbologyFromLayer_management(streams, path + '\\symbology\\streams.lyr')
            # Assign legend variable for map
            legend = arcpy.mapping.ListLayoutElements(wetlandMap, "LEGEND_ELEMENT", "Legend")[0]
            # add the layers to map and assign if they should appear in legend
            legend.autoAdd = True
            arcpy.mapping.AddLayer(df1, streams, "TOP")
            legend.autoAdd = True
            arcpy.mapping.AddLayer(df1, wetland, "AUTO_ARRANGE")
            df1.extent = wetland.getExtent(True)
            # Export the map to a pdf
            arcpy.mapping.ExportToJPEG(wetlandMap, path + "\\counties\\" + each + "\\wetland.jpg")
            # Save map document to path
            wetlandMap.saveACopy(path + "\\counties\\" + each + "\\wetland.mxd")
            del wetlandMap
            print "done with map " + str(counter1)
            counter1 += 1
Have a look at this guide:
https://wiki.python.org/moin/PythonSpeed/PerformanceTips
Let me quote:
Function call overhead in Python is relatively high, especially compared with the execution speed of a builtin function. This strongly suggests that where appropriate, functions should handle data aggregates.
So effectively this suggests not factoring something out into a function if it is going to be called hundreds of thousands of times.
In Python, functions won't be inlined, and calling them is not cheap. If in doubt, use a profiler to find out how many times each function is called and how long it takes on average. Then optimize.
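For example, a quick way to get those numbers with the standard-library profiler (a sketch: makeMap is the entry point from the updated design above, you'd call it through the profiler instead of directly, and the stats file name is arbitrary):
import cProfile
import pstats

# Profile one full run and dump the statistics to a file.
cProfile.run('makeMap()', 'map_profile.stats')

# Print the 20 entries with the highest cumulative time, including call counts.
stats = pstats.Stats('map_profile.stats')
stats.sort_stats('cumulative').print_stats(20)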
You might also give PyPy a shot, as it has certain optimizations built in; reducing function call overhead in some cases seems to be one of them:
Python equivalence to inline functions or macros
http://pypy.org/performance.html
I ran into a curious problem while parsing json objects in large text files, and the solution I found doesn't really make much sense. I was working with the following script. It copies bz2 files, unzips them, then parses each line as a json object.
import os, sys, json

# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# USER INPUT
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
args = sys.argv
extractDir = outputDir = ""
if (len(args) >= 2):
    extractDir = args[1]
else:
    extractDir = raw_input('Directory to extract from: ')
if (len(args) >= 3):
    outputDir = args[2]
else:
    outputDir = raw_input('Directory to output to: ')

# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# RETRIEVE FILE
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
tweetModel = [u'id', u'text', u'lang', u'created_at', u'retweeted', u'retweet_count', u'in_reply_to_user_id', u'coordinates', u'place', u'hashtags', u'in_reply_to_status_id']
filenames = next(os.walk(extractDir))[2]
for file in filenames:
    if file[-4:] != ".bz2":
        continue
    os.system("cp " + extractDir + '/' + file + ' ' + outputDir)
    os.system("bunzip2 " + outputDir + '/' + file)

    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # PARSE DATA
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    input = open(outputDir + '/' + file[:-4], 'r')
    output = open(outputDir + '/p_' + file[:-4], 'w+')
    for line in input.readlines():
        try:
            tweet = json.loads(line)
            for field in enumerate(tweetModel):
                if tweet.has_key(field[1]) and tweet[field[1]] != None:
                    if field[0] != 0:
                        output.write('\t')
                    fieldData = tweet[field[1]]
                    if not isinstance(fieldData, unicode):
                        fieldData = unicode(str(fieldData), "utf-8")
                    output.write(fieldData.encode('utf8'))
                else:
                    output.write('\t')
        except ValueError as e:
            print ("Parse Error: " + str(e))
            print line
            line = input.readline()
            quit()
            continue
        print "Success! " + str(len(line))
        input.flush()
        output.write('\n')

    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # REMOVE OLD FILE
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    os.system("rm " + outputDir + '/' + file[:-4])
While reading in certain lines in the for line in input.readlines(): loop, the lines would occasionally be truncated at inconsistent locations. Since the newline character was truncated as well, it would keep reading until it found the newline character at the end of the next json object. The result was an incomplete json object followed by a complete json object, all considered one line by the parser. I could not find the reason for this issue, but I did find that changing the loop to
filedata = input.read()
for line in filedata.splitlines():
worked. Does anyone know what is going on here?
After looking at the source code for file.readlines and string.splitlines, I think I see what's up. Note: this is Python 2.7 source code, so if you're using another version this answer may or may not apply.
readlines uses the function Py_UniversalNewlineFread to test for a newline, while splitlines uses a constant, STRINGLIB_ISLINEBREAK, that just tests for \n or \r. I suspect Py_UniversalNewlineFread is picking up some character in the file stream as a line break when it's not really intended as one; it could be from the encoding, I don't know. But when you just dump all that same data into a string, splitlines checks it against \r and \n only; there's no match, so splitlines moves on until the real line break is encountered and you get your intended line.
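One way to see exactly where the two approaches disagree is to run both over the same file and compare line by line (a diagnostic sketch; the path is a placeholder for one of the unzipped files):
path = 'sample_tweets.json'   # placeholder for one of the unzipped files

with open(path, 'r') as f:
    by_readlines = f.readlines()
with open(path, 'r') as f:
    by_splitlines = f.read().splitlines(True)   # True keeps the line endings

print len(by_readlines), len(by_splitlines)
for a, b in zip(by_readlines, by_splitlines):
    if a != b:
        # repr() makes any stray control characters visible
        print repr(a)
        print repr(b)
        break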
I have a Python file that creates and populates a table in MS SQL. The only sticking point is that the code breaks if there are any non-ASCII characters or single apostrophes (and there are quite a few of each). Although I can run the replace function to rid the strings of apostrophes, I would prefer to keep them intact. I have also tried converting the data to UTF-8, but no luck there either.
Below are the error messages I get:
"'ascii' codec can't encode character u'\2013' in position..." (for non-ascii characters)
and for the single quotes
<class 'pyodbc.ProgrammingError'>: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server] Incorrect syntax near 'S, 230 X 90M.; Eligibilty....
When I try to encode string in utf-8, I instead get the following error message:
<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0xe2 in position 219: ordinal not in range(128)
The python code is included below. I believe the point in the code where this break occurs is after the following line: InsertValue = str(row.GetValue(CurrentField['Name'])).
# -*- coding: utf-8 -*-
import pyodbc
import sys
import arcpy
import arcgisscripting

gp = arcgisscripting.create(9.3)
SQL_KEYWORDS = ['PERCENT', 'SELECT', 'INSERT', 'DROP', 'TABLE']

#SourceFGDB = '###'
#SourceTable = '###'
SourceTable = sys.argv[1]
TempInputName = sys.argv[2]
SourceTable2 = sys.argv[3]

#---------------------------------------------------------------------------------------------------------------------
# Target Database Settings
#---------------------------------------------------------------------------------------------------------------------
TargetDatabaseDriver = "{SQL Server}"
TargetDatabaseServer = "###"
TargetDatabaseName = "###"
TargetDatabaseUser = "###"
TargetDatabasePassword = "###"

# Get schema from FGDB table.
# This should be an ordered list of dictionary elements [{'FGDB_Name', 'FGDB_Alias', 'FGDB_Type', FGDB_Width, FGDB_Precision, FGDB_Scale}, {}]
if not gp.Exists(SourceTable):
    print('- The source does not exist.')
    sys.exit(102)

#### Should see if it is actually a table type. Could be a Feature Data Set or something...
print(' - Processing Items From : ' + SourceTable)
FieldList = []
Field_List = gp.ListFields(SourceTable)

print(' - Getting number of rows.')
result = gp.GetCount_management(SourceTable)
Number_of_Features = gp.GetCount_management(SourceTable)
print(' - Number of Rows: ' + str(Number_of_Features))

print(' - Getting fields.')
Field_List1 = gp.ListFields(SourceTable, 'Layer')
Field_List2 = gp.ListFields(SourceTable, 'Comments')
Field_List3 = gp.ListFields(SourceTable, 'Category')
Field_List4 = gp.ListFields(SourceTable, 'State')
Field_List5 = gp.ListFields(SourceTable, 'Label')
Field_List6 = gp.ListFields(SourceTable, 'DateUpdate')
Field_List7 = gp.ListFields(SourceTable, 'OBJECTID')

for Current_Field in Field_List1 + Field_List2 + Field_List3 + Field_List4 + Field_List5 + Field_List6 + Field_List7:
    print(' - Field Found: ' + Current_Field.Name)
    if Current_Field.AliasName in SQL_KEYWORDS:
        Target_Name = Current_Field.Name + '_'
    else:
        Target_Name = Current_Field.Name
    print(' - Alias     : ' + Current_Field.AliasName)
    print(' - Type      : ' + Current_Field.Type)
    print(' - Length    : ' + str(Current_Field.Length))
    print(' - Scale     : ' + str(Current_Field.Scale))
    print(' - Precision : ' + str(Current_Field.Precision))
    FieldList.append({'Name': Current_Field.Name, 'AliasName': Current_Field.AliasName, 'Type': Current_Field.Type, 'Length': Current_Field.Length, 'Scale': Current_Field.Scale, 'Precision': Current_Field.Precision, 'Unique': 'UNIQUE', 'Target_Name': Target_Name})

# Create table in SQL Server based on FGDB table schema.
cnxn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=###;DATABASE=###;UID=sql_webenvas;PWD=###')
cursor = cnxn.cursor()

#### DROP the table first?
try:
    DropTableSQL = 'DROP TABLE dbo.' + TempInputName + '_Test;'
    print DropTableSQL
    cursor.execute(DropTableSQL)
    dbconnection.commit()
except:
    print('WARNING: Can not drop table - may not exist: ' + TempInputName + '_Test')

CreateTableSQL = ('CREATE TABLE ' + TempInputName + '_Test '
                  ' (Layer varchar(500), Comments varchar(5000), State int, Label varchar(500), DateUpdate DATETIME, Category varchar(50), OBJECTID int)')
cursor.execute(CreateTableSQL)
cnxn.commit()

# Cursor through each row in the FGDB table, get values, and insert into the SQL Server Table.
# We got Number_of_Features earlier, just use that.
Number_Processed = 0
print(' - Processing ' + str(Number_of_Features) + ' features.')
rows = gp.SearchCursor(SourceTable)
row = rows.Next()
while row:
    if Number_Processed % 10000 == 0:
        print(' - Processed ' + str(Number_Processed) + ' of ' + str(Number_of_Features))
    InsertSQLFields = 'INSERT INTO ' + TempInputName + '_Test ('
    InsertSQLValues = 'VALUES ('
    for CurrentField in FieldList:
        InsertSQLFields = InsertSQLFields + CurrentField['Target_Name'] + ', '
        InsertValue = str(row.GetValue(CurrentField['Name']))
        if InsertValue in ['None']:
            InsertValue = 'NULL'
        # Use an escape quote for the SQL.
        InsertValue = InsertValue.replace("'", "' '")
        if CurrentField['Type'].upper() in ['STRING', 'CHAR', 'TEXT']:
            if InsertValue == 'NULL':
                InsertSQLValues = InsertSQLValues + "NULL, "
            else:
                InsertSQLValues = InsertSQLValues + "'" + InsertValue + "', "
        elif CurrentField['Type'].upper() in ['GEOMETRY']:
            ## We're not handling geometry transfers at this time.
            if InsertValue == 'NULL':
                InsertSQLValues = InsertSQLValues + '0' + ', '
            else:
                InsertSQLValues = InsertSQLValues + '1' + ', '
        else:
            InsertSQLValues = InsertSQLValues + InsertValue + ', '
    InsertSQLFields = InsertSQLFields[:-2] + ')'
    InsertSQLValues = InsertSQLValues[:-2] + ')'
    InsertSQL = InsertSQLFields + ' ' + InsertSQLValues
    ## print InsertSQL
    cursor.execute(InsertSQL)
    cnxn.commit()
    Number_Processed = Number_Processed + 1
    row = rows.Next()

print(' - Processed all ' + str(Number_Processed))
del row
del rows
James, I believe the real issue is that you are not using Unicode across the board. Try to do the following:
Make sure that the input file you are using to populate the DB is in UTF-8 and that you are reading it with the UTF-8 codec.
Make sure your DB is actually storing the data as Unicode
When you retrieve data from the file or from the DB or want to manipulate strings (with the + operator for instance) you need to make sure that all parts are proper Unicode. You can NOT use the str() method. You need to use unicode() as Dave pointed out. If you define strings in your code use u'my string' instead of 'my string' (otherwise it is not considered unicode).
Also, please provide us the full stack trace and the exception name.
I'm going to use my psychic debugging skills and say you are trying to str()ify something and getting an error with the ascii codec. What you really should do is use the utf-8 codec instead, like this:
insert_value_uni = unicode(row.GetValue(CurrentField['Name']))
InsertValue = insert_value_uni.encode('utf-8')
Or you can take the view that only ASCII is allowed and use the awesomely named Unicode Hammer
In general you want to convert to unicode on data input, and convert to the desired encoding on output.
So it will be easier to find your problem if you do this. This means changing all string literals to unicode, e.g. 'INSERT INTO ' becomes u'INSERT INTO ' (notice the "u" before the string).
Then, when you send the string to be executed, convert it to the desired encoding, "utf8":
cursor.execute(InsertSQL.encode("utf8")) # Where InsertSQL is unicode
Also, you should add the encoding string to the top of your source code.
This means adding the encoding cookie to one of the first two lines of the file:
#!/usr/bin/python
# -*- coding: <encoding name> -*-
If you're pulling data from a file to build your string, you can use codecs.open to auto-convert from a specific encoding to unicode on load.
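For example (a sketch; the file name and encoding are placeholders for whatever your source data actually uses):
import codecs

# Every line read here is already a unicode object, decoded with the declared encoding.
with codecs.open('source_values.txt', 'r', encoding='utf-8') as f:
    for line in f:
        print repr(line)   # u'...' strings, safe to combine with other unicode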
When I converted my str() to unicode, that solved the problem. A simple answer, and I appreciate everyone's help on this.
I am using Python 2.7.9. I'm working on a program that is supposed to produce the following output in a .csv file per loop:
URL,number
Here's the main loop of the code I'm using:
csvlist = open(listfile, 'w')
f = open(list, "r")

def hasQuality(item):
    for quality in qualities:
        if quality in item:
            return True
    return False

for line in f:
    line = line.split('\n')
    line = line[0]
    # print line
    itemname = urllib.unquote(line).decode('utf8')
    # print itemhash
    if hasQuality(itemname):
        try:
            looptime = time.time()
            url = baseUrl + line
            results = json.loads(urlopen(url).read())
            # status = results.status_code
            content = results
            if 'median_price' in content:
                medianstr = str(content['median_price']).replace('$', '')
                medianstr = medianstr.replace('.', '')
                median = float(medianstr)
                volume = content['volume']
                print url + '\n' + itemname
                print 'Median: $' + medianstr
                print 'Volume: ' + str(volume)
                if (median > minprice) and (volume > minvol):
                    csvlist.write(line + ',' + medianstr + '\n')
                    print '+ADDED TO LIST'
            else:
                print 'No median price given for ' + itemname + '.\nGiving up on item.'
            print "Finished loop in " + str(round(time.time() - looptime, 3)) + " seconds."
        except ValueError:
            print "we blacklisted fool?? cause we skippin beats"
    else:
        print itemname + 'is a commodity.\nGiving up on item.'

csvlist.close()
f.close()
print "Finished script in " + str(round(time.time() - runtime, 3)) + " seconds."
It should be generating a list that looks like this:
AWP%20%7C%20Asiimov%20%28Field-Tested%29,3911
M4A1-S%20%7C%20Hyper%20Beast%20%28Field-Tested%29,4202
But it's actually generating a list that looks like this:
AWP%20%7C%20Asiimov%20%28Field-Tested%29
,3911
M4A1-S%20%7C%20Hyper%20Beast%20%28Field-Tested%29
,4202
Whenever it is run on a Windows machine, I have no issue. Whenever I run it on my EC2 instance, however, it adds that extra newline. Any ideas why? Running commands on the file like
awk 'NR%2{printf $0" ";next;}1' output.csv
do not do anything. I have transferred it to my Windows machine and it still reads the same. However, when I paste the output into Steam's chat client it concatenates it in the way that I want.
Thanks in advance!
This is where the problem occurs
code:
csvlist.write(line + ',' + medianstr + '\n')
This can be cleared up if you strip the whitespace
modified code:
csvlist.write(line.strip() + ',' + medianstr + '\n')
Problem:
The problem is due to the fact that you are reading raw lines from the input file.
Raw lines keep a trailing newline character for every line except possibly the last, which simply ends with whatever its final character is.
For more details:
Just add print(repr(line)) before the write and look at the output.
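A minimal illustration of what you would see, assuming the input file was written on Windows with CRLF line endings (the value is hypothetical):
line = 'AWP%20%7C%20Asiimov%20%28Field-Tested%29\r\n'   # raw line as read from f

print repr(line.split('\n')[0])   # 'AWP%20...%29\r' -- the \r survives the split
print repr(line.strip())          # 'AWP%20...%29'   -- strip() removes both \r and \n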