Python: Creating a file based on an array of strings

I'm trying to write a program that will go to a website and download all of the songs they have posted. Right now I'm having trouble creating new file names for each of the songs I download. I initially get all of the file names and the locations of the songs (html). However, when I try to create new files for the songs to be put in, I get an error saying:
IOError: [Errno 22] invalid mode ('w') or filename
I have tried using different modes like "w+", "a", and "a+" to see if these would solve the issue, but so far I keep getting the error message. I have also tried "%"-formatting the name into the string, but that has not worked either. My code follows; any help would be appreciated.
import urllib
import urllib2

def earmilk():
    SongList = []
    SongStrings = []
    SongNames = []
    earmilk = urllib.urlopen("http://www.earmilk.com/category/pop")
    reader = earmilk.read()
    #gets the position of the playlist
    PlaylistPos = reader.find("var newPlaylistTracks = ")
    #finds the number of songs in the playlist
    NumberSongs = reader[reader.find("var newPlaylistIds = "): PlaylistPos].count(",") + 1
    initPos = PlaylistPos
    #goes though the playlist and records the html address and name of the song
    for song in range(0, NumberSongs):
        songPos = reader[initPos:].find("http:") + initPos
        namePos = reader[songPos:].find("name") + songPos
        namePos += reader[namePos:].find(">")
        nameEndPos = reader[namePos:].find("<") + namePos
        SongStrings.append(reader[songPos: reader[songPos:].find('"') + songPos])
        SongNames.append(reader[namePos + 1: nameEndPos])
        #initPos += len(SongStrings[song])
        initPos = nameEndPos
    for correction in range(0, NumberSongs):
        SongStrings[correction] = SongStrings[correction].replace('\\/', "/")
    #downloading songs
    #for download in range(0, NumberSongs):
    #print reader.find("So F*")
    #x= SongNames[0]
    songDL = open(SongNames[0].formant(name), "w+")
    songDL.write(urllib.urlretrieve(SongStrings[0], SongNames[0] + ".mp3"))
    songDL.close()
    print SongStrings
    for name in range(0, NumberSongs):
        print SongNames[name] + "\n"
    earmilk.close()

You need to use filename = '%s' % (SongNames[0],) to construct the name, but you also need to make sure that your file name is a valid one. I don't know of any songs called *.*, but I wouldn't like to chance it, so something like:
filename = ''.join([a.isalnum() and a or '_' for a in SongNames[0]])
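Combining that idea with the rest of the question's code, a rough sketch (not tested against the site) would look like the following; it also drops the separate open()/write() call, since urllib.urlretrieve already writes the file itself:
# Hedged sketch: sanitize each scraped title into a safe filename and let
# urlretrieve save the .mp3 directly; no extra open()/write() is needed.
for song in range(0, NumberSongs):
    safe_name = ''.join(a if a.isalnum() else '_' for a in SongNames[song])
    urllib.urlretrieve(SongStrings[song], safe_name + ".mp3")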

Related

File indexing issue in python

For this function, I need to traverse a file and count each line based on certain signifiers. If a given signifier is present in the line, I need to add the string as a key to the dictionary and increment its value by one each time it is present. I am not outright looking for the answer; I am just lost as to what I have done wrong and how I can proceed from here.
Both of the counter variables and the dictionary are coming back empty. I need them to return values based on what is present in a given file.
file line example:
RT #taylorswift13: Feeling like the luckiest person alive to get to take these brilliant artists out on tour w/ me: #paramore, #beabad00bee & #OwennMusic. I can’t WAIT to see you. It’s been a long time coming 🥰
code:
def top_retweeted(tweets_file_name, num_top_retweeted):
    total_tweets = 0
    total_retweets = 0
    retweets_users = {}
    f_read = open(tweets_file_name, "r")
    f_write = open(tweets_file_name, "w")
    lines = f_read.readlines()
    for line in lines:
        total_tweets =+1
        elements = line.split(":")
        for element in elements:
            if "RT" in element:
                total_retweets =+1
                user_name = element.split()
                retweet_users[user_name]=+1
    print("There were " + str(total_tweets) + " tweets in the file, " + str(total_retweets) + " of which were retweets")
    return retweets_user
f_read = open(tweets_file_name, "r")
f_write = open(tweets_file_name, "w")
You're opening the file for reading and then also opening it for writing, which destroys the existing contents.
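A minimal sketch of one way to repair the function, assuming it only needs to read and count: the write handle is dropped entirely, the =+1 assignments become += 1, and the user name is pulled from the "RT ...:" prefix (that parsing is an assumption based on the sample line above):
def top_retweeted(tweets_file_name, num_top_retweeted):
    total_tweets = 0
    total_retweets = 0
    retweet_users = {}
    with open(tweets_file_name, "r") as f_read:   # read only; never open the same file for "w"
        for line in f_read:
            total_tweets += 1                     # += 1, not =+1
            if "RT" in line:
                total_retweets += 1
                # assumed format: "RT #user: text ..." as in the sample line
                user_name = line.split(":")[0].split()[-1]
                retweet_users[user_name] = retweet_users.get(user_name, 0) + 1
    print("There were " + str(total_tweets) + " tweets in the file, " + str(total_retweets) + " of which were retweets")
    return retweet_users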

I can not find a way to deal with new pages in docx using Python

I have a docx file with 40 pages of text and I want to separate each page and import its contents into a list. Is this possible? The only way I have found is to look for the empty spots in my list, but an empty spot does not always mean a page break. With my code I get the text after the word "Subject" is found, and it stops after a blank spot is found. The thing is that I need a way to recognise a page break in my code to solve some issues; at the moment a page break is also being treated as a " ". Thanks in advance.
import os
import docx

def read(name):
    doc = docx.Document(name)
    text = []
    for par in doc.paragraphs:
        text.append(par.text)
    return text

''''''

for basename in os.listdir('files'):
    path = os.path.join('files', basename)
    jerk = read(path)
    lari = []
    vaccum = []
    indices = []
    for i in jerk:
        if not i.find('Subject'):
            lari.append(jerk.index(i))
            indices.append(jerk.index(i))
    for j in jerk:
        if jerk.index(j) in lari:
            for k in range(20):
                if jerk[jerk.index(j)+k] != '':
                    vaccum.append(jerk[jerk.index(j) + k + 1])
                else:
                    break
    final = []
    var = ''
    for k in vaccum:
        var = var + k
        if k == '':
            final.append(var)
            var = ''
    print(vaccum)
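One hedged sketch of the page-break part: python-docx stores a hard page break as a <w:br w:type="page"/> element inside a run, so those can be detected while walking the paragraphs. Automatic breaks that Word inserts when it lays out the text are not saved in the .docx and cannot be recovered this way, so this only helps if the documents use hard breaks:
import docx
from docx.oxml.ns import qn

def read_pages(name):
    """Return a list of per-page paragraph lists, split on hard page breaks only."""
    doc = docx.Document(name)
    pages, current = [], []
    for par in doc.paragraphs:
        current.append(par.text)
        # a hard page break is stored as <w:br w:type="page"/> inside a run
        has_break = any(br.get(qn('w:type')) == 'page'
                        for run in par.runs
                        for br in run._element.findall(qn('w:br')))
        if has_break:
            pages.append(current)
            current = []
    if current:
        pages.append(current)
    return pages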

Why Python program execution slows down when using functions?

So I have a rather general question I was hoping to get some help with. I put together a Python program that runs through and automates workflows at the state level for all the different counties. The entire program was created for research at school, not actual state work. Anyway, I have two designs shown below. The first is an updated version; it takes about 40 minutes to run. The second design shows the original work. Note that it is not a well-structured design; however, it takes about five minutes to run the entire program. Could anybody give any insight into why there are such differences between the two? The updated version is still ideal, as it is much more reusable (it can run and grab any dataset at the URL) and easier to understand. Furthermore, 40 minutes to get about a hundred workflows completed is still a plus. Also, this is still a work in progress; a couple of minor issues still need to be addressed in the code, but it is still a pretty cool program.
Updated Design
import os, sys, urllib2, urllib, zipfile, arcpy
from arcpy import env

path = os.getcwd()

def pickData():
    myCount = 1
    path1 = 'path2URL'
    response = urllib2.urlopen(path1)
    print "Enter the name of the files you need"
    numZips = raw_input()
    numZips2 = numZips.split(",")
    myResponse(myCount, path1, response, numZips2)

def myResponse(myCount, path1, response, numZips2):
    myPath = os.getcwd()
    for each in response:
        eachNew = each.split(" ")
        eachCounty = eachNew[9].strip("\n").strip("\r")
        try:
            myCountyDir = os.mkdir(os.path.expanduser(myPath + "\\counties" + "\\" + eachCounty))
        except:
            pass
        myRetrieveDir = myPath + "\\counties" + "\\" + eachCounty
        os.chdir(myRetrieveDir)
        myCount += 1
        response1 = urllib2.urlopen(path1 + eachNew[9])
        for all1 in response1:
            allNew = all1.split(",")
            allFinal = allNew[0].split(" ")
            allFinal1 = allFinal[len(allFinal)-1].strip(" ").strip("\n").strip("\r")
            numZipsIter = 0
            path8 = path1 + eachNew[9][0:len(eachNew[9])-2] + "/" + allFinal1
            downZip = eachNew[9][0:len(eachNew[9])-2] + ".zip"
            while(numZipsIter < len(numZips2)):
                if (numZips2[numZipsIter][0:3].strip(" ") == "NWI") and ("remap" not in allFinal1):
                    numZips2New = numZips2[numZipsIter].split("_")
                    if (numZips2New[0].strip(" ") in allFinal1 and numZips2New[1] != "remap" and numZips2New[2].strip(" ") in allFinal1) and (allFinal1[-3:] == "ZIP" or allFinal1[-3:] == "zip"):
                        urllib.urlretrieve(path8, allFinal1)
                        zip1 = zipfile.ZipFile(myRetrieveDir + "\\" + allFinal1)
                        zip1.extractall(myRetrieveDir)
                #maybe just have numzips2 (raw input) as the values before the county number
                #numZips2[numZipsIter][0:-7].strip(" ") in allFinal1 or numZips2[numZipsIter][0:-7].strip(" ").lower() in allFinal1) and (allFinal1[-3:]=="ZIP" or allFinal1[-3:]=="zip"
                elif (numZips2[numZipsIter].strip(" ") in allFinal1 or numZips2[numZipsIter].strip(" ").lower() in allFinal1) and (allFinal1[-3:] == "ZIP" or allFinal1[-3:] == "zip"):
                    urllib.urlretrieve(path8, allFinal1)
                    zip1 = zipfile.ZipFile(myRetrieveDir + "\\" + allFinal1)
                    zip1.extractall(myRetrieveDir)
                numZipsIter += 1

pickData()

#client picks shapefiles to add to map
#section for geoprocessing operations
# get the data frames
#add new data frame, title
#check spaces in ftp crawler
os.chdir(path)
env.workspace = path + "\\symbology\\"
zp1 = os.listdir(path + "\\counties\\")

def myGeoprocessing(layer1, layer2):
    #the code in this function is used for geoprocessing operations
    #it returns whatever output is generated from the tools used in the map
    try:
        arcpy.Clip_analysis(path + "\\symbology\\Stream_order.shp", layer1, path + "\\counties\\" + layer2 + "\\Streams.shp")
    except:
        pass
    streams = arcpy.mapping.Layer(path + "\\counties\\" + layer2 + "\\Streams.shp")
    arcpy.ApplySymbologyFromLayer_management(streams, path + '\\symbology\\streams.lyr')
    return streams

def makeMap():
    #original wetlands layers need to be entered as NWI_line or NWI_poly
    print "Enter the layer or layers you wish to include in the map"
    myInput = raw_input()
    counter1 = 1
    for each in zp1:
        print each
        print path
        zp2 = os.listdir(path + "\\counties\\" + each)
        for eachNew in zp2:
            #print eachNew
            if (eachNew[-4:] == ".shp") and ((myInput in eachNew[0:-7] or myInput.lower() in eachNew[0:-7]) or ((eachNew[8:12] == "poly" or eachNew[8:12] == 'line') and eachNew[8:12] in myInput)):
                print eachNew[0:-7]
                theMap = arcpy.mapping.MapDocument(path + '\\map.mxd')
                df1 = arcpy.mapping.ListDataFrames(theMap, "*")[0]
                #this is where we add our layers
                layer1 = arcpy.mapping.Layer(path + "\\counties\\" + each + "\\" + eachNew)
                if(eachNew[7:11] == "poly" or eachNew[7:11] == "line"):
                    arcpy.ApplySymbologyFromLayer_management(layer1, path + '\\symbology\\' + myInput + '.lyr')
                else:
                    arcpy.ApplySymbologyFromLayer_management(layer1, path + '\\symbology\\' + eachNew[0:-7] + '.lyr')
                # Assign legend variable for map
                legend = arcpy.mapping.ListLayoutElements(theMap, "LEGEND_ELEMENT", "Legend")[0]
                # add wetland layer to map
                legend.autoAdd = True
                try:
                    arcpy.mapping.AddLayer(df1, layer1, "AUTO_ARRANGE")
                    #geoprocessing steps
                    streams = myGeoprocessing(layer1, each)
                    # more geoprocessing options, add the layers to map and assign if they should appear in legend
                    legend.autoAdd = True
                    arcpy.mapping.AddLayer(df1, streams, "TOP")
                    df1.extent = layer1.getExtent(True)
                    arcpy.mapping.ExportToJPEG(theMap, path + "\\counties\\" + each + "\\map.jpg")
                    # Save map document to path
                    theMap.saveACopy(path + "\\counties\\" + each + "\\map.mxd")
                    del theMap
                    print "done with map " + str(counter1)
                except:
                    print "issue with map or already exists"
                counter1 += 1

makeMap()
Original Design
import os, sys, urllib2, urllib, zipfile, arcpy
from arcpy import env

response = urllib2.urlopen('path2URL')
path1 = 'path2URL'
myCount = 1
for each in response:
    eachNew = each.split(" ")
    myCount += 1
    response1 = urllib2.urlopen(path1 + eachNew[9])
    for all1 in response1:
        #print all1
        allNew = all1.split(",")
        allFinal = allNew[0].split(" ")
        allFinal1 = allFinal[len(allFinal)-1].strip(" ")
        if allFinal1[-10:-2] == "poly.ZIP":
            response2 = urllib2.urlopen('path2URL')
            zipcontent = response2.readlines()
            path8 = 'path2URL' + eachNew[9][0:len(eachNew[9])-2] + "/" + allFinal1[0:len(allFinal1)-2]
            downZip = str(eachNew[9][0:len(eachNew[9])-2]) + ".zip"
            urllib.urlretrieve(path8, downZip)

# Set the path to the directory where your zipped folders reside
zipfilepath = 'F:\Misc\presentation'
# Set the path to where you want the extracted data to reside
extractiondir = 'F:\Misc\presentation\counties'
# List all data in the main directory
zp1 = os.listdir(zipfilepath)
# Creates a loop which gives use each zipped folder automatically
# Concatinates zipped folder to original directory in variable done
for each in zp1:
    print each[-4:]
    if each[-4:] == ".zip":
        done = zipfilepath + "\\" + each
        zip1 = zipfile.ZipFile(done)
        extractiondir1 = extractiondir + "\\" + each[:-4]
        zip1.extractall(extractiondir1)

path = os.getcwd()
counter1 = 1
# get the data frames
# Create new layer for all files to be added to map document
env.workspace = "E:\\Misc\\presentation\\symbology\\"
zp1 = os.listdir(path + "\\counties\\")
for each in zp1:
    zp2 = os.listdir(path + "\\counties\\" + each)
    for eachNew in zp2:
        if eachNew[-4:] == ".shp":
            wetlandMap = arcpy.mapping.MapDocument('E:\\Misc\\presentation\\wetland.mxd')
            df1 = arcpy.mapping.ListDataFrames(wetlandMap, "*")[0]
            #print eachNew[-4:]
            wetland = arcpy.mapping.Layer(path + "\\counties\\" + each + "\\" + eachNew)
            #arcpy.Clip_analysis(path + "\\symbology\\Stream_order.shp", wetland, path + "\\counties\\" + each + "\\Streams.shp")
            streams = arcpy.mapping.Layer(path + "\\symbology\\Stream_order.shp")
            arcpy.ApplySymbologyFromLayer_management(wetland, path + '\\symbology\\wetland.lyr')
            arcpy.ApplySymbologyFromLayer_management(streams, path + '\\symbology\\streams.lyr')
            # Assign legend variable for map
            legend = arcpy.mapping.ListLayoutElements(wetlandMap, "LEGEND_ELEMENT", "Legend")[0]
            # add the layers to map and assign if they should appear in legend
            legend.autoAdd = True
            arcpy.mapping.AddLayer(df1, streams, "TOP")
            legend.autoAdd = True
            arcpy.mapping.AddLayer(df1, wetland, "AUTO_ARRANGE")
            df1.extent = wetland.getExtent(True)
            # Export the map to a jpeg
            arcpy.mapping.ExportToJPEG(wetlandMap, path + "\\counties\\" + each + "\\wetland.jpg")
            # Save map document to path
            wetlandMap.saveACopy(path + "\\counties\\" + each + "\\wetland.mxd")
            del wetlandMap
            print "done with map " + str(counter1)
            counter1 += 1
Have a look at this guide:
https://wiki.python.org/moin/PythonSpeed/PerformanceTips
Let me quote:
Function call overhead in Python is relatively high, especially compared with the execution speed of a builtin function. This strongly suggests that where appropriate, functions should handle data aggregates.
Effectively, this suggests not factoring something out into a function if it is going to be called hundreds of thousands of times.
In Python, functions are not inlined, and calling them is not cheap. If in doubt, use a profiler to find out how many times each function is called and how long it takes on average, then optimize.
You might also give PyPy a shot, as it has certain optimizations built in; reducing function call overhead in some cases seems to be one of them:
Python equivalence to inline functions or macros
http://pypy.org/performance.html
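As a rough illustration of the per-call cost itself (not a full explanation of the 40-minute gap, which only profiling the actual program can give), timing the same loop with and without a function call shows the difference; absolute numbers will vary by machine:
import timeit

setup = "def add(x, i): return x + i"
inline = timeit.timeit("x = 0\nfor i in range(1000): x = x + i", number=1000)
called = timeit.timeit("x = 0\nfor i in range(1000): x = add(x, i)", setup=setup, number=1000)
print "inline: %.3fs, with a function call: %.3fs" % (inline, called)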

Python IndexError: list index out of range when using iterations

I've been trying to download screenshots from the App Store and here's my code (I'm a beginner).
The problem I encounter is "list index out of range" at line 60 (screenshotList = data["results"][resultCounter]["screenshotUrls"]).
The thing is that sometimes the search API returns 0 results for the search term used, and therefore it gets messed up because "resultCount" = 0.
I'm not sure what else it could be, or how I can fix it. Any help?
# Required libraries
import urllib
import string
import random
import json
import time

""" screenshotCounter is used so that all screenshots have a different name
resultCounter is used to go from result to result in downloaded JSON file
"""
screenshotCounter = 0
resultCounter = 0

""" Create three random letters as search term on App Store
Download JSON results file
Shows used search term
"""
searchTerm = (''.join(random.choice(string.ascii_lowercase) for i in range(3)))
urllib.urlretrieve("https://itunes.apple.com/search?country=us&entity=software&limit=3&term=" + str(searchTerm), "download.txt")
print "Used search term: " + str(searchTerm)

# Function to download screenshots + give it a name + confirmation msg
def download_screenshot(screenshotLink, screenshotName):
    urllib.urlretrieve(screenshotLink, screenshotName)
    print "Downloaded with success:" + str(screenshotName)

# Opens newly downloaded JSON file
with open('download.txt') as data_file:
    data = json.load(data_file)

""" Get the first list of screenshots from stored JSON file,
resultCounter = 0 on first iteration
"""
screenshotList = data["results"][resultCounter]["screenshotUrls"]

# Gives the number of found results and serves as iteration limit
iterationLimit = data["resultCount"]

# Prints the number of found results
print str(iterationLimit) + " results found."

""" Change the number of iterations to the number of results, which will be
different for every request, minus 1 since indexing starts at 0
"""
iterations = [0] * iterationLimit

""" For each iteration (number of results), find each screenshot in the
screenshotList, name it, download it. Then change result to find the next
screenshotList and change screenshotList variable.
"""
for number in iterations:
    for screenshotLink in screenshotList:
        screenshotName = "screenshot" + str(screenshotCounter) + ".jpeg"
        download_screenshot(screenshotLink, screenshotName)
        screenshotCounter = screenshotCounter + 1
    resultCounter = resultCounter + 1
    screenshotList = data["results"][resultCounter]["screenshotUrls"]
    # Sleeping to avoid crash
    time.sleep(1)
I rewrote your code to check for the presence of results before trying anything. If there aren't any, it goes back through the loop with a new search term. If there are, it will stop at the end of that iteration.
# Required libraries
import urllib
import string
import random
import json
import time

# Function to download screenshots + give it a name + confirmation msg
def download_screenshot(screenshotLink, screenshotName):
    urllib.urlretrieve(screenshotLink, screenshotName)
    print "Downloaded with success:" + str(screenshotName)

success = False
while success == False:
    """ Create three random letters as search term on App Store
    Download JSON results file
    Shows used search term
    """
    searchTerm = (''.join(random.choice(string.ascii_lowercase) for i in range(3)))
    urllib.urlretrieve("https://itunes.apple.com/search?country=us&entity=software&limit=3&term=" + str(searchTerm), "download.txt")
    print "Used search term: " + str(searchTerm)

    # Opens newly downloaded JSON file
    with open('download.txt') as data_file:
        data = json.load(data_file)

    """ Get the number of results from the stored JSON file """
    resultCount = len(data["results"])
    if resultCount == 0:
        continue  # if no results, skip to the next loop
    success = True
    print str(resultCount) + " results found."

    for j, resultList in enumerate(data["results"]):
        screenshotList = resultList["screenshotUrls"]
        """ For each iteration (number of results), find each screenshot in the
        screenshotList, name it, download it. Then change result to find the next
        screenshotList and change screenshotList variable.
        """
        for i, screenshotLink in enumerate(screenshotList):
            screenshotName = "screenshot" + str(i) + '_' + str(j) + ".jpeg"
            download_screenshot(screenshotLink, screenshotName)
        # Sleeping to avoid crash
        time.sleep(1)
Have you tried:
try:
    for screenshotLink in screenshotList:
        screenshotName = "screenshot" + str(screenshotCounter) + ".jpeg"
        download_screenshot(screenshotLink, screenshotName)
        screenshotCounter = screenshotCounter + 1
except IndexError:
    pass

Creating a dynamic forum signature generator in python

I have searched and searched, but I have only found solutions involving PHP and not Python/Django. My goal is to make a website (backend coded in Python) that will allow a user to input a string. The backend script would then be run and output a dictionary with some info. What I want is to use the info from the dictionary to draw onto an image I have on the server and give the new image to the user. How can I do this offline for now? What libraries can I use? Any suggestions on the route I should take would be lovely.
I am still a novice so please forgive me if my code needs work. So far I have no errors with what I have but like I said I have no clue where to go next to achieve my goal. Any tips would be greatly appreciated.
This is sort of what I want the end goal to be http://combatarmshq.com/dynamic-signatures.html
This is what I have so far (I used Beautiful Soup as a parser from here; if this is excessive, or if I did it in a not-so-good way, please let me know if there is a better alternative. Thanks):
The url where I'm getting the numbers I want (These are dynamic) is this: http://combatarms.nexon.net/ClansRankings/PlayerProfile.aspx?user=
The name of the player will go after user so an example is http://combatarms.nexon.net/ClansRankings/PlayerProfile.aspx?user=-aonbyte
This is the code with the basic functions to scrape the website:
from urllib import urlopen
from BeautifulSoup import BeautifulSoup

def get_avatar(player_name):
    '''Return the players avatar as a binary string.'''
    player_name = str(player_name)
    url = 'http://combat.nexon.net/Avatar/MyAvatar.srf?'
    url += 'GameName=CombatArms&CharacterID=' + player_name
    sock = urlopen(url)
    data = sock.read()
    sock.close()
    return data

def save_avatar(data, file_name):
    '''Saves the avatar data from get_avatar() in png format.'''
    local_file = open(file_name + '.png', 'w' + 'b')
    local_file.write(data)
    local_file.close()

def get_basic_info(player_name):
    '''Returns basic player statistics as a dictionary'''
    url = 'http://combatarms.nexon.net/ClansRankings'
    url += '/PlayerProfile.aspx?user=' + player_name
    sock = urlopen(url)
    html_raw = sock.read()
    sock.close()
    html_original_parse = BeautifulSoup(''.join(html_raw))
    player_info = html_original_parse.find('div', 'info').find('ul')
    basic_info_list = range(6)
    for i in basic_info_list:
        basic_info_list[i] = str(player_info('li', limit = 7)[i+1].contents[1])
    basic_info = dict(date = basic_info_list[0], rank = basic_info_list[1], kdr = basic_info_list[2], exp = basic_info_list[3], gp_earned = basic_info_list[4], gp_current = basic_info_list[5])
    return basic_info
And here is the code that tests out those functions:
from grabber import get_avatar, save_avatar, get_basic_info
player = raw_input('Player name: ')
print 'Downloading avatar...'
avatar_data = get_avatar(player)
file_name = raw_input('Save as? ')
print 'Saving avatar as ' + file_name + '.png...'
save_avatar(avatar_data, file_name)
print 'Retrieving ' + player + '\'s basic character info...'
player_info = get_basic_info(player)
print ''
print ''
print 'Info for character named ' + player + ':'
print 'Character creation date: ' + player_info['date']
print 'Rank: ' + player_info['rank']
print 'Experience: ' + player_info['exp']
print 'KDR: ' + player_info['kdr']
print 'Current GP: ' + player_info['gp_current']
print ''
raw_input('Press enter to close...')
If I understand you correctly, you want to get an image from one place, get some textual information from another place, draw text on top of the image, and then return the marked-up image. Do I have that right?
If so, get PIL, the Python Imaging Library. Both PIL and BeautifulSoup are capable of reading directly from an opened URL, so you can forget that socket nonsense. Get the player name from the HTTP request, open the image, use BeautifulSoup to get the data, use PIL's text functions to write on the image, save the image back into the HTTP response, and you're done.
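A minimal sketch of the drawing step with PIL (the template image name, the coordinates, and reusing the player_info keys from the question's code are all assumptions):
from PIL import Image, ImageDraw, ImageFont

# 'template.png' stands in for whatever base signature image lives on the server
img = Image.open('template.png')
draw = ImageDraw.Draw(img)
font = ImageFont.load_default()
draw.text((10, 10), 'Rank: ' + player_info['rank'], fill='white', font=font)
draw.text((10, 30), 'KDR: ' + player_info['kdr'], fill='white', font=font)
img.save('signature.png')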
