I need some help converting the code below into something a bit more manageable.
I'm pretty sure I need to modify it to include some while statements, but I have been hitting my head against a wall for the last day or so. I think I'm close...
for LevelItemList[1] in LevelUrlList[1]:
    if LevelItemList[1][1] == "Folder":
        printFolderHeader(1,LevelItemList[1][0])
        LevelUrlList[2] = parseHTML (LevelItemList[1][2])
        for LevelItemList[2] in LevelUrlList[2]:
            if LevelItemList[2][1] == "Folder":
                printFolderHeader(2,LevelItemList[2][0])
                LevelUrlList[3] = parseHTML (LevelItemList[2][2])
                for LevelItemList[3] in LevelUrlList[3]:
                    if LevelItemList[3][1] == "Folder":
                        printFolderHeader(3,LevelItemList[3][0])
                        LevelUrlList[4] = parseHTML (LevelItemList[3][2])
                        for LevelItemList[4] in LevelUrlList[4]:
                            if LevelItemList[4][1] == "Folder":
                                printFolderHeader(4,LevelItemList[4][0])
                                LevelUrlList[5] = parseHTML (LevelItemList[4][2])
                                for LevelItemList[5] in LevelUrlList[5]:
                                    if LevelItemList[5][1] == "Folder":
                                        printFolderHeader(5,LevelItemList[5][0])
                                        LevelUrlList[6] = parseHTML (LevelItemList[5][2])
                                        for LevelItemList[6] in LevelUrlList[6]:
                                            printPage(6,LevelItemList[6][0])
                                        printFolderFooter(5,LevelItemList[5][0])
                                    else:
                                        printPage(5,LevelItemList[5][0])
                                printFolderFooter(4,LevelItemList[4][0])
                            else:
                                printPage(4,LevelItemList[4][0])
                        printFolderFooter(3,LevelItemList[3][0])
                    else:
                        printPage(3,LevelItemList[3][0])
                printFolderFooter(2,LevelItemList[2][0])
            else:
                printPage(2,LevelItemList[2][0])
        printFolderFooter(1,LevelItemList[1][0])
    else:
        printPage(1,LevelItemList[1][0])
I don't have the full context of the code, but I think you can reduce it down to something like this:
def printTheList(LevelItemList, index):
    for item in LevelItemList:
        if item[1] == "Folder":
            printFolderHeader(index, item[0])
            printTheList(parseHTML(item[2]), index + 1)  # note the + 1
            printFolderFooter(index, item[0])
        else:
            printPage(index, item[0])

# and the initial call looks like this.
printTheList(LevelUrlList[1], 1)
This code makes the assumption that you don't actually need to assign the values into LevelUrlList and LevelItemList the way you are doing in your code. If you do need that data later, I suggest passing in a different data structure to hold the resulting values.
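To see the recursion run end to end, here is a self-contained sketch. `parseHTML`, `printFolderHeader`, `printFolderFooter` and `printPage` are not shown in the question, so they are replaced by hypothetical stubs over a small fake site. Each item is a `(name, kind, child_url)` tuple matching `item[0]`, `item[1]` and `item[2]`, and the stubs collect lines in a list instead of printing them.

```python
# Hypothetical fake site: parseHTML(url) returns a list of (name, kind, child_url).
FAKE_PAGES = {
    "root": [("Docs", "Folder", "docs"), ("Home", "Page", None)],
    "docs": [("Guide", "Page", None), ("API", "Folder", "api")],
    "api": [("Reference", "Page", None)],
}

output = []

def parseHTML(url):
    # Stand-in for the real HTML parser from the question.
    return FAKE_PAGES[url]

def printFolderHeader(level, name):
    output.append("  " * (level - 1) + "+" + name)

def printFolderFooter(level, name):
    output.append("  " * (level - 1) + "-" + name)

def printPage(level, name):
    output.append("  " * (level - 1) + name)

def printTheList(items, index):
    for item in items:
        if item[1] == "Folder":
            printFolderHeader(index, item[0])
            # Recurse one level deeper instead of writing another nested loop.
            printTheList(parseHTML(item[2]), index + 1)
            printFolderFooter(index, item[0])
        else:
            printPage(index, item[0])

printTheList(parseHTML("root"), 1)
print("\n".join(output))
```

Each call handles one level, so the depth is no longer capped at six by hand-written loops. If you need that cap back, pass a maximum depth and call `printPage` once `index` reaches it.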
Basically, I have a function that returns an API response with a huge number of dictionaries (simplified to their keys). I also have another function, getPlayerData, which sends a call to the same API to get information about one specific player instead of all of them. The problem is that getPlayerData is fast on its own, but in this scenario, where it is called once for every resident, it is far too slow to be usable.
Is there a way I can speed this up? getPlayerData is not required; I can just make a request directly.
The dictionary search
residents = []
for resident in getListData("resident"):
    if getPlayerData(resident)["town"] == town:
        residents.append(resident)
print(residents)
getPlayerData()
def getPlayerData(player):
    r = requests.get("http://srv.earthpol.com/api/json/residents.php?name=" + player)
    j = r.json()
    player = player.lower()
    global emptyresult
    emptyresult = False
    if str(j) == "{}":
        emptyresult = True
    else:
        result = {"town": j[player]["town"],
                  "town-rank": j[player]["townRank"],
                  "nation-ranks": j[player]["nationRanks"],
                  "lastOnline:": j[player]["lastOnline"],
                  "registered": j[player]["registered"],
                  "town-title": j[player]["title"],
                  "nation-title": j[player]["surname"],
                  "friends": j[player]["friends"],
                  "uuid": j[player]["uuid"],
                  "avatar": "https://crafatar.com/avatars/" + j[player]["uuid"]}
        return result
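The slow part of the loop above is one blocking HTTP request per resident, made one after another. A common remedy for I/O-bound loops like this, sketched here as an assumption rather than a known fix for this particular API, is to issue the requests concurrently with `concurrent.futures.ThreadPoolExecutor`. `find_residents` and `fake_fetch` are hypothetical names; in real use you would pass something like `getPlayerData` as `fetch_player` (it should return a dict with a `"town"` key, or `None` when nothing is found) and keep `max_workers` modest so the server is not flooded. If the list endpoint already includes each resident's town, filtering that single response would avoid the per-player calls altogether.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def find_residents(names, town, fetch_player, max_workers=8):
    """Return the names whose fetched record has record["town"] == town.

    fetch_player is any callable taking a name and returning a dict with a
    "town" key, or None when the player is not found (assumed contract).
    """
    names = list(names)  # materialise once, because it is used twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs the calls on worker threads but yields results in input order
        records = list(pool.map(fetch_player, names))
    return [name for name, record in zip(names, records)
            if record is not None and record.get("town") == town]

# Offline demo: a fake lookup that sleeps to imitate network latency.
FAKE_TOWNS = {"alice": "Rome", "bob": "Oslo", "carol": "Rome"}

def fake_fetch(name):
    time.sleep(0.1)
    town = FAKE_TOWNS.get(name)
    return None if town is None else {"town": town}

print(find_residents(["alice", "bob", "carol", "dave"], "Rome", fake_fetch))
```

Because the four fake lookups overlap, the demo takes roughly the time of one request instead of four.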
Not really sure how to word this question properly, but I'm basically playing around with Python, using Selenium to scrape a website, and I'm trying to create a JSON file with the data.
Here's the goal I'm aiming to achieve:
{
    "main1" : {
        "sub1" : "data",
        "sub2" : "data",
        "sub3" : "data",
        "sub4" : "data"
    },
    "main2" : {
        "sub1" : "data",
        "sub2" : "data",
        "sub3" : "data",
        "sub4" : "data"
    }
}
The problem I'm facing at the moment is that the website has no indentation or child elements. It looks like this (but longer and actual copy, of course):
<h3>Main1</h3>
<p>Sub1</p>
<p>Sub2</p>
<p>Sub3</p>
<p>Sub4</p>
<h3>Main2</h3>
Now I want to iterate through the HTML in order to use the <h3> tags as the parents ("Main" in the JSON example) and the <p> tags as the children (sub[num]). I'm new to both Python and Selenium, so I may have done this wrong, but I've tried using items.find_elements_by_tag_name('el') to separate the two, and I don't know how to put them back together in the order they originally came in.
I then tried looping through all the elements and separating the tags with if (item.tag_name == "el"): checks. This works perfectly when I print the results of each loop, but when it comes to putting them together in a JSON file, I have the same issue as with the previous method: I can't get the two to join. I've tried a few variations and I either get key errors or only the last item in the loop gets recorded.
Just for reference, here's the code for this step:
items = browser.find_element_by_xpath(
    '//*[@id="main-content"]')  # Main Content
itemList = items.find_elements_by_xpath(".//*")

statuses = [
    "Status1",
    "Status2",
    "Status3",
    "Status4"
]

for item in itemList:  # iterate through the HTML
    if (item.tag_name == "h3"):  # Separate H3 Tags
        main = item.text
        print("======================================")
        print(main)
        print("======================================")
    if (item.tag_name == 'p'):  # Separate P tags
        for status in statuses:
            if (status in item.text):  # Filter P tags to only display info that contains words in the Status array
                delimeters = ":", "(", "See"
                regexPattern = "|".join(map(re.escape, delimeters))
                zoneData = re.split(regexPattern, item.text)
                # Split P tags into separate parts
                sub1 = zoneData[0]
                sub2 = zoneData[1].translate({ord('*'): None})
                sub3 = zoneData[2].translate({ord(")"): None})
                print(sub1)
                print(sub2)
                print(sub3)
The final option I've decided to try is going through all the HTML again, but using enumerate() and the elements' IDs to include all the tags between two IDs. I'm not really sure what my plan of action is with this just yet.
In general, the last option seems a bit convoluted and I'm pretty certain there's a simpler way to do this. What would you suggest?
Here's my idea. I didn't do the data part; you can add it later.
I assume there are no duplicate main names; otherwise you will lose some info.
items = browser.find_element_by_xpath(
    '//*[@id="main-content"]')  # Main Content
itemList = items.find_elements_by_xpath(".//p|.//h3")  # only finds h3 or p

def construct(item_list):
    current_main = ''
    final_dict: dict = {}
    for item in item_list:
        if item.tag_name == "h3":
            current_main = item.text
            final_dict[current_main] = {}  # create empty dict inside main. remove if you want to update the main dict
        if item.tag_name == "p":
            p_name = item.text
            final_dict[current_main][p_name] = "data"
    return final_dict
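Because Selenium needs a live browser, this sketch swaps the WebElements for a hypothetical `Element` namedtuple that only has the `tag_name` and `text` attributes the answer uses, then serialises the result with `json.dumps`. It also makes two small assumptions of its own: `<p>` tags that appear before the first `<h3>` are skipped rather than raising a `KeyError`, and a repeated heading merges into its existing entry via `setdefault` instead of replacing it.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for a Selenium WebElement: only .tag_name and .text are used.
Element = namedtuple("Element", ["tag_name", "text"])

def construct(item_list):
    final_dict = {}
    current_main = None
    for item in item_list:
        if item.tag_name == "h3":
            current_main = item.text
            # setdefault keeps earlier children if the same heading appears again
            final_dict.setdefault(current_main, {})
        elif item.tag_name == "p" and current_main is not None:
            # <p> tags seen before any <h3> are ignored
            final_dict[current_main][item.text] = "data"
    return final_dict

page = [
    Element("h3", "Main1"), Element("p", "Sub1"), Element("p", "Sub2"),
    Element("h3", "Main2"), Element("p", "Sub1"),
]
result = construct(page)
print(json.dumps(result, indent=4))
```

To write a file instead of a string, `json.dump(result, f, indent=4)` on an open file object does the same thing.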
Given a list of paths as:
'alpha/beta/gamma/delta alpha/beta/sigma beta/phi/pi/rho'
I want to print it as:
-alpha
    -beta
        -gamma
            delta
        -sigma
-beta
    -phi
        -pi
            rho
Can you please help me out with this?
I was able to make a list of dictionaries of dictionaries. (I am kinda lost here)
There are simpler ways to do this where I can directly print the data but I want to do it in a structure such that I might be able to use this data somewhere else too.
paths = 'alpha/beta/gamma/delta alpha/beta/sigma b/f/g/h r/g/t/y q/w/er/rat'
folder_list = []

def get_children(ippath, e_dict):
    remaining_path = '/'.join(ippath.split('/')[1:])
    try:
        splitted_path = ippath.split('/')[0]
        if splitted_path:
            e_dict[splitted_path] = {}
            e_dict[splitted_path].update(get_children(remaining_path, e_dict[ippath.split('/')[0]]))
            return e_dict
        else:
            return e_dict
    except:
        return remaining_path

for path in paths.split(' '):
    end_dict = dict()
    output = get_children(path, end_dict)
    if output:
        folder_list.append(output)
        # final_list.update(output)
    else:
        continue

print(folder_list)
It gives me a list of nested dictionaries but still not what I want.
Thank you, I really appreciate the help
Are you fine with using another library? If so, dpath will work great for this.
It lets you create nested dicts from slash-separated path strings.
https://pypi.org/project/dpath/
Here's a straightforward solution:
1. First, build a set of all distinct full paths, including the intermediate paths.
2. Sort the paths. This puts them in depth-first order, guaranteeing that a parent directory will always appear before its children.
3. Iterate through the paths, maintaining a stack:
   - Pop from the stack until you find the parent of the current path.
   - Print just the difference between the current path and its parent. The indentation level is determined by the length of the stack.
   - Push the current path to the stack.
To get the - symbols in the right place, we can keep track of which paths are leaf nodes in the tree. Here's the code:
def dir_tree(s):
    paths = set()
    for path in s.split():
        parts = path.split('/')
        is_leaf = True
        while parts:
            path = '/'.join(parts) + '/'
            paths.add( (path, is_leaf) )
            parts.pop()
            is_leaf = False

    stack = ['']
    for path, is_leaf in sorted(paths):
        while not path.startswith(stack[-1]):
            stack.pop()
        suffix = path[len(stack[-1]):-1]
        tabs = len(stack) - 1
        print('\t'*tabs + ('' if is_leaf else '-') + suffix)
        stack.append(path)
Output:
-alpha
    -beta
        -gamma
            delta
        sigma
-beta
    -phi
        -pi
            rho
I finally got it to work. :)
Ron Serruya's suggested library helped me rethink my structure.
import json

paths = 'alpha/beta/gamma/delta alpha/beta/sigma beta/phi/pi/rho'
folder_list = {}

def get_children(ippath, e_dict):
    remaining_path = '/'.join(ippath.split('/')[1:])
    try:
        splitted_path = ippath.split('/')[0]
        if splitted_path:
            e_dict[splitted_path] = {}
            e_dict[splitted_path].update(get_children(remaining_path, e_dict[ippath.split('/')[0]]))
            return e_dict
        else:
            return e_dict
    except:
        return remaining_path

def merge_dictionaries(new_dictionary, main_dictionary):
    key = list(new_dictionary.keys())[0]
    if list(new_dictionary[key].keys())[0] in list(main_dictionary[key].keys()):
        merge_dictionaries(new_dictionary[key], main_dictionary[key])
    else:
        main_dictionary[key][list(new_dictionary[key].keys())[0]] = new_dictionary[key][list(new_dictionary[key].keys())[0]]

def main():
    for path in paths.split(' '):
        end_dict = dict()
        output = get_children(path, end_dict)
        if output:
            if list(output.keys())[0] not in list(folder_list.keys()):
                folder_list.update(output)
            else:
                merge_dictionaries(output, folder_list)
        else:
            continue
    print(str(json.dumps(folder_list, sort_keys=True, indent=4, separators=('', ''))).replace('{', '').replace('}', ''))

main()
Gives output:
"alpha"
    "beta"
        "gamma"
            "delta"
        "sigma"
"beta"
    "phi"
        "pi"
            "rho"
Sorry for the really bad structure of the code; I'm open to suggestions for improving it structurally.
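One way to simplify this, offered as a suggestion rather than a drop-in replacement, is to walk each path with `dict.setdefault`, which returns the existing child dict or creates an empty one. That builds the same nested structure in a single pass with no separate merge step, and a short recursive function turns it back into indented lines. `build_tree` and `tree_lines` are illustrative names; folders get a `-` prefix and leaves do not, as in the stack-based answer.

```python
def build_tree(paths):
    """Build a nested dict from space-separated slash paths (stdlib only)."""
    tree = {}
    for path in paths.split():
        node = tree
        for part in path.split('/'):
            # setdefault reuses an existing child or creates an empty one
            node = node.setdefault(part, {})
    return tree

def tree_lines(tree, depth=0):
    """Return the tree as indented lines; non-empty dicts are folders."""
    lines = []
    for name in sorted(tree):
        children = tree[name]
        prefix = '-' if children else ''
        lines.append('    ' * depth + prefix + name)
        lines.extend(tree_lines(children, depth + 1))
    return lines

tree = build_tree('alpha/beta/gamma/delta alpha/beta/sigma beta/phi/pi/rho')
print('\n'.join(tree_lines(tree)))
```

The nested dict in `tree` stays available for reuse elsewhere, which was the reason for building a structure instead of printing directly.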
The code in the two if-blocks smells like a violation of DRY. How can it be written more generically?
selected_class = eval(choice)  # bad (see comments)
selected_class = getattr(models, choice)  # good (see comments)
records = selected_class.objects.all()

if (choice == 'Treatment'):
    for record in records:
        response.write(str(record.id) + ',' + str(record.available_hours) + '\n')
if (choice == 'Patient'):
    for record in records:
        response.write(str(record.id) + ',' + record.first_name + '\n')
I could write a 'make_csv' method in each model (Treatment and Patient), but there must be a better way.
A simple solution:
for record in records:
    if choice == 'Treatment':
        item = str(record.available_hours)
    elif choice == 'Patient':
        item = record.first_name
    response.write('{},{}\n'.format(record.id, item))
Or, if you want a slightly more complex solution that avoids repeating the if:
choices_dict = {
    'Treatment': 'available_hours',
    'Patient': 'first_name',
}

record_field = choices_dict[choice]
for record in records:
    item = getattr(record, record_field)
    response.write('{},{}\n'.format(record.id, item))
It's also more flexible in case you may want to change or add options to choices_dict, but that may not be relevant.
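A variation on the dictionary approach, sketched under assumptions: instead of joining values by hand, Python's `csv.writer` can write each row, which also quotes values containing commas (a `first_name` such as "Smith, Bob" would otherwise split into two columns). Django's `HttpResponse` is file-like, so `csv.writer(response)` can write to it; here `io.StringIO` and `SimpleNamespace` objects stand in for the response and the model instances, and `write_records` is a hypothetical name.

```python
import csv
import io
from types import SimpleNamespace

# Maps the model name (choice) to the attribute written after the id.
CHOICE_FIELDS = {
    'Treatment': 'available_hours',
    'Patient': 'first_name',
}

def write_records(response, choice, records):
    field = CHOICE_FIELDS[choice]
    writer = csv.writer(response)
    for record in records:
        # csv.writer converts values with str() and quotes fields containing commas
        writer.writerow([record.id, getattr(record, field)])

# Stand-ins for Django model instances and the HttpResponse (demo only).
patients = [SimpleNamespace(id=1, first_name='Ann'),
            SimpleNamespace(id=2, first_name='Smith, Bob')]
response = io.StringIO()
write_records(response, 'Patient', patients)
print(response.getvalue())
```

`csv.writer` ends rows with `\r\n` by default; pass `lineterminator='\n'` if you need the original `\n` endings.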
I am facing a peculiar problem, which I will describe briefly below.
Suppose I have this piece of code:
class MyClass:
    __postBodies = []
    .
    .
    .
        for the_file in os.listdir("/dir/path/to/file"):
            file_path = os.path.join(folder, the_file)
            params = self.__parseFileAsText(str(file_path))  # reads the file and gets some parsed data back
            dictData = {'file':str(file_path), 'body':params}
            self.__postBodies.append(dictData)
            print self.__postBodies
            dictData = None
            params = None
The problem is that when I print params and dictData each time, they have different values for different files (which is correct), but as soon as the append happens and I print __postBodies, something strange happens. If there are three files, say A, B and C, then:
first time, __postBodies has the content = [{'body':{A dict with some data related to file A}, 'file':'path/of/A'}]
second time it becomes = [{'body':{A dict with some data related to file B}, 'file':'path/of/A'}, {'body':{A dict with some data related to file B}, 'file':'path/of/B'}]
AND third time = [{'body':{A dict with some data related to file C}, 'file':'path/of/A'}, {'body':{A dict with some data related to file C}, 'file':'path/of/B'}, {'body':{A dict with some data related to file C}, 'file':'path/of/C'}]
So the 'file' key works fine; strangely, only the 'body' key is overwritten in every entry with the value from the last one appended.
Am I making a mistake? Is there something I have to do? Please point me in the right direction.
Sorry if I am not very clear.
EDIT ------------------------
The return from self.__parseFileAsText(str(file_path)) call is a dict that I am inserting as 'body' in the dictData.
EDIT2 ----------------------------
As you asked, here is the code, but I have checked that the params = self.__parseFileAsText(str(file_path)) call returns a different dict every time.
def __parseFileAsText(self, fileName):
    i = 0
    tempParam = StaticConfig.PASTE_PARAMS

    tempParam[StaticConfig.KEY_PASTE_PARAM_NAME] = ""
    tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = "text"
    tempParam[StaticConfig.KEY_PASTE_PARAM_EXPIREDATE] = "N"
    tempParam[StaticConfig.KEY_PASTE_PARAM_PRIVATE] = ""
    tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = ""
    tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = ""
    tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = ""

    for line in fileinput.input([fileName]):
        temp = str(line)
        temp2 = temp.strip()
        if i == 0:
            postValues = temp2.split("|||")
            if int(postValues[(len(postValues) - 1)]) == 0 or int(postValues[(len(postValues) - 1)]) == 2:
                tempParam[StaticConfig.KEY_PASTE_PARAM_NAME] = str(postValues[0])
                if str(postValues[1]) == '':
                    tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = 'text'
                else:
                    tempParam[StaticConfig.KEY_PASTE_PARAM_PASTEFORMAT] = postValues[1]
                if str(postValues[2]) != "N":
                    tempParam[StaticConfig.KEY_PASTE_PARAM_EXPIREDATE] = str(postValues[2])
                tempParam[StaticConfig.KEY_PASTE_PARAM_PRIVATE] = str(postValues[3])
                tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = StaticConfig.API_USER_KEY
                tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = StaticConfig.API_KEY
            else:
                tempParam[StaticConfig.KEY_PASTE_PARAM_USER] = StaticConfig.API_USER_KEY
                tempParam[StaticConfig.KEY_PASTE_PARAM_DEVKEY] = StaticConfig.API_KEY
            i = i+1
        else:
            if tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] != "":
                tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = str(tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE])+"\n"+temp2
            else:
                tempParam[StaticConfig.KEY_PASTE_FORMAT_PASTECODE] = temp2
    return tempParam
You are likely returning the same dictionary from every call to MyClass.__parseFileAsText(). A couple of common ways this might be happening:
- __parseFileAsText() accepts a mutable default argument (the dict that you eventually return)
- You modify an attribute of the class or instance and return that instead of creating a new one each time
Making sure that you are creating a new dictionary on each call to __parseFileAsText() should fix this problem.
Edit: Based on your updated question with the code for __parseFileAsText(), your issue is that you are reusing the same dictionary on each call:
tempParam = StaticConfig.PASTE_PARAMS
...
return tempParam
On each call you are modifying StaticConfig.PASTE_PARAMS, and the end result is that all of the body dictionaries in your list are actually references to StaticConfig.PASTE_PARAMS. Depending on what StaticConfig.PASTE_PARAMS is, you should change that top line to one of the following:
# StaticConfig.PASTE_PARAMS is an empty dict
tempParam = {}
# All values in StaticConfig.PASTE_PARAMS are immutable
tempParam = dict(StaticConfig.PASTE_PARAMS)
If any values in StaticConfig.PASTE_PARAMS are mutable, you could use copy.deepcopy, but it would be better to populate tempParam with those default values on your own.
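The difference between aliasing the shared defaults and copying them can be shown with a small sketch. `PASTE_PARAMS` here is a hypothetical stand-in for `StaticConfig.PASTE_PARAMS` with made-up keys, and the two functions are simplified versions of `__parseFileAsText()`.

```python
# Hypothetical stand-in for StaticConfig.PASTE_PARAMS: a module-level default dict.
PASTE_PARAMS = {'name': '', 'format': 'text'}

def parse_shared(name):
    params = PASTE_PARAMS        # alias: every call mutates and returns the same dict
    params['name'] = name
    return params

def parse_copy(name):
    params = dict(PASTE_PARAMS)  # shallow copy: a fresh dict on every call
    params['name'] = name
    return params

shared = [parse_shared(n) for n in ('A', 'B', 'C')]
copied = [parse_copy(n) for n in ('A', 'B', 'C')]
print([p['name'] for p in shared])
print([p['name'] for p in copied])
```

The first list holds three references to one dict, so every entry shows the last name written; the second holds three independent dicts.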
What if __postBodies wasn't a class attribute, as it is defined now, but just an instance attribute?
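A quick sketch of the distinction this comment raises (the class names are hypothetical): a mutable class attribute is a single object shared by every instance, while one assigned in `__init__` belongs to each instance. On its own this does not explain the overwritten 'body' values, since the 'file' values differ correctly, but it avoids a second kind of sharing if more than one MyClass object is created.

```python
class SharedBodies:
    bodies = []              # class attribute: one list shared by all instances

class OwnBodies:
    def __init__(self):
        self.bodies = []     # instance attribute: a new list per instance

a, b = SharedBodies(), SharedBodies()
a.bodies.append('x')
print(b.bodies)              # b sees a's append through the shared list

c, d = OwnBodies(), OwnBodies()
c.bodies.append('x')
print(d.bodies)              # d has its own, still empty, list
```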