Python nested dictionary not keeping elements in order - python

I've been trying to create an input library for Selenium using the nested dictionary data type, and while at first it was working perfectly I am now realizing I have gotten myself into a position where I cannot be assured that my elements will stay in order (which is very necessary for this library).
Here is an example of how I am trying to structure this code:
qlib = {
    'code_xp': {
        'keywords': {
            'javascript': 0,
            'web design': 1,
            'python': 0},
        'answer': {
            '4',
            'yes'}}
}
for prompt, info in qlib.items():
    for t, i in enumerate(list(info['answer'])):
        if t == 0:
            try:
                print(i)
            except:
                pass
If you run this yourself, you will soon see that after a few runs the output changes: the list built from 'answer' is sometimes ['4', 'yes'] and sometimes ['yes', '4'], so the printed first element switches between '4' and 'yes'. Given that I depend on only referencing the first element for certain inputs ('4'), I can't allow this.
As for the 'keywords' section, I have used the structure i.e. 'javascript':0 as a necessary tag element for data processing. While this is not relevant for this problem, any solution would have to account for this. Here is my full data processing engine for those that would like to see the original context. Please note this comes before the 'for' loop listed above:
trs = 'translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")'
input_type = ['numeric', 'text', 'multipleChoice', 'multipleChoice']
element_type = ['label', 'label', 'label', 'legend']
for index, item in enumerate(input_type):
    print(f"Current input: {item}")
    form_number = driver.find_elements(By.XPATH,'//' +element_type[index]+ '[starts-with(@for, "urn:li:fs_easyApplyFormElement")][contains(@for, "' +item+ '")]')
    if form_number:
        print(item)
        for link in form_number:
            for prompt, info in qlib.items():
                keywords = info['keywords']
                full_path = []
                for i, word in enumerate(keywords):
                    path = 'contains(' +trs+ ', "' +word+ '")'
                    if i < len(keywords) - 1:
                        if keywords[word] == 1:
                            path += " and"
                        elif keywords[word] == 0:
                            path += " or"
                    full_path.append(path)
                full_string = ' '.join(full_path)
                answer = ' '.join(info['answer'])
I've been trying to find the right datatype for this code for a while now, and while this almost works perfectly the problem I'm facing makes it unusable. I've considered an OrderedDict as well, however I am not confident I can keep the structures that I depend on. Looking for anything that will work. Thank you so much for any help!
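A likely cause, going only by the snippet above: 'answer' is written with braces and no keys, which makes it a set, not a list or a dict. Sets have no defined order, and because string hashing is randomized per interpreter run, list(info['answer']) can come out differently each time. Dicts, by contrast, keep insertion order (guaranteed since Python 3.7), so the 'keywords' part should be unaffected. A minimal sketch, assuming it is acceptable to store the answers as a list:

```python
# Same structure as in the question, but 'answer' is a list (ordered)
# instead of a set (unordered).
qlib = {
    'code_xp': {
        'keywords': {'javascript': 0, 'web design': 1, 'python': 0},
        'answer': ['4', 'yes'],
    }
}

# The first element is now always the one that was written first.
first_answers = {prompt: info['answer'][0] for prompt, info in qlib.items()}

# Joining works exactly as before and keeps the written order.
joined = ' '.join(qlib['code_xp']['answer'])

# Dict keys also come back in insertion order.
keyword_order = list(qlib['code_xp']['keywords'])
```

The rest of the processing loop can stay as it is, since indexing, enumerate and ' '.join all work on a list.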

Related

Extract words from random strings

Below I have some strings in a list:
some_list = ['a','l','p','p','l','l','i','i','r','i','r','a','a']
Now I want to take the word april out of this list. The list contains the letters for exactly two copies of april, so I want to take those two and append them to another list, extract.
So the extract list should look something like this:
extract = ['aprilapril']
or
extract = ['a','p','r','i','l','a','p','r','i','l']
I have tried many times to get everything in extract in order, but I still can't seem to get it.
I know I could just do this:
a_count = some_list.count('a')
p_count = some_list.count('p')
r_count = some_list.count('r')
i_count = some_list.count('i')
l_count = some_list.count('l')
total_count = [a_count,p_count,r_count,i_count,l_count]
smallest_count = min(total_count)
extract = ['april' * smallest_count]
But I wouldn't be here if I could just use the code above,
because I made some rules for solving this problem:
Each of the characters (a, p, r, i and l) is a magical code element. These code elements can't be created out of thin air; each one is unique and has a unique identifier, like a secret number associated with it. So you don't know how to create these magical code elements; the only way to get them is to extract them to a list.
Each of the characters (a, p, r, i and l) must be in order. Imagine they are some kind of chain; they will only work if they are together. Meaning that we have to put p right next to a, and l must come last.
These important code elements are top-secret stuff, so if you want to get them, the only way is to extract them to a list.
Below are some examples of an incorrect way to do this (breaking the rules):
import re
word = 'april'
some_list = ['aaaaaaappppppprrrrrriiiiiilll']
regex = "".join(f"({c}+)" for c in word)
match = re.match(regex, some_list[0])
if match:
    lowest_amount = min(len(g) for g in match.groups())
    print(word * lowest_amount)
else:
    print("no match")
from collections import Counter
def count_recurrence(kernel, string):
    # we need to count both strings
    kernel_counter = Counter(kernel)
    string_counter = Counter(string)
    effective_counter = {
        k: int(string_counter.get(k, 0)/v)
        for k, v in kernel_counter.items()
    }
    min_recurring_count = min(effective_counter.values())
    return kernel * min_recurring_count
This might sound really stupid, but this is actually a hard problem (well, for me). I originally designed this problem for myself to practice Python, but it turned out to be way harder than I thought. I just want to see how other people solve it.
If anyone out there knows how to solve this ridiculous problem, please help me out. I am just a fourteen-year-old trying to learn Python. Thank you very much.
I'm not sure what you mean by "cannot copy nor delete the magical codes" - if you want to put them in your output list you will need to "copy" them somehow.
And btw, your example code (a_count = some_list.count('a') etc.) won't work on the single-string list ['aaaaaaappppppprrrrrriiiiiilll'], since count will always return zero there.
That said, a possible solution is
magics = 'april'  # the elements to extract, in the required order
worklist = [c for c in some_list[0]]
extract = []
fail = False
while not fail:
    lastpos = -1
    tempextract = []
    for magic in magics:
        try:
            # only search to the right of the element taken just before
            pos = worklist.index(magic, lastpos+1)
        except ValueError:
            fail = True
            break
        tempextract.append(worklist.pop(pos))
        lastpos = pos-1
    else:
        extract.append(tempextract)
Alternatively, if you don't want to pop the elements when you find them, you may compute the positions of all the occurrences of the first element (the "a"), and set lastpos to each of those positions at the beginning of each iteration.
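For illustration, here is a sketch of a related variant that does not pop anything until a complete word has been located, so a failed partial match leaves the list untouched. It is not the exact scheme described above (it does not precompute the positions of every "a"); the function name extract_in_order and its return shape are made up for this example.

```python
def extract_in_order(items, word):
    """Pull complete, in-order copies of `word` out of `items`.

    Returns (extracted, remaining). Elements are moved, never created.
    """
    remaining = list(items)
    extracted = []
    if not word:
        return extracted, remaining
    while True:
        positions = []
        start = 0
        for letter in word:
            try:
                # look for the next letter to the right of the previous one
                pos = remaining.index(letter, start)
            except ValueError:
                # no complete copy left; this attempt removed nothing
                return extracted, remaining
            positions.append(pos)
            start = pos + 1
        # move the found elements themselves into the result
        extracted.append([remaining[pos] for pos in positions])
        # delete from the right so the earlier positions stay valid
        for pos in reversed(positions):
            del remaining[pos]


source = 'aaaaaaappppppprrrrrriiiiiilll'
extracted, remaining = extract_in_order(source, 'april')
```

With the sample string, each round takes the leftmost a, then the leftmost p after it, and so on, and the loop stops once the l's run out.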
This may not be the most efficient way, but the code works and makes the program logic more explicit:
some_list = ['aaaaaaappppppprrrrrriiiiiilll']
word = 'april'
extract = []
remove = []
string = some_list[0]
for x in range(len(some_list[0])//len(word)): #maximum number of times `word` can appear in `some_list[0]`
    pointer = i = 0
    while i<len(word):
        j=0
        while j<(len(string)-pointer):
            if string[pointer:][j] == word[i]:
                extract.append(word[i])
                remove.append(pointer+j)
                i+=1
                pointer = j+1
                break
            j+=1
        if i==len(word):
            for r_i,r in enumerate(remove):
                string = string[:r-r_i] + string[r-r_i+1:]
            remove = []
        elif j==(len(string)-pointer):
            break
print(extract,string)

Extracting multiple data from a single list

I'm working on a text file that contains several kinds of information. I converted it into a list in Python and right now I'm trying to separate the different data into different lists. The data is presented as follows:
CODE/ DESCRIPTION/ Unity/ Value1/ Value2/ Value3/ Value4 and then repeat, an example would be:
P03133 Auxiliar helper un 203.02 417.54 437.22 675.80
My approach to it until now has been:
Creating lists to store each piece of information:
codes = []
description = []
unity = []
cost = []
Through loops finding a code, based on the code's structure, and using the code's index as base to find the remaining values.
Finding a code's easy, it's a distinct type of information amongst the other data.
For the remaining values I made a loop to find the next value that is numeric after a code. That way I can delimit the rest of the indexes:
The unity would be the code's index + index until isnumeric - 1, hence it's the first information prior to the first numeric value in each line.
The cost would be the code's index + index until isnumeric + 2, the third value is the only one I need to store.
The description is a little harder, the number of elements that compose it varies across the list. So I used slicing starting at code's index + 1 and ending at index until isnumeric - 2.
for i, carc in enumerate(txtl):
    if carc[0] == "P" and carc[1].isnumeric():
        codes.append(carc)
        j = 0
        while not txtl[i+j].isnumeric():
            j = j + 1
        description.append(" ".join(txtl[i+1:i+j-2]))
        unity.append(txtl[i+j-1])
        cost.append(txtl[i+j])
I'm facing some problems with this approach, although there will always be more elements to the list after a code I'm getting the error:
while not txtl[i+j].isnumeric():
txtl[i+j] list index out of range.
I'm open to any solution, whether it debugs my code or takes a new approach to the problem.
Note: I'm also going to have to do this with a very similar data source, but there the code would be just a sequence of 7 digits, and thus harder to find amongst the other data. Any solution that covers this case is also appreciated!
A slight addition to your code should resolve this:
while i+j < len(txtl) and not txtl[i+j].isnumeric():
    j += 1
The first condition fails when out of bounds, so the second one doesn't get checked.
Also, please use a list of dict items instead of 4 different lists, e.g.:
thelist = []
thelist.append({'codes': 69, 'description': 'random text', 'unity': 'whatever', 'cost': 'your life'})
In this way you always have the correct values together in the list, and you don't need to keep track of where you are with indexes or other black magic...
EDIT after comment interactions:
Ok, so in this case you split the line you are processing on the space character, and then process the words in the line.
from pprint import pprint # just for pretty printing
textl = 'P03133 Auxiliar helper un 203.02 417.54 437.22 675.80'
the_list = []
def handle_line(textl: str):
    description = ''
    unity = None
    values = []
    for word in textl.split()[1:]:
        # it splits on space characters by default
        # you can ignore the first item in the list, as this will always be the code
        # str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
        if not word.replace(',', '').replace('.', '').isnumeric():
            if len(description) == 0:
                description = word
            else:
                description = f'{description} {word}' # I like f-strings
        elif not unity:
            # if unity is still None, that means it has not been set yet
            unity = word
        else:
            values.append(word)
    return {'code': textl.split()[0], 'description': description, 'unity': unity, 'values': values}
the_list.append(handle_line(textl))
pprint(the_list)
str.isnumeric() doesn't work with floats, only integers. See https://stackoverflow.com/a/23639915/9267296
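Since the values in the question are decimals such as 203.02, one way to test "is this token a number" is to try float() on it. The sketch below combines that with the layout described in the question (code, description words, unit, then the numeric values), treating the last non-numeric token before the first number as the unit. The helper names is_number and parse_line are made up for this example, and it assumes every line is non-empty, has a unit, and uses a dot as the decimal separator.

```python
def is_number(token):
    """Return True if float() accepts the token (e.g. '203.02' or '7')."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_line(line):
    """Split one 'CODE description... unit v1 v2 ...' line into a dict."""
    tokens = line.split()
    code, rest = tokens[0], tokens[1:]
    # index of the first numeric token after the code, or None if absent
    first_num = next((i for i, t in enumerate(rest) if is_number(t)), None)
    if first_num is None or first_num == 0:
        raise ValueError(f"cannot find unit and values in: {line!r}")
    return {
        'code': code,
        # everything between the code and the unit
        'description': ' '.join(rest[:first_num - 1]),
        # assumption: the unit is the token right before the first number
        'unity': rest[first_num - 1],
        'values': [float(t) for t in rest[first_num:]],
    }


record = parse_line('P03133 Auxiliar helper un 203.02 417.54 437.22 675.80')
```

Note that float() also accepts strings such as 'nan' and 'inf', so a description containing those words would need extra care.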

How to append an item to a dictionary in a list?

Hello, I am developing a text-based game and I'm struggling with the drop command. The way it works is: you write "D name of the item", then it checks whether the item is actually in the inventory, and if it is, it puts it in a variable and deletes it from the inventory. I then want to append it to the content entry of a room (a dictionary), and that dictionary is in a list, but I can't append to it.
This is the code (some of it):
room = []
room.append({'number': 1, 'content': ""})
roomnumber = 1
inv = ["sword"]
command = input(": ")
first_letter = command[0]
if first_letter == "D":
    item = command.split(" ", 2)
    item.remove("D")
    for i in range(0, len(inv):
        inv.pop(i)
    #this doesn't work
    room[roomnumber]['content'].append(item[0])
    item.pop(0)
After I have entered: "D sword", it gives me this error:
Traceback (most recent call last):
File "/Users/antony/PycharmProjects/TextBased/Main.py", line 54, in <module>
room[roomnumber]['content'].append(item[0])
AttributeError: 'str' object has no attribute 'append'
I don't get it, please help !
Do you want a room to be able to contain more than one thing? If so, make your content field a list rather than a string:
room.append({'number': 1, 'content': []})
Now you can append any number of things to the content.
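As a rough sketch of how the drop command could look once 'content' is a list: the function name drop, its parameters and its True/False return value are invented for this example and are not from the original game code. It also assumes rooms are addressed by their list index (starting at 0).

```python
def drop(command, inventory, rooms, room_index):
    """Move the named item from the inventory into a room's content list.

    Returns True if the item was dropped, False otherwise.
    """
    # "D sword" -> prefix "D", item "sword" (item names may contain spaces)
    prefix, _, item = command.partition(" ")
    if prefix != "D" or item not in inventory:
        return False
    inventory.remove(item)  # remove by value, no index loop needed
    rooms[room_index]['content'].append(item)
    return True


rooms = [{'number': 1, 'content': []}]
inv = ["sword"]
dropped = drop("D sword", inv, rooms, 0)
```

Removing by value with list.remove avoids looping over indexes while the list is shrinking.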
room = []
room.append({'number': 1, 'content': ""})
## room = [{'number': 1, 'content': ""}]
roomnumber = 1
## you should actually change this to 0. Otherwise you will get an "index out
## of range" error (see comment below)
inv = ["sword"]
command = input(": ")
first_letter = command[0]
if first_letter == "D":
    item = command.split(" ", 2)
    ## why are you adding a max split here??
    item.remove("D")
    for i in range(0, len(inv)):
        ## equals for i in range(0, 1): here, so it only iterates over 0
        ## you forgot to add a closing bracket for the range
        inv.pop(i)
        ## what are you trying to do here? This looks strange to me, given that after each
        ## pop, the value of your inv will be shortened..?
    #this doesn't work
    room[roomnumber]['content'].append(item[0])
    ## this does not work because your room list only contains one element, a
    ## dictionary, at index position 0. Probably THIS is your biggest issue here.
    ## Second, the value stored under 'content' is a string, and strings have no
    ## ".append"; ".append" adds an element to the end of a list. To replace the
    ## value stored under a dictionary key, you need to use "=".
    ## So, use room[roomnumber]['content'] = item[0]
    item.pop(0)
From what I understood, you want to store a content item, depending on roomnumber, as the content value in the dictionary of the corresponding room. In this case, your whole syntax is wrong, and you should rather use:
room = []
room.append({1:""})
## and then, after the rest of the code:
room[roomnumber-1][roomnumber] = item[0]
or, even simpler, given that combining lists and dictionaries is actually unnecessary here:
## initiate one dictionary, containing all the rooms in key = roomnumber
## value = content pairs
rooms = {}
## the syntax to add a new room number with a new content into the dictionary
## would then simply be: rooms[number] = content, e.g.:
rooms[1] = ""
## to then set the value of the content for a given roomnumber, you simply use
rooms[roomnumber] = item[0]
I recommend that you learn about the basic differences between lists and dictionaries in Python; you seem to lack some basic understanding of how elements of lists and dictionaries are accessed and modified (no offense, of course).
OK, thanks a lot guys :). I just wanted to say that the reason some things are a bit weird is that this is only about 20% of the code (there are more rooms, for example, which is why I needed to use dictionaries in a list), and my room number does start at 0 in my actual code :). The pop is there to remove the item from the inventory (because this is the drop command), and I remove it from the item variable just to be safe that it doesn't cause any unwanted bugs. Otherwise, yes, the content WAS supposed to be a list, I forgot, thanks for pointing it out. The for loop is actually closed (I was just in a bit of a rush when I wrote this). Anyway, thanks everyone :)

Filtering results from lists Python

I have been trying to find the answer around the web without any results.
I am trying to create a system where a user can search through lists and get back their subjects and grades, with a filter to only show subjects from one area (for example, information science) and another filter on the level of the subject (100, 200 or 300 level). I have tried with sub_string, but it doesn't work properly.
So the view code I have so far (with sub_string) is this:
def finn():
    global Karakterer
    global Emner
    print("Velg fag og/eller emnenivå (<enter> for alle)")
    Fag = input("-Fag: ")
    for sub_string in Emner:
        if str(Fag) in sub_string:
            print(*([sub_string] + ([Karakterer[sub_string]] if sub_string in Karakterer else [])))
these are my lists (converted to Dicts for it to work)
Emner = ["INFO100","INFO104","INFO110","INFO150","INFO125", "RELV102"]
FagKoder = [["Informasjonsvitenskap","INF"],["Kognitiv vitenskap","Kog"],
["Religionsvitenskap","REL"],["DigitalKultur","DIK"],["Økonomi","ECO"]]
Karakterer=[["INFO100","C"],["INFO104","B"],["INFO110","E"], ["RELV102","A"]]
Karakterer=dict(Karakterer)
FagKoder = dict(FagKoder)
This is how it is printed out now, and is the way i need it to be printed:
My problem is that sub_string doesn't work properly for what I need, because I need to let the user select an area (INFO, for example) as well as a specific level (level 200) and then print out all INFO subjects at level 200.
But sub_string only literally checks that the string is contained in the list and prints that.
Does anyone have a better solution?
hope that makes sense
Thank you!
A minimal fix might be to split out the number from the end and compare that separately.
def finn():
    global Karakterer # ugh
    global Emner # ugh
    want_subj = input("Velg fag (<enter> for alle): ")
    want_level = input("Velg emnenivå (<enter> for alle): ")
    try:
        want_level = int(want_level)
    except ValueError:
        want_level = None
    for subject in Emner:
        # no need for str(Fag); input by definition returns a string
        if want_subj in subject:
            if not want_level or int(subject[-3:]) == want_level:
                print(*([subject] + ([Karakterer[subject]] if subject in Karakterer else [])))
A better solution might be to store the courses and their level as separate items so you don't have to parse out the number when you need it. (As an aside, you should not assign to a list and then recast as a dict when you can easily define a dict directly.)
Emner = [("INFO",100),("INFO",104),("INFO",110),("INFO",150),("INFO",125),("RELV",102)]
FagKoder = {
    "INF": "Informasjonsvitenskap",
    "Kog": "Kognitiv vitenskap",
    "REL": "Religionsvitenskap",
    "DIK": "DigitalKultur",
    "ECO": "Økonomi"
}
It should be fairly obvious how to adapt the code to work with these structures instead.
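For readers who want it spelled out, here is one possible adaptation, as a sketch only. It makes several assumptions that are not in the original: the grades are keyed by (subject, number) tuples, "level 100" means any course number from 100 to 199, and the function takes its filters as arguments and returns the lines instead of calling input() and print(), so it can be tried without a prompt.

```python
Emner = [("INFO", 100), ("INFO", 104), ("INFO", 110),
         ("INFO", 150), ("INFO", 125), ("RELV", 102)]
# assumption: grades keyed by the same (subject, number) tuples
Karakterer = {("INFO", 100): "C", ("INFO", 104): "B",
              ("INFO", 110): "E", ("RELV", 102): "A"}


def finn(emner, karakterer, want_subj="", want_level=None):
    """Return 'SUBJnnn [grade]' lines matching the optional filters."""
    results = []
    for subj, number in emner:
        if want_subj and subj != want_subj:
            continue
        # assumption: the level is the course number rounded down to hundreds
        if want_level is not None and number // 100 * 100 != want_level:
            continue
        grade = karakterer.get((subj, number))
        results.append(f"{subj}{number}" + (f" {grade}" if grade else ""))
    return results


info_100 = finn(Emner, Karakterer, "INFO", 100)
relv_all = finn(Emner, Karakterer, "RELV")
```

Comparing the subject with == instead of "in" avoids accidental partial matches, and the integer division keeps 104 and 150 in the same level as 100.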
(As an aside, you seem to have "RELV" in Emner but "REL" in FagKoder.)

Processing a sub-list of variable size within a larger list

I'm a biological engineering PhD student here trying to self-learn Python programming for use in automating a part of my research, but I've run into a problem with processing sub-lists within a bigger list that I can't seem to solve.
Basically, the goal of what I'm trying to do is write a small script that will process a CSV file containing a list of plasmid sequences that I'm building using various DNA assembly methods, and then spit out the primer sequences that I need to order in order to build the plasmid.
Here's the scenario that I'm dealing with:
When I want to build a plasmid, I have to enter into my Excel spreadsheet the full sequence of that plasmid. I have to choose between two DNA assembly methods, called "Gibson" and "iPCR". Each "iPCR" assembly only requires one line in the list, so I know how to process those guys already, as I just have to put in one cell the full sequence of the plasmid I'm trying to build. "Gibson" assemblies, on the other hand, require that I have to split up the full DNA sequence into smaller chunks, so sometimes I need 2-5 lines within the Excel spreadsheet to fully describe one plasmid.
So I end up with a spreadsheet that sort of ends up looking like this:
Construct.....Strategy.....Name
1.....Gibson.....P(OmpC)-cI::P(cI)-LacZ controller
1.....Gibson.....P(OmpC)-cI::P(cI)-LacZ controller
1.....Gibson.....P(OmpC)-cI::P(cI)-LacZ controller
2.....iPCR.......P(cpcG2)-K1F controller with K1F pos. feedback
3.....Gibson.....P(cpcG2)-K1F controller with swapped promoter positions
3.....Gibson.....P(cpcG2)-K1F controller with swapped promoter positions
4.....iPCR.......P(cpcG2)-K1F controller with stronger K1F RBS library
I think the list at this length is representative enough.
So the problem I'm running into is, I'd like to be able to run through the list and process the Gibsons, but I can't seem to get the code to work the way I want. Here's the code I've written so far:
#import BioPython Tools
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
#import csv tools
import csv
import sys
import os
with open('constructs-to-make.csv', 'rU') as constructs:
    construct_list = csv.reader(constructs, delimiter=',')
    construct_list.next()
    construct_number = 1
    primer_list = []
    temp_list = []
    counter = 2
    for row in construct_list:
        print('Current row is row number ' + str(counter))
        print('Current construct number is ' + str(construct_number))
        print('Current assembly type is ' + row[1])
        if row[1] == "Gibson": #here, we process the Gibson assemblies first
            print('Current construct number is: #' + row[0] + ' on row ' + str(counter) + ', which is a Gibson assembly')
            ## print(int(row[0]))
            ## print(row[3])
            if int(row[0]) == construct_number:
                print('Adding DNA sequence from row ' + str(counter) + ' for construct number ' + row[0])
                temp_list.append(str(row[3]))
                counter += 1
            if int(row[0]) > construct_number:
                print('Current construct number is ' + str(row[0]) + ', which is greater than the current construct number, ' + str(construct_number))
                print('Therefore, going to work on construct number ' + str(construct_number))
                for part in temp_list: #process the primer design work here
                    print('test')
                    ## print(part)
                construct_number += 1
                temp_list = []
                print('Adding DNA from row #' + str(counter) + ' from construct number ' + str(construct_number))
                temp_list.append(row)
                print('Next construct number is number ' + str(construct_number))
                counter += 1
            ## counter += 1
        if str(row[1]) == "iPCR":
            print('Current construct number is: ' + row[0] + ' on row ' + str(counter) + ', which is an iPCR assembly.')
            #process the primer design work here
            #get first 60 nucleotides from the sequence
            sequence = row[3]
            fw_primer = sequence[1:61]
            print('Sequence of forward primer:')
            print(fw_primer)
            last_sixty = sequence[-60:]
            ## print(last_sixty)
            re_primer = Seq(last_sixty).reverse_complement()
            print('Sequence of reverse primer:')
            print(re_primer)
            #ending code: add 1 to counter and construct number
            counter += 1
            construct_number += 1
        ## if int(row[0]) == construct_number:
        ## else:
        ##     counter += 1
        ##     construct_number += 1
    ## print(temp_list)
    ## for row in temp_list:
    ##     print(temp_list)
    ##     print(temp_list[-1])
    # fw_primer = temp_list[counter - 1].
(I know the code probably looks noob - I've never done any programming class beyond introductory Java.)
The problem with this code is that if I have n "constructs" (a.k.a. plasmids) that I'm trying to build by "Gibson" assembly, it will process the first n-1 plasmids, but not the last one. I also can't think of any better way to write this code, however, but I can see that for the workflow that I'm trying to implement, knowing how to process "n" things in a list, but with each "thing" of variable numbers of rows, would come in really handy for me.
I'd really appreciate anybody's help here! Thanks a lot!
The problem with this code is that if I have n "constructs" (a.k.a. plasmids) that I'm trying to build by "Gibson" assembly, it will process the first n-1 plasmids, but not the last one.
This is actually a general problem, and the simplest way around it is to add a check after the loop, like this:
for row in construct_list:
    do all your existing code
if we have a current Gibson list:
    repeat the code to process it.
Of course you don't want to repeat yourself… so you move that work into a function, which you call in both places.
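The "check again after the loop" pattern can be sketched like this. The rows, the process_gibson placeholder and the run function are invented for illustration; the real primer design work would go where the placeholder is.

```python
def process_gibson(parts):
    """Placeholder for the real primer design; here it just counts parts."""
    return len(parts)


def run(rows):
    """Collect consecutive Gibson rows per construct and process each batch."""
    results = []
    current, parts = None, []
    for number, strategy, sequence in rows:
        # a new construct number means the previous batch is complete
        if parts and number != current:
            results.append((current, process_gibson(parts)))
            parts = []
        current = number
        if strategy == "Gibson":
            parts.append(sequence)
    # the check after the loop: without it the last batch is never processed
    if parts:
        results.append((current, process_gibson(parts)))
    return results


rows = [
    ('1', 'Gibson', 'AAA'),
    ('1', 'Gibson', 'CCC'),
    ('2', 'iPCR', 'GGG'),
    ('3', 'Gibson', 'TTT'),
    ('3', 'Gibson', 'AAA'),
]
results = run(rows)
```

Because the processing lives in one function, the loop body and the final check call the same code instead of duplicating it.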
However, I'd probably write this differently, using groupby. I know this will probably seem "way too advanced" at first glance, but it's worth trying to see if you can understand it, because it makes things a lot simpler.
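import itertools
def get_strategy(row):
    return row[0]  # column 0 is the construct number, which is what we group on
for key, rows in itertools.groupby(construct_list, key=get_strategy):
    group = list(rows)  # groupby yields (key, iterator) pairs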
Now, you'll get each construct as a separate list, so you don't need the temp_list at all. For example, the first group will be:
[[1, 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller'],
[1, 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller'],
[1, 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller']]
The next will be:
[[2, 'iPCR', 'P(cpcG2)-K1F controller with K1F pos. feedback']]
And there won't be a left-over group at the end to worry about.
So:
for key, rows in itertools.groupby(construct_list, key=get_strategy):
    group = list(rows)  # groupby yields (key, iterator) pairs
    construct_strategy = group[0][1]  # column 1 holds the strategy
    if construct_strategy == "Gibson":
        # your existing code, using group instead of temp_list,
        # and no need to maintain temp_list at all
        pass
    elif construct_strategy == "iPCR":
        # your existing code, using group[0] instead of row
        pass
Once you get over the abstraction hurdle, it's a whole lot simpler to think about the problem this way.
In fact, once you start to grasp iterators intuitively, you'll start finding that itertools (and the recipes on its docs page, and the third-party library more_itertools, and similar code you can write yourself) turn a lot of complicated questions into very simple ones. The answer to "How do I keep track of the current group of matching rows within a list of rows?" is "Keep a temporary list, and remember to check it every time the group changes and then check again at the end for leftovers", but the answer to the equivalent question "How do I transform row iteration into row-group iteration?" is "Wrap the iterator in groupby."
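To make the groupby idea concrete, here is a small self-contained sketch that uses an inline copy of the spreadsheet rows from the question instead of the CSV file. The summarize function is made up for this example; note that groupby yields (key, iterator) pairs and only groups rows that are adjacent, which fits this data because the rows of one construct always sit together.

```python
import itertools

# inline stand-in for the CSV rows: construct number, strategy, name
rows = [
    ['1', 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller'],
    ['1', 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller'],
    ['1', 'Gibson', 'P(OmpC)-cI::P(cI)-LacZ controller'],
    ['2', 'iPCR', 'P(cpcG2)-K1F controller with K1F pos. feedback'],
    ['3', 'Gibson', 'P(cpcG2)-K1F controller with swapped promoter positions'],
    ['3', 'Gibson', 'P(cpcG2)-K1F controller with swapped promoter positions'],
    ['4', 'iPCR', 'P(cpcG2)-K1F controller with stronger K1F RBS library'],
]


def summarize(rows):
    """Return (construct, strategy, number of rows) for each construct."""
    summary = []
    for construct, group in itertools.groupby(rows, key=lambda row: row[0]):
        group = list(group)  # materialize the iterator before indexing
        strategy = group[0][1]
        summary.append((construct, strategy, len(group)))
    return summary


summary = summarize(rows)
```

The last construct is handled like every other one, so no extra check after the loop is needed.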
You also might want to add in an assert or other check that all(row[1] == construct_strategy for row in group[1:]), that len(group) == 1 in the iPCR case, that there is no unexpected third strategy, etc., so when you inevitably run into an error, it'll be easier to tell whether it was bad data or bad code.
Meanwhile, instead of using a csv.reader, skipping the first row, and referring to the columns by meaningless numbers, it might be better to use a DictReader:
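with open('constructs-to-make.csv', 'rU') as constructs:
    primer_list = []
    def get_construct(row):
        # group on the construct number, so neighbouring constructs
        # that share a strategy are not merged into one group
        return row["Construct"]
    for key, rows in itertools.groupby(csv.DictReader(constructs), key=get_construct):
        group = list(rows)  # groupby yields (key, iterator) pairs
        # same as before, but with
        # ... row["Construct"] instead of row[0]
        # ... row["Strategy"] instead of row[1]
        # ... row["Name"] instead of row[2]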
Just some general coding help with python. If you haven't read PEP8 do so.
To maintain clear code it can be helpful to assign variables to fields referenced in a record/row.
I would add something like this for any field referenced:
construct_idx = 0
Also, I would recommend using string formatting, it's cleaner.
So:
print('Current construct number is: #{} on row {}, which is a Gibson assembly'.format(row[construct_idx], counter))
Instead of:
print('Current construct number is: #' + row[0] + ' on row ' + str(counter) + ', which is a Gibson assembly')
If you're creating a csv reader object, making its variable name "*_list" can be misleading. Calling it "*_reader" is more intuitive.
construct_reader = csv.reader(constructs, delimiter=',')
Instead of:
construct_list = csv.reader(constructs, delimiter=',')
