Use list/set comprehension merely as a "for" loop? - python

I am creating a set of NUM_RECORDS tuples in Python. This is my code.
record_key_list = {(choice(tuple(studentID_list)),
                    choice(tuple(courseID_list)),
                    randint(2012, 2016),
                    choice(semesters),
                    choice(grades)[0])
                   for no_use in range(NUM_RECORDS)}
An alternative is to code the problem like this.
record_key_list = set()
while len(record_key_list) < NUM_RECORDS:
    record_key_list.add((choice(tuple(studentID_list)),
                         choice(tuple(courseID_list)),
                         randint(2012, 2016),
                         choice(semesters),
                         choice(grades)[0]))
I timed the two code snippets and they are roughly equally fast for 20000 records. I prefer the first version of the code stylistically.
Is the first version of the code a correct usage of set comprehension? Or should I always stick to the second method?
EDIT: Improved formatting as suggested. I mostly just copied and pasted from the IDE. Sorry about that, guys.

The first code snippet looks totally fine. If anything, I would extract the record creation to a function for clarity and easier refactoring.
def random_record():
    studentID = choice(studentID_list)
    courseID = choice(courseID_list)
    year = randint(2012, 2016)
    semester = choice(semesters)
    grade = choice(grades)[0]
    return (studentID, courseID, year, semester, grade)
# ...
record_key_list = {random_record() for _ in range(NUM_RECORDS)}
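One thing worth checking before choosing on style alone: the two snippets in the question are not strictly equivalent. The comprehension draws NUM_RECORDS random tuples and lets the set discard duplicates, so it can end up with fewer than NUM_RECORDS records, while the while loop keeps drawing until the set holds exactly NUM_RECORDS. A minimal sketch of the difference, using a deliberately tiny, made-up record space (the helper names and value ranges below are assumptions for illustration, not the asker's data):

```python
from random import Random


def make_records_comprehension(n, rng):
    # Draws exactly n tuples; duplicates collapse, so len(result) <= n.
    return {(rng.randint(1, 3), rng.choice("AB")) for _ in range(n)}


def make_records_loop(n, rng):
    # Keeps drawing until n distinct tuples exist.
    # Note: this never terminates if n exceeds the number of possible tuples (6 here).
    records = set()
    while len(records) < n:
        records.add((rng.randint(1, 3), rng.choice("AB")))
    return records


rng = Random(0)  # seeded only to make the run repeatable
print(len(make_records_comprehension(5, rng)))  # at most 5
print(len(make_records_loop(5, rng)))           # always 5
```

With a large record space and 20000 draws, collisions may be rare enough not to matter, but if the exact count is a requirement the loop (or a check on the resulting length) is the safer choice.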

Related

Finding the line number of specific values in pandas dataframe - python

I am working on a school project and I am trying to simulate a library's catalogue system. I have .csv files that hold all the data I need but I am having a problem with checking if an inputted title, author, bar code, etc. is in the data set. I have searched around for quite a while trying different solutions but nothing is working.
The idea that I have right now is that if I can find the line on which the inputted data is located, then I can use .loc[] to get the needed info.
Is this the right track? Is there another, more efficient way to do this?
import pandas
mainData = pandas.read_csv("mainData.csv")
barcodes = mainData["Barcode"]
authors = mainData["Author"]
titles = mainData["Title/Subtitle"]
callNumbers = mainData["Call Number"]
k = "Han, Jenny,"
for i in authors:
    if k == i:
        print("Success")
        k = authors.index[k]
        print(authors[k])
    else:
        print("Fail" + k)
# Please note: this code only checks for an author match and leaves out all other fields, as I thought it was too inefficient to add the rest of the fields. The code also does not find the line on which the matches are located, therefore .loc[] cannot be used to print out all the data found.
This is the code I am using right now. It outputs the result along with an error (IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices) and is very slow. I would like the code to be able to output the books and their respective info. I have found that the .loc[] feature (mentioned above) outputs the info quite nicely. Here is the data I am using.
Edit: I have been able to reduce the time it takes for the program to run and made a functional "prototype"
authorFirst = authorFirst.lower()
authorFirst = authorFirst.title()
authorFirst += ","
authorSecond = input("Enter author's last name: ")
authorSecond = authorSecond.lower()
authorSecond = authorSecond.title()
authorSecond += ", "
authorInput = authorSecond + authorFirst
print(mainData[mainData["Author"].isin([authorInput])])
bookChoice = input("Please Enter the number to the left of the barcode to select a book: ")
print(mainData.loc[int(bookChoice)])
It provides the functionality that I am looking for, but I feel that there has to be a better way of doing it (one that does not ask the user to input the row number). I don't know if this is possible, though.
I am new to Python and this is my first time using pandas, so I'm sorry if this is really bad and hurts your brain.
Thank-you so much for your time!
Pandas does not need the numeric index of a row in order to select it.
Since you have not provided any starting point or data, I'll just provide a few pointers here, as there are many ways to match and index things in pandas.
import pandas as pd
# build a library
library = pd.DataFrame({
    "Author": ["H.G. Wells", "Hubert Selby Jr.", "Ken Kesey"],
    "Title": [
        "The War of the Worlds",
        "Requiem for a Dream",
        "One Flew Over the Cuckoo's Nest",
    ],
    "Published": [1898, 1979, 1962],
})
# find on some characteristics
mask_wells = library.Author.str.contains("Wells")
mask_rfad = library["Title"] == "Requiem for a Dream"
mask_xixth = library["Published"] < 1900
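Each of those masks is a boolean Series, and passing one to .loc[] returns every matching row with all of its columns, so no loop and no row number is needed. A minimal sketch applied to the author lookup from the question; the rows below are made up, and only the column names "Barcode", "Author" and "Title/Subtitle" are taken from the question:

```python
import pandas as pd

# Hypothetical stand-in for mainData.csv; the values are invented for illustration.
catalogue = pd.DataFrame({
    "Barcode": [1001, 1002, 1003],
    "Author": ["Han, Jenny,", "Wells, H. G.,", "Han, Jenny,"],
    "Title/Subtitle": ["Book A", "Book B", "Book C"],
})

# Exact match: a boolean mask selects the matching rows directly.
mask = catalogue["Author"] == "Han, Jenny,"
matches = catalogue.loc[mask]
print(matches)

# Case-insensitive substring match, so the user does not have to type
# the author exactly as it is stored.
partial = catalogue.loc[
    catalogue["Author"].str.contains("han", case=False, regex=False)
]
print(partial)
```

If the row labels are still needed afterwards, matches.index holds them, so the user never has to read a number off the screen and type it back in.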

Eliminate Indentations in Python

I'm using the Google Docs API to retrieve the contents of a document and process it using Python. However, the document has a complex structure and I have to loop through multiple nodes of the returned JSON, so I need several nested for loops to reach the desired content and apply the necessary filters. Is there a way to eliminate some of the indentation so that the code looks more organized?
Here is a snippet of my loops:
for key, docContent in docs_api_result.json().items():
    if key == "body":
        content = docContent['content']
        for i, body_content in enumerate(content):
            if "table" in body_content:
                for sKey, tableContent in content[i]['table'].items():
                    if sKey == "tableRows":
                        for tableRowContent in tableContent:
                            for tableCellMain in tableRowContent['tableCells']:
                                for tableCellContent in tableCellMain['content']:
                                    hasBullet = False
                                    for tableCellElement in tableCellContent['paragraph']['elements']:
                                        if "bullet" in tableCellContent['paragraph']:
                                            ...
I know that instead of having
if True:
    # some code here
I can replace it with
if False:
    continue
# some code here
to remove some of the indents, but that only solves part of the problem. I still have 7 for-loops left and I hope that I could remove some of the indentations as well.
Any help is appreciated! :)
The general method for reducing indentation levels would be to identify blocks of code to go in their own functions.
E.g. looking at your loop, I guess I would try something like:
class ApiResultProcessor(object):
    def process_api_result(self, api_result):
        doc_dict = api_result.json()
        if "body" in doc_dict:
            self.process_body(doc_dict["body"])
    def process_body(self, body_dict):
        content = body_dict["content"]
        for i, content_element_dict in enumerate(content):
            if "table" in content_element_dict:
                self.process_table(content_element_dict["table"])
            ...
    def process_table(self, table_dict):
        for tableRowContent in table_dict["tableRows"]:
            for tableCellMain in tableRowContent["tableCells"]:
                for tableCellContent in tableCellMain['content']:
                    self.process_cell_content(tableCellContent)
    def process_cell_content(self, table_cell_dict):
        hasBullet = False
        for tableCellElement in table_cell_dict["paragraph"]["elements"]:
            if "bullet" in table_cell_dict["paragraph"]:
                ...
The only refactoring that I have done is trying to avoid the dreadful "for if" antipattern.
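Another option along the same lines, offered as a sketch rather than a drop-in replacement: put the traversal into a generator function so that the calling code is a single flat loop. The function name iter_table_paragraphs and the sample dictionary are assumptions made up for this example; only the keys (body, content, table, tableRows, tableCells, paragraph, bullet) come from the question. Unlike the original loops, it uses .get() and skips entries that lack a key instead of raising KeyError.

```python
def iter_table_paragraphs(doc):
    """Yield the paragraph dict of every table cell in a document dict."""
    for element in doc.get("body", {}).get("content", []):
        table = element.get("table")
        if table is None:
            continue  # guard clause instead of nesting everything under an if
        for row in table.get("tableRows", []):
            for cell in row.get("tableCells", []):
                for cell_content in cell.get("content", []):
                    paragraph = cell_content.get("paragraph")
                    if paragraph is not None:
                        yield paragraph


# Hand-written stand-in for docs_api_result.json(); not real API output.
sample_doc = {
    "body": {
        "content": [
            {"paragraph": {"elements": []}},  # not inside a table, so skipped
            {"table": {"tableRows": [
                {"tableCells": [
                    {"content": [
                        {"paragraph": {"elements": [], "bullet": {"listId": "x"}}},
                        {"paragraph": {"elements": []}},
                    ]},
                ]},
            ]}},
        ]
    }
}

# The caller now needs only one level of looping.
bulleted = [p for p in iter_table_paragraphs(sample_doc) if "bullet" in p]
print(len(bulleted))
```

The nesting still exists inside the generator, but it is confined to one place, and every consumer of the cells stays at one indentation level.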
I am not that experienced with Python, but I am pretty sure you can indent each level by a single space instead of a multiple of four without getting an indentation error, as long as the indentation is consistent. It does not follow the PEP 8 style guide, though.
So just replace every 4 spaces or tab in this piece of code with 1 space.

How to create a dataframe with a dynamic text parameter passed to a function in Python

(I'm pretty new to Python, and even to coding, so forgive me for my stupidity.)
I'm trying to pass a text value and a list as parameters to a function. Here's an example:
Names = File['Student_Name']
Scores = File['Marks']
for a in range(0,100):
    Student_Name = [Names[a]]
    Marks = []
    NewDf = pd.DataFrame(PreCovid(Student_Name,Marks))
    Master_Sheet_PreCovid = NewDf
Master_Sheet_PreCovid
What I wish to achieve is passing Name of a Student, as a string, one at a time, to the function. In this code, I'm vaguely creating a df with each loop iteration, which obviously will only return me the last value, however, I wish to get the output for complete list of Students. What modifications/additions do I make in this code to make it work.
I followed this thread, "Why the function is only returning the last value?", which was similar to my query, but it might not work with my requirements.
Edited: I actually have 2 sheets that I'm fetching my data from. One is a Main Sheet that has all the data, with redundancy; the other is a Rule Book with unique values and the rules for calculation. In this code I'm only fetching values from the Rule Book, then going to the function, fetching data from the Main Sheet based on these values, performing my calculations, creating a new dataframe, inserting the values I get into that dataframe as well, and returning the final dataframe. Right now the calculation tested on Student_Name alone has worked, but now I have the bigger problem of also calculating based on Marks.
At the risk of sounding arrogant, I only wish to pass the name as string, not as list.
Again, I'm sorry about the stupidity of my query.
Give it a try:
Names = File['Student_Name']
Scores = File['Marks']
Master_Sheet_PreCovid = []
for a in range(0,100):
    Student_Name = [Names[a]]
    Marks = []
    NewDf = pd.DataFrame(PreCovid(Student_Name,Marks))
    Master_Sheet_PreCovid.append(NewDf)
Master_Sheet_PreCovid = pd.concat(Master_Sheet_PreCovid)
print(Master_Sheet_PreCovid)

Why can't I change items in a Python list using a step?

So I am still in the beginning stages of learning Python--and coding as a whole for that matter.
My question is why can I not change items in a Python list using a step, like this:
def myfunc2(string):
    new = list(string)
    new[0::2] = new[0::2].upper()
    new[1::2] = new[1::2].lower()
    return ''.join(new)
myfunc2('HelloWorld')
I want to make every other letter upper and lower case, starting at index 0.
I had already seen a solution posted by another user, and although this other solution worked I had trouble understanding the code. So, in my curiosity I tried to work out the logic myself and came up with the above code.
Why is this not working?
new[0::2] returns a list, and a list does not have an upper method, so the call raises an AttributeError. The same goes for new[1::2] and lower.
You can achieve your goal with map:
def myfunc2(string):
    new = list(string)
    new[0::2] = map(str.upper, new[0::2])
    new[1::2] = map(str.lower, new[1::2])
    return ''.join(new)
print(myfunc2('HelloWorld'))
# HeLlOwOrLd
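If the slice assignment feels hard to follow, the same result can be built one character at a time by deciding the case from each character's position. This is only an alternative sketch; the function name alternate_case is made up for this example.

```python
def alternate_case(text):
    # Even positions (0, 2, 4, ...) become upper case, odd positions lower case.
    return "".join(
        ch.upper() if i % 2 == 0 else ch.lower()
        for i, ch in enumerate(text)
    )


print(alternate_case("HelloWorld"))
# HeLlOwOrLd
```

Both versions call str.upper or str.lower on individual characters; the original attempt failed only because it called them on the whole list.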

Python, loop that gets a value then tests that value again

This is my first time asking here. I tried searching for an answer, but wasn't certain how to phrase what I need so I decided to ask.
I am working on a character creator for a tabletop RPG. I want to get the results for the character's previous occupation, which are on a list, then test that value again to get the occupation previous to that.
I already have a way of getting the first occupation, which is then compared with a text database, with entries such as:
Captain ,Explorer,Knight,Sergeant,
Where Captain is the first occupation and the commas mark the beginning and the end of the possible previous occupations. I have managed to get one of those randomly, but I haven't been able to make the loop then take the selected occupation and run it again. For example:
Explorer ,Cartographer,
Here's the simplified version of my code. It gets the first part right, but I'm not sure how to trigger a loop for the next.
import random
def carOld(carrera,nivPoder):
    carActual=carrera
    u=0
    indPoder=int(nivPoder)
    carAnterior=[]
    commas=[]
    entTemp=[]
    d=open("listaCarreras.txt","r")
    f=(d.readlines())
    while indPoder!=0:
        indPoder=indPoder-1
        for line in f:
            if carActual in line:
                entTemp=line.split(",")
        d.close()
        del entTemp[0]
        del entTemp[-1]
        print (entTemp)
        carAnterior=random.choice(entTemp)
I think this does what you want. Based on your description, I believe the current occupation is at the front of the list and the previous occupations follow it.
str_occs = 'Occ1,Occ2,Occ3'
list_occs = str_occs.split(',')
def prev_occ(occupation, list_occs):
    prev_occ_index = list_occs.index(occupation) + 1
    try:
        ret_val = list_occs[prev_occ_index]
    except IndexError:
        # occupation was the last entry, so nothing comes before it
        ret_val = "No prior occupations."
    return ret_val
You can try it out here: https://repl.it/B08A
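For the part of the question about feeding the chosen occupation back into the lookup, one possible approach is to parse the file once into a dictionary and then repeat "look up, pick one, make it the current career" in a loop. The sketch below assumes the line format shown in the question ("Captain ,Explorer,Knight,Sergeant,"); the sample lines beyond the two quoted there, and the function names, are made up for illustration.

```python
import random

# Hypothetical sample; a real run would read these lines from listaCarreras.txt.
SAMPLE_LINES = [
    "Captain ,Explorer,Knight,Sergeant,",
    "Explorer ,Cartographer,",
    "Knight ,Squire,",
]


def parse_careers(lines):
    """Map each career to the list of its possible previous careers."""
    table = {}
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        career = fields[0]
        previous = [name for name in fields[1:] if name]  # drop empty trailing field
        if career:
            table[career] = previous
    return table


def career_history(start, steps, table, rng=random):
    """Pick up to `steps` previous careers, feeding each pick back into the lookup."""
    history = []
    current = start
    for _ in range(steps):
        options = table.get(current)
        if not options:
            break  # no recorded previous careers for this one
        current = rng.choice(options)
        history.append(current)
    return history


table = parse_careers(SAMPLE_LINES)
print(career_history("Captain", 2, table))
```

Looking the career up as a dictionary key also avoids a pitfall of the "if carActual in line" test, which matches the name anywhere in the line, including lines where it appears only as a previous occupation.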
