Python 3 Multidimentional List populated by existing MySQL table

Python 3 Multidimentional List populated by existing MySQL table - python

I have an existing MySQL database that needs to change in rows with out rewriting code. I have referenced stackexchange Two dimensional array in python. Attempted those corrections (Examples maybe commented out). All login information HAS been changed.
import mysql.connector # Import MySQL Connector
import os
from sys import argv # Allow Arguments to be passed to this script
# DEFINE Global Variables
dataVar=[[],[]] # looked on stackexchange - failed
#dataVar=[] # Originally Did this - failed
#dataVar.append([]) # And This
#for i in range(20): dataVar.append(i) # Added this - failed
# MySQL Column Names -- Python doesn't allow string index of Lists.
UUID=1; # Unique Unit IDentifier
Name=10;
Type=3;
Description=11; # Written Description
V2=18;
ImgUrl=15; # Custom Icon
InStock=4; # Is it in stock 'Yes' or 'No'
New=2; # Display New Placard 'Yes' or 'No'
Damage=5; # Display Damage Warning 'Yes' or 'No'
Tankable=12; # Display Tank Warning 'Yes' or 'No'
Display=19; # Display If in stock 'Yes' or 'No'
Mix=16; # How to mix
Customer=17; # Customer Name
Profile=13; # Searchable keywords not in Description
ml005=14; # Prices u=Unavailble s=Standard
ml015=6; # Prices u=Unavailble s=Standard
ml030=7; # Prices u=Unavailble s=Standard
ml120=8; # Prices u=Unavailble s=Standard
Shots=9; # Prices u=Unavailble s=Standard
price005='$2.50';
price015='$5.00';
price030='$10.00';
price120='u'; # No 120ml Bottles
def fetchData():{
global totalCount # totalCount able to be written to in this function
global dataVar # Data able to be written to in this function
MySQLcon=mysql.connector.connect(user='u', password='p', host='localhost', database='d')
if (MySQLcon):
cursor=MySQLcon.cursor()
query="SELECT * FROM eJuice ORDER BY Name";
cursor.execute(query)
results=cursor.fetchall()
MySQLcon.close
totalCount=0;
for row in results: { # Fetch eJuice data
dataVar[UUID].append(row[0]);
dataVar[New].append(row[1]);
dataVar[Type].append(row[2]);
dataVar[InStock].append(row[3]);
dataVar[Damage].append(row[4]);
dataVar[ml015].append(row[5]);
dataVar[ml030].append(row[6]);
dataVar[ml120].append(row[7]);
dataVar[Shots].append(row[8]);
dataVar[Name].append(row[9]);
dataVar[Description].append(row[10]);
dataVar[Tankable].append(row[11]);
dataVar[Profile].append(row[12]);
dataVar[ml005].append(row[13]);
dataVar[ImgUrl].append(row[14]);
dataVar[Mix].append(row[15]);
dataVar[Customer].append(row[16]);
dataVar[V2].append(row[17]);
dataVar[Display].append(row[18]);
totalCount+=1;
}# End for row in results
}# End with MySQLcon
return
}# End fetchData()
# Start Program
fetchData();
# Create Display Function
# End Program
CLI output from running above code:
$ python3 main.py
Traceback (most recent call last):
File "main.py", line 88, in <module>
fetchData(); # Used from MySQL.py
File "main.py", line 61, in fetchData
dataVar[UUID].append(row[0]);
AttributeError: 'int' object has no attribute 'append'

You need to define a list with as many lists within it as there are columns in the data-table. You can to this:
unitUUID = []
unitNew = []
unitType = []
...
dataVar[unitUUID, unitNew, unitType,...]
You can now proceed to append items to each list:
unitUUID.append(row[0])
and so forth. Note that this is pretty clear from the explanation of the Two dimensional array in Python. I suggest you read that article carefully.

Related

Pandas dataframe not returning the index using the loc method

I'm trying to retrieve the index of a row within a dataframe using the loc method and a comparison of data from another dataframe within a for loop. Maybe I'm going about this wrong, I dunno. Here's a bit of information to help give the problem some context...
The following function imports some inventory data into a pandas dataframe from an xlsx file; this seemingly works just fine:
def import_inventory():
import warnings
try:
with warnings.catch_warnings(record=True):
warnings.simplefilter("always")
return pandas.read_excel(config_data["inventory_file"],header=1)
except Exception as E:
writelog.error(E)
sys.exit(E)
The following function imports some data from a combination of CSV files, creating a singular dataframe to work from during comparison; this seemingly works just fine:
def get_report_results():
output_dir = f"{config_data['output_path']}/reports"
report_ids = []
......
...execute and download the report csv files
......
reports_content = []
for path,current_directory,files in os.walk(output_dir):
for file in files:
file_path = os.path.join(path,file)
clean_csv_data(file_path) # This function simply cleans up the CSV content (removes blank rows, removes unnecessary footer data); updates same file that was sent in upon successful completion
current_file_content = pandas.read_csv(file_path,index_col=None,header=7)
reports_content.append(current_file_content)
reports_content = pandas.concat(reports_content,axis=0,ignore_index=True)
return reports_content
The problems exist here, at the following function that is supposed to search the reports content for the existence of an ID value then grab that row's index so I can use it in the future to modify some columns, add some columns.
def search_reports(inventory_df,reports_df):
for index,row in inventory_df.iterrows():
reports_index = reports_df.loc[reports_df["Inventory ID"] == row["Inv ID"]].index[0]
print(reports_df.iloc[reports_index]["Lookup ID"])
Here's the error I receive upon comparison
Length of values (1) does not match length of index (4729)
I can't quite figure out why this is happening. If I pull everything out of functions the work seems to happen the way it should. Any ideas?
There's a bit more work happening to the dataframe that comes from import_inventory, but didn't want to clutter the question. It's nothing major - one function adds a few columns that splits out a comma-separated value in the inventory into its own columns, another adds a column based on the contents of another column.
Edit:
As requested, the full stack trace is below. I've also included the other functions that operate on the original inventory_df object between its retreival (import_inventory) and its final comparison (search_reports).
This function again operates on the inventory_df function, only this time it retrieves a single column from each row (if it has data) and breaks the semicolon-separated list of key-value pair tags apart for further inspection. If it finds one, it creates the necessary column for it and populates that row with the found value.
def sort_tags(inventory_df):
cluster_key = "Cluster:"
nodetype_key = "NodeType:"
project_key = "project:"
tags = inventory_df["Tags List"]
for index,tag in inventory_df.items():
if not pandas.isna(tag):
tag_keysvalues = tag.split(";")
if any(cluster_key in string for string in tag_keysvalues):
pair = [x for x in tag_keysvalues if x.startswith(cluster_key)]
key_value_split = pair[0].split(":")
inventory_df.loc[index, "Cluster Name"] = key_value_split[1]
if any(nodetype_key in string for string in tag_keysvalues):
pair = [x for x in tag_keysvalues if x.startswith(nodetype_key)]
key_value_split = pair[0].split(":")
inventory_df.loc[index, "Node Type"] = key_value_split[1]
if any(project_key in string for string in tag_keysvalues):
pair = [x for x in tag_keysvalues if x.startswith(project_key)]
key_value_split = pair[0].split(":")
inventory_df.loc[index, "Project Name"] = key_value_split[1]
return inventory_df
This function compares the new inventory DF with a CSV import-to-DF of the old inventory. It creates new columns based on old inventory data if it finds a match. I know this is ugly code, but I'm hoping to replace it when I can find a solution to my current problem.
def compare_inventories(old_inventory_df,inventory_df):
aws_rowcount = len(inventory_df)
now = parser.parse(datetime.utcnow().isoformat()).replace(tzinfo=timezone.utc).astimezone(tz=None)
for a_index,a_row in inventory_df.iterrows():
if a_row["Comments"] != "none":
for o_index,o_row in old_inventory_df.iterrows():
last_checkin = parser.parse(str(o_row["last_checkin"])).replace(tzinfo=timezone.utc).astimezone(tz=None)
if (a_row["Comments"] == o_row["asset_name"]) and ((now - timedelta(days=30)) <= last_checkin):
inventory_df.loc[a_index,["Found in OldInv","OldInv Address","OldInv Asset ID","Inv ID"]] = ["true",o_row["address"],o_row["asset_id"],o_row["host_id"]]
return inventory_df
Here's the stack trace for the error:
Traceback (most recent call last):
File "c:\Users\beefcake-quad\Code\INVENTORYAssetSnapshot\main.py", line 52, in main
reports_index = reports_df.loc[reports_df["Inventory ID"] == row["Inv ID"]].index
File "c:\Users\beefcake-quad\Code\INVENTORYAssetSnapshot\.venv\lib\site-packages\pandas\core\ops\common.py", line 70, in new_method
return method(self, other)
File "c:\Users\beefcake-quad\Code\INVENTORYAssetSnapshot\.venv\lib\site-packages\pandas\core\arraylike.py", line 40, in __eq__
return self._cmp_method(other, operator.eq)
File "c:\Users\beefcake-quad\Code\INVENTORYAssetSnapshot\.venv\lib\site-packages\pandas\core\series.py", line 5625, in _cmp_method
return self._construct_result(res_values, name=res_name)
File "c:\Users\beefcake-quad\Code\INVENTORYAssetSnapshot\.venv\lib\site-packages\pandas\core\series.py", line 3017, in _construct_result
out = self._constructor(result, index=self.index)
File "c:\Users\beefcake-quad\Code\INVENTORYAssetSnapshot\.venv\lib\site-packages\pandas\core\series.py", line 442, in __init__
com.require_length_match(data, index)
File "c:\Users\beefcake-quad\Code\INVENTORYAssetSnapshot\.venv\lib\site-packages\pandas\core\common.py", line 557, in require_length_match
raise ValueError(
ValueError: Length of values (1) does not match length of index (7150)

reports_index = reports_df.loc[report_data["Inventory ID"] == row["Inv ID"].index[0]
missing ] at end

Calling functions from a python file dynamically

Need some help in figuring out the name of what this is called in Python.
A finance library I use (called Quantopian) has a pretty cool api. When defining a new financial algorithm, you would create a new file and simply define functions based off of various keywords. This python file then somehow gets passed to an executor which is then calling these functions.
Is there a name for this concept? Basically you seem to have some python code that gets passed a python file, and is able to call functions in this file (if they exist).
Here is an example of what the code would look like:
from zipline.api import order_target, record, symbol
def initialize(context):
context.i = 0
context.asset = symbol('AAPL')
def handle_data(context, data):
# Skip first 300 days to get full windows
context.i += 1
if context.i < 300:
return
# Compute averages
# data.history() has to be called with the same params
# from above and returns a pandas dataframe.
short_mavg = data.history(context.asset, 'price', bar_count=100, frequency="1d").mean()
long_mavg = data.history(context.asset, 'price', bar_count=300, frequency="1d").mean()
# Trading logic
if short_mavg > long_mavg:
# order_target orders as many shares as needed to
# achieve the desired number of shares.
order_target(context.asset, 100)
elif short_mavg < long_mavg:
order_target(context.asset, 0)
# Save values for later inspection
record(AAPL=data.current(context.asset, 'price'),
short_mavg=short_mavg,
long_mavg=long_mavg)

How to print each loop result to a single file?

I am running a model evaluation protocol for Modeller. It evaluates every model and writes its result to a separate file. However I have to run it for every model and write to a single file.
This is the original code:
from modeller import *
from modeller.scripts import complete_pdb
log.verbose() # request verbose output
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib') # read topology
env.libs.parameters.read(file='$(LIB)/par.lib') # read parameters
# read model file
mdl = complete_pdb(env, 'TvLDH.B99990001.pdb')
# Assess all atoms with DOPE:
s = selection(mdl)
s.assess_dope(output='ENERGY_PROFILE NO_REPORT', file='TvLDH.profile',
normalize_profile=True, smoothing_window=15)
I added a loop to evaluate every model in a single run, however I am creating several files (one for each model) and I want is to print all evaluations in a single file
from modeller import *
from modeller.scripts import complete_pdb
log.verbose() # request verbose output
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib') # read topology
env.libs.parameters.read(file='$(LIB)/par.lib') # read parameters
#My loop starts here
for i in range (1,1001):
number=str(i)
if i<10:
name='000'+number
else:
if i<100:
name='00'+number
else:
if i<1000:
name='0'+number
else:
name='1000'
# read model file
mdl = complete_pdb(env, 'TcP5CDH.B9999'+name+'.pdb')
# Assess all atoms with DOPE: this is the assesment that i want to print in the same file
s = selection(mdl)
savename='TcP5CDH.B9999'+name+'.profile'
s.assess_dope(output='ENERGY_PROFILE NO_REPORT',
file=savename,
normalize_profile=True, smoothing_window=15)
As I am new to programming, any help will be very helpful!

Welcome :-) Looks like you're very close. Let's introduce you to using a python function and the .format() statement.
Your original has a comment line # read model file, which looks like it could be a function, so let's try that. It could look something like this.
from modeller import *
from modeller.scripts import complete_pdb
log.verbose() # request verbose output
# I'm assuming this can be done just once
# and re-used for all your model files...
# (if not, the env stuff should go inside the
# read_model_file() function.
env = environ()
env.libs.topology.read(file='$(LIB)/top_heav.lib') # read topology
env.libs.parameters.read(file='$(LIB)/par.lib') # read parameters
def read_model_file(file_name):
print('--- read_model_file(file_name='+file_name+') ---')
mdl = complete_pdb(env, file_name)
# Assess all atoms with DOPE:
s = selection(mdl)
output_file = file_name+'.profile'
s.assess_dope(
output='ENERGY_PROFILE NO_REPORT',
file=output_file,
normalize_profile=True,
smoothing_window=15)
for i in range(1,1001):
file_name = 'TcP5CDH.B9999{:04d}pdb'.format(i)
read_model_file(file_name)
Using .format() we can get ride of the multiple if-statement checks for 10? 100? 1000?
Basically .format() replaces {} curly braces with the argument(s).
It can be pretty complex but you don't need to digetst all of it.
Example:
'Hello {}!'.format('world') yields Hello world!. The {:04d} stuff uses formatting, basically that says "Please make a 4-character wide digit-substring and zero-fill it, so you should get '0001', ..., '0999', '1000'.
Just {:4d} (no leading zero) would give you space padded results (e.g. ' 1', ..., ' 999', '1000'.
Here's a little more on the zero-fill: Display number with leading zeros

nhlscrapi - download Data error

I am trying to get the statistics and game information of every game of an NHL Season. I am working with Stata. I found the package nhlscrapi and have written code to get all the data and statistics of a particular season:
# Import statements
# Notice how I import the whole modules and not the functions explicitly as given in the online example (good practice)
from nhlscrapi.games import game, cumstats
from nhlscrapi import constants
import csv
# Define season being considered:
season = 2012
# Get all stats they have defined
# Googled "get all methods of a class python" and found this:
# http://stackoverflow.com/questions/34439/finding-what-methods-an-object-has
# Also, needed to excclude some methods (ABCMeta, ...) after I checked what they do
# (I did that with: "help(cumstats.METHODNAME)") and saw that they did not contain stats
methods = [method for method in dir(cumstats) if callable(getattr(cumstats, method)) and
method != 'ABCMeta' and
method != 'AccumulateStats' and
method != 'ShotEventTallyBase' and
method != 'abstractmethod' and
method != 'TeamIncrementor' and
method != 'EF' and
method != 'St']
# Set up dictionary with all stats
cum_stats = {method: getattr(cumstats, method)() for method in methods}
print('All the stats:', cum_stats.keys())
# Now, look up how many games were in the regular season of the year 2012
maxgames = constants.GAME_CT_DICT[season]
# If one is interested in all the home coaches (as an example), one would first set up an empty list,
# and gradually fill it:
thingswewant_keys = ['home_coach', 'away_coach', 'home', 'away', 'attendance', 'Score', 'Fenwick']
thingswewant_values = {key: [] for key in thingswewant_keys if not key in cum_stats.keys()}
thingswewant_values.update({key+'_home': [] for key in cum_stats.keys()})
thingswewant_values.update({key+'_away': [] for key in cum_stats.keys()})
# Now, loop over all games in this season
for i in range(**12**):
# Set up object which queries database
# If one enters the following command in ipython: "help(game.Game)", one sees also alternative ways to set up
# query other than the one given in the example
ggames = game.Game(game.GameKey(season, game.GameType.Regular, i+1), cum_stats=cum_stats)
# This object 'ggames' now contains all the information of 1 specific game.
# To concatenate all the home coaches for example, one would do it like this
for key in thingswewant_keys:
if not key in cum_stats.keys():
# First case: Information is attribute of ggames (e.g. home_coach)
if not key in ['home', 'away', 'attendance']:
thingswewant_values[key] += [getattr(ggames, key)]
# Second case: Information is key of ggames.matchup (e.g. home)
if key in ['home', 'away', 'attendance']:
thingswewant_values[key] += [ggames.matchup[key]]
# Third case: Information is a cum_stat
# Figure out home_team and away team
hometeam = ggames.matchup['home']
awayteam = ggames.matchup['away']
for key in cum_stats.keys():
thingswewant_values[key+'_home'] += [ggames.cum_stats[key].total[hometeam]]
thingswewant_values[key+'_away'] += [ggames.cum_stats[key].total[awayteam]]
# Make one single table out of all the columns
results = [tuple([key for key in thingswewant_values.keys()])]
results += zip(*[thingswewant_values[key] for key in thingswewant_values.keys()])
# Write to csv
with open('brrr.csv', 'wb') as f:
writer = csv.writer(f)
writer.writerows(results)
The problem now is that in every season, after a certain game, the code stops and spits out following error:
Traceback (most recent call last):
File "C:/Users/Dennis/Downloads/AllStatsExcell.py", line 67, in <module>
thingswewant_values[key+'_home'] += [ggames.cum_stats[key].total[hometeam]]
File "C:\Python27\lib\site-packages\nhlscrapi\games\game.py", line 211, in cum_stats
return self.play_by_play.compute_stats()
File "C:\Python27\lib\site-packages\nhlscrapi\games\playbyplay.py", line 95, in compute_stats
for play in self._rep_reader.parse_plays_stream():
File "C:\Python27\lib\site-packages\nhlscrapi\scrapr\rtss.py", line 56, in parse_plays_stream
p_obj = parser.build_play(p)
File "C:\Python27\lib\site-packages\nhlscrapi\scrapr\rtss.py", line 130, in build_play
p['vis_on_ice'] = self.__skaters(skater_tab[0][0]) if len(skater_tab) else { }
File "C:\Python27\lib\site-packages\nhlscrapi\scrapr\rtss.py", line 159, in __skaters
if pl[0].text.isdigit():
AttributeError: 'NoneType' object has no attribute 'isdigit'
In the 2012 season, this occurs after game 12. Therefore I just run for the game 12 in season 2012.
ggames1=game.Game(game.GameKey(2012, game.GameType.Regular, 12),cum_stats=cum_stats
ggames1.cum_stats['ShootOut'].total
In ShootOut, for example, it crashes. But if I run this line again I get the results.
I don't know how to fix this.
If I just could get the csv file of all the games, even if there are some missing values I would be very happy.

First, you need to do some debugging yourself. The error explicitly states:
File "C:/Users/Dennis/Downloads/AllStatsExcell.py", line 67, in
thingswewant_values[key+'_home'] += [ggames.cum_stats[key].total[hometeam]]
That means on line 67 of your program there is an error. At the bottom it shows you what that error is:
AttributeError: 'NoneType' object has no attribute 'isdigit'
This means that you are attempting to get the attribute isdigit on the value of an object that is NoneType. As you might surmise, NoneType objects don't have any contents.
This is the offending line, along with the preceding for block:
for key in cum_stats.keys():
thingswewant_values[key+'_home'] += [ggames.cum_stats[key].total[hometeam]]
What you want to do is probably the following:
for key in cum_stats.keys():
try:
thingswewant_values[key+'_home'] += [ggames.cum_stats[key].total[hometeam]]
except Exception as e:
print(e)
print("key={}".format(key)
print("hometeam={}".format(hometeam)
print("ggames.cumstats={}".format(s[key].total[hometeam])
This is a basic error catching block. The first print line should tell you the exception. The following ones inform you as to the state of various things you're utilizing in the offending line. Your job is to figure out which thing is NoneType (it may not be one of the ones I provided) and then, after that, figure out why it is NoneType. Essentially: look at the data you have and are trying to manipulate in that block. Something is missing in it.

Writing multiple lists to the same .dat file simultaniously in Python 2.7?

This is my translation from pseudocode. The code suggests that multiple lists or arrays can be written to a .dat file. I'm trying to stay as true to the pseudocode format as I can so that I don't get lost in the Debugging process. For sake of space I have left out the actual Pseudocode. I know how to write everything into a .txt or .csv as string but is it possible to write them into the same file allowing each object to keep it's original value?(Num=Num and str=str)
#THIS IS THE PREMISE OF THE PSEUDOCODE
#Cooper College maintains a master file of students and credits
#earned. Each semester the master is updated with a transaction
#file that contains credits earned during the semester.
#Each file is sorted in Student ID number order.
THIS IS WHAT I'M ATTEMPTING TO TRANSLATE INTO PYTHON
#getReady()
# open master "studentFile.dat"
# open trans "semesterCredits.dat"
# open newMaster "updatedStudentFile.dat"
# readMaster()
# readTrans()
# checkBoth()
#return
#
#readMaster()
# input masterID, masterName, masterCredits from master
# if eof then
# masterID = HIGH_VALUE
# endif
#return
THIS IS WHAT I WAS TRYING TO USE TO CREATE A STARTER FILE
#EACH LIST HAS A RANGE OF [0, 4]
masterID =[100000,100001,100002,100003]
masterName =['Bevis, Ted','Finch, Harold','Einstein, Al','Queen, Liz']
masterCredits = [56,15,112,37]
master = open('studentFile.dat','wb')
master.write(masterID,masterName,masterCredits)
print master.readlines()
#THIS IS MY TRACEBACK ERROR
#Traceback (most recent call last):
# File "C:/Users/School/Desktop/Find the Bugs Ch7/debug07-03start.py",
#line 6, in <module>
#master.write(masterID,masterName,masterCredits)
#TypeError: function takes exactly 1 argument (3 given)
EXPECTED OUTPUT
print master.readlines()
[[100001,'Bevis, Ted',56],[100002,'Finch, Harold',15],[100003,'Einstein, Al',112]...]

You can get your desire lists with zip :
>>> zip(masterID,masterName,masterCredits)
[(100000, 'Bevis, Ted', 56), (100001, 'Finch, Harold', 15), (100002, 'Einstein, Al', 112), (100003, 'Queen, Liz', 37)]
Then all you need is looping over your zipped list and write to file :
with open('studentFile.dat','wb') as master:
for i,j,k,z in zip(masterID,masterName,masterCredits) :
master.write(','.join((str(i),j,k,str(z))))

write takes a string, and you are passing a bunch of object. These need to be converted to string before printing. One way of doing this is
master.write(','.join([masterID,masterName,masterCredits]))
You make a list out of the items, join them with a comma and write to file.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python 3 Multidimentional List populated by existing MySQL table - python

Related

Pandas dataframe not returning the index using the loc method

Calling functions from a python file dynamically

How to print each loop result to a single file?

nhlscrapi - download Data error

Writing multiple lists to the same .dat file simultaniously in Python 2.7?

Categories

Resources