Order of column names with keys() in SQLAlchemy - python

Can I rely on keys() to always return column names in the same order as the query results when selecting all columns? Based on my output it appears to be true, but I can't find any documentation that guarantees it. I'm inclined to think it is true because while dictionaries may be unordered, they should be consistent.
# Loop through each table in the database where all tables
# are being reflected at once.
for table in Base.metadata.tables.values():
    # Select all columns
    rows = Session.query(table).all()
    # Prepend a header row
    rows.insert(0, rows[0].keys())
    # Output to file
    fh = open(filename, 'wb')
    outcsv = unicodecsv.writer(fh)
    outcsv.writerows(rows)
    fh.close()
Similarly, column_descriptions also appears to return names in the same order as the values, but again I am not certain if it will always be true.
# Prepend list of column names as the first row
rows.insert(0, [col['name'] for col in Session.query(table).column_descriptions])
Any help will be much appreciated. Thanks!

The rows returned are KeyedTuples; their ordering within a single query is dictated by the order of the columns in the original select, which guarantees that the order is the same as that returned by .keys(), and the same for every row in that query's results.
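For illustration, a minimal, self-contained sketch (an in-memory SQLite database and a hypothetical users table, written against the 1.x-style query API) showing that .keys() follows the column order of the select, which is also the order of the values inside each row:
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite://")   # throwaway in-memory database
metadata = MetaData()
users = Table("users", metadata,
              Column("id", Integer, primary_key=True),
              Column("name", String),
              Column("email", String))
metadata.create_all(engine)

Session = sessionmaker(bind=engine)
session = Session()
session.execute(users.insert().values(name="alice", email="alice@example.com"))

row = session.query(users).first()
print(row.keys())   # e.g. ['id', 'name', 'email'] -- the select's column order
print(tuple(row))   # the values come back in that same order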

Related

Lookup Each item from a list to items from List2. If there's a match return such value, if not delete the entire row

I have two lists that were created from columns from two different dataframes. The two dataframes have the following structure:
In [73][dev]: cw.shape
Out[73]: (4666, 13)
In [74][dev]: ml.shape
Out[74]: (815, 5)
and the two lists are identifier objects intended to match data from one dataframe with another. My intention is conceptually equivalent to a VLOOKUP in Excel: look up whether an item from list ID is in list ID2, and if so, return the appropriate 'class1' value from the second list into the new "Class" column that I've created. If the "vlookup" (pardon my Excel reference here, but hopefully you catch my drift) doesn't find the relevant value, then drop the entire row.
import pandas as pd
cw = pd.read_excel("abc.xlsx")
ml = pd.read_excel("xyz.xlsx")
ID = cw['Identifier']
cw["Class"] = ""
asc = cw["Class"]
ID2 = ml['num']
bac = ml['class1']
for item in ID:
    if item in ID2:
        asc[item] = bac[item]
    else:
        cw.drop(cw.index, inplace = True)
Unfortunately the pasted script drops all rows in cw, rendering it a blank dataframe, which is not what I intended. Again, what I'm aiming for is to remove the rows that don't get a match between the two ID identifiers, and to return the class1 values for the rows with matching IDs into the new Class column that I've just created.
In [76][dev]: cw.shape
Out[76]: (0, 13)
I hope I've made this clear. I suspect I didn't set up the if statement correctly, but I'm not sure. Thank you very much for helping a beginner here.
I found a simpler and more straightforward solution by using pandas merge.
# Merge with master list
cw_ac = pd.merge(cw, ml, on='cusip', how='inner')
This acts like an inner join in SQL based on the identifier and removes non-matching IDs.
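Note that the key columns in the question are named differently in the two frames ('Identifier' in cw and 'num' in ml); in that case a sketch along these lines, reusing the frames loaded above, does the same inner join without renaming anything:
# cw and ml as loaded in the question above
cw_ac = pd.merge(cw, ml, left_on='Identifier', right_on='num', how='inner')   # keep only matching IDs
cw_ac['Class'] = cw_ac['class1']                                              # class1 supplies the new Class column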

Add another column to existing list

I'm starting to learn Python and I'm trying to do an exercise where I have to save some stock data coming from a SQL query in a "rows" variable, like this:
rows = db.execute("SELECT * FROM user_quote WHERE user_id=:userid", userid=session["user_id"])
This will return 4 columns (id, user_id, symbol, name)
Then, for every row the query returns I'll get the last known price of that stock from an API, and I want to add that information to another column in my rows variable. Is there a way to do this? Should I use another approach?
Thanks for your time!
I'm not sure what type the rows variable is, but you can just add an additional column in the SELECT:
rows = db.execute("SELECT *, 0 NewCol FROM user_quote WHERE user_id=:userid", userid=session["user_id"])
Assuming rows is mutable, this will provide a placeholder for the new value.
Convert each row tuple to a list, then you can use append() to add the price.
for i, row in enumerate(rows):
    rows[i] = list(row)
    rows[i].append(price)
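Putting it together with the API lookup, a rough sketch (get_last_price() is a hypothetical stand-in for whatever API call you use, and the symbol is assumed to be the third column, per the order id, user_id, symbol, name):
def get_last_price(symbol):
    # Hypothetical placeholder for the real price-lookup API call.
    ...

for i, row in enumerate(rows):
    row = list(row)                       # tuples are immutable, so work on a list copy
    row.append(get_last_price(row[2]))    # row[2] assumed to be the symbol column
    rows[i] = row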

Pandas - merge/join/vlookup df and delete all rows that get a match

I am trying to reference a list of expired orders from one spreadsheet (df name = data2), and vlookup them against the new orders spreadsheet (df name = data) to delete all the rows that contain expired orders, then return a new spreadsheet (df name = results).
I am having trouble trying to mimic what I do in Excel with vlookup/sort/delete in pandas. Please view the pseudo code/steps as code:
1. Import simple.xls as a dataframe called 'data'.
2. Import wo.xlsm, sheet name "T", as a dataframe called 'data2'.
3. Do a vlookup, using Column "A" in 'data' as the values to be matched with any of the same values in Column "A" of 'data2' (they're both just Order IDs).
4. For all values that exist inside Column A in 'data2' and also exist in Column "A" of 'data', group (if necessary) and delete the entire row (there are 26 columns) for each matched Order ID found in Column A of both datasets. To reiterate, delete the entire row for the matches found in the 'data' file. Save the smaller dataset as results.
import pandas as pd
data = pd.read_excel("ors_simple.xlsx", encoding="ISO-8859-1", dtype=object)
data2 = pd.read_excel("wos.xlsm", sheet_name = "T")
results = data.merge(data2,on='Work_Order')
writer = pd.ExcelWriter('vlookuped.xlsx', engine='xlsxwriter')
results.to_excel(writer, sheet_name='Sheet1')
writer.save()
I re-read your question and think I understand it correctly. You want to find out whether any order in new_orders (you call it data) has expired, using expired_orders (you call it data2).
If you rephrase your question what you want to do is: 1) find out if a value in a column in a DataFrame is in a column in another DataFrame and then 2) drop the rows where the value exists in both.
Using pd.merge is one way to do this. But since you want to use expired_orders to filter new_orders, pd.merge seems a bit overkill.
Pandas actually has a method for doing this sort of thing and it's called isin() so let's use that! This method allows you to check if a value in one column exists in another column.
df_1['column_name'].isin(df_2['column_name'])
isin() returns a Series of True/False values that you can apply to filter your DataFrame by using it as a mask: df[bool_mask].
So how do you use this in your situation?
is_expired = new_orders['order_column'].isin(expired_orders['order_column'])
results = new_orders[~is_expired].copy() # Use copy to avoid SettingWithCopyError.
~ is equal to not, so ~is_expired means that the order wasn't expired.
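A small self-contained sketch of the same pattern, using toy frames with made-up order IDs:
import pandas as pd

new_orders = pd.DataFrame({'Work_Order': [101, 102, 103, 104], 'qty': [5, 10, 15, 20]})
expired_orders = pd.DataFrame({'Work_Order': [102, 104]})

is_expired = new_orders['Work_Order'].isin(expired_orders['Work_Order'])
results = new_orders[~is_expired].copy()
print(results)   # only work orders 101 and 103 remain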

Updating excel rows with data in the form of python dict_items

I have a list of dictionaries whose values need to be written to an Excel sheet under the corresponding column headers,
new=[{"slno":"1","region":"2","customer":"3"}]
I am not sure about data types in Python as I am a beginner.
All I want to do is update an Excel sheet with the data from the above dict using a for loop, but I always end up with unordered data.
In the Excel file there are column headers named exactly like the keys of the dict, so I was hoping to insert each value into the matching Excel column.
Note: I was able to write it to Excel using a for loop, but the dict keys came out in a random order, so the values were messed up when updated on the sheet.
xfile = openpyxl.load_workbook('D:\\LoginLibrary\\test.xlsx')
sheet = xfile.get_sheet_by_name('OE')
charcounter="A"
i=i
for key in g:
    sheet[charcounter+str(i)]=key
    charcounter = (chr(ord(charcounter[0]) + 1))
xfile.save('D:\\LoginLibrary\\test.xlsx')
One of the difficulties of dictionaries is that when you iterate over one in a loop, the keys can come back in any order. However, something you can do is get the whole list of keys and then sort that list. For example:
xfile = openpyxl.load_workbook('D:\\LoginLibrary\\test.xlsx')
sheet = xfile.get_sheet_by_name('OE')
charcounter="A"
i=i
new = {"slno":"1","region":"2","customer":"3"} # The outer brackets made it a list, unneeded
print(sorted(new.keys())) # Prints out all the keys in alphabetical order
list_of_sorted_keys = sorted(new.keys())
for key in list_of_sorted_keys:
    sheet[charcounter+str(i)]=key
    charcounter = (chr(ord(charcounter[0]) + 1))
xfile.save('D:\\LoginLibrary\\test.xlsx')
Note: I don't know much about writing to excel, so I'm assuming that you have that part right. My additions just modify the dictionary so that it is organized.
If alphabetical order for the keys doesn't do the job, you can order by the values as well, although it's more difficult to get your keys from values because dictionaries aren't meant to work that way.
Another way could be to just make the original data set as a list of tuples, as so:
new=[("slno","1"),("region","2"),("customer","3")]
That will keep all your data in the order that you put it into the list, because lists are accessed by integer indices.
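For instance, a sketch along the lines of the original loop (it assumes sheet is the worksheet loaded above, that the row number i is already set, and that you want the values written out in the order the tuples appear):
new = [("slno", "1"), ("region", "2"), ("customer", "3")]
charcounter = "A"
for key, value in new:
    sheet[charcounter + str(i)] = value       # write the value into the current column of row i
    charcounter = chr(ord(charcounter) + 1)   # advance one column to the right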
I hope one of these ideas meets your needs!

Sqlite query to python dictionary

I have searched the docs and SO and could not find anything to resolve my issue. I am trying to run a select against my sqlite database and add the results to a dictionary with the columns as keys. When I do this it returns a row for each column/key. It has 14 columns, and if I only have 4 rows it repeats for each one. This was the first attempt:
columns = [desc[0] for desc in cursor.description]
results = []
for row in r:
    Summary = {}
    items = zip(columns, row)
    for (items, values) in items:
        Summary[items] = row
    results.append(Summary)
Then I also tried the row_factory as given in the docs. That didn't work. My end goal is to be able to print out to a text file vertically by using
for x in results:
    print x[name]
    print x[email]
    # etc.
Any help is appreciated
You are creating your dictionary incorrectly. Use:
for row in r:
    summary = dict(zip(columns, row))
    results.append(summary)
instead.
Your code sets the whole row sequence as the value for each key in Summary, instead of the individual column value, and then appends that same dictionary to the results list once per column key.
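As an aside, sqlite3's built-in sqlite3.Row row factory gets you the same result without zipping by hand. A minimal sketch (the database file, table, and column names here are hypothetical):
import sqlite3

conn = sqlite3.connect("example.db")                       # hypothetical database file
conn.row_factory = sqlite3.Row                             # rows now support access by column name
cursor = conn.execute("SELECT name, email FROM people")    # hypothetical table and columns

results = [dict(row) for row in cursor]                    # each sqlite3.Row converts directly to a dict
for x in results:
    print(x['name'])
    print(x['email'])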
