How to insert multiple values into specific treeview columns? - python

I have a database query returning the totals of several columns, and I am trying to display them in a treeview. If I do

for i in backend2.calc_total()[0]:
    treeviewtotal.insert("", END, values=i)
I get a result that is not what I want, as I want everything to start from the "food" column onwards. I can't make the date an iid, as I already have an iid that I am referencing to my database.
If I do
list2 = ['Date', 'Food', 'Transport', 'Insurance', 'Installments', 'Others']
for i in range(len(backend2.calc_total()[0][0])):
    treeviewtotal.insert("", 0, list2[i+1], values=backend2.calc_total()[0][0][i])
Instead, all the totals get stacked into one column (which is scrollable).
Is there any way to achieve my aim of allocating each total to its respective column in the same row? Thanks!

With reference to the first attempt, the following solves the problem:

for i in backend2.calc_total()[0]:
    treeviewtotal.insert("", END, values=([], *i))

values= takes a sequence. We add an empty first entry using [], and since i is itself already a list, we unpack it with *i so that each total lands in its own column.
Please correct me if I used any parts of the code wrongly. Still trying to learn =)
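The values= trick can be checked without a GUI; a minimal sketch, with the totals tuple made up for illustration:

```python
# stand-in for one row of totals returned by backend2.calc_total()[0]
i = (10, 20, 30, 40, 50)

# prepend a blank entry for the Date column, then unpack the totals
# so each one lands in its own treeview column
row = ([], *i)
print(row)  # ([], 10, 20, 30, 40, 50)
```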

Related

Printing and counting unique values from an .xlsx file

I'm fairly new to Python and still learning the ropes, so I need help with a step by step program without using any functions. I understand how to count through an unknown column range and output the quantity. However, for this program, I'm trying to loop through a column, picking out unique numbers and counting its frequency.
So I have an excel file with random numbers down column A. I only put in 20 numbers but let's pretend the range is unknown. How would I go about extracting the unique numbers and inputting them into a separate column along with how many times they appeared in the list?
I'm not really sure how to go about this. :/
unique = 1
while xw.Range((unique, 1)).value != None:
    frequency = 0
    if unique != unique: break
    quantity += 1
"end"
I presume as you can't use functions this may be homework...so, high level:
You could first go through the column and put all the values in a list.
Then take the first value from the list and go through the rest of the list: is it in there? If so, it is not unique. Remove the value where you found the duplicate from the list; keep going, and if you find another, remove that too.
Then take the second value, and so on.
You would just need list comprehension, some loops and perhaps .pop()
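The steps above could be sketched with plain loops and lists (the sample numbers are made up; reading them out of the sheet is left aside):

```python
# stand-in for the numbers read down column A
values = [3, 1, 3, 2, 1, 3]

uniques = []
counts = []
for v in values:
    if v in uniques:
        # seen before: bump its count
        counts[uniques.index(v)] += 1
    else:
        # first time we meet this value
        uniques.append(v)
        counts.append(1)

print(uniques)  # [3, 1, 2]
print(counts)   # [3, 2, 1]
```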
Using the pandas library would be the easiest way to do this. I created a sample Excel sheet having only one column, called "Random_num":
import pandas
data = pandas.read_excel("sample.xlsx", sheet_name = "Sheet1")
print(data.head()) # This would give you a sneak peek of your data
print(data['Random_num'].value_counts()) # This would solve the problem you asked for
# Make sure to pass your column name within the quotation marks
#eg: data['your_column'].value_counts()
Thanks
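If you don't have the workbook to hand, the same value_counts call can be tried on an in-memory frame (the column name and numbers below are assumptions for illustration):

```python
import pandas

# small stand-in for the "Random_num" column of sample.xlsx
data = pandas.DataFrame({"Random_num": [3, 1, 3, 2, 1, 3]})

counts = data["Random_num"].value_counts()
print(counts)
# 3 appears three times, 1 twice, and 2 once
```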

Best way to fuzzy match values in a data frame and then replace the value?

I'm working with a dataframe containing various datapoints of customer data. I'm looking to replace any junk phone numbers with a blank value, and right now I'm struggling to find an efficient way to spot potential junk values, such as a phone number like 111-111-1111, and replace each such value with a blank entry.
I currently have a fairly ugly solution where I go through three fields (home phone, cell phone and work phone), locate the index values of the rows in question and the respective column, and then replace those values.
With regards to actually finding junk values in a dataframe, is there a better approach than what I am currently doing?
row_index = dataset[dataset['phone'].str.contains('11111')].index
column_index = dataset.columns.get_loc('phone')
Afterwards, I would zip these up and cycle through a for loop, using dataset.iat[row_index, column_index] = ''. The row and column index variables would also have the junk values from the 'cellphone' and 'workphone' columns appended as well.
Pandas' where function tends to be quick:

dataset['phone'] = dataset['phone'].where(
    ~dataset['phone'].str.contains('11111', na=False), None)

(na=False keeps rows with missing numbers from breaking the boolean mask.)
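Since the same junk can appear in all three fields, one sketch (column names, sample rows and the junk pattern are assumptions) loops the where call over each phone column:

```python
import pandas as pd

# made-up sample; '1111111111' stands in for a junk entry
dataset = pd.DataFrame({
    'phone':     ['1111111111', '555-867-5309'],
    'cellphone': ['555-111-2222', '1111111111'],
    'workphone': ['1111111111', '555-000-1234'],
})

# blank out any entry containing the junk pattern, one column at a time
for col in ['phone', 'cellphone', 'workphone']:
    dataset[col] = dataset[col].where(
        ~dataset[col].str.contains('11111', na=False), '')

print(dataset)
```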

Remove rows from dataframe whose text does not contain items from a list

I am importing data from a table with inconsistent naming conventions. I have created a list of manufacturer names that I would like to use as a basis of comparison against the imported name. Ideally, I will delete all rows from the dataframe that do not align with the manufacturer list. I am trying to create an index vector using a for loop to iterate through each element of the dataframe column and compare against the list. If the text is there, update my index vector to true. If not, index vector is updated to false. Finally, I want to use the index vector to drop rows from the original data frame.
I have tried generators and sets, but to no avail. I thought a for loop would be less elegant but ultimately work, yet I'm still stuck. My code is below.
meltdat.Products is my dataframe column that contains the imported data
mfgs is my list of manufacturer names
prodex is my index vector
meltdat = pd.DataFrame(
    {"Location": ["S1", "S1", "S1", "S1", "S1"],
     "Date": ["1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020", "1/1/2020"],
     "Products": ['CC304RED', 'COHoney', 'EtainXL', 'Med467', 'MarysTop'],
     "Sold": [1, 3, 0, 1, 2]})

mfgs = ['CC', 'Etain', 'Marys']

for prods in meltdat.Products:
    if any(mfg in meltdat.Products[prods] for mfg in mfgs):
        prodex[prods] = TRUE
    else:
        prodex[prods] = FALSE
I added example data in the dataframe that mirrors my imported data.
You can use pd.Series.apply:
meltdat[meltdat.Products.apply(lambda x: any(m in x for m in mfgs))]
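Run against the sample frame from the question, the apply filter keeps only the rows whose product name contains one of the manufacturer strings:

```python
import pandas as pd

meltdat = pd.DataFrame(
    {"Location": ["S1", "S1", "S1", "S1", "S1"],
     "Date": ["1/1/2020"] * 5,
     "Products": ['CC304RED', 'COHoney', 'EtainXL', 'Med467', 'MarysTop'],
     "Sold": [1, 3, 0, 1, 2]})
mfgs = ['CC', 'Etain', 'Marys']

# keep a row when any manufacturer string occurs in its product name
filtered = meltdat[meltdat.Products.apply(lambda x: any(m in x for m in mfgs))]
print(filtered.Products.tolist())  # ['CC304RED', 'EtainXL', 'MarysTop']
```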

Pandas: How to check if any of a list in a dataframe column is present in a range in another dataframe?

I'm trying to compare two bioinformatic DataFrames (one with transcription start and end genomic locations, and one with expression data). I need to check if any of a list of locations in one DataFrame is present within ranges defined by the start and end locations in the other DataFrame, returning rows/ids where they match.
I have tried a number of built-in methods (.isin, .where, .query), but usually get stuck because lists are unhashable. I've also tried a nested for loop with iterrows and itertuples, which is exceedingly slow (my actual datasets contain thousands of entries).
tss_df = pd.DataFrame(data={'id': ['gene1', 'gene2'],
                            'locs': [[21, 23], [34, 39]]})
exp_df = pd.DataFrame(data={'gene': ['geneA', 'geneB'],
                            'start': [15, 31], 'end': [25, 42]})
I'm looking to find that the row with id 'gene1' in tss_df has locations (locs) that match 'geneA' in exp_df.
The output would be something like:
output = pd.DataFrame(data={'id': ['gene1', 'gene2'],
                            'locs': [[21, 23], [34, 39]],
                            'match': ['geneA', 'geneB']})
Edit: Based on a comment below, I tried playing with merge_asof:
pd.merge_asof(tss_df,exp_df,left_on='locs',right_on='start')
This gave me an incompatible merge keys error, I suspect because I'm comparing a list to an integer; so I split out the first value in locs:

tss_df['loc1'] = tss_df['locs'].str[0]  # first element of each row's list
pd.merge_asof(tss_df, exp_df, left_on='loc1', right_on='start')
This appears to have worked for my test data, but I'll need to try it with my actual data!
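Putting the pieces together, a sketch of the merge_asof route: merge_asof needs a scalar, sorted key on both sides, and only the first location per row is compared; checking end afterwards catches loc1 values that fall past a gene's range.

```python
import pandas as pd

tss_df = pd.DataFrame(data={'id': ['gene1', 'gene2'],
                            'locs': [[21, 23], [34, 39]]})
exp_df = pd.DataFrame(data={'gene': ['geneA', 'geneB'],
                            'start': [15, 31], 'end': [25, 42]})

# merge_asof needs a scalar, sorted key: take the first location per row
tss_df['loc1'] = tss_df['locs'].str[0]
tss_df = tss_df.sort_values('loc1')
exp_df = exp_df.sort_values('start')

merged = pd.merge_asof(tss_df, exp_df, left_on='loc1', right_on='start')
# drop matches where loc1 overshoots the gene's end coordinate
merged['match'] = merged['gene'].where(merged['loc1'] <= merged['end'])
print(merged[['id', 'locs', 'match']])
```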

Subset one dataframe from a second

I am sure I am missing a simple solution but I have been unable to figure this out, and have yet to find the answer in the existing questions. (If it is not obvious, I am a hack and just learning Python)
Let's say I have two data frames (DataFileDF, SelectedCellsRaw) with the same two key fields (MRBTS, LNCEL), and I want a subset of the first data frame (DataFileDF) containing only the rows whose key pairs appear in the second data frame,
e.g. rows of DataFileDF with keys that correspond to the keys of SelectedCellsRaw.
Note this needs to match by key pair MRBTS + LNCEL not each key individually.
I tried:
SelectedCellsRaw = DataFileDF.loc[DataFileDF['MRBTS'].isin(SelectedCells['MRBTS']) &
                                  DataFileDF['LNCEL'].isin(SelectedCells['LNCEL'])]
I get the right MRBTS values, but also every occurrence of LNCEL (it has a possible range of 0-9, so there are many duplicates throughout the data set).
One way you could do this is to use isin with indexes:

joincols = ['MRBTS', 'LNCEL']
DataFileDF[DataFileDF.set_index(joincols).index.isin(
    SelectedCellsRaw.set_index(joincols).index)]
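A quick check with made-up key pairs (the column contents below are assumptions) shows the index isin keeping only exact MRBTS + LNCEL combinations:

```python
import pandas as pd

DataFileDF = pd.DataFrame({'MRBTS': [1, 1, 2, 2],
                           'LNCEL': [0, 1, 0, 1],
                           'val': ['a', 'b', 'c', 'd']})
SelectedCellsRaw = pd.DataFrame({'MRBTS': [1, 2], 'LNCEL': [0, 1]})

# build a MultiIndex on both frames, then test pair membership
joincols = ['MRBTS', 'LNCEL']
subset = DataFileDF[DataFileDF.set_index(joincols).index.isin(
    SelectedCellsRaw.set_index(joincols).index)]
print(subset.val.tolist())  # ['a', 'd']
```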
