How to access particular column name in XlsxWriter through dictionary key? - python

I want to create an Excel file like the one in the image above, with more than 100 columns:
row_header = ['Student_id', 'Student_name', 'Collage_name', 'From', 'To',
.........upto 100 column name]
My question is simple: how do I write flexible code so that, if the order of the column names changes, I don't have to rewrite it?
For the first row, which contains only the column names, I wrote the following code. It creates the first row.
for item in row_header:
    worksheet.write(row, col, item)
    worksheet.set_row(row, col)
    col += 1
Now the problem is that I am getting a list of dictionaries. Each dictionary holds the details of one student, so each dictionary contains 100 key-value pairs.
student_list=
[{collage_name: IIIT-A, Student_name:'Rakesh','Student_id':1,.....up to 100 key},
{collage_name: IIIT-G, Student_name: 'Shyam', 'Student_id':2, ........ up to 100 key}]
As you can see, the key order does not match the column order. If I write each cell individually as above, it will take 100 lines of code. So I am looking for a solution that assigns each cell value according to its column name. How can I use XlsxWriter with a key-value dictionary so that I need to write less code?
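One possible approach, sketched here under assumptions, is to loop over the header list for every record and look each value up by column name, so the key order inside the dictionaries stops mattering. The function name `write_rows_by_header` and the `FakeWorksheet` stand-in are hypothetical and exist only so the sketch runs without XlsxWriter installed; the sketch also assumes the dictionary keys are spelled exactly like the column names.

```python
def write_rows_by_header(worksheet, row_header, records, first_row=0):
    """Write a header row, then one row per dict, ordered by row_header.

    `worksheet` is anything with a write(row, col, value) method, such as
    an XlsxWriter worksheet. Missing keys are written as empty strings.
    """
    # Header row: one cell per column name.
    for col, name in enumerate(row_header):
        worksheet.write(first_row, col, name)

    # Data rows: look each value up by column name, so the key order
    # inside each dictionary does not matter.
    for offset, record in enumerate(records, start=1):
        for col, name in enumerate(row_header):
            worksheet.write(first_row + offset, col, record.get(name, ""))


class FakeWorksheet:
    """Hypothetical stand-in for a worksheet; it only records the cells."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


# Shortened, hypothetical versions of the lists from the question.
row_header = ["Student_id", "Student_name", "Collage_name"]
student_list = [
    {"Collage_name": "IIIT-A", "Student_name": "Rakesh", "Student_id": 1},
    {"Collage_name": "IIIT-G", "Student_name": "Shyam", "Student_id": 2},
]

sheet = FakeWorksheet()
write_rows_by_header(sheet, row_header, student_list)
print(sheet.cells[(1, 0)], sheet.cells[(1, 1)], sheet.cells[(1, 2)])
```

With XlsxWriter itself, the same function should accept the object returned by `workbook.add_worksheet()`, where `workbook = xlsxwriter.Workbook("students.xlsx")` (the file name is made up), followed by `workbook.close()`. Reordering `row_header` then reorders the columns without touching the loop.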

Related

Populate new column in dataframe based on dictionary key matches values in another column and some more conditions

I have a data frame like
I have a dictionary with the ec2 instance details
Now I want to add a new column, 'Instance Name', and populate it on the condition that the instance ID in the dictionary appears in the 'ResourceId' column. Depending on what the Name field in the dictionary holds for that instance ID, I want to populate the new column value for each matching entry.
Finally, I want to create separate data frames for my specific use cases, e.g. to get only Box-Usage results. Something like this:
box_usage = df[df['lineItem/UsageType'].str.contains('BoxUsage')]
print(box_usage.groupby('Instance Name')['lineItem/BlendedCost'].sum())
The new column value does not appear against the respective Resource Id as I want. Instead, the values are filled in sequentially.
I have tried a bunch of things, including what I mentioned in the code above, but with no result yet. Any help?
After struggling through several options, I used the .apply() way and it did the trick
df.insert(loc=17, column='Instance_Name', value='Other')
instance_id = []
def update_col(x):
    # Return a label for instance ID x, or None when nothing matches.
    for key, val in ec2info.items():
        if x == key:
            if ('MyAgg' in val['Name']) or ('MyAgg-AutoScalingGroup' in val['Name']):
                return 'SharkAggregator'
            if ('MyColl AS Group' in val['Name']) or ('MyCollector-AutoScalingGroup' in val['Name']):
                return 'SharkCollector'
            if ('MyMetric AS Group' in val['Name']) or ('MyMetric-AutoScalingGroup' in val['Name']):
                return 'Metric'
df['Instance_Name'] = df.ResourceId.apply(update_col)
# Assign the result back instead of relying on inplace=True on a column.
df['Instance_Name'] = df['Instance_Name'].fillna('Other')
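As a possible simplification, the loop over every dictionary entry can be swapped for a direct lookup by instance ID. The sketch below is only an illustration: the sample `ec2info` contents, the `NAME_RULES` table, and the function name `instance_label` are all hypothetical, and it assumes `ec2info` maps an instance ID to a dictionary with a 'Name' field, as the code above suggests.

```python
# Hypothetical sample data: instance ID -> details dict with a 'Name' field.
ec2info = {
    "i-001": {"Name": "MyAgg-AutoScalingGroup"},
    "i-002": {"Name": "MyColl AS Group"},
    "i-003": {"Name": "MyMetric AS Group"},
    "i-004": {"Name": "SomethingElse"},
}

# (substring to look for in the Name field, label to assign), checked in order.
NAME_RULES = [
    ("MyAgg", "SharkAggregator"),
    ("MyAgg-AutoScalingGroup", "SharkAggregator"),
    ("MyColl AS Group", "SharkCollector"),
    ("MyCollector-AutoScalingGroup", "SharkCollector"),
    ("MyMetric AS Group", "Metric"),
    ("MyMetric-AutoScalingGroup", "Metric"),
]


def instance_label(resource_id, info, default="Other"):
    """Return the label for one resource ID, or `default` if nothing matches."""
    details = info.get(resource_id)  # direct lookup, no loop over all items
    if details is None:
        return default
    name = details.get("Name", "")
    for needle, label in NAME_RULES:
        if needle in name:
            return label
    return default


for rid in ["i-001", "i-002", "i-003", "i-004", "i-999"]:
    print(rid, instance_label(rid, ec2info))
```

In the data frame this could be applied as `df['ResourceId'].apply(lambda x: instance_label(x, ec2info))`. Because the function already returns the default for unmatched IDs, a separate `fillna` step would no longer be needed.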

How to divide a pandas dataframe into different dataframes based on unique values from one column and iterate over them?

I have a dataframe with three columns.
The first column has 3 unique values. I used the code below to create one dataframe per unique value. However, I am unable to iterate over those dataframes and am not sure how to use them in a loop.
df = pd.read_excel("input.xlsx")
unique_groups = list(df.iloc[:,0].unique())  # let's assume the unique values are 0, 1, 2
mtlist = []
for index, value in enumerate(unique_groups):
    globals()['df%s' % index] = df[df.iloc[:,0] == value]
    mtlist.append('df%s' % index)
print(mtlist)
Output:
['df0', 'df1', 'df2']
For example, let's say I want to find the length of the first unique dataframe.
If I manually type the name of the dataframe, I get the correct output:
len(df0)
Output:
35
But I am trying to automate the code, so I want to get the length and iterate over that dataframe just as I would by typing its name.
What I'm looking for is this: if I run the code below,
len('df%s' % 0)
I want to get the actual length of the dataframe instead of the length of the string.
Could someone please guide me on how to do this?
I have also tried to create a dictionary using the code below, where the key would be the unique group and the value contains the other two columns in the same row, but I can't figure out how to iterate over the dictionary when the dataframe has more than two columns.
df = pd.read_excel("input.xlsx")
unique_groups = list(df["Assignment Group"].unique())
length_of_unique_groups = len(unique_groups)
mtlist = []
df_dict = {name: df.loc[df['Assignment Group'] == name] for name in unique_groups}
Can someone please provide a better solution?
UPDATE
SAMPLE DATA
Assignment_group    Description                           Document
Group A             Text to be updated on the ticket 1    doc1.pdf
Group B             Text to be updated on the ticket 2    doc2.pdf
Group A             Text to be updated on the ticket 3    doc3.pdf
Group B             Text to be updated on the ticket 4    doc4.pdf
Group A             Text to be updated on the ticket 5    doc5.pdf
Group B             Text to be updated on the ticket 6    doc6.pdf
Group C             Text to be updated on the ticket 7    doc7.pdf
Group C             Text to be updated on the ticket 8    doc8.pdf
Let's assume there are 100 rows of data.
I'm trying to automate ServiceNow ticket creation with the above data.
My end goal is that Group A tickets go to one group. A unique task has to be created for each description, but we can club 10 tasks together and submit them as one request. So if I split the dataframe into separate dataframes based on Assignment_group, it would be easier to iterate over (that's the only idea I could think of).
For example, let's say we have REQUEST001.
Within that request there will be multiple subtasks, such as STASK001, STASK002 ... STASK010.
Hope this helps.
Your problem is easily solved by groupby, one of the most useful tools in pandas:
length_of_unique_groups = df.groupby('Assignment Group').size()
You can do all kinds of operations (sum, count, std, etc.) on your remaining columns, such as getting the mean price for each group if price were a column.
I think you want to try something like len(eval('df%s' % 0))
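Instead of `globals()` or `eval`, the groups can be kept in a dictionary and iterated directly. The sketch below uses plain Python lists of dictionaries so it runs without pandas; the generated rows and the helper names `group_rows` and `chunked` are hypothetical, and the batch size of 10 comes from the "club 10 tasks" description in the question.

```python
from collections import defaultdict

# Hypothetical rows shaped like the sample data in the question.
rows = [
    {
        "Assignment_group": "Group A" if i % 2 else "Group B",
        "Description": f"Text to be updated on the ticket {i}",
        "Document": f"doc{i}.pdf",
    }
    for i in range(1, 31)
]


def group_rows(rows, key):
    """Collect rows into a dict keyed by the value found under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


groups = group_rows(rows, "Assignment_group")
for name, members in groups.items():
    # One request per batch of up to 10 tasks within the same group.
    for request_no, batch in enumerate(chunked(members, 10), start=1):
        print(name, "request", request_no, "holds", len(batch), "tasks")
```

The same pattern carries over to pandas: `for name, group in df.groupby('Assignment_group'):` yields each key together with its sub-dataframe, and the `df_dict` from the question can be walked with `df_dict.items()`, so `len(group)` gives the length without ever building a variable name from a string.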

How can I extract a substring between two characters for every row of a column in a CSV file and copy those values into a new column in Python?

I have a column with unique ID numbers, called "UnitID", that is organised in a way such as this:
ABC2_DEFGH12-01_X1_Y1
The segment of DEFGH12-01 hypothetically refers to the ID of the specific batch of units. I need to make a new column that specifies this batch, and therefore, want to extract the "DEFGH12-01" values (like extracting the value between the first and second "_", but I haven't been able to figure out how), into a new column, called "BatchID".
I would want to just leave "UnitID" as is, and simply add the new "BatchID" column before it.
I've tried everything but I haven't really managed to do this.
Use str.split("_").str[1].
Example:
df = pd.DataFrame({"UnitID": ["ABC2_DEFGH12-01_X1_Y1"]})
df["BatchID"] = df["UnitID"].str.split("_").str[1]
print(df)
Output:
                  UnitID     BatchID
0  ABC2_DEFGH12-01_X1_Y1  DEFGH12-01
If you need a regex, use str.extract(r"(?<=_)(.*?)(?=_)"):
df["BatchID"] = df["UnitID"].str.extract(r"(?<=_)(.*?)(?=_)")
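To see what the two approaches do on a single value, here is a small sketch using only the standard library. It applies the same split and the same lookaround pattern to the example ID from the question; the variable names are made up for illustration.

```python
import re

unit_id = "ABC2_DEFGH12-01_X1_Y1"

# Option 1: split on "_" and take the second piece.
batch_by_split = unit_id.split("_")[1]

# Option 2: the lookaround regex. re.search returns the first match,
# which is the text between the first and second "_".
match = re.search(r"(?<=_)(.*?)(?=_)", unit_id)
batch_by_regex = match.group(1) if match else None

print(batch_by_split, batch_by_regex)
```

Both variables end up holding the segment between the first two underscores. The split version raises an IndexError for an ID without any underscore, while the regex version falls back to None, which may matter if the column contains malformed IDs.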

How to fit a Dictionary into a CSV file with all the values in the keys in a different column within the same row in python

I have a dictionary and I wish to save it to a CSV file. However, with the code below, all the data for a key is put into a single cell. I need each value to be in a different column, within the same row as the key it belongs to.
csvPatientListFinal = csv.writer(open('PatientListFinal.csv','wb'))
for key, value in PatientList.items():
    csvPatientListFinal.writerow([key, value])
I have tried, but I have been unsuccessful in my efforts.
Thank you
Update 1:
Sorry about the typo. The code worked (it was written in the script without typos), but it did not produce the desired output. The dictionary created by my script has 14 keys, with 721 items under each key. Together they form a database of case subjects. I use "|" to represent the lines between cells. The data in the dictionary looks like this:
PatientList = {'code' : ['1','2','3','4','5',6], 'name' : ['aho','awd','faw','fas','gas','gdas','fasw'] , 'surnames' : ['awds','fhtt','hfr','hyk','uyr','rtyd'], 'ID' : ['123','345','654','234','645','354'], 'description' : ['a long text','a long text','a long text','a long text','a long text','a long text'] }.
The csv table format should be ("|" is the space between each cell in a row)
(headers)A|B|C
(row 1)1|A|caro
(row 2)2|B|al
But the output that I got was
(headers in one cell) A,B,C
(row 1 in one cell) 1 A caro
(row 2 in one cell) 2 B al
Everything ends up under the same column, next to the key it belongs to. This is an image representation of the desired output:
I am now trying the following code:
csvPatientListFinal = csv.writer(open('PatientListFinal.csv','wb'))
for header in PatienList.keys():
    csvPatientListFinal.writerow(izip[header, value for value in PatientList[header]])
However, it reports an invalid syntax error around "value for value". I am trying to figure out why.
Is there a better way to achieve my desired output? Is there something wrong with the code (apart from the syntax error, whose cause I currently can't work out)?
Thank you for the help
You can do it like this. Iterate over the length of the value lists (assuming they're all the same length as the one in code) with nested iteration over the keys.
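with open('PatientListFinal.csv', 'w', newline='') as fp:  # newline='' is what the Python 3 csv docs recommend
    csvPatientListFinal = csv.writer(fp)
    # Header row: one column per dictionary key.
    csvPatientListFinal.writerow(PatientList.keys())
    # One output row per index, taking the i-th value from every key.
    for i in range(len(PatientList['code'])):
        csvPatientListFinal.writerow([j[i] for j in PatientList.values()])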
You might want to consider using an OrderedDict if order is important to you. On Python versions before 3.7, there is no guarantee that the columns will be stored in the same order as they were entered.
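The index loop can also be expressed with `zip`, which transposes the per-key lists into per-row tuples. This is a sketch with hypothetical, equal-length sample data, and it writes to an in-memory buffer rather than a file so the result can be inspected; `write_columns` is a made-up helper name.

```python
import csv
import io

# Hypothetical, equal-length sample data in the shape described in the post.
patient_list = {
    "code": ["1", "2", "3"],
    "name": ["aho", "awd", "faw"],
    "surnames": ["awds", "fhtt", "hfr"],
}


def write_columns(mapping, fp):
    """Write the keys as a header row and the values as columns beneath it."""
    writer = csv.writer(fp)
    writer.writerow(mapping.keys())
    # zip(*values) turns the per-key lists into per-row tuples.
    writer.writerows(zip(*mapping.values()))


buffer = io.StringIO()
write_columns(patient_list, buffer)
print(buffer.getvalue())
```

Note that `zip` stops at the shortest list. If the lists can differ in length, as they do in the example dictionary in the question, `itertools.zip_longest` with a `fillvalue` would pad the short columns instead of dropping rows.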

Exclude given columns when sorting a table python

I am trying to sort a table but would like to exclude given columns by their names while sorting. In other words, the given columns should remain where they were before sorting. This is aimed at dealing with columns like "Don't know", "NA", etc.
The API I'm using is unique and company specific but it uses python.
A table in this API is an object which is a list of rows, where each row is a list of cells and each cell is a list of cell values.
I currently have a working function that sorts a table, but I would like to modify it to exclude a given column by its name, and I am struggling to find a way.
FYI - "Matrix" can be thought of as the table itself.
def SortColumns(byRow=0, usingCellValue=0, descending=True):
    """
    :param byRow: Use the values in this row to determine the sort order of the
        columns.
    :param usingCellValue: When there are multiple values within a cell use this
        to control which value row within each cell is used for sorting
        (zero-based)
    :param descending: Determines the order in which the values should be
        sorted.
    """
    for A in range(0, Matrix.Count):
        for B in range(0, Matrix.Count):
            if A == B:
                continue  # do not compare a column against itself
            valA = Matrix[byRow][A][usingCellValue].NumericValue if Matrix[byRow][A].Count > usingCellValue else None
            valB = Matrix[byRow][B][usingCellValue].NumericValue if Matrix[byRow][B].Count > usingCellValue else None
            if descending:
                if valB < valA:
                    Matrix.SwitchColumns(A, B)
            else:
                if valA < valB:
                    Matrix.SwitchColumns(A, B)
I am thinking of adding a new parameter which takes a list of column names, and use this to bypass these columns.
Something like:
def SortColumns(fixedcolumns, byRow=0,usingCellValue=0,descending=True):
While iterating through the columns, you can use the continue statement to skip over columns that you don't want to move. Put these conditions at the start of your two loops:
for A in range(0, Matrix.Count):
    a_name = ??? #somehow get the name of column A
    if a_name in fixedcolumns: continue
    for B in range(0, Matrix.Count):
        b_name = ??? #somehow get the name of column B
        if b_name in fixedcolumns: continue
        if A == B:
            continue
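Since the table API in the question is company-specific, the idea can only be sketched on plain Python lists. The version below assumes a table is a list of column names plus a list of rows of plain numbers; the function name `sort_columns` and the sample data are hypothetical. It sorts only the movable columns among their own positions and leaves every fixed column where it was.

```python
def sort_columns(header, rows, by_row=0, fixed_columns=(), descending=True):
    """Return (header, rows) with movable columns sorted by rows[by_row].

    Columns whose name is in fixed_columns keep their original position.
    """
    fixed = set(fixed_columns)
    # Positions that are allowed to change.
    movable = [i for i, name in enumerate(header) if name not in fixed]
    # The same positions, ordered by the value found in the sort row.
    ranked = sorted(movable, key=lambda i: rows[by_row][i], reverse=descending)

    # order[new_position] = old_position; fixed columns map to themselves.
    order = list(range(len(header)))
    for new_pos, old_pos in zip(movable, ranked):
        order[new_pos] = old_pos

    new_header = [header[i] for i in order]
    new_rows = [[row[i] for i in order] for row in rows]
    return new_header, new_rows


# Hypothetical table: one row of values, two columns that must stay put.
header = ["Red", "Don't know", "Blue", "Green", "NA"]
rows = [[10, 99, 30, 20, 5]]

new_header, new_rows = sort_columns(header, rows, fixed_columns=["Don't know", "NA"])
print(new_header)
print(new_rows)
```

Building the full column order first and applying it once avoids repeated pairwise swaps. In the proprietary API, the resulting order could presumably be applied through calls like `Matrix.SwitchColumns`, but that part depends on details the question does not show.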
