String validation with XlsxWriter in Python

I am trying to add validation to an Excel file. A field should only accept alphabetic characters, or alphabetic characters with the '_' symbol. The following code only restricts the user from entering values longer than 10 characters.
import xlsxwriter
wb = xlsxwriter.Workbook('staff.xlsx')
ws = wb.add_worksheet()
firstname_max_length = 10  # assumed limit, matching the error message below
ws.data_validation(1, 1, 10, 0,
                   {'validate': 'length',
                    'input_title': 'Enter value',
                    'criteria': '<',
                    'value': firstname_max_length,
                    'error_message': 'Max Length is {0}'.format(firstname_max_length)})
wb.close()
Please help me validate the values so that only alphabetic characters are accepted.
Thanks.

The first step is to work out how to do the data validation in Excel and then transfer it to XlsxWriter.
This will probably require a "Custom" data validation like this:
import xlsxwriter
wb = xlsxwriter.Workbook('staff.xlsx')
ws = wb.add_worksheet()
ws.data_validation('A1:C3',
{'validate': 'custom',
'value': '=ISTEXT(A1)',
'input_title': 'Enter a non numeric value',
'error_message': 'Enter a string not a number'})
wb.close()
Output:
However, this doesn't do exactly what you are looking for. It is still possible to add non character data or even strings like 123h as shown in the screenshot. So you will need to figure out a formula that works in Excel and then apply it to XlsxWriter. I googled for an example but didn't find anything that worked.
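One formula of that kind, offered as a sketch rather than a tested answer: count how many characters of the cell appear in an allowed set and compare that count with LEN(). The helper name `letters_only_validation` and the allowed-character string are my own assumptions, and the formula should be confirmed in Excel before relying on it.

```python
import string

# Characters the field may contain: ASCII letters plus underscore (an assumption
# based on the question; adjust as needed).
ALLOWED = string.ascii_letters + '_'


def letters_only_validation(first_cell):
    """Build data_validation() options for a range whose top-left cell is first_cell.

    The formula uses a relative reference to first_cell, so Excel adjusts it
    for every other cell in the validated range.
    """
    # FIND() is used instead of SEARCH() because SEARCH() treats * and ? as wildcards.
    formula = (
        '=SUMPRODUCT(--ISNUMBER(FIND('
        'MID({cell},ROW(INDIRECT("1:"&LEN({cell}))),1),"{allowed}")))'
        '=LEN({cell})'
    ).format(cell=first_cell, allowed=ALLOWED)
    return {
        'validate': 'custom',
        'value': formula,
        'input_title': 'Enter letters or _',
        'error_message': 'Only letters and underscores are allowed',
    }


options = letters_only_validation('A1')
print(options['value'])
```

The options would then be passed to the worksheet in the usual way, for example `ws.data_validation('A1:C3', letters_only_validation('A1'))`. Excel limits data validation formulas to 255 characters, so keep the allowed set short.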

Related

how to append rows into one row if the other rows have same certain name

Original:
Expected result:
Task:
I am trying to merge the values of the 'url' column into one row when rows share the same name in another column ('Full Path'), using Python in a Jupyter notebook.
I have tried using groupby, but it doesn't give me the result I want.
Code:
df.groupby("Full Path").apply(lambda x: ", ".join(x)).reset_index()
This is not what I am expecting:
The reason it is not working is that you need to modify the column for the full path first before passing it to group by since there are differences in the full paths.
Based on the sample here the following should work:
df['Full Path'] = df['Full Path'].str.split('/').str[0:2].str.join('/')
test = df.groupby(by=['Full Path']).agg({'url': ', Next'.join})
test['url'] = test['url'].str.replace("Next","\n")
This code of course assumes that the grouping you want for the full path occurs in the first two items. The \n will disappear when you write the df out to Excel.
NOTE: Unless the Type and Date fields all hold the same value, you cannot include them in the group by. For example, groupby(['Full Path', 'Type', 'Date']) would not aggregate all the links for an individual path+folder combination. If you want them included as a comma-separated, one-per-line column like the url, add them to the agg statement and apply the same replace to them.
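To illustrate the note, here is a minimal sketch that aggregates Type and Date alongside url. It joins directly with ',\n' instead of going through the Next placeholder, and the variable name `merged` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Full Path': [
        'downloads/Residences Singapore',
        'downloads/Residences Singapore/15234523524352',
        'downloads/Residences Singapore/41242341324',
    ],
    'Type': ['Folder', 'File', 'File'],
    'Date': ['07-05-22 19:24', '07-05-22 19:24', '07-05-22 19:24'],
    'url': [
        'https://www.google.com/drive/storage/345243534534522345',
        'https://www.google.com/drive/storage/523405923405672340567834589065',
        'https://www.google.com/drive/storage/90658360945869012141234',
    ],
})

# Keep only the first two path components so all three rows share one key.
df['Full Path'] = df['Full Path'].str.split('/').str[0:2].str.join('/')

# Join every remaining column with a comma followed by a line break.
joiner = ',\n'.join
merged = df.groupby('Full Path', as_index=False).agg(
    {'Type': joiner, 'Date': joiner, 'url': joiner})
print(merged)
```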
Code used for testing:
import pandas as pd
pd.options.display.max_colwidth = 999
data_dict = {
'Full Path': [
'downloads/Residences Singapore',
'downloads/Residences Singapore/15234523524352',
'downloads/Residences Singapore/41242341324',
],
'Type': [
'Folder',
'File',
'File',
],
'Date': [
'07-05-22 19:24',
'07-05-22 19:24',
'07-05-22 19:24',
],
'url': [
'https://www.google.com/drive/storage/345243534534522345',
'https://www.google.com/drive/storage/523405923405672340567834589065',
'https://www.google.com/drive/storage/90658360945869012141234',
],
}
df = pd.DataFrame(data_dict)
df['Full Path'] = df['Full Path'].str.split('/').str[0:2].str.join('/')
test = df.groupby(by=['Full Path']).agg({'url': ', Next'.join})
test['url'] = test['url'].str.replace("Next","\n")
test
Output
Just group by the Full Path column and aggregate the url field with a comma separator.

xlsxwriter conditional format formula with multiple criteria

I'm trying to apply conditional formatting to my excel file using xlsxwriter, but not sure how to code for two conditions. After perusing the documentation I only see examples for one.
Concretely I'm trying to say when the value (H13) is greater than (H5) AND > (H6) then color green.
Below is my attempt that does not work. I believe this is simply a syntax issue.
worksheet.conditional_format('H13:H13', {'type': 'formula',
'criteria': '=H13 >= $H$5 and H13 > $H$6 ',
'format': green_bg})
In all cases like this it is best to figure out the conditional format in Excel first and then transfer it across to xlsxwriter.
Excel doesn't allow joined/union conditional format conditions like your example. Instead you would need to use something like AND(). Like this:
import xlsxwriter
workbook = xlsxwriter.Workbook('conditional_format.xlsx')
worksheet = workbook.add_worksheet()
green_bg = workbook.add_format({'bg_color': '#C6EFCE',
'font_color': '#006100'})
worksheet.write('H5', 2)
worksheet.write('H6', 5)
worksheet.write('H13', 9)
worksheet.conditional_format('H13:H13', {'type': 'formula',
'criteria': '=AND($H$13 >= $H$5, $H$13 > $H$6)',
'format': green_bg})
workbook.close()
Output:
However, the logical statement here is a little suspect. Whenever $H$6 >= $H$5, as in the sample values, it is just the same as $H$13 > $H$6. Maybe you meant to say ..., $H$13 < $H$6.
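If more conditions are needed later, the AND() formula can be assembled in Python before it is handed to conditional_format(). This is only a sketch; `and_criteria` is a hypothetical helper, not part of XlsxWriter.

```python
def and_criteria(*conditions):
    """Join Excel conditions into a single formula using AND()."""
    if not conditions:
        raise ValueError('at least one condition is required')
    return '=AND({})'.format(', '.join(conditions))


criteria = and_criteria('$H$13 >= $H$5', '$H$13 > $H$6')
print(criteria)  # =AND($H$13 >= $H$5, $H$13 > $H$6)
```

The resulting string goes in the 'criteria' key of a 'formula' type conditional format, exactly as in the example above.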

Issues with using a for loop with conditional formatting in XlsxWriter

Python 2.7:
I am trying to bold all the cells that contain certain text in Excel using XlsxWriter. I have stored the text in a list and used a for loop to iterate over the elements. I am not sure if I am using the correct syntax for the 'value' key in the conditional_format dictionary that XlsxWriter offers. The cells that contain the strings in my list are not being converted to bold.
header_format = new_wb.add_format({'bold': True})
header_list = ["Issue", "Type", "Status", "Resolution", "Summary", "Priority", "Fix Version", "Labels"]
for i in range(len(header_list)):
    new_ws.conditional_format('A1:Z999', {'type': 'cell', 'criteria': 'equal to', 'value': '"header_list[i]"' , 'format': header_format})
You need to use the header strings as the values in conditional format, and they need to be double quoted (as required by Excel). You are trying to do that but your syntax is wrong. Here is a corrected version based on your example:
import xlsxwriter
new_wb = xlsxwriter.Workbook('test.xlsx')
new_ws = new_wb.add_worksheet()
header_format = new_wb.add_format({'bold': True})
header_list = ["Issue", "Type", "Status", "Resolution",
               "Summary", "Priority", "Fix Version", "Labels"]
for value in header_list:
    new_ws.conditional_format('A1:Z999', {'type': 'cell',
                                          'criteria': 'equal to',
                                          'value': '"%s"' % value,
                                          'format': header_format})
# Write some strings to test against.
new_ws.write_column('A1', ['Foo', 'Type', 'Bar', 'Status'])
new_wb.close()
Output with the target words in bold:
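To see why the original loop matched nothing, it may help to compare the strings that each version passes as 'value'. This is a small illustration with a shortened list; the variable names are my own.

```python
header_list = ["Issue", "Type", "Status"]

# Original attempt: the index expression sits inside the quotes, so it is
# never evaluated and every rule looks for the literal text header_list[i].
original_values = ['"header_list[i]"' for i in range(len(header_list))]

# Corrected version: the header text is interpolated between the double quotes.
corrected_values = ['"%s"' % value for value in header_list]

print(original_values)
print(corrected_values)
```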

Finding and replacing values in specific columns in a CSV file using dictionaries

My goal here is to clean up address data from individual CSV files using a dictionary for each individual column, sort of like automating the find-and-replace feature in Excel. The addresses are divided into columns: house numbers, street names, directions and street type each have their own column. I used the following code to process the whole document.
missad = {
    'Typo goes here': 'Corrected typo goes here'}
def replace_all(text, dic):
    # Apply every typo -> correction pair in dic to the text.
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
with open('original.csv', 'r') as csvfile:
    text = csvfile.read()
text = replace_all(text, missad)
with open('cleanfile.csv', 'w') as cleancsv:
    cleancsv.write(text)
While the code works, I need separate dictionaries because some columns need specific typo fixes. For example, housenum for the Housenumbers column, stdir for the street direction, and so on, each with its column-specific typos:
housenum = {
    'One': '1',
    'Two': '2'
}
stdir = {
    'NULL': ''}
I have no idea how to proceed. I feel it's something simple, or that I would need pandas, but I am unsure how to continue. I would appreciate any help! Also, is there any way to group several typos together under one corrected value? I tried the following but got an unhashable type error.
missad = {
    ['Typo goes here', 'Typo 2 goes here', 'Typo 3 goes here']: 'Corrected typo goes here'}
Is something like this what you are looking for?
import pandas as pd
df = pd.read_csv(filename, index_col=False) #using pandas to read in the CSV file
#let's say in this dataframe you want to do corrections on the 'column for correction' column
correctiondict= {
'one': 1,
'two': 2
}
df['columnforcorrection']=df['columnforcorrection'].replace(correctiondict)
Use the same idea for the other columns of interest.
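For several columns at once, and for grouping multiple typos under one correction, something along these lines might work. It is a sketch under assumptions: the column names and sample values are made up, and it uses an in-memory DataFrame rather than a CSV file. A list cannot be a dictionary key (hence the unhashable type error), so the grouping is written as correction -> list of typos and then inverted.

```python
import pandas as pd

# Hypothetical grouping: each corrected value maps to the typos it replaces.
grouped = {'Street': ['Stret', 'Steet', 'Sreet']}

# Invert it into the typo -> correction form that replace() expects.
streettype = {typo: fixed for fixed, typos in grouped.items() for typo in typos}

housenum = {'One': '1', 'Two': '2'}
stdir = {'NULL': ''}

# Made-up sample data standing in for the CSV contents.
df = pd.DataFrame({
    'Housenumbers': ['One', '12', 'Two'],
    'directions': ['N', 'NULL', 'S'],
    'streettype': ['Stret', 'Avenue', 'Sreet'],
})

# A nested dictionary applies each inner dictionary to its own column only.
cleaned = df.replace({
    'Housenumbers': housenum,
    'directions': stdir,
    'streettype': streettype,
})
print(cleaned)
```

Note that replace() used this way matches whole cell values, not substrings, which differs from the str.replace() approach in the question.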

xlsxwriter: how "inf" value can be uncolored?

Using the Python package XlsxWriter, I want to highlight cells in the following conditional range:
value > 1 or value < -1
However, some cells have -inf/inf values and they get filled with color too (yellow). Is there any way to unhighlight them?
I tried the conditional_format function to uncolor them, but it doesn't work.
output example
format1 = workbook.add_format({'bg_color':'#FFBF00'}) #yellow
format2 = workbook.add_format({'bg_color':'#2E64FE'}) #blue
format3 = workbook.add_format({'bg_color':'#FFFFFF'}) #white
c_fold=[data.columns.get_loc(col) for col in data.columns if col.startswith("fold")]
c_fold.sort()
l=len(data)+1
worksheet.conditional_format(1,c_fold[0],l,c_fold[-1], {'type':'cell',
'criteria' : '>',
'value':1,
'format':format1,
})
worksheet.conditional_format(1,c_fold[0],l,c_fold[-1], {'type':'cell',
'criteria' : '<',
'value':-1,
'format':format2,
})
worksheet.conditional_format(1,c_fold[0],l,c_fold[-1], {'type':'text',
'criteria' : 'begins with',
'value':"-inf",
'format':format3,
})
Thanks in advance
In a lot of cases the answer to the question "how do I get this to work with XlsxWriter" is the same as the answer to the question "how do I get this to work with Excel".
If you try your example in Excel you will see that you get the same results as the XlsxWriter example. The > criterion also applies to -inf in Excel, so it is highlighted in light orange. The fact that the later text criterion also matches doesn't override the first matching criterion, since Excel applies them in the order that the user supplies them.
The solution in Excel, and XlsxWriter, is to change the order that the rules are applied, like this:
import xlsxwriter
workbook = xlsxwriter.Workbook('conditional_format.xlsx')
worksheet1 = workbook.add_worksheet()
# Add some formats to use in the conditional formats.
format1 = workbook.add_format({'bg_color': '#FFBF00'})
format2 = workbook.add_format({'bg_color': '#2E64FE'})
format3 = workbook.add_format({'bg_color': '#FFFFFF'})
# Write some sample data.
worksheet1.write('A1', 2)
worksheet1.write('A2', '-inf')
worksheet1.write('A3', -2)
# Write the conditional formats over the same range, text rule first.
worksheet1.conditional_format('A1:A3', {'type': 'text',
                                        'criteria': 'begins with',
                                        'value': "-inf",
                                        'format': format3})
worksheet1.conditional_format('A1:A3', {'type': 'cell',
                                        'criteria': '>',
                                        'value': 1,
                                        'format': format1})
worksheet1.conditional_format('A1:A3', {'type': 'cell',
                                        'criteria': '<',
                                        'value': -1,
                                        'format': format2})
workbook.close()
Output:
This would solve it:
worksheet.conditional_format(1,c_fold[0],l,c_fold[-1], {'type':'text',
'criteria' : 'containing',
'value':"-inf",
'format':format3,
})
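Putting the two answers together, the rules could be kept in one ordered list so the text rule always comes first. This is a sketch that assumes the inf and -inf cells hold text; `highlight_rules` is a hypothetical helper, and matching on 'inf' with 'containing' is my own choice so that both signs are covered.

```python
def highlight_rules(neutral_format, high_format, low_format):
    """Return conditional format rules in the order they should be applied."""
    return [
        # Text rule first, so 'inf' and '-inf' strings take the neutral format.
        {'type': 'text', 'criteria': 'containing', 'value': 'inf',
         'format': neutral_format},
        {'type': 'cell', 'criteria': '>', 'value': 1, 'format': high_format},
        {'type': 'cell', 'criteria': '<', 'value': -1, 'format': low_format},
    ]


# Placeholder strings stand in for the Format objects from workbook.add_format().
rules = highlight_rules('format3', 'format1', 'format2')
for rule in rules:
    print(rule['type'], rule['criteria'], rule['value'])
```

In the question's code this would be used as `for rule in highlight_rules(format3, format1, format2): worksheet.conditional_format(1, c_fold[0], l, c_fold[-1], rule)`.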
