I have a table in which I put numbers (as strings) in a column. For some reason, it appears that numbers with 2 or more periods (e.g. 5.5.5) will align on the left side of the cell, while numbers with fewer periods (e.g. 55.5) will align on the right side of the cell. Does anyone know how to change this?
I had a similar problem. My solution was slightly different.
When filling each item into your table, check it matches your '5.5.5' format and set the item to be right aligned.
cell = QTableWidgetItem(value)
tableWidget.setItem(row, col, cell)
# check the value matches your requirement, via regex or as below
check = value.replace('.', '')
if check.isdigit():
    tableWidget.item(row, col).setTextAlignment(QtCore.Qt.AlignRight|QtCore.Qt.AlignVCenter)
I understand that the characters used for the thousands separator and the decimal mark may differ between locales, but surely no locale could sensibly interpret 5.5.5 as a number? Given that, it's hardly surprising that Qt wants to treat it as ordinary text.
But anyway, the docs for QTableItem suggest you can work around this by reimplementing the alignment function:
class TableItem(QTableItem):
    def alignment(self):
        if is_pseudo_number(self.text()):
            return Qt.AlignRight
        return QTableItem.alignment(self)
...
table.setItem(row, column, TableItem('5.5.5'))
The implementation of is_pseudo_number() is left as an exercise for the reader...
(PS: since you are using PyQt3, the above code is completely untested)
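For what it's worth, one possible is_pseudo_number() is sketched below. It is only a guess at what "pseudo number" should mean here: runs of digits separated by single periods, such as 5.5.5 or 55.5. The pattern and the str() conversion of the cell text are assumptions, not anything the Qt docs prescribe.

```python
import re

# Assumed definition: one or more digit groups joined by single periods.
_PSEUDO_NUMBER = re.compile(r"\d+(?:\.\d+)*\Z")


def is_pseudo_number(text):
    """Return True for strings like '5', '55.5' or '5.5.5'."""
    # str() so that a QString-like object can be passed in as well.
    return _PSEUDO_NUMBER.match(str(text).strip()) is not None


if __name__ == "__main__":
    for sample in ("5.5.5", "55.5", "5..5", "abc"):
        print(sample, is_pseudo_number(sample))
```

Adjust the pattern if your values can carry a sign, a thousands separator or a trailing period.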
I have to clean a column "Country" of a DataFrame where, sometimes, the country names are followed by numbers (for example we will see "France6" instead of France). I would like to separate the country name from the number that follows it.
I coded this function to solve the problem:
def new_name2(row):
    for item in re.finditer("([a-zA-Z]*)(\d*)",row.Country):
        row.Country=item.group(1)
    return row
We can see that I created two groups, the first one to catch the country name, and the other to separate the number. Following that, I should get (France)(6).
Unfortunately, when I run it, my Country column turns empty. This means that the first group that I get is not "France" but "" and I don't understand why, because on a regex website, I can see that my expression ([a-zA-Z]*)(\d*) is working.
Your loop rewrites row.Country each time even with a zero-length match!
Instead, you could strip off the numbers directly
df["Country"] = df["Country"].str.rstrip("0123456789")
Using a dedicated Pandas method will almost certainly be much faster than a plain Python loop, because the work is vectorized.
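To make the zero-length match visible without pandas, here is a small plain-Python sketch. Because both groups use *, the pattern can match the empty string, so finditer yields one more (empty) match at the end of "France6", and the loop's last assignment is "". The helper names all_matches and split_country are made up for illustration.

```python
import re

PATTERN = r"([a-zA-Z]*)(\d*)"


def all_matches(text):
    """List the groups of every match finditer produces, empty ones included."""
    return [m.groups() for m in re.finditer(PATTERN, text)]


def split_country(text):
    """Split 'France6' into ('France', '6') by matching the whole string once."""
    m = re.fullmatch(PATTERN, text)
    # Names the pattern cannot describe (spaces, accents) are returned unchanged.
    return m.groups() if m else (text, "")


if __name__ == "__main__":
    print(all_matches("France6"))
    print(split_country("France6"))
    print("France6".rstrip("0123456789"))
```

Note that the character class [a-zA-Z] does not cover names containing spaces or accented letters, which is one more reason to prefer stripping the trailing digits.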
Add a beginning and ending match like this:
^([a-zA-Z]*)(\d*)$
This will force it to match the entire string. Perhaps that was the problem.
If that doesn't work, try logging the regex result. Maybe your inputs are faulty.
I am relatively new to Python and very new to NLP (and nltk). I have searched the net for guidance but have not found a complete solution. Unfortunately the sparse code I have been playing with is on another network, but I am including an example spreadsheet. I would like to get suggested steps in plain English (more detailed than I have below) so I could first try to script it myself in Python 3. Unless it would simply be easier for you to just help with the scripting... in which case, thank you.
Problem: A few columns of an otherwise robust spreadsheet are very unstructured with anywhere from 500-5000 English characters that tell a story. I need to essentially make it a bit more structured by pulling out the quantifiable data. I need to:
1) Search for a string in the user supplied unstructured free text column (The user inputs the column header) (I think I am doing this right)
2) Make that string a NEW column header in Excel (I think I am doing this right)
3) Grab the number before the string (This is where I am getting stuck. And as you will see in the sheet, sometimes there is no space between the number and text and of course, sometimes there are misspellings)
4) Put that number in the NEW column on the same row (Have not gotten to this step yet)
I will have to do this repeatedly for multiple keywords but I can figure that part out, I believe, with a loop or something. Thank you very much for your time and expertise...
If I'm understanding this correctly, first we need to obtain the numbers from the string of text.
cell_val = sheet1wb1.cell(row=rowNum,column=4).value
This will create a list containing every number in the string
new_ = [int(s) for s in cell_val.split() if s.isdigit()]
print(new_)
You can then use the list to assign values to a column.
For example, to write the 1st number in the list (index 0) to the 5th column:
sheet1wb1.cell(row=rowNum, column=5).value = str(new_[0])
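The split()/isdigit() approach only finds numbers that stand alone between spaces, so it misses cases like "5pears" that the question mentions. A regular expression can grab the number directly in front of a keyword, with or without a space. This is only a sketch: number_before is a hypothetical helper, and it does nothing about misspelled keywords.

```python
import re


def number_before(text, keyword):
    """Return the number (as a string) directly before keyword, or None."""
    pattern = r"(\d+(?:\.\d+)?)\s*" + re.escape(keyword)
    m = re.search(pattern, text, flags=re.IGNORECASE)
    return m.group(1) if m else None


if __name__ == "__main__":
    story = "We counted 12 apples and 5pears near the 3.5 km marker."
    for word in ("apples", "pears", "km", "bananas"):
        print(word, number_before(story, word))
```

The returned value could then be written to the new column for that row in the same way as above.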
I think I have found what I am looking for. https://community.esri.com/thread/86096 has 3 or 4 scripts that seem to do the trick. Thank you..!
I have per-row semicolon separated manufacturer products and vendor data which has missing cells.
The spreadsheet has multiple vendors for each manufacturer's product.
The order of the data will always be the same columnwise:
vendor_name_1, vendor_address_1, vendor_phone_1, vendor_fax_1, vendor_contact_mobile_1, vendor_contact_email_1, etc.
When there is more than one vendor for that product (almost all do), the columns repeat in the same order, left to right:
vendor_name_2, vendor_address_2, vendor_phone_2, vendor_fax_2, vendor_contact_mobile_2, vendor_contact_email_2, etc.
At this point the sets of columns repeat as long as there are more vendors for the product on that row.
A "good" row will have all of the available data in the correct column:
Motion Distributors; 3231 Apex Drive; Dulles, Ohio 45321; (321) 542-6422(p); (321) 542-6428(f); (321) 542-6680(m); alan#motiondist.com; etc. etc.
A "bad" row will have one or more missing items for at least one vendor on the row, which of course affects everything to the right of that missing cell, so everything is shifted.
Since some of the data in the cells are missing, the issue is getting the data in each row back to the correct cell.
For example, if the vendor_fax number is missing, all of the cells to the right of that missing cell do not go into the correct column and are shifted.
To make things worse, because there are multiple vendors for the same product, the more missing cells per row, the more shifting occurs on that row.
Is there a way to fix this since each column set has the same arrangement and only the extra delimiters are missing?
I am hoping there is a fix at least for company and contact names and the phone numbers by a generic match of each column type (name, phone number, email, etc.)?
Is there a way to process the spreadsheet by each row to ensure the matches occur? If necessary, I can split cells into other columns if it will allow for more specific matching.
I am desperate enough to go with any language or utility necessary to solve the problem.
I have searched through several categories here on SO and am not seeing a way to solve this (yet)...
Assuming the format of things like phone numbers is predictable and easy to discern (i.e. the difference between the phone and fax is obvious, as in your example), it should be fairly easy to take a good guess at how fields match.
I would create a hash of Regex's something like:
field_regexes = {
  name: /^.+$/,
  street: /^\d+\s/,
  city: /^.*,\s.*\d{5}$/,
  phone: /^\(\d{3}\)\s\d{3}\-\d{4}\(p\)$/,
  fax: /^\(\d{3}\)\s\d{3}\-\d{4}\(f\)$/,
  mobile: /^\(\d{3}\)\s\d{3}\-\d{4}\(m\)$/,
  email: /^\w+\#\w+\.\w+$/,
  # etc...
}
The code might be something like:
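fields = input.split(';').map(&:strip)
# present? comes from ActiveSupport (Rails); in plain Ruby use !fields.empty?
while fields.present? do
  record = parse_record(fields)
  break unless record.present? # something went wrong
  save(record)
end
# field_regexes is the hash above; make it a constant or pass it in so the method can see it
def parse_record(fields)
  result = {}
  field_regexes.each do |name, regex|
    if fields[0] =~ regex
      result[name] = fields.shift # shift removes and returns the first field
      break if fields.empty?
    end
  end
  result
end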
Note: This assumes that there are no semicolons that should be considered valid data (a semicolon in an address or company name, for example).
The ideal solution would be to have whoever is sending you this data spit out a delimiter even for blank columns, then all your columns would line up on import with no problem. Assuming that getting your input fixed isn't an option...
I think you'll need to identify what data is in each column of your input on a column-by-column basis
Email addresses are easy - there is an # and a . in them or they're not valid. If you find one that's in the wrong place, shift right until it's in the email column.
Missing phone numbers are pretty easy too. Since they seem to have (p), (f) and (m) to identify the number type, simply pull the last three characters to identify which number you have. If you're missing one, shift the remainder to the right until the one you have is in the correct column.
Identifying a zip code is fairly straightforward: it's either 5 digits (02134), 9 digits (021345678) or, possibly, 10 characters (02134-5678). Shift right until it lines up.
If states are spelled out, have a table listing all the states, and if you find a match earlier than expected, shift right until the state is in the right place. If states are the standard 2-character postal abbreviations, just look for a 2-character column and shift right until it matches.
US street addresses should start with a house (or building) number, so a character string beginning with a digit should be an address, but it could possibly be a zip-code (zip+4 with the embedded dash -), so if it's all numeric (possibly including the dash), then that's the zip field, otherwise it's an address field.
The city... well that's an all alpha field that should be what's left after sorting out all the rest.
The company name - all this presumes the company name is actually there to begin the record, if not, you might be a bit out of luck, but I'm sure there's some way to identify what's there.
You might want to try something like a state machine. I'm at this point in the record, so here's what I expect to find next, let's look at the next column of data to see what's actually there, and shift right until it seems to line up. That should minimize the error for a missing company name or city name misidentification.
You should be able to write that in the language of your choice, but it may not be the fastest thing in the world, since it will have to do the import one field at a time.
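As a rough illustration of the "what do I expect next, shift right until it lines up" idea, here is a minimal Python sketch. The column order and every pattern in COLUMNS are assumptions modelled on the sample row in the question (including the # in the email address), and realign is a hypothetical helper. Since the name pattern accepts anything, a missing company name will still be misread, as cautioned above.

```python
import re

# Assumed column order and patterns, modelled on the sample row.
COLUMNS = [
    ("name", re.compile(r".+")),
    ("street", re.compile(r"\d+\s")),
    ("city", re.compile(r".*,\s.*\d{5}(-?\d{4})?$")),
    ("phone", re.compile(r".*\(p\)$")),
    ("fax", re.compile(r".*\(f\)$")),
    ("mobile", re.compile(r".*\(m\)$")),
    ("email", re.compile(r"\S+[#@]\S+\.\S+$")),
]
NAMES = [name for name, _ in COLUMNS]


def first_fit(field, start):
    """Index of the first column at or after start whose pattern accepts field."""
    for i in range(start, len(COLUMNS)):
        if COLUMNS[i][1].match(field):
            return i
    return None


def realign(row):
    """Split one semicolon-separated row into per-vendor dicts, blank where data is missing."""
    records, current, pos = [], dict.fromkeys(NAMES, ""), 0
    for field in (f.strip() for f in row.split(";")):
        if not field:
            continue
        hit = first_fit(field, pos)
        if hit is None:
            # Nothing left in this vendor's block fits, so start the next vendor.
            records.append(current)
            current, pos = dict.fromkeys(NAMES, ""), 0
            hit = first_fit(field, 0)
        current[NAMES[hit]] = field
        pos = hit + 1
        if pos == len(COLUMNS):
            records.append(current)
            current, pos = dict.fromkeys(NAMES, ""), 0
    if any(current.values()):
        records.append(current)
    return records


if __name__ == "__main__":
    row = ("Motion Distributors; 3231 Apex Drive; Dulles, Ohio 45321; "
           "(321) 542-6422(p); (321) 542-6680(m); alan#motiondist.com")
    for record in realign(row):
        print(record)
```

Each returned dict has one key per expected column, so writing the records back out with a delimiter for every column, blank or not, puts the data back in line.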
As noted here: http://docs.wxwidgets.org/trunk/classwx_list_box.html
Notice that currently TAB characters in list box items text are not handled consistently under all platforms, so they should be replaced by spaces to display strings properly everywhere. The list box doesn't support any other control characters at all.
So far in my experience while using Python 2.7 32-bit in Windows 7, using \t within the string of a wxListBox selection has no effect; as expected
I have a bunch of rows from the database and I have multiple columns that I want to display (and eventually use on selection of one or more row) within a row in wxListBox. For now I am using spaces as recommended as the delimiter between values in the string. However, this is not really ideal since the columns are variable length.
Is there an alternative to the \t that is not a simple delimiter? The point here is to have all of the columns for each row presented neatly i.e.
column1       value1          value2
column442142  values24234234  val2
rather than
column1 value1 value2
column442142 values24234234 val2
wxGrid comes to mind but I don't think that would work for me because I don't want to be able to select specific cells within a row (I can't seem to find the function to disable that), I only want the user to be able to select a row or multiple rows.
My advice would be to use wxListCtrl for the simple multicolumn data display or wxDataViewCtrl if you need more features.
FWIW you can use wxGrid::SetSelectionMode() with wxGridSelectRows argument to disable cell selection but wxGrid is arguably still not the best control to use for something like this.
See this slide from my lectures for a brief summary of different controls.
print " 4 whitespaces replace a tab"
print "%20s"%some_string_padded_to_20_chars
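Extending the padding idea, the width of each column can be computed from the data instead of being hard-coded to 20. This is a minimal sketch that assumes the list box uses a fixed-width font (with a proportional font the columns will still drift); pad_columns is a hypothetical helper, not part of wxPython.

```python
def pad_columns(rows, gap=2):
    """Return one string per row, with every column padded to its widest value."""
    # Widest entry in each column across all rows.
    widths = [max(len(str(cell)) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        padded = "".join(str(cell).ljust(width + gap) for cell, width in zip(row, widths))
        lines.append(padded.rstrip())
    return lines


if __name__ == "__main__":
    data = [
        ("column1", "value1", "value2"),
        ("column442142", "values24234234", "val2"),
    ]
    for line in pad_columns(data):
        print(line)
```

The resulting strings can be used as the list box items in place of the space-delimited ones.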
I have an array, of unknown length, of key:val pairs. Each pair occupies a row in a FlexGridSizer. The keys are in the first column, as wx.StaticTexts, and the vals are in the second column, as wx.TextCtrls.
The problem is that there isn't a lot of room available, and some of the vals are relatively long, and don't fit in the wx.TextCtrls. I would like to have all of the wx.TextCtrls be maybe 2 or 3 lines in height.
I've tried using style = wx.TE_MULTILINE, but that just adds a vertical scrollbar, as opposed to the default behaviour of scrolling horizontally with left/right/home/end etc.
Any ideas?
I suggest you use wxGrid.
http://docs.wxwidgets.org/2.9.2/overview_grid.html
According to the documentation for the wx.TextCtrl, you can apply the wx.HSCROLL style to it to make the control have a horizontal scrollbar, but this won't work on GTK1-based systems: http://xoomer.virgilio.it/infinity77/wxPython/Widgets/wx.TextCtrl.html
There's also an ExpandoTextCtrl that you might want to look at: from wx.lib.expando import ExpandoTextCtrl (see the wxPython demo for an example)
I ended up using FlexGridSizer. I made each of the val cells span across two rows, and added empty wx.Size()s below each key. The result is something like this: