Updating pyfits bin table data - python

I am trying to update an existing fits table with pyfits. It is working fine for some columns of the table, unfortunately not for the first column.
Here is the columns definition:
ColDefs(
name = 'EVENT_ID'; format = '1J'; bscale = 1; bzero = 2147483648
name = 'TEL_ID'; format = '1I'
name = 'TIMESLICE'; format = '1I'; null = 0...
And the simple code fragment to update it:
event = pyfits.open('file.fits.gz')[1]
event.data.field('EVENT_ID')[0] = np.uint32(event.event_ID)
event.data.field('TEL_ID')[0] = int(tel.ID[2])
event.writeto('test.fits')
Writing TEL_ID (and others not shown here) works, EVENT_ID does not. I already tried different formats (np.int32, int) but always the same...
type(event.data.field('EVENT_ID')[0])
returns numpy.uint32 (for the unmodified file)
Thanks for your help
Edit:
If I change the definition of 'EVENT_ID', leaving out 'bscale' and 'bzero' the update of the value works. So it seems there is a problem with the unsigned integer.

Related

Pandas Styler.to_latex() - how to pass commands and do simple editing

How do I pass the following commands into the latex environment?
\centering (I need landscape tables to be centered)
and
\caption* (I need to skip for a panel the table numbering)
In addition, I would need to add parentheses and asterisks to the t-statistics, meaning row-specific formatting on the dataframes.
For example:
Current
variable
value
const
2.439628
t stat
13.921319
FamFirm
0.114914
t stat
0.351283
founder
0.154914
t stat
2.351283
Adjusted R Square
0.291328
I want this
variable
value
const
2.439628
t stat
(13.921319)***
FamFirm
0.114914
t stat
(0.351283)
founder
0.154914
t stat
(1.651283)**
Adjusted R Square
0.291328
I'm doing my research papers in DataSpell. All empirical work is in Python, and then I use Latex (TexiFy) to create the pdf within DataSpell. Due to this workflow, I can't edit tables in latex code while they get overwritten every time I run the jupyter notebook.
In case it helps, here's an example of how I pass a table to the latex environment:
# drop index to column
panel_a.reset_index(inplace=True)
# write Latex index and cut names to appropriate length
ind_list = [
"ageFirm",
"meanAgeF",
"lnAssets",
"bsVol",
"roa",
"fndrCeo",
"lnQ",
"sic",
"hightech",
"nonFndrFam"
]
# assign the list of values to the column
panel_a["index"] = ind_list
# format column names
header = ["", "count","mean", "std", "min", "25%", "50%", "75%", "max"]
panel_a.columns = header
with open(
os.path.join(r"/.../tables/panel_a.tex"),"w"
) as tf:
tf.write(
panel_a
.style
.format(precision=3)
.format_index(escape="latex", axis=1)
.hide(level=0, axis=0)
.to_latex(
caption = "Panel A: Summary Statistics for the Full Sample",
label = "tab:table_label",
hrules=True,
))
You're asking three questions in one. I think I can do you two out of three (I hear that "ain't bad").
How to pass \centering to the LaTeX env using Styler.to_latex?
Use the position_float parameter. Simplified:
df.style.to_latex(position_float='centering')
How to pass \caption*?
This one I don't know. Perhaps useful: Why is caption not working.
How to apply row-specific formatting?
This one's a little tricky. Let me give an example of how I would normally do this:
df = pd.DataFrame({'a':['some_var','t stat'],'b':[1.01235,2.01235]})
df.style.format({'a': str, 'b': lambda x: "{:.3f}".format(x)
if x < 2 else '({:.3f})***'.format(x)})
Result:
You can see from this example that style.format accepts a callable (here nested inside a dict, but you could also do: .format(func, subset='value')). So, this is great if each value itself is evaluated (x < 2).
The problem in your case is that the evaluation is over some other value, namely a (not supplied) P value combined with panel_a['variable'] == 't stat'. Now, assuming you have those P values in a different column, I suggest you create a for loop to populate a list that becomes like this:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
Now, we can apply a function to df.style.format, and pop/select from the list like so:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
def func(v):
fmt = fmt_list.pop(0)
return fmt.format(v)
panel_a.style.format({'variable': str, 'value': func})
Result:
This solution is admittedly a bit "hacky", since modifying a globally declared list inside a function is far from good practice; e.g. if you modify the list again before calling func, its functionality is unlikely to result in the expected behaviour or worse, it may throw an error that is difficult to track down. I'm not sure how to remedy this other than simply turning all the floats into strings in panel_a.value inplace. In that case, of course, you don't need .format anymore, but it will alter your df and that's also not ideal. I guess you could make a copy first (df2 = df.copy()), but that will affect memory.
Anyway, hope this helps. So, in full you add this as follows to your code:
fmt_list = ['{:.3f}','({:.3f})***','{:.3f}','({:.3f})','{:.3f}','({:.3f})***','{:.3f}']
def func(v):
fmt = fmt_list.pop(0)
return fmt.format(v)
with open(fname, "w") as tf:
tf.write(
panel_a
.style
.format({'variable': str, 'value': func})
...
.to_latex(
...
position_float='centering'
))

Is there a way to obtain the actual A1 range from sheetfu - get_data_range()

I am trying to obtain the actual A1 values using the Sheetfu library's get_data_range().
When I use the code below, it works perfectly, and I get what I would expect.
invoice_sheet = spreadsheet.get_sheet_by_name('Invoice')
invoice_data_range = invoice_sheet.get_data_range()
invoice_values = invoice_data_range.get_values()
print(invoice_data_range)
print(invoice_values)
From the print() statements I get:
<Range object Invoice!A1:Q42>
[['2019-001', '01/01/2019', 'Services']...] #cut for brevity
What is the best way to get that "A1:Q42" value? I really only want the end of the range (Q42), because I need to build the get_range_from_a1() argument "A4:Q14". My sheet has known headers (rows 1-3), and the get_values() includes 3 rows that I don't want in the get_values() list.
I guess I could do some string manipulation to pull out the text between the ":" and ">" in
<Range object Invoice!A1:Q42>
...but that seems a bit sloppy.
As a quick aside, it would be fantastic to be able to call get_data_range() like so:
invoice_sheet = spreadsheet.get_sheet_by_name('Invoice')
invoice_data_range = invoice_sheet.get_data_range(start="A4", end="")
invoice_values = invoice_data_range.get_values()
...but that's more like a feature request. (Which I'm happy to do BTW).
Author here. Alan answers it well.
I added some methods at Range level to the library, that are simply shortcuts to the coordinates properties.
from sheetfu import SpreadsheetApp
spreadsheet = SpreadsheetApp("....access_file.json").open_by_id('long_string_id')
sheet = spreadsheet.get_sheet_by_name('test')
data_range = sheet.get_data_range()
starting_row = data_range.get_row()
starting_column = data_range.get_column()
max_row = data_range.get_max_row()
max_column = data_range.get_max_column()
This will effectively tell you the max row and max column that contains data in your sheet.
If you use the get_data_range method, the first row and first column typically is 1.
I received a response from the owner of Sheetfu, and the following code provides the information that I'm looking for.
Example code:
from sheetfu import SpreadsheetApp
spreadsheet = SpreadsheetApp("....access_file.json").open_by_id('long_string_id')
sheet = spreadsheet.get_sheet_by_name('test')
data_range = sheet.get_data_range()
range_max_row = data_range.coordinates.row + data_range.coordinates.number_of_rows - 1
range_max_column = data_range.coordinates.column + data_range.coordinates.number_of_columns - 1
As of this writing, the .coordinates properties are not currently documented, but they are usable, and should be officially documented within the next couple of weeks.

Index out of bounds for a 2D Array?

I am having a 2D List and i am trying to retrieve a column with the index spcified as a parameter (type : IntEnum).
I get the index out of bounds error when trying to retrieve any column other then the one at index 0.
Enum:
class Column(IntEnum):
ROAD = 0
SECTION = 1
FROM = 2
TO = 3
TIMESTAMP = 4
VFLOW=5
class TrafficData:
data=[[]]
Below are member methods of TrafficData
Reading from file and storing the matrix:
def __init__(self,file):
self.data=[[word for word in line.split('\t')]for line in file.readlines()[1:]]
Retrieve-ing the desired column:
def getColumn(self,columnName):
return [line[columnName] for line in self.data]
Call:
)
column1 = traficdata.getColumn(columnName=Column.ROAD)
`column2 = traficdata.getColumn(columnName=Column.FROM)` //error
`column3 = traficdata.getColumn(columnName=Column.TO)` //error
I attached a picture with the data after __init__ processing:
[
I tested the code that you provided above, and didn't see any issues. That leads me to believe that there might be something wrong with the data that you have in the file. Could you paste the file data? (the tab delimited data)
UPDATE -
I found the issue - as suspected, it was a data issue (there is a minor code update involved too). Make the following changes -
1) When opening the file use the appropriate encoding, I used utf-16.
2) At the end of the data file that you shared, it contains the text - "(72413 row(s) affected)" along with a couple of new line characters. So, you have 2 options, either manually cleanup the data file, or update the code to ignore the "(72413 row(s) affected)" & "\n" characters.
Hope that helps.

Retrieve output parameters from an AutoCAD API method in python

I'm trying to retrieve 2 output Arrays from an XRecord in AutoCAD 2016 using python 2.7, with comtypes imported, the first array is an array of integers (DXF Group Codes) and the second array is an array of variants (the values of XRecord).
The opposite way of what this question seeks to
The method of interest is GetXRecordData, which (according to AutoCAD's documentation) if successful returns None, and only accepts 2 output arguments.
when I try to retrieve it with code like
DxfGrCd = []
vals = []
an_XRecord.GetXRecordData(DxfGrCd, vals)
and see the values of DxfGrCd and vals I found no change happened to them, both of them still equal to [], the same is also with
DxfGrCd = {}
vals = {}
anXRecord.GetXRecordData(DxfGrCd, vals)
also no change is applied on them, both of them still equal to {}, even though dictionaries and lists are mutable.
Is there any way to deal with that kind of methods in python?
Well, I haven't figured out any way to do so from python, however, since the data stored in XRecords are just numbers and strings (in my application), stored in the XRecord as variants, I've used MS Excel as a middle man to pass me data.
Note: All numbers I've got were retrieved but as floats.
And all strings were retrieved but their type is unicode. (you can convert them to string easily with the built-in function str())
Here's how I've done that.
First: Creation of The Facilitator Workbook (Our Middle Man)
1-Normally as a regular windows user, open Excel, then open Visual Basic Editor, one way to do that is to go to Developer tab and click on Visual Basic Editor.
2-From the Editor, insert a module (one way is from the menu bar: insert>Module), then left-double click on its default name and type "mod_facilitate", then hit Enter.
3-Left-double click on its icon at the project viewer.
4- A window will appear, copy the following code to it.
Sub getxrecord()
'get running AutoCAD object
Dim mycad As AcadApplication, mydoc As AcadDocument, filepath As String
Set mycad = GetObject(, "AutoCAD.Application.20")
'get the selected drawing, provided from python code
With Sheet1
filepath = .Range(.Cells(1, 1), .Cells(1, 1)).Value
End With
Dim iCount As Integer, i As Integer, j As Integer, CompName As String
iCount = mycad.Documents.Count
For i = 0 To iCount - 1
CompName = mycad.Documents.Item(i).FullName
If CompName Like filepath Then
j = i
Exit For
End If
Next i
Set mydoc = mycad.Documents.Item(j)
Dim name2 As String
'get the object from its provided handle
With Sheet1
handler = .Range(.Cells(2, 1), .Cells(2, 1)).Value
End With
Dim myXRecord As AcadXRecord
Set myXRecord = mydoc.HandleToObject(handler)
Dim DxfGrcd As Variant, Val As Variant
DxfGrcd = Array()
Val = Array()
myXRecord.GetXRecordData DxfGrcd, Val
Dim UB As Integer
UB = UBound(DxfGrcd)
For i = 0 To UB
With Sheet1
.Range(.Cells((i + 1), 2), .Cells((i + 1), 2)).Value = DxfGrcd(i)
.Range(.Cells((i + 1), 3), .Cells((i + 1), 3)).Value = Val(i)
End With
Next i
End Sub
5- From Tools>References Select these reference names, leaving the others at their previous states
AcSmComponents20 1.0 Type Library
AutoCAD 2016 Type Library
CAO 1.0 Type Library
Then click on OK, then hit Ctrl+s to save.
6- Save the file and name it "facilitator", save it within the same directory of your python file. Save it of type Excel Macro-Enabled Workbook (has the extension .xlsm)
7- At your python file, define the function to retrieve XRecord's data as following, I'll tell what are its arguments for:
def XRecord_return(namefile,handle,size):
xl.Range["A1"].Value[xlRangeValueDefault] = namefile
xl.Range["A2"].Value[xlRangeValueDefault] = handle
xl.Application.Run("facilitator.xlsm!mod_facilitate.getxrecord")
dxfgrcd = []
vals = []
for i in range(0,size):
CellB = 'B' + str(i+1)
CellC = 'C' + str(i+1)
dxfgrcd.append(xl.Range[CellB].Value[xlRangeValueDefault])
vals.append(xl.Range[CellC].Value[xlRangeValueDefault])
return dxfgrcd,vals
Second: What to Insure
Note: All the following steps must be written before the definition of XRecord_return
1- AutoCAD must be instantiated from python using a line like autocad = CreateObject("AutoCAD.Application.20",dynamic=True) or autocad = comtypes.client.CreateObject("AutoCAD.Application.20",dynamic=True) depending on the scope of importing and importing form [ import comtypes.client or from comtypes.client import CreateObject ], here, importing scope is the python file's module scope.
2-instantiate Excel using xl = CreateObject("Excel.Application") and open the facilitator file with
xlpath = os.getcwd()
xlpath += '\\facilitator.xlsm'
xl = CreateObject("Excel.Application")
from comtypes.gen.Excel import xlRangeValueDefault
xlwb = xl.Workbooks.Open(Filename=xlpath,ReadOnly=0)
3- You have to know how many elements are stored in the XRecord (excluding the number of associated DXF group codes), this number of elements is what you'll supply to XRecord_return as its size argument.
e.g. An XRecord that stores 3.0 "abc" 5 and have correspondent DXF group codes 1 2 3 is of size 3, not 6.
Third: Supplying Data to The Facilitator Workbook
We need only its first worksheet, you must provide the following data:-
1- The drawing's full path/directory to cell "A1".
To get the drawing's full path if you have its Document object you can get it from the property FullName. This value is what you'll supply to XRecord_return as its namefile argument.
To assign, for instance: xl.Range["A1"].Values[xlRangeValueDefault] = filepath
2-The XRecord's handle value to cell "A2", you can get it from the property Handle of the XRecord. This value is what you'll supply to XRecord_return as its 'handle' argument.
To assign, for instance: xl.Range["A1"].Values[xlRangeValueDefault] = handlevalue
3- After that, wherever you need to get the XRecords data, call the XRecord_return function, like
DxfGrCd,vals = XRecord_return(filepath,handlevalue,size_of_XRecord)
The outputs are lists that contain the correspondent data.
Last, But not Least
When you finish using Excel for retrieving data from as many XRecords as you need, close the facilitator workbook using xlwb.Close(SaveChanges=0)

Python - Reading a CSV, won't print the contents of the last column

I'm pretty new to Python, and put together a script to parse a csv and ultimately output its data into a repeated html table.
I got most of it working, but there's one weird problem I haven't been able to fix. My script will find the index of the last column, but won't print out the data in that column. If I add another column to the end, even an empty one, it'll print out the data in the formerly-last column - so it's not a problem with the contents of that column.
Abridged (but still grumpy) version of the code:
import os
os.chdir('C:\\Python34\\andrea')
import csv
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
if 'phone' in tableHeader:
phoneIndex = tableHeader.index('phone')
else:
phoneIndex = -1
for row in exampleReader:
row[-1] =''
print(phoneIndex)
print(row[phoneIndex])
csvOpen.close()
my.csv
stuff,phone
1,3235556177
1,3235556170
Output
1
1
Same script, small change to the CSV file:
my.csv
stuff,phone,more
1,3235556177,
1,3235556170,
Output
1
3235556177
1
3235556170
I'm using Python 3.4.3 via Idle 3.4.3
I've had the same problem with CSVs generated directly by mysql, ones that I've opened in Excel first then re-saved as CSVs, and ones I've edited in Notepad++ and re-saved as CSVs.
I tried adding several different modes to the open function (r, rU, b, etc.) and either it made no difference or gave me an error (for example, it didn't like 'b').
My workaround is just to add an extra column to the end, but since this is a frequently used script, it'd be much better if it just worked right.
Thank you in advance for your help.
row[-1] =''
The CSV reader returns to you a list representing the row from the file. On this line you set the last value in the list to an empty string. Then you print it afterwards. Delete this line if you don't want the last column to be set to an empty string.
If you know it is the last column, you can count them and then use that value minus 1. Likewise you can use your string comparison method if you know it will always be "phone". I recommend if you are using the string compare, convert the value from the csv to lower case so that you don't have to worry about capitalization.
In my code below I created functions that show how to use either method.
import os
import csv
os.chdir('C:\\temp')
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
phoneColIndex = None;#init to a value that can imply state
lastColIndex = None;#init to a value that can imply state
def getPhoneIndex(header):
for i, col in enumerate(header): #use this syntax to get index of item
if col.lower() == 'phone':
return i;
return -1; #send back invalid index
def findLastColIndex(header):
return len(tableHeader) - 1;
## methods to check for phone col. 1. by string comparison
#and 2. by assuming it's the last col.
if len(tableHeader) > 1:# if only one row or less, why go any further?
phoneColIndex = getPhoneIndex(tableHeader);
lastColIndex = findLastColIndex(tableHeader)
for row in exampleReader:
print(row[phoneColIndex])
print('----------')
print(row[lastColIndex])
print('----------')
csvOpen.close()

Categories