XLSXWriter Format multiple rows - python

Trying to do something that should be simple. XLSXWriter has a function set_column that lets me format multiple columns at the same time:
worksheet.set_column('B:D', 30, align)
However, there is no such row function, as
worksheet.set_row(5,None, percentage)
operates on but one row at a time.
I've tried doing the following to no avail:
worksheet.conditional_format('C27:W27', {'format': percentage})
How can I simply set cells C27:W27 to be percentage format?

Unfortunately, XlsxWriter has no method that formats a range of rows in one call the way set_column does for columns; worksheet.set_row(5, None, percentage) formats a single row only. I ended up scrapping this project.
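One possible workaround, sketched here under assumptions rather than taken from the thread, is to loop over the rows and call set_row for each one. The helper set_rows and the demo function build_demo_workbook (with its file name) are hypothetical names; set_row, add_format and conditional_format are regular XlsxWriter calls. For a cell range such as C27:W27, conditional_format also expects a 'type' key, which the attempt in the question lacks; 'no_errors' is used here so the format applies to every cell that does not hold an error value.

```python
def set_rows(worksheet, first_row, last_row, height=None, cell_format=None):
    """Call set_row for every row from first_row to last_row (0-indexed, inclusive)."""
    for row in range(first_row, last_row + 1):
        worksheet.set_row(row, height, cell_format)


def build_demo_workbook(path="rows_demo.xlsx"):
    """Hypothetical demo; needs the xlsxwriter package installed."""
    import xlsxwriter  # imported here so set_rows stays usable without it

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    percentage = workbook.add_format({"num_format": "0.00%"})

    # Format Excel rows 6 to 10 (0-indexed 5..9) as percentages.
    set_rows(worksheet, 5, 9, cell_format=percentage)

    # Format only the cells C27:W27; the 'type' key is required.
    worksheet.conditional_format(
        "C27:W27", {"type": "no_errors", "format": percentage}
    )

    workbook.close()
```

Calling build_demo_workbook() writes a small file to check the result in Excel. The conditional-format route only changes how existing values are displayed, so the cells still have to be written separately.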

Related

Python Iteratively Read and Write Rows

I am trying to read an excel file and write every fourth row into a new Excel file. I'm using Pandas to read and write, and if int(num%4) == 0 to determine which rows to select, but the iteration and subsequent writing continue to escape me. I've tried my best to look up answers, but I'm a new programmer and struggling :/
If you're using Pandas I'm assuming you've loaded the data into a dataframe?
If so then consider this:
import pandas as pd
df = pd.read_csv('YourFile.csv')
# keep every fourth row; iloc returns a new DataFrame, so assign the result
df = df.iloc[::4]
# once you're done with the data you can save it to another csv file
df.to_csv('OutputFile.csv')
This leaves df with the rows at positions 0, 4, 8, etc. (the 1st, 5th, 9th, ... rows) of your original dataframe/file; use df.iloc[3::4] if you want the 4th, 8th, 12th, ... rows instead. You can then read/write each row left in df. To visualize the before and after, call df.head() before and after the df.iloc[::4] line.
I did not fully understand the problem, so I cannot be more specific, but you should try pandas' iloc property (or even loc, depending on your df). More info here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html
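A minimal sketch of the slicing idea on an in-memory frame, so it runs without any files. The helper name every_nth_row and the file names in the comments are hypothetical; for Excel input and output the same slice would sit between pd.read_excel and DataFrame.to_excel.

```python
import pandas as pd


def every_nth_row(df, n=4, start=0):
    """Return every n-th row of df, beginning at 0-indexed position start."""
    return df.iloc[start::n]


# Stand-in for pd.read_excel("input.xlsx") (hypothetical file name).
df = pd.DataFrame({"x": list(range(10))})

selected = every_nth_row(df)            # positions 0, 4, 8
fourths = every_nth_row(df, start=3)    # positions 3, 7 (the 4th and 8th rows)
print(selected)
print(fourths)

# selected.to_excel("output.xlsx", index=False)  # hypothetical output file
```

The start argument covers both readings of "every fourth row": start=0 matches the num % 4 == 0 test from the question, start=3 picks the 4th, 8th, 12th rows.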

having issues formatting output to excel from dataframes, using xlsxwriter

I have a series of SQL database queries that I am writing to Excel using XlsxWriter/Pandas.
I am using a simple global format, for font type and size.
Each table is different, so the only thing I want to do, is present a standard font and size.
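format = workbook.add_format()
format.set_font_size(9)
format.set_font_name('Calibri')  # set_font_name is a method; assigning a string to it does not change the font
for col_name in df1:
    # widest cell in the column, or the header if that is longer
    column_width = max(df1[col_name].astype(str).map(len).max(), len(col_name))
    col_idx = df1.columns.get_loc(col_name)
    if col_idx < 4:
        column_width = column_width + 1
    worksheet1.set_column(col_idx, col_idx, column_width, format)
writer.save()  # on pandas 2.0 and later, use writer.close() instead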
This all works well until I encounter a DATE.
There may be multiple date fields, or no date field, in each Excel table.
All the fonts in the output are size 9, except in the date fields. All the date fields show up as size 11 and I don't know how to resolve the issue.
Also, the dates themselves show up in Excel as date-time, not date, even though they are defined in the database as a Date field. Converting them is also an issue: I can't seem to get rid of the time portion.
Any help would be greatly appreciated. I have spent way too much time on this.
Sounds to me like this problem.
If you have existing files, create a template Excel file with the correct format and use Python to just fill the cells (this scenario is the accepted answer in the post). You can also define a certain style in Excel once and apply it to columns.
Apparently, you have a somewhat more complicated scenario. The second answer proposes adjusting the data in pandas before writing it to Excel.
However, my personal guess is that it is rather a formatting problem in Excel, so your approach seems reasonable. How about specifying the format explicitly: format = workbook.add_format({'num_format': 'yyyy-mm-dd'})? (That would also align with this post.) If setting the global font size does not work, try specifying the font size in that same format. In XlsxWriter the property is 'font_size': 9; the 'height': 9*20 form, which scales the height by 20 to use points as the unit, is the xlwt convention.
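A sketch of the "adjust the data in pandas first" route, under the assumption that the date cells ignore the column format because pandas writes them with their own cell format. Turning the datetime columns into plain date strings before export sidesteps that and also drops the time portion. The helper name datetimes_to_date_text and the sample frame are hypothetical.

```python
import pandas as pd


def datetimes_to_date_text(df, fmt="%Y-%m-%d"):
    """Return a copy of df with every datetime column rendered as date strings."""
    out = df.copy()
    for col in out.select_dtypes(include=["datetime"]).columns:
        out[col] = out[col].dt.strftime(fmt)
    return out


# Hypothetical query result with a time portion on the date column.
df1 = pd.DataFrame({
    "created": pd.to_datetime(["2020-01-31 00:00:00", "2021-12-01 13:45:00"]),
    "amount": [1, 2],
})
clean = datetimes_to_date_text(df1)
print(clean)
```

The trade-off is that Excel then sees text rather than real dates, so date sorting and date arithmetic in the sheet no longer work. If the cells must stay real dates, the date_format and datetime_format arguments of pd.ExcelWriter control how they are displayed.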

Interpolating data for missing values pandas python

I am having trouble interpolating my missing values. I am using the following code to interpolate:
df=pd.read_csv(filename, delimiter=',')
#Interpolating the nan values
df.set_index(df['Date'],inplace=True)
df2=df.interpolate(method='time')
Water=(df2['Water'])
Oil=(df2['Oil'])
Gas=(df2['Gas'])
Whenever I run my code I get the following message: "time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex"
My data consists of several columns with a header. The first column is named Date and all its rows look similar to this: 12/31/2009. I am new to Python and time series in general. Any tips will help.
Sample of CSV file
Try this, assuming the first column of your csv is the one with date strings:
df = pd.read_csv(filename, index_col=0, parse_dates=[0], infer_datetime_format=True)
df2 = df.interpolate(method='time', limit_direction='both')
It theoretically should 1) convert your first column into actual datetime objects, and 2) set the index of the dataframe to that datetime column, all in one step. You can optionally include the infer_datetime_format=True argument. If your datetime format is a standard format, it can help speed up parsing by quite a bit.
The limit_direction='both' should back fill any NaNs in the first row, but because you haven't provided a copy-paste-able sample of your data, I cannot confirm on my end.
Reading the documentation can be incredibly helpful and can usually answer questions faster than you'll get answers from Stack Overflow!
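A self-contained sketch of the same fix on made-up data, since no sample was posted. The CSV text and its Water values are hypothetical; the point is that the Date strings are converted to real datetimes and set as the index before interpolate(method='time') is called. The explicit format string is an assumption based on the 12/31/2009 style described in the question.

```python
import io

import pandas as pd

# Hypothetical sample in the month/day/year style from the question.
csv_text = "Date,Water\n12/31/2009,10\n01/01/2010,\n01/03/2010,40\n"

df = pd.read_csv(io.StringIO(csv_text))

# Convert the strings to datetimes, then use them as the index.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.set_index("Date")

# Time-weighted interpolation now has the DatetimeIndex it requires.
df2 = df.interpolate(method="time")
print(df2)
```

Because the gap is weighted by elapsed time, the missing value on 2010-01-01 lands one third of the way between the two known readings rather than halfway.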

How to write dataframe to csv with a single row header(5k columns)?

I am trying to export a pandas dataframe with to_csv so it can be processed by another tool before using it again with python. It is a token dataset with 5k columns. When exported the header is split in two rows. This might not be an issue for pandas but in this case I need to export it on a single row csv. Is this a pandas limitation or a csv format one?
So far, searching has returned no applicable results. The only solution I came up with is writing the column names and the values separately, e.g. writing a list of str column names first and then a numpy array to the csv. Can this be implemented, and if so, how?
For me this problem was caused by the columns having multiple index levels (a MultiIndex). The easiest way to resolve this issue is to specify your own headers. I found a reference to an option called tupleize_cols, but it doesn't exist in current (1.2.2) pandas.
I was using the following aggregation:
df.groupby(["device"]).agg({
    "outage_length": ["count", "sum"],
}).to_csv("example.csv")
This resulted in the following csv output:
,outage_length,outage_length
,count,sum
device,,
device0001,3,679.0
device0002,1,113.0
device0003,2,400.0
device0004,1,112.0
I specified my own headers in the call to to_csv, excluding the group-by column, as follows:
}).to_csv("example.csv",header=("flaps","downtime"))
And got the following csv output, which was much more pleasing to spreadsheet software:
device,flaps,downtime
device0001,3,679.0
device0002,1,113.0
device0003,2,400.0
device0004,1,112.0
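An alternative sketch, not from the answer above: pandas' named aggregation produces flat column names directly, so the header is a single row without passing header= to to_csv. The sample data below is hypothetical; the names flaps and downtime are reused from the answer.

```python
import pandas as pd

# Hypothetical outage log.
df = pd.DataFrame({
    "device": ["device0001", "device0001", "device0002"],
    "outage_length": [100.0, 200.0, 50.0],
})

# Each keyword becomes one flat output column: name=(source column, function).
summary = df.groupby("device").agg(
    flaps=("outage_length", "count"),
    downtime=("outage_length", "sum"),
)

# With no path given, to_csv returns the CSV text as a string.
csv_text = summary.to_csv()
print(csv_text)
```

This keeps the column names next to the aggregation that defines them, which is easier to keep in sync than a separate header tuple when there are many columns.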

Is this possible in python with pandas or xlwt

I have an existing Excel file that looks like
and I have another Excel file that has around 40000 rows and around 300 columns. A shortened version looks like
I would like to append values from the second Excel file to my existing one, but only the values that match the values in col4 of my existing file. So I would get something like this
Hope you guys get the picture of what I am trying to do.
Yes, that is possible in pandas, and it is way faster than anything in Excel:
df_result = pd.merge(FirstTable, SecondTable, how='left', on='col4')
This looks for the column "col4" in both tables, so it needs to have that name in both.
Also be aware that if the second table has multiple rows for a single value in the first table, the result will contain one row for each of those matches.
To read the Excel file you can use:
import pandas as pd
xl=pd.ExcelFile('MyFile.xlsx')
FirstTable = pd.read_excel(xl, 'sheet_name_FIRST_TABLE')
For a more detailed description, see the documentation.
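A small sketch of the left merge on made-up tables, to show both behaviours described above: rows without a match keep NaN, and duplicate keys in the second table multiply rows in the result. The frames and every column name except col4 are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the two Excel sheets.
first_table = pd.DataFrame({"col4": ["a", "b", "c"], "col1": [1, 2, 3]})
second_table = pd.DataFrame({
    "col4": ["a", "a", "c", "d"],
    "value": [10, 11, 30, 40],
})

# Keep every row of first_table and pull in the matching rows of second_table.
result = pd.merge(first_table, second_table, how="left", on="col4")
print(result)
```

Here "a" appears twice in the second table, so it yields two rows; "b" has no match and gets NaN; "d" is dropped because it is not in the first table. If only one match per key is wanted, calling second_table.drop_duplicates(subset="col4") before the merge is one option.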
