I have three columns in an Excel spreadsheet and am trying to build a stacked chart.
If all three columns have values, the row should go to bucket 1.
If column A and column B have values and column C has no value, the row should go to bucket 2.
If only column A has a value, the row should go to bucket 3.
A stacked chart should then be created from the counts of all three buckets.
I started with pandas by reading the Excel file into a dataframe, but I am stuck on how to check which columns have values and get the counts.
I also tried using xlsxwriter and got stuck.
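One possible way to get the three counts is with boolean masks built from notna(). This is only a sketch: the column names A, B and C and the sample values are assumptions standing in for the real spreadsheet, which would normally be loaded with pd.read_excel (empty cells are read as NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real frame would come from pd.read_excel(...).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [10, np.nan, 30, 40, np.nan],
    "C": [100, np.nan, np.nan, 400, np.nan],
})

# One boolean mask per column: True where the cell holds a value.
has_a = df["A"].notna()
has_b = df["B"].notna()
has_c = df["C"].notna()

# Count the rows that match each bucket rule from the question.
bucket_counts = pd.Series({
    "bucket 1": int((has_a & has_b & has_c).sum()),    # A, B and C filled
    "bucket 2": int((has_a & has_b & ~has_c).sum()),   # A and B filled, C empty
    "bucket 3": int((has_a & ~has_b & ~has_c).sum()),  # only A filled
})
print(bucket_counts)
```

If matplotlib is installed, bucket_counts.to_frame().T.plot(kind="bar", stacked=True) should draw the three counts as a single stacked bar.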
Related
I get the list of column names from a pandas dataframe with columnValues = list(df.columns.values) and the matching rows with df.query('A=="foo"'). However, I do not need every cell value from every column. I'd like to map or zip them as key (column name): value (cell value) pairs so that I can use them separately as output in an Excel sheet.
columnValues = list(df.columns.values)
['ColA','ColB','ColC','ColD','ColE']
rowData=df.loc[df['ColA']=='apple']
ColA ColB ColC ColD ColE
13 apple NaN height width size
I have columnValues, but if I could also get the row values, I could easily use
dict(zip(columnValues, rowValues)) to create a dictionary keyed by column name and then use that dictionary to write the output Excel files. This is needed because the number and position of the columns in the output Excel file differ from how they are set up in the dataframe.
Any ideas on how I can achieve this result, even with a different approach, would be greatly appreciated.
I need a method that gives the result below:
rowValuesList=['apple', NaN, 'height','width','size']
We could do
rowValuesList = rowData.iloc[0].tolist()
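Putting the pieces together, a minimal sketch might look like the following. The second row of the sample frame is made up for illustration, and the snake_case variable names are hypothetical stand-ins for the names used in the question.

```python
import numpy as np
import pandas as pd

# Sample frame shaped like the one in the question; the "pear" row is invented.
df = pd.DataFrame(
    [["apple", np.nan, "height", "width", "size"],
     ["pear", "x", "a", "b", "c"]],
    columns=["ColA", "ColB", "ColC", "ColD", "ColE"],
)

row_data = df.loc[df["ColA"] == "apple"]

column_values = list(df.columns)
row_values = row_data.iloc[0].tolist()  # first (and only) matching row as a list

# Column name -> cell value, as asked for in the question.
row_dict = dict(zip(column_values, row_values))

# The same mapping can be obtained directly from the row Series.
same_dict = row_data.iloc[0].to_dict()
print(row_dict)
```

Because the dictionary is keyed by column name, values can be looked up in whatever order the output sheet needs, independent of the dataframe's column order.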
I am trying to create a new .txt file from a dataframe based on the value of a certain column.
The file is being created correctly, but the output is not as expected.
I would like the columns to be next to each other and not split into groups (7 columns with their data, then 6 columns with their data, and so on).
I am trying the below:
with open("test.txt", "wt") as f:
    f.write(str(df[df['binary result'] == 1]))
This is the output I am getting in the txt file: https://i.stack.imgur.com/aNJOa.png
I would like the 20 columns that I have to be presented side by side.
For example, after the deadband column, the next column should appear instead of the \ continuation marker.
At the moment the next column appears only after the data of the first 7 columns, and the same pattern repeats until all the columns have been printed.
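The wrapping most likely comes from str(df), which formats the frame for a terminal of limited width. One option is DataFrame.to_string(), which does not wrap by default. The sketch below uses a made-up 20-column frame; only the 'binary result' column name is taken from the question.

```python
import pandas as pd

# Hypothetical 20-column frame standing in for the real data.
df = pd.DataFrame({f"col{i}": [i, i + 1, i + 2] for i in range(19)})
df["binary result"] = [1, 0, 1]

selected = df[df["binary result"] == 1]

# to_string() renders every column on the same line instead of wrapping.
text = selected.to_string()

with open("test.txt", "wt") as f:
    f.write(text)
```

If the file is meant to be read by another program rather than by a person, selected.to_csv("test.txt", sep="\t", index=False) may be a better fit.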
How can I move the data from different rows with the same ID into the same row but different columns? For example, I have this table in sheet1:
Input Sheet
Desired Output Sheet
I imported two csv files in python using read_csv. So now I have 2 dataframes with dimensions 40x300. What I want to do is create a new csv file with dimensions 40x300, where each cell will have the mean value calculated using the values of the respective position in the other two csv files. For example, if the cell with position 1x2 in the first dataframe is 10 and the cell with the same position in the second dataframe is 20, I want a third dataframe with dimensions 40x300 which has the value of 15 in position 1x2. I tried
frame1.add(frame2)
but this created a new dataframe with dimensions 40x600. Any help would be very appreciated.
pandas aligns on index and column labels, so when you add two dataframes you need to make sure both have the same index and columns:
frame2.index=frame1.index
frame2.columns=frame1.columns
frame1.add(frame2)/2
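After changing the index and columns, concat will work as well (grouping by the index level, since mean(level=0) is no longer available in pandas 2.0 and later):
pd.concat([frame1, frame2]).groupby(level=0).mean()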
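To see the alignment behaviour on a small scale, here is a sketch with two made-up 2x2 frames whose column labels differ, which is presumably what happened with the two csv files.

```python
import pandas as pd

# Two small frames with different column labels (hypothetical data).
frame1 = pd.DataFrame([[10, 20], [30, 40]], columns=["a", "b"])
frame2 = pd.DataFrame([[20, 40], [50, 60]], columns=["c", "d"])

# Labels do not match, so add() returns the union of the columns, filled with NaN.
misaligned = frame1.add(frame2)

# Make the labels identical, then the element-wise mean works as expected.
frame2.index = frame1.index
frame2.columns = frame1.columns
mean_frame = frame1.add(frame2) / 2

# Equivalent result via concat: rows sharing an index label are averaged.
mean_via_concat = pd.concat([frame1, frame2]).groupby(level=0).mean()
print(mean_frame)
```

The result can then be written out with mean_frame.to_csv(...), using whatever file name is needed.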
I took data from an .xlsx file and stored it in the dataframe. The data frame is called df, and the size of the dataframe is (51,3). 51 rows. 3 columns. The columns are unnamed and numbered 0,1,2. The rows are indexed from 0-50. What syntax would I use to extract data from a dataframe with pandas in python and put it into a csv? I know I would use DataFrame.to_csv("outputFile.csv" ), but I'm not sure how to identify a specific piece of data (row/column pair), so I can put it in a new location in the csv table in comparison to the old excel table.
You can use integer based indexing using iloc: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-position
In your case, the row/col value you are looking for can be retrieved by:
df.iloc[row_id, col_id]
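As a rough illustration, the sketch below builds a stand-in 51x3 frame, reads one cell by position, places a value at a different position in a copy, and renders the copy as CSV. The cell positions and values are arbitrary examples, not taken from the question.

```python
import pandas as pd

# Stand-in for the (51, 3) frame read from the .xlsx file: cell = row * 10 + col.
df = pd.DataFrame([[r * 10 + c for c in range(3)] for r in range(51)])

# Read a single cell by integer position (row 4, column 2).
value = df.iloc[4, 2]

# Put a value from one position into a different position in a copy.
out = df.copy()
out.iloc[0, 1] = df.iloc[50, 2]

# Without a path, to_csv returns the CSV text; pass "outputFile.csv" to write a file.
csv_text = out.to_csv(header=False, index=False)
print(csv_text.splitlines()[0])
```

Passing header=False and index=False keeps the numeric column and row labels out of the output, which seems appropriate here since the columns are unnamed.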