This question already has answers here:
Is there a way to auto-adjust Excel column widths with pandas.ExcelWriter?
(16 answers)
Closed 2 years ago.
I have a dataframe df:
A                             B                  LongColName1  AnotherNa  AnotherName3
Brunner Island is not island  Baltimore is town          0.26       3.88          3.75
Brunner Island is not island  Baltimore is town         -0.59       1.47          2.01
When I dump the above dataframe to Excel, it appears as follows:
Is there a way to style the dataframe so that the Excel output looks like the following?
One approach is to find the maximum string length in a column and set that column's width explicitly when writing to Excel.
Consider the dataframe below:
In [527]: df
Out[527]:
A
0 Brunner Island is not island
1 Brunner Island is not an island
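# Newer StyleFrame releases use a lowercase module name;
# older ones use: from StyleFrame import StyleFrame
from styleframe import StyleFrame
# Length of the longest string in column A
len_max = df.A.str.len().max()
filename = 'output.xlsx'  # placeholder path, adjust as needed
excel_writer = StyleFrame.ExcelWriter(filename)
sf = StyleFrame(df)
sf.set_column_width(columns=['A'], width=len_max)
sf.to_excel(excel_writer=excel_writer)
excel_writer.save()  # on pandas 2.0+, call excel_writer.close() instead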
pandas has no built-in option to auto-adjust column widths, but some workarounds are described in the post "Is there a way to auto-adjust Excel column widths with pandas.ExcelWriter?"
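One such workaround can be sketched with plain pandas and the openpyxl engine. This is a minimal sketch, not the method from the linked post: the helper names column_widths and to_excel_autofit, the padding of 2 characters, and the sheet name are assumptions made for illustration, and it assumes openpyxl is installed.

```python
import pandas as pd


def column_widths(df, padding=2):
    """Return {column name: width}, sized to the longest cell or header text."""
    widths = {}
    for col in df.columns:
        longest_cell = df[col].astype(str).str.len().max()
        widths[col] = int(max(longest_cell, len(str(col)))) + padding
    return widths


def to_excel_autofit(df, path, sheet_name="Sheet1"):
    """Write df to path and set each column width explicitly (hypothetical helper)."""
    # Imported here so the width calculation above works without openpyxl.
    from openpyxl.utils import get_column_letter

    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
        worksheet = writer.sheets[sheet_name]
        for position, width in enumerate(column_widths(df).values(), start=1):
            worksheet.column_dimensions[get_column_letter(position)].width = width


df = pd.DataFrame({
    "A": ["Brunner Island is not island", "Brunner Island is not island"],
    "B": ["Baltimore is town", "Baltimore is town"],
    "LongColName1": [0.26, -0.59],
})
widths = column_widths(df)
print(widths)
```

Calling to_excel_autofit(df, "output.xlsx") would then write the file with those widths. Character counts only approximate the rendered width, since Excel measures width in units of the default font.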
This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 2 years ago.
Given a dataframe:
Location      | Rate    | Skill
San Francisco   $56-$64   architect
Albany          $43-$50   architect
San Francisco   $23-$48   tester
I'm trying to turn that into this expected result:
Location      | architect | tester
San Francisco   $56-$64     $23-$48
Albany          $43-$50
I thought about transposing on column 'Skill' and then setting its value to the value of 'Rate', but I'm not entirely sure how this can be done.
If you expect only one row for each Location-Skill combination:
df.groupby(['Location', 'Skill']).first().unstack()
Or you can use pivot:
df.pivot(index='Location', columns='Skill', values='Rate')
Note that the groupby approach keeps only the first row of each combination, whereas pivot raises a ValueError if any combination has more than one row.
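The difference between the two approaches can be seen in a small runnable sketch built from the question's data. The extra Albany row with rate "$60-$70" is made up purely to create a duplicate Location-Skill combination.

```python
import pandas as pd

df = pd.DataFrame({
    "Location": ["San Francisco", "Albany", "San Francisco"],
    "Rate": ["$56-$64", "$43-$50", "$23-$48"],
    "Skill": ["architect", "architect", "tester"],
})

# One row per Location-Skill combination: pivot works directly.
wide = df.pivot(index="Location", columns="Skill", values="Rate")
print(wide)

# Made-up duplicate: a second architect rate for Albany.
extra = pd.DataFrame({"Location": ["Albany"], "Rate": ["$60-$70"], "Skill": ["architect"]})
with_dupes = pd.concat([df, extra], ignore_index=True)

# pivot refuses to reshape when a combination occurs more than once.
try:
    with_dupes.pivot(index="Location", columns="Skill", values="Rate")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# groupby(...).first() silently keeps the first row of each combination.
first_only = with_dupes.groupby(["Location", "Skill"])["Rate"].first().unstack()
print(first_only)
```

Selecting the Rate column before calling first() keeps the result's columns flat (architect, tester) instead of nesting them under Rate.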
This question already has answers here:
Drop all duplicate rows across multiple columns in Python Pandas
(8 answers)
Closed 2 years ago.
I want to remove duplicate items completely from a pandas dataframe. For example, I have the dataframe:
        location     area
0  mountain view  1044ft2
1      palo alto     None
2  mountain view   890ft2
3     san carlos  1000ft2
4        belmont     None
What I want to do is find the values in column location that occur more than once and remove every row that contains them. The final result should look like this (notice mountain view is gone):
     location     area
1   palo alto     None
3  san carlos  1000ft2
4     belmont     None
Thanks.
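Use drop_duplicates with keep=False, which drops every row whose location value appears more than once: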
df.drop_duplicates(subset='location', keep=False)
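As a quick check, the call above can be run against the question's data. The sketch below also shows an equivalent boolean-mask form using duplicated(keep=False), which is an alternative added here for illustration rather than part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["mountain view", "palo alto", "mountain view", "san carlos", "belmont"],
    "area": ["1044ft2", None, "890ft2", "1000ft2", None],
})

# keep=False marks every member of a duplicate group, so none of them survive.
unique_only = df.drop_duplicates(subset="location", keep=False)
print(unique_only)

# Equivalent form: build a mask of duplicated locations and invert it.
mask = df["location"].duplicated(keep=False)
same_result = df[~mask]
```

The mask form is handy when the duplicated rows themselves are also needed, since df[mask] returns exactly the rows that were removed.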
This question already has an answer here:
pandas redefine isnull to ignore 'NA'
(1 answer)
Closed 2 years ago.
I know this sounds dumb, but I can't figure out what to do about data in a spreadsheet that equals "NA" (in my case, it's an abbreviation for "North America"). When I do a Pandas "read_excel", the data gets brought in as "NaN" instead of "NA".
Is "NA" also considered "Not a Number" like NaN is?
The input Excel sheet cells contain NA. The dataframe contains "NaN".
Any way to avoid this?
Solution
You can switch off auto-detection of NA values by passing keep_default_na=False to pandas.read_excel(), as follows.
I am using the demo test.xlsx file that I created in the Dummy Data section.
pd.read_excel('test.xlsx', keep_default_na=False)
## Output
# Region Country
# 0 NA Canada
# 1 NA USA
# 2 SA Brazil
# 3 EU Sweden
# 4 AU Australia
Dummy Data
import pandas as pd
# Create a dummy dataframe for demo purpose
df = pd.DataFrame({'Region': ['NA', 'NA', 'SA', 'EU', 'AU'],
'Country': ['Canada', 'USA', 'Brazil', 'Sweden', 'Australia']})
# Create an excel file with this data
df.to_excel('test.xlsx', index=False)
# Show dataframe
print(df)
Output
Region Country
0 NA Canada
1 NA USA
2 SA Brazil
3 EU Sweden
4 AU Australia
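Note that keep_default_na=False turns off every default NA marker, not just "NA". If blank cells should still count as missing, the na_values parameter can list the markers to keep. The sketch below uses read_csv on an in-memory string so that it runs without an Excel file; read_excel accepts the same two parameters. The blank-region row with country "Unknown" is made up for illustration.

```python
import pandas as pd
from io import StringIO

# Made-up sample: the second row has an empty Region field.
csv_text = "Region,Country\nNA,Canada\n,Unknown\nSA,Brazil\n"

# Default behaviour: both "NA" and the empty field are parsed as NaN.
default = pd.read_csv(StringIO(csv_text))

# Keep "NA" as a literal string, but still treat empty fields as missing.
kept = pd.read_csv(StringIO(csv_text), keep_default_na=False, na_values=[""])

print(default)
print(kept)
```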
I have a dataset in a format similar to this:
CITY - YEAR - ATTRIBUTE - VALUE
## example:
dallas-2002-crime-100
dallas-2003-crime-101
dallas-2002-population-4000
houston-2002-population-4100
etc....
I'm trying to transpose this long to wide format so that each city+year value is a row and all the distinct combinations of attributes are the columns-names.
Thus this new dataframe would look like:
###
city - year - population - crime - median_income- etc....
I've looked at the pivot function, but it doesn't seem to support a multi-index for reshaping. Can someone let me know how to work around this? Additionally, I looked at pd.pivot_table, but it seems to work only with numerical data and aggregations such as sums and means. Most of my VALUE attributes are actually strings, so I don't seem to be able to use it.
### doesn't work - can't use a MultiIndex
df.pivot(index=['city','year'], columns = 'attribute', values='value')
Thank you for your help!
Is this what you are looking for?
import pandas as pd
from io import StringIO
data = """city-year-attribute-value
dallas-2002-crime-100
dallas-2003-crime-101
dallas-2002-population-4000
houston-2002-population-4100"""
df = pd.read_csv(StringIO(data), sep="-")
pivoted = df.pivot_table(
index=["city", "year"],
columns=["attribute"],
values=["value"]
)
print(pivoted.reset_index())
Result:
              city  year  value
attribute                 crime population
0           dallas  2002  100.0     4000.0
1           dallas  2003  101.0        NaN
2          houston  2002    NaN     4100.0
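The result above relies on pivot_table's default aggregation, which is the mean and so only suits numeric values. For the string values mentioned in the question, one option is to pass aggfunc="first"; another is set_index followed by unstack. The sketch below is an illustration under assumptions: the string values (high, low, large, small) are made up, and both forms assume at most one row per city-year-attribute combination matters.

```python
import pandas as pd
from io import StringIO

# Made-up string values in place of the numbers from the question.
data = """city-year-attribute-value
dallas-2002-crime-high
dallas-2003-crime-low
dallas-2002-population-large
houston-2002-population-small"""
df = pd.read_csv(StringIO(data), sep="-")

# aggfunc="first" takes the first value per group instead of averaging.
wide = df.pivot_table(
    index=["city", "year"],
    columns="attribute",
    values="value",
    aggfunc="first",
).reset_index()
print(wide)

# Alternative without aggregation: fails if a combination is duplicated.
wide_unstacked = (
    df.set_index(["city", "year", "attribute"])["value"]
    .unstack("attribute")
    .reset_index()
)
```

Passing values as a plain string rather than a list keeps the columns flat (city, year, crime, population) instead of nesting them under value.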
This question already has answers here:
Pandas dataframe hide index functionality?
(8 answers)
Closed 3 years ago.
Very simple question.
I am reading an Excel sheet with Python, and I want to print the results without the automatic index that pandas adds.
import pandas as pd
x=pd.read_excel(r'2_56_01.276295.xlsx',index_col=None)
print(x[:3])
This prints the first 3 rows:
   blahblah           Street Borough
0        55         W 192 ST   Bronx
1      2514  EAST TREMONT AV   Bronx
2       877     INTERVALE AV   Bronx
But I do not want the index.
print(x.to_string(index=False))
should do the trick.
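A self-contained sketch of the same call, with the three rows from the question typed in by hand instead of read from the Excel file:

```python
import pandas as pd

# Rows copied from the question's printed output.
x = pd.DataFrame({
    "blahblah": [55, 2514, 877],
    "Street": ["W 192 ST", "EAST TREMONT AV", "INTERVALE AV"],
    "Borough": ["Bronx", "Bronx", "Bronx"],
})

# to_string(index=False) renders the frame as text without the index column.
text = x[:3].to_string(index=False)
print(text)
```

This changes only the printed text; the dataframe itself still carries its index.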