I am trying to sort a Pandas Series in ascending order.
Top15['HighRenew'].sort_values(ascending=True)
Gives me:
Country
China 1
Russian Federation 1
Canada 1
Germany 1
Italy 1
Spain 1
Brazil 1
South Korea 2.27935
Iran 5.70772
Japan 10.2328
United Kingdom 10.6005
United States 11.571
Australia 11.8108
India 14.9691
France 17.0203
Name: HighRenew, dtype: object
The values are in ascending order.
However, when I then modify the series in the context of the dataframe:
Top15['HighRenew'] = Top15['HighRenew'].sort_values(ascending=True)
Top15['HighRenew']
Gives me:
Country
China 1
United States 11.571
Japan 10.2328
United Kingdom 10.6005
Russian Federation 1
Canada 1
Germany 1
India 14.9691
France 17.0203
South Korea 2.27935
Italy 1
Spain 1
Iran 5.70772
Australia 11.8108
Brazil 1
Name: HighRenew, dtype: object
Why is this giving me different output from the above?
I would be grateful for any advice.
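Top15['HighRenew'] = Top15['HighRenew'].sort_values(ascending=True).to_numpy()
or, only if Top15 has a default integer index (0, 1, 2, ...):
Top15['HighRenew'] = Top15['HighRenew'].sort_values(ascending=True).reset_index(drop=True)
When you call sort_values, the rows are reordered but each value keeps its index label. Assigning a Series to a DataFrame column aligns on the index, so every value lands back on its original row. .to_numpy() drops the index, which makes the assignment positional. With the Country index shown above, the reset_index(drop=True) variant would align on integer labels that don't exist in Top15 and fill the column with NaN.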
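The alignment behaviour can be reproduced on a tiny frame. This is a minimal sketch: the three-row `top15` frame and its values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical miniature of Top15, indexed by country name.
top15 = pd.DataFrame(
    {"HighRenew": [3.0, 1.0, 2.0]},
    index=pd.Index(["China", "Japan", "Iran"], name="Country"),
)

sorted_series = top15["HighRenew"].sort_values(ascending=True)

# Assigning a Series aligns on index labels, so each row gets its old value back.
aligned = top15.copy()
aligned["HighRenew"] = sorted_series

# Assigning a plain NumPy array is positional, so the sorted order is kept.
positional = top15.copy()
positional["HighRenew"] = sorted_series.to_numpy()

print(aligned["HighRenew"].tolist())     # [3.0, 1.0, 2.0]
print(positional["HighRenew"].tolist())  # [1.0, 2.0, 3.0]
```

Note that the positional version detaches the values from their countries. If the goal is to reorder whole rows, sorting the frame itself with Top15.sort_values('HighRenew') is probably what you want.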
Thank you to anky for providing me with this fantastic solution!
I am trying to read Excel cells that contain multi-line text, using xlrd 1.2.0. But when I print the cell text, or even write it to a .txt file, the line breaks and tabs (i.e. \n and \t) are not preserved.
Input:
File URL:
Excel file
Code:
import xlrd
filenamedotxlsx = '16.xlsx'
gall_artists = xlrd.open_workbook(filenamedotxlsx)
sheet = gall_artists.sheet_by_index(0)
bio = sheet.cell_value(0,1)
print(bio)
Output:
"Biography 2018-2019 Manoeuvre Textiles Atelier, Gent, Belgium 2017-2018 Thalielab, Brussels, Belgium 2017 Laboratoires d'Aubervilliers, Paris 2014-2015 Galveston Artist Residency (GAR), Texas 2014 MACBA, Barcelona & L'appartment 22, Morocco - Residency 2013 International Residence Recollets, Paris 2007 Gulbenkian & RSA Residency, BBC Natural History Dept, UK 2004-2006 Delfina Studios, UK Studio Award, London 1998-2000 De Ateliers, Post-grad Residency, Amsterdam 1995-1998 BA (Hons) Textile Art, Winchester School of Art UK "
Expected Output:
1975 Born in Hangzhou, Zhejiang, China
1980 Started to learn Chinese ink painting
2000 BA, Major in Oil Painting, China Academy of Art, Hangzhou, China
Curator, Hangzhou group exhibition for 6 female artists Untitled, 2000 Present
2007 MA, New Media, China Academy of Art, Hangzhou, China, studied under Jiao Jian
Lecturer, Department of Art, Zhejiang University, Hangzhou, China
2015 PhD, Calligraphy, China Academy of Art, Hangzhou, China, studied under Wang Dongling
Jury, 25th National Photographic Art Exhibition, China Millennium Monument, Beijing, China
2016 Guest professor, Faculty of Humanities, Zhejiang University, Hangzhou, China
Associate professor, Research Centre of Modern Calligraphy, China Academy of Art, Hangzhou, China
Researcher, Lanting Calligraphy Commune, Zhejiang, China
2017 Christie's produced a video about Chu Chu's art
2018 Featured by Poetry Calligraphy Painting Quarterly No.2, Beijing, China
Present Vice Secretary, Lanting Calligraphy Society, Hangzhou, China
Vice President, Zhejiang Female Calligraphers Association, Hangzhou, China
I have also used repr() to see if there are \n characters or not, but there aren't any.
https://github.com/haosmark/jupyter_notebooks/blob/master/Coursera%20week%203%20assignment.ipynb
At the very bottom of the notebook, in question 3, I'm trying to average, round, and sort the data, but for some reason the rounding and sorting don't seem to work at all:
i = df.columns.get_loc('2006')
avgGDP = df[df.columns[i:]].copy()
avgGDP = avgGDP.mean(axis=1).round(2).sort_values(ascending=False)
avgGDP
What am I doing wrong here?
This is what df looks like before I apply average, round, and sort.
Your series is actually sorted in descending order: the first value is about 1.5e+13 and the last about 4.4e+11:
Country
United States 1.536434e+13
China 6.348609e+12
Japan 5.542208e+12
Germany 3.493025e+12
France 2.681725e+12
United Kingdom 2.487907e+12
Brazil 2.189794e+12
Italy 2.120175e+12
India 1.769297e+12
Canada 1.660648e+12
Russian Federation 1.565460e+12
Spain 1.418078e+12
Australia 1.164043e+12
South Korea 1.106714e+12
Iran 4.441558e+11
Rounding doesn't do anything visible here because the smallest value is about 4e+11, and rounding to 2 decimal places changes nothing you can see at that scale. If you want to show only 2 decimal places in the scientific notation, you can use .map('{:0.2e}'.format) (see my note below). Be aware that this turns the values into strings.
Note: just for fun, you could also calculate the same with a one-liner:
df.filter(regex='^2').mean(1).sort_values()[::-1].map('{:0.2e}'.format)
Output:
Country
United States 1.54e+13
China 6.35e+12
Japan 5.54e+12
Germany 3.49e+12
France 2.68e+12
United Kingdom 2.49e+12
Brazil 2.19e+12
Italy 2.12e+12
India 1.77e+12
Canada 1.66e+12
Russian Federation 1.57e+12
Spain 1.42e+12
Australia 1.16e+12
South Korea 1.11e+12
Iran 4.44e+11
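To see the difference between rounding the numbers and formatting their display, here is a small sketch. The three values are copied from the output above at the precision shown, so they are not the exact averages from the notebook.

```python
import pandas as pd

# Three of the averages printed above (display precision only).
avg_gdp = pd.Series(
    [1.536434e13, 4.441558e11, 6.348609e12],
    index=["United States", "Iran", "China"],
)

# round(2) keeps two digits after the decimal point; values of this size
# have nothing there to remove, so the numbers stay the same.
rounded = avg_gdp.round(2).sort_values(ascending=False)

# Formatting only changes how the values are displayed, and yields strings.
formatted = rounded.map("{:0.2e}".format)

print(rounded.index.tolist())  # ['United States', 'China', 'Iran']
print(formatted.tolist())      # ['1.54e+13', '6.35e+12', '4.44e+11']
```

Sorting before formatting matters: once the values are strings, sort_values would compare them as text rather than as numbers.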
I have a dataframe called wine that contains a bunch of rows I need to drop.
How do I drop all rows whose 'country' makes up less than 1% of the whole?
Here are the proportions:
#proportion of wine countries in the data set
wine.country.value_counts() / len(wine.country)
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
New Zealand 0.009069
Israel 0.006133
Greece 0.004493
Canada 0.002526
Hungary 0.001755
Romania 0.001558
...
I got lazy and didn't include all of the results, but I think you catch my drift. I need to drop all rows with proportions less than .01.
Here is the head of my dataframe:
country designation points price province taster_name variety year price_category
Portugal Avidagos 87 15.0 Douro Roger Voss Portuguese Red 2011.0 low
Assuming df holds these proportions in a column named proportion, you can use something like this:
df = df[df.proportion >= .01]
From that dataset it should give you something like this:
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
Figured it out:
# True for countries that make up at least 1% of the rows
country_filter = wine.country.value_counts(normalize=True) >= 0.01
country_index = country_filter[country_filter].index
wine = wine[wine.country.isin(country_index)]
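The same filter can also be written by mapping each row to its country's share. This is only a sketch: the toy `wine` frame below is invented for illustration, and only the `country` column name comes from the question.

```python
import pandas as pd

# Hypothetical data: 200 rows, so Hungary's single row is 0.5% of the total.
wine = pd.DataFrame(
    {"country": ["US"] * 150 + ["France"] * 49 + ["Hungary"] * 1}
)

# Share of the data set held by each row's country.
share = wine["country"].map(wine["country"].value_counts(normalize=True))

# Keep rows whose country makes up at least 1% of the rows.
# Rows with a missing country get NaN here and are dropped as well.
wine_filtered = wine[share >= 0.01]

print(sorted(wine_filtered["country"].unique()))  # ['France', 'US']
print(len(wine_filtered))                         # 199
```

This avoids building the intermediate index of countries, at the cost of computing a share for every row.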
I want to take specific tables from a CSV file and return a file for each table. I have something that looks like this:
France city population agriculture
France Paris 2000000 lots
France Nice 500000 some
England city population agriculture
England London 30000 none
England Glasgow 10000 some
and I want to return two files, one with
France city population agriculture
France Paris 2000000 lots
France Nice 500000 some
and the other with
England city population agriculture
England London 30000 none
England Glasgow 10000 some
How do I do this?
Here is a solution that doesn't use the csv module (can the csv module even separate tables?). It assumes the tables are separated by a blank line:
with open('table.txt') as f:
    text = f.read()

# Each blank line marks the start of a new table.
tables = text.split('\n\n')
for itable, table in enumerate(tables):
    fileout = 'table%2.2i.txt' % itable
    with open(fileout, 'w') as fout:
        fout.write(table.strip())
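Splitting on '\n\n' only works when blank lines separate the tables. If the file looks exactly like the sample in the question, with no blank lines, one option is to group consecutive rows by their first field. This is a sketch under two assumptions: fields are whitespace-separated, and every row of a table (header included) starts with the same value. The helper names `split_tables` and `write_tables` are made up for this example, and the output goes to a temporary directory to avoid side effects.

```python
import tempfile
from itertools import groupby
from pathlib import Path

SAMPLE = """France city population agriculture
France Paris 2000000 lots
France Nice 500000 some
England city population agriculture
England London 30000 none
England Glasgow 10000 some
"""


def split_tables(text):
    """Group consecutive non-empty lines by their first whitespace-separated field."""
    lines = [line for line in text.splitlines() if line.strip()]
    return {
        name: "\n".join(group) + "\n"
        for name, group in groupby(lines, key=lambda line: line.split()[0])
    }


def write_tables(tables, out_dir):
    """Write each table to <out_dir>/<name>.txt and return the paths."""
    paths = []
    for name, table in tables.items():
        path = Path(out_dir) / f"{name}.txt"
        path.write_text(table)
        paths.append(path)
    return paths


tables = split_tables(SAMPLE)
out_dir = tempfile.mkdtemp()
paths = write_tables(tables, out_dir)
print([p.name for p in paths])  # ['France.txt', 'England.txt']
```

Because groupby only merges adjacent rows, a table whose rows are scattered through the file would overwrite its earlier group in the dict. Sort the lines by first field beforehand if that can happen.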