I am trying to format my output file exactly like my input file. I was wondering if anyone could give me some pointers. My codes are:
input_file=open('abcd.txt','r')
f1=input('file name: ')
output_file=open(f1,'w')
for line in input_file:
output_file.write(line)
input_file.close()
output_file.close()
My input file looks like the following. Where country is 50 chars long, second category is 6 chars, third is 3 chars, fourt is 25 chars, and year is 4 chars long. The following is the inputfile.
Afghanistan WB_LI 68 Eastern Mediterranean 2012
Albania WB_LMI 90 Europe 1980
Albania WB_LMI 90 Europe 1981
The following is how my output file looks like:
Afghanistan WB_LI 68 Eastern Mediterranean 2012
Albania WB_LMI 90 Europe 1980
Albania WB_LMI 90 Europe 1981
Use string formatting, mainly {:-x} where x denotes the minimal length of the string (filled with whitespace) and - denotes to left-align the contents:
output_file.write('{:-50} {:-6} {:-3} {:-25} {:-4}\n'.format(country, category, third, fourth, year))
Related
I have the following dataframe:
import pandas as pd
fertilityRates = pd.read_csv('fertility_rate.csv')
fertilityRatesRowCount = len(fertilityRates.axes[0])
fertilityRates.head(fertilityRatesRowCount)
I have found a way to find the mean for each row over columns 1960-1969, but would like to do so without removing the column called "Country".
The following is what is outputted after I execute the following commands:
Mean1960To1970 = fertilityRates.iloc[:, 1:11].mean(axis=1)
Mean1960To1970
You can use pandas.DataFrame.loc to select a range of years (e.g "1960":"1968" means from 1960 to 1968).
Try this :
Mean1960To1968 = (
fertilityRates[["Country"]]
.assign(Mean= fertilityRates.loc[:, "1960":"1968"].mean(axis=1))
)
# Output :
print(Mean1960To1968)
Country Mean
0 _World 5.004444
1 Afghanistan 7.450000
2 Albania 5.913333
3 Algeria 7.635556
4 Angola 7.030000
5 Antigua and Barbuda 4.223333
6 Arab World 7.023333
7 Argentina 3.073333
8 Armenia 4.133333
9 Aruba 4.044444
10 Australia 3.167778
11 Austria 2.715556
Assuming I have the following toy dataframe, df:
Country Population Region HDI
China 100 Asia High
Canada 15 NAmerica V.High
Mexico 25 NAmerica Medium
Ethiopia 30 Africa Low
I would like to create new columns based on the population, region, and HDI of Ethiopia in a loop. I tried the following method, but it is time-consuming when a lot of columns are involved.
df['Population_2'] = df['Population'][df['Country'] == "Ethiopia"]
df['Region_2'] = df['Region'][df['Country'] == "Ethiopia"]
df['Population_2'].fillna(method='ffill')
My final DataFrame df should look like:
Country Population Region HDI Population_2 Region_2 HDI_2
China 100 Asia High 30 Africa Low
Canada 15 NAmerica V.High 30 Africa Low
Mexico 25 NAmerica Medium 30 Africa Low
Ethiopia 30 Africa Low 30 Africa Low
How about this?
for col in ['Population', 'Region', 'HDI']:
df[col + '_2'] = df.loc[df.Country=='Ethiopia', col].iat[0]
I don't quite understand the broader point of what you're trying to do, and if Ethiopia could have multiple values the solution might be different. But this works for the problem as you presented it.
You can use:
# select Ethiopia row and add suffix "_2" to the columns (except Country)
s = (df.drop(columns='Country')
.loc[df['Country'].eq('Ethiopia')].add_suffix('_2').squeeze()
)
# broadcast as new columns
df[s.index] = s
output:
Country Population Region HDI Population_2 Region_2 HDI_2
0 China 100 Asia High 30 Africa Low
1 Canada 15 NAmerica V.High 30 Africa Low
2 Mexico 25 NAmerica Medium 30 Africa Low
3 Ethiopia 30 Africa Low 30 Africa Low
You can use assign and also assuming that you have only row corresponding to Ethiopia:
d = dict(zip(df.columns.drop('Country').map('{}_2'.format),
df.set_index('Country').loc['Ethiopia']))
df = df.assign(**d)
print(df):
Country Population Region HDI Population_2 Region_2 HDI_2
0 China 100 Asia High 30 Africa Low
1 Canada 15 NAmerica V.High 30 Africa Low
2 Mexico 25 NAmerica Medium 30 Africa Low
3 Ethiopia 30 Africa Low 30 Africa Low
Can I have a two line caption in pandas dataframe?
Create dataframe with:
df = pd.DataFrame({'Name' : ['John','Harry','Gary','Richard','Anna','Richard','Gary','Richard'], 'Age' : [25,32,37,43,44,56,37,22], 'Zone' : ['East','West','North','South','East','West','North', 'South']})
df=df.drop_duplicates('Name',keep='first')
df.style.set_caption("Team Members Per Zone")
which outputs:
Team Members Per Zone
Name Age Zone
0 John 25 East
1 Harry 32 West
4 Anna 44 East
6 Gary 37 North
7 Richard 22 South
However I'd like it to look like:
Team Members
Per Zone
Name Age Zone
0 John 25 East
1 Harry 32 West
4 Anna 44 East
6 Gary 37 North
7 Richard 22 South
Using a break works for me in JupyterLab:
df.style.set_caption('This is line one <br> This is line two')
Have you tried with \n ? (Sorry too low reputation to just comment.
I have a dataframe called wine that contains a bunch of rows I need to drop.
How do i drop all rows in column 'country' that are less than 1% of the whole?
Here are the proportions:
#proportion of wine countries in the data set
wine.country.value_counts() / len(wine.country)
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
New Zealand 0.009069
Israel 0.006133
Greece 0.004493
Canada 0.002526
Hungary 0.001755
Romania 0.001558
...
I got lazy and didn't include all of the results, but i think you catch my drift. I need to drop all rows with proportions less than .01
Here is the head of my dataframe:
country designation points price province taster_name variety year price_category
Portugal Avidagos 87 15.0 Douro Roger Voss Portuguese Red 2011.0 low
You can use something like this:
df = df[df.proportion >= .01]
From that dataset it should give you something like this:
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
figured it out
country_filter = wine.country.value_counts(normalize=True) > 0.01
country_index = country_filter[country_filter.values == True].index
wine = wine[wine.country.isin(list(country_index))]
I am a new in python and is trying to read my excel file in spyder, anaconda. However, when I run it, some row is missing and replaced with '...'. I have seven columns and 100 rows in my excel file. The column arrangement also quite weird.
This is my code:
import pandas as pd
print(" Comparing within 100 Airline \n\n")
def view():
airlines = pd.ExcelFile('Airline_final.xlsx')
df1 = pd.read_excel("Airline_final.xlsx",sheet_name=2)
print("\n\n 1: list of all Airlines \n")
print(df1)
view()
Here is what I get:
18 #051 Cubana Cuba
19 #003 Aigle Azur France
20 #011 Air Corsica France
21 #012 Air France France
22 #019 Air Mediterranee France
23 #050 Corsair France
24 #072 HOP France
25 #087 Joon France
26 #006 Air Berlin Germany
27 #049 Condor Flugdienst Germany
28 #057 Eurowings Germany
29 #064 Germania Germany
.. ... ... ...
70 #018 Air Mandalay Myanmar
71 #020 Air KBZ Myanmar
72 #067 Golden Myanmar Airlines Myanmar
73 #017 Air Koryo North Korea
74 #080 Jetstar Asia Singapore
75 #036 Binter Canarias Spain
76 #040 Canaryfly Spain
77 #073 Iberia and Iberia Express Spain
To print the whole dataframe use:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
print(df1)