Python matplotlib bar chart from CSV - python
I have two Python files and one CSV file.
I'm trying to accumulate all the visitors from 2000 to 2009 for each country and select the top 3 countries, which should then be shown in a bar chart.
The error I'm getting is:
Traceback (most recent call last):
File "C:/ASP/pythonProjectDA_YODA/main.py", line 3, in <module>
countries=Countries("2000","2009","China","Japan")
File "C:\ASP\pythonProjectDA_YODA\countries.py", line 8, in __init__
dfVisitor.index=pd.to_datetime(dfVisitor.index)
File "C:\Users\65965\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\tools\datetimes.py", line 812, in to_datetime
result = convert_listlike(arg, format, name=arg.name)
File "C:\Users\65965\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\tools\datetimes.py", line 459, in _convert_listlike_datetimes
result, tz_parsed = objects_to_datetime64ns(
File "C:\Users\65965\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\arrays\datetimes.py", line 2044, in objects_to_datetime64ns
result, tz_parsed = tslib.array_to_datetime(
File "pandas\_libs\tslib.pyx", line 352, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslib.pyx", line 579, in pandas._libs.tslib.array_to_datetime
File "pandas\_libs\tslib.pyx", line 718, in pandas._libs.tslib.array_to_datetime_object
File "pandas\_libs\tslib.pyx", line 552, in pandas._libs.tslib.array_to_datetime
TypeError: <class 'tuple'> is not convertible to datetime
I have no idea what this means, because this is my first time learning this.
The main.py code is below:
from countries import Countries

countries=Countries("ListedCountries.csv","2000","2009","China","Japan")
countries.top3()
countries.drawchart()
The other Python file, countries.py, is below:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

class Countries:
    def __init__(self,syear,eyear,scountries,ecountries):
        dfVisitor=pd.read_csv("ListedCountries.csv")
        dfVisitor.index=pd.to_datetime(dfVisitor.index)
        dfVisitor.columns=dfVisitor.colums.str.strip()
        dfOther=dfVisitor.loc[syear:eyear, scountries:ecountries]
        dfOtherTotal=dfOther.sum()
        self.dfOtherTotalSorted=dfOtherTotal.sort_values(ascending=False)
        print(self.dfOtherTotalSorted)

    def top3(self):
        value=self.dfOtherTotalSorted.to_dict()
        c=1
        print("Top 3 countries in the region over a span of 10 years")
        for x, y in value.items():
            if c<=3:
                print(c,x,y)
                c+=1  # c=c+1
        if len(value)>0:
            return True
        else:
            return False

    def drawchart(self):
        ps=self.dfOtherTotalSorted
        index=np.arrange(len(ps.index))
        plt.xlabel("No. of Visitors in 10 years visit Singapore",fontsize=10)
        plt.ylabel((1000000, 2000000, 3000000, 4000000), fontsize=15)
        plt.xticks(index,ps.index,fontsize=15, rotation=90)
        plt.title("Total visitors from 2000 to 2009 in Singapore")
        plt.bar(ps.index,ps.values)
        plt.show()
Lastly, here is the CSV file:
Japan,HongKong,China,Taiwan,Korea
2000 Jan,72,131,16,288,38,887,19,329,32,621
2000 Feb,71,245,28,265,45,148,30,528,28,932
2000 Mar,91,844,21,513,30,644,22,934,30,682
2000 Apr,60,540,29,308,36,511,27,737,27,237
2000 May,62,152,20,822,37,934,23,635,30,739
2000 Jun,67,977,22,011,30,706,26,582,25,318
2000 Jul,84,634,30,218,36,148,35,570,32,960
2000 Aug,101,785,31,963,41,162,30,732,34,877
2000 Sep,89,417,20,566,31,239,19,824,23,207
2000 Oct,73,383,21,512,35,195,17,685,28,185
2000 Nov,80,889,21,326,32,999,17,034,31,169
2000 Dec,73,898,22,183,37,762,19,314,28,426
2001 Jan,65,381,27,778,56,460,20,418,32,727
2001 Feb,72,335,18,442,36,157,20,078,32,777
2001 Mar,85,655,27,025,30,320,16,438,32,441
2001 Apr,58,348,25,816,37,542,19,756,30,150
2001 May,58,984,19,806,41,999,16,381,28,842
2001 Jun,64,582,23,752,31,882,19,445,26,914
2001 Jul,76,373,24,929,45,570,25,185,34,830
2001 Aug,92,508,28,515,51,208,21,981,35,899
2001 Sep,69,850,20,024,34,386,15,218,23,526
2001 Oct,35,970,19,363,42,586,14,259,24,125
2001 Nov,32,294,18,583,41,208,15,219,29,452
2001 Dec,43,483,22,124,48,080,17,709,27,400
2002 Jan,47,447,16,630,50,303,18,995,38,613
2002 Feb,49,583,26,760,81,649,21,463,30,745
2002 Mar,68,549,24,043,42,728,16,038,38,393
2002 Apr,49,149,21,771,63,880,17,554,32,704
2002 May,50,563,23,490,56,486,16,570,27,807
2002 Jun,54,892,22,965,41,186,17,251,27,519
2002 Jul,66,566,26,488,51,147,25,238,32,353
2002 Aug,85,655,26,513,62,699,22,147,39,236
2002 Sep,77,884,18,914,47,217,13,553,21,472
2002 Oct,58,489,21,025,57,693,15,730,28,827
2002 Nov,54,294,17,425,56,422,11,981,28,758
2002 Dec,60,338,19,941,58,683,12,796,24,591
2003 Jan,53,131,17,336,62,454,15,826,34,976
2003 Feb,50,469,24,563,89,704,17,940,32,707
2003 Mar,54,497,16,460,54,063,11,498,25,186
2003 Apr,12,501,4,808,23,002,2,531,2,890
2003 May,7,056,5,510,3,994,1,283,2,552
2003 Jun,14,051,16,426,8,405,5,412,8,477
2003 Jul,28,636,29,541,20,989,18,298,25,714
2003 Aug,43,016,34,391,52,847,19,466,30,591
2003 Sep,47,623,17,839,57,716,13,190,20,942
2003 Oct,38,418,19,234,56,700,14,982,24,175
2003 Nov,37,630,18,368,67,541,12,271,29,059
2003 Dec,47,021,21,778,71,068,12,233,24,125
2004 Jan,39,191,22,763,79,717,17,014,30,255
2004 Feb,43,760,17,189,50,903,13,918,29,835
2004 Mar,53,022,18,564,53,481,13,060,25,853
2004 Apr,38,801,24,158,75,068,13,484,26,713
2004 May,43,714,23,922,70,021,13,963,31,482
2004 Jun,44,112,21,679,63,014,15,181,29,912
2004 Jul,56,066,27,380,92,649,21,955,35,568
2004 Aug,66,617,30,887,90,212,19,708,38,602
2004 Sep,62,264,19,562,62,134,13,542,25,956
2004 Oct,51,340,21,884,70,449,13,840,26,936
2004 Nov,48,066,19,317,88,223,12,747,31,623
2004 Dec,51,858,24,381,84,369,14,030,28,344
2005 Jan,48,004,17,457,45,801,16,774,20,386
2005 Feb,40,310,28,713,61,601,19,104,24,531
2005 Mar,52,225,31,089,52,249,15,669,23,476
2005 Apr,41,599,23,614,68,775,16,345,28,923
2005 May,43,968,25,187,62,872,16,019,28,927
2005 Jun,43,020,23,843,61,150,16,710,32,366
2005 Jul,49,791,35,295,93,889,27,702,42,961
2005 Aug,61,522,38,649,101,134,22,950,42,791
2005 Sep,57,085,23,649,67,061,15,670,25,572
2005 Oct,49,532,22,996,74,501,17,754,30,060
2005 Nov,50,402,20,552,88,704,14,094,31,277
2005 Dec,50,994,22,770,79,945,15,123,32,803
2006 Jan,45,402,23,587,81,734,19,898,40,604
2006 Feb,44,695,22,743,96,562,17,723,40,835
2006 Mar,62,353,21,726,91,092,16,227,36,144
2006 Apr,41,269,28,836,97,423,17,657,31,780
2006 May,42,907,24,008,78,594,15,410,34,236
2006 Jun,43,153,23,998,71,213,17,393,36,327
2006 Jul,52,407,28,265,113,127,27,109,45,685
2006 Aug,62,970,30,672,103,459,23,438,44,846
2006 Sep,51,284,20,463,63,550,15,350,29,315
2006 Oct,47,552,21,801,70,690,17,087,35,025
2006 Nov,52,047,22,845,88,343,15,953,43,791
2006 Dec,48,367,22,530,81,414,16,218,36,134
2007 Jan,49,959,19,559,76,116,17,156,46,756
2007 Feb,46,920,26,025,111,934,23,307,31,464
2007 Mar,58,843,22,361,79,239,16,091,42,071
2007 Apr,37,962,29,338,99,136,15,343,32,219
2007 May,38,813,25,261,85,198,14,952,34,408
2007 Jun,41,289,25,551,77,239,16,868,38,027
2007 Jul,49,234,31,990,108,881,24,849,46,123
2007 Aug,58,288,32,177,114,463,21,028,45,910
2007 Sep,54,186,22,902,76,181,16,276,30,265
2007 Oct,51,825,23,224,83,831,14,426,33,383
2007 Nov,53,784,22,638,103,906,13,742,43,965
2007 Dec,53,411,21,084,97,832,14,118,39,701
2008 Jan,52,973,19,817,108,486,16,342,50,432
2008 Feb,47,449,27,263,121,031,17,829,40,998
2008 Mar,57,364,27,600,98,180,13,778,39,683
2008 Apr,36,301,20,232,107,639,13,944,33,946
2008 May,42,382,22,867,86,785,14,276,36,412
2008 Jun,40,879,23,055,70,565,13,146,35,998
2008 Jul,47,659,28,218,105,528,19,398,39,614
2008 Aug,53,699,26,847,91,325,16,923,42,338
2008 Sep,48,771,20,765,66,582,12,303,25,914
2008 Oct,47,736,20,016,76,836,13,503,30,930
2008 Nov,48,225,19,197,79,096,12,535,24,445
2008 Dec,47,602,22,238,66,689,11,947,22,308
2009 Jan,38,382,23,399,105,144,15,986,25,516
2009 Feb,42,807,19,720,80,037,12,744,27,387
2009 Mar,46,797,21,290,91,275,12,542,20,759
2009 Apr,31,633,28,587,86,525,11,970,21,323
2009 May,29,800,21,529,52,058,11,602,21,854
2009 Jun,28,060,21,703,41,650,11,486,20,991
2009 Jul,46,633,33,382,72,326,17,351,30,389
2009 Aug,50,698,36,218,86,530,17,490,32,332
2009 Sep,52,561,21,509,59,588,10,443,15,714
2009 Oct,43,247,25,512,83,273,13,208,16,069
2009 Nov,38,949,20,525,90,358,11,900,19,620
2009 Dec,40,420,21,046,87,983,10,039,20,033
112,551,37,334,126,870,29,368,52,654
This code works once you eliminate the embedded commas in the CSV file and give the first (date) column a header such as Date. One other problem is that you were trying to slice "China":"Japan", but your columns are not in that order; it needs to be "Japan":"China". You also had several spelling errors (colums, arrange).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

class Countries(object):
    def __init__(self,filename,syear,eyear,scountries,ecountries):
        dfVisitor=pd.read_csv(filename)
        dfVisitor=dfVisitor.set_index('Date')
        dfVisitor.index=pd.to_datetime(dfVisitor.index)
        dfOther=dfVisitor.loc[syear:eyear, scountries:ecountries]
        dfOtherTotal=dfOther.sum()
        self.dfOtherTotalSorted=dfOtherTotal.sort_values(ascending=False)
        print(self.dfOtherTotalSorted)

    def top3(self):
        value=self.dfOtherTotalSorted.to_dict()
        print("Top 3 countries in the region over a span of 10 years")
        for x, y in list(value.items())[:3]:
            print(x,y)

    def drawchart(self):
        ps=self.dfOtherTotalSorted
        index=np.arange(len(ps.index))
        plt.xlabel("No. of Visitors in 10 years visit Singapore",fontsize=10)
        plt.ylabel((1000000, 2000000, 3000000, 4000000), fontsize=15)
        plt.xticks(index,ps.index,fontsize=15, rotation=90)
        plt.title("Total visitors from 2000 to 2009 in Singapore")
        plt.bar(ps.index,ps.values)
        plt.show()

countries=Countries("ListedCountries.csv",pd.to_datetime("2000-01-01"),pd.to_datetime("2009-12-31"),"Japan","China")
countries.top3()
countries.drawchart()
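As an aside, the TypeError in the traceback can be reproduced directly from the malformed CSV. A minimal sketch, using a two-row excerpt of the data: because the header names 5 columns but each data row has 11 comma-separated fields (the thousands separators), pandas treats the surplus leading fields as an implicit MultiIndex of tuples, which to_datetime cannot parse. The exact exception text may vary by pandas version.

```python
import io
import pandas as pd

# Two-row excerpt: 5 header names, 11 fields per data row. read_csv uses
# the extra leading fields as an implicit (Multi)Index.
csv = (
    "Japan,HongKong,China,Taiwan,Korea\n"
    "2000 Jan,72,131,16,288,38,887,19,329,32,621\n"
    "2000 Feb,71,245,28,265,45,148,30,528,28,932\n"
)
df = pd.read_csv(io.StringIO(csv))
print(type(df.index))  # a MultiIndex built from the surplus columns

try:
    # Each index element is a tuple, so datetime parsing fails,
    # matching the "<class 'tuple'> is not convertible" traceback.
    pd.to_datetime(df.index)
except Exception as exc:
    print(type(exc).__name__, exc)
```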
Related
calculate bad month from the given csv
I tried finding the five worst months from the data, but I'm not sure about the process as I'm very confused. The answer should be something like (June 2001, July 2002), but when I tried to solve it my answer wasn't as expected: only the data of January was sorted. The way I tried solving my question is given below, and the csv data file is also provided.

My solution:

PATH = "tourist_arrival.csv"
df = pd.read_csv(PATH)
print(df.sort_values(by=['Jan.','Feb.','Mar.','Apr.','May.','Jun.','Jul.','Aug.','Sep.','Oct.','Nov.','Dec.'],ascending=False))

The data:

Year,Jan.,Feb.,Mar.,Apr.,May.,Jun.,Jul.,Aug.,Sep.,Oct.,Nov.,Dec.,Total
1992,17451,27489,31505,30682,29089,22469,20942,27338,24839,42647,32341,27561,334353
1993,19238,23931,30818,20121,20585,19602,13588,21583,23939,42242,30378,27542,293567
1994,21735,24872,31586,27292,26232,22907,19739,27610,27959,39393,28008,29198,326531
1995,22207,28240,34219,33994,27843,25650,23980,27686,30569,46845,35782,26380,363395
1996,27886,29676,39336,36331,29728,26749,22684,29080,32181,47314,37650,34998,393613
1997,25585,32861,43177,35229,33456,26367,26091,35549,31981,56272,40173,35116,421857
1998,28822,37956,41338,41087,35814,29181,27895,36174,39664,62487,47403,35863,463684
1999,29752,38134,46218,40774,42712,31049,27193,38449,44117,66543,48865,37698,491504
2000,25307,38959,44944,43635,28363,26933,24480,34670,43523,59195,52993,40644,463646
2001,30454,38680,46709,39083,28345,13030,18329,25322,31170,41245,30282,18588,361237
2002,17176,20668,28815,21253,19887,17218,16621,21093,23752,35272,28723,24990,275468
2003,21215,24349,27737,25851,22704,20351,22661,27568,28724,45459,38398,33115,338132
2004,30988,35631,44290,33514,26802,19793,24860,33162,25496,43373,36381,31007,385297
2005,25477,20338,29875,23414,25541,22608,23996,36910,36066,51498,41505,38170,375398
2006,28769,25728,36873,21983,22870,26210,25183,33150,33362,49670,44119,36009,383926
2007,33192,39934,54722,40942,35854,31316,35437,44683,45552,70644,52273,42156,526705
2008,36913,46675,58735,38475,30410,24349,25427,40011,41622,66421,52399,38840,500277
2009,29278,40617,49567,43337,30037,31749,30432,44174,42771,72522,54423,41049,509956
2010,33645,49264,63058,45509,32542,33263,38991,54672,54848,79130,67537,50408,602867
2011,42622,56339,67565,59751,46202,46115,42661,71398,63033,96996,83460,60073,736215
2012,52501,66459,89151,69796,50317,53630,49995,71964,66383,86379,83173,63344,803092
2013,47846,67264,88697,65152,52834,54599,54011,68478,66755,99426,75485,57069,797616
melt your DataFrame and then sort_values:

output = (df.melt("Year", df.drop(["Year", "Total"], axis=1).columns, var_name="Month")
            .sort_values("value")
            .reset_index(drop=True))

>>> output
     Year Month  value
0    2001  Jun.  13030
1    1993  Jul.  13588
2    2002  Jul.  16621
3    2002  Jan.  17176
4    2002  Jun.  17218
..    ...   ...    ...
259  2012  Oct.  86379
260  2013  Mar.  88697
261  2012  Mar.  89151
262  2011  Oct.  96996
263  2013  Oct.  99426

[264 rows x 3 columns]

For just the 5 worst months, you can do:

>>> output.iloc[:5]
   Year Month  value
0  2001  Jun.  13030
1  1993  Jul.  13588
2  2002  Jul.  16621
3  2002  Jan.  17176
4  2002  Jun.  17218
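A self-contained version of the same melt-then-sort idea, on a small made-up frame (two years, two month columns) so it runs without the CSV:

```python
import pandas as pd

# Tiny made-up frame in the same shape as the question's data.
df = pd.DataFrame({
    "Year": [2001, 2002],
    "Jan.": [30454, 17176],
    "Jun.": [13030, 17218],
    "Total": [43484, 34394],
})

# Wide -> long: one (Year, Month, value) row per cell, then sort ascending
# so the worst months come first.
output = (
    df.melt("Year", df.drop(["Year", "Total"], axis=1).columns, var_name="Month")
      .sort_values("value")
      .reset_index(drop=True)
)
print(output.head(1))  # 2001 Jun. 13030 is the single worst month here
```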
Summarize values in panda data frames
I want to calculate the maximum value for each year and show the sector and that value. For example, from the screenshot, I would like to display:

2010: Telecom 781
2011: Tech 973

I have tried using:

df.groupby(['Year', 'Sector'])['Revenue'].max()

but this does not give me the name of the Sector which has the highest value.
Try using idxmax and loc:

df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]

MVCE:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'Sector':['Telecom','Tech','Financial Service','Construction','Heath Care']*3,
                   'Year':[2010,2011,2012,2013,2014]*3,
                   'Revenue':np.random.randint(101,999,15)})

df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]

Output:

               Sector  Year  Revenue
3        Construction  2013      423
12  Financial Service  2012      838
9          Heath Care  2014      224
1                Tech  2011      466
5             Telecom  2010      843
Also .sort_values + .tail, grouping on just Year. Data from @Scott Boston:

df.sort_values('Revenue').groupby('Year').tail(1)

Output:

               Sector  Year  Revenue
9          Heath Care  2014      224
3        Construction  2013      423
1                Tech  2011      466
12  Financial Service  2012      838
5             Telecom  2010      843
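Since the goal is one winner per year, here is a minimal sketch grouping on Year alone, with made-up rows mirroring the "2010: Telecom 781, 2011: Tech 973" output the asker wants:

```python
import pandas as pd

df = pd.DataFrame({
    "Sector":  ["Telecom", "Tech", "Telecom", "Tech"],
    "Year":    [2010, 2010, 2011, 2011],
    "Revenue": [781, 500, 600, 973],
})

# idxmax gives the row label of the largest Revenue within each Year;
# loc pulls those full rows back out, keeping the Sector name.
best = df.loc[df.groupby("Year")["Revenue"].idxmax()]
print(best)  # one row per year: Telecom 781 (2010), Tech 973 (2011)
```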
Transpose subset of pandas dataframe into multi-indexed data frame
I have the following dataframe: df.head(14). I'd like to transpose just the yr and the ['WA_','BA_','IA_','AA_','NA_','TOM_'] variables by Label. The resulting dataframe should then be a multi-indexed frame with Label and WA_, BA_, etc. as the row index, and the column names will be 2010, 2011, etc. I've tried transpose(), groupby(), pivot_table(), and long_to_wide(), and before I roll my own nested loop going line by line through this df I thought I'd ping the community. Something like this for every Label group. I feel like the answer is in one of those functions but I'm just missing it. Thanks for your help!
From what I can tell by your illustrated screenshots, you want WA_, BA_, etc. as rows and yr as columns, with Label remaining as a row index. If so, consider stack() and unstack():

# sample data
labels = ["Albany County", "Big Horn County"]
n_per_label = 7
n_rows = n_per_label * len(labels)
years = np.arange(2010, 2017)
min_val = 10000
max_val = 40000
data = {"Label": sorted(np.array(labels * n_per_label)),
        "WA_": np.random.randint(min_val, max_val, n_rows),
        "BA_": np.random.randint(min_val, max_val, n_rows),
        "IA_": np.random.randint(min_val, max_val, n_rows),
        "AA_": np.random.randint(min_val, max_val, n_rows),
        "NA_": np.random.randint(min_val, max_val, n_rows),
        "TOM_": np.random.randint(min_val, max_val, n_rows),
        "yr": np.append(years, years)}
df = pd.DataFrame(data)

      AA_    BA_    IA_    NA_   TOM_    WA_            Label    yr
0   27757  23138  10476  20047  34015  12457    Albany County  2010
1   37135  30525  12296  22809  27235  29045    Albany County  2011
2   11017  16448  17955  33310  11956  19070    Albany County  2012
3   24406  21758  15538  32746  38139  39553    Albany County  2013
4   29874  33105  23106  30216  30176  13380    Albany County  2014
5   24409  27454  14510  34497  10326  29278    Albany County  2015
6   31787  11301  39259  12081  31513  13820    Albany County  2016
7   17119  20961  21526  37450  14937  11516  Big Horn County  2010
8   13663  33901  12420  27700  30409  26235  Big Horn County  2011
9   37861  39864  29512  24270  15853  29813  Big Horn County  2012
10  29095  27760  12304  29987  31481  39632  Big Horn County  2013
11  26966  39095  39031  26582  22851  18194  Big Horn County  2014
12  28216  33354  35498  23514  23879  17983  Big Horn County  2015
13  25440  28405  23847  26475  20780  29692  Big Horn County  2016

Now set Label and yr as indices:

df.set_index(["Label", "yr"], inplace=True)

From here, unstack() will pivot the inner-most index to columns. Then, stack() can swing our value columns down into rows.

df.unstack().stack(level=0)

yr                      2010   2011   2012   2013   2014   2015   2016
Label
Albany County    AA_   27757  37135  11017  24406  29874  24409  31787
                 BA_   23138  30525  16448  21758  33105  27454  11301
                 IA_   10476  12296  17955  15538  23106  14510  39259
                 NA_   20047  22809  33310  32746  30216  34497  12081
                 TOM_  34015  27235  11956  38139  30176  10326  31513
                 WA_   12457  29045  19070  39553  13380  29278  13820
Big Horn County  AA_   17119  13663  37861  29095  26966  28216  25440
                 BA_   20961  33901  39864  27760  39095  33354  28405
                 IA_   21526  12420  29512  12304  39031  35498  23847
                 NA_   37450  27700  24270  29987  26582  23514  26475
                 TOM_  14937  30409  15853  31481  22851  23879  20780
                 WA_   11516  26235  29813  39632  18194  17983  29692
Pandas Split Column String and Plot unique values
I have a dataframe Df that looks like this:

                Country  Year
0        Australia, USA  2015
1    USA, Hong Kong, UK  1982
2                   USA  2012
3                   USA  1994
4           USA, France  2013
5                 Japan  1988
6                 Japan  1997
7                   USA  2013
8                Mexico  2000
9               USA, UK  2005
10                  USA  2012
11              USA, UK  2014
12                  USA  1980
13                  USA  1992
14                  USA  1997
15                  USA  2003
16                  USA  2004
17                  USA  2007
18         USA, Germany  2009
19                Japan  2006
20                Japan  1995

I want to make a bar chart for the Country column. If I try

Df.Country.value_counts().plot(kind='bar')

I get a plot which is incorrect because it doesn't separate the countries. My goal is to obtain a bar chart that plots the count of each country in the column, but to achieve that I first have to somehow split the string in each row (if needed) and then plot the data. I know I can use Df.Country.str.split(', ') to split the strings, but if I do this I can't plot the data. Does anyone have an idea how to solve this problem?
You could use the vectorized Series.str.split method to split the Countrys:

In [163]: df['Country'].str.split(r',\s+', expand=True)
Out[163]:
           0          1     2
0  Australia        USA  None
1        USA  Hong Kong    UK
2        USA       None  None
3        USA       None  None
4        USA     France  None
...

If you stack this DataFrame to move all the values into a single column, then you can apply value_counts and plot as before:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {'Country': ['Australia, USA', 'USA, Hong Kong, UK', 'USA', 'USA', 'USA, France',
                 'Japan', 'Japan', 'USA', 'Mexico', 'USA, UK', 'USA', 'USA, UK',
                 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA, Germany', 'Japan', 'Japan'],
     'Year': [2015, 1982, 2012, 1994, 2013, 1988, 1997, 2013, 2000, 2005, 2012,
              2014, 1980, 1992, 1997, 2003, 2004, 2007, 2009, 2006, 1995]})

counts = df['Country'].str.split(r',\s+', expand=True).stack().value_counts()
counts.plot(kind='bar')
plt.show()
from collections import Counter

c = pd.Series(Counter(df.Country.str.split(', ').sum()))
c.plot(kind='bar', title='Country Count')

(Splitting on ', ' rather than ',' avoids counting entries with a leading space, e.g. ' USA', separately from 'USA'.)
new_df = pd.concat([pd.Series(row['Year'], row['Country'].split(','))
                    for _, row in DF.iterrows()]).reset_index()

(DF is your old DF.) This will give you one data point for each country name. Hope this helps. Cheers!
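In pandas 0.25+, Series.explode offers a similar route without Counter or iterrows. A short sketch on a few made-up rows in the question's shape:

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Australia, USA", "USA", "Japan"],
                   "Year": [2015, 2012, 1988]})

# Split each cell into a list of countries, then explode the lists so
# every country gets its own row before counting.
counts = df["Country"].str.split(", ").explode().value_counts()
print(counts)
# counts.plot(kind="bar") would then draw the per-country bar chart
```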
how to format output file in python?
I am trying to format my output file exactly like my input file. I was wondering if anyone could give me some pointers. My code is:

input_file=open('abcd.txt','r')
f1=input('file name: ')
output_file=open(f1,'w')
for line in input_file:
    output_file.write(line)
input_file.close()
output_file.close()

My input file looks like the following, where country is 50 chars long, the second category is 6 chars, the third is 3 chars, the fourth is 25 chars, and the year is 4 chars long:

Afghanistan                                        WB_LI  68  Eastern Mediterranean     2012
Albania                                            WB_LMI 90  Europe                    1980
Albania                                            WB_LMI 90  Europe                    1981

The following is how my output file looks:

Afghanistan WB_LI 68 Eastern Mediterranean 2012
Albania WB_LMI 90 Europe 1980
Albania WB_LMI 90 Europe 1981
Use string formatting, mainly {:<x} where x denotes the minimum field width (padded with spaces) and < left-aligns the contents (left alignment is also the default for strings):

output_file.write('{:<50} {:<6} {:<3} {:<25} {:<4}\n'.format(country, category, third, fourth, year))
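A quick runnable check of the format string, using the widths stated in the question and values taken from the sample rows:

```python
# Each field is left-aligned ('<') and padded with spaces to its minimum
# width: 50 + 6 + 3 + 25 + 4 chars plus 4 separating spaces = 92 chars.
row = '{:<50} {:<6} {:<3} {:<25} {:<4}'.format(
    'Afghanistan', 'WB_LI', '68', 'Eastern Mediterranean', '2012')
print(repr(row))
print(len(row))  # 92
```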