Python: Looping over list of countries for holidays

I am fairly new to Python. I am leveraging Python's holiday package which has public holidays by country. In order to get a country's holiday, you can run something like:
sorted(holidays.US(years=np.arange(2014, 2030, 1)).items())
This gives the date and the holiday name. Now I want the same data for a number of countries. How do I loop over the list of countries instead of replacing the country name in the code above every single time?
The countries under consideration here are:
[FRA, Norway, Finland, US, Germany, UnitedKingdom, Sweden]
I tried a for loop like this:
countrylistLoop = ['FRA', 'Norway', 'Finland', 'US', 'Germany', 'UnitedKingdom', 'Sweden']
for i in countrylistLoop:
    print(sorted(holidays.i(years=np.arange(2014,2030,1)).items()),columns=['Date','Holiday'])
This throws an AttributeError:
AttributeError: module 'holidays' has no attribute 'i'.
This makes sense but I am not sure how to proceed!
Ideally, I would like to loop over and store the results in a dataframe. Any help is highly appreciated! Thank you!

You could get the items the following way:
import holidays
countrylistLoop = ['FRA', 'Norway', 'Finland', 'US', 'Germany', 'UnitedKingdom', 'Sweden']
for country in countrylistLoop:
    hd = sorted(holidays.CountryHoliday(country, years=np.arange(2014,2030,1)).items())
But sorted doesn't take a columns argument.
Or you can sort the items by a chosen element of each (date, name) pair, for example by the holiday name:
hd = sorted(holidays.CountryHoliday(country, years=np.arange(2014,2030,1)).items(),
            key=lambda holiday: holiday[1])
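If the goal is the Date/Holiday columns hinted at in the question, a minimal sketch (not from the original answer) that collects each country's sorted holidays into its own DataFrame could look like this:
import holidays
import numpy as np
import pandas as pd

countries = ['FRA', 'Norway', 'Finland', 'US', 'Germany', 'UnitedKingdom', 'Sweden']
frames = {}
for country in countries:
    # CountryHoliday accepts the same country names/codes as the module-level classes
    items = sorted(holidays.CountryHoliday(country, years=np.arange(2014, 2030, 1)).items())
    frames[country] = pd.DataFrame(items, columns=['Date', 'Holiday'])
print(frames['US'].head())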

To provide an additional country identifier you can do this:
all_holidays = []
country_list = ['UnitedStates', 'India', 'Germany']
for country in country_list:
    for holiday in holidays.CountryHoliday(country, years=np.arange(2018,2021,1)).items():
        all_holidays.append({'date': holiday[0], 'holiday': holiday[1], 'country': country})
all_holidays = pd.DataFrame(all_holidays)
all_holidays
The result is a DataFrame with one row per holiday and the columns date, holiday and country.

for i in countrylistLoop:
    holiday = sorted(getattr(holidays, i)(years=np.arange(2014,2030,1)).items())
    print(holiday)
To get an attribute dynamically, use getattr.
Also note that sorted returns a new sorted list rather than sorting in place (it is list.sort that returns None), so its result has to be assigned to something instead of being discarded.

Related

Write a function that filters a dataset for rows that contain all of the words in a list of words

I want to get a sub-dataframe that contains all elements in a list.
Let's take the DataFrame as an example.
my_dict = {
    'Job': ['Painting', 'Capentry', 'Teacher', 'Farming'],
    'Job_Detail': ['all sort of painting',
                   'kitchen utensils, all types of roofing etc.',
                   'skill and practical oriented teaching',
                   'all agricultural practices']
}
df = pd.DataFrame(my_dict)
Output looks thus:
Job Job_Detail
0 Painting all sort of painting
1 Capentry kitchen utensils, all types of roofing etc.
2 Teacher skill and practical oriented teaching
3 Farming all agricultural practices
my_lst = ['of', 'all']
I want to filter df with my_lst to get a sub-DataFrame that looks like this:
Job Job_Detail
0 Painting all sort of painting
1 Capentry kitchen utensils, all types of roofing etc.
I've tried df[df.Job_Detail.isin(['of', 'all'])] but it returns an empty DataFrame.
I'm no pandas expert, but the best function to use here seems to be str.contains
From the docs:
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
Edit: This masks using or, not and (it keeps rows that contain any of the words, not all of them).
import pandas as pd
my_dict = {
    'Job': ['Painting', 'Capentry', 'Teacher', 'Farming'],
    'Job_Detail': ['all sort of painting',
                   'kitchen utensils, all types of roofing etc.',
                   'skill and practical oriented teaching',
                   'all agricultural practices']
}
my_lst = ['of', 'all']
df = pd.DataFrame(my_dict)
print(df)
mask = df.Job_Detail.str.contains('|'.join(my_lst), regex=True)
print(df[mask])
Here's a solution that masks using and:
import pandas as pd
my_dict = {
    'Job': ['Painting', 'Capentry', 'Teacher', 'Farming'],
    'Job_Detail': ['all sort of painting',
                   'kitchen utensils, all types of roofing etc.',
                   'skill and practical oriented teaching',
                   'all agricultural practices']
}
my_lst = ['of', 'all']
df = pd.DataFrame(my_dict)
print(df)
print("------")
masks = [df.Job_Detail.str.contains(word) for word in my_lst]
mask = pd.concat(masks, axis=1).all(axis=1)
print(df[mask])
@Lone Your code answered a different question, but it helped me arrive at the answer. Thank you, appreciated.
Here's the closest to what I needed:
df[(df.Job_Detail.str.contains('of')) & (df.Job_Detail.str.contains('all'))]
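A short sketch (not from the original thread) that generalizes this into a reusable function, assuming plain substring matching is what is wanted:
import numpy as np
import pandas as pd

def filter_contains_all(frame, column, words):
    # Build one boolean mask per word and keep only the rows where every mask is True.
    masks = [frame[column].str.contains(word, regex=False) for word in words]
    return frame[np.logical_and.reduce(masks)]

df = pd.DataFrame({'Job': ['Painting', 'Capentry', 'Teacher', 'Farming'],
                   'Job_Detail': ['all sort of painting',
                                  'kitchen utensils, all types of roofing etc.',
                                  'skill and practical oriented teaching',
                                  'all agricultural practices']})
print(filter_contains_all(df, 'Job_Detail', ['of', 'all']))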

Get list items that don't match most frequent value

I have a list of dictionaries looking like this:
[{'customer': 'Charles', 'city': 'Paris'}, {'customer': 'John', 'city': 'New York'}, {'customer': 'Jean', 'city': 'Paris'}]
I tried something using collections.Counter, which gives me the name of the most common city in this list of dictionaries:
city_counts = Counter(c['city'] for c in customers)
return city_counts.most_common(1)[0][0]
From this, I would like to return a list of all customers who are not in this city.
So, if I take the list I gave above, ['John'] should be the output.
Is there a good way to do it?
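A minimal sketch of one way to finish this (not from the original thread), assuming the list of dictionaries is named customers:
from collections import Counter

customers = [{'customer': 'Charles', 'city': 'Paris'},
             {'customer': 'John', 'city': 'New York'},
             {'customer': 'Jean', 'city': 'Paris'}]

# Most common city, then everyone whose city is different
most_common_city = Counter(c['city'] for c in customers).most_common(1)[0][0]
not_in_city = [c['customer'] for c in customers if c['city'] != most_common_city]
print(not_in_city)  # ['John']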

Add same value to multiple sets of rows. The value changes based on condition

I have a dataframe that is dynamically created.
I create my first set of rows as:
df['tourist_spots'] = pd.Series(<A list of tourist spots in a city>)
To this df I add:
df['city'] = <City Name>
So far so good. A bunch of rows are created with the same city name for multiple tourist spots.
I want to add a new city. So I do:
df['tourist_spots'].append(pd.Series(<new data>))
Now, when I append a new city with:
df['city'].append('new city')
the previously added city data is gone. It is as if the rows are replaced rather than appended every time.
Here's an example of what I want:
Step 1:
df['tourist_spot'] = pd.Series('Golden Gate Bridge' + a bunch of other spots)
For all the rows created by the above data I want:
df['city'] = 'San Francisco'
Step 2:
df['tourist_spot'].append(pd.Series('Times Square' + a bunch of other spots))
For all the rows created by the above data, I want:
df['city'] = 'New York'
How can I achieve this?
Use a dictionary to add rows to your DataFrame; it is a faster method.
Here is an example.
STEP 1
Create a list of dictionaries:
dict_df = [{'tourist_spots': 'Jones LLC', 'City': 'Boston'},
           {'tourist_spots': 'Alpha Co', 'City': 'Boston'},
           {'tourist_spots': 'Blue Inc', 'City': 'Singapore'}]
STEP 2
Convert the list of dictionaries to a DataFrame:
df = pd.DataFrame(dict_df)
STEP 3
Add new entries to the DataFrame in dictionary format:
df = df.append({'tourist_spots': 'New_Blue', 'City': 'Singapore'}, ignore_index=True)
References:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html
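Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current pandas STEP 3 would instead use pd.concat; a small sketch of the equivalent:
import pandas as pd

df = pd.DataFrame([{'tourist_spots': 'Jones LLC', 'City': 'Boston'},
                   {'tourist_spots': 'Alpha Co', 'City': 'Boston'},
                   {'tourist_spots': 'Blue Inc', 'City': 'Singapore'}])

# pd.concat with a one-row DataFrame replaces the removed DataFrame.append
new_row = pd.DataFrame([{'tourist_spots': 'New_Blue', 'City': 'Singapore'}])
df = pd.concat([df, new_row], ignore_index=True)
print(df)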

How do I use mapping of dictionary for value correction?

I have a pandas series whose unique values are something like:
['toyota', 'toyouta', 'vokswagen', 'volkswagen', 'vw', 'volvo']
Now I want to fix some of these values like:
toyouta -> toyota
(Note that not all values have mistakes such as volvo, toyota etc)
I've tried making a dictionary where key is the correct word and value is the word to be corrected and then map that onto my series.
This is how my code looks:
corrections = {'maxda': 'mazda', 'porcshce': 'porsche', 'toyota': 'toyouta', 'vokswagen': 'vw', 'volkswagen': 'vw'}
df.brands = df.brands.map(corrections)
print(df.brands.unique())
>>> [nan, 'mazda', 'porsche', 'toyouta', 'vw']
As you can see the problem is that this way, all values not present in the dictionary are automatically converted to nan. One solution is to map all the correct values to themselves, but I was hoping there could be a better way to go about this.
Use:
df.brands = df.brands.map(corrections).fillna(df.brands)
Or:
df.brands = df.brands.map(lambda x: corrections.get(x, x))
Or:
df.brands = df.brands.replace(corrections)
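A quick sketch of the difference, using a small made-up corrections dictionary whose keys are the misspelled values: map on its own turns anything without a matching key into NaN, while map plus fillna, the dict.get lambda, and replace all leave unmapped values untouched.
import pandas as pd

brands = pd.Series(['toyouta', 'volvo', 'vokswagen'])
corrections = {'toyouta': 'toyota', 'vokswagen': 'volkswagen'}  # hypothetical example dict

print(brands.map(corrections))                 # 'volvo' becomes NaN
print(brands.map(corrections).fillna(brands))  # 'volvo' is kept
print(brands.replace(corrections))             # 'volvo' is kept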

AttributeError: 'list' object has no attribute 'keys' when attempting to create DataFrame from list of dicts

Thanks a ton for any help,
I have a list of dictionaries that I need to put in a data frame. I know the normal method in pandas is
final_df=pd.DataFrame.from_records(Mixed_and_Poured[0], index='year')
where Mixed_and_Poured is a list containing another list that actually holds the dictionaries
print Mixed_and_Poured
[[{'Country': 'Brazil', u'Internet users': '2.9', 'Year': '2000'}, {'Country': 'Brazil', u'Internet users': '21', 'Year': '2005'}, {'Country': 'Brazil', u'Internet users': '40.7', 'Year': '2010'}, {'Country': 'Brazil', u'Internet users': '45', 'Year': '2011'},
I could swear
final_df=pd.DataFrame.from_records(Mixed_and_Poured[0], index='year')
was just working!! but when I ran it today it throws
AttributeError: 'list' object has no attribute 'keys'
Why is it looking for keys in this list now?
So it turns out I wasn't actually operating on a list of just dictionaries; there was a little bastard list hiding at the end there.
Sorry, y'all!
I can't reproduce your error with the data given; I get a KeyError (the column is 'Year', not 'year').
But why even use from_records?
pd.DataFrame(Mixed_and_Poured[0]).set_index('Year')
Out:
Country Internet users
Year
2000 Brazil 2.9
2005 Brazil 21
2010 Brazil 40.7
2011 Brazil 45
I had this issue as well when a couple of items in a list were unavailable (None). The list was quite large, so I didn't notice at first. The easiest quick fix I used was to make a new list with just the items that were not None:
list1 = [2,3,4, None, 2]
list1 = [item for item in list1 if item is not None]
list1
[2, 3, 4, 2]
