Plot a histogram of text values - python

Let's say I have a list of text values (i.e., names), and I want to plot a histogram of those values, with the xticks labeled with those names.
import matplotlib.pyplot as plt
listofnames = ['Al', 'Ca', 'Re', 'Ma', 'Al', 'Ma', 'Ma', 'Re', 'Ca']
a,b,c = plt.hist(listofnames)
First of all, this code gives an error
TypeError: cannot perform reduce with flexible type
which I don't have on my complete program (with a list of >2k names, with no more than 12 different names). I haven't been able to see why this simple example list gives an error while the complete one doesn't.
But the actual point is: I can do the histogram, but the bins are not labeled with the names. How could I do that?
Thanks

Use the xticks function:
plt.xticks(range(5), ('Tom', 'Dick', 'Harry', 'Sally', 'Sue'))
Complete example (by the way, your code doesn't work for me, but instead of your error, I get TypeError: len() of unsized object, so I'm histogramming manually here):
import collections
import matplotlib.pyplot as plt
listofnames = ['Al', 'Ca', 'Re', 'Ma', 'Al', 'Ma', 'Ma', 'Re', 'Ca']
x = collections.Counter(listofnames)
l = range(len(x))
# list() turns the dict views into plain sequences before plotting
plt.bar(l, list(x.values()), align='center')
plt.xticks(l, list(x.keys()))
plt.show()
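If the bars should appear in a fixed order rather than in first-seen order, the labels and heights can be built explicitly before plotting. This is only a minimal sketch of that idea; the names labels, heights and positions are mine, not from the question:

```python
from collections import Counter

listofnames = ['Al', 'Ca', 'Re', 'Ma', 'Al', 'Ma', 'Ma', 'Re', 'Ca']
counts = Counter(listofnames)

labels = sorted(counts)                      # alphabetical bar order
heights = [counts[name] for name in labels]  # count for each label, same order
positions = list(range(len(labels)))         # x positions for the bars

print(labels)
print(heights)
```

positions and heights can then go to plt.bar, and positions and labels to plt.xticks, exactly as in the example above.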

Related

Python plotting top five values from dictionary- Got top five values but it's not running correctly

I've pieced together code that gets me the top five values from a dictionary and I want to make a bar graph. However, when I try to run the following code, it spits out an error. I thought I was good to go, but am not sure how to proceed on this. Any advice? Do I need to convert list into a dictionary?
import matplotlib.pyplot as plt
from collections import Counter
stateInfo={
'Alabama': 4887681,
'Alaska': 735139,
'Arizona': 7158024,
'Arkansas':3009733,
'California': 39461588,
'Colorado': 5691287,
'Connecticut': 3571520,
'Delaware': 965479,
'Florida': 21244317,
'Georgia': 10511131,
'Hawaii': 1420593
}
c = Counter(stateInfo)
most_common = c.most_common(5)
plt.bar(range(len(most_common)), list(most_common.values()), align='center')
plt.xticks(range(len(most_common)), list(most_common.keys()))
plt.show()
AttributeError: 'list' object has no attribute 'values'
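The error arises because Counter.most_common returns a list of (key, value) tuples, not a dictionary, so it has no .values() or .keys(). One possible fix, shown here as a sketch on a shortened copy of the data (state_info, names and values are illustrative names), is to unzip the pairs:

```python
from collections import Counter

state_info = {
    'Alabama': 4887681,
    'Arizona': 7158024,
    'California': 39461588,
    'Colorado': 5691287,
    'Delaware': 965479,
    'Florida': 21244317,
    'Georgia': 10511131,
}

most_common = Counter(state_info).most_common(5)  # list of (name, value) tuples
names, values = zip(*most_common)                 # split the pairs into two tuples

print(names)
print(values)
```

names and values can then replace list(most_common.keys()) and list(most_common.values()) in the plt.xticks and plt.bar calls. Wrapping the list in dict(most_common) would also work, since a list of pairs converts directly to a dictionary.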

How do I add values to a list stored as a dictionary value?

I have an empty dictionary, and need to pull industry info based on ticker symbols. I would then like to add all tickers under the same industry to a list, with the industry as the key.
For example, the end would look something like the below:
{'technology': ['AAPL', 'ADBE'], 'Consumer Cyclical': ['TSLA', 'UA']}
Here is what I've been working on with no success:
import yfinance as yf
tickers = ['AAPL', 'ADBE', 'AMD', 'AMAT', 'AMZN', 'ANF',
'APA', 'BA', 'BABA', 'BBY', 'BIDU', 'BMY', 'BRX', 'BZUN',
'C', 'CAT', 'CLF', 'CMCSA', 'CMG', 'COST', 'CRM', 'CVX',
'DE', 'EBAY', 'FB', 'FCX', 'FDX', 'FSLR',
'GILD', 'GM', 'GME', 'GOOG','GPRO', 'GS', 'HAL', 'HD',
'HIG', 'HON', 'IBM', 'JCPB', 'JD', 'JPM', 'LULU', 'LYG',
'MA', 'MCD', 'MDT', 'MS', 'MSFT','MU', 'NEM', 'NFLX',
'NKE','PBR', 'QCOM', 'SLB', 'SNAP', 'SPG', 'TSLA', 'TWTR',
'TXN', 'UA', 'UAL', 'V', 'VZ', 'X', 'XLNX', 'ZM']
sector_dict = dict()
for ticker in tickers:
    try:
        sector = yf.Ticker(ticker).info['sector']
        sector_dict[sector].update(ticker)
    except:
        sector_dict.update({'no sector':[ticker]})
The code above just gives me an empty dictionary. Does anybody see where the issue is?
Assuming the information you need is returned from the API call - the code below may work for you.
import yfinance as yf
from collections import defaultdict
tickers = ['AAPL','ADBE']
sector_dict = defaultdict(list)
for ticker in tickers:
    try:
        sector_dict[yf.Ticker(ticker).info['sector']].append(ticker)
    except Exception as e:
        print(f'Failed to get ticker info for {ticker}')
print(sector_dict)
output
defaultdict(<class 'list'>, {'Technology': ['AAPL', 'ADBE']})
You should always avoid catch-all exceptions.
Your original example was masking the fact that update isn't a list method.
When you subscript a Python dictionary, as in sector_dict[sector], you get the value associated with the sector key, which in this case is a list.
Also, dict.update isn't used like that, so I think the bare except was masking a second error. Its purpose is to update a dictionary from another dictionary or an iterable of key/value pairs, not to append to an existing entry.
Finally, the try clause should be as small as possible, so that you can be sure where the error is coming from, or at least guarantee there won't be conflicting exceptions, as in this case.
I think that's why your dictionary ends up holding only the last ticker: the lookup raises a KeyError, so the except branch runs every time, and update replaces the 'no sector' list instead of appending to it.
Here's how I'd do it:
sector_dict = {'no sector': []}
for ticker in tickers:
    try:
        sector = yf.Ticker(ticker).info['sector']
    except KeyError:
        sector_dict['no sector'].append(ticker)
        continue  # no sector for this ticker, so skip the grouping step
    try:
        sector_dict[sector].append(ticker)
    except KeyError:
        sector_dict[sector] = [ticker]
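The same grouping can also be written without any try blocks by using dict.get and dict.setdefault. The sketch below is an alternative, not the approach above; group_by_sector, get_info and fake_info are hypothetical names, and fake_info stands in for yf.Ticker(ticker).info so the example runs offline:

```python
def group_by_sector(tickers, get_info):
    """Group tickers by sector; get_info(ticker) returns a dict that may hold 'sector'."""
    sector_dict = {}
    for ticker in tickers:
        sector = get_info(ticker).get('sector', 'no sector')
        # setdefault creates the list the first time a sector is seen, then returns it
        sector_dict.setdefault(sector, []).append(ticker)
    return sector_dict

# Hypothetical stand-in for the yfinance lookup (made-up data).
fake_info = {
    'AAPL': {'sector': 'Technology'},
    'ADBE': {'sector': 'Technology'},
    'TSLA': {'sector': 'Consumer Cyclical'},
    'XYZ': {},  # made-up ticker with no sector entry
}

result = group_by_sector(['AAPL', 'ADBE', 'TSLA', 'XYZ'], lambda t: fake_info[t])
print(result)
```

With yfinance, something like lambda t: yf.Ticker(t).info could be passed as get_info, assuming info behaves like a dictionary as the question's code implies.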

Read dynamically CSV files

How do I read CSV files dynamically in Python when the file name suffix changes?
Example:
import pandas as pd
uf = ['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'DF', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP1', 'SP2', 'SE', 'TO']
for n in uf:
    {n} = pd.read_csv('Basico_{n}.csv', encoding='latin1', sep=';', header=0)
The {} is not recognized inside the for loop.
I want to read the files whose name suffixes are the items of the list, and create a separate DataFrame for each one using the same rules.
You have two main issues:
{n} = is invalid syntax. You can't create a variable from a name held in a string without messing with locals() or globals(). Doing so is almost always a bad idea anyway, because names created that way are much harder to access programmatically: if the list of names is dynamic, you have to go through globals() to get at them, and this leads to bugs.
'Basico_{n}.csv' is missing the f prefix that makes it an f-string. n will not be substituted into the string unless you mark it as an f-string by prepending f.
Instead:
import pandas as pd
uf = ['AC', 'AL', 'AP', 'AM', 'BA', 'CE', 'DF', 'ES', 'GO', 'MA', 'MT', 'MS', 'MG', 'PA', 'PB', 'PR', 'PE', 'PI', 'RJ', 'RN', 'RS', 'RO', 'RR', 'SC', 'SP1', 'SP2', 'SE', 'TO']
dfs = {}  # Create a dict to store the DataFrames, keyed by suffix
for n in uf:
    dfs[n] = pd.read_csv(f'Basico_{n}.csv', encoding='latin1', sep=';', header=0)
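To illustrate the dictionary pattern end to end without needing the real files, here is a small sketch that swaps pd.read_csv for the standard-library csv module and writes two made-up files to a temporary folder. read_all and tables are hypothetical names, and the file contents are invented for the demo:

```python
import csv
import tempfile
from pathlib import Path

def read_all(folder, suffixes):
    """Read Basico_<suffix>.csv for each suffix into a dict keyed by suffix."""
    tables = {}
    for n in suffixes:
        path = Path(folder) / f"Basico_{n}.csv"  # the f prefix makes {n} interpolate
        with open(path, encoding="latin1", newline="") as fh:
            tables[n] = list(csv.DictReader(fh, delimiter=";"))
    return tables

# Tiny demo with two made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for n in ("AC", "AL"):
        (Path(tmp) / f"Basico_{n}.csv").write_text(f"id;value\n1;{n}\n", encoding="latin1")
    tables = read_all(tmp, ["AC", "AL"])

print(tables["AC"][0]["value"])
```

Each table is then reached by its suffix, for example tables["AC"], and looping over tables.items() visits all of them. The same access pattern applies to the dfs dictionary of DataFrames.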
The f-string f'Basico_{n}.csv' will only work for Python >= 3.6.
On older versions, try str.format instead:
dfs[n] = pd.read_csv('Basico_{}.csv'.format(n), encoding='latin1', sep=';', header=0)

Same DataFrame.reindex code - different output

Good afternoon everyone,
I want to filter out from a DataFrame the columns that I am not interested in.
To do that - and since the columns could change based on user input (that I will not show here) - I am using the following code within my offshore_filter function:
# Note: 'df' is my DataFrame, with different country codes as rows and years as columns' headers
import datetime as d
import pandas as pd
COUNTRIES = [
    'EU28', 'AL', 'AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
    'ES', 'FI', 'FR', 'GE', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
    'MD', 'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK',
    'TR', 'UA', 'UK', 'XK'
]
YEARS = list(range(2005, int(d.datetime.now().year)))
def offshore_filter(df, countries=COUNTRIES, years=YEARS):
    # This function is specific for filtering out the countries
    # and the years not needed in the analysis
    # Filter out all of the countries not of interest
    df.drop(df[~df['country'].isin(countries)].index, inplace=True)
    # Filter out all of the years not of interest
    columns_to_keep = ['country', 'country_name'] + [i for i in years]
    temp = df.reindex(columns=columns_to_keep)
    df = temp # This step to avoid the copy vs view complication
    return df
When I pass a years list of integers, the code works well and filters the DataFrame by taking only the columns in the years list.
However, if the DataFrame's column headers are strings (e.g. '2018' instead of 2018), changing [i for i in years] into [str(i) for i in years] doesn't work, and I get columns of NaNs (as the reindex documentation states).
Can you help me spot why?

How do I set the order of a grouped bar chart with Chartify?

How can users change the order of the grouped bars in the example below?
ch = chartify.Chart(blank_labels=True, x_axis_type='categorical')
ch.plot.bar(
    data_frame=quantity_by_fruit_and_country,
    categorical_columns=['fruit', 'country'],
    numeric_column='quantity')
ch.show('png')
The bar plot method has a categorical_order_by parameter that can be used to change the order. As specified in the documentation, set it to 'values' or 'labels' to sort by the corresponding dimension.
For a custom sort, you can pass a list of values to the categorical_order_by parameter. Since the bar is grouped by two dimensions, the list should contain tuples as in the example below:
from itertools import product
import chartify
outside_groups = ['Apple', 'Orange', 'Banana', 'Grape']
inner_groups = ['US', 'JP', 'BR', 'CA', 'GB']
sort_order = list(product(outside_groups, inner_groups))
# Plot the data
ch = chartify.Chart(blank_labels=True, x_axis_type='categorical')
ch.plot.bar(
    data_frame=quantity_by_fruit_and_country,
    categorical_columns=['fruit', 'country'],
    numeric_column='quantity',
    categorical_order_by=sort_order)
ch.show('png')
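To see what the custom sort list contains, here is what itertools.product yields for a shortened version of the two group lists. It is only an illustration of the tuple ordering, with fewer groups than the example above:

```python
from itertools import product

outside_groups = ['Apple', 'Orange']
inner_groups = ['US', 'JP', 'BR']

# Every (outer, inner) pair, with the outer group varying slowest.
sort_order = list(product(outside_groups, inner_groups))
print(sort_order)
```

The pairs come out grouped by the first list, so all the 'Apple' tuples precede all the 'Orange' tuples, and within each fruit the countries follow the order of inner_groups.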
