KeyError with Pandas CSV - python

Getting KeyError: Revenue
My CSV file
Product,Revenue
Onetap Master,538.07
Aimware Masterpack,306.06
Personal Config,159.94
Aimware Lua,29.95
Config Swap,22.76
The code
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv(open('sales.csv'),index_col=1, sep=',')
print(df.columns.tolist())
pd.value_counts(df['Revenue']).plot.bar()
plt.show()
When I use Product instead of Revenue it works just fine

Simply drop index_col=1 from your pd.read_csv() call, and it works. You can also skip the open() and sep=',' parts of pd.read_csv():
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('sales.csv')
print(df.columns.tolist()) # returns ['Product', 'Revenue']
pd.value_counts(df['Revenue']).plot.bar()
plt.show()

When you use index_col=1 you make the Revenue column the index and the dataframe looks like:
                    Product
Revenue
538.07        Onetap Master
306.06   Aimware Masterpack
159.94      Personal Config
29.95           Aimware Lua
22.76           Config Swap
So it is a single-column dataframe, which becomes evident when you examine df.columns: it is just ['Product'].
TL;DR: if you want to use Revenue as a column, do not put it into the index.
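To make the effect of index_col visible without the file on disk, here is a minimal sketch that reads the same CSV text from an in-memory buffer. The names csv_text, indexed, plain and restored are only illustrative; reset_index() is shown as one possible way to recover the column if the file was already read with index_col=1.

```python
import io
import pandas as pd

# The CSV from the question, held in memory instead of sales.csv
csv_text = (
    "Product,Revenue\n"
    "Onetap Master,538.07\n"
    "Aimware Masterpack,306.06\n"
    "Personal Config,159.94\n"
    "Aimware Lua,29.95\n"
    "Config Swap,22.76\n"
)

# With index_col=1 the second column (Revenue) becomes the index,
# so only Product is left as a regular column.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=1)
indexed_columns = indexed.columns.tolist()
index_name = indexed.index.name

# Without index_col both columns stay available.
plain = pd.read_csv(io.StringIO(csv_text))
plain_columns = plain.columns.tolist()

# reset_index() moves Revenue from the index back into the columns.
restored = indexed.reset_index()

print(indexed_columns, index_name)
print(plain_columns)
print(restored.columns.tolist())
```

With the index reset (or index_col dropped), df['Revenue'] can be selected again without a KeyError.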

Related

how to drop a categorical value from a data frame column in python?

I am working with a data frame called price_df, and I would like to drop the rows that contain '4wd' in the column drive-wheels. I have tried price_df2 = price_df.drop(index='4wd', axis=0) and a few other variations after reading the pandas docs, but I keep getting errors. Could anyone direct me to the correct way to drop the rows that contain the value 4wd in that column? Below is the code I ran before trying to drop the values:
# Cleaned up Dataset location
fileName = "https://library.startlearninglabs.uw.edu/DATASCI410/Datasets/Automobile%20price%20data%20_Raw_.csv"
# Import libraries
from scipy.stats import norm
import numpy as np
import pandas as pd
import math
import numpy.random as nr
price_df = pd.read_csv(fileName)
round(price_df.head(),2) #getting an overview of that data
price_df.loc[:,'drive-wheels'].value_counts()
price_df2 = price_df.drop(index='4wd', axis=0)
You can use pd.DataFrame.query and back ticks for this column name with a hyphen:
price_df.query('`drive-wheels` != "4wd"')
Try this
price_df = pd.read_csv(fileName)
mask = price_df["drive-wheels"] =="4wd"
price_df = price_df[~mask]
Get a subset of your data with this one-liner:
price_df2 = price_df[price_df['drive-wheels'] != '4wd']
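For context, drop(index='4wd') looks for a row whose index label is '4wd', not for a value inside a column, which is why it raises an error here. The sketch below compares three ways to filter by value on a tiny made-up frame (the make column and all the values are hypothetical stand-ins for the real dataset).

```python
import pandas as pd

# Hypothetical miniature of the automobile data
price_df = pd.DataFrame({
    "make": ["audi", "bmw", "subaru", "toyota"],
    "drive-wheels": ["fwd", "rwd", "4wd", "4wd"],
})

# Boolean mask: True for the rows that should go away
mask = price_df["drive-wheels"] == "4wd"
without_4wd = price_df[~mask]

# query() needs backticks because the column name contains a hyphen
without_4wd_query = price_df.query('`drive-wheels` != "4wd"')

# drop() works too, but it has to be given the index labels of the matching rows
without_4wd_drop = price_df.drop(index=price_df[mask].index)

print(without_4wd)
```

All three results contain the same rows; which one to use is mostly a matter of taste.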

How to delete \xa0 symbol from all dataframe in pandas?

I have a data frame with the 2011 census (Chile) information. Some column names and some values contain the \xa0 symbol, which makes it hard to select parts of the data frame. My code is the following:
import numpy as numpy
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
#READ AND SAVE EACH DATA SET IN A DATA FRAME
data_2011 = pd.read_csv("2011.csv")
#SHOW FIRST 5 ROWS
print(data_2011.head())
Doing this got the following output:
So far, everything is right, but when I want to see the column names using:
print("ATRIBUTOS DATOS CENSO 2011",list(data_2011))
I got the following output:
How can I fix this? Thanks in advance
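Try the following code:
import re
# Replace the non-breaking space (\xa0) with a normal space in the column names and the string values
data_2011.columns = [re.sub('\xa0', ' ', str(name)) for name in data_2011.columns]
data_2011 = data_2011.replace('\xa0', ' ', regex=True)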
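As a self-contained illustration, here is a minimal sketch of cleaning \xa0 (the non-breaking space) out of both the column names and the string values. The frame below is invented for the example; the real census columns are not shown in the question, so the names and values are assumptions. Replacing with a normal space rather than an empty string is also a choice made here, not something the question specifies.

```python
import pandas as pd

# Hypothetical frame with \xa0 in a column name and in one value
data = pd.DataFrame({
    "Region\xa0Name": ["Los\xa0Lagos", "Santiago"],
    "Population": [100, 200],
})

# Column names: plain (non-regex) string replacement on the Index
data.columns = data.columns.str.replace("\xa0", " ", regex=False)

# Values: regex=True makes replace() work on substrings of string cells;
# numeric columns are left untouched
cleaned = data.replace("\xa0", " ", regex=True)

print(list(cleaned.columns))
print(cleaned)
```

After this, cleaned["Region Name"] can be selected with an ordinary space in the name.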

How can I use a CSV file for Python pdblp instead of a ticker reference for getting API from con.ref

I am very new to Python and I want to replace an exact ticker with a reference to a column of a data frame I created from a CSV file. Can this be done? I'm using:
import pandas as pd
import numpy as np
import pdblp as pdblp
import blpapi as blp
con = pdblp.BCon(debug=False, port=8194, timeout=5000)
con.start()
con.ref("CLF0CLH0 Comdty","PX_LAST")
tickers = pd.read_csv("Tick.csv")
So "tickers" has a column 'ticker1', which is a list of tickers. I want to replace
con.ref("CLF0CLH0 Comdty","PX_LAST") with something like
con.ref([tickers('ticker1')],"PX_LAST")
Any ideas?
Assuming you want to load all tickers into one df, I think it would look something like this:
frames = []
for ticker in tickers['ticker1']:
    # con.ref returns a DataFrame for the requested ticker and field
    frames.append(con.ref(ticker, "PX_LAST"))
# Combine the per-ticker results into a single DataFrame
df = pd.concat(frames, ignore_index=True)
I ended up using .tolist(), and it worked well.
tickers = pd.read_csv("Tick.csv")
tickers1=tickers['ticker'].tolist()
con.ref(tickers1, ["PX_LAST"])
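The part that can be tried without a Bloomberg connection is turning the CSV column into a plain Python list. The sketch below reads a stand-in for Tick.csv from memory; the second ticker is made up for illustration, and the column name ticker1 follows the question. The con.ref call is left as a comment because it needs a running pdblp connection.

```python
import io
import pandas as pd

# Stand-in for Tick.csv; "CLG0CLJ0 Comdty" is a made-up second entry
tick_csv = io.StringIO("ticker1\nCLF0CLH0 Comdty\nCLG0CLJ0 Comdty\n")
tickers = pd.read_csv(tick_csv)

# A DataFrame column is a Series; tolist() gives an ordinary list of strings
ticker_list = tickers["ticker1"].tolist()
print(ticker_list)

# With an open connection, the list would then be passed instead of a single ticker:
# con.ref(ticker_list, "PX_LAST")
```

Note that tickers('ticker1') in the question fails because a DataFrame is not callable; square brackets select the column.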

Join 2 CSV with Pandas

I have 2 CSV files (emails1.csv and emails2.csv).
What I need is to join these 2 CSV files into one, because they are too big to work with in Excel.
I need to export to CSV and TXT.
What I did was create a Python file:
import csv
import pandas as pd
import numpy as np
df1 = pd.read_csv("emails1.csv")
df2 = pd.read_csv("emails2.csv")
df3 = pd.merge(df1, df2, on=["email"])
df3.to_csv("final.csv",index=False)
The CSV files only have the email column.
Thanks for the help.
Your merge uses the default inner join, which keeps only the emails that appear in both files. You need to tell merge how to join the two dataframes.
I just made a small adjustment to your given code and it works perfectly.
import csv
import pandas as pd
import numpy as np
df1 = pd.read_csv("emails1.csv")
df2 = pd.read_csv("emails2.csv")
df3 = df1.merge(df2, on=["email"], how='outer')
df3.to_csv("final.csv",index=False)
Please note the how parameter, and the way merge is called.
This is emails1.csv :
email
one@gmail.com
two@gmail.com
This is emails2.csv :
email
three@gmail.com
four@gmail.com
And this is the final.csv after executing my code:
email
one@gmail.com
two@gmail.com
three@gmail.com
four@gmail.com
I hope this is what you wanted.
:-) cheers!
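To see why the how parameter matters, here is a minimal sketch that compares the default (inner) merge with an outer merge on two tiny in-memory files. The addresses are placeholders, and pd.concat with drop_duplicates() is shown as an alternative that is not part of the answer above; it may be a simpler fit when both files only hold a single email column.

```python
import io
import pandas as pd

# Placeholder contents for emails1.csv and emails2.csv
df1 = pd.read_csv(io.StringIO("email\na@example.com\nb@example.com\n"))
df2 = pd.read_csv(io.StringIO("email\nc@example.com\nd@example.com\n"))

# Default merge is an inner join: only emails present in BOTH files survive
inner = pd.merge(df1, df2, on=["email"])

# Outer join: the union of the emails from both files
outer = df1.merge(df2, on=["email"], how="outer")

# Alternative: stack the rows and drop repeated addresses
stacked = pd.concat([df1, df2], ignore_index=True).drop_duplicates()

print(len(inner), len(outer), len(stacked))
```

Either outer or stacked can then be written with to_csv("final.csv", index=False); passing a .txt file name to to_csv writes the same text under that name.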

Plotting a multiple column in Pandas (converting strings to floats)

I'd like to plot "MJD" vs "MULTIPLE_MJD" for the data given here:
https://www.dropbox.com/s/cicgc1eiwrz93tg/DR14Q_pruned_several3cols.csv?dl=0
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import ast
filename = 'DR14Q_pruned_several3cols.csv'
datafile= path+filename
df = pd.read_csv(datafile)
df.plot.scatter(x='MJD', y='N_SPEC')
plt.show()
ser = df['MJD_DUPLICATE'].apply(ast.literal_eval).str[1]
df['MJD_DUPLICATE'] = pd.to_numeric(ser, errors='coerce')
df['MJD_DUPLICATE_NEW'] = pd.to_numeric(ser, errors='coerce')
df.plot.scatter(x='MJD', y='MJD_DUPLICATE')
plt.show()
This makes a plot, but only for one value of MJD_DUPLICATE:
print(df['MJD_DUPLICATE_NEW'])
0 55214
1 55209
...
Thoughts??
There are two issues here:
Telling Pandas to parse tuples within the CSV. This is covered here: Reading back tuples from a csv file with pandas
Transforming the tuples into multiple rows. This is covered here: Getting a tuple in a Dafaframe into multiple rows
Putting those together, here is one way to solve your problem:
# Following https://stackoverflow.com/questions/23661583/reading-back-tuples-from-a-csv-file-with-pandas
import pandas as pd
import ast
df = pd.read_csv("DR14Q_pruned_several3cols.csv",
                 converters={"MJD_DUPLICATE": ast.literal_eval})
# Following https://stackoverflow.com/questions/39790830/getting-a-tuple-in-a-dafaframe-into-multiple-rows
df2 = pd.DataFrame(df.MJD_DUPLICATE.tolist(), index=df.MJD)
df3 = df2.stack().reset_index(level=1, drop=True)
# Now just plot!
df3.plot(marker='.', linestyle='none')
If you want to remove the 0 and -1 values, a mask will work:
df3[df3 > 0].plot(marker='.', linestyle='none')
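The same steps can be followed on a tiny in-memory sample to see what each intermediate object looks like. The two rows below are invented for illustration (only the first two MJD values echo the question's output), and the names wide, long_form and positive are arbitrary.

```python
import ast
import io
import pandas as pd

# Invented sample: each MJD_DUPLICATE cell is a tuple stored as text
csv_text = io.StringIO(
    'MJD,MJD_DUPLICATE\n'
    '55214,"(55214, 55209, -1)"\n'
    '55300,"(55301, 0, -1)"\n'
)

# The converter turns the text "(a, b, c)" into a real Python tuple while reading
df = pd.read_csv(csv_text, converters={"MJD_DUPLICATE": ast.literal_eval})

# One column per tuple position, indexed by MJD
wide = pd.DataFrame(df["MJD_DUPLICATE"].tolist(), index=df["MJD"])

# stack() turns the columns into rows; dropping the inner index level
# leaves one MJD_DUPLICATE value per row, still labelled by its MJD
long_form = wide.stack().reset_index(level=1, drop=True)

# Keep only real dates, discarding the 0 and -1 placeholders
positive = long_form[long_form > 0]
print(positive)
```

Each MJD now appears once per valid duplicate date, which is what the scatter-style plot needs.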
