I have a CSV file that I am uploading into Jupyter and I am trying to delete multiple columns at once. I thought the "DEL" command would be the best but I can't get it to work.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
% matplotlib inline
tmbd_movies = pd.read_csv('tmdb-movies.csv')
tmbd_movies.head()
del(tmbd_movies['imdb_id','homepage','tagline','keywords','overview'])
The goal was to remove the following columns:
imdb_id','homepage','tagline','keywords','overview
You want this:
tmbd_movies.drop(['imdb_id','homepage','tagline','keywords','overview'], 'columns', inplace=True)
Related
I've exported an Excel into a CSV where all the columns and entires look correct and normal. However, when I put it into a data frame and print the head, the structure becomes very messy and unreadable due to columns being unstructured.
As you can see in the image, the values are not neatly under user_id.
https://imgur.com/a/gbWaTwi
I'm using the following code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
then
df1 = pd.read_csv('../doc.csv', low_memory=False)
df1.head
Do this --- print the invocation of head. Just saying .head isn't enough.
print(df1.head())
(My first ever StackOverflow question)
I'm trying to plot bitcoin's market-cap against the date using pandas and matplotlib in Python.
Here is my code:
%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#read in CSV file using Pandas built in method
df = pd.read_csv("btc.csv", index_col=0, parse_dates=True)
Here are some details about the data frame:
dataframe details
matplotlib code:
#Plot marketcap(usd)
plt.plot(df.index, df["marketcap(USD)"])
plt.show()
Result:
Incorrect result
The plot seems to be more like scribbles that seem to move backwards. How could I fix this?
You can plot your Pandas Series "marketcap(USD)" directly using:
df["marketcap(USD)"].plot()
See the Pandas documentation on Basic Plotting
I am trying to work with Dask because my dataframe has become large and that pandas by itself can't simply process it. I read my dataset in as follows and get the following result that looks odd, not sure why its not outputting the dataframe:
import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
import dask.bag as db
import json
%matplotlib inline
Leads = db.read_text('Leads 6.4.18.txt')
Leads
This returns (instead of my pandas dataframe):
dask.bag<bag-fro..., npartitions=1>
Then when I try to rename a few columns:
Leads_updated = Leads.rename(columns={'Business Type':'Business_Type','Lender
Type':'Lender_Type'})
Leads_updated
I get:
AttributeError: 'Bag' object has no attribute 'rename'
Can someone please explain what I am not doing correctly. The ojective is to just use Dask for all these steps since it is too big for regular Python/Pandas. My understanding is the syntax used under Dask should be the same as Pandas.
Working through following the Machine Learning Tutorial:
http://machinelearningmastery.com/machine-learning-in-python-step-by-step/
Specifically, Section 4.2. Unfortunately, my code is throwing an error
NameError: name 'scatter_matrix' is not defined
Here is my code:
import pandas
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = pandas.read_csv(url, names=names)
scatter_matrix(dataset)
plt.show()
There's at least one Stack Overflow question on scatter_matrix, but I haven't able to figure out what's missing.
Pandas scatter_matrix - plot categorical variables
You will have to import it like this:
from pandas.plotting import scatter_matrix
Cause you've imported the Pandas. You could use it like below:
pd.scatter_matrix(dataset)
However, pandas.scatter_matrix() is deprecated. use pandas.plotting.scatter_matrix() instead
I could not able to change the data types of specific column, I tried using these below codes, but neither works for me.
getting ValueError: could not convert string to float: '?'
There were some issues in version supporting of Pandas as I got to know when I searched around.
data.Global_intensity = data.Global_intensity.astype(pd.np.float)
data.Global_intensity.apply(float)
data['Global_intensity'] = data['Global_intensity'].astype('float')
Here are the versions of modules I am working with.
python: 3.5.2.final.0
pandas: 0.19.0
numpy: 1.11.1
part of the code
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6
data = pd.read_csv('Desktop/household_power_consumptions.csv')
print (data.head())
print ('\n Data Types:')
data['Global_intensity'] = data['Global_intensity'].astype('float')
print (data.dtypes)
What I need is to convert the data type of object into float of Global_intensity column to work with matplotlib.pylab library.
Thank you.