Create DataFrame having trouble

I'm trying to convert views_dict[2017] to a DataFrame in a Jupyter notebook. views_dict itself is a dictionary that holds numerous years, and each year maps to a list of view counts:
views_dict[2017]
[2102206,
1331781,
925375,
382331,
321960,
278439,
231613,
206570,
179082,
173855,
137089,
123836,
122077,
120140,
114837,
108279,
103176,
93963,
79388,
72907]
df = pd.DataFrame(list(zip(views_dict)), columns = ['views'])
df
NameError Traceback (most recent call last)
Input In [27], in <cell line: 1>()
----> 1 df = pd.DataFrame(list(zip(views_dict)), columns = ['views'])
2 df
NameError: name 'pd' is not defined
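
The traceback complains about the name pd, not about the DataFrame call, so the missing piece is the pandas import in the current kernel session. Separately, zip(views_dict) iterates over the dictionary's keys (the years), not over the 2017 values. A minimal sketch, assuming pandas is installed and that views_dict maps each year to a list of counts; the sample below uses the first five 2017 values from the question, and the 2016 entry is made up for illustration:

```python
import pandas as pd  # defines the name `pd` that the NameError refers to

# Stand-in for the asker's dictionary: the 2017 values are the first five
# from the question, the 2016 entry is invented for illustration.
views_dict = {
    2016: [10, 20, 30],
    2017: [2102206, 1331781, 925375, 382331, 321960],
}

# Pass the list for one year directly as the 'views' column.
df = pd.DataFrame({"views": views_dict[2017]})
print(df)
```

In a notebook, the import cell has to be run again after every kernel restart before any cell that uses pd.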

Related

I am trying to run an analysis in Jupyter, but the code below raises a NameError even though I defined df at the beginning

df = pd.read_csv('dowjones.csv', index_col=0);
df['rm'] = 100 * (np.log(df.DJIA) - np.log(df.DJIA.shift(1)))
df.head()
The code above is where I initially defined df.
df = df.dropna()
formula = 'MSFTtrans ~ rm'
results2 = smf.ols(formula, df).fit(cov_type = 'HAC', cov_kwds={'maxlags':10,'use_correction':True})
print(results2.summary())
Then I ran the second block above and got:
NameError Traceback (most recent call last)
<ipython-input-3-b46efd5c722d> in <module>
2
3
----> 4 df = df.dropna()
5 formula = 'MSFTtrans ~ rm'
6 results2 = smf.ols(formula, df).fit(cov_type = 'HAC', cov_kwds={'maxlags':10,'use_correction':True})
NameError: name 'df' is not defined
This is the error I got saying df is not defined.

The semicolon at the end of df = pd.read_csv(...) is unnecessary, but it is harmless in Python and does not cause this error.
Run the first block of code, then the second. The NameError means the cell that defines df has not been executed in the current kernel session (for example, after a kernel restart), so df does not exist when the second block calls df.dropna().
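
One way to avoid this class of error is to keep the definition of df and the code that depends on it in cells that are always run in order, or to restart the kernel and run all cells. A minimal sketch of the first two steps in a single block, assuming pandas and NumPy are installed; the small DataFrame is a made-up stand-in for dowjones.csv, and the regression step is left out:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('dowjones.csv', index_col=0);
# the DJIA values are invented.
df = pd.DataFrame({"DJIA": [100.0, 101.0, 99.5, 102.0]})

# Percentage log return, as in the question. The first row has no
# previous value, so its return is NaN.
df["rm"] = 100 * (np.log(df.DJIA) - np.log(df.DJIA.shift(1)))

# df exists here because it was defined earlier in the same run.
df = df.dropna()
print(df)
```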

python - How to read table with chunksize and names

How can I read data from a CSV file with chunksize and names?
I tried this:
sms = pd.read_table('demodata.csv', header=None, names=['label', 'good'])
X = sms.label.tolist()
y = sms.good.tolist()
and it worked totally fine. But if I try this, I get an error:
sms = pd.read_table('demodata.csv', chunksize=100, header=None, names=['label', 'good'])
X = sms.label.tolist()
y = sms.good.tolist()
This is the error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-18-e3f35149ab7f> in <module>()
----> 1 X = sms.label.tolist()
2 y = sms.good.tolist()
AttributeError: 'TextFileReader' object has no attribute 'label'
Why does it work in the first case but not in the second?
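
The error message itself hints at the cause: with chunksize set, pd.read_table returns a TextFileReader, an iterator that yields one DataFrame per chunk, rather than a single DataFrame. The columns only exist on the chunks. A minimal sketch, assuming pandas is installed; the in-memory text is a made-up, comma-separated stand-in for demodata.csv, and sep="," is an assumption about the file's delimiter:

```python
import io
import pandas as pd

# Made-up sample standing in for demodata.csv (no header row).
raw = "spam,0\nham,1\nham,1\nspam,0\nham,1\n"

# With chunksize, the result is an iterator of DataFrames.
reader = pd.read_table(
    io.StringIO(raw), sep=",", chunksize=2, header=None, names=["label", "good"]
)

X, y = [], []
for chunk in reader:  # each chunk is a DataFrame with at most 2 rows
    X.extend(chunk["label"].tolist())
    y.extend(chunk["good"].tolist())

print(X)
print(y)
```

If the whole file fits in memory, dropping chunksize, as in the first version, is simpler.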

Getting error while trying to use a list with numpy to get some stat values

Hi I am having problems with this code:
import numpy as np
# Summarize the data about minutes spent in the classroom
#total_minutes = total_minutes_by_account.values()
total_minutes = list(total_minutes_by_account.values())
type(total_minutes)
# Printing out the samething converting to a list
print('Printing out the samething converting to a list ')
print(type(total_minutes))
print ('Mean:', np.mean(total_minutes))
print ('Standard deviation:', np.std(total_minutes))
print ('Minimum:', np.min(total_minutes))
print ('Maximum:', np.max(total_minutes))
The error I get is:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-93-945375bf6098> in <module>()
3 # Summarize the data about minutes spent in the classroom
4 #total_minutes = total_minutes_by_account.values()
----> 5 total_minutes = list(total_minutes_by_account.values())
6 type(total_minutes)
7 #print(total_minutes)
AttributeError: 'list' object has no attribute 'values'
I would really like to know how I can make this work. I can do it with pandas, by converting the data to a NumPy array and then getting the statistics I want with NumPy.
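
The traceback says that total_minutes_by_account is already a list at this point, so it has no .values() method; only a dict does. A minimal sketch that handles both cases, assuming NumPy is installed; the numbers are made up:

```python
import numpy as np

# Hypothetical data; in the question this object turned out to be a list.
total_minutes_by_account = [120.5, 0.0, 340.25, 89.0]

# Call .values() only on a dict; a list can be used as it is.
if isinstance(total_minutes_by_account, dict):
    total_minutes = list(total_minutes_by_account.values())
else:
    total_minutes = list(total_minutes_by_account)

print("Mean:", np.mean(total_minutes))
print("Standard deviation:", np.std(total_minutes))
print("Minimum:", np.min(total_minutes))
print("Maximum:", np.max(total_minutes))
```

In a notebook, this situation often arises when an earlier cell has reassigned the same name to a list, so it is worth checking type(total_minutes_by_account) before the call.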

pyspark: type object 'Row' has no attribute 'fromSeq'

I have the following code:
from pyspark.sql import Row
z1=["001",1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,30,41,42,43]
print z1
r1 = Row.fromSeq(z1)
print (r1)
Then I got this error:
AttributeError Traceback (most recent call last)
<ipython-input-6-fa5cf7d26ed0> in <module>()
2 z1=["001",1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,30,41,42,43]
3 print z1
----> 4 r1 = Row.fromSeq(z1)
5
6 print (r1)
AttributeError: type object 'Row' has no attribute 'fromSeq'
Anyone know what I might have missed? Thanks!

Row.fromSeq belongs to the Scala API; PySpark's Row has no such method. If you don't need field names, just use a tuple:
tuple(z1)
That is all that is needed to build a correct DataFrame.

I ran dropna in dataframe but got an error message

I ran this statement dr=df.dropna(how='all') to remove missing values and got the error message shown below:
AttributeError Traceback (most recent call last)
<ipython-input-29-07367ab952bc> in <module>
----> 1 dr=df.dropna(how='all')
AttributeError: 'list' object has no attribute 'dropna'

According to the tabula-py documentation (https://readthedocs.org/projects/tabula-py/downloads/pdf/latest/), a call such as
df = tabula.read_pdf(file, lattice=True, pages='all', area=(1, 1, 1000, 100), relative_area=True)
with pages='all' probably returns a list of DataFrames rather than a single DataFrame.
So you have to call dropna on each element of the list:
for sub_df in df:
    dr = sub_df.dropna(how='all')
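
Note that the loop above keeps only the last cleaned table in dr. A minimal sketch that keeps all of them, assuming pandas and NumPy are installed; the two small frames are made-up stand-ins for the list returned by the PDF reader, so tabula itself is not needed to run it:

```python
import numpy as np
import pandas as pd

# Made-up frames standing in for the list of tables read from the PDF.
tables = [
    pd.DataFrame({"a": [1.0, np.nan], "b": [2.0, np.nan]}),
    pd.DataFrame({"a": [np.nan, 3.0], "b": [np.nan, np.nan]}),
]

# how='all' drops only the rows in which every value is missing.
cleaned = [t.dropna(how="all") for t in tables]

# Optionally stack the cleaned tables into one DataFrame.
combined = pd.concat(cleaned, ignore_index=True)
print(combined)
```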
