I'm currently working with pandas and IPython. Since pandas DataFrames are copied when you perform operations on them, my memory usage increases by 500 MB with every cell. I believe it's because the data gets stored in the Out variable, since this doesn't happen with the default Python interpreter.
How do I disable the Out variable?
The first option is to avoid producing output. If you don't need to see the intermediate results, skip them and put all the computations in a single cell.
If you do need to display the data, you can use the InteractiveShell.cache_size option to set a maximum size for the cache. Setting this value to 0 disables caching.
To do so, create a file called ipython_config.py (or ipython_notebook_config.py) in your ~/.ipython/profile_default directory with the following contents:
c = get_config()
c.InteractiveShell.cache_size = 0
After that you'll see:
In [1]: 1
Out[1]: 1
In [2]: Out[1]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-2-d74cffe9cfe3> in <module>()
----> 1 Out[1]
KeyError: 1
You can also create different profiles for IPython using the command ipython profile create <name>. This creates a new profile under ~/.ipython/profile_<name> with a default configuration file. You can then launch IPython with the --profile <name> option to load that profile.
Alternatively, you can use the %reset out magic to clear the output cache, or the %xdel magic to delete a specific object:
In [1]: 1
Out[1]: 1
In [2]: 2
Out[2]: 2
In [3]: %reset out
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
Flushing output cache (2 entries)
In [4]: Out[1]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-4-d74cffe9cfe3> in <module>()
----> 1 Out[1]
KeyError: 1
In [5]: 1
Out[5]: 1
In [6]: 2
Out[6]: 2
In [7]: v = Out[5]
In [8]: %xdel v # requires a variable name, so you cannot write %xdel Out[5]
In [9]: Out[5] # xdel removes the value of v from Out and other caches
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-9-573c4eba9654> in <module>()
----> 1 Out[5]
KeyError: 5
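To see why a cached output keeps memory in use, here is a minimal sketch in plain Python. The dict `out_cache` and the class `Big` are stand-ins invented for illustration, not IPython's actual internals, and the sketch assumes CPython, where an object is freed as soon as its last reference goes away.

```python
import weakref


class Big:
    """Hypothetical stand-in for a large DataFrame."""


out_cache = {}            # stand-in for IPython's Out dict (assumption)

obj = Big()
probe = weakref.ref(obj)  # lets us check whether the object is still alive

out_cache[1] = obj        # roughly what displaying a result does
del obj                   # dropping our own name is not enough

alive_while_cached = probe() is not None

out_cache.clear()         # comparable in spirit to %reset out
alive_after_clear = probe() is not None

print(alive_while_cached, alive_after_clear)
```

As long as the cache holds a reference, deleting your own variable does not release the memory; clearing the cache (or never filling it) does.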
Related
I'm trying to calculate the daily returns of a stock in percentage format from a CSV file by defining a function.
Here's my code:
def daily_ret(ticker):
    return f"{df[ticker].pct_change()*100:.2f}%"
When I call the function, I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-40-7122588f1289> in <module>()
----> 1 daily_ret('AAPL')
<ipython-input-39-7dd6285eb14d> in daily_ret(ticker)
1 def daily_ret(ticker):
----> 2 return f"{df[ticker].pct_change()*100:.2f}%"
TypeError: unsupported format string passed to Series.__format__
Where am I going wrong?
f-strings can't format a whole Series with a numeric format spec such as .2f; the spec has to be applied to each element. Use map or apply instead:
# Option 1: Series.map
def daily_ret(ticker):
    return (df[ticker].pct_change() * 100).map("{:.2f}%".format)
# Option 2: Series.apply
def daily_ret(ticker):
    return (df[ticker].pct_change() * 100).apply("{:.2f}%".format)
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': np.arange(1, 6)})
print(daily_ret('A'))
0 nan%
1 100.00%
2 50.00%
3 33.33%
4 25.00%
Name: A, dtype: object
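The error is not specific to pandas: a format spec such as `.2f` is handed to the container's own `__format__`, which does not know what to do with it. A minimal sketch with a plain list (the name `values` is illustrative) shows the same kind of failure and the per-element fix:

```python
values = [100.0, 50.0, 33.3333]

# Formatting the whole container with a float spec fails,
# just as it does for a Series.
try:
    f"{values:.2f}%"
    failed = False
except TypeError:
    failed = True

# Formatting each element separately works.
formatted = ["{:.2f}%".format(v) for v in values]

print(failed)
print(formatted)
```

Series.map and Series.apply do the per-element step for you, which is why the answer above works.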
I have a CSV file that contains about 600K observations, and I'm importing it using fread:
DT = dt.fread('C:\\Users\\myamulla\\Desktop\\proyectos_de_py\\7726_analysis\\datasets\\7726export_Jan_23.csv')
It throws this error:
--------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-3-01684fbecd91> in <module>
----> 1 dt.fread('C:\\Users\\myamulla\\Desktop\\proyectos_de_py\\7726_analysis\\datasets\\7726export_Jan_23.csv')
IOError: Too few fields on line 432815: expected 14 but found only 4 (with sep=','). Set fill=True to ignore this error. <<19731764,2021-01-23 23:30:15,2021-01-23 23:42:20,"Vote for David Borrero, your Republican in HD 105. Potestad betrayed Prez Trump. Borrero is for our values & POTUS Trump.>>
As suggested, I passed the argument fill=True to fread:
DT = dt.fread('C:\\Users\\myamulla\\Desktop\\proyectos_de_py\\7726_analysis\\datasets\\7726export_Jan_23.csv',fill=True)
It executes, but DT is created empty.
How can I resolve this?
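The thread does not include an answer. One way to investigate, sketched here under assumptions, is to count the fields in each record with Python's standard csv module and list the records whose width differs from the first one. The function name `find_mismatched_rows` and the sample data are hypothetical, and the real file may need a different delimiter or quoting setup. The error message shows a field that opens with a double quote and does not close on that line, so a multi-line or unbalanced quoted field is a plausible culprit, though that is only a guess.

```python
import csv
import io


def find_mismatched_rows(handle, expected=None):
    """Return (record_number, field_count) for records with an unexpected width."""
    reader = csv.reader(handle)
    bad = []
    for number, row in enumerate(reader, start=1):
        if expected is None:
            expected = len(row)  # take the width of the first record as the norm
        elif len(row) != expected:
            bad.append((number, len(row)))
    return bad


# Tiny in-memory sample standing in for the real export.
sample = io.StringIO(
    'id,sent,text\n'
    '1,2021-01-23,"ok"\n'
    '2,2021-01-23\n'
    '3,2021-01-23,"multi\nline text"\n'
)
print(find_mismatched_rows(sample))
```

For a real file, open it with open(path, newline='') and pass the handle to the function. Record numbers count parsed records, so they can differ from physical line numbers when quoted fields span several lines.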
My code is this...
results = requests.get(url).json()['response']['groups'][0]['items']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-225-110fa1855079> in <module>
1 london_venues = getNearbyVenues(names=df2['Postcode'],
2 latitudes=df2['Latitude'],
----> 3 longitudes=df2['Longitude']
4 )
<ipython-input-223-c23495b2f972> in getNearbyVenues(names, latitudes, longitudes, radius)
16
17 # make the GET request
---> 18 results = requests.get(url).json()['response']['groups'][0]['items']
19
20 # return only relevant information for each nearby venue
KeyError: 'groups'
I think it is because there is no data returned in some cases - is there a way I can just return no data?
If the response sometimes has no groups key, you can change the line to:
results = requests.get(url).json()['response'].get('groups',[{}])[0].get('items', [])
This returns an empty list (not None) when groups or items is missing from the response.
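The one-liner above still fails if `response` itself is missing, or if `groups` is present but empty. A slightly more defensive sketch follows; the helper name `extract_items` and the sample payloads are made up for illustration, and the payload shape is assumed from the question's code.

```python
def extract_items(payload):
    """Return the items of the first group, or [] if any level is missing."""
    groups = payload.get("response", {}).get("groups") or []
    if not groups:
        return []
    return groups[0].get("items", [])


# Hypothetical payloads mirroring the shape used in the question.
full = {"response": {"groups": [{"items": ["venue A", "venue B"]}]}}
no_groups = {"response": {}}
empty_groups = {"response": {"groups": []}}

print(extract_items(full))
print(extract_items(no_groups))
print(extract_items(empty_groups))
```

In the question's function this would be used as results = extract_items(requests.get(url).json()).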
I notice that many DataFrame methods, if used without parentheses, seem to behave like 'properties', e.g.
In [200]: df = DataFrame (np.random.randn (7,2))
In [201]: df.head ()
Out[201]:
0 1
0 -1.325883 0.878198
1 0.588264 -2.033421
2 -0.554993 -0.217938
3 -0.777936 2.217457
4 0.875371 1.918693
In [202]: df.head
Out[202]:
<bound method DataFrame.head of 0 1
0 -1.325883 0.878198
1 0.588264 -2.033421
2 -0.554993 -0.217938
3 -0.777936 2.217457
4 0.875371 1.918693
5 0.940440 -2.279781
6 1.152370 -2.733546>
How is this done, and is it good practice?
This is with pandas 0.15.1 on Linux.
They are different, and omitting the parentheses is not recommended: one output clearly shows that it's a bound method that happens to print the data, whilst the other shows the expected result.
Here's why you should not do this:
In [23]:
t = df.head
In [24]:
t.iloc[0]
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-24-b523e5ce509d> in <module>()
----> 1 t.iloc[0]
AttributeError: 'function' object has no attribute 'iloc'
In [25]:
t = df.head()
t.iloc[0]
Out[25]:
0 0.712635
1 0.363903
Name: 0, dtype: float64
So if you omit the parentheses, you don't actually call the method. You see output that appears valid, but if you keep a reference to it and try to use it, you are operating on the method rather than on the slice of the df, which is not what you intended.
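As for how this is done: nothing pandas-specific is involved. Python's repr of a bound method includes the repr of the object it is bound to, so the whole frame gets printed inside `<bound method ...>`. A minimal sketch, with a hypothetical class `Table` standing in for DataFrame:

```python
class Table:
    """Hypothetical stand-in for a DataFrame."""

    def __init__(self, rows):
        self.rows = rows

    def __repr__(self):
        return f"Table({self.rows})"

    def head(self, n=2):
        return Table(self.rows[:n])


t = Table([1, 2, 3])

method = t.head      # no parentheses: a bound method, nothing is called
result = t.head()    # parentheses: the method runs and returns a new Table

print(repr(method))  # the repr embeds repr(t), i.e. all rows
print(repr(result))
print(callable(method), isinstance(result, Table))
```

This also explains why df.head without parentheses printed all seven rows in the question: the method was never called, so no truncation to five rows took place.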