Print full pandas index in Jupyter Notebook - python

I have a pandas index with 380 elements and want to print the full index in a Jupyter Notebook. I've already searched, but nothing I found helped. For example, this does not work:
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(my_index)
Neither does this:
with np.printoptions(threshold=np.inf):
    print(my_index.array)
In both cases only the first 10 and last 10 elements are shown. The elements in between are abbreviated by "...".

IIUC, seems like you're looking for 'display.max_seq_items'.
From the documentation:
display.max_seq_items : int or None
When pretty-printing a long sequence, no more than max_seq_items
will be printed. If items are omitted, they will be denoted by the
addition of "..." to the resulting string.
with pd.option_context('display.max_seq_items', None):
    print(my_index)
Tested it in Jupyter with:
my_index = pd.date_range(start='2023-02-01', end='2023-05-27') #len(my_index) = 116
Without the context:
DatetimeIndex(['2023-02-01', '2023-02-02', '2023-02-03', '2023-02-04', '2023-02-05', '2023-02-06', '2023-02-07', '2023-02-08', '2023-02-09',
'2023-02-10',
...
'2023-05-18', '2023-05-19', '2023-05-20', '2023-05-21', '2023-05-22', '2023-05-23', '2023-05-24', '2023-05-25', '2023-05-26',
'2023-05-27'],
dtype='datetime64[ns]', length=116, freq='D')
With the context:
DatetimeIndex(['2023-02-01', '2023-02-02', '2023-02-03', '2023-02-04', '2023-02-05', '2023-02-06', '2023-02-07', '2023-02-08', '2023-02-09',
'2023-02-10', '2023-02-11', '2023-02-12', '2023-02-13', '2023-02-14', '2023-02-15', '2023-02-16', '2023-02-17', '2023-02-18',
'2023-02-19', '2023-02-20', '2023-02-21', '2023-02-22', '2023-02-23', '2023-02-24', '2023-02-25', '2023-02-26', '2023-02-27',
'2023-02-28', '2023-03-01', '2023-03-02', '2023-03-03', '2023-03-04', '2023-03-05', '2023-03-06', '2023-03-07', '2023-03-08',
'2023-03-09', '2023-03-10', '2023-03-11', '2023-03-12', '2023-03-13', '2023-03-14', '2023-03-15', '2023-03-16', '2023-03-17',
'2023-03-18', '2023-03-19', '2023-03-20', '2023-03-21', '2023-03-22', '2023-03-23', '2023-03-24', '2023-03-25', '2023-03-26',
'2023-03-27', '2023-03-28', '2023-03-29', '2023-03-30', '2023-03-31', '2023-04-01', '2023-04-02', '2023-04-03', '2023-04-04',
'2023-04-05', '2023-04-06', '2023-04-07', '2023-04-08', '2023-04-09', '2023-04-10', '2023-04-11', '2023-04-12', '2023-04-13',
'2023-04-14', '2023-04-15', '2023-04-16', '2023-04-17', '2023-04-18', '2023-04-19', '2023-04-20', '2023-04-21', '2023-04-22',
'2023-04-23', '2023-04-24', '2023-04-25', '2023-04-26', '2023-04-27', '2023-04-28', '2023-04-29', '2023-04-30', '2023-05-01',
'2023-05-02', '2023-05-03', '2023-05-04', '2023-05-05', '2023-05-06', '2023-05-07', '2023-05-08', '2023-05-09', '2023-05-10',
'2023-05-11', '2023-05-12', '2023-05-13', '2023-05-14', '2023-05-15', '2023-05-16', '2023-05-17', '2023-05-18', '2023-05-19',
'2023-05-20', '2023-05-21', '2023-05-22', '2023-05-23', '2023-05-24', '2023-05-25', '2023-05-26', '2023-05-27'],
dtype='datetime64[ns]', freq='D')
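If you'd rather change the setting for the whole session than wrap each print in a context manager, here's a small sketch using pandas' standard options API (the 300-element integer index is just a stand-in long enough to trigger truncation):

```python
import pandas as pd

# An index long enough that the default repr truncates it with "..."
my_index = pd.Index(list(range(300)))

# Show every element for the rest of the session
pd.set_option('display.max_seq_items', None)
full_repr = repr(my_index)

# Restore the default when you're done
pd.reset_option('display.max_seq_items')
```

With the option set to None, `full_repr` contains all 300 elements and no "..." placeholder.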

You can try
print(list(id_to_submit.index))
or
list(id_to_submit.index)
This works for me; converting the index to a list bypasses pandas' truncated repr.

How to hide the index column of a pandas dataframe?

Please help: I need to remove the 'date' index column, otherwise 'date' will appear as the first column alongside the stock columns:
heat_ds = pd.DataFrame(columns=['PFE','GS','BA','NKE','V','AAPL','TSLA','NVDA','MRK','CVX','UNH'])
heat_ds['PFE'] = pfizer['Close']
heat_ds['GS'] = goldmans['Close']
heat_ds['BA'] = boeingc['Close']
heat_ds['NKE'] = nike['Close']
heat_ds['V'] = visa['Close']
heat_ds['AAPL'] = aaple['Close']
heat_ds['TSLA'] = tesla['Close']
heat_ds['NVDA'] = tesla['Close']
heat_ds['MRK'] = tesla['Close']
heat_ds['CVX'] = chevronc['Close']
heat_ds['UNH'] = unitedh['Close']
First of all, 'date' is the index, not a regular column. To drop it, first reset the index, which turns 'date' into a normal column, and then drop that column:
heat_ds = heat_ds.reset_index()
heat_ds = heat_ds.drop('index', axis=1)
or in one line
heat_ds = heat_ds.reset_index(drop=True)
Deleting the index is probably not the best approach here.
If you are only concerned about display, Styler.hide_index() or Styler.hide() (depending on your version of pandas) would work.
For my older version of pandas,
df.style.hide_index()
in a Jupyter cell works just fine. Of course, for exporting to CSV, you would use index=False if needed.
If you wish to still print the index, but hide the extra offset caused by the index name, you can set the latter to None:
df.index.name = None
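If plain-text output is enough, to_string(index=False) is another way to render a frame without its index. A small sketch, with made-up values standing in for the stock closes above:

```python
import pandas as pd

# Hypothetical data standing in for the stock closes in the question
df = pd.DataFrame({'PFE': [42.1, 42.5], 'GS': [380.0, 382.3]},
                  index=pd.to_datetime(['2021-01-01', '2021-01-02']))
df.index.name = 'date'

# index=False omits the index values from the rendered text
print(df.to_string(index=False))
```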

Using loc still raises SettingWithCopyWarning while changing a column

I want to strip URLs from the text column of my df by removing all http/https links, like below:
data.loc[:,'text_'] = data['text_'].str.replace(r'\s*https?://\S+(\s+|$)', ' ').str.strip()
I used loc as advised in other answers, but I still keep getting the warning.
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: FutureWarning: The default value of regex will change from True to False in a future version.
"""Entry point for launching an IPython kernel.
time: 9.81 s (started: 2022-03-19 06:35:42 +00:00)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py:1773: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
How do I do this operation correctly, i.e. without the warning?
UPDATE:
I've generated data from kaggle dataset:
kaggle datasets download clmentbisaillon/fake-and-real-news-dataset
and then:
true_df.drop_duplicates(keep='first')
fake_df.drop_duplicates(keep='first')
true_df['is_fake'] = 0
fake_df['is_fake'] = 1
news_df = pd.concat([true_df, fake_df])
news_df = news_df.sample(frac=1).reset_index(drop=True)
drop_list = ['subject', 'date']
column_filter = news_df.filter(drop_list)
news_df.drop(column_filter, axis=1)
news_df['text_'] = news_df['title'] + news_df['text']
data = news_df[['text_', 'is_fake']]
Next for the following line:
data.loc[:,'text_'] = data['text_'].str.replace(r'\s*https?://\S+(\s+|$)', ' ').str.strip()
I get that error from the start of the post.
UPDATE 2:
As mentioned by @Riley, adding
data = data.copy()
fixes the SettingWithCopyWarning. However, the
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: FutureWarning: The default value of regex will change from True to False in a future version.
"""Entry point for launching an IPython kernel.
still remains. To fix it, pass regex=True to replace:
data.loc[:,'text_'] = data['text_'].str.replace(r'\s*https?://\S+(\s+|$)', ' ', regex=True).str.strip()
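Putting both fixes together, a minimal sketch (the sample texts below are made up and stand in for the news data):

```python
import pandas as pd

news_df = pd.DataFrame({
    'text_': ['see https://example.com for details', 'no links here'],
    'is_fake': [0, 1],
})

# .copy() gives an independent frame, so the later assignment
# no longer targets a slice of news_df
data = news_df[['text_', 'is_fake']].copy()

# regex=True makes the intent explicit and silences the FutureWarning
data.loc[:, 'text_'] = (
    data['text_']
    .str.replace(r'\s*https?://\S+(\s+|$)', ' ', regex=True)
    .str.strip()
)
```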

Mining for Term that is "Included In" Entry Rather than "Equal To"

I am doing some data mining. I have a database that looks like this (pulling out three lines):
100324822$10032482$1$PS$BENICAR$OLMESARTAN MEDOXOMIL$1$Oral$UNK$$$Y$$$$021286$$$TABLET$
1014687010$10146870$2$SS$BENICAR HCT$HYDROCHLOROTHIAZIDE\OLMESARTAN MEDOXOMIL$1$Oral$1/2 OF 40/25MG TABLET$$$Y$$$$$.5$DF$FILM-COATED TABLET$QD
115700162$11570016$5$C$Olmesartan$OLMESARTAN$1$Unknown$UNK$$$U$U$$$$$$$
My code looks like this:
with open('DRUG20Q4.txt') as fileDrug20Q4:
    drugTupleList20Q4 = [tuple(map(str, i.split('$'))) for i in fileDrug20Q4]
drug20Q4 = []
for entryDrugPrimaryID20Q4 in drugTupleList20Q4:
    drug20Q4.append((entryDrugPrimaryID20Q4[0], entryDrugPrimaryID20Q4[3], entryDrugPrimaryID20Q4[5]))
drugNameDataFrame20Q4 = pd.DataFrame(drug20Q4, columns=['PrimaryID', 'Role', 'Drug Name'])
drugNameDataFrame20Q4 = pd.DataFrame(drugNameDataFrame20Q4.loc[drugNameDataFrame20Q4['Drug Name'] == 'OLMESARTAN'])
Currently the code will pull only entries with the exact name "OLMESARTAN" out, how do I capture all the variations, for instance "OLMESARTAN MEDOXOMIL" etc? I can't simply list all the varieties as there's an infinite amount of variations, so I would need something that captures anything with the term "OLMESARTAN" within it.
Thanks!
You can use str.contains to get what you are looking for.
Here's an example (using some string I found in the documentation):
import pandas as pd
df = pd.DataFrame()
item = 'Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.'
df['test'] = item.split(' ')
df[df['test'].str.contains('de')]
This outputs:
test
4 Index
22 Index.
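Applied to the question's data, the same idea; the rows below are reconstructed from the three sample lines in the question:

```python
import pandas as pd

drugNameDataFrame20Q4 = pd.DataFrame({
    'PrimaryID': ['100324822', '1014687010', '115700162'],
    'Role': ['PS', 'SS', 'C'],
    'Drug Name': ['OLMESARTAN MEDOXOMIL',
                  'HYDROCHLOROTHIAZIDE\\OLMESARTAN MEDOXOMIL',
                  'OLMESARTAN'],
})

# str.contains matches the term anywhere in the entry,
# so every OLMESARTAN variation is kept
mask = drugNameDataFrame20Q4['Drug Name'].str.contains('OLMESARTAN')
matches = drugNameDataFrame20Q4[mask]
```

All three rows match, including the combination products that an exact `==` comparison would miss.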

Expected unicode, got pandas._libs.properties.CachedProperty

I'm trying to add empty columns to my dataset on Colab, but it gives me this error; when I run it on my local machine it works perfectly fine. Does anybody know a possible solution for this?
My code:
dataframe["Comp"] = ''
dataframe["Negative"] = ''
dataframe["Neutral"] = ''
dataframe["Positive"] = ''
dataframe
Error message:
TypeError: Expected unicode, got pandas._libs.properties.CachedProperty
I ran into a similar issue today:
"Expected unicode, got pandas._libs.properties.CachedProperty"
My dataframe (called df) has a time index. When I add a new column to it and fill it with numpy.array data, it raises this error. I tried setting it with df.index or df.index.values; it always raises this error.
Finally, I solved it in 3 steps:
df = df.reset_index()
df['new_column'] = new_column_data # it is np.array format
df = df.set_index('original_index_name')
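A self-contained sketch of those three steps (the index name 'dt' and the sample values are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0]},
                  index=pd.date_range('2021-01-01', periods=3, name='dt'))

df = df.reset_index()                      # move the time index into a column
df['new_column'] = np.array([10, 20, 30])  # plain positional assignment
df = df.set_index('dt')                    # restore the original index
```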
This question is the same as https://stackoverflow.com/a/67997139/16240186, and there's a simple way to solve it: df = df.asfreq('H') # freq can be min, D, M, S, 5min, etc.

Replace None with NaN and ignore NoneType in Pandas

I'm attempting to create a raw string variable from a pandas dataframe, which will eventually be written to a .cfg file, by first joining two columns together as shown below while avoiding None:
Section of df:
command value
...
439 sensitivity "0.9"
440 cl_teamid_overhead_always 1
441 host_writeconfig None
...
code:
...
df = df['value'].replace('None', np.nan, inplace=True)
print df
df = df['command'].astype(str)+' '+df['value'].astype(str)
print df
cfg_output = '\n'.join(df.tolist())
print cfg_output
I've attempted to replace all the None values with NaN first so that no lines in cfg_output contain "None" as part of the string. However, by doing so I seem to get a few undesired results. I made use of print statements to see what is going on.
It seems that df = df['value'].replace('None', np.nan, inplace=True), simply outputs None.
It seems that df = df['command'].astype(str)+' '+df['value'].astype(str) and cfg_output = '\n'.join(df.tolist()), cause the following error:
TypeError: 'NoneType' object has no attribute '__getitem__'
Therefore, I was thinking that by ignoring any occurrences of NaN the code may run smoothly, although I'm unsure how to do so using pandas.
Ultimately, my desired output would be as follows:
sensitivity "0.9"
cl_teamid_overhead_always 1
host_writeconfig
First of all, df['value'].replace('None', np.nan, inplace=True) returns None because you're calling the method with the inplace=True argument. This argument tells replace not to return anything but instead modify the original dataframe in place, similar to how pop or append work on lists.
With that being said, you can also get the desired output calling fillna with an empty string:
import pandas as pd
import numpy as np
d = {
'command': ['sensitivity', 'cl_teamid_overhead_always', 'host_writeconfig'],
'value': ['0.9', 1, None]
}
df = pd.DataFrame(d)
# df['value'].replace('None', np.nan, inplace=True)
df = df['command'].astype(str) + ' ' + df['value'].fillna('').astype(str)
cfg_output = '\n'.join(df.tolist())
>>> print(cfg_output)
sensitivity 0.9
cl_teamid_overhead_always 1
host_writeconfig
You can replace None with '':
df=df.replace('None','')
df['command'].astype(str)+' '+df['value'].astype(str)
Out[436]:
439 sensitivity 0.9
440 cl_teamid_overhead_always 1
441 host_writeconfig
dtype: object
