Use dataframe columns as arguments for function - python

I want to get arguments from a datafile (excel, .csv, whatever) and pass them as arguments to a Python function.
To get the arguments from the datafile, I converted it to a pandas DataFrame. I then build a list from the DataFrame's index and iterate over it, looking up the cell values for each row and passing them to the function as arguments.
I've got some working code (see below) but I feel like it's kinda clunky.
Is there a better way to do this?
import pandas as pd
import os
curdir = os.getcwd()
def pdFunc(Name, Module): #function that takes multiple arguments from the dataframe
    print(str(Name) + ',' + str(Module))
    #further code will be added here which will create new .csv files etc. This output is not suitable to be placed in a dataframe.
assetList = os.path.join(curdir, 'Lists', 'Assets_ShortTesting_v1.0.xlsx') # setting path for the excel file with the data
assetdf = pd.read_excel(assetList) #importing the data to a dataframe
indexList = assetdf.index.tolist() #creating list to iterate over
for i in indexList: #iterating over list
    pdFunc(assetdf.loc[i]['Name'], assetdf.loc[i]['Module']) #finding the cell values from the dataframe and setting them as arguments for the function
Here's the dataframe:
Name ISIN SymbolYF SymbolInvestpy Currency Country Exchange Type Module Constituent of
0 Adyen N.V. NaN ADYEN.AS NaN EUR Netherlands NaN Stock 1 AEX
1 Aegon N.V. NaN AGN.AS NaN EUR Netherlands NaN Stock 1 AEX
2 Aalberts N.V. NaN AALB.AS NaN EUR Netherlands NaN Stock 1 AMX
3 ABN AMRO Bank N.V. NaN ABN.AS NaN EUR Netherlands NaN Stock 1 AMX
4 Anheuser-Busch InBev SA/NV NaN ABI.BR NaN EUR Belgium NaN Stock 2 BEL20
5 Ackermans & Van Haaren NV NaN ACKB.BR NaN EUR Belgium NaN Stock 2 BEL20
6 L'Air Liquide S.A. NaN AI.PA NaN EUR France NaN Stock 2 CAC40
7 Airbus SE NaN AIR.PA NaN EUR France NaN Stock 2 CAC40
8 Vonovia SE NaN VNA.DE NaN EUR Germany NaN Stock 2 DAX
9 US Dollar NaN USD-EUR NaN EUR US NaN Forex 3 Forex
10 Shiba Inu NaN SHIB-EUR NaN EUR NaN NaN Crypto 3 Forex
11 FTSE 1000 NaN ^FTSE NaN EUR United Kingdom NaN Index 3 Index
12 Wheat NaN ZW=F NaN USD NaN NaN Commodity 3 Commodity
13 Apple Inc. NaN AAPL NaN USD US NaN Stock 4 US MegaCap
14 Sirius XM Holdings Inc. NaN SIRI NaN USD US NaN Stock 4 US High volume
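One possible way to make the loop less clunky is to iterate over the two columns directly instead of going through the index. The following is only a minimal sketch: `pd_func` is a stand-in for the question's `pdFunc` (it returns the string rather than printing it), and the small frame is made up from a few rows of the data shown above.

```python
import pandas as pd

def pd_func(name, module):
    """Stand-in for the question's pdFunc; returns the text instead of printing it."""
    return f"{name},{module}"

# Small stand-in frame using two of the question's columns.
assetdf = pd.DataFrame({
    "Name": ["Adyen N.V.", "Aegon N.V.", "Apple Inc."],
    "Module": [1, 1, 4],
})

# zip walks both columns in lockstep, giving one (Name, Module) pair per row,
# so no index list or repeated .loc lookups are needed.
results = [pd_func(name, module)
           for name, module in zip(assetdf["Name"], assetdf["Module"])]
print(results)
```

If more columns are needed, `assetdf.itertuples(index=False)` is another option, although column names containing spaces (such as "Constituent of") are not usable as attribute names there.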

Related

Beautifulsoup: Scrape Table with Key Word Search

I'm trying to scrape tables from multiple websites by keyword. I want to extract the values from table cells whose row header is "Cash and cash equivalent" and whose column header is "2020", so that I can write them to an Excel file later. But I cannot get the code to work. I hope you can help me with this. Thank you!
from bs4 import BeautifulSoup
import requests
import time
from pandas import DataFrame
import pandas as pd
#headers={"Content-Type":"text"}
headers = {'User-Agent': 'registr#jh.edu'}
urls={'https://www.sec.gov/Archives/edgar/data/1127993/0001091818-21-000003.txt',
      'https://www.sec.gov/Archives/edgar/data/1058307/0001493152-21-003451.txt'}
Cash=[]
for url in urls:
    response = requests.get(url, headers = headers)
    response.raise_for_status()
    time.sleep(0.1)
    soup = BeautifulSoup(response.text,'lxml')
    for table in soup.find_all('table'):
        for tr in table.find_all('tr'):
            row = [td.get_text(strip=True) for td in tr.find_all('td')]
            headers = [header.get_text(strip=True).encode("utf-8") for header in tr[0].find_all("th")]
            try:
                if '2020' in headers[0]:
                    if row[0] == 'Cash and cash equivalent':
                        Cash_and_cash_equivalent = f'{url}'+ ' ' + headers+ str(row)
                        Cash.append(Cash_and_cash_equivalent)
                    if row[0] == 'Cash':
                        Cash_ = f'{url}'+ ' ' + headers+ str(row)
                        Cash.append(Cash_)
            except IndexError:
                continue
print(Cash)
You could do something along these lines:
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
headers = {'User-Agent': 'registr#jh.edu'}
r = requests.get('https://www.sec.gov/Archives/edgar/data/1127993/0001091818-21-000003.txt', headers=headers)
dfs = pd.read_html(str(r.text))
for x in range(len(dfs)):
    # keep only tables in which some cell contains the search phrase
    if dfs[x].apply(lambda row: row.astype(str).str.contains('Cash and Cash Equivalents').any(), axis=1).any():
        df = dfs[x]
        df.dropna(how='all')  # note: the result is not assigned, so all-NaN rows are kept
        new_header = df.iloc[2]  # third row of the table holds the column headers
        df = df[3:]
        df.columns = new_header
        display(df) ## or print(df) if you're not in a jupyter notebook
This will return two dataframes, for tables #37 and #71 respectively. You may need to improve the table header detection, as only table #71 comes out with proper headers (years).
I also tried the second URL, but it hung for me (the page is huge).
The printout in terminal will look something like this:
NaN NaN 2020 NaN 2019
3 Cash Flows from Operating Activities NaN NaN NaN NaN
4 Net loss NaN $(13,134,778) NaN $ (2,017,347)
5 Adjustments to reconcile net loss to net cash used in operating activities: NaN NaN NaN NaN
6 Depreciation and amortization NaN 84940 NaN 7832
7 Amortization of convertible debt discounts NaN 74775 NaN 60268
8 Accretion and settlement of financing instruments NaN NaN NaN NaN
9 and change in fair value of derivative liability NaN 1381363 NaN (1,346,797)
10 Stock compensation and stock issued for services NaN 2870472 NaN -
11 Stock issued under Put Purchase Agreement NaN 7865077 NaN -
12 NaN NaN NaN NaN NaN
13 Changes in assets and liabilities: NaN NaN NaN NaN
14 Accounts receivable NaN (696,710) NaN 82359
15 Inventories NaN (78,919) NaN 304970
16 Accounts payable NaN (1,462,072) NaN (22,995)
17 Accrued expenses NaN (158,601) NaN (346,095)
18 Deferred revenue NaN 431147 NaN (91,453)
19 Net cash used in operating activities NaN (2,823,306) NaN (3,369,258)
20 NaN NaN NaN NaN NaN
21 Cash Flows from Investing Activities NaN NaN NaN NaN
22 Acquisition of business, net of cash NaN - NaN 2967918
23 Purchases of property and equipment NaN - NaN (17,636)
24 Net cash provided by investing activities NaN - NaN 2950282
25 NaN NaN NaN NaN NaN
26 Cash Flows from Financing Activities NaN NaN NaN NaN
27 Principal payments on financing lease obligations NaN - NaN (1,649)
28 Principal payments on notes payable NaN (774) NaN -
29 Payments on advances from stockholder, net NaN (33,110) NaN -
30 Proceeds from convertible notes payable NaN 840000 NaN 667000
31 Payments on line of credit, net NaN (300,000) NaN -
32 Proceeds from sale of common stock under Purchase Agreement NaN 2316520 NaN -
33 Net cash provided by financing activities NaN 2822636 NaN 665351
34 NaN NaN NaN NaN NaN
35 Net Increase (Decrease) in Cash and Cash Equivalents NaN (670) NaN 246375
36 NaN NaN NaN NaN NaN
37 Cash, Beginning of Period NaN 412391 NaN 169430
38 NaN NaN NaN NaN NaN
39 Cash, End of Period NaN $ 411,721 NaN $ 415,805
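The header detection mentioned in the answer could be made less dependent on a fixed row position. The sketch below is one way to do that, under the assumption that the header row is the first row with a cell equal to the year of interest. The helper name `promote_year_header`, the "label" column name, and the small `raw` frame are all made up for illustration; the frame only imitates the shape of what `pd.read_html` returned above.

```python
import pandas as pd

def promote_year_header(raw, year="2020"):
    """Use the first row containing `year` as the header and return the rows below it."""
    has_year = raw.apply(
        lambda row: (row.astype(str).str.strip() == year).any(), axis=1
    )
    if not has_year.any():
        raise ValueError(f"no row contains {year!r}")
    pos = int(has_year.to_numpy().argmax())  # position of the first matching row
    header = raw.iloc[pos].tolist()
    header[0] = "label"                      # assumption: first column holds the row captions
    body = raw.iloc[pos + 1:].copy()
    body.columns = header
    return body

# Hypothetical stand-in for one parsed table.
raw = pd.DataFrame([
    ["Statement of Cash Flows", None, None],
    [None, "2020", "2019"],
    ["Cash, Beginning of Period", "412391", "169430"],
    ["Cash, End of Period", "411721", "415805"],
])

table = promote_year_header(raw)
# Pick the cell where the row caption and the year column meet.
cash_2020 = table.loc[table["label"] == "Cash, End of Period", "2020"].iloc[0]
print(cash_2020)
```

Real filings often have several unnamed spacer columns, so duplicate or missing column labels may still need handling before selecting by year.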

tabula.read_pdf in python, getting a list variable and can't read it

I am using tabula to extract some data from a PDF. When I read the file, it outputs a list, not a dataframe, and I'm having problems reading the values.
import tabula
file = "example.pdf"
path = 'data/' + file
df = tabula.read_pdf(path, pages = '1', multiple_tables = False)
cliente_raw = tabula.read_pdf(path, pages=1,output_format="dataframe")
print(cliente_raw)
This is the output
[ Beneficiario: Nury García Unnamed: 1 NIT/Cédula:
0 Dirección: Calle 115 #53-74 Apto 307 NaN Ciudad:
1 Referencia Descripción NaN
2 Spectral + Porcelai Perfect Face Kit, -/- NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN
39564525 Teléfono: 601 6299329 Unnamed: 5 Unnamed: 6
0 BOGOTA (C/MARCA) País: COLOMBIA NaN NaN
1 Cantidad IVA Valor Unitario NaN Valor Total
2 1 19% 125,210 NaN 125,210
3 NaN Subtotal NaN 125,210
4 NaN IVA NaN 23,790
5 NaN TOTAL NaN 149,000 ]
The length of this variable is 1, so I don't know how to extract the values. Any help?
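Since the printed value is a list of length 1, the DataFrame is its only element and can be taken out by indexing. The sketch below avoids the PDF entirely: `cliente_raw` is a hand-built stand-in for the list tabula returned, with a few made-up column names borrowed from the output above.

```python
import pandas as pd

# Stand-in for the result of tabula.read_pdf: a list holding a single DataFrame.
cliente_raw = [
    pd.DataFrame({
        "Cantidad": ["1"],
        "IVA": ["19%"],
        "Valor Total": ["125,210"],
    })
]

# The DataFrame is simply the first (and only) element of the list.
cliente_df = cliente_raw[0]

# From here on, normal pandas indexing applies.
valor_total = cliente_df.loc[0, "Valor Total"]
print(type(cliente_df).__name__, valor_total)
```

With the real file, the table headers would likely need cleaning first, because tabula used the first line of the invoice as column names.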

pandas.read_html tables not found

I'm trying to get a list of the major world indices in Yahoo Finance at this URL: https://finance.yahoo.com/world-indices.
I tried first to get the indices in a table by just running
major_indices=pd.read_html("https://finance.yahoo.com/world-indices")[0]
In this case the error was:
ValueError: No tables found
So I looked at a solution using Selenium in the question "pandas read_html - no tables found".
The solution they came up with (with some adjustments) is:
from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.keys import Keys
from webdrivermanager.chrome import ChromeDriverManager
driver = webdriver.Chrome(ChromeDriverManager().download_and_install())
driver.get("https://finance.yahoo.com/world-indices")
html = driver.page_source
tables = pd.read_html(html)
data = tables[1]
This code gave me the same error again:
ValueError: No tables found
I don't know whether to keep using Selenium or whether pd.read_html alone is fine. Either way, I'm trying to get this data and don't know how to proceed. Can anyone help me?
You don't need Selenium here, you just have to set the euConsentId cookie:
import pandas as pd
import requests
import uuid
url = 'https://finance.yahoo.com/world-indices'
cookies = {'euConsentId': str(uuid.uuid4())}
html = requests.get(url, cookies=cookies).content
df = pd.read_html(html)[0]
Output:
>>> df
Symbol Name Last Price Change % Change Volume Intraday High/Low 52 Week Range Day Chart
0 ^GSPC S&P 500 4023.89 93.81 +2.39% 2.545B NaN NaN NaN
1 ^DJI Dow 30 32196.66 466.36 +1.47% 388.524M NaN NaN NaN
2 ^IXIC Nasdaq 11805.00 434.04 +3.82% 5.15B NaN NaN NaN
3 ^NYA NYSE COMPOSITE (DJ) 15257.36 326.26 +2.19% 0 NaN NaN NaN
4 ^XAX NYSE AMEX COMPOSITE INDEX 4025.81 122.66 +3.14% 0 NaN NaN NaN
5 ^BUK100P Cboe UK 100 739.68 17.83 +2.47% 0 NaN NaN NaN
6 ^RUT Russell 2000 1792.67 53.28 +3.06% 0 NaN NaN NaN
7 ^VIX CBOE Volatility Index 28.87 -2.90 -9.13% 0 NaN NaN NaN
8 ^FTSE FTSE 100 7418.15 184.81 +2.55% 0 NaN NaN NaN
9 ^GDAXI DAX PERFORMANCE-INDEX 14027.93 288.29 +2.10% 0 NaN NaN NaN
10 ^FCHI CAC 40 6362.68 156.42 +2.52% 0 NaN NaN NaN
11 ^STOXX50E ESTX 50 PR.EUR 3703.42 89.99 +2.49% 0 NaN NaN NaN
12 ^N100 Euronext 100 Index 1211.74 28.89 +2.44% 0 NaN NaN NaN
13 ^BFX BEL 20 3944.56 14.35 +0.37% 0 NaN NaN NaN
14 IMOEX.ME MOEX Russia Index 2307.50 9.61 +0.42% 0 NaN NaN NaN
15 ^N225 Nikkei 225 26427.65 678.93 +2.64% 0 NaN NaN NaN
16 ^HSI HANG SENG INDEX 19898.77 518.43 +2.68% 0 NaN NaN NaN
17 000001.SS SSE Composite Index 3084.28 29.29 +0.96% 3.109B NaN NaN NaN
18 399001.SZ Shenzhen Component 11159.79 64.92 +0.59% 3.16B NaN NaN NaN
19 ^STI STI Index 3191.16 25.98 +0.82% 0 NaN NaN NaN
20 ^AXJO S&P/ASX 200 7075.10 134.10 +1.93% 0 NaN NaN NaN
21 ^AORD ALL ORDINARIES 7307.70 141.10 +1.97% 0 NaN NaN NaN
22 ^BSESN S&P BSE SENSEX 52793.62 -136.69 -0.26% 0 NaN NaN NaN
23 ^JKSE Jakarta Composite Index 6597.99 -1.85 -0.03% 0 NaN NaN NaN
24 ^KLSE FTSE Bursa Malaysia KLCI 1544.41 5.61 +0.36% 0 NaN NaN NaN
25 ^NZ50 S&P/NZX 50 INDEX GROSS 11168.18 -9.18 -0.08% 0 NaN NaN NaN
26 ^KS11 KOSPI Composite Index 2604.24 54.16 +2.12% 788539 NaN NaN NaN
27 ^TWII TSEC weighted index 15832.54 215.86 +1.38% 0 NaN NaN NaN
28 ^GSPTSE S&P/TSX Composite index 20099.81 400.76 +2.03% 294.637M NaN NaN NaN
29 ^BVSP IBOVESPA 106924.18 1236.54 +1.17% 0 NaN NaN NaN
30 ^MXX IPC MEXICO 49579.90 270.58 +0.55% 212.868M NaN NaN NaN
31 ^IPSA S&P/CLX IPSA 5058.88 0.00 0.00% 0 NaN NaN NaN
32 ^MERV MERVAL 38390.84 233.89 +0.61% 0 NaN NaN NaN
33 ^TA125.TA TA-125 1964.95 23.38 +1.20% 0 NaN NaN NaN
34 ^CASE30 EGX 30 Price Return Index 10642.40 -213.50 -1.97% 36.837M NaN NaN NaN
35 ^JN0U.JO Top 40 USD Net TRI Index 4118.19 65.63 +1.62% 0 NaN NaN NaN

Getting “no table found” error when web scraping with pandas for OTC Markets screener website

I want to extract various statistics from this website (https://www.otcmarkets.com/research/stock-screener). Unfortunately, pandas does not recognize the tables on the page. Here is my code:
import requests
import pandas as pd
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'}
def Get_table(screen):
    tables = pd.read_html(screen)
    tables.columns = tables.iloc[0]
    return tables
screen = requests.get('https://www.otcmarkets.com/research/stock-screener', headers = header).text
table = Get_table(screen)
ValueError: No tables found
The page loads its data from an external source (a separate URL). Here is an example of how to load the data from that API and create a dataframe:
import json
import pandas as pd
import requests
url = "https://www.otcmarkets.com/research/stock-screener/api"
data = json.loads(requests.get(url).json())
df = pd.json_normalize(data["stocks"])
Prints:
securityId reportDate symbol securityName market marketId securityType country state forexCountry caveatEmptor industryId industry volume volumeChange dividendYield dividendPayer morningStarRating penny price shortInterest shortInterestPercent shortInterestRatio pct1Day pct5Day pct4Weeks pct13Weeks pct52Weeks isBank perfQxComp4Weeks perfQxComp13Weeks perfQxComp52Weeks perfQxBillion4Weeks perfQxBillion13Weeks perfQxBillion52Weeks perfQxBanks4Weeks perfQxBanks13Weeks perfQxBanks52Weeks perfQxIntl4Weeks perfQxIntl13Weeks perfQxIntl52Weeks perfQxUs4Weeks perfQxUs13Weeks perfQxUs52Weeks perfQb4Weeks perfQb13Weeks perfQb52Weeks perfSp4Weeks perfSp13Weeks perfSp52Weeks perfQxDiv4Weeks perfQxDiv13Weeks perfQxDiv52Weeks perfQxCan4Weeks perfQxCan13Weeks perfQxCan52Weeks
0 117230 Aug 3, 2021 12:00:00 AM MHGU MERITAGE HOSPTLTY GRP INC OTCQX U.S. Premier 1 Common Stock USA Michigan USA False 5812 Eating places 216 0.625623 1.122500 True 3.0 True 21.3800 171.0 100.00 0.000025 -0.003263 -0.003263 -0.049778 -0.021510 0.388312 N -1.934711 -0.346011 1.087740 -1.748822 -0.327350 1.101413 -11.044760 -0.532955 0.755842 -1.887983 -0.284616 1.105790 2.474310 0.094201 0.557874 1.046712 0.172299 1.418802 -6.221993 -0.463678 1.170963 -1.515941 -0.269834 1.011993 0.034056 0.013836 0.026113
1 130262 Aug 3, 2021 12:00:00 AM MHGUP MERITAGE HOSPTLTY PFD B OTCQX U.S. Premier 1 Preferred Stock USA Michigan USA False 5812 Eating places 0 2.984908 2.100000 True NaN True 38.0000 NaN NaN 0.000000 NaN NaN NaN NaN NaN N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 32227 Aug 3, 2021 12:00:00 AM TYCB TAYLOR(CLVN B)BKG BRLN MD OTCQX U.S. Premier 1 Common Stock USA Maryland USA False 6712 Bank holding companies 1 0.867442 3.300000 True 3.0 True 35.1000 NaN NaN 0.000000 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 31499 Aug 3, 2021 12:00:00 AM STBI STURGIS BANCORP INC OTCQX U.S. Premier 1 Common Stock USA Michigan USA False 6035 Federal savings institutions 1000 1.142256 3.355200 True 3.0 True 19.0750 NaN NaN 0.000000 -0.008576 -0.002614 0.003947 -0.046250 -0.046250 Y 0.153422 -0.743970 -0.129556 0.138681 -0.703847 -0.131184 0.875847 -1.145925 -0.090025 0.149717 -0.611962 -0.131705 -0.196212 0.202544 -0.066446 -0.083004 0.370465 -0.168987 0.493403 -0.996969 -0.139468 0.120214 -0.580178 -0.120534 -0.002701 0.029750 -0.003110
4 27295 Aug 3, 2021 12:00:00 AM PSBP PSB HOLDING CORP OTCQX U.S. Premier 1 Common Stock USA Maryland USA False 6022 State commercial banks 0 5.595744 0.645856 False 3.0 True 27.8700 19.0 -96.31 0.000012 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 24830 Aug 3, 2021 12:00:00 AM OCBI ORANGE CTY BNCRP INC OTCQX U.S. Premier 1 Common Stock USA New York USA False 6712 Bank holding companies 5 0.109266 2.352900 True 3.0 True 34.0000 200.0 100.00 0.000045 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
6 20776 Aug 3, 2021 12:00:00 AM MNBP MARS BANCORP INC OTCQX U.S. Premier 1 Common Stock USA Pennsylvania USA False 6021 National commercial banks 139 1.306208 3.084700 True 3.0 True 20.7475 NaN NaN 0.000000 0.004722 0.037375 -0.090022 -0.949396 -0.943158 Y -3.498879 -15.271837 -2.641976 -3.162702 -14.448208 -2.675186 -19.974187 -23.522956 -1.835841 -3.414373 -12.562040 -2.685816 4.474732 4.157719 -1.355001 1.892953 7.604722 -3.446082 -11.252328 -20.465275 -2.844113 -2.741543 -11.909604 -2.457995 0.061590 0.610687 -0.063426
7 83455 Aug 3, 2021 12:00:00 AM MNAT MARQUETTE NATL CORP OTCQX U.S. Premier 1 Common Stock USA Illinois USA False 6022 State commercial banks 0 0.978290 2.934800 True 3.0 True 36.8000 49.0 68.97 0.000011 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
8 18948 Aug 3, 2021 12:00:00 AM KISB KISH BANCORP INC OTCQX U.S. Premier 1 Common Stock USA Pennsylvania USA False 6712 Bank holding companies 173 0.440998 3.411800 True 3.0 True 34.0000 2.0 0.00 0.000001 0.008005 0.011905 0.035954 0.054264 0.387755 Y 1.397410 0.872875 1.086181 1.263146 0.825800 1.099834 7.977452 1.344475 0.754759 1.363660 0.717994 1.104205 -1.787155 -0.237638 0.557074 -0.756023 -0.434654 1.416768 4.494046 1.169710 1.169285 1.094940 0.680704 1.010542 -0.024598 -0.034904 0.026076
9 266615 Aug 3, 2021 12:00:00 AM KCLI KANSAS CITY LIFE INS NEW OTCQX U.S. Premier 1 Common Stock USA Missouri USA False 6311 Life insurance 106 0.210918 2.494200 True 3.0 True 43.3000 719.0 0.00 0.000074 0.000000 0.000000 -0.029148 -0.048352 0.503472 N -1.132893 -0.777777 1.410328 -1.024044 -0.735830 1.428056 -6.467393 -1.197997 0.980000 -1.105531 -0.639770 1.433731 1.448862 0.211748 0.723321 0.612915 0.387300 1.839572 -3.643364 -1.042273 1.518232 -0.887678 -0.606542 1.312116 0.019942 0.031102 0.033858
10 12485 Aug 3, 2021 12:00:00 AM FBAK FIRST NB ALASKA OTCQX U.S. Premier 1 Common Stock USA Alaska USA False 6021 National commercial banks 360 1.697465 5.493600 True 3.0 True 233.0000 62.0 37.78 0.000020 -0.004274 0.040179 0.000000 -0.046879 0.308989 Y NaN -0.754085 0.865540 NaN -0.713417 0.876420 NaN -1.161505 0.601442 NaN -0.620282 0.879903 NaN 0.205298 0.443913 NaN 0.375502 1.128974 NaN -1.010525 0.931763 NaN -0.588067 0.805266 NaN 0.030154 0.020779
11 11749 Aug 3, 2021 12:00:00 AM FETM FENTURA FINANCIAL INC OTCQX U.S. Premier 1 Common Stock USA Michigan USA False 6022 State commercial banks 19144 0.671837 1.230000 True 3.0 True 26.0000 185.0 -86.89 0.000040 -0.009524 0.006581 0.000000 0.037924 0.477273 Y NaN 0.610042 1.336938 NaN 0.577141 1.353744 NaN 0.939637 0.929004 NaN 0.501797 1.359123 NaN -0.166082 0.685681 NaN -0.303775 1.743845 NaN 0.817497 1.439227 NaN 0.475736 1.243837 NaN -0.024394 0.032096
12 10994 Aug 3, 2021 12:00:00 AM ENBP ENB FINANCIAL CORP PA OTCQX U.S. Premier 1 Common Stock USA Pennsylvania USA False 6021 National commercial banks 20 0.517902 2.912200 True 3.0 True 23.3500 NaN NaN 0.000000 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 8482 Aug 3, 2021 12:00:00 AM CNIG CORNING NATURAL GAS HLDG OTCQX U.S. Premier 1 Common Stock USA New York USA False 4923 Gas transmission and distribution 260 0.173180 2.510000 True 3.0 True 24.3000 1.0 0.00 0.000000 0.019723 0.022727 0.023158 0.031847 0.494465 N 0.900077 0.512288 1.385097 0.813596 0.484660 1.402508 5.138305 0.789068 0.962468 0.878338 0.421389 1.408081 -1.151112 -0.139469 0.710380 -0.486957 -0.255097 1.806662 2.894630 0.686500 1.491071 0.705254 0.399503 1.288642 -0.015844 -0.020485 0.033252
14 6722 Aug 3, 2021 12:00:00 AM CBAF CITBA FINANCIAL CORP OTCQX U.S. Premier 1 Common Stock USA Indiana USA False 6712 Bank holding companies 0 0.317975 2.142900 True 3.0 True 28.0000 3.0 0.00 0.000002 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
15 5489 Aug 3, 2021 12:00:00 AM CPTP CAPITAL PROPERTIES INC A OTCQX U.S. Premier 1 Common Stock USA Rhode Island USA False 6519 Lessors of Real Property, NEC 0 1.535617 2.002900 True 3.0 True 13.9800 NaN NaN 0.000000 NaN NaN NaN NaN NaN N NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
16 107523 Aug 3, 2021 12:00:00 AM BHWB BLACKHAWK BANCORP INC OTCQX U.S. Premier 1 Common Stock USA Wisconsin USA False 6022 State commercial banks 200 3.166295 1.239400 True 3.0 True 35.5000 116.0 24.73 0.000041 0.000000 0.014286 0.021583 0.109375 0.783920 Y 0.838855 1.759389 2.195918 0.758257 1.664503 2.223521 4.788806 2.709957 1.525887 0.818595 1.447207 2.232357 -1.072816 -0.478989 1.126229 -0.453835 -0.876100 2.864263 2.697743 2.357698 2.363928 0.657284 1.372043 2.043000 -0.014766 -0.070354 0.052718
17 1888 Aug 3, 2021 12:00:00 AM CFNB CALIFORNIA FIRST LEASING OTCQX U.S. Premier 1 Common Stock USA California USA False 6172 Finance Lessors 0 0.095971 2.918900 True 3.0 True 18.5000 1002.0 0.00 0.000097 NaN NaN NaN NaN NaN Y NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
18 53651 Aug 3, 2021 12:00:00 AM BNCC BNCCORP INC OTCQX U.S. Premier 1 Common Stock USA North Dakota USA False 6021 National commercial banks 623 0.444949 0.000000 True 3.0 True 39.2500 103.0 0.00 0.000029 0.006410 0.037265 0.012903 0.014212 0.365217 Y 0.501509 0.228610 1.023048 0.453324 0.216281 1.035908 2.862985 0.352124 0.710890 0.489397 0.188046 1.040024 -0.641382 -0.062239 0.524695 -0.271325 -0.113838 1.334421 1.612844 0.306353 1.101322 0.392957 0.178280 0.951806 -0.008828 -0.009142 0.024560
19 7590 Aug 3, 2021 12:00:00 AM CNAF COMML NATL FINCL CORP PA OTCQX U.S. Premier 1 Common Stock USA Pennsylvania USA False 6022 State commercial banks 3100 0.671460 9.951200 True 3.0 True 20.5000 378.0 -70.14 0.000132 0.000000 0.014851 0.006382 0.072737 0.138889 Y 0.248046 1.170032 0.389056 0.224214 1.106931 0.393947 1.416032 1.802181 0.270345 0.242055 0.962425 0.395512 -0.317228 -0.318538 0.199537 -0.134197 -0.582626 0.507468 0.797712 1.567921 0.418823 0.194356 0.912439 0.361963 -0.004366 -0.046787 0.009340
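The answer decodes the response twice (`.json()` followed by `json.loads`), which suggests the API returns a JSON document wrapped inside a JSON string. The following offline sketch illustrates that pattern under this assumption; the payload is invented and only borrows a few field names from the output above.

```python
import json
import pandas as pd

# Hypothetical payload: a JSON document that was itself encoded as a JSON string.
inner = json.dumps({
    "stocks": [
        {"symbol": "MHGU", "price": 21.38, "country": "USA"},
        {"symbol": "TYCB", "price": 35.10, "country": "USA"},
    ]
})
body = json.dumps(inner)  # what the HTTP response body would then contain

# First decode yields a str, second decode yields the actual dict.
data = json.loads(json.loads(body))

# One row per entry in "stocks", one column per key.
df = pd.json_normalize(data["stocks"])
print(df)
```

If the endpoint ever starts returning a plain JSON object, the second `json.loads` call would raise a `TypeError`, so checking the type of the first result is a cheap safeguard.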

Calculate Number of Rows containing NaN values

I have a DataFrame df, shown below, and I have to calculate the number of rows containing NaN values.
Name Age City Country
0 jack NaN Sydeny Australia
1 Riti NaN Delhi India
2 Vikas 31 NaN India
3 Neelu 32 Bangalore India
4 Steve 16 New York US
5 John 11 NaN NaN
6 NaN NaN NaN NaN
To get the answer I tried
df.isnull().sum().sum()
It gives me 9, which counts every NaN value, but the answer should be 5, which counts the rows that contain at least one NaN value. I do not know how to calculate this.
You need df.any() over axis=1 after you check isnull():
df.isnull().any(axis=1).sum()
#5
Here is a step-by-step example of how to get it.
Example DF
>>> df
Name Age City Country
0 jack NaN Sydeny Australia
1 Riti NaN Delhi India
2 Vikas 31.0 NaN India
3 Neelu 32.0 Bangalore India
4 John 16.0 New York US
5 John 11.0 NaN NaN
6 NaN NaN NaN NaN
To flag the rows containing NaN with a boolean:
>>> df.isnull().any(axis=1)
0 True
1 True
2 True
3 False
4 False
5 True
6 True
dtype: bool
To get the index of the rows where NaN appears:
>>> df.index[df.isnull().any(axis=1)]
Int64Index([0, 1, 2, 5, 6], dtype='int64')
Finally, your answer directly:
>>> df.isnull().any(axis=1).sum()
5
OR
>>> len(df.index[df.isnull().any(axis=1)])
5
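For anyone who wants to reproduce the numbers, the sketch below rebuilds the question's frame by hand (the constructor call is not part of the original post) and compares the cell count with the row count.

```python
import numpy as np
import pandas as pd

# The question's data, typed in by hand.
df = pd.DataFrame({
    "Name": ["jack", "Riti", "Vikas", "Neelu", "Steve", "John", np.nan],
    "Age": [np.nan, np.nan, 31, 32, 16, 11, np.nan],
    "City": ["Sydeny", "Delhi", np.nan, "Bangalore", "New York", np.nan, np.nan],
    "Country": ["Australia", "India", "India", "India", "US", np.nan, np.nan],
})

# Counts every NaN cell in the frame.
total_nan_cells = int(df.isnull().sum().sum())

# Counts each row once if it has at least one NaN cell.
rows_with_nan = int(df.isnull().any(axis=1).sum())

# Index labels of those rows.
nan_row_labels = df.index[df.isnull().any(axis=1)].tolist()

print(total_nan_cells, rows_with_nan, nan_row_labels)
```

Passing `axis=1` by keyword matters here: recent pandas versions no longer accept the positional form `any(1)`.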
