Plotly Choropleth Map Not Showing Up - python

I'm trying to display a Plotly Choropleth Map in Jupyter Notebooks (I'm a beginner with this type of stuff) and for some reason it won't display correctly.
The csv file I am using for it can be found here:
https://www.kaggle.com/ajaypalsinghlo/world-happiness-report-2021
Here is the code leading up to the choropleth:
# here we're assigning the hover data columns to use for our choropleth map below
hover_data_cols_df = ['Country', 'Life Ladder', 'Log GDP per capita', 'Social support', 'Healthy life expectancy at birth', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
df.groupby('Year').Country.count()
and here is the code for the actual choropleth:
choropleth_map = px.choropleth(
    df,
    locations="Country",
    color='Life Ladder',
    hover_name='Life Ladder',
    hover_data=hover_data_cols_df,
    color_continuous_scale=px.colors.sequential.Oranges,
    animation_frame="Year",
).update_layout(title_text='World Happiness Index - year wise data', title_x=0.5)
iplot(choropleth_map)
I'm not getting any error messages attached to it currently; however, when I check the console log in my browser, I find this error:
Wolrd-Happiness-Report.ipynb:1 Uncaught ReferenceError: require is not defined
at <anonymous>:1:17
at t.attachWidget (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at t.insertWidget (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at x._insertOutput (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at x.onModelChanged (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at m (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at Object.l [as emit] (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at e.emit (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at c._onListChanged (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
at m (jlab_core.64abc115a1efeec58694.js?v=64abc115a1efeec58694:2)
I'm not too sure if this is related or not!
Thanks all!

px.choropleth needs a setting that tells it how to match a country name to a country on the map: set locationmode='country names' so the locations column is interpreted as country names.
import pandas as pd
df = pd.read_csv('./data/world-happiness-report.csv', sep=',')
df.sort_values('year', ascending=True, inplace=True)
hover_data_cols_df = ['Country name', 'year', 'Life Ladder', 'Log GDP per capita', 'Social support', 'Healthy life expectancy at birth', 'Freedom to make life choices', 'Generosity', 'Perceptions of corruption']
import plotly.express as px
fig = px.choropleth(
    df,
    locations="Country name",
    locationmode='country names',
    color='Life Ladder',
    hover_name='Life Ladder',
    hover_data=hover_data_cols_df,
    color_continuous_scale=px.colors.sequential.Oranges,
    animation_frame="year"
)
fig.update_layout(title_text='World Happiness Index - year wise data', title_x=0.5)
fig.show()
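Separately, the `Uncaught ReferenceError: require is not defined` in the browser console usually points at the notebook front end rather than the figure itself: older plotly renderers rely on RequireJS, which JupyterLab does not provide. A hedged guess at an environment fix (package versions assumed, not taken from the original answer):

```shell
# Upgrading plotly (v5+ renders in JupyterLab without RequireJS) often
# clears the "require is not defined" error; restart JupyterLab afterwards.
pip install --upgrade plotly "ipywidgets>=7.6"
```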

Related

Folium Choropleth Highlight map elements is not working

This is my first question so hopefully, I'm asking it in a way that makes sense. If not, please correct me.
I want to use the highlight parameter in folium.Choropleth to make countries light up on mouse hover, but it's not working.
I noticed one strange thing:
I also have folium.features.GeoJsonTooltip in my code, and if I disable it, highlighting works. With it enabled, the code runs without errors, but countries are not highlighted as they should be; all other functionality works as expected.
folium.Choropleth(
    geo_data=df1,
    name="choropleth",
    data=df3,
    columns=["Country", "Estimate_UN"],
    key_on="feature.properties.name",
    fill_color="YlGnBu",
    fill_opacity=0.8,
    line_opacity=0.5,
    legend_name="GDP Per Capita (in EUR)",
    bins=bins,
    highlight=True
).add_to(my_map)
Here's my full code:
import folium
import pandas
import geopandas
pandas.set_option('display.max_columns',25)
pandas.set_option('display.width',2000)
pandas.set_option('display.max_rows',300)
url = 'http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)_per_capita'
tables1 = pandas.read_html(url, match='Country/Territory')
df1 = tables1[0] # changes data from List to DataFrame
# makes it single index
df1.columns = ['Country', 'Region', 'Estimate_IMF', 'Year1', 'Estimate_UN', 'Year2', 'Estimate_WB', 'Year3']
# makes it two columns only (Country, Estimate_UN)
df1 = df1.drop(columns=['Region', 'Year1', 'Year2', 'Year3', 'Estimate_IMF', 'Estimate_WB'])
df1['Country'] = df1['Country'].map(lambda x: x.rstrip('*'))
df1['Country'] = df1['Country'].map(lambda x: x.strip())
df1['Country'] = df1['Country'].str.replace('United States', 'United States of America')
df1['Country'] = df1['Country'].str.replace('DR Congo', 'Dem. Rep. Congo')
df1['Country'] = df1['Country'].str.replace('Central African Republic', 'Central African Rep.')
df1['Country'] = df1['Country'].str.replace('South Sudan', 'S. Sudan')
df1['Country'] = df1['Country'].str.replace('Czech Republic', 'Czechia')
df1['Country'] = df1['Country'].str.replace('Bosnia and Herzegovina', 'Bosnia and Herz.')
df1['Country'] = df1['Country'].str.replace('Ivory Coast', """Côte d'Ivoire""")
df1['Country'] = df1['Country'].str.replace('Dominican Republic', 'Dominican Rep.')
df1['Country'] = df1['Country'].str.replace('Eswatini', 'eSwatini')
df1['Country'] = df1['Country'].str.replace('Equatorial Guinea', 'Eq. Guinea')
df1.drop(df1[df1['Estimate_UN'] == '—'].index, inplace = True)
df1['Estimate_UN'] = df1['Estimate_UN'].apply(lambda g:int(str(g)))
### --- Change 'GDP Per Capita' values in GeoJsonToolTip from format of 12345.0 (USD) to €11,604 --- ###
df2 = df1.copy()
df2['Estimate_UN'] = df2['Estimate_UN'].apply(lambda g:g*0.94) # Convert USD to EUR
df3 = df2.copy()
df2['Estimate_UN'] = df2['Estimate_UN'].apply(lambda g:str(int(g)))
df2['Estimate_UN'] = '€' + df2['Estimate_UN'].astype(str)
length = (df2['Estimate_UN'].str.len())
df2.loc[length == 7, 'Estimate_UN'] = df2[['Estimate_UN']].astype(str).replace(r"(\d{3})(\d+)", r"\1,\2", regex=True)
df2.loc[length == 6, 'Estimate_UN'] = df2[['Estimate_UN']].astype(str).replace(r"(\d{2})(\d+)", r"\1,\2", regex=True)
df2.loc[length == 5, 'Estimate_UN'] = df2[['Estimate_UN']].astype(str).replace(r"(\d{1})(\d+)", r"\1,\2", regex=True)
### --- Create map --- ###
world = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
df1 = world.merge(df1, how='left', left_on=['name'], right_on=['Country'])
df1 = df1.dropna(subset=['Estimate_UN'])
df2 = world.merge(df2, how='left', left_on=['name'], right_on=['Country'])
df2 = df2.dropna(subset=['Estimate_UN'])
df3 = world.merge(df3, how='left', left_on=['name'], right_on=['Country'])
df3 = df3.dropna(subset=['Estimate_UN'])
my_map = folium.Map(
    location=(39.22753573470106, -3.650093262568073),
    zoom_start=2,
    tiles='https://server.arcgisonline.com/arcgis/rest/services/World_Street_Map/MapServer/tile/{z}/{y}/{x}',
    attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
    min_zoom=2,
    min_lon=-179,
    max_lon=179,
    min_lat=-65,
    max_lat=179,
    max_bounds=True)
### --- Add tooltip --- ###
gdp = folium.FeatureGroup(name="GDP")
gdp.add_child(folium.GeoJson(
    data=df2,
    tooltip=folium.features.GeoJsonTooltip(
        fields=['Country', 'Estimate_UN'],
        aliases=['Country:', 'GDP Per Capita:'],
        style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;"),
        localize=True),
    style_function=lambda y: {
        'stroke': 'false',
        'opacity': '0',
    }))
### --- Color countries --- ###
bins = [100,1000,5000,10000,20000,35000,50000,112000]
folium.Choropleth(
    geo_data=df1,
    name="choropleth",
    data=df3,
    columns=["Country", "Estimate_UN"],
    key_on="feature.properties.name",
    fill_color="YlGnBu",
    fill_opacity=0.8,
    line_opacity=0.5,
    legend_name="GDP Per Capita (in EUR)",
    bins=bins,
    highlight=True
).add_to(my_map)
my_map.add_child(gdp)
my_map.save('index.html')
I'm looking forward to your suggestions on why GeoJsonTooltip is stopping the highlight parameter from working!
My understanding is that folium.Choropleth() has a highlighting feature but no popup or tooltip feature. If you want tooltips and popups, use folium.GeoJson() instead. The code below works from the df3 you constructed.
I have implemented my own colormap for the color coding, with the colormap's index adjusted to match the number of colors; see the branca documentation for more on custom colormaps.
The tooltip is configured the same way you set it up, and a popup is added as well so you can show supplementary information; if you don't need it, delete it.
The color fill is specified by the style function, which looks up each feature's estimated value in the colormap. A highlight function is also added to change the fill opacity of the hovered country.
import folium
from folium.features import GeoJsonPopup, GeoJsonTooltip
import branca
bins = [5000,25000,45000,65000,112000]
my_map = folium.Map(
    location=(39.22753573470106, -3.650093262568073),
    zoom_start=2,
    tiles='https://server.arcgisonline.com/arcgis/rest/services/World_Street_Map/MapServer/tile/{z}/{y}/{x}',
    attr='Tiles © Esri — Source: Esri, i-cubed, USDA, USGS, AEX, GeoEye, Getmapping, Aerogrid, IGN, IGP, UPR-EGP, and the GIS User Community',
    min_zoom=2,
    min_lon=-179,
    max_lon=179,
    min_lat=-65,
    max_lat=179,
    max_bounds=True)
colormap = branca.colormap.LinearColormap(
    vmin=df3['Estimate_UN'].quantile(0.0),
    vmax=df3['Estimate_UN'].quantile(1),
    colors=["red", "orange", "lightblue", "green", "darkgreen"],
    caption="Original Colormap",
    index=bins
)
tooltip = folium.features.GeoJsonTooltip(
    fields=['Country', 'Estimate_UN'],
    aliases=['Country:', 'GDP Per Capita:'],
    style=("background-color: white; color: #333333; font-family: arial; font-size: 12px; padding: 10px;"),
    localize=True)
popup = GeoJsonPopup(
    fields=['Country', 'Estimate_UN'],
    aliases=['Country:', 'GDP Per Capita:'],
    localize=True,
    labels=True,
    style="background-color: yellow;",
)
folium.GeoJson(
    data=df3,
    tooltip=tooltip,
    popup=popup,
    style_function=lambda y: {
        "fillColor": colormap(y["properties"]["Estimate_UN"]),
        'stroke': 'false',
        'opacity': 0.4
    },
    highlight_function=lambda x: {'fillOpacity': 0.8},
).add_to(my_map)
colormap.add_to(my_map)
# my_map.save('index.html')
my_map

How do I determine a cut-off or threshold when working with fuzzymatcher in Python

The screenshot shows my output and my code. How do I use best_match_score? I need to filter by the returned precision score, and the column only appears after the merge (i.e., just return everything with 'best_match_score' below -1.06).
import fuzzymatcher
import pandas as pd
import os
# pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
REDCAP = pd.read_csv(r"C:\Users\Selamola\Desktop\PythonThings\FuzzyMatching\REDCAP Form A v1 and v2 23 Feb 211.csv")
covidSheet = pd.read_csv(r"C:\Users\Selamola\Desktop\PythonThings\FuzzyMatching\Cases missing REC ID 23 Feb 211.csv")
Data_merge = fuzzymatcher.fuzzy_left_join(
    covidSheet, REDCAP,
    left_on=['Participant Name', 'Particfipant Surname', 'Screening Date',
             'Screening Date', 'Hospital Number', 'Alternative Hospital Number'],
    right_on=['Patient Name', 'Patient Surname', 'Date Of Admission',
              'Date Of Sample Collection', 'Hospital Number', 'Hospital Number'])
# Merged_data = pd.merge(REDCAP, covidSheet, how='left',
# left_on=['Patient Name', 'Patient Surname'],
# right_on=['Participant Name', 'Particfipant Surname'])
# Data_merge.to_csv(r'C:\Users\Selamola\Desktop\PythonThings\FuzzyMatching\DataMacth.csv')
print(Data_merge)
This seems very straightforward unless I'm missing something. Be sure to read the pandas documentation about slicing and boolean indexing.
mask = Data_merge['best_match_score'] < -1.06
filtered_data = Data_merge[mask]
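As a sanity check, the same boolean-mask filtering can be sketched on a toy frame (the data here is made up; only the 'best_match_score' column name comes from fuzzymatcher's merged output):

```python
import pandas as pd

# Toy stand-in for the merged output; fuzzymatcher adds a
# 'best_match_score' column to the joined frame.
Data_merge = pd.DataFrame({
    "name": ["alice", "bob", "carol"],
    "best_match_score": [-2.5, -1.06, 0.3],
})

# Keep only rows scoring strictly below the chosen threshold.
mask = Data_merge["best_match_score"] < -1.06
filtered_data = Data_merge[mask]
print(filtered_data["name"].tolist())  # ['alice']
```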

Pandas - EmptyDataError: No columns to parse from file when reading stock .csv file

Let me first start by saying I have gone through and done my due diligence trying to find a solution based on questions previously asked on the web.
I've run into an odd bug in my code that I really cannot explain...
So far my code executes the following:
take stock symbols and write OHLC data to a CSV file
loop through the directory that contains the CSV files and use that data to calculate technical indicators
add the technical indicator data to the same CSV file
So the bug is that it executes everything perfectly (99 stocks) EXCEPT for ZM.csv (Zoom). The error that it prints is:
pandas.errors.EmptyDataError: No columns to parse from file.
So to troubleshoot I copied and pasted the data from ZM.csv into a CSV that I know ran fine (I used AAPL) and it actually executed fine. Next, I took the working data from AAPL.csv, pasted it into ZM.csv and ran it again. It throws the same error. I also tried renaming the file to ZMI (randomly) and it worked.
This led me to believe that for some unknown reason that the FILENAME is the root issue. The part where I first create the CSV files, I changed the name of the file to be {symbol}1.csv, {symbol}_.csv, and {symbol}I.csv to no avail. Lastly, I combined the two files together and did not mess with anything else. It worked. Does anyone know why?
The flow is to first run bars.py, check the data/ohlc/ directory CSV files (should only have the OHLC data), run technical_analysis.py, and then check the CSV files again (now with technical indicators).
[bars.py]
from config import *
from datetime import datetime
import requests, json
holdings = open('data/qqq.csv').readlines()
symbols_list = [holding.split(',')[2].strip() for holding in holdings][1:]
symbols = ','.join(symbols_list)
minute_bars_url = '{}/1Min?symbols={}&limit=100'.format(BARS_URL, symbols)
r = requests.get(minute_bars_url, headers=HEADERS)
ohlc_data = r.json()
for symbol in ohlc_data:
    filename = 'data/ohlc/{}.csv'.format(symbol)
    f = open(filename, 'w+')
    f.write('Timestamp,Open,High,Low,Close,Volume\n')
    for bar in ohlc_data[symbol]:
        t = datetime.fromtimestamp(bar['t'])
        timestamp = t.strftime('%I:%M:%S%p-%Z%Y-%m-%d')
        line = '{},{},{},{},{},{}\n'.format(timestamp, bar['o'], bar['h'],
                                            bar['l'], bar['c'], bar['v'])
        f.write(line)
The variables symbols_list and symbols print as follows:
symbols_list = ['AAPL', 'MSFT', 'AMZN', 'FB', 'GOOGL', 'GOOG', 'TSLA', 'NVDA', 'PYPL', 'ADBE', 'INTC', 'NFLX', 'CMCSA', 'PEP', 'COST', 'CSCO', 'AVGO', 'QCOM', 'TMUS', 'AMGN', 'TXN', 'CHTR', 'SBUX', 'ZM', 'AMD', 'INTU', 'ISRG', 'MDLZ', 'JD', 'GILD', 'BKNG', 'FISV', 'MELI', 'ATVI', 'ADP', 'CSX', 'REGN', 'MU', 'AMAT', 'ADSK', 'VRTX', 'LRCX', 'ILMN', 'ADI', 'BIIB', 'MNST', 'EXC', 'KDP', 'LULU', 'DOCU', 'WDAY', 'CTSH', 'KHC', 'NXPI', 'BIDU', 'XEL', 'DXCM', 'EBAY', 'EA', 'IDXX', 'CTAS', 'SNPS', 'ORLY', 'SGEN', 'SPLK', 'ROST', 'WBA', 'KLAC', 'NTES', 'PCAR', 'CDNS', 'MAR', 'VRSK', 'PAYX', 'ASML', 'ANSS', 'MCHP', 'XLNX', 'MRNA', 'CPRT', 'ALGN', 'PDD', 'ALXN', 'SIRI', 'FAST', 'SWKS', 'VRSN', 'DLTR', 'CERN', 'MXIM', 'INCY', 'TTWO', 'CDW', 'CHKP', 'CTXS', 'TCOM', 'BMRN', 'ULTA', 'EXPE', 'FOXA', 'LBTYK', 'FOX', 'LBTYA']
symbols = AAPL,MSFT,AMZN,FB,GOOGL,GOOG,TSLA,NVDA,PYPL,ADBE,INTC,NFLX,CMCSA,PEP,COST,CSCO,AVGO,QCOM,TMUS,AMGN,TXN,CHTR,SBUX,ZM,AMD,INTU,ISRG,MDLZ,JD,GILD,BKNG,FISV,MELI,ATVI,ADP,CSX,REGN,MU,AMAT,ADSK,VRTX,LRCX,ILMN,ADI,BIIB,MNST,EXC,KDP,LULU,DOCU,WDAY,CTSH,KHC,NXPI,BIDU,XEL,DXCM,EBAY,EA,IDXX,CTAS,SNPS,ORLY,SGEN,SPLK,ROST,WBA,KLAC,NTES,PCAR,CDNS,MAR,VRSK,PAYX,ASML,ANSS,MCHP,XLNX,MRNA,CPRT,ALGN,PDD,ALXN,SIRI,FAST,SWKS,VRSN,DLTR,CERN,MXIM,INCY,TTWO,CDW,CHKP,CTXS,TCOM,BMRN,ULTA,EXPE,FOXA,LBTYK,FOX,LBTYA
So ZM is not listed last.
[technical_analysis.py]
import btalib
import pandas as pd
from datetime import datetime
from bars import ohlc_data
from bars import symbols_list as symbols
for symbol in symbols:
    try:
        file_path = f'data/ohlc/{symbol}.csv'
        dataframe = pd.read_csv(file_path,
                                parse_dates=True,
                                index_col='Timestamp')
        sma6 = btalib.sma(dataframe, period=6)
        sma10 = btalib.sma(dataframe, period=10)
        rsi = btalib.rsi(dataframe)
        macd = btalib.macd(dataframe)
        dataframe['SMA-6'] = sma6.df
        dataframe['SMA-10'] = sma10.df
        dataframe['RSI'] = rsi.df
        dataframe['MACD'] = macd.df['macd']
        dataframe['Signal'] = macd.df['signal']
        dataframe['Histogram'] = macd.df['histogram']
        f = open(file_path, 'w+')
        dataframe.to_csv(file_path, sep=',', index=True)
    except:
        print(f'{symbol} is not writing the technical data.')
I think the error might be that 'ZM', as the last symbol in holdings, contains some stray whitespace, because in [bars.py] you created holdings the following way (instead of just using pd.read_csv):
holdings = open('data/qqq.csv').readlines()
symbols_list = [holding.split(',')[2].strip() for holding in holdings][1:]
symbols = ','.join(symbols_list)
You can probably reduce the code more to get a minimally viable example. I suspect there is something funny in the qqq.csv file and the split/strip code that makes the last entry not quite what you want.
Hopefully that will become clear by printing the variable values, as below.
with data/qqq.csv like
xname,yname,symbol
xxx,yyy,ZM
and py example
import pandas as pd

def write_OHLC(fname):
    "write example data to a file"
    f = open(fname, 'w+')
    f.write('Timestamp,Open,High,Low,Close,Volume\n')
    # IRL, would parse json and spit out meaningful values
    f.write('2020-10-13 16:30,1,10,5,100,1000\n')

def all_symbols():
    "get list of all symbols from qqq.csv"
    holdings = open('data/qqq.csv').readlines()
    symbols_list = [holding.split(',')[2].strip() for holding in holdings][1:]
    return symbols_list

# issue saving/reading last(?) symbol
symbols = all_symbols()
print(symbols)
# check just zoom
zm_sym = symbols[-1]
fname = f'data/ohlc/{zm_sym}.csv'
# inspect
print(zm_sym)
print(fname)
# write and read back
write_OHLC(fname)
ZM = pd.read_csv(fname,
                 parse_dates=True,
                 index_col='Timestamp')
print(ZM)
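For reference, EmptyDataError is exactly what pandas raises when the file it opens has no content at all, which is consistent with the writer and reader disagreeing about the filename. A minimal reproduction (the temporary path here is illustrative, not the asker's):

```python
import os
import tempfile

import pandas as pd

# An empty CSV, as you'd get if the bars were written to a subtly
# different filename (e.g. one containing a stray character) than
# the one later read back.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "ZM.csv")
open(path, "w").close()  # create the file but write nothing

try:
    pd.read_csv(path)
    raised = None
except pd.errors.EmptyDataError as exc:
    raised = type(exc).__name__

print(raised)  # EmptyDataError
```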

How to merge two or more lists in a custom order in Python

I have the following code:
import pandas as pd
y = pd.ExcelFile('C:\\Users\\vibhu\\Desktop\\Training docs\\excel training\\super store data transformation\\Sample - Superstore data transformation by Vaibhav.xlsx')
superstore_orders = y.parse(sheet_name='Orders Input data')
superstore_orders.dtypes
factual_table= superstore_orders[['Order ID','Customer ID','Postal Code','Product ID','Product Name','Sales','Quantity','Discount','Profit' ]]
Order_table= superstore_orders[['Order ID','Order Date','Ship Date','Ship Mode']]
Order_table1= Order_table.drop_duplicates(subset='Order ID', keep='first', inplace=False)
Customer_table= superstore_orders[['Customer ID','Customer Name','Segment']]
Customer_table1= Customer_table.drop_duplicates(subset='Customer ID', keep='first', inplace=False)
Geographical_table= superstore_orders[['Postal Code','Country','City','State','Region']]
Geographical_table1= Geographical_table.drop_duplicates(subset='Postal Code', keep='first', inplace=False)
Product_table= superstore_orders[['Product ID','Category','Sub-Category','Product Name']]
Product_table1= Product_table.drop_duplicates(subset=['Product ID','Product Name'], keep='first', inplace=False)
Final_factual_data = pd.merge(Order_table1, factual_table, how='left', on='Order ID')
Final_factual_data = pd.merge(Customer_table1, Final_factual_data, how='left', on='Customer ID')
Final_factual_data = pd.merge(Geographical_table1,Final_factual_data,how='left', on='Postal Code')
Final_factual_data = pd.merge(Product_table1,Final_factual_data,how='left', on=['Product ID','Product Name'] )
The output columns come out in this order:
Product ID, Category, Sub-Category, Product Name, Postal Code, Country, City, State, Region, Customer ID, Customer Name, Segment, Order ID, Order Date, Ship Date, Ship Mode, Sales, Quantity, Discount, Profit
I need them reordered as:
Order ID, Order Date, Ship Date, Ship Mode, Customer ID, Customer Name, Segment, Postal Code, Country, City, State, Region, Product ID, Product Name, Category, Sub-Category, Sales, Quantity, Discount, Profit
Final_factual_data1 = Final_factual_data [['Order ID','Order Date','Ship Date','Ship Mode','Customer ID','Customer Name','Segment','Country','City','State','Postal Code','Region','Product ID','Category','Sub-Category','Product Name','Sales','Quantity','Discount','Profit']]
This code got me the desired result.
Note that simply assigning a list to the columns attribute will not reorder anything; it only relabels the columns in their existing positions. To reorder, select the columns in the intended sequence:
Final_factual_data = Final_factual_data[['Order ID', 'Order Date', 'Ship Date', 'Ship Mode', 'Customer ID', 'Customer Name', 'Segment', 'Postal Code', 'Country', 'City', 'State', 'Region', 'Product ID', 'Product Name', 'Category', 'Sub-Category', 'Sales', 'Quantity', 'Discount', 'Profit']]
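The difference between relabeling and reordering is easy to verify on a toy frame (column names here are illustrative, not from the Superstore data):

```python
import pandas as pd

df = pd.DataFrame({"b": [1], "a": [2]})

# Indexing with a list of existing column names returns a reordered copy...
reordered = df[["a", "b"]]
print(list(reordered.columns), reordered.iloc[0].tolist())  # ['a', 'b'] [2, 1]

# ...whereas assigning to .columns only renames positions: the data is
# unchanged, it is merely relabeled.
relabeled = df.copy()
relabeled.columns = ["a", "b"]
print(relabeled.iloc[0].tolist())  # [1, 2] -- same values, new names
```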

How to iterate through pandas columns and replace cells with information with the next row down

I am trying to loop through a pandas dataframe column and based on if the next row down does not include "Property Address" add the information from that next row down to the previous row. For example, if I have a column that goes from top to bottom ["Property Address", "Alternate Address", "Property Address"] I would like to take the information from "Alternate Address" and add that information to the column above it ("Property Address"). I have already double checked that there are no trailing or leading spaces and that everything is lower case so that all comparisons will work. However, I still get this error:
if i == "Property Address" and df.loc[i+1, :] != "Property Address":
TypeError: must be str, not int
Does anyone have ideas on what I can do so that this will work? I am new to Python, and I am really lost. Please let me know if there is any more information that I should provide to make answering this question easier. Thanks
Here is my code so far:
import pandas as pd
import time
df = pd.read_excel('BRH.xls')  # reads the Excel file and creates a dataframe

# Column Headers
df = df[['street', 'state', 'zip', 'Address Type', 'mStreet', 'mState', 'mZip']]
propertyAddress = "Property Address"

# iterates thru column and replaces the current row with info from next row down
for i in df['Address Type']:
    if i == "Property Address" and df.loc[i+1, :] != "Property Address":
        df['mStreet'] == df.loc[i + 1, 'street']
        df['mState'] == df.loc[i + 1, 'state']
        df['mZip'] = df.loc[i + 1, 'zip']
df.to_excel('BRHOut.xls')
print('operation complete in:', time.process_time(), 'ms')
You can use pd.Series.shift to construct an appropriate mask.
Here's some untested pseudo-code:
m1 = df['AddressType'].shift() == 'Property Address'
m2 = df['AddressType'] != 'Property Address'
mask = m1 & m2
for col in ['Street', 'State', 'Zip']:
    df.loc[mask, 'm'+col] = df.loc[mask, col.lower()].shift(-1)
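The direction of the shift matters here: shift() moves values down (each row sees its predecessor) while shift(-1) moves them up (each row sees its successor). A quick check:

```python
import pandas as pd

s = pd.Series(["Property Address", "Alternate Address", "Property Address"])

# shift() pushes values down: row i holds what was at row i-1
print(s.shift().tolist())    # [nan, 'Property Address', 'Alternate Address']

# shift(-1) pulls values up: row i holds what was at row i+1
print(s.shift(-1).tolist())  # ['Alternate Address', 'Property Address', nan]
```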
Your TypeError is happening because i is a string. When you call df.loc[i+1, :], you are attempting to do something like "Property Address" + 1. Once you resolve that, you will still have some indexing issues in the body of your for loop.
@jpp gave a very succinct answer, but I believe it pulls information from the intended destination and writes it to the intended source; in other words, the roles of "Property Address" and "Alternate Address" are reversed. I believe this will deliver the correct result:
Setup
import pandas as pd
df = pd.DataFrame(
    data={
        'street': [
            '123 Main Street',
            '1600 Pennsylvania Ave',
            '567 Fake Ave',
            '1 University Ave'
        ],
        'state': ['CA', 'DC', 'DC', 'CA'],
        'zip': ['95126', '20500', '20500', '94301'],
        'Address Type': [
            'Property Address',
            'Alternate Address',
            'Property Address',
            'Alternate Address'
        ],
        'mStreet': [None, None, None, None],
        'mState': [None, None, None, None],
        'mZip': [None, None, None, None],
    },
    columns=[
        'street',
        'state',
        'zip',
        'Address Type',
        'mStreet',
        'mState',
        'mZip'
    ])
# Create a new dataframe with all address attributes shifted UP one row
next_address_attributes = df[['Address Type', 'street', 'state', 'zip']].shift(-1)
# Create a series to indicate whether information should be drawn from next row
# All the decision-making is right here
get_attributes_from_next_address = ((df['Address Type'] == 'Property Address')
                                    & (next_address_attributes['Address Type'] != 'Property Address'))
Using For Loop
for i, getting_attributes_is_necessary in get_attributes_from_next_address.iteritems():
    if getting_attributes_is_necessary:
        df.at[i, 'mStreet'] = next_address_attributes.at[i, 'street']
        df.at[i, 'mState'] = next_address_attributes.at[i, 'state']
        df.at[i, 'mZip'] = next_address_attributes.at[i, 'zip']
Loopless
df.loc[get_attributes_from_next_address, 'mStreet'] = next_address_attributes.loc[get_attributes_from_next_address, 'street']
df.loc[get_attributes_from_next_address, 'mState'] = next_address_attributes.loc[get_attributes_from_next_address, 'state']
df.loc[get_attributes_from_next_address, 'mZip'] = next_address_attributes.loc[get_attributes_from_next_address, 'zip']
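As a self-contained check of the loopless approach (data copied from the setup above, condensed into one script):

```python
import pandas as pd

# Same four rows as the setup above, built compactly.
df = pd.DataFrame({
    "street": ["123 Main Street", "1600 Pennsylvania Ave",
               "567 Fake Ave", "1 University Ave"],
    "state": ["CA", "DC", "DC", "CA"],
    "zip": ["95126", "20500", "20500", "94301"],
    "Address Type": ["Property Address", "Alternate Address",
                     "Property Address", "Alternate Address"],
    "mStreet": [None] * 4, "mState": [None] * 4, "mZip": [None] * 4,
})

# Shift all address attributes up one row, then mark the rows that
# should pull from their successor.
nxt = df[["Address Type", "street", "state", "zip"]].shift(-1)
mask = ((df["Address Type"] == "Property Address")
        & (nxt["Address Type"] != "Property Address"))

df.loc[mask, "mStreet"] = nxt.loc[mask, "street"]
df.loc[mask, "mState"] = nxt.loc[mask, "state"]
df.loc[mask, "mZip"] = nxt.loc[mask, "zip"]

print(df.loc[0, "mStreet"])  # 1600 Pennsylvania Ave
```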
