How to hide the index column of a pandas dataframe? - python

Help please: I need to remove the 'date' index column, otherwise 'date' appears as the first column alongside the stocks.
heat_ds = pd.DataFrame(columns=['PFE','GS','BA','NKE','V','AAPL','TSLA','NVDA','MRK','CVX','UNH'])
heat_ds['PFE'] = pfizer['Close']
heat_ds['GS'] = goldmans['Close']
heat_ds['BA'] = boeingc['Close']
heat_ds['NKE'] = nike['Close']
heat_ds['V'] = visa['Close']
heat_ds['AAPL'] = aaple['Close']
heat_ds['TSLA'] = tesla['Close']
heat_ds['NVDA'] = tesla['Close']
heat_ds['MRK'] = tesla['Close']
heat_ds['CVX'] = chevronc['Close']
heat_ds['UNH'] = unitedh['Close']

First of all, 'date' is the index here, not a regular column. To drop it, first reset the index, which turns 'date' into a normal column, and then drop that column.
heat_ds = heat_ds.reset_index()
heat_ds = heat_ds.drop('date', axis=1)  # the new column takes the index's name; it is called 'index' only if the index is unnamed
or in one line
heat_ds = heat_ds.reset_index(drop=True)
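As a small illustration of the difference between the two calls, here is a sketch on a toy frame. The prices and dates are made up, and the index is assumed to be named 'date' as in the question.

```python
import pandas as pd

# Toy stand-in for heat_ds; the numbers are invented for illustration.
dates = pd.to_datetime(["2023-01-02", "2023-01-03"])
heat_ds = pd.DataFrame(
    {"PFE": [51.3, 50.9], "GS": [346.2, 348.1]},
    index=pd.Index(dates, name="date"),
)

# reset_index() moves the index into a regular column named after the index.
as_column = heat_ds.reset_index()

# reset_index(drop=True) discards the old index and leaves a plain 0..n-1 index.
dropped = heat_ds.reset_index(drop=True)

print(list(as_column.columns))
print(list(dropped.columns))
```

Note that dropping the index this way also throws away the dates themselves, which may or may not be what you want.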

Deleting the index is probably not the best approach here.
If you are only concerned about display, Styler.hide_index() or Styler.hide(axis="index") (depending on your version of pandas) would work. Styler.hide_index() was deprecated in pandas 1.4 and removed in 2.0, so newer versions need Styler.hide(). Usage examples are in the pandas Styler documentation.
For my older version of pandas,
df.style.hide_index()
in a Jupyter cell works just fine. Of course, for exporting to CSV, you would pass index=False to to_csv if needed.
If you still wish to print the index but hide the extra header offset caused by the index name, you can set the name to None:
df.index.name = None
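The non-Styler options can be sketched as follows. The frame is a made-up example, and the Styler call is shown only as a comment because it needs a notebook and the optional jinja2 dependency.

```python
import pandas as pd

# Made-up example frame with a named index.
df = pd.DataFrame(
    {"PFE": [51.3, 50.9]},
    index=pd.Index(["2023-01-02", "2023-01-03"], name="date"),
)

text_with_name = df.to_string()                # header includes the index name "date"
text_without_index = df.to_string(index=False) # plain-text output without the index
csv_without_index = df.to_csv(index=False)     # CSV export without the index column

# In a notebook, on pandas 1.4 or later: df.style.hide(axis="index")

# Keep the index but drop its name, so no extra header row is printed.
df.index.name = None
text_without_name = df.to_string()
```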

Related

In pandas, how do I insert a new row into a dataframe one column value at a time

I want to insert a new row into my dataframe one value at a time, so I know exactly which value goes into which column. Don't judge me.
Here is what I have, but printing it shows an empty dataframe. I check whether the date already exists, to either get the existing row for that date or insert a new row.
if trade["date"] in self.df["date"]:
    row = self.df[self.df.date == trade["date"]]
else:
    row = self.df.append(pd.Series(), ignore_index=True)
row["date"] = trade["date"]
row["direction"] = trade["direction"]
row["type"] = trade["type"]
row["strategy"] = trade["strategy"]
row["strike"] = trade["strike"]
row["shortLeg"] = trade["shortLeg"]
row["longLeg"] = trade["longLeg"]
row["shortLeg_strike"] = trade["shortLeg_strike"]
row["longLeg_strike"] = trade["longLeg_strike"]
row["maxRisk"] = trade["maxRisk"]
row["maxReturn"] = trade["maxReturn"]
row["returnRatio"] = trade["returnRatio"]
row["breakevenPrice"] = trade["breakevenPrice"]
row["profitTargetPrice"] = trade["profitTargetPrice"]
print(self.df)
Oh, figured it out. Silly mistake: row is a separate object that is not linked to the original dataframe, so it has to be assigned back.
self.df = row
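An alternative that avoids the detached copy is to write into the dataframe through .loc when the date already exists, and to concatenate a one-row frame when it does not. The sketch below assumes trade is a plain dict; upsert_trade is a hypothetical helper name, and pd.concat is used because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

def upsert_trade(df, trade):
    """Write `trade` into the row whose date matches, or add a new row.

    Hypothetical helper for illustration; `trade` is assumed to be a dict
    mapping column names to values and to contain a "date" key.
    """
    mask = df["date"] == trade["date"]
    if mask.any():
        # Assign one column at a time, directly on the original frame.
        for col, val in trade.items():
            df.loc[mask, col] = val
        return df
    # No row for this date yet: add one built from the dict.
    return pd.concat([df, pd.DataFrame([trade])], ignore_index=True)

df = pd.DataFrame([{"date": "2024-01-02", "direction": "long", "strike": 100.0}])
df = upsert_trade(df, {"date": "2024-01-02", "direction": "short", "strike": 105.0})
df = upsert_trade(df, {"date": "2024-01-03", "direction": "long", "strike": 110.0})
print(df)
```

Note that a membership test such as value in df["date"] checks the index labels, not the column values, which is why the sketch compares against the column instead.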

Problem importing data from Excel to Treeview

I created a treeview to display the data imported from an excel file.
def afficher():
    fichier = r"*.xlsx"
    df = pd.read_excel(fichier)
    for row in df:
        refOF = row['refOF']
        refP = row['refP']
        refC = row['refC']
        nbreP = row['nbreP']
        dateFF = row['dateFF']
        self.ordreF.insert("", 0, values=(refOF, refP, refC, nbreP, dateFF))
But I encounter the following error:
refOF = row['refOF']
TypeError: string indices must be integers
Please tell me how I can solve this problem.
Another way is replacing the original for loop with the following:
for tup in df[['refOF', 'refP', 'refC', 'nbreP', 'dateFF']].itertuples(index=False, name=None):
    self.ordreF.insert("", 0, values=tup)
It works because df.itertuples(index=False, name=None) yields each row as a regular tuple, without the index and in the selected column order. The tuple can be passed to the values= argument directly.
With your loop you are not actually iterating over the rows but over the column names. That is the reason for the error message: row is a string holding a column name, and indexing a string with [] requires an integer or a slice, not another string.
To make your code work, you only need to change the loop so that it iterates over the rows:
def afficher():
    fichier = r"*.xlsx"
    df = pd.read_excel(fichier)
    for idx, row in df.iterrows():
        refOF = row['refOF']
        refP = row['refP']
        refC = row['refC']
        nbreP = row['nbreP']
        dateFF = row['dateFF']
        self.ordreF.insert("", 0, values=(refOF, refP, refC, nbreP, dateFF))
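The difference between iterating over a dataframe and iterating over its rows can be checked without Excel or Tkinter. The sketch below uses a small made-up frame with three of the question's column names.

```python
import pandas as pd

# Made-up data standing in for the Excel sheet.
df = pd.DataFrame({
    "refOF": ["OF1", "OF2"],
    "refP": ["P1", "P2"],
    "nbreP": [10, 20],
})

# Iterating over the frame itself yields the column labels (strings).
column_names = [item for item in df]

# itertuples yields one plain tuple per row, in the selected column order.
rows = list(df[["refOF", "refP", "nbreP"]].itertuples(index=False, name=None))

print(column_names)
print(rows)
```

Each tuple in rows has the shape that the values= argument of Treeview.insert expects.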

Randomization of a list with conditions using Pandas

I'm new to programming, as you can tell from this 'beautiful' piece of hard-coding. With sweat and tears (not so bad, just a little), I've created very sequential code, and that is exactly my problem. My goal is a somewhat automated script, probably using a for loop (which I have tried without success).
The main aim is a randomization loop that takes the original dataset, which looks like this:
[image: dataset]
Rows should be picked at random from this dataset, one by one, and saved in that order to another Excel sheet. Each newly picked row must not match the previous pick in either of the columns position01 and position02. The result should be an Excel sheet of randomized rows in which no row contains values from the row before it: row02 should contain none of the position01 and position02 values of row01, row03 none of those of row02, and so on. The loop should run over the full length of the list, which is rows 0-11. The Excel output also matters, since I need the rest of the columns; I only want to shuffle the row order.
I hope my aim and description are clear enough; if not, I'm happy to answer any questions. I would appreciate any hint that helps me get unstuck. Thank you. Code below. (PS: I'm aware that there is probably a much neater solution than this.)
import pandas as pd
import random
dataset = pd.read_excel("C:\\Users\\ibm\\Documents\\Psychopy\\DataInput_Training01.xlsx")
# original data set use for comparisons
imageDataset = dataset.loc[0:11, :]
# creating empty df for storing rows from imageDataset
emptyExcel = pd.DataFrame()
randomPick = imageDataset.sample() # select randomly one row from imageDataset
emptyExcel = emptyExcel.append(randomPick) # append a row to empty df
randomPickIndex = randomPick.index.tolist() # get index of the row
imageDataset2 = imageDataset.drop(index=randomPickIndex) # delete the row with index selected before
# getting raw values from the row 'position01'/02 are columns headers
randomPickTemp1 = randomPick['position01'].values[0]
randomPickTemp2 = randomPick
randomPickTemp2 = randomPickTemp2['position02'].values[0]
# getting a dataset which not including row values from position01 and position02
isit = imageDataset2[(imageDataset2.position01 != randomPickTemp1) & (imageDataset2.position02 != randomPickTemp1) & (imageDataset2.position01 != randomPickTemp2) & (imageDataset2.position02 != randomPickTemp2)]
# pick another row from dataset not including row selected at the beginning - randomPick
randomPick2 = isit.sample()
# save it in empty df
emptyExcel = emptyExcel.append(randomPick2, sort=False)
# get index of this second row to delete it in next step
randomPick2Index = randomPick2.index.tolist()
# delete the another row
imageDataset3 = imageDataset2.drop(index=randomPick2Index)
# AND REPEAT the procedure of comparison of the raw values with dataset already not including the original row:
randomPickTemp1 = randomPick2['position01'].values[0]
randomPickTemp2 = randomPick2
randomPickTemp2 = randomPickTemp2['position02'].values[0]
isit2 = imageDataset3[(imageDataset3.position01 != randomPickTemp1) & (imageDataset3.position02 != randomPickTemp1) & (imageDataset3.position01 != randomPickTemp2) & (imageDataset3.position02 != randomPickTemp2)]
# AND REPEAT with another pick - save - matching - picking again.. until end of the length of the dataset (which is 0-11)
In the end I used a solution provided by David Bridges (post from Sep 19 2019) on the PsychoPy forum. In case anyone is interested, here is the link: https://discourse.psychopy.org/t/how-do-i-make-selective-no-consecutive-trials/9186
I just adjusted the condition in the for loop to my case, like this:
remaining = [choices[x] for x in choices if last['position01'] != choices[x]['position01'] and last['position01'] != choices[x]['position02'] and last['position02'] != choices[x]['position01'] and last['position02'] != choices[x]['position02']]
Thank you very much for the helpful answer! Hopefully I did not spam too much here.
import itertools as it
import random
import pandas as pd
# list of ordered pairs of distinct numbers
tmp1 = list(it.permutations(range(6), 2))
df = pd.DataFrame(tmp1, columns=["position01", "position02"])
# start with one random row; DataFrame.append was removed in pandas 2.0, so pd.concat is used below
i = random.choice(df.index)
df1 = df.loc[[i]]
df = df.drop(index=i)
while not df.empty:
    val = list(df1.iloc[-1])
    # rows that share no value with the previous pick
    tmp = df[(df["position01"] != val[0]) & (df["position01"] != val[1]) & (df["position02"] != val[0]) & (df["position02"] != val[1])]
    if tmp.empty:  # no valid row left; the author reports never hitting this in 10000 loops
        print("here")
        break
    i = random.choice(tmp.index)
    df1 = pd.concat([df1, df.loc[[i]]], ignore_index=True)
    df = df.drop(index=i)
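The same idea can be wrapped in a reusable function that keeps all other columns and works on any dataframe. This is only a sketch: the function name and parameters are hypothetical, it assumes a unique index, and because a greedy random pick can run into a dead end it simply restarts up to max_tries times.

```python
import itertools
import random
import pandas as pd

def shuffle_without_adjacent_overlap(df, cols, seed=None, max_tries=1000):
    """Return df's rows in a random order in which no row shares a value
    in `cols` with the row directly before it (hypothetical helper)."""
    rng = random.Random(seed)
    # Values of the watched columns, per row label (assumes a unique index).
    values = {i: set(df.loc[i, cols]) for i in df.index}
    for _ in range(max_tries):
        remaining = list(df.index)
        order = []
        prev = set()
        while remaining:
            candidates = [i for i in remaining if prev.isdisjoint(values[i])]
            if not candidates:
                break  # dead end: give up on this attempt and restart
            pick = rng.choice(candidates)
            order.append(pick)
            remaining.remove(pick)
            prev = values[pick]
        if not remaining:
            return df.loc[order].reset_index(drop=True)
    raise RuntimeError("no valid order found within max_tries attempts")

# Demo on the same kind of data as above: ordered pairs of six numbers.
pairs = list(itertools.permutations(range(6), 2))
demo = pd.DataFrame(pairs, columns=["position01", "position02"])
shuffled = shuffle_without_adjacent_overlap(demo, ["position01", "position02"], seed=0)
```

The result is an ordinary dataframe, so it can be written out with shuffled.to_excel(...) if an Excel writer such as openpyxl is installed. For some datasets no valid order exists at all, in which case the function raises after max_tries attempts.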

df.replace() not being converted into the text or csv file

When I use:
df = df.replace(oldvalue, newvalue)
the values are replaced in the dataframe, but when I write the new dataframe to either a text file or a CSV file, the output is not updated and still shows the original values from before the replace.
I am getting the data from two files and trying to add them together. Right now I am trying to change the formatting to match the original formatting.
I have tried moving the replacement to different places, and have edited my df.replace call numerous times to include regex=True, to_replace=, value=, and other small changes. Below is a small sample of the code.
drdf['adg'] = adgvals  # adds adg values into dataframe
for column, valuex in drdf.iteritems():
    #value = value.replace('444.000', '444.0')
    for indv in valuex:
        valuex = valuex.replace('444.000', '444.0')
    for difindv in valuex:
        fourspace = '    '
        if len(difindv) == 2:
            indv1 = difindv + fourspace
            value1 = valuex.replace(difindv, indv1)
            drdf = drdf.replace(to_replace=valuex, value=value1)
# Transfers new dataframe into new text file
np.savetxt(r'/Users/username/test.txt', drdf.values, fmt='%s', delimiter='')
drdf.to_csv(r'/Users/username/089010219.tot')
It should replace the values (for example, 40 with 40 followed by four spaces). It does this within the Spyder interface, but the change does not carry over into the files that are created.
Did you try:
df.replace(old, new, inplace=True)
With inplace=True, replace modifies df itself and returns None, instead of returning a new dataframe that you have to assign back. I do not claim to know all the inner technical workings of inplace, however.
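To see what actually ends up in the output, it can help to check the return value of replace on a tiny frame first. The sketch below uses made-up values; it also relies on the fact that replace with plain strings matches whole cell values, not substrings, unless regex=True is passed.

```python
import pandas as pd

df = pd.DataFrame({"code": ["40", "444.000", "123"]})

# Without assignment (and without inplace=True) the result is thrown away.
untouched = df.copy()
untouched.replace("444.000", "444.0")

# Assigning the result back keeps the change.
replaced = df.replace("444.000", "444.0")

# Whatever frame is passed to to_csv is what gets written.
csv_text = replaced.to_csv(index=False)

print(untouched["code"].tolist())
print(replaced["code"].tolist())
```

If the file still shows the old values, the frame being written is probably not the one that received the replacement.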
This is how I would do it with map:
drdf['adg'] = adgvals  # adds adg values into dataframe
def fix_value(v):
    # Series.map with a dict would turn every unmatched value into NaN, so map a function instead
    if v == '444.000':
        return '444.0'
    if len(v) == 2:
        return v + '    '  # pad two-character values with four spaces
    return v
for column in drdf.columns:
    drdf[column] = drdf[column].map(fix_value)  # assumes every cell holds a string
# Transfers new dataframe into new text file
np.savetxt(r'/Users/username/test.txt', drdf.values, fmt='%s', delimiter='')
drdf.to_csv(r'/Users/username/089010219.tot')

Inserting Values into a Multilevel Index Dataframe

I have created a dataframe as shown:
idx = pd.MultiIndex.from_product([['batch1', 'batch2','batch3', 'batch4', 'batch5'], ['quiz1', 'quiz2']])
cols=['noofpresent', 'lesserthan50', 'between50and60', 'between60and70', 'between70and80', 'greaterthan80']
statdf = pd.DataFrame('-', idx, cols)
statdf
statdf.loc['quiz1', 'noofpresent'] = qdf1.b4ispresent.count()
statdf.loc['quiz2', 'noofpresent'] = qdf2.b4ispresent.count()
statdf.noopresent = qdf1.b4ispresent.count()
statdf.noopresent = qdf2.b4ispresent.count()
statdf
Then I made some calculations. I now want to insert the resulting figures, 50 and 53, into column 'noofpresent' for 'batch4', at 'quiz1' and 'quiz2' respectively. But instead this happened...
How can I insert my data into the right place?
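You can index it like this, passing the full (batch, quiz) row key as a tuple and the column label in the same .loc call. Chaining a second [] after .loc may assign to a temporary copy and leave statdf unchanged.
statdf.loc[('batch4', 'quiz1'), 'noofpresent'] = qdf1.b4ispresent.count()
statdf.loc[('batch4', 'quiz2'), 'noofpresent'] = qdf2.b4ispresent.count()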
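A self-contained sketch of the assignment, using the figures 50 and 53 from the question in place of the qdf1 and qdf2 counts. The dtype=object argument is an addition here, so that the placeholder '-' and integer counts can share a column.

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["batch1", "batch2", "batch3", "batch4", "batch5"], ["quiz1", "quiz2"]]
)
cols = ["noofpresent", "lesserthan50", "between50and60",
        "between60and70", "between70and80", "greaterthan80"]
statdf = pd.DataFrame("-", idx, cols, dtype=object)

# Row key as a tuple (one label per index level), column label second.
statdf.loc[("batch4", "quiz1"), "noofpresent"] = 50
statdf.loc[("batch4", "quiz2"), "noofpresent"] = 53

print(statdf.loc["batch4"])
```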
