I created a Treeview to display data imported from an Excel file:
def afficher():
    fichier = r"*.xlsx"
    df = pd.read_excel(fichier)
    for row in df:
        refOF = row['refOF']
        refP = row['refP']
        refC = row['refC']
        nbreP = row['nbreP']
        dateFF = row['dateFF']
        self.ordreF.insert("", 0, values=(refOF, refP, refC, nbreP, dateFF))
But I get the following error:
refOF = row['refOF']
TypeError: string indices must be integers
How can I solve this problem?
Another way is to replace the original for loop with the following:
for tup in df[['refOF', 'refP', 'refC', 'nbreP', 'dateFF']].itertuples(index=False, name=None):
    self.ordreF.insert("", 0, values=tup)
This works because df.itertuples(index=False, name=None) yields plain tuples, without the index, with the values in the order of the selected columns. Each tuple can therefore be passed to the values= argument directly.
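A minimal sketch of what itertuples(index=False, name=None) yields, using a small in-memory DataFrame with made-up values in place of the Excel data, and a plain list in place of the Treeview (the extra column is hypothetical, included only to show that the column selection drops it):

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel file
df = pd.DataFrame({
    'refOF': ['OF1', 'OF2'],
    'refP': ['P1', 'P2'],
    'refC': ['C1', 'C2'],
    'nbreP': [10, 20],
    'dateFF': ['2021-01-05', '2021-02-10'],
    'extra': ['x', 'y'],  # a column we do not want to display
})

cols = ['refOF', 'refP', 'refC', 'nbreP', 'dateFF']
rows = []
# Each tup is a plain tuple of the selected columns, in that order, without the index
for tup in df[cols].itertuples(index=False, name=None):
    rows.append(tup)  # in the real code: self.ordreF.insert("", 0, values=tup)

print(rows)
```

Because the tuple order follows the column list, reordering cols is enough to change the order of the Treeview columns' values.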
With your loop you are not iterating over the rows but over the column names: iterating over a DataFrame directly yields its column labels. That is the reason for the error message: row is a string holding a column name, and indexing a string with [] requires an integer or an integer slice, not a string.
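A quick way to see this, using a hypothetical two-column DataFrame: iterating over it directly gives the column labels (strings), and indexing one of those strings with another string raises the same TypeError. iterrows() yields (index, Series) pairs instead, so label-based access works.

```python
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({'refOF': ['OF1', 'OF2'], 'refP': ['P1', 'P2']})

# Iterating over a DataFrame yields its column labels, not its rows
labels = [row for row in df]
print(labels)  # ['refOF', 'refP']

# Each "row" is therefore a str, and str indices must be integers
try:
    labels[0]['refOF']
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# iterrows() yields (index, Series) pairs, so row['refOF'] works
values = [row['refOF'] for _, row in df.iterrows()]
print(values)  # ['OF1', 'OF2']
```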
To make your code work, iterate over the rows with iterrows() instead:
def afficher():
    fichier = r"*.xlsx"
    df = pd.read_excel(fichier)
    for idx, row in df.iterrows():
        refOF = row['refOF']
        refP = row['refP']
        refC = row['refC']
        nbreP = row['nbreP']
        dateFF = row['dateFF']
        self.ordreF.insert("", 0, values=(refOF, refP, refC, nbreP, dateFF))
Related
I'm very new to Python. I think this is a very simple thing, but I can't figure it out. What I need to do is, for each value in one column, remove those strings from a fixed string.
|available_list|
|AE,SG,MO|
|KR,CN|
|SG|
|MO,MY|
all_list = 'AE,SG,MO,MY,KR,CN,US,HK,YS'
For each row, I want to remove the available_list values from all_list.
Here is what I tried:
col1 = df['available_list']
all_ori = 'AE,SG,MO,MY,KR,CN,US,HK,YS'.split(',')
all_c = all_ori.copy()
result = []
for i in col1:
    for s in i:
        all_c.remove(s)
    result.append(all_c)
    all_c = all_main.copy()
result_df = pd.DataFrame({'Non-Priviliges': result})
But the result was:
|Non-Priviliges|
|[MY, KR, CN, US, HK, YS]|
|[SG, MO, US, HK, YS]|
|[AE, SG, KR, CN, US, HK, YS]|
The problem is the "[" and "]" brackets. How do I remove them?
After that, I want to paste this series into the existing Excel file, next to the column named "Priviliges".
Could you give me some advice? Thanks!
Assuming your file is named "hello.xlsx", here is my answer:
import pandas as pd

df = pd.read_excel('hello.xlsx')

all_list_str = 'AE,SG,MO,MY,KR,CN,US,HK,YS'
all_list = all_list_str.split(',')

def find_non_priv(row):
    # convert the comma-separated cell value to a list
    row_list = row.split(',')
    # keep the codes that are not in this row, in all_list order
    # (a set difference would return them in arbitrary order)
    return ','.join(code for code in all_list if code not in row_list)

# Series.apply calls the function on each value of the column
df['Non-Priviliges'] = df['available_list'].apply(find_non_priv)
df.to_excel('output.xlsx', index=False)
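The question also asks to place the new column right next to the "Priviliges" column. A minimal sketch, assuming the sheet already has a column named "Priviliges" (the sample rows are made up, and the Excel read/write is left out so the logic can run on its own): DataFrame.insert takes a target position, which columns.get_loc can compute.

```python
import pandas as pd

all_list = 'AE,SG,MO,MY,KR,CN,US,HK,YS'.split(',')

def find_non_priv(row):
    # codes from all_list that are not in this row, in all_list order
    row_list = row.split(',')
    return ','.join(code for code in all_list if code not in row_list)

# Hypothetical sheet contents; the real code would use pd.read_excel('hello.xlsx')
df = pd.DataFrame({
    'Priviliges': ['A', 'B'],
    'available_list': ['AE,SG,MO', 'KR,CN'],
})

# Insert the new column immediately to the right of 'Priviliges'
pos = df.columns.get_loc('Priviliges') + 1
df.insert(pos, 'Non-Priviliges', df['available_list'].apply(find_non_priv))

print(df.columns.tolist())  # ['Priviliges', 'Non-Priviliges', 'available_list']
print(df['Non-Priviliges'].tolist())  # ['MY,KR,CN,US,HK,YS', 'AE,SG,MO,MY,US,HK,YS']
```

Since each cell is now a joined string rather than a Python list, no "[" or "]" appears; df.to_excel('output.xlsx', index=False) would then write the sheet back out.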
So I have a DataFrame called reactions_drugs,
and I want to create a table called new_r_d that keeps track of how often I see a symptom for a given medication.
Here is my code, but I am running into errors such as "Unable to coerce to Series, length must be 3 given 0":
new_r_d = pd.DataFrame(columns=['drugname', 'reaction', 'count'])
for i in range(len(reactions_drugs)):
    name = reactions_drugs.drugname[i]
    drug_rec_act = reactions_drugs.drug_rec_act[i]
    for rec in drug_rec_act:
        row = new_r_d.loc[(new_r_d['drugname'] == name) & (new_r_d['reaction'] == rec)]
        if row == []:
            # create new row
            new_r_d.append({'drugname': name, 'reaction': rec, 'count': 1})
        else:
            new_r_d.at[row, 'count'] += 1
Assuming each value in your drug_rec_act column is a list containing a single comma-separated string, you can split that string into a list of reactions, use explode() to give each reaction its own row, and then count the reactions per drug with groupby() and value_counts() (df here is your reactions_drugs DataFrame):
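# split the single string in each list into a list of reactions
df['drug_rec_act'] = df['drug_rec_act'].apply(lambda x: x[0].split(','))
# one row per (drugname, reaction) pair
df_long = df.explode('drug_rec_act')
# count each reaction within each drug
result = df_long.groupby('drugname')['drug_rec_act'].value_counts().reset_index(name='count')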
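To see the whole pipeline end to end, here is a sketch on hypothetical data shaped the way the answer assumes (each cell of drug_rec_act is a list holding one comma-separated string); the drug and reaction names are made up:

```python
import pandas as pd

# Hypothetical data in the assumed shape
df = pd.DataFrame({
    'drugname': ['aspirin', 'aspirin', 'ibuprofen'],
    'drug_rec_act': [['nausea,headache'], ['nausea'], ['rash,nausea']],
})

# split each single string into a list of reactions
df['drug_rec_act'] = df['drug_rec_act'].apply(lambda x: x[0].split(','))
# explode: one row per (drugname, reaction) pair
df_long = df.explode('drug_rec_act')
# count each reaction per drug and turn the result back into columns
result = (df_long.groupby('drugname')['drug_rec_act']
          .value_counts()
          .reset_index(name='count'))
print(result)
```

Here aspirin/nausea gets a count of 2 and every other pair gets 1, which is the drugname / reaction / count table the question was building row by row.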
I'm passing a DataFrame to the mapInPandas function in PySpark, and I need all values of the ID column joined by commas, like this: 'H57R6HU87','A1924334','496A4806'
x1['ID'] looks like this:
H57R6HU87
A1924334
496A4806
Here is my code to get the unique IDs, but I get TypeError: string indices must be integers:
# batch_iter= cust.toPandas()
for x1 in batch_iter:
    IDs = ','.join(f"'{i}'" for i in x1['ID'].unique())
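If batch_iter is a pandas DataFrame (as in the commented-out cust.toPandas() line), for x1 in batch_iter iterates over its column names, so x1 is a string and x1['ID'] raises the TypeError. You probably don't need a loop at all; try: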
batch_iter = cust.toPandas()
IDs = ','.join(f"'{i}'" for i in batch_iter['ID'].unique())
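To check the pandas expression on its own, here is a sketch with a made-up DataFrame in place of cust.toPandas(); the repeated ID is hypothetical and only shows that unique() drops duplicates while keeping first-seen order:

```python
import pandas as pd

# Hypothetical stand-in for cust.toPandas()
batch_iter = pd.DataFrame({'ID': ['H57R6HU87', 'A1924334', '496A4806', 'A1924334']})

# quote each unique ID and join them with commas
IDs = ','.join(f"'{i}'" for i in batch_iter['ID'].unique())
print(IDs)  # 'H57R6HU87','A1924334','496A4806'
```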
Or you can try using Spark functions only:
from pyspark.sql import functions as F
df2 = df.select(F.concat_ws(',', F.collect_set('ID')).alias('ID'))
If you want to use mapInPandas:
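import pandas as pd

def pandas_func(batches):
    # each element of the iterator is one pandas DataFrame batch
    for x1 in batches:
        IDs = ','.join(f"'{i}'" for i in x1['ID'].unique())
        yield pd.DataFrame({'ID': IDs}, index=[0])

# mapInPandas also requires the output schema
df.mapInPandas(pandas_func, schema='ID string')
# But I suspect you want to do this instead:
# df.repartition(1).mapInPandas(pandas_func, schema='ID string')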
I'm trying to loop through a list of DataFrames and perform operations on them. In the final line I want to name the result after the original name plus '_rand_test'. I'm getting the error:
SyntaxError: cannot assign to operator
Is there a way to do this?
segments = [main_h, main_m, main_l]
seg_name = ['main_h', 'main_m', 'main_l']
for i in segments:
    control = pd.DataFrame(i.groupby('State', group_keys=False).apply(lambda x : x.sample(frac = .1)))
    control['segment'] = 'control'
    test= i[~i.index.isin(control.index)]
    test['segment'] = 'test'
    seg_name[i]+'_rand_test' = pd.concat([control,test])
The error occurs because the left side of the = sign is an expression (seg_name[i] + '_rand_test'), and Python cannot assign to the result of an operator; an assignment target must be a name, an attribute, a subscript, or similar. If you want to rename something, do it on a separate line. I'm not sure exactly what you're trying to rename, but if it's the corresponding string in the seg_name list, loop with for idx, i in enumerate(segments): and add this line:
seg_name[idx] += '_rand_test'
enumerate gives you the position along with each element, which you need because you're looping over the elements of segments, not their indexes. (segments.index(i) is not a safe substitute: list.index compares elements with ==, and comparing two DataFrames with == does not return a single True/False.)
Maybe this will work for you?
Create an empty list before you run the loop and fill it with append(). Then build the new names from seg_name, so that new_names_list[k] is the name for new_list[k].
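segments = [main_h, main_m, main_l]
seg_name = ['main_h', 'main_m', 'main_l']
new_list = []
for i in segments:
    # sample 10% of the rows within each State as the control group
    control = pd.DataFrame(i.groupby('State', group_keys=False).apply(lambda x: x.sample(frac=.1)))
    control['segment'] = 'control'
    # the remaining rows are the test group (.copy() avoids SettingWithCopyWarning)
    test = i[~i.index.isin(control.index)].copy()
    test['segment'] = 'test'
    new_list.append(pd.concat([control, test]))
# build the new names from the original names, not from the DataFrames
new_names_list = [name + '_rand_test' for name in seg_name]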
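Rather than building variable names from strings, a common alternative to both answers is to keep the results in a dict keyed by the new name. A minimal sketch, assuming tiny made-up DataFrames in place of main_h, main_m and main_l. It also swaps the groupby/apply/lambda for groupby('State').sample(...), which does the same per-State sampling, and fixes random_state=0 only so the sample is reproducible:

```python
import pandas as pd

# Hypothetical stand-ins for main_h, main_m and main_l
main_h = pd.DataFrame({'State': ['CA'] * 10 + ['NY'] * 10, 'value': range(20)})
main_m = pd.DataFrame({'State': ['TX'] * 20, 'value': range(20)})
main_l = pd.DataFrame({'State': ['WA'] * 10 + ['OR'] * 10, 'value': range(20)})

segments = [main_h, main_m, main_l]
seg_name = ['main_h', 'main_m', 'main_l']

results = {}
for name, seg in zip(seg_name, segments):
    # 10% of the rows of each State go to the control group
    control = seg.groupby('State').sample(frac=.1, random_state=0)
    control['segment'] = 'control'
    # all remaining rows form the test group
    test = seg[~seg.index.isin(control.index)].copy()
    test['segment'] = 'test'
    results[name + '_rand_test'] = pd.concat([control, test])

print(list(results))  # ['main_h_rand_test', 'main_m_rand_test', 'main_l_rand_test']
```

results['main_h_rand_test'] then holds the combined control/test frame for main_h, and zip keeps each name paired with its DataFrame without any index lookups.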
I am trying to create a DataFrame with pandas, which works fine with the following command:
df_test2 = DataFrame(index = idx, data=(["-54350","2016-06-25T10:29:57.340Z","2016-06-25T10:29:57.340Z"]))
But when I take the data from a variable instead of hard-coding it in the data argument, e.g.:
r6 = ["-54350", "2016-06-25T10:29:57.340Z", "2016-06-25T10:29:57.340Z"]
df_test2 = DataFrame(index = idx, data=(r6))
I would expect this to be the same, so shouldn't it work? But I get:
ValueError: DataFrame constructor not properly called!
Reason for the error:
The DataFrame constructor raises this error when data is a plain string, so r6 must hold a string representation of the list (e.g. '["-54350", ...]') rather than an actual list.
Fix:
import ast
from pandas import DataFrame

# convert the string representation to an actual list
data = ast.literal_eval(r6)
# and use it as the input
df_test2 = DataFrame(index=idx, data=data)
This resolves the error.
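A minimal sketch reproducing both the error and the fix, assuming r6 holds the string form of the list and using a hypothetical three-label index for idx:

```python
import ast
import pandas as pd

# Hypothetical: r6 holds the *string representation* of a list
r6 = '["-54350", "2016-06-25T10:29:57.340Z", "2016-06-25T10:29:57.340Z"]'
idx = ['a', 'b', 'c']  # hypothetical index, one label per value

# Passing the string directly fails: a str is treated as a scalar, not a list
try:
    pd.DataFrame(index=idx, data=r6)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# literal_eval safely parses the Python literal into a real list
data = ast.literal_eval(r6)
df_test2 = pd.DataFrame(index=idx, data=data)
print(df_test2)
```

ast.literal_eval only accepts Python literals (strings, numbers, lists, dicts and so on), which makes it a safer choice than eval for text whose origin you don't control.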