I have a Python object that was created and defined by Swagger editor. I have written a program to populate it. However, instead of appending new entries, it keeps replacing previous ones. Below is my code:
from xlrd import open_workbook
from abc_model import BES as bes

for sheet in wb.sheets():
    if sheet.name == "ABC":
        number_of_columns = sheet.ncols
        for i in range(2, number_of_columns):
            xyz = bes(name=sheet.cell(19, i).value)
            model.abc_model = xyz
print(model)
This only prints and assigns the content of column 4 (assuming there are 4 columns in total). However, it should contain the contents of both columns 3 and 4.
Any idea what I am doing wrong?
Instead of model.abc_model = xyz, try model.abc_model += xyz
You're setting your attribute to the last value obtained, rather than accumulating the values as you iterate. This is why you're only getting the second value, and not the first or both.
As @Dylan Smite mentioned, use model.abc_model += xyz. This line of code basically means model.abc_model = model.abc_model + xyz: it adds xyz to model.abc_model and then assigns the result back to model.abc_model.
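To see the difference concretely, here is a minimal sketch (the Model class is a hypothetical stand-in; it assumes abc_model holds a list, as generated Swagger models usually do for array fields, in which case you would accumulate with += [xyz] or .append(xyz)):

```python
class Model:
    """Hypothetical stand-in for the Swagger-generated model."""
    def __init__(self):
        self.abc_model = []

model = Model()

# Replacing: each assignment discards the previous value
for value in ['col3', 'col4']:
    model.abc_model = [value]
print(model.abc_model)  # ['col4']

# Accumulating: += extends the list on every iteration
model.abc_model = []
for value in ['col3', 'col4']:
    model.abc_model += [value]
print(model.abc_model)  # ['col3', 'col4']
```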
When I put data into a DB with Python, I ran into a problem where the auto-increment column value is larger than I expected.
Assume that I use the following function multiple times to put data into the DB.
'db_engine' is a DB engine whose database contains the tables 'tbl_student' and 'tbl_score'.
To track the total number of students, table 'tbl_student' has an auto-increment column named 'index'.
import pandas as pd

def save_in_db(db_engine, dataframe):
    # tbl_student
    student_dataframe = pd.DataFrame({
        "ID": dataframe['ID'],
        "NAME": dataframe['NAME'],
        "GRADE": dataframe['GRADE'],
    })
    student_dataframe.to_sql(name="tbl_student", con=db_engine, if_exists='append', index=False)
    # tbl_score
    score_dataframe = pd.DataFrame({
        "SCORE_MATH": dataframe['SCORE_MATH'],
        "SCORE_SCIENCE": dataframe['SCORE_SCIENCE'],
        "SCORE_HISTORY": dataframe['SCORE_HISTORY'],
    })
    score_dataframe.to_sql(name="tbl_score", con=db_engine, if_exists='append', index=False)
'tbl_student' after some inputs is as follows:
index  ID       NAME    GRADE
0      2023001  Amy     1
1      2023002  Brady   1
2      2023003  Caley   4
6      2023004  Dee     2
7      2023005  Emma    2
8      2023006  Favian  3
12     2023007  Grace   3
13     2023008  Harry   3
14     2023009  Ian     3
Please take a look at the 'index' column.
When I insert data several times, 'index' ends up with larger values than I expected.
What should I try to solve this problem?
You could try:
student_dataframe = student_dataframe.reset_index(drop=True)
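A minimal sketch of what reset_index does, with made-up rows; note that the result has to be assigned back, since reset_index returns a new DataFrame unless inplace=True is passed:

```python
import pandas as pd

# Made-up frame with a leftover, non-continuous index
student_dataframe = pd.DataFrame({"ID": ["2023001", "2023002"]}, index=[6, 7])

# drop=True discards the old index instead of keeping it as a column
student_dataframe = student_dataframe.reset_index(drop=True)
print(student_dataframe.index.tolist())  # [0, 1]
```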
Actually, the root of the problem was that the 'index' column is connected to another table as a FOREIGN KEY.
Every time I added data, an error occurred because there was no matching key (since the index values are not continuous!).
I solved this problem by checking the index once before putting data into the DB and setting it as the key.
The following code is what I tried:
index_no = get_index(db_engine)
dataframe.index = dataframe.index + index_no + 1 - len(dataframe)
dataframe.reset_index(inplace=True)
If anyone has the same problem, it may be better to try another approach rather than trying to make the auto-increment key sequential.
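Since get_index is not shown above, here is a sketch of the shifting step with a hard-coded index_no standing in for its return value (assumed here to be the largest 'index' already stored in tbl_student; the exact offset depends on what your helper actually returns):

```python
import pandas as pd

# Hypothetical: get_index(db_engine) returned 7, i.e. the largest
# 'index' value already stored in tbl_student.
index_no = 7

dataframe = pd.DataFrame({"ID": ["2023006", "2023007"],
                          "NAME": ["Favian", "Grace"]})

# Shift the default 0..n-1 index so the new rows continue right after
# the existing maximum, then turn it into a regular 'index' column.
dataframe.index = dataframe.index + index_no + 1
dataframe.reset_index(inplace=True)
print(dataframe["index"].tolist())  # [8, 9]
```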
So there are 2 CSV files I'm working with:
file 1:
City KWR1 KWR2 KWR3
Killeen
Killeen
Houston
Whatever
file 2:
location link reviews
Killeen www.example.com 300
Killeen www.differentexample.com 200
Killeen www.example3.com 100
Killeen www.extraexample.com 20
Here's what I'm trying to make this code do:
look at the 'City' in file 1, take the top 3 links from file 2 (you can assume the cities won't get mixed up), and then put these top 3 into the KWR1, KWR2 and KWR3 columns for all rows with the same 'City' value.
So it gets the top 3 and then just copies them to the right of all rows with the same 'City' value.
Even asking this question correctly is difficult for me; I hope I've provided enough information.
I know how to read the file in with pandas and all that, I just can't code this exact situation.
It is a slightly unusual requirement, but I think you need three steps:
1. Keep only the top three rows per city:
df = df.sort_values(by='reviews',ascending=False).groupby('location').head(3).reset_index()
This keeps only the top three reviews for every city.
Then you need to label your data. There might be better ways to do this, but here is one: assign a new column with running numbers and create a user-defined function.
import numpy as np
df['nums'] = np.arange(len(df))
Now you have a column full of numbers (kind of like line numbers).
You then create the function that will label your data:
def my_func(index):
    if index % 3 == 0:
        x = 'KWR' + str(1)
    elif index % 3 == 1:
        x = 'KWR' + str(2)
    elif index % 3 == 2:
        x = 'KWR' + str(3)
    return x
You can then create the labels you need:
df['labels'] = df.nums.apply(my_func)
Then you can do:
my_df = pd.pivot_table(df, values='link', index=['location'], columns='labels', aggfunc='first').reset_index()
This pivots the labels out into columns and puts the links into the right places.
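To actually write those three links next to every matching row of file 1, the pivoted table can then be left-joined on the city column. A sketch with made-up stand-ins for the two files (my_df here mimics the shape of the pivot output; your real column names come from your data):

```python
import pandas as pd

# Made-up stand-ins for file 1 and the pivoted top-3 table
file1 = pd.DataFrame({"City": ["Killeen", "Killeen", "Houston"]})
my_df = pd.DataFrame({
    "location": ["Killeen"],
    "KWR1": ["www.example.com"],
    "KWR2": ["www.differentexample.com"],
    "KWR3": ["www.example3.com"],
})

# Left join: every row of file 1 picks up its city's top-3 columns,
# and cities without data (Houston) are simply left empty (NaN).
result = file1.merge(my_df, left_on="City", right_on="location", how="left")
result = result.drop(columns="location")
print(result)
```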
I'm a building energy simulation modeller with an Excel question about enabling automated large-scale simulations using parameter samples (generated with Monte Carlo). Now I have the following question about saving my samples:
I want to save each row of an Excel-spreadsheet in a separate .txt-file in a 'special' way to be read by simulation programs.
Let's say, I have the following excel-file with 4 parameters (a,b,c,d) and 20 values underneath:
a b c d
2 3 5 7
6 7 9 1
3 2 6 2
5 8 7 6
6 2 3 4
Each row of this spreadsheet represents a simulation-parameter-sample.
I want to store each row in a separate .txt-file as follows (so 5 '.txt'-files for this spreadsheet):
'1.txt' should contain:
a=2;
b=3;
c=5;
d=7;
'2.txt' should contain:
a=6;
b=7;
c=9;
d=1;
and so on for files '3.txt', '4.txt' and '5.txt'.
So basically matching the header with its corresponding value underneath for each row in a separate .txt-file ('header equals value;').
Is there an Excel add-in that does this, or is it better to use some VBA code? Anybody have an idea?
(I'm quite experienced in simulation modelling but not in programming, hence this rather easy parameter-sample-saving question in Excel. Solutions in Python are also welcome if that's easier for you.)
My idea would be to use Python along with pandas, as it's one of the most flexible solutions and your use case might expand in the future.
I'm going to try to keep this as simple as possible. I'm assuming that you have Python, that you know how to install packages via pip or conda, and that you are ready to run a Python script on whatever system you are using.
First your script needs to import pandas and read the file into a DataFrame:
import pandas as pd
df = pd.read_excel('path/to/your/file.xlsx')
(Note that you might need to install the xlrd or openpyxl package in addition to pandas.)
Now you have a powerful data structure that you can manipulate in plenty of ways. I guess the most intuitive approach would be to loop over all items and use string formatting to put the strings together the way you need them:
for row in df.index:
    s = ""
    for col in df.columns:
        s += "{}={};\n".format(col, df[col][row])
    print(s)
Now you just need to write to a file using Python's built-in open. I'll name the files by the index of the row, but note that this will overwrite text files created by earlier runs of the script. You might want to add something unique, like the date and time or the name of the input file, or increment the file name across runs.
All together we get:
import pandas as pd

df = pd.read_excel('path/to/your/file.xlsx')

file_count = 0
for row in df.index:
    s = ""
    for col in df.columns:
        s += "{}={};\n".format(col, df[col][row])
    with open('test_{:03}.txt'.format(file_count), "w") as file:
        file.write(s)
    file_count += 1
Note that this is probably not the most elegant way and that there are one-liners out there, but since you are not a programmer, I thought you might prefer a more intuitive way that you can easily tweak yourself.
I got this to work in Excel. You can expand the lengths of the variables x, y and z to match your situation, and use LastRow and LastColumn methods to find the dimensions of your data set. I named the original worksheet "Data", as shown below.
Sub TestExportText()
    Dim Hdr(1 To 4) As String
    Dim x As Long
    Dim y As Long
    Dim z As Long

    For x = 1 To 4
        Hdr(x) = Cells(1, x)
    Next x

    x = 1
    For y = 1 To 5
        ThisWorkbook.Sheets.Add After:=Sheets(Sheets.Count)
        ActiveSheet.Name = y
        For z = 1 To 4
            With ActiveSheet
                .Cells(z, 1) = Hdr(z) & "=" & Sheets("Data").Cells(x + 1, z) & ";"
            End With
        Next z
        x = x + 1
        ActiveSheet.Move
        ActiveWorkbook.ActiveSheet.SaveAs Filename:="File" & y & ".txt", FileFormat:=xlTextWindows
        ActiveWorkbook.Close SaveChanges:=False
    Next y
End Sub
If you can save your Excel spreadsheet as a CSV file, then this Python script will do what you want.
with open('data.csv') as file:
    data_list = [line.rstrip('\n').split(',') for line in file]

header = data_list[0]
counter = 1
for row in data_list[1:]:
    output_file_name = str(counter) + '.txt'
    with open(output_file_name, 'w') as out_file:
        for name, value in zip(header, row):
            out_file.write(name + '=' + value + ';\n')
    counter += 1
I want to read the file names in a folder, which I already did using the glob.glob(...) function, and add each file's last modification time to the 'file_last_mod_t' column.
Part of my code:
df = pd.DataFrame(columns=['filename', 'file_last_mod_t', 'else'])
df.set_index('filename')
for file in glob.glob('folder_path'):  # inside this folder is file.txt
    file_name = os.path.basename('folder_path')
    df.loc[file_name] = os.path.getmtime(file)
which gives me:
df:
filename,file_last_mod_t,else
file.txt,123456,123456  # 123456 is an example time value
I want to add this last-modified time only to the 'file_last_mod_t' column, not to all of them.
I want to receive:
df:
filename,file_last_mod_t,else
file.txt,123456,
Thanks in advance.
After modifying the code:
df = pd.read_csv('C:/df.csv')
filename_list = pd.Series(result_from_other_definition)  # it looks the same as the 'filename' column
df['filename'] = filename_list  # so now I have a dataframe with 3 columns, and the first column holds the file list
df.set_index('filename')
for file in glob.glob('folder_path'):  # inside this folder is file.txt
    df['file_last_mod_t'] = df['filename'].apply(lambda x: os.path.getmtime(x))  # how getmtime is presented does not matter for now; could be float numbers
df.to_csv('C:/df.csv')
Printing samples from the first run:
df['filename'] = filename_list
print(df)
,'filename','file_last_mod_t','else'
0,file1.txt,NaN,NaN
1,file2.txt,NaN,NaN
The code above works fine after the first run, when df is empty apart from the headers.
On the next run, when df.csv has some content and I manually change a timestamp value in the file, I receive an error: TypeError: stat: path should be string, bytes, os.PathLike or integer, not float. The code should replace the manually modified cell with a good timestamp; I think it's connected with apply.
I also don't know why an index column appears in df.
**Solved**
Please see the comments in the code below:
import os
import glob
import datetime as dt
import pandas as pd

# this is the function to get the file modification time as a string
def getmtime(x):
    return dt.datetime.fromtimestamp(os.path.getmtime(x)).strftime("%Y-%m-%d %H:%M:%S")

df = pd.DataFrame(columns=['filename', 'file_last_mod_t', 'else'])
df.set_index('filename')
# set the filename list on df['filename']
df['filename'] = pd.Series([file for file in glob.glob('*')])
# apply each file's modification time to df['file_last_mod_t'] via getmtime
df['file_last_mod_t'] = df['filename'].apply(lambda x: getmtime(x))
print(df)
The result is
filename file_last_mod_t else
0 dataframe 2019-05-04 18:43:04 NaN
1 fer2013.csv 2018-05-26 12:18:26 NaN
2 file.txt 2019-05-04 18:49:04 NaN
3 file2.txt 2019-05-04 18:51:04 NaN
4 Untitled.ipynb 2019-05-04 17:41:04 NaN
5 Untitled1.ipynb 2019-05-04 20:51:04 NaN
For the updated question, I started with a df.csv that has the following data:
filename,file_last_mod_t,else
file1.txt,,
And since I think you want to add new files, I wrote the code as follows:
import os
import pandas as pd

df = pd.read_csv('df.csv')
df_adding = pd.DataFrame(columns=['filename', 'file_last_mod_t', 'else'])
df_adding['filename'] = pd.Series(['file2.txt'])
df = pd.concat([df, df_adding], ignore_index=True)  # df.append is removed in newer pandas
df = df.drop_duplicates('filename')
df['file_last_mod_t'] = df['filename'].apply(lambda x: os.path.getmtime(x))  # how getmtime is presented does not matter for now; could be float numbers
df.to_csv('df.csv', index=False)
I created the df_adding dataframe for the new files and appended it to df, which was read from df.csv.
Finally, we apply getmtime and save the result back to df.csv.
Using Python 2.7 on Mac OS X Lion with xlrd.
My problem is relatively simple and straightforward. I'm trying to match a string to an Excel cell value, in order to ensure that other data within the matched row is correct.
So, say for instance that player = 'Andrea Bargnani' and I want to match a row that looks like this:
Draft Player Team
1 Andrea Bargnani - Toronto Raptors
I do:
num_rows = draftSheet.nrows - 1
cur_row = -1
while cur_row < num_rows:
    cur_row += 1
    row = draftSheet.row(cur_row)
    if row[1] == player:
        ranking = row[0]
The problem is that the value of row[1] is text:u'Andrea Bargnani', as opposed to just Andrea Bargnani.
I know that Excel, after Excel 97, is all Unicode. But even if I do player = u'Andrea Bargnani', there is still the preceding text:. So I tried player = 'text:' u'Andrea Bargnani', but when the variable is used it ends up looking like u'text: Andrea Bargnani' and still does not produce a match.
I would like to just strip the text:u' off the returned row[1] value in order to get an appropriate match.
You need to get the value from the cell.
I've created a sample Excel file with the text "Andrea Bargnani" in cell A1. Here is code explaining the difference between printing the cell object and its value:
import xlrd
book = xlrd.open_workbook("input.xls")
sheet = book.sheet_by_index(0)
print sheet.cell(0, 0) # prints text:u'Andrea Bargnani'
print sheet.cell(0, 0).value # prints Andrea Bargnani
Hope that helps.