python equivalent to listObjects in VBA for Excel (tables) - python

I have implemented a program in VBA for excel to generate automatic communications based on user inputs (selections of cells).
Such Macro written in VBA uses extensively the listObject function of VBA
i.e.
defining a table (list object)
Dim ClsSht As Worksheet
Set ClsSht = ThisWorkbook.Sheets("paragraph texts")
Dim ClsTbl As ListObject
Set ClsTbl = ClsSht.ListObjects(1)
accessing the table in the code in a very logical manner:
ClsTbl being now the table where I want to pick up data.
myvariable= ClsTbl.listcolumns("D1").databodyrange.item(34).value
Which means myvariable is the item (row) 34 of the data of the column D1 of the table clstbl
I decided to learn python to "translate" all that code into python and make a django based program accesable for anyone.
I am a beginner in Python and I am wondering what would be the equivalent in python to listobject of VBA. This decision will shape my whole program in python from the beginning, and I am hesitating a lot to decide what is the python equivalent to listobject in VBA.
The main idea here getting a way where I can access tables-data in a readable way,
i.e. give me the value of column "text" where column "chapter" is 3 and column paragraph is "2". The values are unique, meaning there is only one value in "text" column where that occurs.
Some observations:
I know everything can be done with lists in python, lists can contain lists that can contain lists..., but this is terrible for readability. mylist1[2][3] (assuming for instance that every row could be a list of values, and the whole table a list of lists of rows).
I don't considered an option to build any database. There are multiple relatively small tables (from 10 to 500 rows and from 3 to 15 columns) that are related but not in a database manner. That would force me to learn yet another language SQL or so, and I have more than enough with python and DJango.
The user modifies the structure of many tables (chapters coming together or getting splitted.
the data is 100% strings. The only integers are numbers to sort out text. I don't perform any mathematical operation with values but simple add together pieces of text and make replacements in texts.
the tables will be load into Python as CSV text files.
Please indicate me if there is something not enough clear in the question and I will complete it
Would it be necesary to operate with numpy? pandas?
i.e give me the value of cell

A DataFrame using pandas should provide everything you need, i.e. converstion to strings, manipulation, import and export. As a start, try
import pandas as pd
df = pd.read_csv('your_file.csv')
print(df)
print(df['text'])
The entries of the first row will be converted to labels of the DataFrame columns.

Related

Converting for loop to numpy calculation for pandas dataframes

So I have a python script that compares two dataframes and works to find any rows that are not in both dataframes. It currently iterates through a for loop which is slow.
I want to improve the speed of the process, and know that iteration is the problem. However, I haven't been having much luck using various numpy methods such as merge and where.
Couple of caveats:
The column names from my file sources aren't the same, so I set their names into variables and use the variable names to compare.
I want to only use the column names from one of the dataframes.
df_new represents new information to be checked against what is currently on file (df_current)
My current code:
set_current = set(df_current[current_col_name])
df_out = pd.DataFrame(columns=df_new.columns)
for i in range(len(df_new.index)):
# if the row entry is new, we add it to our dataset
if not df_new[new_col_name][i] in set_current:
df_out.loc[len(df_out)] = df_new.iloc[i]
# if the row entry is a match, then we aren't going to do anything with it
else:
continue
# create a xlsx file with the new items
df_out.to_excel("data/new_products_to_examine.xlsx", index=False)
Here are some simple examples of dataframes I would be working with:
df_current
|partno|description|category|cost|price|upc|brand|color|size|year|
|:-----|:----------|:-------|:---|:----|:--|:----|:----|:---|:---|
|123|Logo T-Shirt||25|49.99||apple|red|large|2021||
|456|Knitted Shirt||35|69.99||apple|green|medium|2021||
df_new
|mfgr_num|desc|category|cost|msrp|upc|style|brand|color|size|year|
|:-------|:---|:-------|:---|:---|:--|:----|:----|:----|:---|:---|
|456|Knitted Shirt||35|69.99|||apple|green|medium|2021|
|789|Logo Vest||20|39.99|||apple|yellow|small|2022|
There are usually many more columns in the current sheet, but I wanted the table displayed to be somewhat readable. The key is that I would only want the columns in the "new" dataframe to be output.
I would want to match partno with mfgr_num since the spreadsheets will always have them, whereas some items don't have upc/gtin/ean.
It's still a unclear what you want without providing examples of each dataframe. But if you want to test unique IDs in differently named columns in two different dataframes, try an approach like this.
Find the IDs that exist in the second dataframe
test_ids = df2['cola_id'].unique().tolist()
the filter the first dataframe for those IDs.
df1[df1['keep_id'].isin(test_ids)]
Here is the answer that works - was supplied to me by someone much smarter.
df_out = df_new[~df_new[new_col_name].isin(df_current[current_col_name])]

Applying python dictionary to column of a CSV file

I have a CSV file that includes one column data that is not user friendly. I need to translate that data into something that makes sense. Simple find/replace seems bulky since there are dozens if not hundreds of different possible combinations I want to translate.
For instance: BLK = Black or MNT TP = Mountain Top
There are dozens if not hundreds of translations possible - I have lots of them already in a CSV table. The problem is how to use that dictionary to change the values in another CSV table. It is also important to note that this will (eventually) need to run on its own every few minutes - not just a one time translation.
It would be nice if you could describe in more detail what's the data you're working on. I'll do my best guess though.
Let's say you have a CSV file, you use pandas to read it into a data frame named df, and the "not user friendly" column named col.
To replace all the value in column col, first, you need a dictionary containing all the keys (original texts) and values (new texts):
my_dict = {"BLK": "Black", "MNT TP": Mountain Top,...}
Then, map the dictionary to the column:
df["col"] = df["col"].map(lambda x: my_dict.get(x, x))
If a key appears in the dictionary, it will be replaced by the new corresponding value in the dictionary, otherwise, it keeps the original value.

How to modify existing tables in powerpoint using pandas and python-pptx?

I have a table similar to this that I would like to modify:
I'm still new to this library and I can't seem to understand how to modify existing tables from the documentation. What's the best way to do it? I already have the dataframe (3 columns, 8 rows) and I can access the table (e.g. slide[0].shape[1]). How can I use this dataframe to update the data (movies, tickets sold, % watched?)
You'll have to write a short routine to do this cell by cell. Each cell in a PowerPoint table is a text-container and there's no "fill the cells of this table from a sequence of sequences (matrix)" method in the API.
Note that all values must be type str (or unicode if you're on Python 2). A PowerPoint table has no notion of numbers or formulas like Excel does. Each cell contains only text.
Something like this would do the trick in simple cases, but you can see already that there can be complexities like "what if I want to skip a header line" or "what if the matrix is a different size than the table", or "what if I want to format numbers with particular decimal places depending on the column", etc.
M = {"matrix", some sequence of sequences like a `numpy` array or list of lists}
table = {"table" object retrieved from a shape on a slide}
populate_table(table, M)
def populate_table(table, M):
for row in range(len(M)):
row_cells = table.rows[row].cells
for col in range(len(M[0]):
row_cells[col].text = str(M[row][col])
The code involved is small and easily customizeable and the complexity of designing and then using a general-purpose solution is fairly high, so this is left up to the user to work out for themselves.
There's nothing stopping you from creating a function of your own like the populate_table(matrix, table) above and using it wherever it suits you.

Efficiency list of dictionaries

Hello I'm moving from VB to python and I'm doing some of my first projects to learn the syntax and the basics of the language, what I'm trying to do now is a sort of simulation of a "management app" before working with databases I'm doing it first with files.
What I do is having this file (which will be my database in the future) where I have stored the informations about some employees the data I store are
I'd ,name ,surname ,date of birth ,status, code , contract
On the file I have them stored like this
1|Bob|Brown|07/12/1985|Active|202020|1
(The pin is a number I generate to let the user "login" to see his informations and the contract is an id of the contracts that I have on another file so the foreign key)
Now I store all of these in a list of dictionaries so my overall data structure would look like this
[{Id:1,name:Bob, surname:Brown, dateB:07/12/1985,status:Active,code:202020, contact:1},{Id:2,name:Josh, surname:Allen, dateB:05/02/1999,status:Active,code:202021, contact:3}]
Each time I add a new employee I create a new dictionary like
NewEmpl = dict(id=3,name=Robert,surname=Lasky,dateB=03/11/1997,status=Active, code=202022, contract=2)
list_employees.append(NewEmpl)
F.write(str(id)+"|"+name+"|"+surname.....
update both the file and the list but I wonder if there is a more efficient way to store the data than how I'm doing right know with a list of dictionaries
You can use pandas. It is designed to be used for tabular data. Since it's written using numpy which uses c++ it is very efficient. You can turn your list of dicts to a DataFrame as follows:
df = pd.DataFrame(list_employees)
And to a csv it is as simple as:
df.to_csv('file.csv')

Python categorize datatypes

I plan to make a 'table' class that I can use throughout my data-analyzis program to store gathered data to. Objective is to make simple tables like this:
ID Mean size Stdv Date measured Relative flatness
----------------------------------------------------------------
1 133.4242 34.43 Oct 20, 2013 32093
2 239.244 34.43 Oct 21, 2012 3434
I will follow the sqlite3 suggestion from this post: python-data-structure-for-maintaing-tabular-data-in-memory, but I will still need to save it as a csv file (not as a dbase) and I want it to eat my data as we go: add columns on the fly whenever new measures become available and are deemed to be interesting. For that the class will need to be able to determine the data type of the data thrown at it.
Sqlite3 has limited datatypes, float, int, date and string. Python and numpy together have many types. Is there an easy was to quickly decide what the datatype is of the variable? So my table class can automatically add a column when new data is entered containing new fields.
I am not too concerned about performance, the table should be fairly small.
I want to use my class like so:
dt = Table()
dt.add_record({'ID':5, 'Mean size':39.4334'})
dt.add_record({'ID':5, 'Goodness of fit': 12})
In the last line, there is new data. the Table class needs to figure out what kind of data that is and then add a column to the sqlite3 table. Making it all string seems a bit to floppy, I still want to keep my high precision floats correct....
Also: If something like this already exists, I'd like to know about it.
It seems that your question is: "Is there an easy was to quickly decide what the datatype is of the variable?". This is a simple question, and the answer is:
type(variable).
But the context you provide requires a more careful answer.
Since SQLite3 only provides only a few data types (slightly different ones than what you said), you need to map your input variables to the types provided by SQLite3.
But you may encounter further problems: You may need to change the types of columns as you receive new records, if you do not want to require that the column type be fixed in advance.
For example, for the Goodness of fit column in your example, you get an int (12) first. But you may get a float (e.g. 10.1) the second time, which shows that both values must be interpreted as floats. And if next time you receive a string, then all of them must be strings, right? But then the exact formatting of the numbers also counts: whereas 12 and 12.0 are the same when you interpret them as floats, they are not when you interpret them as strings; and the first value may become "12.0" when you convert all of them to strings.
So either you throw an exception when the type of consecutive values for the same column do not match, or you try to convert the previous values according to the new ones; but occasionally you may need to re-read the input.
Nevertheless, once you make those decision regarding the expected behavior, it should not be a very difficult problem to implement.
Regarding your last question: I personally do not know of an existing implementation to this problem.

Categories