How to modify existing tables in powerpoint using pandas and python-pptx? - python

I have a table similar to this that I would like to modify:
I'm still new to this library and I can't seem to understand how to modify existing tables from the documentation. What's the best way to do it? I already have the dataframe (3 columns, 8 rows) and I can access the table (e.g. slide[0].shape[1]). How can I use this dataframe to update the data (movies, tickets sold, % watched?)

You'll have to write a short routine to do this cell by cell. Each cell in a PowerPoint table is a text-container and there's no "fill the cells of this table from a sequence of sequences (matrix)" method in the API.
Note that all values must be type str (or unicode if you're on Python 2). A PowerPoint table has no notion of numbers or formulas like Excel does. Each cell contains only text.
Something like this would do the trick in simple cases, but you can see already that there can be complexities like "what if I want to skip a header line" or "what if the matrix is a different size than the table", or "what if I want to format numbers with particular decimal places depending on the column", etc.
M = {"matrix", some sequence of sequences like a `numpy` array or list of lists}
table = {"table" object retrieved from a shape on a slide}
populate_table(table, M)
def populate_table(table, M):
for row in range(len(M)):
row_cells = table.rows[row].cells
for col in range(len(M[0]):
row_cells[col].text = str(M[row][col])
The code involved is small and easily customizeable and the complexity of designing and then using a general-purpose solution is fairly high, so this is left up to the user to work out for themselves.
There's nothing stopping you from creating a function of your own like the populate_table(matrix, table) above and using it wherever it suits you.

Related

Applying python dictionary to column of a CSV file

I have a CSV file that includes one column data that is not user friendly. I need to translate that data into something that makes sense. Simple find/replace seems bulky since there are dozens if not hundreds of different possible combinations I want to translate.
For instance: BLK = Black or MNT TP = Mountain Top
There are dozens if not hundreds of translations possible - I have lots of them already in a CSV table. The problem is how to use that dictionary to change the values in another CSV table. It is also important to note that this will (eventually) need to run on its own every few minutes - not just a one time translation.
It would be nice if you could describe in more detail what's the data you're working on. I'll do my best guess though.
Let's say you have a CSV file, you use pandas to read it into a data frame named df, and the "not user friendly" column named col.
To replace all the value in column col, first, you need a dictionary containing all the keys (original texts) and values (new texts):
my_dict = {"BLK": "Black", "MNT TP": Mountain Top,...}
Then, map the dictionary to the column:
df["col"] = df["col"].map(lambda x: my_dict.get(x, x))
If a key appears in the dictionary, it will be replaced by the new corresponding value in the dictionary, otherwise, it keeps the original value.

python equivalent to listObjects in VBA for Excel (tables)

I have implemented a program in VBA for excel to generate automatic communications based on user inputs (selections of cells).
Such Macro written in VBA uses extensively the listObject function of VBA
i.e.
defining a table (list object)
Dim ClsSht As Worksheet
Set ClsSht = ThisWorkbook.Sheets("paragraph texts")
Dim ClsTbl As ListObject
Set ClsTbl = ClsSht.ListObjects(1)
accessing the table in the code in a very logical manner:
ClsTbl being now the table where I want to pick up data.
myvariable= ClsTbl.listcolumns("D1").databodyrange.item(34).value
Which means myvariable is the item (row) 34 of the data of the column D1 of the table clstbl
I decided to learn python to "translate" all that code into python and make a django based program accesable for anyone.
I am a beginner in Python and I am wondering what would be the equivalent in python to listobject of VBA. This decision will shape my whole program in python from the beginning, and I am hesitating a lot to decide what is the python equivalent to listobject in VBA.
The main idea here getting a way where I can access tables-data in a readable way,
i.e. give me the value of column "text" where column "chapter" is 3 and column paragraph is "2". The values are unique, meaning there is only one value in "text" column where that occurs.
Some observations:
I know everything can be done with lists in python, lists can contain lists that can contain lists..., but this is terrible for readability. mylist1[2][3] (assuming for instance that every row could be a list of values, and the whole table a list of lists of rows).
I don't considered an option to build any database. There are multiple relatively small tables (from 10 to 500 rows and from 3 to 15 columns) that are related but not in a database manner. That would force me to learn yet another language SQL or so, and I have more than enough with python and DJango.
The user modifies the structure of many tables (chapters coming together or getting splitted.
the data is 100% strings. The only integers are numbers to sort out text. I don't perform any mathematical operation with values but simple add together pieces of text and make replacements in texts.
the tables will be load into Python as CSV text files.
Please indicate me if there is something not enough clear in the question and I will complete it
Would it be necesary to operate with numpy? pandas?
i.e give me the value of cell
A DataFrame using pandas should provide everything you need, i.e. converstion to strings, manipulation, import and export. As a start, try
import pandas as pd
df = pd.read_csv('your_file.csv')
print(df)
print(df['text'])
The entries of the first row will be converted to labels of the DataFrame columns.

Python categorize datatypes

I plan to make a 'table' class that I can use throughout my data-analyzis program to store gathered data to. Objective is to make simple tables like this:
ID Mean size Stdv Date measured Relative flatness
----------------------------------------------------------------
1 133.4242 34.43 Oct 20, 2013 32093
2 239.244 34.43 Oct 21, 2012 3434
I will follow the sqlite3 suggestion from this post: python-data-structure-for-maintaing-tabular-data-in-memory, but I will still need to save it as a csv file (not as a dbase) and I want it to eat my data as we go: add columns on the fly whenever new measures become available and are deemed to be interesting. For that the class will need to be able to determine the data type of the data thrown at it.
Sqlite3 has limited datatypes, float, int, date and string. Python and numpy together have many types. Is there an easy was to quickly decide what the datatype is of the variable? So my table class can automatically add a column when new data is entered containing new fields.
I am not too concerned about performance, the table should be fairly small.
I want to use my class like so:
dt = Table()
dt.add_record({'ID':5, 'Mean size':39.4334'})
dt.add_record({'ID':5, 'Goodness of fit': 12})
In the last line, there is new data. the Table class needs to figure out what kind of data that is and then add a column to the sqlite3 table. Making it all string seems a bit to floppy, I still want to keep my high precision floats correct....
Also: If something like this already exists, I'd like to know about it.
It seems that your question is: "Is there an easy was to quickly decide what the datatype is of the variable?". This is a simple question, and the answer is:
type(variable).
But the context you provide requires a more careful answer.
Since SQLite3 only provides only a few data types (slightly different ones than what you said), you need to map your input variables to the types provided by SQLite3.
But you may encounter further problems: You may need to change the types of columns as you receive new records, if you do not want to require that the column type be fixed in advance.
For example, for the Goodness of fit column in your example, you get an int (12) first. But you may get a float (e.g. 10.1) the second time, which shows that both values must be interpreted as floats. And if next time you receive a string, then all of them must be strings, right? But then the exact formatting of the numbers also counts: whereas 12 and 12.0 are the same when you interpret them as floats, they are not when you interpret them as strings; and the first value may become "12.0" when you convert all of them to strings.
So either you throw an exception when the type of consecutive values for the same column do not match, or you try to convert the previous values according to the new ones; but occasionally you may need to re-read the input.
Nevertheless, once you make those decision regarding the expected behavior, it should not be a very difficult problem to implement.
Regarding your last question: I personally do not know of an existing implementation to this problem.

pytables - how to copy a row to memory

I have a pytable. I often need to copy the rows to an in-memory object and then insert into another pytable. I am wondering what is the easiest way to do this. The following code does not work as one cannot convert a Row object to a dict.
for row in hf.root.my_table.iterrows():
rec = dict(row)
Also, I want to conditionally copy the data to another file, possibly adding 1-2 new columns.To do this, I will need to extract the table description from one table, modify it, and use the modified table description to create a new table. How can I do that?
This won't work either. In general, I find my way of using pytables a little bit awkward, and would like to know better way of doing things.
As mentioned in the documentation, you can use the Row.fetch_all_fields method to obtain to obtain an independent copy which retains its information after the table file is closed, for example.
Your example code would look like
for row in hf.root.my_table.iterrows():
rec = row.fetch_all_fields()
rec is a numpy void scalar with the same keys as the row; rec['field'] yields the same data as row['field'].

How to store numerical lookup table in Python (with labels)

I have a scientific model which I am running in Python which produces a lookup table as output. That is, it produces a many-dimensional 'table' where each dimension is a parameter in the model and the value in each cell is the output of the model.
My question is how best to store this lookup table in Python. I am running the model in a loop over every possible parameter combination (using the fantastic itertools.product function), but I can't work out how best to store the outputs.
It would seem sensible to simply store the output as a ndarray, but I'd really like to be able to access the outputs based on the parameter values not just indices. For example, rather than accessing the values as table[16][5][17][14] I'd prefer to access them somehow using variable names/values, for example:
table[solar_z=45, solar_a=170, type=17, reflectance=0.37]
or something similar to that. It'd be brilliant if I were able to iterate over the values and get their parameter values back - that is, being able to find out that table[16]... corresponds to the outputs for solar_z = 45.
Is there a sensible way to do this in Python?
Why don't you use a database? I have found MongoDB (and the official Python driver, Pymongo) to be a wonderful tool for scientific computing. Here are some advantages:
Easy to install - simply download the executables for your platform (2 minutes tops, seriously).
Schema-less data model
Blazing fast
Provides map/reduce functionality
Very good querying functionalities
So, you could store each entry as a MongoDB entry, for example:
{"_id":"run_unique_identifier",
"param1":"val1",
"param2":"val2" # etcetera
}
Then you could query the entries as you will:
import pymongo
data = pymongo.Connection("localhost", 27017)["mydb"]["mycollection"]
for entry in data.find(): # this will yield all results
yield entry["param1"] # do something with param1
Whether or not MongoDB/pymongo are the answer to your specific question, I don't know. However, you could really benefit from checking them out if you are into data-intensive scientific computing.
If you want to access the results by name, then you could use a python nested dictionary instead of ndarray, and serialize it in a .JSON text file using json module.
One option is to use a numpy ndarray for the data (as you do now), and write a parser function to convert the query values into row/column indices.
For example:
solar_z_dict = {...}
solar_a_dict = {...}
...
def lookup(dataArray, solar_z, solar_a, type, reflectance):
return dataArray[solar_z_dict[solar_z] ], solar_a_dict[solar_a], ...]
You could also convert to string and eval, if you want to have some of the fields to be given as "None" and be translated to ":" (to give the full table for that variable).
For example, rather than accessing the values as table[16][5][17][14]
I'd prefer to access them somehow using variable names/values
That's what numpy's dtypes are for:
dt = [('L','float64'),('T','float64'),('NMSF','float64'),('err','float64')]
data = plb.loadtxt(argv[1],dtype=dt)
Now you can access the data elements using date['T']['L']['NMSF']
More info on dtypes:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html

Categories