output dictionary data as a table in console - python

I would like to output dict data in the form of a table in the console:
dtc={ "test_case1_short":{"test_step11":"pass","test_step12":"pass","test_step_13":{"status":"failed","details":"ca marche po"}},
"test_case2_longest_name":{"test_step21":"ne","test_step22":"ne"},
"test_case3_medium_name":{"test_step31":"ne","test_step32":"ne"} }
note: for french speakers 'dtc' is a shortcut for dict_test_collection (!)
To build this table I would like to determine the size of key names in order to dimension my column headers.
I can get my key names max length doing this:
max = 0
for i in list(dtc.keys()):
    if max < len(i):
        max = len(i)
print(max)
but I find this not very straightforward... Is there a way to get this information from dict.keys() or another dict feature?
Besides, I'd like to set separators like "+-----------------------------+" for section headers and "| |" for section bodies, to have a good-looking table.
In section bodies, is there a straightforward way to set the table and column widths (i.e. the '|' character ending up at column 50 whatever the text on the line, like padding the line with spaces up to a certain column)?
Thank you
Alexandre

is there a way to get this information from dict.keys() or another dict feature ?
This seems straightforward to me:
max_key_len = max(len(key) for key in dtc)
print(max_key_len)
This one seems less straightforward, but it is shorter:
max_key_len = max(map(len, dtc))
print(max_key_len)
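For the second part of the question (fixed-width lines with '|' borders), here is a minimal sketch, assuming str.ljust padding is acceptable. The render_table name, the width of 50 and the sample data are illustrative choices, not something from the question.

```python
# Hypothetical sample data, shaped like the dtc dict in the question.
dtc = {
    "test_case1_short": {"test_step11": "pass", "test_step12": "pass"},
    "test_case2_longest_name": {"test_step21": "ne", "test_step22": "ne"},
}

def render_table(data, width=50):
    """Return the table as a list of lines, each padded to `width` characters."""
    inner = width - 2  # room between the two border characters
    border = "+" + "-" * inner + "+"
    # Longest step name, so the ':' separators line up in every section.
    key_width = max(len(step) for steps in data.values() for step in steps)
    lines = []
    for case, steps in data.items():
        lines.append(border)
        lines.append("|" + (" " + case).ljust(inner) + "|")
        lines.append(border)
        for step, result in steps.items():
            text = " {} : {}".format(step.ljust(key_width), result)
            lines.append("|" + text.ljust(inner) + "|")
    lines.append(border)
    return lines

for line in render_table(dtc):
    print(line)
```

Note that ljust only pads and never truncates, so a text longer than the inner width would push the closing '|' to the right; such lines would need to be cut or wrapped first.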

Related

Is there a way to "transform" a CSV table into a simple nested if... else block in python?

I'm fairly new to python and I'm looking forward to achieve the following:
I have a table with several conditions as in the image below (maximum 5 conditions) along with various attributes. Each condition comes from a specific set of values, for example Condition 1 has 2 possible values, Condition 2 has 4 possible values, Condition 3 has 2 possible values etc..
What I would like to do: From the example table above, I would like to generate a simple python code so that when I execute my function and import a CSV file containing the table above, I should get the following output saved as a *.py file:
def myFunction(Attribute, Condition):
    if Attribute1 & Condition1:
        myValue = val_11
    if Attribute1 & Condition2:
        myValue = val_12
    ...
    ...
    if Attribute5 & Condition4:
        myValue = val_54
NOTE: Each CSV file will contain only one sheet and the titles for the columns do not change.
UPDATE, NOTE#2: Both "Attribute" and "Condition" are string values, so simple string comparisons would suffice.
Is there a simple way to do this? I dove into NLP and realized that it is not possible (at least from what I found in the literature). I'm open to all forms of suggestions/answers.
You can't really use "If"s and "else"s, since, if I understand your question correctly, you want to be able to read the conditions, attributes and values from a CSV file. Using "If"s and "else"s, you would only be able to check a fixed range of conditions and attributes defined in your code. What I would do, is to write a parser (piece of code, which reads the contents of your CSV file and saves it in another, more usable form).
In this case, the parser is the parseCSVFile() function. Instead of the ifs and elses comparing attributes and conditions, you now use the attributes and conditions to access a specific element in a dictionary (similar to an array or list, but you can now use for example string keys instead of the numerical indexes). I used a dictionary containing a dictionary at each position to split the CSV contents into their rows and columns. Since I used dictionaries, you can now use the strings of the Attributes and Conditions to access your values instead of doing lots of comparisons.
# Output dictionary
ParsedDict = dict()
# This is either ';' or ',' depending on your operating system; you can open the CSV file with Notepad, for example, to check which character is used
CSVSeparator = ';'
def parseCSVFile(filePath):
    global ParsedDict
    f = open(filePath)
    fileLines = f.readlines()
    f.close()
    # Extract the conditions (strip first so the last one does not keep the newline)
    ConditionsArray = (fileLines[0].strip().split(CSVSeparator))[1:]
    for x in range(len(fileLines)-1):
        # Remove unwanted characters such as newline characters
        line = fileLines[1 + x].strip()
        # Split by the CSV separation character
        LineContents = line.split(CSVSeparator)
        ConditionsDict = dict()
        for y in range(len(ConditionsArray)):
            ConditionsDict.update({ConditionsArray[y]: LineContents[1 + y]})
        ParsedDict.update({LineContents[0]: ConditionsDict})
def myFunction(Attribute, Condition):
    myValue = ParsedDict[Attribute][Condition]
    # Return the value so the caller can actually use it
    return myValue
The "[1:]" is to ignore the contents in the first column (empty field at the top left and the "Attribute x" fields) when reading either the conditions or the values
Use the parseCSVFile() function to extract the information from the csv file
and the myFunction() to get the value you want
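The same nested-dictionary idea can also be sketched with the standard csv module, which takes care of the splitting. The parse_table name and the sample table are assumptions for illustration; the real layout comes from the asker's file.

```python
import csv
import io

# Hypothetical table contents standing in for the asker's CSV file.
SAMPLE = """;Condition1;Condition2
Attribute1;val_11;val_12
Attribute2;val_21;val_22
"""

def parse_table(handle, delimiter=";"):
    """Build {attribute: {condition: value}} from an open CSV file object."""
    reader = csv.reader(handle, delimiter=delimiter)
    conditions = next(reader)[1:]  # header row minus the empty corner cell
    # Blank rows come back as empty lists, so skip them.
    return {row[0]: dict(zip(conditions, row[1:])) for row in reader if row}

table = parse_table(io.StringIO(SAMPLE))
print(table["Attribute2"]["Condition1"])
```

With a real file, something like open(path, newline="") would be passed in place of the io.StringIO object.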

How to match and get the value from a dict in Python?

dict ={"Rahul":"male",
"sahana":"female"
"pavan":"male" }
in a text file we have
rahul|sharma
sahana|jacob
Pavan|bhat
In a Python program we have to open the text file, read all the lines, match each "Name" against the dict we have, and write a new text file with the gender added.
OUTPUT SHOULD BE LIKE
rahul|sharma|male
sahana|jacob|female
Pavan|bhat|male
It would seem to me that this is roughly what you want. Note that your formatting for input and output was slightly off, but I'm pretty sure I've got it.
genders = {"rahul":"male",
           "sahana":"female",
           "pavan":"male" }
with open("input.txt") as in_file:
    for line in in_file:
        a, b = line.strip().split("|")
        gen = genders[a]
        print("{}|{}|{}".format(a, b, gen))
where input.txt contains
rahul|sharma
sahana|jacob
pavan|bhat
will correctly (I think) produce the output
rahul|sharma|male
sahana|jacob|female
pavan|bhat|male
I have changed all of your data to be lowercase. With your casing, it would have been ambiguous how to look up names in the dictionary and how to produce the output (only one key was capitalized, so I couldn't use any reasonable string function to accommodate the keys as they were). I've also had to add a comma to your dictionary.
I've also renamed your dictionary - it's no longer dict, because dict is a Python builtin. It seems a bit strange to me that you will have available in your code a dictionary that can anticipate your input file, but this is what I got from the question.
To get the value for the key in a dict, the syntax is simply:
b = "Rahul"
genders = {"Rahul":"male", "Mahima":"female"}
print(genders[b])
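If the casing in the input file cannot be changed, one possible approach is to keep the dictionary keys lowercase and lowercase only the lookup, leaving the output line untouched. This is a sketch under that assumption; the add_gender name and the "unknown" fallback are made up for illustration.

```python
# Keys are stored lowercase so that "Pavan" and "pavan" resolve to the same entry.
genders = {"rahul": "male", "sahana": "female", "pavan": "male"}

def add_gender(lines, lookup, default="unknown"):
    """Yield each 'first|last' line with '|gender' appended."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first_name = line.split("|")[0]
        # Lowercase only for the lookup; the original spelling is kept in the output.
        yield "{}|{}".format(line, lookup.get(first_name.lower(), default))

sample = ["rahul|sharma\n", "sahana|jacob\n", "Pavan|bhat\n"]
for row in add_gender(sample, genders):
    print(row)
```

With real files, the open input file can be passed as lines, and each yielded row written to the output file followed by "\n".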

Add missing dictionary key/value via raw_input

import collections
header_dict = {'account number':'ACCOUNT_name','accountID':'ACCOUNT_name','name':'client','first name':'client','tax id':'tin'}
#header_dict = collections.defaultdict(lambda: 'tin') # attempted use of defaultdict...destroys my dictionary
given_header = ['account number','name','tax id']#,'tax identification number']#,'social security number'
#given_header = ['account number','name','tax identification number']...non working header layout
fileLayout = [header_dict[ting] for ting in given_header if ting] #create if else..if ting exists, add to list...else if not in list, add to dictionary
def getLayout(ting):
    global given_header
    global fileLayout
    return given_header[fileLayout.index(ting)]
print getLayout('ACCOUNT_name')
print getLayout('client')
print getLayout('tin')
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
I am working with many files of random, mixed up layouts/column orders. I have a set template for my db table of 'ACCOUNT_name','client','tin' that I want the files to be ordered in. I have created a dictionary of the possible header/column names I might find in other files as keys and my set header names as values. So, for example, if I wanted to see where to put the column 'account number' from one of my given files, I would type header_dict['account number'].
This would give me the corresponding column from my template, 'ACCOUNT_name'. This works great...I also added another feature. Instead of having to type 'account number'..I made a list comprehension that looks up each value by key.
This list I just created with the 'fileLayout' list comprehension essentially transforms my given file's header into my desired names: ['ACCOUNT_name','client']
That makes life a lot easier...I know that I want to look up 'ACCOUNT_name', or 'client'. Next I run a function 'getLayout' that returns the index of the desired columns I am searching...So if I want to see where my desired column 'ACCOUNT_name' is in the file, I just run the function which is called like this...
getLayout('ACCOUNT_name')
Now at this point, I can easily print the columns to my order...with:
rows = zip((getLayout('ACCOUNT_name'),getLayout('client'),getLayout('tin')))
print rows
The above code gives me [('account number',), ('name',), ('tax id',)], which is exactly what I want...
But what if there is a new header I am not used to? Let's use the same example code above but change the list 'given_header' to this:
given_header = ['account number','name','tax identification number']
I most certainly get the key error, KeyError: 'tax identification number' I know I can use defaultdict but when I try to use it with the set value 'tin', I end up overwriting my entire dictionary... What I would ultimately like to end up doing is this...
I would like to create an else within my list comprehension that allows me to enter dictionary entries via standard input if they don't exist. In other words, since 'tax identification number' does not exist as a key, add it as one to my dict and give it the value 'tin' via raw_input. Has anyone ever done or tried anything like this? Any ideas? If you have any suggestions, I am all ears. I'm struggling with this issue...
The way I would want to go about this is in the list comprehension..
fileLayout = [header_dict[ting] for ting in given_header if ting else raw_input('add missing key value pair to dictionary')] # or do something of the sort.
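A conditional inside the comprehension cannot assign to the dictionary, so one possible way to get this behaviour is to move the lookup into a small helper that prompts when a key is missing. This is only a sketch: the resolve_header name and the ask parameter are hypothetical, and it uses Python 3's input (raw_input would take its place on Python 2).

```python
header_dict = {'account number': 'ACCOUNT_name', 'name': 'client', 'tax id': 'tin'}

def resolve_header(name, mapping, ask=input):
    """Return the template column for `name`, asking the user when it is unknown."""
    if name not in mapping:
        # Store the answer so the same header is only asked about once.
        mapping[name] = ask("Template column for {!r}: ".format(name))
    return mapping[name]

given_header = ['account number', 'name', 'tax identification number']
# A canned answer stands in for the keyboard here; drop the `ask` argument to prompt for real.
fileLayout = [resolve_header(h, header_dict, ask=lambda prompt: 'tin') for h in given_header]
print(fileLayout)
```

Because the answer is written back into header_dict, the existing entries are left alone, which avoids the overwriting seen with the defaultdict attempt.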

From an (ID, number) pair keep only those pairs that contain the largest number

I am new in python and I would like some help for a small problem. I have a file whose each line has an ID plus an associated number. More than one numbers can be associated to the same ID. How is it possible to get only the ID plus the largest number associated with it in python?
Example:
Input: ID_file.txt
ENSG00000133246 2013
ENSG00000133246 540
ENSG00000133246 2010
ENSG00000253626 465
ENSG00000211829 464
ENSG00000158458 2577
ENSG00000158458 2553
What I want is the following:
ENSG00000133246 2013
ENSG00000253626 465
ENSG00000211829 464
ENSG00000158458 2577
Thanks in advance for any help!
I would think there are many ways to do this. I would use a dictionary, though:
id_value_dict = {}
for line in open('ID_file.txt'):
    id, value = line.strip().split()
    if id not in id_value_dict:
        id_value_dict[id] = int(value)
    else:
        # Keep the larger of the stored and the new value
        if id_value_dict[id] < int(value):
            id_value_dict[id] = int(value)
Next step is to get the dictionary written out
out_ref = open('outputfile.txt', 'w')
for key, value in id_value_dict.items():
    out_ref.write(key + '\t' + str(value) + '\n')
out_ref.close()
There are slicker ways to do this. I think the dictionary could be built in a one-liner using a lambda or a comprehension, but I like to start simple.
Just in case you need the results sorted: there are lots of ways to do it, but I think it is critical to understand working with lists and dictionaries in Python. I have found that learning to think about the right data container is usually the key to solving many of my problems, though I am still new to this. Anyway, if you need the sorted results, a straightforward way is
sorted_keys = sorted(id_value_dict)
This is one of the slick things about Python: sorted(id_value_dict) is a list of the keys of the dictionary, sorted.
out_ref = open('outputfile.txt', 'w')
for key in sorted_keys:
    out_ref.write(key + '\t' + str(id_value_dict[key]) + '\n')
out_ref.close()
It's really tricky, because you might want (I know I always want) to write
my_sorted_list = list(id_value_dict.keys()).sort()
However, you will find that my_sorted_list is None, because list.sort() sorts in place and returns None.
Given that your input consists of nothing but contiguous runs for each ID—that is, as soon as you see another ID, you never see the previous ID again—you can just do this:
import itertools
import operator
with open('ID_file.txt') as idfile, open('max_ID_file.txt', 'w') as maxidfile:
    keyvalpairs = (line.strip().split(None, 1) for line in idfile)
    for key, group in itertools.groupby(keyvalpairs, operator.itemgetter(0)):
        maxval = max(int(keyval[1]) for keyval in group)
        maxidfile.write('{} {}\n'.format(key, maxval))
To see what this does, let's go over it line by line.
A file is just an iterable full of lines, so for line in idfile means exactly what you'd expect. For each line, we're calling strip to get rid of extraneous whitespace, then split(None, 1) to split it on the first space, so we end up with an iterable full of pairs of strings.
Next, we use groupby to change that into an iterable full of (key, group) pairs. Try printing out list(keyvalpairs) to see what it looks like.
Then we iterate over that, and just use max to get the largest value in each group.
And finally, we write the key and the max value for the group to the output file.
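If the IDs are not guaranteed to arrive in contiguous runs, a dictionary that keeps a running maximum works regardless of order. Here is a minimal sketch of that alternative; the max_per_id name and the sample rows are illustrative assumptions.

```python
# Hypothetical rows; note that the first ID reappears after another ID.
rows = [
    "ENSG00000133246 2013",
    "ENSG00000253626 465",
    "ENSG00000133246 540",
    "ENSG00000158458 2553",
    "ENSG00000158458 2577",
]

def max_per_id(lines):
    """Return {id: largest number seen}, with IDs in first-seen order."""
    best = {}
    for line in lines:
        key, value = line.split()
        value = int(value)
        # get() falls back to the current value the first time an ID is seen.
        best[key] = max(best.get(key, value), value)
    return best

for key, value in max_per_id(rows).items():
    print(key, value)
```

The first-seen ordering relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onwards.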

Generating a .CSV with Several Columns - Use a Dictionary?

I am writing a script that looks through my inventory, compares it with a master list of all possible inventory items, and tells me what items I am missing. My goal is a .csv file where the first column contains a unique key integer and then the remaining several columns would have data related to that key. For example, a three row snippet of my end-goal .csv file might look like this:
100001,apple,fruit,medium,12,red
100002,carrot,vegetable,medium,10,orange
100005,radish,vegetable,small,10,red
The data for this is being drawn from a couple sources. 1st, a query to an API server gives me a list of keys for items that are in inventory. 2nd, I read in a .csv file into a dict that matches keys with item name for all possible keys. A snippet of the first 5 rows of this .csv file might look like this:
100001,apple
100002,carrot
100003,pear
100004,banana
100005,radish
Note how any key in my list of inventory will be found in this two column .csv file that gives all keys and their corresponding item name and this list minus my inventory on hand yields what I'm looking for (which is the inventory I need to get).
So far I can get a .csv file that contains just the keys and item names for the items that I don't have in inventory. Given a list of inventory on hand like this:
100003,100004
A snippet of my resulting .csv file looks like this:
100001,apple
100002,carrot
100005,radish
This means that I have pear and banana in inventory (so they are not in this .csv file.)
To get this I have a function to get an item name when given an item id that looks like this:
def getNames(id_to_name, ids):
    return [id_to_name[id] for id in ids]
Then a function which gives a list of keys as integers from my inventory server API call that returns a list and I've run this function like this:
invlist = ServerApiCallFunction(AppropriateInfo)
A third function takes this invlist as its input and returns a dict of keys (the item id) and names for the items I don't have. It also writes the information of this dict to a .csv file. I am using the set1 - set2 method to do this. It looks like this:
def InventoryNumbers(inventory):
    with open(csvfile,'w') as c:
        c.write('InvName' + ',InvID' + '\n')
    missinginvnames = []
    with open("KeyAndItemNameTwoColumns.csv","rb") as fp:
        reader = csv.reader(fp, skipinitialspace=True)
        fp.readline() # skip header
        invidsandnames = {int(id): str.upper(name) for id, name in reader}
    invids = set(invidsandnames.keys())
    invnames = set(invidsandnames.values())
    invonhandset = set(inventory)
    missinginvidsset = invids - invonhandset
    missinginvids = list(missinginvidsset)
    missinginvnames = getNames(invidsandnames, missinginvids)
    missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
    print missinginvnameswithids
    with open(csvfile,'a') as c:
        for invname, invid in missinginvnameswithids.iteritems():
            c.write(invname + ',' + str(invid) + '\n')
    return missinginvnameswithids
Which I then call like this:
InventoryNumbers(invlist)
With that explanation, now on to my question here. I want to expand the data in this output .csv file by adding in additional columns. The data for this would be drawn from another .csv file, a snippet of which would look like this:
100001,fruit,medium,12,red
100002,vegetable,medium,10,orange
100003,fruit,medium,14,green
100004,fruit,medium,12,yellow
100005,vegetable,small,10,red
Note how this does not contain the item name (so I have to pull that from a different .csv file that just has the two columns of key and item name) but it does use the same keys. I am looking for a way to bring in this extra information so that my final .csv file will not just tell me the keys (which are item ids) and item names for the items I don't have in stock but it will also have columns for type, size, number, and color.
One option I've looked at is the defaultdict piece from collections, but I'm not sure if this is the best way to go about what I want to do. If I did use this method I'm not sure exactly how I'd call it to achieve my desired result. If some other method would be easier I'm certainly willing to try that, too.
How can I take my dict of keys and corresponding item names for items that I don't have in inventory and add to it this extra information in such a way that I could output it all to a .csv file?
EDIT: As I typed this up it occurred to me that I might make things easier on myself by creating a new single .csv file that would have data in the form key,item name,type,size,number,color (basically just copying the column for item name into the .csv that already has the other information for each key). This way I would only need to draw from one .csv file rather than from two. Even if I did this, though, how would I go about making my desired .csv file based on only those keys for items not in inventory?
ANSWER: I posted another question here about how to implement the solution I accepted (because it was giving me a value error, since my dict values were strings rather than sets to start with) and I ended up deciding that I wanted a list rather than a set (to preserve the order). I also ended up adding the column with item names to my .csv file that had all the other data, so that I only had to draw from one .csv file. That said, here is what this section of code now looks like:
MyDict = {}
infile = open('FileWithAllTheData.csv', 'r')
for line in infile.readlines():
    spl_line = line.split(',')
    if int(spl_line[0]) in missinginvids: #note that this is the list I was using as the keys for my dict which I was zipping together with a corresponding list of item names to make my dict before.
        MyDict.setdefault(int(spl_line[0]), list()).append(spl_line[1:])
print MyDict
It sounds like what you need is a dict mapping ints to sets, i.e.,
MyDict = {100001: set(['apple']), 100002: set(['carrot'])}
You can add with update:
MyDict[100001].update(['fruit'])
which would give you: {100001: set(['apple', 'fruit']), 100002: set(['carrot'])}
Also, if you had a list of attributes of carrot, ['vegetable', 'orange'],
you could say MyDict[100002].update(['vegetable', 'orange'])
and get: {100001: set(['apple', 'fruit']), 100002: set(['carrot', 'vegetable', 'orange'])}
Does this answer your question?
EDIT:
To read from the CSV file...
infile = open('MyFile.csv', 'r')
for line in infile.readlines():
    spl_line = line.strip().split(',')
    key = int(spl_line[0])  # the dict keys are ints, so convert before the lookup
    if key in MyDict:
        MyDict[key].update(spl_line[1:])
This isn't an answer to the question, but here is a possible way of simplifying your current code.
This:
invids = set(invidsandnames.keys())
invnames = set(invidsandnames.values())
invonhandset = set(inventory)
missinginvidsset = invids - invonhandset
missinginvids = list(missinginvidsset)
missinginvnames = getNames(invidsandnames, missinginvids)
missinginvnameswithids = dict(zip(missinginvnames, missinginvids))
Can be replaced with:
invonhandset = set(inventory)
missinginvnameswithids = {v: k for k, v in invidsandnames.iteritems() if k not in invonhandset}
Or:
invonhandset = set(inventory)
for key in list(invidsandnames.keys()):
    # drop everything that is on hand; what remains is missing
    if key in invonhandset:
        del invidsandnames[key]
missinginvnameswithids = invidsandnames  # note: this variant maps id -> name
Have you considered making a temporary RDB (Python has sqlite support baked in)? For reasonable numbers of items I don't think you would have performance issues.
I would turn each CSV file and the result from the web API into a table (one table per data source). You can then do everything you want with some SQL queries and joins. Once you have the data you want, you can dump it back to CSV.
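As a rough sketch of that idea, here is what it could look like with the standard sqlite3 module and an in-memory database. The table names, column names and sample rows are assumptions for illustration; in practice the rows would be read from the two CSV files and the API result.

```python
import sqlite3

# Hypothetical sample rows standing in for the two CSV files and the API result.
names = [(100001, "apple"), (100002, "carrot"), (100003, "pear")]
details = [(100001, "fruit", "medium", 12, "red"),
           (100002, "vegetable", "medium", 10, "orange"),
           (100003, "fruit", "medium", 14, "green")]
on_hand = [(100003,)]

conn = sqlite3.connect(":memory:")  # temporary database, discarded on close
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE details (id INTEGER PRIMARY KEY, type TEXT, "
             "size TEXT, number INTEGER, color TEXT)")
conn.execute("CREATE TABLE on_hand (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO names VALUES (?, ?)", names)
conn.executemany("INSERT INTO details VALUES (?, ?, ?, ?, ?)", details)
conn.executemany("INSERT INTO on_hand VALUES (?)", on_hand)

# Join names with details and keep only the ids that are not in inventory.
missing = conn.execute(
    """
    SELECT n.id, n.name, d.type, d.size, d.number, d.color
    FROM names AS n
    JOIN details AS d ON d.id = n.id
    WHERE n.id NOT IN (SELECT id FROM on_hand)
    ORDER BY n.id
    """
).fetchall()
conn.close()

for row in missing:
    print(",".join(str(field) for field in row))
```

The rows in missing are plain tuples, so they could be handed to csv.writer(...).writerows(missing) to produce the final .csv file.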
