Format strings to make 'table' in Python 3 - python

Right now I'm calling print() on variables stored in a tuple and formatting them with print(format(x,"<10s") + format(y,"<40s") ...), but the output isn't aligned in columns. How do I make each row's elements line up?
My code stores student details. First, it takes a string and returns a tuple with constituent parts like (name, surname, student ID, year).
It reads these details from a long text file of student details, passes them through a tuplelayout function (the part that formats the tuple), and is meant to tabulate the results.
The argument to the tuplelayout function is a tuple of the form:
surname | name | reg number | course | year

If you are unpacking tuples just use a single str.format and justify the output as required using format-specification-mini-language:
l = [(10,1000),(200,20000)]
for x,y in l:
    print("{:<3} {:<6}".format(x,y))
Output:
10  1000
200 20000
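Applied to the student tuples from the question, the same idea might look like the sketch below. The function name format_table, the header labels, and the sample rows are made up for illustration; the column widths are computed from the data instead of being hard-coded.

```python
def format_table(rows, headers):
    """Return rows as left-aligned text columns, one line per row."""
    # Convert every cell to str so that numbers and strings pad the same way.
    table = [tuple(str(cell) for cell in headers)]
    table += [tuple(str(cell) for cell in row) for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(headers))]
    lines = []
    for row in table:
        cells = ("{:<{w}}".format(cell, w=w) for cell, w in zip(row, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


# Hypothetical sample data in the order surname, name, reg number, course, year.
students = [
    ("Smith", "Anna", "1001", "Maths", 2),
    ("Li", "Bo", "1002", "Physics", 1),
]
table_text = format_table(students, ("Surname", "Name", "Reg", "Course", "Year"))
print(table_text)
```

The columns only line up on screen when the output is shown in a monospaced font.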

My shell's font settings had been changed, so the alignment was off. After switching the font back to "Courier", everything works fine.
Sorry.

Related

Python Pandas Square Brackets LIST from STRING

Warning: I am a newbie to Python, Pandas, and PySerial.
I am reading values from an Excel spreadsheet using Pandas.
The values in Excel are stored as Text, but contain both alphabetical and numeric characters.
see Snip of Excel data
I import these using Pandas command mydata = pd.read_excel (*path etc goes here*) <<< (no problems are encountered with this function)
I can then print them using print(mydata) ....and the output looks the same as it appears in the Excel spreadsheet (i.e., there are no extra characters):
0 MW000000007150000300000;
1 MW000100009850000200000;
2 MW000200009860000200000; #<<<<<<<< *Notice that there are NO square brackets and no extra Quotes*.
To send these data via the PySerial function serial.write to my RS-232 linked device, I am looping through the values which must (as I understand it...) be in a LIST format. So, I convert the data-field mydata into a LIST, by using the command Allocation_list=mydata.values.tolist()
If I print(Allocation_list), I find many square brackets and single quotes have been added, as you can see here:
Allocation_list =([['MW000000007150000300000;'], ['MW000100009850000200000;'], ['MW000200009860000200000;'], ['MW000300009870000200000;'], ['MW000400009880000200000;'], ['MW000500009890000200000;']])
These square brackets are NOT ignored when I <<serial.write>> the values in the LIST to my RS-232 device.
In fact, the values are written as (binary versions of....)
0 memory written as ['MW000000007150000300000;']
1 memory written as ['MW000100009850000200000;']
2 memory written as ['MW000200009860000200000;']
3 memory written as ['MW000300009870000200000;']
4 memory written as ['MW000400009880000200000;']
5 memory written as ['MW000500009890000200000;']
Unfortunately, for the RS-232 device to accept each line written to it as a valid command, the line must be in the precise command format for that device, which looks like
MW000000007150000300000; <<<<< the semi-colon is a required part of the syntax
So, the square brackets and the Quotation marks have to be removed, somehow.
Any help with this peculiar problem would be appreciated, as I have tried several of the methods described in other 'threads', and none of them seem to work properly because my datafield is a set of strings (which are converted to bits ONLY as they are about to be written to the RS-232 device).
M
Even if you have a frame with just one column avoid this:
l = df.values.tolist()
l
#outputs:
[[40], [10], [20], [10], [15], [30]]
To avoid the issue, select a single column before converting to a list:
l = df['amount'].to_list()
l
#outputs:
[40, 10, 20, 10, 15, 30]
If you want a range of rows use loc:
#put rows 3 to 5 (note the index starts at 0!) for only column 'amount' into a list
l = df.loc[2:4,'amount'].to_list()
l
#outputs:
[20, 10, 15]
Showing the code in full on a frame with only one column:
First off, values preserves the dimensionality of the object it's called upon, so you have to target the exact column that holds the serials, something like mydata["column_label"] (just check the relevant column label by printing the dataframe).
As for quotes, pyserial write() accepts bytes-like objects, so you might need to pass an encoded version of your string, using either b'string' or 'string'.encode("utf8") notation.
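A minimal sketch of both points, written without pandas or pyserial so that it runs on its own: it starts from the nested list shown in the question, takes the single string out of each row, and encodes it to bytes. The helper name to_commands and the choice of ASCII encoding are assumptions; the actual write to the device is only indicated in a comment.

```python
# Nested list, as produced by mydata.values.tolist() on a one-column frame.
allocation_list = [
    ['MW000000007150000300000;'],
    ['MW000100009850000200000;'],
    ['MW000200009860000200000;'],
]


def to_commands(nested_rows, encoding="ascii"):
    """Take the single string out of each row and encode it as bytes."""
    return [row[0].encode(encoding) for row in nested_rows]


commands = to_commands(allocation_list)
for command in commands:
    # With an open pyserial port (hypothetical variable name `port`),
    # this is where port.write(command) would go.
    print(command)
```

The brackets and quotes seen on the device come from writing the string form of a whole row; each element of commands is just the command text as bytes.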

Formatting problem: TypeError: not enough arguments for format string

I run this formatting code, print("%15s%.2f"%((heights[j])),end=""), but I get the error below. What is wrong here?
TypeError: not enough arguments for format string
What does your heights look like?
Here is a working example
heights = [("test",3.14)]
print("%15s%.2f"%((heights[0])),end="")
So heights must be a list of 2-element tuples. A 2-element list will not work, because the % operator only unpacks tuples and treats a list as a single value.
The first % conversion formats the first value into the string and the second formats the second value. The problem is that you only have one value to format (unless heights[j] is a 2-element tuple).
If you want heights[j] to be formatted in both places, I suggest doing something like this (assuming heights[j] is a number):
print("{0:>15}{0:.2f}".format(heights[j]), end="")
Both replacement fields use index 0, so each one is filled from the first argument passed to format().
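The difference between the cases can be checked with a short sketch. The helper names format_pair and format_same_value are invented for illustration, and the sample values are arbitrary.

```python
def format_pair(pair):
    """Format a (label, number) tuple; the % operator unpacks tuples."""
    return "%15s%.2f" % pair


def format_same_value(value):
    """Use one number in both fields through the explicit index 0."""
    return "{0:>15}{0:.2f}".format(value)


heights = [("test", 3.14)]
print(format_pair(heights[0]))
print(format_same_value(1.5))

# A single non-tuple value cannot fill two % conversions.
try:
    "%15s%.2f" % 3.14
    error_message = None
except TypeError as error:
    error_message = str(error)
print("TypeError:", error_message)
```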

Finding row in Dataframe when dataframe is both int or string?

minor problem doing my head in. I have a dataframe similar to the following:
Number      Title
12345678    A
34567890-S  B
11111111    C
22222222-L  D
This is read from an excel file using pandas in python, then the index set to the first column:
db = db.set_index(['Number'])
I then lookup Title based on Number:
lookup = "12345678"
title = str(db.loc[lookup, 'Title'])
However... Whilst anything postfixed with "-Something" works, anything without it doesn't find a location (eg. 12345678 will not find anything, 34567890-S will). My only hunch is it's to do with looking up as either strings or ints, but I've tried a few things (converting the table to all strings, changing loc to iloc,ix,etc) but so far no luck.
Any ideas? Thanks :)
UPDATE: So trying this from scratch doesn't exhibit the same behaviour (creating a test db presumably just sets everything as strings), however importing from CSV is resulting in the above, and...
Searching "12345678" (as a string) doesn't find it, but 12345678 as an int will. Likewise the opposite for the others. So the dataframe is only matching the pure numbers in the index with ints, but anything else with strings.
Also, I can't not search for the postfix, as I have multiple rows with differing postfix eg 34567890-S, 34567890-L, 34567890-X.
If you want to cast all entries to one particular type, you can use pandas.Series.astype:
db["Number"] = db["Number"].astype(str)
db = db.set_index(['Number'])
lookup = "12345678"
title = db.loc[lookup, 'Title']
Interestingly this is actually slower than using pandas.Index.map:
import numpy as np
import pandas as pd
x1 = [pd.Series(np.arange(n)) for n in np.logspace(1, 4, dtype=int)]
x2 = [pd.Index(np.arange(n)) for n in np.logspace(1, 4, dtype=int)]
def series_astype(x1):
    return x1.astype(str)
def index_map(x2):
    return x2.map(str)
Treat all the indices as strings, since at least some of them are not numbers. If you want to look up an item that might have a postfix, you can match it by comparing the start of each string with .str.startswith:
lookup = db.index.str.startswith("34567890")
title = db.loc[lookup, "Title"]
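The underlying mismatch can be reproduced without pandas. The sketch below is only an analogy: it uses a plain dict with the mixed int and str keys that the import produces, and the helper name lookup_prefix is made up. Normalising every key to a string makes both the exact and the prefix lookup behave consistently.

```python
# Mixed key types, similar to what the import produces for the index.
records = {12345678: "A", "34567890-S": "B", 11111111: "C", "22222222-L": "D"}

# The string "12345678" is not the same key as the int 12345678.
found_as_string = "12345678" in records

# Normalise every key to str, the dict analogue of astype(str).
by_string = {str(key): title for key, title in records.items()}


def lookup_prefix(table, prefix):
    """Return every entry whose key starts with the given prefix."""
    return {key: title for key, title in table.items() if key.startswith(prefix)}


print(found_as_string)
print(by_string["12345678"])
print(lookup_prefix(by_string, "34567890"))
```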

Returning tuple of unknown length from python UDF and then applying hash in Pig

This is a question that has two parts:
First, I have a python UDF that creates a list of strings of unknown length. The input to the UDF is a map (dict in python) and the number of keys is essentially unknown (it is what I'm trying to obtain).
What I don't know is how to output that in a schema that lets me return it as a list (or some other iterable data structure). This is what I have so far:
@outputSchema("?????") #WHAT SHOULD THE SCHEMA BE!?!?
def test_func(input):
    output = []
    for k, v in input.items():
        output.append(str(k))
    return output
Now, the second part of the question. Once in Pig I want to apply a SHA hash to each element in the "list" for all my users. Some Pig pseudo code:
USERS = LOAD 'something' as (my_map:map[chararray])
UDF_OUT = FOREACH USERS GENERATE my_udfs.test_func(segment_map)
SHA_OUT = FOREACH UDF_OUT GENERATE SHA(UDF_OUT)
The last line is likely wrong as I want to apply the SHA to each element in the list, NOT to the whole list.
To answer your question, since you are returning a Python list whose contents are strings, you will want your decorator to be
@outputSchema('name_of_bag:{(keys:chararray)}')
It can be confusing when specifying this structure because you only need to define what one element in the bag would look like.
That being said, there is a much simpler way to do what you require. The function KEYSET() (you can reference this question I answered) extracts the keys from a Pig map. Using the data set from that example, with a few more keys added to the first map since you said your map contents vary in length:
maps
----
[a#1,b#2,c#3,d#4,e#5]
[green#sam,eggs#I,ham#am]
Query:
REGISTER /path/to/jar/datafu-1.2.0.jar;
DEFINE SHA datafu.pig.hash.SHA();
A = LOAD 'data' AS (M:[]);
B = FOREACH A GENERATE FLATTEN(KEYSET(M));
hashed = FOREACH B GENERATE $0, SHA($0);
DUMP hashed;
Output:
(d,18ac3e7343f016890c510e93f935261169d9e3f565436429830faf0934f4f8e4)
(e,3f79bb7b435b05321651daefd374cdc681dc06faa65e374e38337b88ca046dea)
(b,3e23e8160039594a33894f6564e1b1348bbd7a0088d42c4acb73eeaed59c009d)
(c,2e7d2c03a9507ae265ecf5b5356885a53393a2029d241394997265a1a25aefc6)
(a,ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb)
(ham,eccfe263668d171bd19b7d491c3ef5c43559e6d3acf697ef37596181c6fdf4c)
(eggs,46da674b5b0987431bdb496e4982fadcd400abac99e7a977b43f216a98127721)
(green,ba4788b226aa8dc2e6dc74248bb9f618cfa8c959e0c26c147be48f6839a0b088)
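The digests in this output look like hex-encoded SHA-256. Assuming that is what the SHA() call produces here, individual rows can be spot-checked outside Pig with Python's hashlib. The helper name sha256_hex is made up for this sketch.

```python
import hashlib


def sha256_hex(text):
    """Return the hex-encoded SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


keys = ["a", "b", "c"]
hashed = [(key, sha256_hex(key)) for key in keys]
for key, digest in hashed:
    print("({},{})".format(key, digest))
```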

Formatting dict.items() for wxPython

I have a text box in wxPython that takes the output of dictionary.items() and displays it to the user as items are added to the dictionary. However, the raw data is very ugly, looking like
[(u'BC',45)
(u'CHM',25)
(u'CPM',30)]
I know dictionary.items() is a list of tuples, but I can't seem to figure out how to make a nice format that is also compatible with the SetValue() method of wxPython.
I've tried iterating through the list and tuples. If I use a print statement, the output is fine. But when I replace the print statement with SetValue(), it only seems to get the last value of each tuple, rather than both items in the tuple.
I've also tried creating a string and passing that string to SetValue() but, again, I can only get one item in the tuple or the other, not both.
Any suggestions?
Edit: Yes, I am passing the results of the dictionary.items() to a text field in a wxPython application. Rather than having the results like above, I'm simply looking for something like:
BC 45
CHM 25
CPM 30
Nothing special, just simply pulling each value from each tuple and making a visual list.
I have tried making a string format and passing that to SetValue() but it gets hung up on the two values in the tuple. It will either double print each string and add the integers together or it simply returns the integer, depending on how I format it.
There is no built-in dictionary method that would return your desired result.
You can, however, achieve your goal by creating a helper function that will format the dictionary, e.g.:
def getNiceDictRepr(aDict):
    return '\n'.join('%s %s' % t for t in aDict.items())
This will produce your exact desired output:
>>> myDict = dict([(u'BC',45), (u'CHM',25), (u'CPM',30)])
>>> print(getNiceDictRepr(myDict))
BC 45
CHM 25
CPM 30
Then, in your application code, you can use it by passing it to SetValue:
self.textCtrl.SetValue(getNiceDictRepr(myDict))
Maybe the pretty print module will help:
>>> import pprint
>>> pprint.pformat({ "my key": "my value"})
"{'my key': 'my value'}"
>>>
text_for_display = '\n'.join(item + u' ' + unicode(value) for item, value in my_dictionary.items())
Use % formatting (known in C as sprintf), e.g.:
"%10s - %d" % dict.items()[0]
The number of % conversion specifications in the format string must match the tuple length, which is 2 for the tuples in dict.items(). The result of the string formatting operator is a string, so passing it to SetValue() is no problem. To turn the whole dict into a string:
'\n'.join(("%10s - %d" % t) for t in dict.items())
The format conversion types are specified in the doc.
That data seems much better displayed as a Table/Grid.
I figured out a "better" way of formatting the output. As usual, I was trying to nuke it out when a more elegant method will do.
for key, value in sorted(self.dict.items()):
    self.current_list.WriteText(key + " " + str(value) + "\n")
This way also sorts the dictionary alphabetically, which is a big help when identifying items that have already been selected or used.
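For a control that takes its whole content in one call, the sorted text can be built first and passed along as a single string. This is a sketch in Python 3 syntax. The function name format_items and the sample dictionary are made up, and the wxPython call is only indicated in a comment.

```python
def format_items(mapping, separator=" "):
    """Return one 'key value' line per entry, sorted by key."""
    return "\n".join(
        "{}{}{}".format(key, separator, value)
        for key, value in sorted(mapping.items())
    )


counts = {"CPM": 30, "BC": 45, "CHM": 25}
text = format_items(counts)
print(text)
# In the application, a text control (hypothetical name text_ctrl) would
# receive the whole string in one call: text_ctrl.SetValue(text)
```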
