I am fairly new to Python and need a little help here.
I have a Python script running on Python 2.6 that parses some JSON.
Example Code:
if "prid" in data[p]["prdts"][n]:
print data[p]["products"][n]["prid"],
if "metrics" in data[p]["prdts"][n]:
lenmet = len(data[p]["prdts"][n]["metrics"])
i = 0
while (i < lenmet):
if (data[p]["prdts"][n]["metrics"][i]["metricId"] == "price"):
print data[p]["prdts"][n]["metrics"][i]["metricValue"]["value"]
break
Now, this prints values in 2 columns:
prid price
123 20
234 40
As you can see, the field separator above is ' '. How can I use a different field separator, like the BEL character, in the output?
Sample expected output:
prid price
123^G20
234^G40
FWIW, your while loop doesn't increment i, so it will loop forever, but I assume that was just a copy & paste error, and I'll ignore it in the rest of my answer.
If you want to use two separate print statements to print your data on one line, you can't avoid the space produced by the first print statement. Instead, simply save the prid data until you can print it together with the price in one go using string concatenation. E.g.,
if "prid" in data[p]["prdts"][n]:
prid = [data[p]["products"][n]["prid"]]
if "metrics" in data[p]["prdts"][n]:
lenmet = len(data[p]["prdts"][n]["metrics"])
i = 0
while (i < lenmet):
if (data[p]["prdts"][n]["metrics"][i]["metricId"] == "price"):
price = data[p]["prdts"][n]["metrics"][i]["metricValue"]["value"]
print str(prid) + '\a' + str(price)
break
Note that I'm explicitly converting the prid and price to string. Obviously, if either of those items is already a string then you don't need to wrap it in str(). Normally, we can let print convert objects to string for us, but we can't do
print prid, '\a', price
here because that will give us an unwanted space between each item.
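If you really want to keep two separate output statements, another option (just a sketch, not part of the approach above) is to write directly to sys.stdout, which never inserts separators or newlines on its own:

import sys

# Sketch only: sys.stdout.write adds nothing, so we supply the BEL and newline ourselves.
sys.stdout.write(str(prid))
sys.stdout.write('\a')
sys.stdout.write(str(price) + '\n')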
Another approach is to make use of the new print() function, which we can import using a __future__ import at the top of the script, before other imports:
from __future__ import print_function
# ...
if "prid" in data[p]["prdts"][n]:
print(data[p]["products"][n]["prid"], end='\a')
if "metrics" in data[p]["prdts"][n]:
lenmet = len(data[p]["prdts"][n]["metrics"])
i = 0
while (i < lenmet):
if (data[p]["prdts"][n]["metrics"][i]["metricId"] == "price"):
print(data[p]["prdts"][n]["metrics"][i]["metricValue"]["value"])
break
I don't understand why you want to use BEL as a separator rather than something more conventional, e.g. TAB. The BEL char may print as ^G in your terminal, but it's invisible in mine, and if you save this output to a text file it may not display correctly in a text viewer or editor.
BTW, it would have been better if you had posted a Minimal, Complete, and Verifiable example that focuses on your actual printing problem, rather than all that JSON parsing, which just makes your question look more complicated than it really is and makes it impossible to test your code or our modifications to it.
I'm currently working on making a Python board game representation in the terminal using emojis, but as soon as my emoji print gets past 50 emojis, it forces whitespace. This does not happen when printing a string of 50 normal characters or a string of characters of equal length. When I print even more emojis, things sometimes get even weirder, with question-mark emojis (see the terminal picture).
s = "\U0001F7E8"
print("50:")
print(s*50)
print("\n51:")
print(s*51)
print("\n60:")
print(s*60)
print("\n70:")
print(s*70)
print("\n80:")
print(s*80)
print("\n220:")
print(s*220)
print("what I'm currently working on")
#I made string_representation with a lot of code above which I don't feel like matters for my question, so I left that out.
print(string_representation)
print("sadsfdgfhgjgfdssadsfdgfhgjgfdssadsfdgfhgjgfdssadsfdgfhgjgfdssadsfdgfhgjgfdssadsfdgfhgjgfdssadsfdgfhgjgfdssadsfdgfhgjgfds")
I tried it locally; it's a problem with Python printing many repetitions of the particular emoji you picked ('\U0001F7E8'). You'd better switch to some other emojis.
# s = '\U0001F7E8'
s = '🐍'
# s = '🟨'
s5 = s*5
print(s*60)
for i in range(60):
    print(s, end="")
print()
for x in range(6):
    print(s*10, end="")
print()
print(s5*12)
print()
for i in range(12):
    print(s5, end="")
print()
for x in range(4):
    print(s5*3, end="")
print()
Try to print them in one print call, like this:
print("50:", s*50, "\n51:", s*51)
and so on. Maybe that's your problem, because in your string representation you added them all in one print call.
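As a rough sketch of that suggestion (the board size here is purely hypothetical), you can build the whole board as one string first and then print it with a single call:

s = "\U0001F7E8"
# Build an 8x8 board as a single string, then print it in one call.
rows = [s * 8 for _ in range(8)]
print("\n".join(rows))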
How do I add up incoming individual characters over UART to form a string?
For example:
The characters from the UART are printed in the following format:
\x02
1
2
3
4
5
6
7
\x03
\x02
a
b
c
d
e
f
g
\x03
and I would like output to be something like:
1234567
abcdefg
I have tried this so far:
#!/usr/bin/env python
import time
import serial

ser = serial.Serial('/dev/ttyUSB0',38400)
txt = ""
ser.flushInput()
ser.flushOutput()
while 1:
    bytesToRead = ser.inWaiting()
    data_raw = ser.read(1)
    while 1:
        if data_raw !='/x02' or data_raw !='/x03':
            txt += data_raw
        elif data_raw == '/x03':
            break
    print txt
Any ideas on how to do that? I am getting no output using this.
First of all, you don't need to call inWaiting: read will block until data is available unless you explicitly set a read timeout. Secondly, if you do insist on using it, keep in mind that the function inWaiting has been replaced by the property in_waiting.
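For illustration, a minimal sketch of using that property (assuming pyserial 3.x and the ser object from your script):

# Read everything that has already arrived, or block for one byte if nothing has.
waiting = ser.in_waiting        # a property in pyserial 3.x, not a method call
data = ser.read(waiting or 1)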
The string /x03 is a 4-character string containing all printable characters. The string \x03, on the other hand, contains only a single non printable character with ASCII code 3. Backslash is the escape character in Python strings. \x followed by two digits is an ASCII character code. Please use backslashes where they belong. This is the immediate reason you are not seeing output: the four character string can never appear in a one-character read.
That being out of the way, the most important thing to remember is to clear your buffer when you encounter a terminator character. Let's say you want to use the inefficient method of adding to your string in place. When you reach \x03, you should print txt and just reset it back to '' instead of breaking out of the loop. A better way might be to use bytearray, which is a mutable sequence. Keep in mind also that read returns bytes, not strings in Python 3.x. This means that you should be decoding the result if you want text: txt = txt.decode('ascii').
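As a rough sketch of that simpler in-place approach, keeping your variable names (assuming Python 2, where read returns a str):

txt = ''
while True:
    c = ser.read(1)
    if c == '\x02':        # start of a frame: discard anything buffered
        txt = ''
    elif c == '\x03':      # end of a frame: print and reset instead of breaking
        print(txt)
        txt = ''
    else:
        txt += c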
I would suggest a further improvement: create an infinite generator function to split your stream into strings. You could use that generator to print the strings or do anything else you want with them:
def getstrings(port):
    buf = bytearray()
    while True:
        b = port.read(1)
        if b == b'\x02':
            del buf[:]
        elif b == b'\x03':
            yield buf.decode('ascii')
        else:
            buf.extend(b)  # extend works for both Python 2 str and Python 3 bytes

for item in getstrings(serial.Serial(...)):
    print(item)
Here's one way that I think would help you a bit
l = []
while 1:
    data_raw = ser.read(1)
    if data_raw != '\x02' and data_raw != '\x03':
        l.append(data_raw)
    elif data_raw == '\x03':
        txt = "".join(l)
        print txt
        l = []  # start fresh for the next data stream
I begin by creating an empty list, and each time you receive a new data_raw you append it to the list. Once you reach your ending character, you create your string and print it.
I took the liberty of removing one of the loops here to give you a simpler approach. The code will not stop by itself at the end (just add a break after the print if you want that). It will print out the result whenever you reach the ending character, reset the list, and then wait for the next data stream to start.
If you want to see intermediate results, you can print each data_raw as it arrives, and right after it print the currently joined list.
Be sure to set the timeout value to None when you open your port. That way, you will simply wait to receive one byte, process it, and then return to read the next byte.
ser = serial.Serial(port='/dev/ttyUSB0', baudrate=38400, timeout=None)
See http://pyserial.readthedocs.io/en/latest/pyserial_api.html for more information on that.
I have recently been learning some Python and how to apply it to my work. I have written a couple of scripts successfully, but I am having an issue I just cannot figure out.
I am opening a file with ~4000 lines, two tab-separated columns per line. When reading the input file, I get an IndexError saying that the list index is out of range. However, while I get the error every time, it doesn't happen on the same line every time (as in, it will throw the error on a different line each run!). So, for some reason, it works generally but then (seemingly) randomly fails.
As I literally only started learning Python last week, I am stumped. I have looked around for the same problem but not found anything similar. Furthermore, I don't know whether this is a problem that is language-specific or IPython-specific. Any help would be greatly appreciated!
input = open("count.txt", "r")
changelist = []
listtosort = []
second = str()
output = open("output.txt", "w")
for each in input:
splits = each.split("\t")
changelist = list(splits[0])
second = int(splits[1])
print second
if changelist[7] == ";":
changelist.insert(6, "000")
va = "".join(changelist)
var = va + ("\t") + str(second)
listtosort.append(var)
output.write(var)
elif changelist[8] == ";":
changelist.insert(6, "00")
va = "".join(changelist)
var = va + ("\t") + str(second)
listtosort.append(var)
output.write(var)
elif changelist[9] == ";":
changelist.insert(6, "0")
va = "".join(changelist)
var = va + ("\t") + str(second)
listtosort.append(var)
output.write(var)
else:
#output.write(str("".join(changelist)))
va = "".join(changelist)
var = va + ("\t") + str(second)
listtosort.append(var)
output.write(var)
output.close()
The error
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/home/a/Desktop/sharedfolder/ipytest/individ.ins.count.test/<ipython-input-87-32f9b0a1951b> in <module>()
57 splits = each.split("\t")
58 changelist = list(splits[0])
---> 59 second = int(splits[1])
60
61 print second
IndexError: list index out of range
Input:
ID=cds0;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C 50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36
Desired output:
ID=cds0000;Name=NP_414542.1;Parent=gene0;Dbxref=ASAP:ABE-0000006,UniProtKB%2FSwiss-Prot:P0AD86,Genbank:NP_414542.1,EcoGene:EG11277,GeneID:944742;gbkey=CDS;product=thr 12
ID=cds1000;Name=NP_415538.1;Parent=gene1035;Dbxref=ASAP:ABE-0003451,UniProtKB%2FSwiss-Prot:P31545,Genbank:NP_415538.1,EcoGene:EG11735,GeneID:946500;gbkey=CDS;product=deferrrochelatase%2C 50
ID=cds1001;Name=NP_415539.1;Parent=gene1036;Note=PhoB-dependent%2C 36
The reason you're getting the IndexError is that your input file is apparently not entirely tab delimited. That's why there is nothing at splits[1] when you attempt to access it.
Your code could also use some refactoring. First of all, you're repeating yourself with the if-checks; it's unnecessary, since all they really do is pad the value after "ID=" out to 7 characters (which may or may not be exactly the rule you want). I threw the following together to demonstrate how you could refactor your code to be a little more Pythonic and DRY. I can't guarantee it'll work with your dataset, but I'm hoping it might help you understand how to do things differently.
to_sort = []

# We can open two files using the with statement. This will also handle
# closing the files for us, when we exit the block.
with open("count.txt", "r") as inp, open("output.txt", "w") as out:
    for each in inp:
        # Split at ';'... so you won't have to worry about whether or not
        # the file is tab delimited.
        changed = each.split(";")
        # Get the value you want. This is called unpacking.
        # The value before '=' will always be 'ID', so we don't really care about it.
        # _ is generally used as a variable name when the value is discarded.
        _, value = changed[0].split("=")
        # 0-pad the desired value to 7 characters. Python string formatting
        # makes this very easy. This will replace the current value in the list.
        changed[0] = "ID={:0<7}".format(value)
        # Join the changed list with the original separator and
        # append it to the sort list.
        to_sort.append(";".join(changed))

    # Write the results to the file all at once. Your test data already
    # provided the newlines, so you can just write it out as it is.
    out.writelines(to_sort)

# Do whatever else you need to do. Maybe to_sort.sort()?
You'll notice that this reduces your code down to about 8 lines (ignoring comments) but achieves the exact same thing, does not repeat itself, and is pretty easy to understand.
Please read PEP 8 and the Zen of Python, and go through the official tutorial.
This happens when there is a line in count.txt which doesn't contain the tab character. When you split that line by the tab character, there is no splits[1], hence the error "list index out of range".
To find out which line is causing the error, just add a print(each) right after the split on line 57. The line printed before the error message is your culprit. If your input file keeps changing, you will get the error at different locations. Change your script to handle such malformed lines.
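For example, a minimal sketch of such a guard, using the names from your own script (skipping and reporting is just one possible policy; you could also log or raise):

for each in input:
    splits = each.rstrip("\n").split("\t")
    if len(splits) < 2:
        # Malformed line with no tab-separated second column: report it and move on.
        print "Skipping malformed line:", repr(each)
        continue
    changelist = list(splits[0])
    second = int(splits[1])
    # ... rest of the processing as before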
I am using this code to find the difference between two CSV files and have some formatting questions. This is probably an easy fix, but I am new and trying to learn, and I'm having a lot of problems.
import difflib

diff = difflib.ndiff(open('test1.csv',"rb").readlines(), open('test2.csv',"rb").readlines())
try:
    while 1:
        print diff.next(),
except:
    pass
The code works fine and I get the output I am looking for:
Group,Symbol,Total
- Adam,apple,3850
? ^
+ Adam,apple,2850
? ^
bob,orange,-45
bob,lemon,66
bob,appl,-56
bob,,88
My question is: how do I clean up the formatting? Can I make Group, Symbol, and Total into separate columns, and then line up the text below? Also, can I change the ? to text I determine, such as test1 and test2, to indicate which sheet the line comes from?
Thanks for any help.
Using difflib.unified_diff gives much cleaner output, see below.
Also, both difflib.ndiff and difflib.unified_diff return a generator, which you can use directly in a for loop and which knows when to stop, so you don't have to handle exceptions yourself. N.B.: the comma after line is there to prevent print from adding another newline.
import difflib

s1 = ['Adam,apple,3850\n', 'bob,orange,-45\n', 'bob,lemon,66\n',
      'bob,appl,-56\n', 'bob,,88\n']
s2 = ['Adam,apple,2850\n', 'bob,orange,-45\n', 'bob,lemon,66\n',
      'bob,appl,-56\n', 'bob,,88\n']

for line in difflib.unified_diff(s1, s2, fromfile='test1.csv',
                                 tofile='test2.csv'):
    print line,
This gives:
--- test1.csv
+++ test2.csv
@@ -1,4 +1,4 @@
-Adam,apple,3850
+Adam,apple,2850
 bob,orange,-45
 bob,lemon,66
 bob,appl,-56
So you can clearly see which lines were changed between test1.csv and test2.csv.
To line up the columns, you must use string formatting,
e.g. print "%-20s %-20s %-20s" % (row[0], row[1], row[2]).
To change the ? into any text you like, you'd use something like s.replace('?', 'any text I like').
Your problem has more to do with the CSV format, since difflib has no idea it's looking at columnar fields. What you need is to figure out into which field the guide is pointing, so that you can adjust it when printing the columns.
If your CSV files are simple, i.e. they don't contain any quoted fields with embedded commas or (shudder) newlines, you can just use split(',') to separate them into fields, and figure out where the guide points as follows:
def align(line, guideline):
    """
    Figure out which field the guide (^) points to, and the offset within it.
    E.g., if the guide points 3 chars into field 2, return (2, 3)
    """
    fields = line.split(',')
    guide = guideline.index('^')
    f = p = 0
    while p + len(fields[f]) < guide:
        p += len(fields[f]) + 1  # +1 for the comma
        f += 1
    offset = guide - p
    return f, offset
Now it's easy to show the guide properly. Let's say you want to align your columns by printing everything 12 spaces wide:
diff = difflib.ndiff(...)
for line in diff:
    code = line[0]  # The diff prefix
    print code,
    if code == '?':
        fld, offset = align(lastline, line[2:])
        for f in range(fld):
            print "%-12s" % '',
        print ' '*offset + '^'
    else:
        fields = line[2:].rstrip('\r\n').split(',')
        for f in fields:
            print "%-12s" % f,
        print
        lastline = line[2:]
Be warned that the only reliable way to parse CSV files is to use the csv module (or a robust alternative); but getting it to play well with the diff format (in full generality) would be a bit of a headache. If you're mainly interested in readability and your CSV isn't too gnarly, you can probably live with an occasional mix-up.
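For what it's worth, here is a rough sketch of letting the csv module do the field splitting on each diff line's payload (it drops the '?' guide lines entirely for simplicity, and assumes Python 2 as in your code):

import csv
import difflib

diff = difflib.ndiff(open('test1.csv').readlines(), open('test2.csv').readlines())
for line in diff:
    code = line[0]
    if code == '?':
        continue  # this sketch ignores the intraline guides
    # csv.reader copes with quoted fields that contain embedded commas
    fields = next(csv.reader([line[2:]]))
    print code, ' '.join("%-12s" % f for f in fields)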
I have a tab delimited text file with the following data:
ahi1
b/se
ahi
test -2.435953
1.218364
ahi2
b/se
ahi
test -2.001858
1.303935
I want to extract the two floating point numbers to a separate CSV file with two columns, i.e.
-2.435953 1.218364
-2.001858 1.303935
Currently my hack attempt is:
import csv
from itertools import islice
results = csv.reader(open('test', 'r'), delimiter="\n")
list(islice(results,3))
print results.next()
print results.next()
list(islice(results,3))
print results.next()
print results.next()
Which is not ideal. I am a Noob to Python so I apologise in advance and thank you for your time.
Here is the code to do the job:
import re
# this is the same data just copy/pasted from your question
data = """ ahi1
b/se
ahi
test -2.435953
1.218364
ahi2
b/se
ahi
test -2.001858
1.303935"""
# what we're gonna do, is search through it line-by-line
# and parse out the numbers, using regular expressions
# what this basically does is, look for any number of characters
# that aren't digits or '-' [^-\d] ^ means NOT
# then look for 0 or 1 dashes ('-') followed by one or more decimals
# and a dot and decimals again: [\-]{0,1}\d+\.\d+
# and then the same as first..
pattern = re.compile(r"[^-\d]*([\-]{0,1}\d+\.\d+)[^-\d]*")
results = []
for line in data.split("\n"):
    match = pattern.match(line)
    if match:
        results.append(match.groups()[0])

pairs = []
i = 0
end = len(results)
while i < end - 1:
    pairs.append((results[i], results[i+1]))
    i += 2

for p in pairs:
    print "%s, %s" % (p[0], p[1])
The output:
>>>
-2.435953, 1.218364
-2.001858, 1.303935
Instead of printing out the numbers, you could save them in a list and zip them together afterwards.
I'm using Python's regular expression module to parse the text. I can only recommend picking up regular expressions if you don't already know them. I find them very useful for parsing text and all sorts of machine-generated output files.
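For example, a minimal sketch of the zip idea mentioned above, replacing the explicit while loop that builds the pairs:

# results is the flat list of matched number strings collected above
pairs = zip(results[::2], results[1::2])
for a, b in pairs:
    print "%s, %s" % (a, b)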
EDIT:
Oh and BTW, if you're worried about the performance: I tested on my slow old 2 GHz IBM T60 laptop and I can parse a megabyte in about 200 ms using the regex.
UPDATE:
I felt kind, so I did the last step for you :P
Maybe this can help:
zip(*[results]*5)
e.g.
import csv
from itertools import izip

results = csv.reader(open('test', 'r'), delimiter="\t")
for result1, result2 in (x[3:5] for x in izip(*[results]*5)):
    ...  # do something with the result
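To see why zip(*[results]*5) groups rows, here is a small, self-contained demonstration using a plain iterator instead of a csv.reader; the trick is that the same iterator object appears five times, so building each tuple consumes five consecutive items:

from itertools import izip

lines = iter(['ahi1', 'b/se', 'ahi', 'test -2.435953', '1.218364',
              'ahi2', 'b/se', 'ahi', 'test -2.001858', '1.303935'])
for group in izip(*[lines]*5):
    print group[3:5]
# ('test -2.435953', '1.218364')
# ('test -2.001858', '1.303935')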
A tricky but more eloquent and sequential solution:
$ grep -v "ahi" myFileName | grep -v se | tr -d "test\" " | awk 'NR%2{printf $0", ";next;}1'
-2.435953, 1.218364
-2.001858, 1.303935
How it works: basically, remove specific text lines, then remove unwanted text within the remaining lines, then join every second line with formatting. I just added the comma for beautification purposes. Leave the comma out of awk's printf ", " if you don't need it.