GNUplot loop - best way to do this? - python

I have a dictionary called tmptwo that looks like this:
{'test-test': [['2014-01-05 01:06:11', 37, 34, 0], ['2014-01-05 01:09:20', 44, 32, 11]], 'another-test': [['2014-01-05 01:06:11', 88, 76, 6], ['2014-01-05 01:09:20', 62, 9, 3]]}
I am trying to pipe this data into GNUplot over stdin. However I am struggling with writing the loop to do this.
I must plot in the following order:
1. the timestamp of the first list, followed by one of the three integers, for key 'test-test'
2. the timestamp of the first list, followed by one of the three integers, for key 'another-test'
3. as in 1., but with the next list
4. as in 2., but with the next list
Here there are only two keys, so the algorithm for this particular example fits in four steps. In my actual script, the dictionary will have many keys, and each key's value (itself a list) will hold many lists, though every key will hold the same number of lists.
I know there's probably an easier way of feeding the data in, but this is what I'm working with...
So far my function for feeding this data into GNUplot looks like this:
def startPlotting(selection, gnuplot, tmptwo):
    try:
        if selection == 'totals':
            sel = 1
        if selection == 'usable':
            sel = 2
        if selection == 'inserted':
            sel = 3
        for items in zip(*tmptwo.values()):
            for item in items:
                gnuplot.stdin.write("%s,%i\n" % (item[0],item[sel]))
        gnuplot.stdin.write("e\n")
        gnuplot.stdin.write("reset\n")
        gnuplot.stdin.flush()
    except (IOError, TypeError, NameError) as e:
        raise
And so far, I'm having no joy. Can you help?
EDIT:
I have updated the loop in the code based on the suggestion below, but GNUplot is still erroring out. I can't find the solution. Any suggestions?
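To check the ordering separately from GNUplot, one option is to build the lines first and inspect them before writing anything to the pipe. This is only a sketch: build_rows and COLUMN_FOR are hypothetical names, and it assumes a Python version where dictionaries keep insertion order (3.7+), so that keys are visited in the order they were added.

```python
# Hypothetical lookup table replacing the chain of if statements.
COLUMN_FOR = {'totals': 1, 'usable': 2, 'inserted': 3}

def build_rows(selection, data):
    """Return 'timestamp,value' lines, interleaving the keys entry by entry."""
    sel = COLUMN_FOR[selection]  # raises KeyError for an unknown selection
    rows = []
    # zip(*values) groups the first list of every key, then the second, ...
    for entries in zip(*data.values()):
        for entry in entries:
            rows.append("%s,%d" % (entry[0], entry[sel]))
    return rows

tmptwo = {
    'test-test': [['2014-01-05 01:06:11', 37, 34, 0],
                  ['2014-01-05 01:09:20', 44, 32, 11]],
    'another-test': [['2014-01-05 01:06:11', 88, 76, 6],
                     ['2014-01-05 01:09:20', 62, 9, 3]],
}

rows = build_rows('totals', tmptwo)
# Same terminator line that the function in the question writes.
payload = "\n".join(rows) + "\ne\n"
print(payload)
```

Printing the payload shows exactly what would be sent, which makes it easier to tell an ordering problem from a problem in the plot commands. If gnuplot.stdin is a binary pipe under Python 3, the string would need to be encoded before writing.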

Related

Indexing error after removing line from 2D array

I am facing a 'list index out of range' error when trying to iterate a for-loop over a table I've created from a CSV extract, but cannot figure out why, even after trying many different methods.
Here is a step-by-step description of how the error happens:
I'm removing the first line of an imported CSV file, as this line contains the column names but no data. The CSV has the following structure.
columnName1, columnName2, columnName3, columnName4
This, is, some, data
I, have, in, this
very, interesting, CSV, file
After storing the CSV in a first array called oldArray, I want to populate a newArray that will get all values from oldArray except the first line, which is the column-name line, as previously mentioned. My newArray should then look like this.
This, is, some, data
I, have, in, this
very, interesting, CSV, file
To create this newArray, I'm using the following code with the append() function.
tempList = []
newArray = []
for i in range(len(oldArray)):
    if i > 0: #my ugly way of skipping line 0...
        for j in range(len(oldArray[0])):
            tempList.append(oldArray[i][j])
        newArray.append(tempList)
        tempList = []
I also stored the columns in their own separate list.
i = 0
for i in range(len(oldArray[0])):
    my_columnList[i] = oldArray[0][i]
This is where the error comes up: I now want to populate a treeview table from this newArray, using a for-loop and insert (in a function). But I always get the 'list index out of range' error and I cannot figure out why.
def populateTable(my_tree, newArray, my_columnList):
    i = 0
    for i in range(len(newArray)):
        my_tree.insert('','end', text=newArray[i][0], values = (newArray[i][1:len(newArray[0])]))
        #(im using the text option to bypass treeview's column 0 problem)
    return my_tree
Error message --> " File "(...my working directory...)", line 301, in populateTable
my_tree.insert(parent='', index='end', text=data[i][0], values=(data[i][1:len(data[0])]))
IndexError: list index out of range "
Using that same function with different datasets and columns worked fine, but not with this newArray.
I'm fairly certain that the error comes strictly from this newArray and is not linked to another parameter.
I've tested the validity of the columns list, of the CSV import in oldArray through some print() functions, and everything seems normal - values, row dimension, column dimension.
This is a great mystery to me...
Thank you all very much for your help and time.
The error message tells you where to look: File "(...my working directory...)", line 301, in populateTable my_tree.insert(parent='', index='end', text=data[i][0], values=(data[i][1:len(data[0])])) IndexError: list index out of range
It means an index is out of range on line 301. Slices never raise IndexError, so data[i][1:len(data[0])] is safe even for a short row, and i comes from range(len(data)), so data[i] is safe too.
That leaves data[i][0], which fails when data[i] is an empty list.
My guess is that there is an empty row somewhere in data (maybe the last one, data[-1], from a trailing blank line in the CSV).
A row with a single item, [some_one_item], would not raise; its slice would simply be an empty list of values.
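If an empty row is the cause, one way to rule it out is to drop blank lines while loading the file. The sketch below is not the asker's import code: load_table is a hypothetical helper, the raw string stands in for the CSV file with a trailing blank line added, and it assumes the file always has a header line.

```python
import csv
import io

# Stand-in for the CSV file from the question, with a trailing blank line.
raw = (
    "columnName1, columnName2, columnName3, columnName4\n"
    "This, is, some, data\n"
    "I, have, in, this\n"
    "very, interesting, CSV, file\n"
    "\n"
)

def load_table(handle):
    """Split a CSV stream into (header, rows), dropping blank lines."""
    reader = csv.reader(handle, skipinitialspace=True)
    rows = [row for row in reader if row]  # csv.reader yields [] for a blank line
    return rows[0], rows[1:]               # assumes at least a header line

my_columnList, newArray = load_table(io.StringIO(raw))
print(my_columnList)
print(newArray)
```

With a real file, the same function could be given the object returned by open(path, newline=''). Every row in newArray then has at least one field, so row[0] cannot fail.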
There is nothing wrong with your "ugly" way of skipping line 0, but I recommend having a look at this approach:
new_array = old_array.copy()
new_array.remove(new_array[0])
Now, about the indexing itself.
The loop bounds are not the problem. range(n) yields 0, 1, ..., n-1, which are exactly the valid indices of a list of length n. To make it concrete,
len(oldArray[0])
is equal to 4, so the loop is the same as
for i in range(4):
which visits indices 0, 1, 2 and 3, all of which exist. Subtracting 1 from the length would only skip the last column or the last row. The i = 0 before each loop is redundant, because the for statement assigns i itself.
One thing that can raise an IndexError here is the assignment my_columnList[i] = ..., if my_columnList has fewer elements than the header row (for example, if it starts out empty). Copying the header row avoids the indexed assignment:
my_columnList = list(oldArray[0])
The values= argument in populateTable is also missing a closing bracket. With that fixed, and the rows iterated directly, the function would be
def populateTable(my_tree, newArray, my_columnList):
    for row in newArray:
        # the text option is used to bypass treeview's column 0 problem
        my_tree.insert('', 'end', text=row[0], values=row[1:])
    return my_tree

append function not working in for loop Python 3

I have a for loop running that creates a new list of select Unix times from elements of another list containing multiple Unix times, the index of those elements is in turn given by another list. My problem is that within this for loop the append function is not working and I have no idea why as I get no errors. The print function is simply ignored after the for loop. I'm not sure what I'm doing wrong. Could someone help me out?
Here is my code:
adjusted_exc_pass_numbers = [0, 6, 9, 16, 19, 22, 25, 32, 35, 41, 48]
processed_start_times = [1519275660, 1519287600, 1519325040, 1519336920, 1519360080, 1519365900, 1519371900, 1519409400, 1519415340, 1519421280, 1519450260, 1519456200, 1519499700, 1519534680, 1519540560, 1519546620, 1519584060, 1519596000, 1519619160, 1519624920, 1519630920, 1519668420, 1519674360, 1519680360, 1519709340, 1519715220, 1519758720, 1519793760, 1519799580, 1519805700, 1519843080, 1519855020, 1519878180, 1519884000, 1519890000, 1519927500, 1519939380, 1519968360, 1519974300, 1520017800, 1520052780, 1520058660, 1520064720, 1520102160]
ppst = []
for element in range(len(adjusted_exc_pass_numbers)):
    ppst.append(processed_start_times[int(adjusted_exc_pass_numbers[element])])
print(ppst)
When I run this, print is ignored and the rest of the code executes as if the statement was not there. I don't understand why it is not appending or printing.
Thank you for your time.
When I run this code I get an IndexError at the line
ppst.append(processed_start_times[int(adjusted_exc_pass_numbers[element])])
The reason is that, as you iterate, element takes the values 0, 1, 2, 3, ..., adjusted_exc_pass_numbers[element] takes the values 0, 6, 9, 16, ..., and eventually you try to get index 48 from processed_start_times, a list which only has 44 entries (valid indices 0 to 43). I'm not sure why your loop completes without any errors, but this is the problem as far as I can tell.
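A small sketch of one way to guard against this: iterate over the index list directly and set aside any index that does not exist in the target list. pick_by_index is a hypothetical helper, and the data below is a shortened stand-in, not the lists from the question.

```python
def pick_by_index(values, indices):
    """Return (picked, skipped): the items found at valid indices,
    and the indices that were out of range."""
    picked, skipped = [], []
    for index in indices:
        if 0 <= index < len(values):
            picked.append(values[index])
        else:
            skipped.append(index)
    return picked, skipped

# Shortened stand-in data (hypothetical).
start_times = [100, 200, 300, 400, 500]
pass_numbers = [0, 2, 4, 7]

ppst, out_of_range = pick_by_index(start_times, pass_numbers)
print(ppst)          # [100, 300, 500]
print(out_of_range)  # [7]
```

Whether out-of-range indices should be skipped or treated as an error depends on what the pass numbers mean; collecting them at least makes the mismatch visible.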

using enumerate to iterate over a dictionary of lists to extract information

I got some help earlier today about how to obtain positional information from a dictionary using enumerate(). I will provide the code shortly. However, now that I've found this cool tool, I want to implement it in a different manner to obtain some more information from my dictionary.
I have a dictionary:
length = {'A': [(0,21), (30,41), (70,80), (95,200)], 'B': [(0,42), (70,80)], ...}
and a file:
A 73
B 15
etc
What I want to do now is find the difference between the max of one element in my list and the min of the next element. For example, the difference between 21 and 30. Then I want to add up all these differences until I hit the pair (range) of numbers that contains the number from my file (if that makes sense).
Here is the code that I've been working on:
import csv
with open('Exome_agg_cons_snps_pct_RefSeq_HGMD_reinitialized.txt') as f:
    reader = csv.DictReader(f,delimiter="\t")
    for row in reader:
        snppos = row['snp_rein']
        name = row['isoform']
        snpos = int(snppos)
        if name in exons:
            y = exons[name]
            for sd, i in enumerate(exons[name]):
                while not snpos<=max(i):
                    intron = min(i+1) - max(i) #this doesn't work unfortunately. It says I can't add 1 to i
                    totalintron = 0 + intron
                if snpos<=max(i):
                    exonmin = min(i)
                    exonnumber = sd+1
                    print exonnumber,name,totalintron
                    break
I think it's the sd (indexer) that is confusing me. I don't know how to use it in this context. The commented out portions are other avenues I've tried without success. Any help? I know this is a confusing question and my code might be a little mixed up, but that's because I can't even get an output to correct my other mistakes yet.
I want my output to look like this based on the file provided:
exon name introntotal
3 A 38
1 B 0
To try to provide some help for this question: a critical part of the problem is that I don't think enumerate does what you think it does. Enumerate just numbers the things you are iterating over. So when you go through your for loop, sd will first be 0, then it will be 1... And that's all. In your case, you want to look at adjacent list entries (it seems?), so the more idiomatic ways of looping in python aren't nearly as clean. So you could do something like:
...
y = exons[name]
for index in range(len(y) - 1): # the - 1 is to prevent going out of bounds
    first_max = max(y[index])
    second_min = min(y[index+1])
    ... # do more stuff, I didn't completely follow what you're trying to do
I will add for the hardcore pythonistas, you can of course do some clever stuff to write this more idiomatically and avoid the C style loop that I wrote, but I think that getting into zip and so on might be a bit confusing for somebody new to python.
The issue is that you're using the output of enumerate() incorrectly.
enumerate() returns the index (position) first then the item
Ex:
x = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
for i, item in enumerate(x):
    print(i, item)
# prints
#(0, 10)
#(1, 11)
#(2, 12)
#(3, 13)
#(4, 14)
#(5, 15)
#(6, 16)
#(7, 17)
#(8, 18)
#(9, 19)
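So in your case, sd is the index and i is the tuple, which is why i+1 fails: it tries to add 1 to a tuple. Use the index to reach the neighbouring tuple instead:
for sd, i in enumerate(exons[name]):
    # exons[name][sd + 1] is the next tuple, when there is one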
Like other commenters suggested, reading the python documentation is usually a good place to start resolving issues, especially if you're not sure how a function does what it does :)
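Putting the pieces together, here is a minimal sketch of the calculation as I understand it from the expected output in the question. locate_snp is a hypothetical helper, the dictionary is the example one from the question, and reading the positions from the file is left out.

```python
def locate_snp(exon_ranges, position):
    """Return (exon_number, total_intron) for the first exon whose upper
    bound is at or beyond position, or None if position is past the last exon."""
    total_intron = 0
    previous_max = None
    for number, exon in enumerate(exon_ranges, start=1):
        if previous_max is not None:
            # gap between the end of the previous exon and the start of this one
            total_intron += min(exon) - previous_max
        if position <= max(exon):
            return number, total_intron
        previous_max = max(exon)
    return None

exons = {'A': [(0, 21), (30, 41), (70, 80), (95, 200)], 'B': [(0, 42), (70, 80)]}

for name, position in [('A', 73), ('B', 15)]:
    exon_number, total_intron = locate_snp(exons[name], position)
    print(exon_number, name, total_intron)
# 3 A 38
# 1 B 0
```

Remembering the previous upper bound avoids indexing the next element at all, so there is nothing to go out of range. A position that falls in a gap between two ranges is attributed to the following range, which mirrors the snpos<=max(i) test in the question.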

select and make new list with specific information

EDIT2: Nevermind this, someone pointed my error. Thanks
First of all, this is an example of the results I have:
(172, 'Nucleus')
(172, 'Nucleus')
(472, 'Cytoplasm')
(472, 'Cytoplasm')
(472, 'Nucleus')
What I'm trying to do is match on the first number (position 0) and then check whether the word contains part of "nucleus" (here, it would be "nuc"). It can happen that, for a given number, every word contains "nuc".
I'm trying to make two lists: the first would hold only the numbers whose words all contain "nuc"; the second would hold the numbers that have "nuc" together with other things (like Cytoplasm in my example).
That is only a small part of my results.
I didn't have example code at first, because I really have no clue how to include each value of my query only once in the list (in the example, I would otherwise enter the number 172 twice).
EDIT: Oops, I wrote that before adding the code I tried...
Right now, my code looks like this.
Here is how I got the example shown above:
def number1(self, position):
    self.position = position
    List = [self.name()]
    for item in List:
        for i in range(position, self.c.rowcount):
            self.number(i)
def separate_list(self, list_signal):
    nuc_list = []
    not_nuc_list = []
    for i in list_signal:
        print(list_signal(i))
        if list_signal(i)(0) == list_signal(i+1)(0):
            if list_signal(i)(1) and list_signal(i+1)(1) == re.search("nuc"):
                nuc_list.append(list_signal(i))
            else:not_nuc_list.append(list_signal(i))
    return nuc_list and not_nuc_list
dc = connection()
dc.separate_list(dc.number1(0))
error:
Traceback (most recent call last):
File "class vincent.py", line 91, in <module>
dc.separate_list(dc.number1(0))
File "class vincent.py", line 61, in separate_list
for i in list_signal:
TypeError: 'NoneType' object is not iterable
I know this is not pretty; I tried doing it the best way I can (I'm new to Python and to programming in general).
A few things. If you are trying to get the item at position 0 of a list, as you say, use square brackets: list_name[0]. Writing list_signal(i)(0) calls the object instead of indexing it. If you are using position to sort, use a different method.
Are (172, 'Nucleus') ... (172, 'Nucleus') tuples, or are they lists of their own? Both can be indexed with [0]; a tuple can also be unpacked into two variables to work with the data, as in number, cell_type = (172, 'Nucleus')
Also, at the moment dc.number1 doesn't return anything, so it gives back None and can't be used as input to another function; that is what the TypeError is telling you. Add a return of some sort, or change the input to whatever self.number is modifying.
You may want to make a list of all your results, e.g. [(172, 'Nucleus'), ..., (172, 'Nucleus')]; then you can iterate through it with
for number, cell_type in results_list:
    print(number, cell_type)
    # Should give you "172 Nucleus"
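Once the results are in a list of tuples, the split into two lists could look something like the sketch below. It is based on my reading of the question: split_by_location is a hypothetical name, the match on "nuc" is made case-insensitive as an assumption, and numbers with no "nuc" entry at all are left out of both lists.

```python
from collections import defaultdict

def split_by_location(results):
    """Group (number, location) pairs by number, then split the numbers into
    'nucleus only' and 'nucleus plus something else'."""
    locations = defaultdict(set)
    for number, location in results:
        locations[number].add(location.lower())

    nuc_only, nuc_mixed = [], []
    for number, names in locations.items():
        has_nuc = ["nuc" in name for name in names]
        if all(has_nuc):
            nuc_only.append(number)      # every location mentions "nuc"
        elif any(has_nuc):
            nuc_mixed.append(number)     # "nuc" plus at least one other location
    return nuc_only, nuc_mixed

results = [(172, 'Nucleus'), (172, 'Nucleus'),
           (472, 'Cytoplasm'), (472, 'Cytoplasm'), (472, 'Nucleus')]

nuc_only, nuc_mixed = split_by_location(results)
print(nuc_only)   # [172]
print(nuc_mixed)  # [472]
```

Grouping into a set per number also takes care of the duplicates, so 172 appears only once even though it occurs twice in the results.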

Avoiding variables as variable or list names in Python

I'd like to read in a number of text files that have the following structure:
3 560
7 470
2 680
4 620
3 640
...
The first column specifies conditions of a behavioral experiment, the second column reaction times. What I'd like to do is to create an array/list for each condition that contains the reaction times for this condition only. I've previously used Perl for this. As there are many different conditions, I wanted to avoid writing many separate elsif statements and therefore used the condition name in the array name:
push(@{condition.${cond}}, $rt); # $cond being the condition, $rt being the reaction time
For example, creating an array like this:
@condition3 = (560, 640,...);
As I got interested in Python, I wanted to see how I would accomplish this using Python. I found a number of answers discouraging the use of variables in variable names, but have to admit that from these answers I could not derive how I would create lists as described above without reverting to separate elif's for each condition. Is there a way to do this? Any suggestions would be greatly appreciated!
Thomas
A dictionary would be a good way to do this. Try something like this:
from collections import defaultdict
conditions = defaultdict(list)
for cond, rt in data:
    conditions[cond].append(rt)
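For completeness, here is a self-contained sketch of the same idea that also does the parsing. group_reaction_times is a hypothetical helper, the in-memory sample stands in for one of the text files, and skipping malformed lines is an assumption about how bad input should be handled.

```python
from collections import defaultdict
import io

def group_reaction_times(lines):
    """Map each condition number to the list of its reaction times."""
    conditions = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        cond, rt = fields
        conditions[int(cond)].append(int(rt))
    return conditions

# In-memory stand-in for one of the text files.
sample = io.StringIO("3 560\n7 470\n2 680\n4 620\n3 640\n")

conditions = group_reaction_times(sample)
print(conditions[3])       # [560, 640]
print(sorted(conditions))  # [2, 3, 4, 7]
```

The function accepts any iterable of lines, so an open file object can be passed in place of the sample.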
The following code reads the file data.txt with the format you described and computes a dictionary with the reaction times per experiment:
experiments = {}
with open('data.txt', 'r') as f:
    data = [l.strip() for l in f.readlines()]
for line in data:
    index, value = line.split()
    try:
        experiments[int(index)].append(value)
    except KeyError:
        experiments[int(index)] = [value]
print experiments
# prints: {2: ['680'], 3: ['560', '640'], 4: ['620'], 7: ['470']}
You can now access the reaction times per experiment using experiments[2], experiments[3], et cetera.
This is a perfect application for a dictionary, which is similar to a Perl hash:
data = {}
with open('data.txt') as f:
    for line in f:
        try:
            condition, value = map(int, line.strip().split())
            data.setdefault(condition, []).append(value)
        except Exception:
            print 'Bad format for line'
Now you can access your different conditions by indexing data:
>>> data
{2: [680], 3: [560, 640], 4: [620], 7: [470]}
>>> data[3]
[560, 640]
I am not sure why you would need elif conditions at all.
You can store the reaction times in a dictionary: each key is a value from the first column (the condition), and its corresponding value is the list of reaction times for that condition.
For example, the dict would look like:
conditions['3'] -> [560, 640, ...]
