Permutations of a list of input numbers in Python - python

I'm supposed to write a program in Python that will get a list of numbers from the user until the user inputs 0, then print all the permutations of this list of numbers.
I was working on the code, which seemed right at the beginning, but for some reason I'm getting weird output.
It seems like the list is static: every time the function returns, the objects it added are still in the list, instead of the list going back to the way it was before the function was called recursively.
Here is what I have so far:
def permutation(numberList,array,place):
    if (place==len(numberList)):
        print array
    else:
        x=0
        while (x < len(numberList)):
            array.append(numberList[x])
            permutation(numberList,array,place+1)
            x+=1
def scanList():
    numberList=[];
    number=input()
    #keep scanning for numbers for the list
    while(number!=0):
        numberList.append(number)
        number=input()
    return numberList
permutation(scanList(),[],0)
This is the output for the input 1 2 3 0, for example:
[1, 1, 1]
[1, 1, 1, 2]
[1, 1, 1, 2, 3]
[1, 1, 1, 2, 3, 2, 1]
[1, 1, 1, 2, 3, 2, 1, 2]
[1, 1, 1, 2, 3, 2, 1, 2, 3]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1, 2]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1, 2, 3]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1, 2, 3, 2, 1]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1, 2, 3, 2, 1, 2]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1, 2, 3, 2, 1, 2, 3]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2]
[1, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3, 3, 1, 1, 2, 3, 2, 1, 2, 3, 3, 1, 2, 3]
I would appreciate any help.

The thing is, a Python list is mutable, and every recursive call receives the same list object rather than a copy. So when you add elements to it with array.append(numberList[x]), they stay there after the call returns. Just remove the added element after your recursive call:
def permutation(numberList,array,place):
    if (place==len(numberList)):
        print(array)
    else:
        x=0
        while (x < len(numberList)):
            array.append(numberList[x])
            permutation(numberList,array,place+1)
            array.pop()
            x+=1
That's actually the common way to write depth-first search algorithms: modify your structure, make recursive call, undo modifications. The result of your program doesn't seem to be permutations of the input though.
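To get actual permutations with the same modify / recurse / undo pattern, each input position also has to be used at most once per result. Here is a minimal sketch of that idea; the name permutations_backtrack and the used flags are my own illustration, not part of the original code:

```python
def permutations_backtrack(numbers):
    """Yield every permutation of `numbers` as a new list."""
    current = []                    # partial permutation being built
    used = [False] * len(numbers)   # used[i] is True while numbers[i] is in `current`

    def extend():
        if len(current) == len(numbers):
            yield list(current)     # copy, because `current` keeps changing
            return
        for i, value in enumerate(numbers):
            if used[i]:
                continue            # this position is already in the partial result
            used[i] = True
            current.append(value)   # modify
            yield from extend()     # recurse
            current.pop()           # undo
            used[i] = False

    yield from extend()


for p in permutations_backtrack([1, 2, 3]):
    print(p)
```

Tracking positions rather than values means repeated input numbers are still treated as distinct elements, which is also how itertools.permutations behaves.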

The Python way is to use itertools.
from itertools import permutations
for permutation in permutations([1,2,3]):
    print(permutation)
Now, what's wrong with your algorithm? As you noticed, the list is static (well, not really, but you're using the same list every time).
A simple fix would be to copy the list each time.
def permutation(numberList,array,place):
    if (place==len(numberList)):
        print(array)
    else:
        x=0
        while (x < len(numberList)):
            array2 = array[:]  # here happens the copy
            array2.append(numberList[x])
            permutation(numberList,array2,place+1)
            x+=1
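The "same list every time" effect is easy to see in isolation. A small sketch, with function names made up for illustration, comparing a function that appends to the list it was given with one that copies first:

```python
def add_item(items, value):
    items.append(value)   # mutates the caller's list; no copy is made
    return items


def add_item_copy(items, value):
    copied = items[:]     # shallow copy, as in array2 = array[:]
    copied.append(value)
    return copied


shared = []
add_item(shared, 1)
add_item(shared, 2)
print(shared)             # both appends are visible to the caller

original = []
result = add_item_copy(original, 1)
print(original, result)   # the caller's list is untouched
```

Copying costs a little extra memory per call, while the append/pop version reuses one list; both avoid the growing output from the question.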

Related

How to define a constant function defined in intervals in python?

I want to define a simple function that takes a different constant value (y=[1,4,2,3]) on each of several defined intervals.
I implemented it this way:
import numpy as np
def f(x):
    if (x>=0 and x<=1900):
        return 1
    if (x>1900 and x<=3600):
        return 4
    if (x>3600 and x<=5400):
        return 2
    if (x>5400 and x<=7200):
        return 3
x=np.linspace(0,7200,1000)
y=f(x)
However, when I run the script, an error appears:
"ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"
Do you know how to fix this?
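The reason is that your function only works on a single element; it is not vectorized. With an array argument, a comparison such as x>=0 returns a whole array of booleans, which "and" and "if" cannot reduce to a single True or False. np.vectorize is a general solution, but its performance is poor. For the example here, you can use np.searchsorted to vectorize: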
>>> np.array([1, 1, 4, 2, 3])[np.searchsorted([0, 1900, 3600, 5400, 7200], x)]
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
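The same lookup can be wrapped in a function. This is only a sketch: the names edges and values are mine, and it assumes every x lies in [0, 7200], since larger values would index past the end of values.

```python
import numpy as np

# Right-hand edge of each interval and the constant taken on that interval.
edges = np.array([1900, 3600, 5400, 7200])
values = np.array([1, 4, 2, 3])


def f(x):
    """Piecewise-constant lookup; assumes every x lies in [0, 7200]."""
    x = np.asarray(x)
    # side="left" returns the first index i with x <= edges[i], so a point
    # sitting exactly on an edge belongs to the interval that edge closes.
    return values[np.searchsorted(edges, x, side="left")]


print(f([0, 1900, 1900.5, 3600, 5400, 7200]))
```

Calling f(np.linspace(0, 7200, 1000)) then returns an array of 1000 values without any Python-level loop.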
x is not what you think it is.
Try print(x) and see what it actually looks like. It is an array, and it contains
[ 0. 7.20720721 14.41441441 21.62162162 28.82882883 .... 7192.79279279 7200. ]
I am unsure what you are trying to achieve, but x is an array and not a single value; therefore you either need to loop over it or index the exact element you want to test.

Rosalind - Consensus and Profile - Issue with answer formatting

I am working on the Consensus and Profile problem on Rosalind, and I am so close to getting it done. My answer is correct, I have the right consensus string and the correct matrix, but I am having issues formatting my data for the answer. Rosalind expects the answer to look like:
ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6
My raw output looks like this:
{'A': [5, 3, 3, 3, 1, 4, 2, 1, 3, 5, 2, 2, 2, 3, 1, 3, 2, 2, 2, 4, 4, 4, 1, 2, 1, 3, 1, 2, 1, 2, 2, 3, 2, 1, 3, 5, 3, 4, 2, 2, 2, 3, 3, 2, 0, 0, 1, 2, 2, 4, 3, 5, 2, 4, 3, 1, 2, 2, 2, 3], 'C': [2, 1, 3, 2, 1, 2, 2, 1, 3, 2, 1, 2, 3, 2, 6, 3, 4, 1, 2, 0, 3, 2, 4, 2, 1, 3, 3, 3, 6, 2, 2, 1, 5, 5, 3, 0, 1, 1, 2, 3, 3, 5, 3, 2, 1, 2, 3, 5, 0, 2, 3, 2, 3, 2, 5, 3, 4, 3, 2, 4], 'G': [1, 3, 2, 4, 3, 2, 1, 3, 3, 0, 5, 3, 3, 2, 1, 2, 1, 5, 3, 2, 2, 2, 2, 4, 6, 3, 2, 3, 2, 3, 1, 3, 0, 2, 0, 3, 3, 3, 4, 2, 2, 2, 1, 3, 5, 2, 1, 0, 2, 1, 2, 1, 4, 2, 2, 3, 2, 0, 4, 2], 'T': [2, 3, 2, 1, 5, 2, 5, 5, 1, 3, 2, 3, 2, 3, 2, 2, 3, 2, 3, 4, 1, 2, 3, 2, 2, 1, 4, 2, 1, 3, 5, 3, 3, 2, 4, 2, 3, 2, 2, 3, 3, 0, 3, 3, 4, 6, 5, 3, 6, 3, 2, 2, 1, 2, 0, 3, 2, 5, 2, 1]}
And with some simple editing, I submit it as:
'A': [5, 3, 3, 3, 1, 4, 2, 1, 3, 5, 2, 2, 2, 3, 1, 3, 2, 2, 2, 4, 4, 4, 1, 2, 1, 3, 1, 2, 1, 2, 2, 3, 2, 1, 3, 5, 3, 4, 2, 2, 2, 3, 3, 2, 0, 0, 1, 2, 2, 4, 3, 5, 2, 4, 3, 1, 2, 2, 2, 3]
'C': [2, 1, 3, 2, 1, 2, 2, 1, 3, 2, 1, 2, 3, 2, 6, 3, 4, 1, 2, 0, 3, 2, 4, 2, 1, 3, 3, 3, 6, 2, 2, 1, 5, 5, 3, 0, 1, 1, 2, 3, 3, 5, 3, 2, 1, 2, 3, 5, 0, 2, 3, 2, 3, 2, 5, 3, 4, 3, 2, 4]
'G': [1, 3, 2, 4, 3, 2, 1, 3, 3, 0, 5, 3, 3, 2, 1, 2, 1, 5, 3, 2, 2, 2, 2, 4, 6, 3, 2, 3, 2, 3, 1, 3, 0, 2, 0, 3, 3, 3, 4, 2, 2, 2, 1, 3, 5, 2, 1, 0, 2, 1, 2, 1, 4, 2, 2, 3, 2, 0, 4, 2]
'T': [2, 3, 2, 1, 5, 2, 5, 5, 1, 3, 2, 3, 2, 3, 2, 2, 3, 2, 3, 4, 1, 2, 3, 2, 2, 1, 4, 2, 1, 3, 5, 3, 3, 2, 4, 2, 3, 2, 2, 3, 3, 0, 3, 3, 4, 6, 5, 3, 6, 3, 2, 2, 1, 2, 0, 3, 2, 5, 2, 1]
But the editing still doesn't matter because of the metric f**k ton of commas and brackets that would need to be manually deleted as well, especially considering the fact that you only have 5 minutes to submit your answer - I've tried and I have found it impossible to format my answer manually in the five-minute window.
I was wondering if anyone knew of some tips, tricks, or solutions that can help me get over this hurdle. I have seen some other solutions, but they essentially require me to take a different approach to logic, which pisses me off because I spent a lot of time thinking about this answer and also creating my own function that manages the FASTA file format from scratch.
Here is my source code:
data = open('/Users/danielpintard/Downloads/rosalind_cons (1).txt', 'r').read()
if '>' in data :
    data_array = data.split('>')
    for i in data_array:
        if i == '':
            data_array.remove(i)
    for i in data_array: data_array[data_array.index(i)] = i.split('\n', 2)
#create profile
prof_sequences = []
for i in data_array:
    data_array[data_array.index(i)] = i[1]
    prof_sequences.append(i[1])
n = len(prof_sequences[0])
profile_matrix = {
    'A': [0]*n,
    'C': [0]*n,
    'G': [0]*n,
    'T': [0]*n,
}
for dna in prof_sequences:
    for position, nucleotide in enumerate(dna):
        profile_matrix[nucleotide][position] += 1
result = []
#still having a hard time understanding this block of code
for position in range(n):
    max_count = 0
    max_nucleotide = None
    for nucleotide in profile_matrix:
        if profile_matrix[nucleotide][position] > max_count:
            max_count = profile_matrix[nucleotide][position]
            max_nucleotide = nucleotide
    result.append(max_nucleotide)
print(profile_matrix)
print(result)
And here is the data:
>Rosalind_7283
TATTCATTGATCATATGAAGCCTTGCGACCTGCCCGGTTCTGAAGTCAGCTAGCACATTA
GTGTCAAGGTATTAGTGTAGTTGCTGACTCGAACGTGTGTTAATATTCATGTAGGGGTCT
GGCGACCCAATAGGCGCGTGGTGTACCGAATTGTGCACACACACGTGTATTTCGAACGCA
AGATGCAGCCGAATCAGACCGTAGTAAACCGTTTGAGTGGCGTTTTGGCGTGAGAAGGCT
TAGGTGTTACAAGTGCAGCGCGGGTGCATTTTCTCCGGCTTGGAGCAATAGTCCCTATGC
ATCGGCCCCGTATATGAGGATCGCATTACGCAACATCGTAAGCCTTGCACATCTGGCAAA
TGCACGGCTCTCATTATAGTTGCCAAAAATCAGCCCTACCACACGTAATATTCAAGGCTG
TGCTTGTCCAACTAGTTGGCGAATGATCCTCCAAGATTGCGGCGGGGTATAATCCCGCAC
GTCCGAATACCAATGTTCGAGTGCGGCACTACCAAGATGCGAGTCGGCGTGATATCGAGG
TTCACATAGGGGACGTTTATGTCCTTTGGATGTCTCGCCAACTCCATTCTATCATTAGGT
TGGCGGTCAGCAGGATGGAGCATAGTCATGAGCTTGAGTACTGTGCGGCTCGGAAGAGGG
GCCGTATGGGTCTTCGGACAAACGTAGGTATTACAGGCCAAAAACGCTCAGAAAAAACGC
TATCTTAATGACCATTTGATAAACGTTCCCTTGCCGATTTAGAGTGACTTAGTGCAATGT
TGCGATTTCTACTACACTCAAGCTGTGTTAGGGATAATCCATAGCACAGGCCCGCTCGCC
CGTGCCCTGCCTTGCACGACAGGGCTAAGCGGCTCAAGAAGTTTCTACGCAACGTACCAC
GACCAGCTGGACCTACCGATAGACTTACCATATTCTAAGAATAAAACGGACCCTTATGTG
AGTGAGCGCAAGCAATATGGTTTGCCCGTTTGC
>Rosalind_6559
TGCGGCCTATGTGACGCTCAACCCGGGACGCACATGAGTACATCTTTCTTCAACGTCCGC
GAACACAGCCCAATTCGATTAATTGCCACGATTGTGTGGCACGCACTTACTGAAACCGGT
GGTGATAGGCATAGGTTTAGACCAGGGCTCGGGACAGCTGGTCTAGGTCGTGACTAATCA
ATGGTTTAAATGATGCACCCTTGTATCGGTATGCCTGTGTTTATCAGGAATGCCCATACA
TTTTGAGAAACGCTTATGGTTATTACACAAGCGAGGGAAGCGAGCTAGCGGCGTCCGAGA
ACTATAAAGGCAATCCTGACATGACGAGCGCAGAATCACCCCCTGAATCCCGGTTACGAT
ATGGGCCATTCGGGGAGCAAACGGCTGACTCTTCGGTAAAGTAATTTGCCAGGAACATTG
ATATATGCGCTGACCCTATTGATTATCCAACAAATACTACATTCAGCCCCAGGTCCCACG
TTAGGCGTAAGTTAAGAATTTATGTACGCAATCGCCAATATCCGCAAGACGTCCCCGCTG
ACAGTGAGGCTTAGTGGCGCCGATGGTATTCAGAAATGGAGCGCTCCTCTGTTGCACTCG
GCTCGTCAACCTTCTTGCTATTAACATATAAGAGTGAATGGGTGAGGTAGTAAACGTAAT
TGCCAAGATCGATAGAAGTGTTGGACGGAACATTGGAGCAAGGAACGCCGCTGAGCCAGG
GGACTAATGCCAGAGTGGAACCTGGTGGGAATTAAACACTTGTATACGTTGACAAGCTGA
GACATTCTAAAACACGTAATATAACATGCATCCACTAATGGATTCCCTTTCGCCTCTTGG
CTGGGATACATTTGCGCTTGGGAGCAGGAGATAGGGAGTCAAGGTGACATTGTGGGAATT
CACAAAGCTTCCTATCTAATGTTAGTACTTTAGCCACGGGTTAACCAGGACTGTTCTATA
GTTCCAACTCTCCATTATCCAAAACAAGCAGCA
>Rosalind_3098
GGCATAGGGACGCCGATGTAAGGAAATCCTCTAGTTTGAGCCCGGTGCTTACCGCAGTCC
TCGGCTTTCGTTCTGTTACAGACGTCCTAGGACTCAGTCGCCACCTACGCGGGGTGCATT
AGTCAGGTCCGAAGCCTCTATAGCGCTTTTTAGGAATGGAGCGTTTAAACGAGCCTGCGT
ATATTCGCTACCAAATCTCAGGGGCGGCTCAGATACAACGGGGTTCATCAGTTTGGATAT
CAGTGTTCGCTGGGTAAGTCTGACTCCGGCTCACAGATAGTTAGAAGGTCGCACATGATG
ATTACAACTTCTGCCGCTGACTTGGGAGTCTAGCGCTTGTCACAGACGCGCTAATGCGGC
ACATCTATTTCATAAAAGTACAAGCAATATGCCGCGAGGCCCCGTCGTATTTGATTCGAA
GGATTTAACTCATAACGCGGCCCTCAGCAGCTGCGGGCGTACGGAAGCCTCAACTTTGCG
ATTCTGTCGCACCTGCCTAGCTTAAGGAACCCCGATGCCGGTATCCACCGGACGTTTCGA
TTGCAAGATCTTGGCATGCCGCTACCTGTTGGAATTCAGTTATTAGTCCTACTCAGGAGG
GATAGCCGAACGGCACAAAGGCTTCGTTTGACAAGCACAGATGCATCTACTTAACTCGAT
AGCCTCAAAGAGTGTTTGCTCCGAGAGGGCCATCAGAGTAACTACCACGGCAAGAAGCGC
CTTCTTCCATGGCACACTCAAAAAGGTCATCTGAAGAGCCCATTTTTACCCACGGGATCC
CGCCACTAGACTCGTCACACTAAAACATAGAAGCAAGGCTGTAAGCGTACTCGGGTGTCC
CTAGCTACTTGACCCTGCGCTTTGATTTTCACCCAATCCAGCGCGTTAGCCAAACACCGG
CTCATGTGCGAGACACCTCTTGGACGGTACGAATACGCTTACTCCCACTCAGAACTGCTA
TCCGTGGGGTCCGTGGGGAGCCGGCGCAAAGAA
>Rosalind_2635
AACCTAAGCCACCTCGCGGTGTAACGCGCATCTGCAATCATCAGTTTCAGTCGGCGCAGC
GGAGCCCGGACAGCCTGTGCCGTACAAACCTGAAGCTGCTTACCTCGATTCATGCCAGGT
ATGAAGTATTCCGACGCTAATATCCTTTGGAATGGTTGCCAAGTCTCTACCAGCTACTCC
CATGACCGCATGACATATTCGACACGGTCTCTGAATGAGGTACGGTATTGCTTTCATTCT
AGTACGTTGCCCGACCTATGTACATCCGTCAACCACGGGGTGATCATACCTAAATTTGAA
TTAAAAAGTAGCGGAGCTACCGGACTGGTAGACTCCTCATCGCTCGGTTCAGTAGAAGGG
CTGGCCCTTTTCCTATCACTGTCCGTCCATTTCGTGTGTTTTAGGTGGTTTAGATATACC
TCTCATCGAAGAGTTGACCGTGTGATTAAATGAACGAACATTAAAGAGCGTGTGTTTAAA
TGCACGCAACACTAAAGGTGGAACATGGCGGTCGCCGTTATCGCATGGGTCTACTTGATC
GAAACTCAAGAGCATTGCAGACACAGGGACCCGTCAGGGTTTGTAAGCTGCGCGCTAATA
GTGCAACGTCCTAGGGTCGACTCCATGACGTAATGCAACTCTGGTTGACAATTCGTGAAG
TCGGAGTAAAGCTCCTGGCGCGCTGCACCCCCGGCTTCACCGTAGTTCCTACATTCTCGG
TCTAGTCGTGTGGGAATCACATCTGCTCCGAGGGTAAGGGGATTGGCATATAATGTGAGG
TAGCCGGCTAGGCGTATTAGCAACATCGTTGTCTATTGACTTGGAAGTTCTCTGTAGGAC
GTCGTCAGTCGGTAATCGCTGGTTTTAACTAAGGAGACACTGCTGGCACCGATGGCCGGG
GAGACCATTATGTATTCGGAGTGCCTCCGTTGTGGTGAATAACCAGGACTAATGAGGCCA
ACATAATACTAGACGTATACTATTTAGTGCGCT
>Rosalind_6087
ATTCGATGAATTTCCTCGATAGCGGCTCCGATTTAACACTACCTTGCCTTGACTCTCTAC
ACAGTAAGTACCCCCCGCAACTGGGGGACATTTTAGTGGCCCTTTGCGGAGTAGGGGTGT
TAGGTGTCGGCGTAAAGCGGATTCGATCAAACCCTGATCATCGGCTGAAATGGCCTCGAC
GGTGCTACTCTCAGTGACCTGCTGTTCCCGTAGCCTTTTAATACTCAATCCCTCGATCCG
CTATTCGACCAATCTCGAACTTGAATTCGGTGCGAATGAAACTCCAGTACGGTATGGCTT
GGACCGACGACGGAAGGAACTGCAACGTACCGACTTAATTTGGCTTCAATTCCTACCGAG
CATCATGCGGAAGCTACGCAATTGGATCTCAACAACCCCAAGAGACATTATAGTAGGACA
CACTTTATGGGATGCCGGGGACGGCATCTTCTGCAGGTTGGGAGGGCATCTTGCCTAGGT
GCCAACCTTCGGACGCTCAATGCTCTTACGGTCGGCAGGCTGTTCACGGAGGGCCTTATT
GGAAAAAGGTTATTTCACAAACGTTAAGTCCCTCAGATGACGTCTTGCGTCTCGCCAAGC
CTTTCTAGCTCCCGTCCAGGGCTTGAGCTTTCTTGACACGATAGCTTCCACGTTGACTCT
GAAAATCTCGAAAAACCGAAGGGGAGAGATGCGTCTTGGATCGTCCATAATGCTTCAGAC
GCTTCTAGCCTACCAGGTTGGTTAACAAGTTAATCCGCTAACTTATTGGCGCGTGAGCGA
CAGGACCGCGTCAGACTCATAGATACAGGGCTCATGGGGGCTATGTGTCTAATATGATCG
GCGACAAAGAGTTATGTAATGGCTTGGCTAGGAGACATAAAGGGGGACTTGATAGCGTTT
ACGAGCCTGTTCGGCCTCCCAAAGTTAACTAGATGAGACAGGATGTGCCCCGACACCCAC
GACTTCGTAAGGTAGAATAACGGACATAAGTCC
>Rosalind_4481
AAGGTGCTCAGAGACCTCGTTATGGATTGGTAACTATAGCAATTGCTTAAATCACGTTGT
TCAAATTTTGGGAACTGAATATGCTTCGGGCAATAGTATGAGTAGTCTAAATTGGGGAGT
GTAAGTGCGATTGGACACCACAAAGACAGGTAGTGAATGGGAGAGATTTGTTTGTAGCGC
GTTCGTGCGCGGGACGAGAAATGAATATCCTATTATCTGAAACCCGCCGCTGGGGCTGTA
GCGCCAAGAGCTTTCAGCGGGAGCTCCATGCGTGGAATCTTGCATCTACAATCACATATT
GGTAAGTAGCAACACTGACTGCAAGTACCACTCCCAGGAGAAGACTAGCCATTCAGTGTC
GCCGCTCACAAAGGGCGTAAAATGACATTCATGACGGCTAGCAGCGGACCACGATCCGTG
GCTCGCCGACACTCGGAACCATTCTTGTCTAATAGCTCAGCCCCAGGCTTTTCAACAGGG
GGCGACGCGACGAGCCTAATCGTTACGGATAAGGAGTGCGCACTAACTCGTCATCGGGGA
TAGACCAATTCTTGGAAAAGCAATCCTTAATATGATAGCTACTTGATGCATCTGTCGGCC
GGGGGACTGGACTGTCCTGAAATTGCTTAGGACTATATTTGAGCTTCCACTCCCACCCAG
GGGTGAGCAGATCCTGCCAAACGCGTATCCACTTAGATAAGCTCTTTAGCAAGGGGGCAG
CCTTTTTTCATCATGGTCTGCATTCGTGACTGAAATAATTCATCTCCACTGTACGTTACC
ATACCCTGACCACAATTTTTCCCAATGGGGTCATGCAAACGTACACACGTTTTGCGGCTG
GCTGAATTGCCGACTCATTTGTCCCGTATGCTAGCCCTGCTTGGATTCATAATTGTCTCG
CTCCGGACGTATTCGGGCCTGTGACAATCTTCCCACCTCATAGAACGCCCCAGAATACTC
GTTTTGCTGATGTCGCAGAACATTCTCCTCAGA
>Rosalind_0954
CTAATCTTGCGAATCAATCACAGGTGCGTTGATCCAGAGTCGTAGTTTTACAGTATGCAA
TGTATATTCTTTCTGATGGGACGAGTTTGCATGCAGTAGTTGGGTACTATGCCAGTGCGA
GACCGTCCCTCACCTAAATGCTATGCAGGGTTTCTCTACGATCAAATAGTCAAGTTGCTC
AGCCTCATCACATTGTGAATCACGGACAGACTGTAATTGTCAGCGTGTTCTCTAGGCAAA
TCGCCTTCCTTCTATCGACCTCCTTAGGTCCCCGTGAGGATCTCCTTATCCTGAAAAGTA
CAATCGGATACTTAGATTCTTCGCTCACTCTAATAGGTGGCTATACAGAAGTTTTATGGA
TAAGGGGTGTACGAAATCTTCGAGGGTGTATACCGCTGCTAGAACTCCATACATGATAAC
AACCAATCCTTAGCTAGTATACGAGGGATATGATAACGTTCCACCACCTCTTAAACTTTT
AAATTTGATCGCGGGTGGCCGTCGAAGTGTACGTATGAGATTGGGGCGGTTGTAGTTGCC
AGTGAAAGGCATATGCGGATGGCCTTTGGGTCCTGGTCATTCTTTCTCGCAGGTCGAGCC
AGTGCCTCAAATGAAATTTTCTCCTTAGCAACGACTCCTTAGTTAGAGAAACCAATCCCC
CCATGCCTGCGGATCGTGGTCAGCATGACGTCTGGTTGAACCCTTAGCTGAACAGATGGC
GTATTGCCGTACGAGGGGACCTTATAGGCGGCCTACCACACCAGACGAAGAGTCCGAAGG
TACGCCAAACGCATATTCAGGACGTAAGTGGGAGGACCCTGAGCCTCATTGCCGACTGAA
GGTGAATCGCTGGCCCACTGCTAGTTCCTCCCTTCGCTAATGGTCACGGGAATATCGCCA
CCTCGTCGATGACGCTCGATTAGACCTGTAGGAACACAACATACTAGGTGGACACGGGAC
ACCGATTTACCCACGCCGGACAGTCGTTCTTAT
>Rosalind_3750
ACAGTGTCATGGGATCTGGAGACGTATCCAAGCTAAACGCGCGTTCTATACAGACGTCGA
AACACGGGGGGCGAACTGCTTTAGCGACATGCTCTTACTGAAGTCTAGACGCTAAGGGCT
TTAGACAGCGAATAGTGGTTGATAGGTATTGAGCCATCCGTGTAGAGCGTTAGAAGGCCA
CGGCTTACTTGGTTAAAAGCTGATTTGGGCGGTTACATTCTGGGGTTTAAATACTATCGA
GTATCGATGCTTTTCTATGTATTGAAGACTGGTAAGCTTTCCCCGACCAGGTCGCGCCAT
CGTACCTTCTGGGGAAACTAATGCGGCTGAGTCGGCGACTTCAGGATGTCCCGATACACG
CAGCGTCACAGGTAAACTCGCCTTATAACGCGTCCCCGTCGATAAGGCCGACCCTTTCAG
ATGCGCGGTGCTCCTTCGATTGTTGACGACGCCATCCGAGGTCCAGACGTCTGAGGCCAC
GTGATCGGCCCCCTGTTACTGAGAAGCAGATTACCCCTAAGAATCGTCCGTCGCCTAGTA
GTTGCCGCAACCGACGATACTTCTCCAACATAATCTAGCGTATTTATCAAAGCGTCGTCG
TATCTAGCCTTACGGACGTAATACGAATACCCCCTGCTCAGTGGGCATGTAATACGCCAA
CCAAAAACACGCCAGTTACGAGGAGTGGCACTGCTATAAACCTAGATGAGATCGCTGATG
CCACGAGGAACCTTAGTTGAGTCCGCTGAACCCGCCAGTTGGCTTTGCAGGTCCGCGTTG
TTACTATGACTAAAATATATGATGGATACGCGGACCACTCCTACAGATGCTAAAAGTCAA
ACCGGCACCTATTAGATTTTTAACGGTGCACTTCTAACCGACATAGCCCGCGACCAGGGG
TGAAATTGCATTACATACGATATGATCGCTCCCAGGTCAATGACCACTTGACCTGTGAGT
TTGCTTATTAAGGTGGCTTTAGGCAGCGTAAGC
>Rosalind_9350
ATGAATTTTTAGCGCAAATGAACCGCCTGCTTCCATTAAGTCCCCGCTGCAGAAACCTCG
TTTGTATTCAGAAAGTTCACCTGACAACGGGGCATAGGGTAAATAGATGCTATGTAAATC
TTAGGGCTTACGCGGCGACTTTGACTTTTTCAGCGAACAGAGGCGAAGGCGACCAGCGTC
ATAGGTCTTCATACCGAAACAACAGGGGAGCATGGCCAATCACTGTCACTAACTCACGGG
ACTCCGCCTTGCTCGCCGGTGCCATATCGTACTGACGTAACTCATTGAATTCCATAGAAC
TTGGTTTAGGCCACCTCCGCCGAAACCCGTGGTGGTAAGTCAAGCGAGGACACCGGAAAT
TCCGACCCCGGTTCCCAACACAGGGCTATTCATCACATTTGGTGTACGTATTGATCCTTA
ATTGCCAGAGTCCTACTCGTTGATGTACGATCCACTTAAGTAAGGTCGGGCGTTCTACCG
CGCGGCGCATACCGGACATTATAGCTTAGGCCCCCCAGCTCTATTGTTATTACTATATCC
CTAATTCTAGAAGGGAAATTGTAAGATCAATTCCCGGCAGGTGGGCAGGAACAGACGTCG
AGCACCATTCGTAGTAAAGGTCTTTCTCGGTGTGTAGCGTTGACAAATCTGCAACCCAAC
CTTGTACTCTTCGCTGAACAATAGGTGCATTTCAAGACCGAGCTTGGCGCTGTTTCCTGA
CTGCAGCATGGGCAAAATTCTCGTAGGCAAGTGATCAATTAGCGGAACGCATTGGAAAAA
TTTGTTGGCACAATCCGGCACAGGTACTGATACCCCTCGATGTCGCAGTGCCGAGTCACC
CATCGCATGATCTGAGGTTGGTGCTGCCAGCGCTCTCCGAACAGGAGTCGTAGTTGCACT
CATGGCCGCTTTACGACGGGAGAAACTTACAGTAGCCTTGTAACAACTTTGTAAATCGTT
CATGGACTATCGTGAGGCAGACTTCTATTGTCC
>Rosalind_6074
CGAGGTAACAGTTGTCCGTTCTTTGTAGATTGCCTGGGGTGAAGGTACTAGTTAGCAATG
ATCAGAAGAAAATAGAGCCAGCCGGACTCTCGGGGCGGTACCAGGGTCGAGGAATCTGGG
TAAGTTTCCTATGTGATGAACAGGGTTTTCGATGGTAACGATGTGAACGACCCTGGGTCG
GGTTCAGCCCTCCTAACGAAACACGTGCTTCAGAAAAATAGTTGCAACCTGTTGTTGTCA
ACCTAGTCCTATAGAGTATGTTACTCGGCTATACTCAGGACCTATCCAGACCGCCACTCT
TTCTCTGTGTTAAAACCCCACCATATAAGATCCGTCCTCCCTTTTCACCGCCTTTACAGC
AGGGAGCCGTTGAGCAGGGCCAATGACGCCAAGACTTTACTAAAGTGACTGGTAGGTTCA
TTCTACCTATCCCTTTGCGTATTGATGTTTAGTCTGGTTTCAGGTACAGGTAAACCAGGT
GGCTGGTGCCATACTCGCTAAACAAATGTGGGGGCGCGAAAGATCTGGTGCAGGTTGACT
ACGATTTTATAGAGCAGTACACCGTGCTAGTCAGCATGAGTGGAGACACCTGAAATAAGT
GACGAGGTTGTCCAATGTATAGGACGACAGTTGCAGGGTGCACTGCAACAGAGTTATAAC
CATTACGTTGACTTAACACATGATTGTTAAAATGCTTCGACCCAAGACTCGGCGGGTCAA
AGTAAACCATTACGCGCGGGTGTCTGTAGCTACGGGTCAGCAGGGACCTAGCTATTACGA
GATAGGAAGGCCCACGTACCTAGGGGTCCCTTTTTCGGGTCTTTACCTGGTCAGCGAAGC
CCCGAAACGTGAACTCCAGTGATAACAGGTTAACGGCTTCTGGTGACGACTCTATCGAGT
TGTCAATGTAGCTTACAGGTACTATCGGGAATAATGTCGGGGGTGAACGTTGCGGTTTAA
AGTGGCTCAGCAAGCATATACACCTAGGTTGCG
Try using format strings:
f'{expression}'
str.join()
dict.items()
Code:
d = {'A': [5, 3, 3, 3, 1, 4, 2, 1, 2, 3], 'C': [2, 1, 3, 2, 1, 2, 2, 1, 3, 3], 'G': [1, 3, 2, 4, 3, 2, 1, 3, 3, 0], 'T': [2, 3, 2, 1, 5, 2, 5, 5, 1, 3]}
for k, v in d.items(): #loop over your output
    g = " ".join(str(n) for n in v) #join list values
    print(f'{k}: {g}') #format text
Result:
A: 5 3 3 3 1 4 2 1 2 3
C: 2 1 3 2 1 2 2 1 3 3
G: 1 3 2 4 3 2 1 3 3 0
T: 2 3 2 1 5 2 5 5 1 3
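To produce the whole answer in one go, the consensus line can be built the same way. A rough sketch, assuming the consensus is a list of single letters (like result in the question) and the profile is a dict of count lists (like profile_matrix); the helper name format_answer and the tiny example values, taken from the first three columns of the expected format shown in the question, are only for illustration:

```python
def format_answer(consensus, profile):
    """Return the consensus line followed by one 'X: n n n' line per nucleotide."""
    lines = ["".join(consensus)]
    for nucleotide in "ACGT":      # fixed order, independent of dict ordering
        counts = " ".join(str(count) for count in profile[nucleotide])
        lines.append(f"{nucleotide}: {counts}")
    return "\n".join(lines)


example_profile = {
    "A": [5, 1, 0],
    "C": [0, 0, 1],
    "G": [1, 1, 6],
    "T": [1, 5, 0],
}
example_consensus = ["A", "T", "G"]
print(format_answer(example_consensus, example_profile))
```

The returned string can also be written to a file and pasted into the submission box, which avoids any manual editing inside the time limit.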

Nested looping a Python list

I have a list of length 6 with L[5] = 3 fixed. I need to loop over it around 243 times (or so) to get the combinations in a particular order.
L[4] keeps cycling through 1, 2, 3 while L[3] stays the same for that time, and then the change moves from right to left. I know it will be some nested loop, but I can't do it properly. The list can be initialized as anything, but L[5] needs to be 3, the first combination will be L = [1,1,1,1,1,3], and the last combination will be [3,3,3,3,3,3].
IIUC you can use itertools.product:
from itertools import product
vals = [[1, 2, 3]] * 5 + [[3]]
for c in product(*vals):
    print(c)
Prints:
(1, 1, 1, 1, 1, 3)
(1, 1, 1, 1, 2, 3)
(1, 1, 1, 1, 3, 3)
(1, 1, 1, 2, 1, 3)
(1, 1, 1, 2, 2, 3)
(1, 1, 1, 2, 3, 3)
(1, 1, 1, 3, 1, 3)
(1, 1, 1, 3, 2, 3)
(1, 1, 1, 3, 3, 3)
(1, 1, 2, 1, 1, 3)
...
(3, 3, 3, 3, 1, 3)
(3, 3, 3, 3, 2, 3)
(3, 3, 3, 3, 3, 3)
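If lists are needed instead of tuples, the same idea can be written with the repeat argument of product. A small sketch (the name combos is mine) that also confirms the count of 3**5 = 243 mentioned in the question:

```python
from itertools import product

# Five free positions, each taking 1, 2 or 3, followed by the fixed 3.
combos = [list(c) + [3] for c in product([1, 2, 3], repeat=5)]

print(len(combos))
print(combos[0])
print(combos[-1])
```

product varies the rightmost free position fastest, which matches the right-to-left order described in the question.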
Something like this should work:
combos = []  # avoid naming a variable "list", which shadows the built-in
for i in range(1,4):
    for j in range(1,4):
        for k in range(1,4):
            for l in range(1,4):
                for m in range(3,4):
                    combos.append([i,j,k,l,m])
for elem in combos:
    print(elem)
Output:
[1, 1, 1, 1, 3]
[1, 1, 1, 2, 3]
[1, 1, 1, 3, 3]
[1, 1, 2, 1, 3]
[1, 1, 2, 2, 3]
[1, 1, 2, 3, 3]
[1, 1, 3, 1, 3]
[1, 1, 3, 2, 3]
[1, 1, 3, 3, 3]
[1, 2, 1, 1, 3]
[1, 2, 1, 2, 3]
[1, 2, 1, 3, 3]
[1, 2, 2, 1, 3]
[1, 2, 2, 2, 3]
[1, 2, 2, 3, 3]
[1, 2, 3, 1, 3]
[1, 2, 3, 2, 3]
[1, 2, 3, 3, 3]
[1, 3, 1, 1, 3]
[1, 3, 1, 2, 3]
[1, 3, 1, 3, 3]
[1, 3, 2, 1, 3]
[1, 3, 2, 2, 3]
[1, 3, 2, 3, 3]
[1, 3, 3, 1, 3]
[1, 3, 3, 2, 3]
[1, 3, 3, 3, 3]
[2, 1, 1, 1, 3]
[2, 1, 1, 2, 3]
[2, 1, 1, 3, 3]
[2, 1, 2, 1, 3]
[2, 1, 2, 2, 3]
[2, 1, 2, 3, 3]
[2, 1, 3, 1, 3]
[2, 1, 3, 2, 3]
[2, 1, 3, 3, 3]
[2, 2, 1, 1, 3]
[2, 2, 1, 2, 3]
[2, 2, 1, 3, 3]
[2, 2, 2, 1, 3]
[2, 2, 2, 2, 3]
[2, 2, 2, 3, 3]
[2, 2, 3, 1, 3]
[2, 2, 3, 2, 3]
[2, 2, 3, 3, 3]
[2, 3, 1, 1, 3]
[2, 3, 1, 2, 3]
[2, 3, 1, 3, 3]
[2, 3, 2, 1, 3]
[2, 3, 2, 2, 3]
[2, 3, 2, 3, 3]
[2, 3, 3, 1, 3]
[2, 3, 3, 2, 3]
[2, 3, 3, 3, 3]
[3, 1, 1, 1, 3]
[3, 1, 1, 2, 3]
[3, 1, 1, 3, 3]
[3, 1, 2, 1, 3]
[3, 1, 2, 2, 3]
[3, 1, 2, 3, 3]
[3, 1, 3, 1, 3]
[3, 1, 3, 2, 3]
[3, 1, 3, 3, 3]
[3, 2, 1, 1, 3]
[3, 2, 1, 2, 3]
[3, 2, 1, 3, 3]
[3, 2, 2, 1, 3]
[3, 2, 2, 2, 3]
[3, 2, 2, 3, 3]
[3, 2, 3, 1, 3]
[3, 2, 3, 2, 3]
[3, 2, 3, 3, 3]
[3, 3, 1, 1, 3]
[3, 3, 1, 2, 3]
[3, 3, 1, 3, 3]
[3, 3, 2, 1, 3]
[3, 3, 2, 2, 3]
[3, 3, 2, 3, 3]
[3, 3, 3, 1, 3]
[3, 3, 3, 2, 3]
[3, 3, 3, 3, 3]

How to get matplotlib bar chart to match numeric count in python terminal

My main objective is to be consistent with both my numeric output and my visual output. However, I can't seem to get them to match.
Here is my setup using python 3.x:
df = pd.DataFrame([ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],columns=['Expo'])
Followed by my setup for the bar chart in matplotlib:
x = df['Expo']
N = len(x)
y = range(N)
width = 0.125
plt.bar(x, y, width, color="blue")
fig = plt.gcf();
This produces a nice, pretty graph (image not included).
However, when I use this snippet to check the actual numeric counts of both classes...
print("Class 1: "+str(df['Expo'].value_counts()[1]),"Class 2: "+str(df['Expo'].value_counts()[2]))
I get the below:
Class 1: 85 Class 2: 70
Since I have 155 records in the data frame, numerically this makes sense. Having a single bar in the bar chart be at 155 does not.
I appreciate any help in advance.
I guess something like this is what you're after:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame([ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],columns=['Expo'])
# Count number of '1' and '2' elements in df
N1, N2 = len(df[df['Expo'] == 1]), len(df[df['Expo'] == 2])
width = 0.125
# Plot the lengths in x positions [1, 2]
plt.bar([1, 2], [N1, N2], width, color="blue")
fig = plt.gcf()
plt.show()
Which produces
You may use a histogram,
plt.hist(df["Expo"])
or specifying the bins
plt.hist(df["Expo"], bins=[0.5,1.5,2.5], ec="k")
plt.xticks([1,2])
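Another option is to let pandas do the counting, so the printed numbers and the bar heights come from the same object. A minimal sketch with a small made-up sample standing in for df['Expo']; the names expo and counts are only for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small made-up sample; in the question this would be df['Expo'].
expo = pd.Series([1, 1, 2, 1, 2, 2, 1], name="Expo")

# value_counts() gives one count per distinct value; sort_index() orders them 1, 2.
counts = expo.value_counts().sort_index()
print(counts.to_dict())

# One bar per class, with the count as the bar height.
fig, ax = plt.subplots()
ax.bar(counts.index, counts.values, width=0.125, color="blue")
ax.set_xticks(list(counts.index))
ax.set_ylabel("count")
```

Add plt.show() at the end to display the figure when running it as a script.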

Create histogram from dict of lists

I am trying to graph the frequency of the numbers 1, 2, and 3 that occur for certain keys in a dictionary (titled 'hat1' through 'hat10') and am having trouble converting my data (shown below) into a format that I might be able to graph.
data = {'hat9': [[1, 2, 3, 1, 2]], 'hat8': [[1, 2, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], 'hat1': [[1, 2, 3]], 'hat3': [[1, 2, 3, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1]], 'hat2': [[1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'hat5': [[1, 2, 3, 2, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2, 3, 2, 3, 2, 3, 3, 3, 3, 2, 3, 1, 3, 3, 3, 3]], 'hat4': [[1, 2, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 3, 2, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1]], 'hat7': [[1, 2, 3, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]], 'hat6': [[1, 2, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 1, 3]], 'hat10': [[1, 2, 3, 3, 3, 3, 3, 3, 1, 2, 2, 1, 2, 3, 3, 2, 3, 3, 3, 3, 3, 2, 1, 1, 3, 3, 1, 2, 2, 3, 3, 1, 3, 3, 3, 3, 3, 2, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 2, 1, 3, 2, 1, 3, 2, 3, 3, 1, 2, 1, 2, 3, 3, 1, 3, 2, 2, 1, 2, 3, 3, 1, 2, 3, 2, 3, 3, 1, 3, 3, 3, 3]]}
When I ran DataFrame.from_dict(data) I received output that looked like this:
In [100]: DataFrame.from_dict(data)
Out[100]:
hat1 hat10 \
0 [1, 2, 3] [1, 2, 3, 3, 3, 3, 3, 3, 1, 2, 2, 1, 2, 3, 3, ...
hat2 \
0 [1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
hat3 \
0 [1, 2, 3, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, ...
hat4 \
0 [1, 2, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, ...
hat5 \
0 [1, 2, 3, 2, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, ...
hat6 \
0 [1, 2, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
hat7 \
0 [1, 2, 3, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
hat8 hat9
0 [1, 2, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ... [1, 2, 3, 1, 2]
I was hoping someone might be able to help me get the data into a more workable format that can be converted into a graph relatively easily. Thanks for your help.
If you want to create your histogram with Matplotlib, you don't really need to do much more than call its hist function with each hat you want to show. For example,
import pylab
pylab.hist(data['hat4'][0], bins=(1,2,3,4), align='left')
(You need to index at [0] because for some reason each of your dictionary values is a list of length 1, the single item itself being a list of data values).
If you need to aggregate the hats in some way, you need to say how.
You can do the same with a pandas DataFrame if you prefer:
import pandas as pd
df = pd.DataFrame(data)
pylab.hist(df['hat4'], bins=(1,2,3,4), align='left')
Try this out:
data = {'hat9': [[1, 2, 3, 1, 2]], 'hat8': [[1, 2, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3]], 'hat1': [[1, 2, 3]], 'hat3': [[1, 2, 3, 1, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1]], 'hat2': [[1, 2, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], 'hat5': [[1, 2, 3, 2, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 3, 1, 3, 2, 3, 2, 3, 2, 3, 3, 3, 3, 2, 3, 1, 3, 3, 3, 3]], 'hat4': [[1, 2, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 2, 1, 1, 2, 3, 1, 2, 1, 3, 2, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1]], 'hat7': [[1, 2, 3, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]], 'hat6': [[1, 2, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 1, 1, 3]], 'hat10': [[1, 2, 3, 3, 3, 3, 3, 3, 1, 2, 2, 1, 2, 3, 3, 2, 3, 3, 3, 3, 3, 2, 1, 1, 3, 3, 1, 2, 2, 3, 3, 1, 3, 3, 3, 3, 3, 2, 3, 1, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 2, 1, 3, 2, 1, 3, 2, 3, 3, 1, 2, 1, 2, 3, 3, 1, 3, 2, 2, 1, 2, 3, 3, 1, 2, 3, 2, 3, 3, 1, 3, 3, 3, 3]]}
keys = []
values = []
for key,value in data.items():
    keys.append(key)
    a = 0
    b = 0
    c = 0
    for x in value[0]:
        if x==1: a+=1
        elif x==2: b+=1
        elif x==3: c+=1
    values.append([a,b,c])
print(keys)
print(values)
Hopefully that helps. keys is ['hat9', 'hat8', etc.] and values is [[freq of 1 in 'hat9', freq of 2 in 'hat9', freq of 3 in 'hat9'], [freq of 1 in 'hat8', freq of 2 in 'hat8', freq of 3 in 'hat8'], ...] (a list of 3-item lists).
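The counting can also be left to collections.Counter. A short sketch using only 'hat1' and 'hat9' from the question's data; the name frequencies is my own:

```python
from collections import Counter

# Shortened stand-in for the question's dict: each value is a list holding one list.
data = {
    "hat1": [[1, 2, 3]],
    "hat9": [[1, 2, 3, 1, 2]],
}

frequencies = {}
for hat, nested in data.items():
    counts = Counter(nested[0])    # unwrap the inner list, then count each number
    frequencies[hat] = [counts[n] for n in (1, 2, 3)]

print(frequencies)
```

A Counter returns 0 for a number that never occurs, so every hat gets a 3-item list even if one of the values is missing.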
