Python if/else statement confusion - python

How can you create an if/else statement in Python when you have a file with both text and numbers? Let's say I want to replace the values in the third-to-last column of the file below. I want an if/else statement that replaces values < 5, or a dot ".", with a zero, and if possible uses that value as an integer in a sum.
A quick and dirty solution using awk would look like this, but I'm curious about how to handle this type of data with Python:
awk -F"[ :]" '{if ( (!/^#/) && ($9<5 || $9==".") ) $9="0" ; print }'
So how do you solve this problem?
Thanks
Input file:
##Comment1
#Header
sample1 1 2 3 4 1:0:2:1:.:3
sample2 1 4 3 5 1:3:2:.:3:3
sample3 2 4 6 7 .:0:6:5:4:0
Desired output:
##Comment1
#Header
sample1 1 2 3 4 1:0:2:0:0:3
sample2 1 4 3 5 1:3:2:0:3:3
sample3 2 4 6 7 .:0:6:5:4:0
SUM = 5
Result so far:
['sample1', '1', '2', '3', '4', '1', '0', '2', '0', '0', '3\n']
['sample2', '1', '4', '3', '5', '1', '3', '2', '0', '3', '3\n']
['sample3', '2', '4', '6', '7', '.', '0', '6', '5', '4', '0']
Here's what I have tried so far:
import re
data=open("inputfile.txt", 'r')
for line in data:
    if not line.startswith("#"):
        nodots = line.replace(":.",":0")
        final_nodots=re.split('\t|:',nodots)
        if (int(final_nodots[8]))<5:
            final_nodots[8]="0"
            print (final_nodots)
        else:
            print(final_nodots)

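import re

sums = 0
with open("inputfile.txt") as data:
    for line in data:
        if not line.startswith("#"):
            nodots = line.replace(".", "0")
            # first run of colon-separated values, as a list of single characters
            # (this only works because every value is a single digit)
            final_nodots = list(re.findall(r'\d:.+\d+', nodots)[0])
            # index 6 is the fourth colon field, i.e. the third-to-last column
            if int(final_nodots[6]) < 5:
                final_nodots[6] = "0"
            print(final_nodots)
            sums += int(final_nodots[6])
print(sums)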
You were pretty close, but your final_nodots is split only on ":" and not on the spaces before it, so the first few numbers stay together in one element and your index 8 should have been 3. After that, just add a sums counter to keep track of that slot. For sample1, final_nodots looks like this:
['sample1 1 2 3 4 1', '0', '2', '0', '0', '3\n']
There are better ways to achieve what you want, but I just wanted to fix your code.
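One of those better ways, as a minimal sketch: it splits each data row into columns, splits the last column on ":", and applies the rule to the third-to-last field directly instead of relying on character positions. It assumes whitespace-separated columns with the colon-separated values in the last one, and it rejoins columns with single spaces, so tabs are not preserved. The function name process is illustrative, not from the question.

```python
def process(lines):
    """Zero the third-to-last colon-separated field if it is '.' or < 5.

    Returns the rewritten lines and the sum of that field over all data rows.
    """
    out = []
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            out.append(line)  # keep comment and header lines unchanged
            continue
        cols = line.split()            # split on any run of whitespace
        fields = cols[-1].split(":")   # e.g. ['1', '0', '2', '1', '.', '3']
        # check for '.' first so int() is never called on a dot
        if fields[-3] == "." or int(fields[-3]) < 5:
            fields[-3] = "0"
        total += int(fields[-3])
        cols[-1] = ":".join(fields)
        out.append(" ".join(cols))
    return out, total


sample = [
    "##Comment1\n",
    "#Header\n",
    "sample1 1 2 3 4 1:0:2:1:.:3\n",
    "sample2 1 4 3 5 1:3:2:.:3:3\n",
    "sample3 2 4 6 7 .:0:6:5:4:0\n",
]
result, total = process(sample)
for row in result:
    print(row)
print("SUM =", total)  # SUM = 5
```

To run it on the real file, pass the open file instead of the sample list, e.g. with open("inputfile.txt") as f: result, total = process(f). Like the awk one-liner, it only touches the third-to-last field; the desired output for sample1 also shows the neighbouring "." as 0, which would need an extra rule.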

Related

Get the number of specific dictionary values being stored in a 2-dimensional list in Python

I have a dictionary that contains fruits as keys and a 2-dimensional list including the line number and the timestamp, in which the fruit name occurs in the transcript file, as values. The 2-dimensional list is needed because the fruits appear several times in the file and I need to consider each single occurrence. The dictionary looks like this:
mydict = {
'apple': [['1', '00:00:03,950'], # 1
['1', '00:00:03,950'], # 2
['9', '00:00:24,030'], # 3
['11', '00:00:29,640']], # 4
'banana': [['20', '00:00:54,449']], # 5
'cherry': [['14', '00:00:38,629']], # 6
'orange': [['2', '00:00:06,840'], # 7
['2', '00:00:06,840'], # 8
['3', '00:00:09,180'], # 9
['4', '00:00:10,830']], # 10
}
Now I would like to print the total number of fruit occurrences, so my desired result is 10. In other words, I want to count the inner lists (each [line number, timestamp] pair), not every single item inside them; the numbered comments above show what should be counted.
For this purpose, I tried:
print(len(mydict.values()))
But this only gives me 4, which is the number of keys.
And the following code does not work for me either:
count = 0
for x in mydict:
    if isinstance(mydict[x], list):
        count += len(mydict[x])
print(count)
Does anyone have an idea how to get 10?
You can obtain the lengths of the sub-lists by mapping them to the len function, and then add them up by passing the resulting sequence of lengths to the sum function:
sum(map(len, mydict.values()))
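The same count can also be written as a generator expression. A small sketch using the dictionary from the question, showing why len() on the dictionary (or on .values()) gives 4 while summing the lengths of the values gives 10:

```python
mydict = {
    'apple': [['1', '00:00:03,950'], ['1', '00:00:03,950'],
              ['9', '00:00:24,030'], ['11', '00:00:29,640']],
    'banana': [['20', '00:00:54,449']],
    'cherry': [['14', '00:00:38,629']],
    'orange': [['2', '00:00:06,840'], ['2', '00:00:06,840'],
               ['3', '00:00:09,180'], ['4', '00:00:10,830']],
}

# len(mydict) counts the keys; each value is a list of [line, timestamp]
# pairs, so summing len() of every value counts the pairs themselves.
n_keys = len(mydict)
n_occurrences = sum(len(pairs) for pairs in mydict.values())
print(n_keys, n_occurrences)  # 4 10
```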
blhsing's solution is the best. If you want to stick with loops, you can do:
mydict = {
'apple': [['1', '00:00:03,950'], # 1
['1', '00:00:03,950'], # 2
['9', '00:00:24,030'], # 3
['11', '00:00:29,640']], # 4
'banana': [['20', '00:00:54,449']], # 5
'cherry': [['14', '00:00:38,629']], # 6
'orange': [['2', '00:00:06,840'], # 7
['2', '00:00:06,840'], # 8
['3', '00:00:09,180'], # 9
['4', '00:00:10,830']], # 10
}
n_fruits = 0
for fruit, occurences_of_fruit in mydict.items():
    # increment n_fruits by the number of occurrences of the fruit
    # BTW occurences_of_fruit and mydict[fruit] are the same thing
    n_fruits += len(occurences_of_fruit)
print(n_fruits) # 10

Create a dictionary from lists, overwrite duplicate keys

My code is below. I am trying to create a dictionary from lists extracted from a txt file, but the loop overwrites the previous information:
f = open('data.txt','r')
lines = f.readlines()
lines = [line.rstrip('\n') for line in open('data.txt')]
columns=lines.pop(0)
for i in range(len(lines)):
    lines[i]=lines[i].split(',')
dictt={}
for line in lines:
    dictt[line[0]]=line[1:]
print('\n')
print(lines)
print('\n')
print(dictt)
I know I have to change this part:
for line in lines:
    dictt[line[0]] = line[1:]
but what can I do? Do I have to use numpy? If so, how?
My lines list is:
[['USS-Enterprise', '6', '6', '6', '6', '6'],
['USS-Voyager', '2', '3', '0', '4', '1'],
['USS-Peres', '10', '4', '0', '0', '5'],
['USS-Pathfinder', '2', '0', '0', '1', '2'],
['USS-Enterprise', '2', '2', '2', '2', '2'],
['USS-Voyager', '2', '1', '0', '1', '1'],
['USS-Peres', '8', '5', '0', '0', '4'],
['USS-Pathfinder', '4', '0', '0', '2', '1']]
My dict becomes:
{'USS-Enterprise': ['2', '2', '2', '2', '2'],
'USS-Voyager': ['2', '1', '0', '1', '1'],
'USS-Peres': ['8', '5', '0', '0', '4'],
'USS-Pathfinder': ['4', '0', '0', '2', '1']}
It keeps only the last values for each key, but I want to add the values together. I am really confused.
You are trying to store multiple values under the same key. You can use defaultdict for that, or modify your code to use the dictionary's get method:
for line in lines:
    dictt[line[0]] = dictt.get(line[0], []) + line[1:]
This looks up each key: if the key is new, it starts from an empty list and stores line[1:]; if it is a duplicate, it appends those values onto the previous ones. (Note that list.extend() returns None, so it cannot be used on the right-hand side of this assignment.)
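A minimal sketch of the defaultdict variant mentioned above, using three rows from the question's lines list: defaultdict(list) supplies an empty list the first time a key is seen, so a duplicate key extends the existing list instead of replacing it.

```python
from collections import defaultdict

lines = [['USS-Enterprise', '6', '6', '6', '6', '6'],
         ['USS-Voyager', '2', '3', '0', '4', '1'],
         ['USS-Enterprise', '2', '2', '2', '2', '2']]

merged = defaultdict(list)
for line in lines:
    # first occurrence: starts from []; later occurrences: append to it
    merged[line[0]].extend(line[1:])

print(dict(merged))
```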
dict_output = {}
for line in list_input:  # list_input is the lines list from the question
    if line[0] not in dict_output:
        dict_output[line[0]] = line[1:]
    else:
        dict_output[line[0]] += line[1:]
EDIT: You subsequently clarified in comments that your input has duplicate keys, and you want later rows to overwrite earlier ones.
ORIGINAL ANSWER: The input is not a dictionary, it's a CSV file. Just use pandas.read_csv() to read it:
import pandas as pd
df = pd.read_csv('my.csv', sep=r'\s+', header=None)  # whitespace-separated; use sep=',' for a comma-separated file
df
0 1 2 3 4 5
0 USS-Enterprise 6 6 6 6 6
1 USS-Voyager 2 3 0 4 1
2 USS-Peres 10 4 0 0 5
3 USS-Pathfinder 2 0 0 1 2
4 USS-Enterprise 2 2 2 2 2
5 USS-Voyager 2 1 0 1 1
6 USS-Peres 8 5 0 0 4
7 USS-Pathfinder 4 0 0 2 1
It seems your input doesn't have a header row. If you want column names, you can assign them with df.columns = ['Ship', 'A', 'B', 'C', 'D', 'E'] (or whatever names you like).
If you really want dict output (beware of duplicate keys being suppressed), see df.to_dict().
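If "add the values together" means numeric totals per ship rather than one longer list, a plain-Python sketch can sum the rows column by column. This is an interpretation of the question, and it assumes every value column holds an integer string:

```python
lines = [['USS-Enterprise', '6', '6', '6', '6', '6'],
         ['USS-Voyager', '2', '3', '0', '4', '1'],
         ['USS-Peres', '10', '4', '0', '0', '5'],
         ['USS-Pathfinder', '2', '0', '0', '1', '2'],
         ['USS-Enterprise', '2', '2', '2', '2', '2'],
         ['USS-Voyager', '2', '1', '0', '1', '1'],
         ['USS-Peres', '8', '5', '0', '0', '4'],
         ['USS-Pathfinder', '4', '0', '0', '2', '1']]

totals = {}
for name, *values in lines:
    numbers = [int(v) for v in values]  # convert the string columns to ints
    if name in totals:
        # add this row to the running totals, column by column
        totals[name] = [a + b for a, b in zip(totals[name], numbers)]
    else:
        totals[name] = numbers

print(totals)
# {'USS-Enterprise': [8, 8, 8, 8, 8], 'USS-Voyager': [4, 4, 0, 5, 2],
#  'USS-Peres': [18, 9, 0, 0, 9], 'USS-Pathfinder': [6, 0, 0, 3, 3]}
```

With the DataFrame from the pandas answer, df.groupby(0).sum() should give the same totals, with the ships sorted by name.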

Python list folders list by numeric order

I'm getting a list of all the folders in a specific directory with this code:
TWITTER_FOLDER = os.path.join('TwitterPhotos')
dirs = [d for d in os.listdir(TWITTER_FOLDER) if os.path.isdir(os.path.join(TWITTER_FOLDER, d))]
This is the array: ['1','2','3','4','5','6','7','8','9','10','11'].
And I want to get the array in this order: ['11','10','9','8','7','6','5','4','3','2','1']
So I use this code for that:
dirs.sort(key=lambda f: int(filter(str.isdigit, f)))
and when I use it I get this error:
int() argument must be a string, a bytes-like object or a number, not 'filter'
Any idea what the problem is? Or how else I could sort it?
It's important that the array is sorted in numeric order, like:
12 11 10 9 8 7 6 5 4 3 2 1
And not:
9 8 7 6 5 4 3 2 12 11 1
Thanks!
filter returns an iterator, so you need to join its characters back into a string before you can convert it to an integer (and pass reverse=True for the descending order you want):
dirs.sort(key=lambda f: int(''.join(filter(str.isdigit, f))), reverse=True)
Use sorted with key:
In [4]: sorted(f, key=lambda x: int(x), reverse=True)
Out[4]: ['11', '10', '9', '8', '7', '6', '5', '4', '3', '2', '1']
Or you can do f.sort(key=lambda x: int(x), reverse=True) for an in-place sort.
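To see why the key matters, a small sketch comparing a plain string sort with a numeric one on a few folder names like those in the question. key=int assumes every name consists only of digits; for names with other characters, the join/filter key from the first answer is needed.

```python
dirs = ['1', '2', '3', '10', '11', '12', '9']

# Strings compare character by character, so '12' sorts before '2'.
string_order = sorted(dirs, reverse=True)
# Converting each name to int makes the comparison numeric.
numeric_order = sorted(dirs, key=int, reverse=True)

print(string_order)   # ['9', '3', '2', '12', '11', '10', '1']
print(numeric_order)  # ['12', '11', '10', '9', '3', '2', '1']
```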

Separate a file in paragraphs

I have a file like this:
cluster number 1
1
2
3
cluster number 2
1
2
3
cluster number x
1
2
3
I want to split this file into paragraphs, one per cluster number, like this:
cluster number 1
1
2
3
I tried searching for an answer, but I couldn't work it out.
Thanks for your help!
Use a regular expression:
import re
input_text = "..."
r = re.findall(r"(cluster number (\d+)\n\n(\d+)\n\n(\d+)\n\n(\d+))", input_text)
print(r)
This code returns the following list (the pattern expects a blank line between consecutive lines of the input):
[('cluster number 1\n\n1\n\n2\n\n3', '1', '1', '2', '3'),
('cluster number 2\n\n1\n\n2\n\n3', '2', '1', '2', '3')]
You can also see a detailed explanation here.
As recommended, you should use regular expressions. Perhaps the re.split function would be suitable here (x is the file contents as a string):
>>> l = re.split(r'cluster number (?:\d+)', x)[1:]
>>> [a.split() for a in l]
[['1', '2', '3'], ['1', '2', '3'], ...]
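If you want to keep the "cluster number ..." header line with each paragraph, as in the desired output, one option is to split right before each header using a zero-width lookahead. A minimal sketch (split_clusters is an illustrative name; splitting on an empty match requires Python 3.7 or newer):

```python
import re


def split_clusters(text):
    # (?m) makes ^ match at every line start; the lookahead consumes nothing,
    # so each "cluster number" line stays at the top of its own chunk.
    chunks = re.split(r"(?m)^(?=cluster number )", text)
    # drop the empty chunk before the first header and trailing newlines
    return [chunk.strip() for chunk in chunks if chunk.strip()]


sample = ("cluster number 1\n1\n2\n3\n"
          "cluster number 2\n1\n2\n3\n"
          "cluster number x\n1\n2\n3\n")
for paragraph in split_clusters(sample):
    print(paragraph)
    print()
```

Because it only looks for the header lines, it works whether or not the file has blank lines between the numbers.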

Python: split by (different) n spaces

I have lines like this:
2 20 164 "guid" Some name^7 0 ip.a.dd.res:port -21630 25000
6 30 139 "guid" Other name^7 0 ip.a.dd.res:port 932 25000
I would like to split these lines, but the problem is that there are different numbers of spaces between the "words"...
How can I do this?
Python's split function doesn't care about the number of spaces:
>>> ' 2 20 164 "guid" Some name^7 0 ip.a.dd.res:port -21630 25000'.split()
['2', '20', '164', '"guid"', 'Some', 'name^7', '0', 'ip.a.dd.res:port', '-21630', '25000']
Have you tried split()? It "compresses" runs of spaces, so after splitting you will get:
'2', '20', '164', '"guid"', etc.
>>> l = "1   2   4 'ds' 5  66"
>>> l
"1   2   4 'ds' 5  66"
>>> l.split(' ')
['1', '', '', '2', '', '', '4', "'ds'", '5', '', '66']
>>> [x for x in l.split()]
['1', '2', '4', "'ds'", '5', '66']
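Just use the split() method with no arguments. It treats any run of whitespace of any kind (spaces, tabs and so on, like the regex \s+) as a single delimiter, and it ignores leading and trailing whitespace.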
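One caveat with the sample lines: the name field ("Some name^7") itself contains a space, so a plain split() breaks it into two items. Assuming the name is the only field that can contain spaces, with exactly four fields before it and four after it, a sketch can split from both ends with maxsplit (parse_line is an illustrative name):

```python
def parse_line(line):
    # Split off the four leading fields; the fifth item is everything else.
    *head, rest = line.split(maxsplit=4)
    # Split the four trailing fields off from the right, leaving the name intact.
    return head + rest.rsplit(maxsplit=4)


sample = '2 20 164 "guid" Some name^7 0 ip.a.dd.res:port -21630 25000'
print(sample.split())      # the name is broken into 'Some' and 'name^7'
print(parse_line(sample))  # the name is kept as 'Some name^7'
```

Like split() with no arguments, both calls tolerate any number of spaces between fields.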
