How to extract certain elements from a string? - python

I have a lot of files and I have saved all filenames to filelists.txt. Here is an example file:
cpu_H1_M1_S1.out
cpu_H1_M1_S2.out
cpu_H2_M1_S1.out
cpu_H2_M1_S2.out
When the program detects _H, _M, _S in the file name, I need to output the numbers that appear afterwards. For example:
_H _M _S
1 1 1
1 1 2
2 1 1
2 1 2
Thank you.

You could use a regexp:
>>> import re
>>> s = 'cpu_H2_M1_S2.out'
>>> re.findall(r'cpu_H(\d+)_M(\d+)_S(\d+)', s)
[('2', '1', '2')]
If the name doesn't match the format exactly, you get an empty list, which you can use to skip that file. findall returns a list of tuples of strings, so to get ints you need to convert each group:
[[int(i) for i in groups] for groups in re.findall(...)]
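Putting the pieces together, here is a minimal sketch (written for Python 3) that applies the pattern to each name and prints the table from the question. The function name `extract_hms` and the inline sample list are illustrative assumptions; in practice the names would come from filelists.txt.

```python
import re

# Same pattern idea as above, without the fixed 'cpu' prefix.
PATTERN = re.compile(r"_H(\d+)_M(\d+)_S(\d+)")

def extract_hms(name):
    """Return (H, M, S) as ints, or None if the name does not match."""
    match = PATTERN.search(name)
    if match is None:
        return None
    return tuple(int(g) for g in match.groups())

names = ["cpu_H1_M1_S1.out", "cpu_H2_M1_S2.out", "notes.txt"]
print("_H _M _S")
for name in names:
    hms = extract_hms(name)
    if hms is not None:          # skip names that do not match
        print(*hms)
```

Returning None for non-matching names keeps the "ignore it" decision in the caller, which seems easier to adapt than an empty list.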

Something like this, using regex:
In [13]: with open("filelists.txt") as f:
   ....:     for line in f:
   ....:         data = re.findall(r"_H\d+_M\d+_S\d+", line)
   ....:         if data:
   ....:             print [x.strip("HMS") for x in data[0].split("_")[1:]]
   ....:
['1', '1', '1']
['1', '1', '2']
['2', '1', '1']
['2', '1', '2']

Though I have nothing against regex itself, I think it's overkill for this problem. Here's a lighter solution, which relies on every number being a single digit at a fixed position in the name:
import operator

# In 'cpu_H1_M1_S1.out' the digits sit at indices 5, 8 and 11.
digits = operator.itemgetter(5, 8, 11)
with open("filelists.txt") as f:
    result = [tuple(int(c) for c in digits(line)) for line in f]
Hope that helps
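Fixed character positions only work while every number has exactly one digit. A sketch that avoids both regex and fixed offsets, assuming names always look like prefix_H<n>_M<n>_S<n>.ext (the helper name `parse_name` is hypothetical):

```python
def parse_name(name):
    """Return (H, M, S) as ints from a name like 'cpu_H12_M1_S3.out'."""
    stem = name.strip().rsplit(".", 1)[0]      # drop the extension
    parts = stem.split("_")[1:]                # ['H12', 'M1', 'S3']
    return tuple(int(p[1:]) for p in parts)    # strip the leading letter

print(parse_name("cpu_H12_M1_S3.out"))
```

Unlike the regex version, this raises an error on names that do not follow the pattern, so it suits lists that are known to be clean.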


Python: Inserting into a list using length as index

All,
I've recently picked up Python and am currently in the process of dealing with lists. I'm using a test file containing several lines of characters separated by tabs and passing this into my python program.
The aim of my python script is to insert each line into a list using the length as the index which means that the list would be automatically sorted. I am considering the most basic case and am not concerned about any complex cases.
My python code below;
newList = []
for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    newList.insert(size, data)
for i in range(len(newList)):
    print ( newList[i])
My 'test' file below;
2 2 2 2
1
3 2
2 3 3 3 3
3 3 3
My expectation of the output of the python script is to print the contents of the list in the following order sorted by length;
['1']
['3', '2']
['3', '3', '3']
['2', '2', '2', '2']
['2', '3', '3', '3', '3']
However, when I pass in my test file to my python script, I get the following;
cat test | ./listSort.py
['2', '2', '2', '2']
['1']
['3', '2']
['3', '3', '3']
['2', '3', '3', '3', '3']
The first line of the output ['2', '2', '2', '2'] is incorrect. I'm trying to figure out why it isn't being printed at the 4th line (because of length 4 which would mean that it would have been inserted into the 4th index of the list). Could someone please provide some insight into why this is? My understanding is that I am inserting each 'data' into the list using 'size' as the index which means when I print out the contents of the list, they would be printed in sorted order.
Thanks in advance!
Inserting into lists works quite differently from what you think:
>>> newList = []
>>> newList.insert(4, 4)
>>> newList
[4]
>>> newList.insert(1, 1)
>>> newList
[4, 1]
>>> newList.insert(2, 2)
>>> newList
[4, 1, 2]
>>> newList.insert(5, 5)
>>> newList
[4, 1, 2, 5]
>>> newList.insert(3, 3)
>>> newList
[4, 1, 2, 3, 5]
>>> newList.insert(0, 0)
>>> newList
[0, 4, 1, 2, 3, 5]
Hopefully you can see two things from this example:
The list indices are 0-based. That is to say, the first entry has index 0, the second has index 1, etc.
list.insert(idx, val) inserts things into the position which currently has index idx, and bumps everything after that down a position. If idx is larger than the current length of the list, the new item is silently added in the last position.
There are several ways to implement the functionality you want:
If you can predict the number of lines, you can allocate the list beforehand, and simply assign to the elements of the list instead of inserting:
newList = [None] * 5
for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    newList[size - 1] = data
for i in range(len(newList)):
    print ( newList[i])
If you can predict a reasonable upper bound of the number of lines, you can also do this, but you need to have some way to remove the None entries afterwards.
Use a dictionary, and print its entries in key order (the keys need not be contiguous):
newList = {}
for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    newList[size - 1] = data
for key in sorted(newList):
    print ( newList[key])
Add elements to the list as necessary, which is probably a little bit more involved:
newList = []
for line in sys.stdin:
    data = line.strip().split('\t')
    size = len(data)
    if len(newList) < size:
        newList.extend([None] * (size - len(newList)))
    newList[size - 1] = data
for i in range(len(newList)):
    print ( newList[i])
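For the preallocated-list variant, the unused None slots can be dropped with a filter before printing. A minimal sketch, assuming an upper bound of 8 fields per line and using an inline sample in place of sys.stdin (both are illustrative choices):

```python
lines = ["2\t2\t2\t2\n", "1\n", "3\t2\n"]   # stand-in for sys.stdin

max_len = 8                                  # assumed upper bound on fields per line
slots = [None] * max_len
for line in lines:
    data = line.strip().split("\t")
    slots[len(data) - 1] = data              # lines of equal length overwrite each other

rows = [row for row in slots if row is not None]
for row in rows:
    print(row)
```

Note that all of the index-based approaches keep only one line per length; if two lines have the same number of fields, the later one wins.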
I believe I've figured out the answer to my question, thanks to mkrieger1. I append to the list and then sort it using the length as the key;
newList = []
for line in sys.stdin:
    data = line.strip().split('\t')
    newList.append(data)
newList.sort(key=len)
for i in range(len(newList)):
    print (newList[i])
I got the output I wanted;
./listSort.py < test
['1']
['3', '2']
['3', '3', '3']
['2', '2', '2', '2']
['2', '3', '3', '3', '3']
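The same append-then-sort idea can also be written with sorted(), which returns a new list instead of sorting in place. Python's sort is stable, so lines with the same number of fields keep their input order. A sketch with a hypothetical helper name `rows_by_length` and an inline sample in place of the test file:

```python
def rows_by_length(lines):
    """Split each tab-separated line and return the rows ordered by field count."""
    rows = [line.strip().split("\t") for line in lines]
    return sorted(rows, key=len)   # stable: ties keep their original order

sample = ["2\t2\t2\t2\n", "1\n", "3\t2\n", "9\t9\n"]
for row in rows_by_length(sample):
    print(row)
```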

Parse every column of a .csv file into a single list python

I have a following type of csv
a,b,c
1,2,3
4,5,6
7,8,9
I would like to parse every column of this csv file into a single list, without the column headers, so the end result would be
myList = ["1","4","7","2","5","8","3","6","9"]
I have found many solutions for one column, but I need to be flexible enough to read every column of the file. I'm using an older version of python, so I can't use any solutions with the pandas library.
You could read the file fully and then zip the rows to transpose them, then chain the result to flatten the list. Standalone example (using a list of strings as input):
import csv,itertools
text="""a,b,c
1,2,3
4,5,6
7,8,9
""".splitlines()
myList = list(itertools.chain.from_iterable(zip(*csv.reader(text[1:]))))
print(myList)
result:
['1', '4', '7', '2', '5', '8', '3', '6', '9']
from a file it would read:
with open("test.csv") as f:
    cr = csv.reader(f, delimiter=",") # comma is the default, but just in case...
    next(cr) # skip title
    myList = list(itertools.chain.from_iterable(zip(*cr)))
Simple approach:
d = """a,b,c
1,2,3
4,5,6
7,8,9
"""
cells = []
for line in d.split("\n"):
    if line:
        cells.append(line.strip().split(','))
print(cells)
for n in range(len(cells[0])):
    for r in cells:
        print(r[n])
Same iteration, but as generator:
def mix(t):
    for n in range(len(t[0])):
        for r in t:
            yield r[n]
print( list( mix(cells) ) )
Using csv and chain to flatten the list
import csv
from itertools import chain
l = list(csv.reader(open('text.csv', 'r')))
mylist = map(list, zip(*l[1:])) # transpose list
list(chain.from_iterable(mylist)) # output ['1', '4', '7', '2', '5', '8', '3', '6', '9']
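The transpose-and-flatten step shared by these answers can be checked without a file on disk. A minimal sketch, using io.StringIO as a stand-in for the opened file under Python 3 (the helper name `columns_flat` is an assumption for illustration):

```python
import csv
import io
from itertools import chain

def columns_flat(fileobj):
    """Read CSV rows, skip the header, and return the cells column by column."""
    reader = csv.reader(fileobj)
    next(reader)                                    # skip the header row
    return list(chain.from_iterable(zip(*reader)))  # transpose, then flatten

sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")
print(columns_flat(sample))
```

One caveat: zip stops at the shortest row, so a row with missing cells would silently truncate every column.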

How to make a 3D list from a txt file? python

I've been stuck on this for almost 3 days now, and I have tried so many different ways, but none of them worked!
The txt file (num.txt) looks like:
1234
4321
3214
3321
4421
2341
How can I put this file into a 3D list made up of 2 rows and 3 columns?
The output I am trying to achieve is:
[ [['1','2','3','4'], ['4','3','2','1'], ['3','2','1','4']], [['3','3','2','1'], ['4','4','2','1'], ['2','3','4','1']] ]
(I've spaced it out a little more in an attempt to make it easier to see!)
I thought it would be similar to making a 2D list, but nothing I tried worked! Can anyone please help?
Thank you!
Here's a very simple way to do it with some easy arithmetic:
with open('num.txt') as infile: # open file
    answer = []
    for i,line in enumerate(infile): # get the line number (starting at 0) and the actual line
        if not i%3: answer.append([])
        answer[-1].append(list(line.strip()))
You have to open the file and convert each line's string to a list of characters, like:
my_list = []
sublist_size = 3
with open('/path/to/num.txt') as f:
    file_lines = list(f)
for i in range(0, len(file_lines), sublist_size):
    # rstrip() removes the trailing `\n` from each line
    my_list.append([list(line.rstrip()) for line in file_lines[i:i+sublist_size]])
Here my_list will hold the value you desire:
[[['1','2','3','4'], ['4','3','2','1'], ['3','2','1','4']],
 [['3','3','2','1'], ['4','4','2','1'], ['2','3','4','1']]]
The solution using range() function and simple list comprehension:
with open('./text_files/num.txt', 'r') as fh: # change to your current file path
    l = [list(l.strip()) for l in fh]
n = 3 # chunk size
result = [l[i:i + n] for i in range(0, len(l), n)] # splitting into chunks of size 3
print(result)
The output:
[[['1', '2', '3', '4'], ['4', '3', '2', '1'], ['3', '2', '1', '4']], [['3', '3', '2', '1'], ['4', '4', '2', '1'], ['2', '3', '4', '1']]]
Another option, which I think is clearer and does not require loading the entire file into memory:
inner_size = 3
inner_range = range(inner_size) # precompute this since we'll be using it a lot
with open('/home/user/nums.txt') as f:
    result = []
    try:
        while True:
            subarr = []
            for _ in inner_range:
                subarr.append(list(next(f).rstrip()))
            result.append(subarr)
    except StopIteration:
        pass
Using the file object's built-in iterator, we build sub-arrays and append them to the result, and use the StopIteration exception to know we're finished, discarding any extra data. You could easily add if subarr: result.append(subarr) to the except block if you want to keep a partial subarr at the end.
Written as a list comprehension (although without the ability to recover any final, partial sublist):
inner_size = 3
inner_range = range(inner_size)
with open('/home/user/nums.txt') as f:
    result = []
    try:
        while True:
            result.append([list(next(f).rstrip()) for _ in inner_range])
    except StopIteration:
        pass
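A variant of the same streaming idea, sketched with itertools.islice so that a final partial group is kept rather than discarded. The function name `chunked_rows` and the inline sample are assumptions for illustration; any iterable of lines, including an open file, should work the same way.

```python
from itertools import islice

def chunked_rows(lines, size):
    """Group lines into lists of `size`, each line split into characters."""
    it = iter(lines)
    result = []
    while True:
        chunk = [list(line.rstrip()) for line in islice(it, size)]
        if not chunk:            # iterator exhausted
            return result
        result.append(chunk)

sample = ["1234\n", "4321\n", "3214\n", "3321\n", "4421\n", "2341\n"]
print(chunked_rows(sample, 3))
```

Because islice simply returns fewer items at the end of the input, no exception handling is needed to detect the last group.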

Adding dictionary keys and values after line split?

If I have for instance the file:
;;;
;;;
;;;
A 1 2 3
B 2 3 4
C 3 4 5
And I want to read it into a dictionary of {str: list of str} :
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}
I have the following code:
d = {}
with open('file_name') as f:
    for line in f:
        while ';;;' not in line:
            (key, val) = line.split(' ')
            #missingcodehere
return d
What should I put in after the line.split to assign the keys and values as a str and list of str?
To focus on your code and what you are doing wrong:
You are pretty much in an infinite loop with your while ';;;' not in line, because line never changes inside the loop. So you want to change the logic for inserting data into your dictionary. Simply use a conditional statement to check whether ';;;' is in your line.
Then split the line only once, with line.strip().split(' ', 1), so that you get exactly two parts: the key and the rest of the line. A plain split(' ') returns four items for 'A 1 2 3', and unpacking them into key, val fails. Assign to your dictionary as d[key] = val. However, you want a list, and val is a string at this point, so call split on val as well.
Furthermore, you do not need parentheses around key and val. They add unneeded noise to your code.
The end result will give you:
d = {}
with open('new_file.txt') as f:
    for line in f:
        if ';;;' not in line:
            key, val = line.strip().split(' ', 1)
            d[key] = val.split()
print(d)
Using your sample input, output is:
{'C': ['3', '4', '5'], 'A': ['1', '2', '3'], 'B': ['2', '3', '4']}
Finally, the implementation can be made more Pythonic. We can simplify this code and split more generically, rather than counting explicit spaces:
with open('new_file.txt') as fin:
    valid = (line.split(None, 1) for line in fin if ';;;' not in line)
    d = {k:v.split() for k, v in valid}
Above, you will notice our split looks like this: split(None, 1), where we are providing maxsplit=1.
Per the docstring of split, it explains it pretty well:
Return a list of the words in S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
Finally, we simply use a dictionary comprehension to obtain our final result.
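To see what maxsplit changes, it may help to compare the two calls on one sample line. A short sketch (the sample string is illustrative, not taken from the file):

```python
line = "A 1  2 3\n"                 # note the double space and trailing newline

print(line.split(" "))              # ['A', '1', '', '2', '3\n']
key, rest = line.split(None, 1)     # split once, on any run of whitespace
print(key, rest.split())            # A ['1', '2', '3']
```

Splitting on an explicit ' ' keeps empty strings and the newline, while split(None, 1) yields exactly two parts that unpack cleanly into a key and the rest of the line.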
Why not simply:
def make_dict(f_name):
    with open(f_name) as f:
        d = {k: v.split()
             for k, v in [line.strip().split(' ', 1)
                          for line in f
                          if ';;;' not in line]}
    return d
Then
>>> print(make_dict('file_name'))
{'A': ['1', '2', '3'], 'B': ['2', '3', '4'], 'C': ['3', '4', '5']}

Python: Very Basic, Can't figure out why it is not splitting into the larger number listed but rather into individual integers

Really quick question here, some other people helped me on another problem but I can't get any of their code to work because I don't understand something very fundamental here.
8000.5 16745 0.1257
8001.0 16745 0.1242
8001.5 16745 0.1565
8002.0 16745 0.1595
8002.5 16745 0.1093
8003.0 16745 0.1644
I have a data file as such, and when I type
f1 = open(sys.argv[1], 'rt')
for line in f1:
    fields = line.split()
    print list(fields [0])
I get the output
['1', '6', '8', '2', '5', '.', '5']
['1', '6', '8', '2', '6', '.', '0']
['1', '6', '8', '2', '6', '.', '5']
['1', '6', '8', '2', '7', '.', '0']
['1', '6', '8', '2', '7', '.', '5']
['1', '6', '8', '2', '8', '.', '0']
['1', '6', '8', '2', '8', '.', '5']
['1', '6', '8', '2', '9', '.', '0']
Whereas I would have expected from trialling stuff like print list(fields) to get something like
[16825.5, 162826.0 ....]
What obvious thing am I missing here?
thanks!
Remove the list; .split() already returns a list.
You are turning the first element of the fields into a list:
>>> fields = ['8000.5', '16745', '0.1257']
>>> fields[0]
'8000.5'
>>> list(fields[0])
['8', '0', '0', '0', '.', '5']
If you want to have the first column as a list, you can build a list as you go:
myfirstcolumn = []
for line in f1:
    fields = line.split()
    myfirstcolumn.append(fields[0])
This can be simplified into a list comprehension:
myfirstcolumn = [line.split()[0] for line in f1]
The last command is the problem.
print list(fields[0]) takes the zero'th item from your split list, then takes it and converts it into a list.
Since you have a list of strings already ['8000.5','16745','0.1257'], the zero'th item is a string, which converts into a list of individual elements when list() is applied to it.
Your first problem is that you apply list to a string:
list("123") == ["1", "2", "3"]
Secondly, you print once per line in the file, but it seems you want to collect the first item of each line and print them all at once.
Third, in Python 2, there's no 't' mode in the call to open (text mode is the default).
I think what you want is:
with open(sys.argv[1], 'r') as f:
    print [ line.split()[0] for line in f ]
The problem was you were converting the first field which you correctly extracted into a list.
Here's a solution to print the first column:
with open(sys.argv[1]) as f1:
    first_col = []
    for line in f1:
        fields = line.split()
        first_col.append(fields[0])
print first_col
gives:
['8000.5', '8001.0', '8001.5', '8002.0', '8002.5', '8003.0']
Rather than doing f1 = open(sys.argv[1], 'rt') consider using with which will close the file when you are done or in case of an exception. Also, I left off rt since open() defaults to read and text mode.
Finally, this could also be written using list comprehension:
with open(sys.argv[1]) as f1:
    first_col = [line.split()[0] for line in f1]
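If numbers rather than strings are wanted, as the expected output in the question suggests, each field can be passed through float(). A minimal sketch with an inline sample standing in for the opened file:

```python
lines = ["8000.5 16745 0.1257\n", "8001.0 16745 0.1242\n", "\n"]  # stand-in for the file

# Skip blank lines so that line.split()[0] cannot raise IndexError.
first_col = [float(line.split()[0]) for line in lines if line.strip()]
print(first_col)   # [8000.5, 8001.0]
```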
Others have already done a great job answering this question. The behavior that you're seeing is because you're using list on a string. list will take any object that you can iterate over and turn it into a list, one element at a time. This isn't really surprising, except that the object doesn't even have to have an __iter__ method (which is the case with strings in Python 2). There are a number of posts on SO about __iter__, so I won't focus on that part.
In any event, try the following code and see what it prints out:
>>> def enlighten_me(obj):
...     print (list(obj))
...     print (hasattr(obj, '__iter__'))
...
>>> enlighten_me("Hello World")
>>> enlighten_me( (1,2,3,4) )
>>> enlighten_me( {'red':'wagon',1:5} )
Of course, you can try the example with sets, lists, generators ... Anything you can iterate over.
Levon posted a nice answer about how to create a column while reading your file. I will demonstrate the same thing using the built-in zip function.
rows=[]
for row in myfile:
    rows.append(row.split())
#now rows is stored as [ [col1,col2,...] , [col1,col2,...], ... ]
At this point we could get the first column by (Levon's answer):
column1=[]
for row in rows:
    column1.append(row[0])
or more succinctly:
column1=[row[0] for row in rows] #<-- This is called a list comprehension
But what if you want all the columns? (and what if you don't know how many columns there are?). This is a job for zip.
zip takes iterables as input and matches them up. In other words:
zip(iter1,iter2)
will take iter1[0] and match it with iter2[0], and match iter1[1] with iter2[1] and so on -- kind of like a zipper if you think about it. But, zip can take more than just 2 arguments ...
zip(iter1,iter2,iter3) #results in [ (iter1[0],iter2[0],iter3[0]) , (iter1[1],iter2[1],iter3[1]), ... ]
Now, the last piece of the puzzle that we need is argument unpacking with the star operator.
If I have a function:
def foo(a,b,c):
    print a
    print b
    print c
I can call that function like this:
A=[1,2,3]
foo(A[0],A[1],A[2])
Or, I can call it like this:
foo(*A)
Hopefully this makes sense -- the star takes each element in the list and "unpacks" it before passing it to foo.
So, putting the pieces together (remember back to the list of rows), we can unpack the list of rows and pass it to zip which will match corresponding indices in each row (i.e. columns).
columns=zip(*rows)
Now to get the first column, we just do:
columns[0] #first column
For lists of lists, I like to think of zip(*list_of_lists) as a sort of poor-man's transpose.
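The snippets above are Python 2, where zip returns a list. In Python 3 zip returns an iterator, so it needs wrapping in list() before it can be indexed. A sketch of the same transpose under that assumption, with sample rows taken from the question's data:

```python
rows = [
    ["8000.5", "16745", "0.1257"],
    ["8001.0", "16745", "0.1242"],
    ["8001.5", "16745", "0.1565"],
]

columns = list(zip(*rows))   # each column comes back as a tuple
print(columns[0])            # ('8000.5', '8001.0', '8001.5')
```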
Hopefully this has been helpful.
