python how to split text into new list - python

I have numerous lines of text I would like to put into a list:
123456 123456 123456 234567 234567 4567890
243564 194563 432423 764575 542354 6564536
I think you get the idea: space-separated values, and each value should be its own element. There are 73 values per line and something like 144 lines. I know how to split based on the column:
d = list(zip(*(e.split() for e in b)))
How do I split based on the row? I want d[0] = '123456,123456,123456,234567,234567,4567890'
not d[0] = '123456,243564'
The above line splits the list up the way I don't want it split up.
EXTRA: Let me add one more thing in.
The data in the list are decimal numbers. Is there a way, when I separate out the list, to also round the numbers?
f = np.round(float([e.split() for e in d]),2)
That only gives me the error 'float() argument must be a string or a number'

Remove the zip(); a list comprehension is enough here:
d = [e.split() for e in b]
If you need integers, you could use:
d = [[int(v) for v in e.split()] for e in b]
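On the EXTRA part of the question: float() accepts a single string or number, not a list, which is why that attempt fails. A minimal sketch of one way around it, converting and rounding each value inside the comprehension. The sample list b below is made up for illustration and stands in for your lines of text:

```python
# Hypothetical sample standing in for the asker's list of lines `b`.
b = ["1.2345 2.3456 3.4567", "4.5678 5.6789 6.7891"]

# Convert each token to float, then round it to 2 decimal places.
d = [[round(float(v), 2) for v in e.split()] for e in b]

print(d[0])
print(d[1])
```

Each d[i] is then a list of floats for row i, so the rounding happens value by value rather than on the whole list at once.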

If you're insistent on the commas:
with open('data.txt', 'r') as f:
    d = [",".join(var.rstrip().split()) for var in f.readlines()]
print(d[0])
print(d[1])
Output:
123456,123456,123456,234567,234567,4567890
243564,194563,432423,764575,542354,6564536

Related

How to extract specific data from a text file in python

I am trying to extract specific data out of a text file to use it in another function. I already looked this up and found something but it doesn't work although it seems like it should work. Is there anything I do wrong or is there a better way to do this? I am basically trying to extract the first column of data in the text file the "distances", without the km of course.
This is the text file:
Distances Times Dates Total distance & time
00 km 00:00:00 h 0000-00-00 00 km ; 00:00:00 h
28 km 01:30:21 h 2020-3-2 28 km ; 01:30:21 h
50 km 02:12:18 h 2020-4-8 78 km ;
This is the code:
all_distances = []
with open("Bike rides.txt", "r") as f:
    lines = f.readlines()
    for l in lines[1:]:
        all_distances.append(l.split()[0])
print(all_distances)
The error I get is this:
IndexError: list index out of range
Try this one:
#!/usr/bin/env python3
all_distances = []
for l in open("rides.txt").readlines()[1:]:
    l = l.split(" ")[0]
    all_distances.append(l)
print(all_distances)
Considering you have whitespace delimiters, you can separate the columns using the str.split() method. Below is an example of its application.
column = 0  # First column
with open("data.txt") as file:
    data = file.readlines()
columns = list(map(lambda x: x.strip().split()[column], data))
The error I get is this: "IndexError: list index out of range"
This suggests a problem with blank line(s): for such a line .split() gives an empty list. To ignore such lines, in place of:
all_distances.append(l.split()[0])
do:
splitted = l.split()
if splitted:
    all_distances.append(splitted[0])
Explanation: in Python empty lists are considered false-y and non-empty lists truth-y, so the code inside the if block will execute only if the list has at least one element.
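Putting that check together with the rest of the loop might look like the sketch below. The function name first_column_km is made up for illustration, and the sample lines stand in for what f.readlines() would return (with a blank line added to show the skip). It also assumes the first field of every data row is an integer:

```python
def first_column_km(lines):
    """Return the first column of each data row as an int, skipping blank lines."""
    distances = []
    for line in lines[1:]:      # lines[0] is the header row
        fields = line.split()
        if fields:              # a blank line splits into an empty list
            distances.append(int(fields[0]))
    return distances

# Stand-in for f.readlines(); the blank line is the kind that caused the IndexError.
sample = [
    "Distances Times Dates Total distance & time\n",
    "00 km 00:00:00 h 0000-00-00 00 km ; 00:00:00 h\n",
    "28 km 01:30:21 h 2020-3-2 28 km ; 01:30:21 h\n",
    "\n",
    "50 km 02:12:18 h 2020-4-8 78 km ;\n",
]
print(first_column_km(sample))
```

Because split() without arguments breaks on any whitespace, the "km" unit ends up in its own field and never reaches int().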

Get Number after a text in Python

I need some help with getting numbers after a certain text
For example, I have a list:
['Martin 9', ' Leo 2 10', ' Elisabeth 3']
Now I need to change the list into variables like this:
Martin = 9
Leo2 =10
Elisabeth = 3
I haven't tried many things, because I'm new to Python.
Thanks for reading my question and probably helping me
I suppose you got the list using Selenium. Getting the integer from each item works without using re, like this:
names = ["Martin 9", "Leo 10", "Elisabeth 3"]
for a in names:
    print(''.join(filter(str.isdigit, a)))
Output :
9
10
3
Let's assume you loaded it from a file and got it line by line:
numbers = []
for line in lines:
    line = line.split()
    if line[0] == "Martin":
        numbers.append(int(line[1]))
I don't think you really want a bunch of variables, especially if the list grows large. Instead I think you want a dictionary where the names are the key and the number is the value.
I would start by splitting your string into a list as described in this question/answer.
data = """Martin 9
Leo 10
Elisabeth 3"""
listData = data.splitlines()
Then I would turn that list into a dictionary as described in this question/answer.
myDictionary = {}
for listItem in listData:
    i = listItem.split(' ')
    myDictionary[i[0]] = int(i[1])
Then you can access the number, in Leo's case 10, via:
myDictionary["Leo"]
I have not tested this syntax and I'm not used to python, so I'm sure a little debugging will be involved. Let me know if I need to make some corrections.
I hope that helps :)
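The list in the question also contains ' Leo 2 10', where the name itself has a space in it. A small sketch of the same dictionary idea that copes with that, assuming the number is always the last whitespace-separated token. The variable name scores is just for illustration:

```python
items = ['Martin 9', ' Leo 2 10', ' Elisabeth 3']

scores = {}
for item in items:
    # Split once from the right: everything before the last token is the name.
    name, number = item.strip().rsplit(None, 1)
    scores[name] = int(number)

print(scores["Leo 2"])
```

Splitting from the right with a limit of one keeps "Leo 2" intact as the key instead of cutting it at the first space.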
s = """Leo 1 9
Leo 2 2"""
I shortened the list here.
Look for the word in the string:
if "Leo 1" in s:
Then get its index:
i = s.index("Leo 1")
Then get the number of characters in the word:
length = len("Leo 1")
And then add it (+1 because of the space):
b = i + length + 1
And then get the number at position b (this reads a single digit):
x = int(s[b])
So in total:
if "Leo 1" in s:
    i = s.index("Leo 1")
    length = len("Leo 1")
    b = i + length + 1
    x = int(s[b])
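Since s[b] is a single character, the index arithmetic only works for one-digit numbers. If the numbers can be longer, a regular expression is one possible alternative. This is only a sketch: the helper name number_after is made up, and it assumes the number directly follows the label after some whitespace:

```python
import re

s = """Leo 1 9
Leo 2 12"""

def number_after(label, text):
    """Return the integer that follows `label` in `text`, or None if absent."""
    match = re.search(re.escape(label) + r"\s+(\d+)", text)
    return int(match.group(1)) if match else None

print(number_after("Leo 1", s))
print(number_after("Leo 2", s))
```

re.escape keeps any special characters in the label from being read as regex syntax, and \d+ picks up all the digits instead of just the first one.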

tricky string parsing with python

I have a text file like this:
ID = 31
Ne = 5122
============
List of 104 four tuples:
1 2 12 40
2 3 4 21
.
.
51 21 41 42
ID = 34
Ne = 5122
============
List of 104 four tuples:
3 2 12 40
4 3 4 21
.
.
The four-tuples are tab delimited.
For each ID, I'm trying to make a dictionary with the ID being the key and the four-tuples (in list/tuple form) as elements of that key.
dict = {31: (1,2,12,40),(2,3,4,21)....., 32:(3,2,12,40), (4,3,4,21)..
My string parsing knowledge is limited to using a reference object for file.readlines(), and using str.replace() and str.split() on 'ID = '. But there has to be a better way. Here are the beginnings of what I have.
file = open('text.txt', 'r')
fp = file.readlines()
B = []
for x in fp:
    x.replace('\t',',')
    x.replace('\n',')')
    B.append(x)
Something like this:
ll = []
for line in fp:
    tt = tuple(int(x) for x in line.split())
    ll.append(tt)
that will produce a list of tuples to assign to the key for your dictionary
Python's great for this stuff, why not write up a 5-10 liner for it? It's kind of what the language is meant to excel at.
$ cat test
ID = 31
Ne = 5122
============
List of 104 four tuples:
1 2 12 40
2 3 4 21
ID = 34
Ne = 5122
============
List of 104 four tuples:
3 2 12 40
4 3 4 21
data = {}
for block in open('test').read().split('ID = '):
    if not block.strip():
        continue  # nothing before the first 'ID = '
    lines = block.split('\n')
    ID = int(lines[0])
    # lines[1:4] are the 'Ne', '====' and 'List of ...' header lines
    tups = [[int(x) for x in line.split()] for line in lines[4:]]
    data[ID] = tuple(t for t in tups if t)  # drop empty lines
print(data)
# {31: ([1, 2, 12, 40], [2, 3, 4, 21]), 34: ([3, 2, 12, 40], [4, 3, 4, 21])}
The only annoying part is filtering out the empty entries, which come from extra newlines and the like. For a one-off little script, it's no biggie.
I think this will do the trick for you:
def parse_file(filename):
    """
    Parses an input data file containing tags of the form "ID = ##" (where ## is a
    number) followed by rows of data. Returns a dictionary where the ID numbers
    are the keys and all of the rows of data are stored as a list of tuples
    associated with the key.
    Args:
        filename (string) name of the file you want to parse
    Returns:
        my_dict (dictionary) dictionary of data with ID numbers as keys
    """
    my_dict = {}
    with open(filename, "r") as my_file:  # handles opening and closing file
        rows = my_file.readlines()
        for row in rows:
            if "ID = " in row:
                my_key = int(row.split("ID = ")[1])  # grab the ID number
                my_list = []  # initialize a new data list for a new ID
            elif row != "\n":  # skip rows that only have newline char
                try:  # if this fails, we don't have a valid data line
                    my_list.append(tuple([int(x) for x in row.split()]))
                except ValueError:
                    my_dict[my_key] = my_list  # stores the data list
                    continue  # repeat until done with file
    return my_dict
I made it a function so that you can call it from anywhere, just passing the filename. It makes assumptions about the file format, but if the file format is always what you showed us here, it should work for you. You would call it on your data.txt file like:
a_dictionary = parse_file("data.txt")
I tested it on the data that you gave us and it seems to work just fine after deleting the "..." rows.
Edit: I noticed one small bug. As written, it will add an empty tuple in place of a new line character ("\n") wherever that appears alone on a line. To fix this, put the try: and except: clauses inside of this:
elif row != "\n": # skips rows that only contain newline char
I added this to the full code above as well.
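For comparison, here is a sketch of the same idea that checks each row explicitly instead of relying on try/except. The name parse_blocks is made up, it takes an iterable of lines rather than a filename so it is easy to try out, and it assumes the data values are non-negative integers:

```python
def parse_blocks(lines):
    """Map each ID to the list of integer tuples that follow it."""
    result = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("ID = "):
            current = int(line[len("ID = "):])
            result[current] = []
        elif current is not None:
            fields = line.split()
            # Data rows consist solely of digit tokens; header and "." rows do not.
            if fields and all(f.isdigit() for f in fields):
                result[current].append(tuple(int(f) for f in fields))
    return result

sample = """ID = 31
Ne = 5122
============
List of 104 four tuples:
1 2 12 40
2 3 4 21
.
ID = 34
Ne = 5122
============
List of 104 four tuples:
3 2 12 40
4 3 4 21
""".splitlines()

print(parse_blocks(sample))
```

With a real file you could pass the open file object directly, since iterating over it yields lines.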

python regex to construct a structured data structure

I have some data which looks like:
key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4
...
...
Based on it, what I want is to construct a data structure like:
{'abc':[1,2,3]}
{'bcd':[2,3,4]}
...
Is regular expression a good choice to do that? If so, how to write the regular expression so that the process behaves like a for loop (inside the loop, I can do some job to construct a data structure with the data I got) ?
Thanks.
Using a regular expression can be more robust than using string slicing to identify values in a text file. If you have confidence in the format of your data, string slicing will be fine.
import re
keyPat = re.compile(r'key (\w+) key')
valuePat = re.compile(r'value (\d+)')
result = {}
for line in open('data.txt'):
    if keyPat.search(line):
        match = keyPat.search(line).group(1)
        tempL = []
        result[match] = tempL
    elif valuePat.search(line):
        match = valuePat.search(line).group(1)
        tempL.append(int(match))
    else:
        print('Did not match:', line)
print(result)
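If you do trust the format, the plain string approach could look roughly like this. It is only a sketch: the name parse_pairs is made up, it takes a list of lines so it can be tried without a file, and it assumes every value is an integer:

```python
def parse_pairs(lines):
    """Build {key: [values]} from 'key NAME key' / 'value N' lines without regex."""
    result = {}
    current = None
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "key" and parts[2] == "key":
            # Start (or reuse) the list for this key.
            current = result.setdefault(parts[1], [])
        elif len(parts) == 2 and parts[0] == "value" and current is not None:
            current.append(int(parts[1]))
    return result

sample = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4""".splitlines()

print(parse_pairs(sample))
```

Lines that fit neither shape are silently skipped here, whereas the regex version above reports them.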
import re
x = """key abc key
value 1
value 2
value 3
key bcd key
value 2
value 3
value 4"""
j = re.findall(r"key (.*?) key\n([\s\S]*?)(?=\nkey|$)", x)
d = {}
for i in j:
    k = list(map(int, re.findall(r"value (.*?)(?=\nvalue|$)", i[1])))
    d[i[0]] = k
print(d)
The following code should work if the data is always in that format.
import re
with open(FILENAME, "r") as f:
    text = f.read()
regex = r'key ([^\s]*) key\nvalue (\d+)\nvalue (\d+)\nvalue (\d+)'
matches = re.findall(regex, text)
dic = {}
for match in matches:
    dic[match[0]] = list(map(int, match[1:]))
print(dic)
EDIT: The other answer by meelo is more robust as it handles cases where values might be more or less than 3.

Splitting a list and then splitting the last element of that list

For my assignment, I have to split a list twice:
I need to split the address string from the input line using '+', and then split the last part of the resulting list at the ','
in_file = open('yelp-short.txt')
def parse_line(text_file):
    a = text_file.strip('\n')
    b = a.split('+')
    c = b.split(',')
    print(c)
I get the error: 'list' object has no attribute 'split'
What other methods could I use to do this?
The hint is that you split the last part of the resulting list.
Therefore, you want to pull out the last part and split it:
def parse_line(line):
    line = line.strip('\n')
    parts = line.split('+')
    addrs = parts[-1].split(',')
    return addrs
I would rpartition:
>>> 'a+b+c,d,e'.rpartition('+')[-1].split(',')
['c', 'd', 'e']
The problem is that you are trying to split up a list, not a string. You need to get a particular item out of that list:
b = a.split('+')
c = b[-1].split(',')
You apply split to strings, and it returns a list. Thus, a is a string and b is a list. You can't split a list. Let's say a is "X+Y,Z". Then b will be the list ["X", "Y,Z"]. What you want to split is the element at index 1 (the 2nd element) of the list b: b[1].split(','). This way there is no error. You can also say "last" by writing b[-1]. Here it is the same element.
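If you need both the leading fields and the split-up last part, a small sketch along these lines may help. The return shape, a pair of lists, is just one possible choice and not something the assignment specifies:

```python
def parse_line(line):
    """Split on '+', then split only the last part on ','."""
    parts = line.rstrip('\n').split('+')
    return parts[:-1], parts[-1].split(',')

head, addrs = parse_line('a+b+c,d,e\n')
print(head)
print(addrs)
```

Using b[-1] rather than b[1] keeps this working when a line has more than one '+' in it.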
