I am trying to read a txt file that contains
1,2 20000
and potentially read other text files of the same format, only with more numbers, like:
1,2,3 30000
or
2,3,4,5 2000000.
data = []
with open('coordinate.txt', 'r') as file:
    for line in file:
        line = line.strip()
        pieces = line.split()
        data.append(pieces)
and then assign coord = data[0] and trial = data[1]
but coord becomes ["1,2"], and I have no clue how to separate the 1 and 2 by getting rid of the comma and turning them into NumPy form. How can I read the file and assign the values in the format I want?
You can split on multiple delimiters by using a regular expression. Something like:
import re
txt = "1,2 20000"
txt_arr = re.split(r'[,\s]\s*', txt)
More info in the documentation for re.split.
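To get all the way to the NumPy form the question asks for (this part is my own sketch, not from the original answer), you can convert the split pieces to int and build an array:

```python
import re

import numpy as np

txt = "1,2 20000"
pieces = re.split(r'[,\s]\s*', txt)           # ['1', '2', '20000']
numbers = np.array([int(p) for p in pieces])  # integer array
coord, trial = numbers[:-1], numbers[-1]
print(coord)  # → [1 2]
print(trial)  # → 20000
```

This splits on either a comma or whitespace in one pass, so it handles longer lines like "2,3,4,5 2000000" the same way.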
The easier way to do it is by using the split method:
pieces = []
with open('coordinate.txt', 'r') as fp:
    lines = fp.readlines()  # Get lines

for line in lines:
    splitted_line = line.rstrip().split(',')  # Split the line with "," as a delimiter
    pieces.append({
        'coord': int(splitted_line[0]),
        'trial': splitted_line[1:]
    })  # Append the split pieces

print(pieces)
You may need to create a loop for trial too (if you want to convert it too).
Is it what you were looking for?
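For example (my own sketch, assuming pieces has the shape built above, where a 'trial' entry like '2 20000' may still contain a space), such a conversion loop could look like:

```python
pieces = [{'coord': 1, 'trial': ['2 20000']}]

# Convert every 'trial' entry to int; an entry like '2 20000'
# still contains a space, so split it into words first.
for rec in pieces:
    rec['trial'] = [int(n) for entry in rec['trial'] for n in entry.split()]

print(pieces)  # → [{'coord': 1, 'trial': [2, 20000]}]
```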
I have many keywords in a txt file that I read into Python using f = open().
I want to add text before each keyword, for example:
(http://www.google.com/) + (abcdefg)
so that the text is added before each imported keyword.
I have tried, but I can't get the result I want.
f = open("C:/abc/abc.txt", 'r')
data = f.read()
print("http://www.google.com/" + data)
f.close()
I also tried it using a "for" loop, but I couldn't make it work.
Please let me know the solution.
Many thanks.
Your original code has some flaws:
data = f.read() reads the entire file into a single string, not line by line; if you want to process the file's lines one at a time, iterate over the file with a for loop;
data is a str-type variable which may contain more than one word per line. Thus, you must split each line into words, using data.split()
To solve your problem, you need to read each line from the file, split the line into the words it has, then loop through the list with the words, add the desired text then the word itself.
The correct program is this:
f = open("C:/abc/abc.txt", 'r')
for data in f:
    words = data.split()
    for i in words:
        print("http://www.google.com/" + i)
f.close()
with open('text.txt', 'r') as f:
    for line in f:
        print("http://www.google.com/" + line.strip())
I need to load text from a file which contains several lines, each line containing letters separated by commas, into a 2-dimensional list. When I run this, I get a 2-dimensional list, but the nested lists contain single strings instead of separate values, and I cannot iterate over them. How do I solve this?
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split()
            matrix.append(line)
    return matrix
result:
[['a,p,p,l,e'], ['a,g,o,d,o'], ['n,n,e,r,t'], ['g,a,T,A,C'], ['m,i,c,s,r'], ['P,o,P,o,P']]
I need each letter in the nested lists to be a single string so I can use them.
thanks in advance
The split() function splits on whitespace by default. You can fix this by passing it the string you want to split on; in this case, that would be a comma. The code below should work.
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            matrix.append(line)
    return matrix
The input format you described conforms to CSV format. Python has a library just for reading CSV files. If you just want to get the job done, you can use this library to do the work for you. Here's an example:
Input(test.csv):
a,string,here
more,strings,here
Code:
>>> import csv
>>> lines = []
>>> with open('test.csv') as file:
...     reader = csv.reader(file)
...     for row in reader:
...         lines.append(row)
...
>>>
Output:
>>> lines
[['a', 'string', 'here'], ['more', 'strings', 'here']]
Using the strip() function will get rid of the new line character as well:
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            line[-1] = line[-1].strip()
            matrix.append(line)
    return matrix
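For instance (a quick self-contained check of my own, writing a couple of the sample lines from the question to a file first):

```python
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            line[-1] = line[-1].strip()  # drop the trailing newline
            matrix.append(line)
    return matrix

# Write two of the sample rows from the question to a file
with open('letters.txt', 'w') as f:
    f.write('a,p,p,l,e\na,g,o,d,o\n')

print(read_matrix_file('letters.txt'))
# → [['a', 'p', 'p', 'l', 'e'], ['a', 'g', 'o', 'd', 'o']]
```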
I have a very long string with vertical and horizontal delimiters in this format:
[|Bob Hunter|555-5555|B|Polycity|AK|55555||#|Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter@email.com|#|....and so on...]
I would like to generate a list from this long string using split('#') and then write each element as a line to a new text file like so:
|Bob Hunter|555-5555|B|Polycity|AK|55555||
|Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter@email.com|
I will then import it into excel and delimit by the pipes.
f1 = open(r'C:\Documents\MyData.html', 'r')
f2 = open(r'C:\Documents\MyData_formatted.txt', 'w')
lines = f1.read().split("#")
for i in lines:
    f2.writelines(i)
f2.close()
f1.close()
However, the txt file remains one line and only a partial amount of the data is written to the file (only about 25% is there). How can I get python to split the data by the # symbol and write each element of the resulting list to a file as a new line?
This is your corrected code. I changed the lines variable to records, because we're not dealing with lines, and just to avoid confusion:
records = f1.read()
records = records[1:]   # remove [
records = records[:-1]  # remove ]
records = records.split("#")
for rec in records:
    f2.write(rec + "\n")
And since you mentioned you need this data in excel, use csv files and from excel open your csv output file and excel will format your output as needed without you having to do that manually:
import csv

w = csv.writer(f2, dialect="excel")
for rec in records:
    w.writerow(rec.split("|"))  # each pipe-separated field becomes its own column
I think that before every # we should also delete the |; otherwise, after splitting, every record will start with || as its first characters. That's why we should split on |#, not only #.
Try this:
with open('input.txt', 'r') as f1:
    text = f1.read().lstrip('[').rstrip(']').split("|#")  # remove '[' and ']' from each side, then split into records
with open('output.txt', 'w') as f2:
    for line in text:
        f2.write('%s\n' % line)  # write each record to the file followed by a newline
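A quick check of the |# split on a shortened version of the sample string (my own sketch, using in-memory strings rather than files):

```python
text = ("[|Bob Hunter|555-5555|B|Polycity|AK|55555||#"
        "|Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter@email.com|]")

records = text.lstrip('[').rstrip(']').split("|#")
for rec in records:
    print(rec)
# → |Bob Hunter|555-5555|B|Polycity|AK|55555|
# → |Rob Punter|999-5555|B|Bolycity|AZ|55559|rpunter@email.com|
```

Note that splitting on "|#" consumes the extra pipe, so each record keeps exactly one leading and one trailing pipe, matching the desired output.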
I am reading a file and getting the first element from each start of the line, and comparing it to my list, if found, then I append it to the new output file that is supposed to be exactly like the input file in terms of the structure.
my_id_list = [
    4985439,
    5605471,
    6144703,
]
input file:
4985439 16:0.0719814
5303698 6:0.09407 19:0.132581
5605471 5:0.0486076
5808678 8:0.130536
6144703 5:0.193785 19:0.0492507
6368619 3:0.242678 6:0.041733
my attempt:
import numpy as np

output_file = []
input_file = open('input_file', 'r')
for line in input_file:
    my_line = np.array(line.split())
    id = str(my_line[0])
    if id in my_id_list:
        output_file.append(line)
np.savetxt("output_file", output_file, fmt='%s')
Question is:
It is currently adding an extra empty line after each line written to the output file. How can I fix it? or is there any other way to do it more efficiently?
update:
output file should be for this example:
4985439 16:0.0719814
5605471 5:0.0486076
6144703 5:0.193785 19:0.0492507
Try something like this:
# read lines and strip trailing newline characters
with open('input_file', 'r') as f:
    input_lines = [line.strip() for line in f.readlines()]

# collect all the lines whose first column is in your id list
# (convert to int, since my_id_list holds integers)
output_file = [line for line in input_lines if int(line.split()[0]) in my_id_list]

# write to output file
with open('output_file', 'w') as f:
    f.write('\n'.join(output_file))
I don't know what numpy does to the text when reading it, but this is how you could do it without numpy:
my_id_list = {'4985439', '5605471', '6144703'}  # a set of strings is faster for membership testing

with open('input_file') as input_file:
    # Your problem is most likely related to line endings, so here
    # we read the input file into a list of lines with intact line endings.
    # To preserve the input exactly, you would need to open the files
    # in binary mode ('rb' for the input file, and 'wb' for the output
    # file below).
    lines = input_file.read().splitlines(keepends=True)

with open('output_file', 'w') as output_file:
    for line in lines:
        first_word = line.split()[0]
        if first_word in my_id_list:
            output_file.write(line)
Getting the first word of each line this way is a bit wasteful, since this:
first_word = line.split()[0]
creates a list of all "words" in the line when we just need the first one.
If you know that the columns are separated by spaces you can make it more efficient by only splitting on the first space:
first_word = line.split(' ', 1)[0]
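A quick sketch of the difference (my own example, using one of the sample lines from the question):

```python
line = "6144703 5:0.193785 19:0.0492507\n"

# split() builds a list of every word in the line...
print(line.split())  # → ['6144703', '5:0.193785', '19:0.0492507']

# ...while split(' ', 1) stops after the first space,
# so only the first column is ever separated out
first_word = line.split(' ', 1)[0]
print(first_word)  # → 6144703
```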
I'm trying to parse through a file with this structure:
0 rs41362547 MT 10044
1 rs28358280 MT 10550
...
and so forth, where I want the second item in each line to be put into an array. I know it should be pretty easy, but after a lot of searching I'm still lost. I'm really new to Python; what would be the script to do this?
Thanks!
You can split the lines using str.split:
with open('file.txt') as infile:
    result = []
    for line in infile:  # loop through the lines
        data = line.split(None, 2)[1]  # split, get the second column
        result.append(data)  # append it to our results
        print(data)  # just confirming
This will work:
with open('/path/to/file') as myfile:  # Open the file
    data = []  # Make a list to hold the data
    for line in myfile:  # Loop through the lines in the file
        data.append(line.split(None, 2)[1])  # Get the data and add it to the list
print(data)  # Print the finished list
The important parts here are:
str.split, which breaks up the lines based on whitespace.
The with-statement, which auto-closes the file for you when done.
Note that you could also use a list comprehension:
with open('/path/to/file') as myfile:
    data = [line.split(None, 2)[1] for line in myfile]
print(data)