I'm an absolute beginner in Python, and I'd like to get a field, e.g. the one in the 2nd column of the 3rd row, from a text file like this:
176a AUGCACGUACGUA ACGUA AGUCU
156b GACUACAUGCAUG GCAUA AGCUA
172e AGCUCAGCUAGGC CGAGA CGACU
(The text is separated by spaces.) Is there a simple way to do that?
You could split the text and have a list of lists, where each sub list is a row, then pluck whatever you need from the list using rows[row - 1][column - 1].
f = open('test.txt', 'r')
lines = f.readlines()
f.close()
rows = []
for line in lines:
    rows.append(line.split(' '))
print(rows[2][1])
If your file isn't too big, I would read it all at once, then split the line I need and take the part I want:
with open(myfile) as file_in:
    lines = file_in.readlines()
third_line = lines[2]
second_column = third_line.split(' ')[1]
print(second_column)
If I have a file test which contains your example data, the following will do the job:
def extract_field(data, row, col):
    '''extract_field -> string

    `data` must be an iterable file object or an equivalent
    data structure whose elements contain space-delimited
    fields.

    `row` and `col` give the 1-based position of the field
    to return. '''
    # because the first list element has index 0
    col -= 1
    # advance to the requested `row`
    for _ in range(row):
        line = next(data)
    # split `line` on whitespace and return the element at index `col`
    return line.split()[col]
Use it like this:
>>> with open('test') as f:
...     extract_field(f, row=3, col=2)
...
'AGCUCAGCUAGGC'
Goal: Open the text file. Check whether the first 3 characters of each line are the same in subsequent lines. If yes, delete the bottom one.
The contents of the text file:
cat1
dog4
cat3
fish
dog8
Desired output:
cat1
dog4
fish
Attempt at code:
line = open("text.txt", "r")
for num in line.readlines():
    a = line[num][0:3] #getting first 3 characters
    for num2 in line.readlines():
        b = line[num2][0:3]
        if a in b:
            line[num2] = ""
Open the file and read one line at a time. Note the first 3 characters (prefix). Check if the prefix has been previously observed. If not, keep that line and add the prefix to a set. For example:
with open('text.txt') as infile:
    out_lines = []
    prefixes = set()
    for line in map(str.strip, infile):
        if (prefix := line[:3]) not in prefixes:
            out_lines.append(line)
            prefixes.add(prefix)
    print(out_lines)
Output:
['cat1', 'dog4', 'fish']
Note: requires Python 3.8+ (for the := assignment expression).
You can use a dictionary to store the first 3 characters of each line and check it while reading. Sample code below:
first_three_char_dict = {}
kept_lines = []
with open("text.txt", "r") as infile:
    for line in infile:
        a = line[0:3]  # getting first 3 characters
        if a in first_three_char_dict:
            continue  # prefix seen before: drop this line
        first_three_char_dict[a] = line
        kept_lines.append(line.rstrip("\n"))
print(kept_lines)
Try reading the file line by line and adding each word to a dict, keyed by its first 3 characters, with the word itself as the value. Only add a word if its key is not already present, so that the first occurrence is kept. At the end, the dict keys are unique and their values are your desired result.
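A minimal sketch of the dictionary idea described above, assuming the sample lines from the question (inlined here as a list instead of being read from text.txt, so the snippet is self-contained). The name first_by_prefix is made up for illustration. dict.setdefault only stores a value when the key is missing, so the first line seen for each prefix wins, and dicts keep insertion order in Python 3.7+:

```python
# Sample lines from the question; in practice they would be read from the file
lines = ["cat1", "dog4", "cat3", "fish", "dog8"]

first_by_prefix = {}  # hypothetical name: prefix -> first line with that prefix
for line in lines:
    # setdefault leaves an existing entry untouched, so later duplicates are ignored
    first_by_prefix.setdefault(line[:3], line)

result = list(first_by_prefix.values())
print(result)  # ['cat1', 'dog4', 'fish']
```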
You just need to check whether the prefix already exists in a temporary list:
seen = []    # prefixes already encountered
result = []  # lines to keep
with open("text.txt", "r") as infile:
    for line in infile:
        data = line[0:3]  # getting first 3 characters
        if data not in seen:  # check whether the prefix is already in the list
            seen.append(data)  # if not, remember it
            result.append(line.strip())  # and keep the line
print(result)
Any idea how I should get the largest age from the text file and print it?
The text file:
Name, Address, Age,Hobby
Abu, “18, Jalan Satu, Penang”, 18, “Badminton, Swimming”
Choo, “Vista Gambier, 10-3A-88, Changkat Bukit Gambier Dua, 11700, Penang”, 17, Dancing
Mutu, Kolej Abdul Rahman, 20, “Shopping, Investing, Youtube-ing”
This is my coding:
with open("iv.txt",encoding="utf8") as file:
    data = file.read()
    splitdata = data.split('\n')
I am not getting what I want from this.
This works! I hope it helps. Let me know if there are any questions.
This approach essentially assumes that values associated with Hobby do not have numbers in them.
import csv
max_age = 0
with open("iv.txt", newline = '', encoding = "utf8") as f:
    # csv.reader returns a reader object used to iterate over lines of f
    # delimiter=',' is the default but I like to be explicit
    spamreader = csv.reader(f, delimiter = ',')
    # skip first row
    next(spamreader)
    # each row read from file is returned as a list of strings
    for row in spamreader:
        # reversed() returns reverse iterator (start from end of list of str)
        for i in reversed(row):
            try:
                i = int(i)
                break
            # ValueError raised when string i is not an int
            except ValueError:
                pass
        print(i)
        if i > max_age:
            max_age = i
print(f"\nMax age from file: {max_age}")
Output:
18
17
20
Max age from file: 20
csv.reader, from the csv module of Python's standard library, returns a reader object (named spamreader here) used to iterate over the lines of f. Each row (i.e. line) read from the file f is returned as a list of strings.
The delimiter (in our case, ',', which is also the default) determines how a raw line from the file is broken up into mutually exclusive but exhaustive parts -- these parts become the elements of the list that is associated with a given line.
Given a raw line, the string associated with the start of the line to the first comma is an element, then the string associated with any part of the line that is enclosed by two commas is also an element, and finally the string associated with the last comma to the end of the line is also an element.
For each line/list of the file, we start iterating from the end of the list, using the reversed built-in function, because we know that age is the second-to-last category. We assume that the hobby category does not have numbers in them such that the number would appear as an element of the list for the raw line. For example, for the line associated with Abu, if instead of "Badminton, Swimming" we had "Badminton, 30, Swimming", then the code would not have the desired effect as 30 would be treated as Abu's age.
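If the assumption about hobbies is a concern, one possible alternative (a sketch, not part of the answer above) is to turn the curly quotes into plain double quotes first, so that csv.reader treats the quoted address and hobby fields as single fields. The sample data is copied from the question and inlined through io.StringIO in place of iv.txt. skipinitialspace=True is a real csv option that ignores the blank after each comma:

```python
import csv
import io

# Sample data copied from the question (stands in for iv.txt)
raw = """Name, Address, Age,Hobby
Abu, “18, Jalan Satu, Penang”, 18, “Badminton, Swimming”
Choo, “Vista Gambier, 10-3A-88, Changkat Bukit Gambier Dua, 11700, Penang”, 17, Dancing
Mutu, Kolej Abdul Rahman, 20, “Shopping, Investing, Youtube-ing”
"""

# Turn the curly quotes into plain double quotes so csv treats them as quoting
normalized = raw.replace("“", '"').replace("”", '"')

# skipinitialspace=True ignores the blank after each comma, so a quote that
# follows ", " is still recognised as the start of a quoted field
reader = csv.reader(io.StringIO(normalized), skipinitialspace=True)
next(reader)  # skip the header row

ages = [int(row[2]) for row in reader]  # Age is the third column
max_age = max(ages)
print(max_age)  # 20
```

With this approach a number inside a quoted hobby list stays inside that field and is never mistaken for the age.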
I'm sure there is a built-in feature to parse a composite string like the one you posted, but as I don't know it, I've created a CustomParser class to do the job:
class CustomParser():
    def __init__(self, line: str, delimiter: str):
        self.line = line
        self.delimiter = delimiter

    def split(self):
        word = ''
        words = []
        inside_string = False
        for letter in self.line:
            # a quote character toggles the "inside a quoted string" state
            if letter in '“”"':
                inside_string = not inside_string
                continue
            # a delimiter outside quotes ends the current field
            if letter == self.delimiter and not inside_string:
                words.append(word.strip())
                word = ''
                continue
            word += letter
        words.append(word.strip())
        return words

with open('people_data.csv', encoding='utf8') as file:
    next(file)  # skip the header line
    ages = []
    for line in file:
        # convert to int so that max() compares numbers, not strings
        ages.append(int(CustomParser(line, ',').split()[2]))
print(max(ages))
Hope that helps.
This is data from a lab experiment (around 717 lines of data). Rather than trying to use Excel, I want to import and graph it in either Python or MATLAB. I'm new here, by the way, and am a student!
""
"Test Methdo","exp-l Tensile with Extensometer.msm"
"Sample I.D.","Sample108.mss"
"Speciment Number","1"
"Load (lbf)","Time (s)","Crosshead (in)","Extensometer (in)"
62.638,0.900,0.000,0.00008
122.998,1.700,0.001,0.00012
(more numbers follow; the original post linked a screenshot showing more data from the file)
I just can't figure out how to read the line up until a comma. Specifically, I need the Load numbers for one of my arrays/list, so for example on the first line I only need 62.638 (which would be the first number on my first index on my list/array).
How can I get an array/list of this, something that iterates/reads the list and ignores strings?
Thanks!
NOTE: I use Anaconda + Jupyter Notebooks for Python & Matlab (school provided software).
EDIT: Okay, so I came home today and worked on it again. I hadn't dealt with CSV files before, but after some searching I was able to learn how to read my file, somewhat.
import csv
from itertools import islice
with open('Blue_bar_GroupD.txt','r') as BB:
    BB_csv = csv.reader(BB)
    x = 0
    BB_lb = []
    while x < 7: #to skip the string data
        next(BB_csv)
        x+=1
    for row in islice(BB_csv,0,758):
        print(row[0]) #testing if I can read row data
Okay, here is where I am stuck. I want to make an array/list that has the 0th index value of each row. Sorry if I'm a freaking noob!
Thanks again!
You can skip all lines up to the first data row and then parse the data into a list for later use; 700+ lines can easily be processed in memory.
To do that, you need to:
1. read the file line by line
2. remember the last non-empty line before the first line made only of digits/commas/dots (== header)
3. check whether the line consists only of digits/commas/dots (== data), else increase a skip counter
4. seek to 0
5. skip enough lines to get to the header or the data
6. read the rest into a data structure
Create test file:
text = """
""
"Test Methdo","exp-l Tensile with Extensometer.msm"
"Sample I.D.","Sample108.mss"
"Speciment Number","1"
"Load (lbf)","Time (s)","Crosshead (in)","Extensometer (in)"
62.638,0.900,0.000,0.00008
122.998,1.700,0.001,0.00012
"""
with open("t.txt","w") as w:
    w.write(text)
Some helpers and the skipping/reading logic:
import re
import csv
def convert_row(row):
    """Convert one row of data into a list of mixed floats and others.
    Float is the preferred data type, else the string is kept - no other tried."""
    d = []
    for v in row:
        try:
            # convert to float and add
            d.append(float(v))
        except ValueError:
            # not a number, append as is
            d.append(v)
    return d

def count_to_first_data(fh):
    """Count lines in fh not consisting of numbers, dots and commas.
    Side effect: will reset position in fh to 0."""
    skiplines = 0
    header_line = 0
    fh.seek(0)
    for line in fh:
        if re.match(r"^[\d.,]+$",line):
            fh.seek(0)
            return skiplines, header_line
        else:
            if line.strip():
                header_line = skiplines
            skiplines += 1
    raise ValueError("File does not contain pure number rows!")
Usage of helpers / data conversion:
data = []
skiplines = 0
with open("t.txt","r") as csvfile:
    skip_to_data, skip_to_header = count_to_first_data(csvfile)
    for _ in range(skip_to_header): # skip_to_data if you do not want the headers
        next(csvfile)
    reader = csv.reader(csvfile, delimiter=',',quotechar='"')
    for row in reader:
        row_data = convert_row(row)
        if row_data:
            data.append(row_data)
print(data)
Output (reformatted):
[['Load (lbf)', 'Time (s)', 'Crosshead (in)', 'Extensometer (in)'],
[62.638, 0.9, 0.0, 8e-05],
[122.998, 1.7, 0.001, 0.00012]]
Documentation:
re.match
csv.reader
Methods of file objects (e.g. seek())
With this you now have "clean" data that you can use for further processing - including your headers.
For visualization you can have a look at matplotlib
I would recommend reading your file with Python:
data = []
with open('my_txt.txt', 'r') as fd:
    # Skip header lines
    for i in range(6):
        fd.readline()
    # Read each data line up to the first comma
    for line in fd:
        index = line.find(',')
        if index >= 0:
            data.append(float(line[0:index]))
This leads to a list containing the data of the first column:
>>> data
[62.638, 122.998]
The MATLAB solution is less nice, since you have to know the number of data lines in your file (which you do not need to know in the Python solution):
n_header = 6
n_lines = 2 % Insert here 717 (as you mentioned)
M = csvread('my_txt.txt', n_header, 0, [n_header 0 n_header+n_lines-1 0])
leads to:
>> M
M =
62.6380
122.9980
For the sake of clarity: you can also use MATLAB's textscan function to achieve what you want without knowing the number of lines, but still, the Python code would be the better choice in my opinion.
Based on your format, you will need three steps: first, read all lines; second, determine which lines to use; last, convert the values to floats and add them to a list.
Assuming your file name is name.txt, try:
f = open("name.txt", "r")
all_lines = f.readlines()
grid = []
for line in all_lines:
    if ('"' not in line) and (line != '\n'):
        grid.append(list(map(float, line.strip('\n').split(','))))
f.close()
The grid will then contain a series of lists containing your group of floats.
Explanation for fun:
In the "for" loop, I check for a double quote to eliminate any line containing strings, as all strings are enclosed in quotes. The other condition skips empty lines.
Based on your needs, you can use the list grid as you please. For example, to fetch the first line's first number, do
grid[0][0]
as python's list counts from 0 to n-1 for n elements.
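Since the question asks specifically for the Load values, here is a small sketch of pulling the first column out of grid. It assumes the same filtering idea as above and inlines the sample lines from the question through io.StringIO in place of name.txt. The name loads is just an illustrative choice:

```python
import io

# Sample contents from the question (stands in for name.txt)
sample = '''""
"Test Methdo","exp-l Tensile with Extensometer.msm"
"Sample I.D.","Sample108.mss"
"Speciment Number","1"
"Load (lbf)","Time (s)","Crosshead (in)","Extensometer (in)"
62.638,0.900,0.000,0.00008
122.998,1.700,0.001,0.00012
'''

grid = []
for line in io.StringIO(sample):
    # keep only non-empty lines without double quotes, i.e. the numeric rows
    if '"' not in line and line.strip():
        grid.append([float(v) for v in line.strip().split(",")])

# first column of every row: the Load (lbf) values
loads = [row[0] for row in grid]
print(loads)  # [62.638, 122.998]
```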
This is super simple in Matlab, just 2 lines:
data = dlmread('data.csv', ',', 6,0);
column1 = data(:,1);
Where 6 and 0 should be replaced by the row and column offset you want. So in this case, the data starts at row 7 and you want all the columns, then just copy over the data in column 1 into another vector.
As another note, try typing doc dlmread in matlab - it brings up the help page for dlmread. This is really useful when you're looking for matlab functions, as it has other suggestions for similar functions down the bottom.
I want my program to read from a .txt file, which has data in its lines arranged like this:
NUM NUM NAME NAME NAME. How could I read its lines into a list so that each line becomes an element of the list, and each element would have its first two values as ints and the other three as strings?
So the first line from the file: 1 23 Joe Main Sto should become lst[0] = [1, 23, "Joe", "Main", "Sto"].
I already have this, but it doesn't work perfectly and I'm sure there must be a better way:
read = open("info.txt", "r")
line = read.readlines()
text = []
for item in line:
    fullline = item.split(" ")
    text.append(fullline)
Use str.split() without an argument to have whitespace collapsed and removed for you automatically, then apply int() to the first two elements:
with open("info.txt", "r") as read:
    lines = []
    for item in read:
        row = item.split()
        row[:2] = map(int, row[:2])
        lines.append(row)
Note that here we loop directly over the file object; there is no need to read all lines into memory first.
with open("info.txt") as f:
    text = [list(map(int, l.split()[:2])) + l.split()[2:] for l in f]
I have this code to check indexes in a file to see if they match, but to start off I am having trouble being able to select an index. What do I have to do in order to be able to do so, because at this moment it doesn't show the values as being in a list.
def checkOS():
    fid = open("C:/Python/NSRLOS.txt", 'r')
    fhand = open("C:/Python/sha_sub_hashes.out", 'r')
    sLine = fhand.readline()
    line = fid.readline()
    outdata = []
    print line
checkOS()
Right now it prints:
"190","Windows 2000","2000","609"
I only want it to print: (so index[0])
190
And when I try index[0], I just get ' " '. That is only the first character of the whole string; I want a list so that I can select an item by its index.
Try using line.split(",") to split the line by the commas, then strip out the quotation marks by slicing the result.
Example:
>>> line = '"190","Windows 2000","2000","609"'
>>> sliced = line.split(',')
>>> print sliced
['"190"', '"Windows 2000"', '"2000"', '"609"']
>>> first_item = sliced[0][1:-1]
>>> print first_item
190
...and here's the whole thing, abstracted into a function:
def get_item(line, index):
    return line.split(',')[index][1:-1]
(This assumes, of course, that all the items in the line are separated by commas, that they're all wrapped in quotation marks, and that there are no spaces after the commas (although you could take care of that by calling item.strip() to remove whitespace). It also fails if a quoted item contains a comma, as noted in the comments.)
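For the case where a quoted item does contain a comma, a hedged sketch using the standard csv module instead of str.split. The helper name get_item_csv and the second sample line are made up for illustration. csv.reader accepts any iterable of strings, so a one-element list is enough to parse a single line:

```python
import csv

line = '"190","Windows 2000","2000","609"'
# hypothetical line with a comma inside a quoted item
tricky = '"191","Windows 2000, SP4","2000","609"'

def get_item_csv(line, index):
    # hypothetical helper: parse one line with csv and return the item at index
    return next(csv.reader([line]))[index]

print(get_item_csv(line, 0))    # 190
print(get_item_csv(tricky, 1))  # Windows 2000, SP4
```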
What if you use split() to split on each comma and return the first value? Try this.
[0] applied to a string only returns the first character.
You want the first item of a comma-separated list. You could write your own parsing code, or you could use the csv module which already handles this.
import csv
def get_first_row(fname):
    with open(fname, 'rb') as inf:
        incsv = csv.reader(inf)
        try:
            row = incsv.next()
        except StopIteration:
            row = [None]
    return row

def checkOS():
    fid = get_first_row("C:/Python/NSRLOS.txt")[0]
    fhand = get_first_row("C:/Python/sha_sub_hashes.out")[0]
    print fid
csv.reader would be a good start.
import csv
from itertools import izip
with open('file1.csv') as fid, open('file2.csv') as fhand:
    fidcsv = csv.reader(fid)
    fhandcsv = csv.reader(fhand)
    for row1, row2 in izip(fidcsv, fhandcsv):
        print row1, row2, row1[1] # etc...
Using csv.reader will handle CSV-formatted files better than pure str methods. izip reads line 1 from both files, then line 2 from both files, then line 3, and so on (it stops when the shorter file runs out of rows); not sure if this is what you want, though. row1 and row2 will each be a list of columns, so you can then just test if row1[0] == row2[0]: or whatever logic you wish to use.
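The code above is Python 2 (izip, print statement). A rough Python 3 sketch of the same pairing idea, where the built-in zip is already lazy. The two in-memory "files" and their contents are invented stand-ins; real code would pass open(...) handles instead:

```python
import csv
import io

# Made-up stand-ins for the two files; real code would use open(...) instead
file1 = io.StringIO('"190","Windows 2000","2000","609"\n"191","Windows XP","2001","609"\n')
file2 = io.StringIO('"190","abc"\n"200","def"\n"300","ghi"\n')

matches = []
# zip stops as soon as the shorter input runs out of rows
for row1, row2 in zip(csv.reader(file1), csv.reader(file2)):
    if row1[0] == row2[0]:
        matches.append(row1[0])

print(matches)  # ['190']
```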