Finding the Characters in a Specific Line in a File - python

I am trying to count the characters in one specific line of my file. Say the line is 4.
My file consists of:
1. randomname
2. randomname
3.
4. 34
5. 12202018
My code consists of:
with open('/Users/eviemcmahan/PycharmProjects/eCYBERMISSION/eviemcmahan', 'r') as my_file:
    data = my_file.readline(4)
    characters = 0
    for data in my_file:
        words = data.split(" ")
        for i in words:
            characters += len(i)
    print(characters)
I am not getting an error, I am just getting the number "34"
I would appreciate any help on how to get a correct amount of characters for line 4.

my_file.readline(4) does not read the 4th line. It reads from the next line, but returns at most the first 4 characters. To read a specific line you can, for example, read all the lines into a list, from which it is easy to pick the line you want. You could also read line by line and stop when you reach the line you want.
Going with the first approach and using the count method of strings, it is straightforward to count the occurrences of any character on a specific line. For example:
line_number = 3  # Starts at 0
with open('test.txt', 'r') as my_file:
    lines = my_file.readlines()  # List containing all the lines as elements of the list
print(lines[line_number].count('0'))  # 0
print(lines[line_number].count('4'))  # 2
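If what you want is the total number of characters on line 4 rather than the occurrences of one character, the second approach (stop once you reach the line) could look roughly like this. It is only a sketch: the helper name count_chars_on_line is made up for illustration, io.StringIO stands in for the real file so the snippet runs on its own, and it assumes the file holds the bare contents without the leading "4. " numbering. The trailing newline is not counted.

```python
import io
from itertools import islice


def count_chars_on_line(lines, line_number):
    """Return the length of the 1-based line `line_number`, excluding the newline.

    `lines` is any iterable of lines, such as an open file object.
    """
    # islice skips the first line_number - 1 lines and stops right after the
    # target line, so the rest of the input is never read.
    target = next(islice(lines, line_number - 1, line_number), None)
    if target is None:
        raise ValueError(f"input has fewer than {line_number} lines")
    return len(target.rstrip("\n"))


# Stand-in for the file from the question (assumed contents).
sample = io.StringIO("randomname\nrandomname\n\n34\n12202018\n")
print(count_chars_on_line(sample, 4))  # 2
```

With a real file you would pass the open file object instead of the StringIO, for example inside a with open(...) as my_file: block.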

Related

Error in Python Code Trying To Open and Access a File

Here is info from the .txt file I am trying to access:
Movies: Drama
Possession, 2002
The Big Chill, 1983
Crimson Tide, 1995
Here is my code:
fp = open("Movies.txt", "r")
lines = fp.readlines()
for line in lines:
    values = line.split(", ")
    year = int(values[1])
    if year < 1990:
        print(values[0])
I get an error message "IndexError: list index out of range". Please explain why or how I can fix this. Thank you!
Assuming your .txt file includes the "Movies: Drama" line, as you listed, it's because the first line of the text file has no comma in it. Therefore splitting that first line on a comma only results in 1 element (element 0), NOT 2, and therefore there is no values[1] for the first line.
It's not unusual for data files to have a header line that doesn't contain actual data. Libraries like pandas will typically handle this for you, but open() and readlines() don't differentiate between header and data lines.
The easiest thing to do is just slice your list variable (lines) so you don't include the first line in your loop:
fp = open("Movies.txt", "r")
lines = fp.readlines()
for line in lines[1:]:
    values = line.split(", ")
    year = int(values[1])
    if year < 1990:
        print(values[0])
Note the "lines[1:]" modification. This way you only loop starting from the second line (the first line is lines[0]) and go to the end.
The first line of the text file does not contain ", ", so when you split on it, you get a list of size 1. When you access the 2nd element with values[1], you are indexing past the end of the list, hence the IndexError. You need to check the line before making an assumption about the size of the list. Some options:
Check the length of values and continue if it's too short.
Check that ', ' is in the line before splitting on it.
Use a regex which will ensure the ', ' is there as well as can ensure that the contents after the comma represent a number.
Preemptively strip off the first line in lines if you know that it's the header.
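As a rough illustration of the first two options combined, the sketch below skips any line that does not look like "title, year". The function name titles_before and the in-memory sample list are invented for the example; with the real file you would pass the open file object instead. It uses rsplit so that a title which itself contains ", " would still be split at the last comma.

```python
def titles_before(lines, cutoff_year):
    """Collect titles whose year is earlier than cutoff_year."""
    titles = []
    for line in lines:
        # Split once, from the right: everything before the last ", " is the title.
        values = line.strip().rsplit(", ", 1)
        # Skip headers, blank lines, and anything whose second field is not a number.
        if len(values) != 2 or not values[1].isdigit():
            continue
        if int(values[1]) < cutoff_year:
            titles.append(values[0])
    return titles


# Stand-in for the contents of Movies.txt as listed in the question.
sample = [
    "Movies: Drama\n",
    "Possession, 2002\n",
    "The Big Chill, 1983\n",
    "Crimson Tide, 1995\n",
]
print(titles_before(sample, 1990))  # ['The Big Chill']
```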
The first line of your txt file has no second element, so values[1] fails on it.
Simply change your code to skip the lines that cannot be parsed:
fp = open("Movies.txt", "r")
lines = fp.readlines()
for line in lines:
    try:  # <---- here
        values = line.split(", ")
        year = int(values[1])
        if year < 1990:
            print(values[0])
    except (IndexError, ValueError):  # <---- and here: header or malformed line
        continue

Read each line from a file and if that line length is smaller than 9 add that line to an array

words = []
for line in f:
    if len(line) <= 9:
        words.append(line)
#words = f.readlines(250000)
f.close()
return words
I am trying to read each line from a text file in which every line contains one word. I want to compare the length of that word to a condition and, if it meets the condition, add it to a list, so that I save the words that are under 9 characters long. The code should go through the entire file, and the words that are under 9 characters should be added to the list called words. I tried using f.readlines() but I don't know how to filter the results, as this just gives all of the words in the file.
You can use file.readlines like this:
words = []
with open('path/to/file') as f:
    for line in f.readlines():
        word = line.strip()  # drop the trailing newline before measuring
        if len(word) <= 9:
            words.append(word)
Note that using a context manager to open the file is good practice: the file is closed automatically, so you don't need to close it at the end and you can't forget to :)
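The same filter can also be written as a small function with a list comprehension. This is only a sketch: the name short_words and the max_len parameter are made up for the example, and io.StringIO stands in for the real file. Blank lines are skipped, and the limit is inclusive, as in the code above.

```python
import io


def short_words(lines, max_len=9):
    """Return the stripped, non-empty lines that are at most max_len characters long."""
    stripped = (line.strip() for line in lines)
    return [word for word in stripped if word and len(word) <= max_len]


# Stand-in for a file with one word per line.
sample = io.StringIO("cat\nextraordinary\nnine-char\n\nelephants\n")
print(short_words(sample))  # ['cat', 'nine-char', 'elephants']
```

If "under 9 characters" should exclude words of exactly 9 characters, pass max_len=8.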

How to remove lines that start with the same characters (but are random) in python?

I am trying to remove lines in a file that start with the same 5 characters. However, the first 5 characters are random (I don't know what they will be).
I have a code that reads the last 5 characters of the first line of a file and matches them to the FIRST 5 characters on a random line in the file that has the same 5 characters. The problem is, when there are two or more matches that have the same first 5 characters the code messes up. I need something that reads all the lines in the file and removes one of the two lines that have the same 5 first characters.
Example (issue):
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
***GTTAT***ATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT
What I need as result after one is taken out of file:
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
(no third line)
I will greatly appreciate it if you could explain how I could go about this with words as well.
You can do this, for example, like so:
FILE_NAME = "data.txt"  # the name of the file to read in
NR_MATCHING_CHARS = 5  # the number of characters that need to match
lines = set()  # the line beginnings that have already been printed
with open(FILE_NAME, "r") as inF:  # open the file
    for line in inF:  # for every line
        line = line.strip()  # that is
        if line == "": continue  # not empty
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if not (beginOfSequence in lines):  # and the beginning of this line was not printed yet
            print(line)  # print the line
            lines.add(beginOfSequence)  # remember the beginning of the line
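If you would rather collect the surviving lines than print them, the same idea can be wrapped in a function. This is a sketch under the same assumptions as above; the name drop_repeated_prefixes is invented for the example, and the sample uses the sequences from the question without the asterisks.

```python
def drop_repeated_prefixes(lines, prefix_len=5):
    """Keep only the first line seen for each distinct prefix of prefix_len characters."""
    seen = set()  # prefixes that have already been kept
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines
        prefix = line[:prefix_len]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(line)
    return kept


sample = [
    "CCTGGATGGCTTATATAAGATGTTAT",
    "GTTATATAATATACCACCGGGCTGCTT",
    "GTTATATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT",
]
for line in drop_repeated_prefixes(sample):
    print(line)
```

In words: walk through the lines once, remember the first 5 characters of every line you keep, and skip any later line whose first 5 characters you have already seen. To actually remove the lines from the file, you could read it, pass the lines through the function, and then reopen the file in "w" mode to write the kept lines back.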

extract the dimensions from the head lines of text file

Please see following attached image showing the format of the text file. I need to extract the dimensions of data matrix indicated by the first line in the file, here 49 * 70 * 1 for the case shown by the image. Note that the length of name "gd_fac" can be varying. How can I extract these numbers as integers? I am using Python 3.6.
The specification is not very clear. I am assuming that the information you want will always be in the first line, and always in parentheses. Given that:
with open(filename) as infile:
    line = infile.readline()
string = line[line.find('(')+1:line.find(')')]
lst = [int(part) for part in string.split('x')]
This will create the list lst = [49, 70, 1].
What is happening here:
First I open the file (you will need to replace filename with the name of your file, as a string). The with ... as ... structure ensures that the file is closed after use. Then I read the first line. After that, I select only the part of that line that falls after the open paren ( and before the close paren ). Finally, I break the string into parts, with the character x as the separator, and convert each part to an integer. This creates a list that contains the values in the first line of the file which fall between parentheses and are separated by x.
Since you mentioned that the length of 'gd_fac' can vary, a regular expression is a good solution.
import re
with open("a.txt") as fh:
    for line in fh:
        if '(' in line and ')' in line:
            dimension = re.findall(r'.*\((.*)\)', line)[0]
            break
print(dimension)
Output:
49x70x1
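Since the question asks for integers, the captured text still has to be converted. A minimal sketch, assuming the header looks something like "gd_fac (49x70x1)" (the exact format is only visible in the missing image, so that sample line is a guess, and the function name parse_dimensions is made up):

```python
import re


def parse_dimensions(header_line):
    """Return the integers found inside the first pair of parentheses."""
    match = re.search(r"\(([^)]*)\)", header_line)
    if match is None:
        raise ValueError("no parenthesized dimensions found")
    # Pull out every run of digits, whatever separator sits between them.
    return [int(number) for number in re.findall(r"\d+", match.group(1))]


print(parse_dimensions("gd_fac (49x70x1)"))  # [49, 70, 1]
```

Because it collects runs of digits, the same function would also cope with separators such as " * " instead of "x".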
This looks for "gd_fac" in each line; when it finds it, it removes everything that is not needed and keeps just what you want.
with open('test.txt', 'r') as infile:
    for line in infile:
        if "gd_fac" in line:
            line = line.replace("gd_fac", "")
            line = line.replace("x", "*")
            line = line.replace("(", "")
            line = line.replace(")", "")
            print(line.strip())
            break
OUTPUT: "49*70*1"

extracting data from a text file (python)

I have two columns of numbers in a text file, time and stress respectively, which I get from an analysis in the Abaqus finite element package. I want to extract the time column and the stress column into separate lists (one list for time and another list for stress), and then use these lists for further mathematical operations.
My problem is how to create these lists. My text file is as follows (the first line of the text file and the last four lines are empty):
X FORCE-1
0. 0.
10.E-03 98.3479E+03
12.5E-03 122.947E+03
15.E-03 147.416E+03
18.75E-03 183.805E+03
22.5E-03 215.356E+03
26.25E-03 217.503E+03
30.E-03 218.764E+03
33.75E-03 219.724E+03
37.5E-03 220.503E+03
43.125E-03 221.938E+03
51.5625E-03 228.526E+03
61.5625E-03 233.812E+03
You can read your file line by line:
time = []
stress = []
count = 0
with open("textfile.txt") as file:
    for line in file:
        line = line.strip()  # removing extra spaces
        temp = line.split(" ")
        if count >= 3 and temp[0].strip():  # skip the first 3 lines and any empty line
            time.append(temp[0].strip())  # removing extra spaces and append
            stress.append(temp[-1].strip())  # the last field is the stress value
        count += 1
print(time)
Output of running the above script:
['0.', '10.E-03', '12.5E-03', '15.E-03', '18.75E-03', '22.5E-03', '26.25E-03', '30.E-03', '33.75E-03', '37.5E-03', '43.125E-03', '51.5625E-03', '61.5625E-03']
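Note that the lists above hold strings. For mathematical operations you will probably want floats, which float() can produce directly from values like 10.E-03. A sketch that does this without relying on a fixed number of header lines (the function name parse_columns and the in-memory sample are invented for the example):

```python
def parse_columns(lines):
    """Split two-column numeric lines into a time list and a stress list of floats."""
    time, stress = [], []
    for line in lines:
        fields = line.split()  # splits on any run of whitespace
        if len(fields) != 2:
            continue  # blank line or unexpected layout
        try:
            t, s = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # non-numeric line such as the "X FORCE-1" header
        time.append(t)
        stress.append(s)
    return time, stress


# Stand-in for the first few lines of the file from the question.
sample = [
    "\n",
    "X FORCE-1\n",
    "\n",
    "0. 0.\n",
    "10.E-03 98.3479E+03\n",
    "12.5E-03 122.947E+03\n",
    "\n",
]
time, stress = parse_columns(sample)
print(time)
print(stress)
```

With the real file, pass the open file object to parse_columns inside a with open(...) block.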
