Removing selected characters from text file - python

I have a long text file where each line looks something like /MM0001 (Table(12,)) or /MM0015 (Table(11,)). I want to keep only the four-digit number next to /MM. If it weren't for the "Table(12,)" part I could just strip all the non-numeric characters, but I don't know how to extract the four-digit numbers only. Any advice on getting started?

If it's exactly that format, you could just print out line[3:7]

You could parse the text line by line and then use the 4th to 7th characters of every line.
ln[3:7]

import re
R=re.compile(r'/MM(\d+)')
for line in file:
    L=R.match(line)
    if L:
        print(L.group(1))
or, more succinctly...
lines=[R.match(line).group(1) for line in file] # works only if every line is guaranteed to start with /MM
This should give you only the integers following a /MM and should work no matter how long the strings of integers are. If they're guaranteed to be a certain length, then you're better off with one of the other examples (which don't use regex).
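For reference, here is a self-contained Python 3 sketch of the regex approach above. It uses an in-memory list of sample lines in place of the file, and the names MM_PATTERN and extract_ids are made up for illustration:

```python
import re

# Matches "/MM" at the start of a line and captures the digits that follow.
MM_PATTERN = re.compile(r'/MM(\d+)')

def extract_ids(lines):
    """Return the digit strings following /MM, skipping lines that do not match."""
    ids = []
    for line in lines:
        match = MM_PATTERN.match(line)
        if match:
            ids.append(match.group(1))
    return ids

# Sample data standing in for the lines of the real file.
sample = ["/MM0001 (Table(12,))\n", "/MM0015 (Table(11,))\n", "no id here\n"]
print(extract_ids(sample))  # ['0001', '0015']
```

Lines that do not start with /MM are skipped here rather than raising an error, which is one possible choice among several.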

If each line starts with /MM, then just go through the file and print out line[3:7], e.g.
for line in file:
    print(line[3:7])

Related

Amend list from file - Correct syntax and file format?

I currently have a list hard coded into my python code. As it keeps expanding, I wanted to make it more dynamic by reading the list from a file. I have read through many articles about how to do this, but in practice I can't get this working. So firstly, here is an example of the existing hardcoded list:
serverlist = []
serverlist.append(("abc.com", "abc"))
serverlist.append(("def.com", "def"))
serverlist.append(("hji.com", "hji"))
When I enter the command 'print serverlist' the output is shown below and my list works perfectly when I access it:
[('abc.com', 'abc'), ('def.com', 'def'), ('hji.com', 'hji')]
Now I've replaced the above code with the following:
serverlist = []
with open('/server.list', 'r') as f:
    serverlist = [line.rstrip('\n') for line in f]
With the contents of server.list being:
'abc.com', 'abc'
'def.com', 'def'
'hji.com', 'hji'
When I now enter the command print serverlist, the output is shown below:
["'abc.com', 'abc'", "'def.com', 'def'", "'hji.com', 'hji'"]
And the list is not working correctly. So what exactly am I doing wrong? Am I reading the file incorrectly or am I formatting the file incorrectly? Or something else?
The contents of the file are not interpreted as Python code. When you read a line in f, it is a string; and the quotation marks, commas etc. in your file are just those characters as parts of a string.
If you want to create some other data structure from the string, you need to parse it. The program has no way to know that you want to turn the string "'abc.com', 'abc'" into the tuple ('abc.com', 'abc'), unless you instruct it to.
This is the point where the question becomes "too broad".
If you are in control of the file contents, then you can simplify the data format to make this more straightforward. For example, if you just have abc.com abc on the line of the file, so that your string ends up as 'abc.com abc', you can then just .split() that; this assumes that you don't need to represent whitespace inside either of the two items. You could instead split on another character (like the comma, in your case) if necessary (.split(',')). If you need a general-purpose hammer, you might want to look into JSON. There is also ast.literal_eval which can be used to treat text as simple Python literal expressions - in this case, you would need the lines of the file to include the enclosing parentheses as well.
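As a rough sketch of two of the options mentioned above (splitting on a comma, and ast.literal_eval on lines that include the enclosing parentheses), something like the following could work. The function names parse_plain and parse_literal are hypothetical, and the sample strings stand in for lines read from the file:

```python
import ast

def parse_plain(line):
    """Parse a line such as 'abc.com, abc' into a tuple of stripped fields."""
    return tuple(field.strip() for field in line.split(','))

def parse_literal(line):
    """Parse a line such as "('abc.com', 'abc')" as a Python literal."""
    return ast.literal_eval(line.strip())

print(parse_plain("abc.com, abc\n"))          # ('abc.com', 'abc')
print(parse_literal("('abc.com', 'abc')\n"))  # ('abc.com', 'abc')
```

The first form assumes neither field contains a comma; the second accepts only Python literals, so it will raise an exception on anything else in the file.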
If you are willing to let go of the quotes in your file and rewrite it as
abc.com, abc
def.com, def
hji.com, hji
the code to load can be reduced to a one liner using the fact that files are iterables
with open('servers.list') as f:
    servers = [tuple(line.rstrip('\n').split(', ')) for line in f]
Note that iterating over a file does not strip the newlines, so each line needs rstrip('\n') before it is split.
You can allow arbitrary whitespace by doing something like
    servers = [tuple(word.strip() for word in line.split(',')) for line in f]
It might be easier to use something like regex to parse the original format. You could use an expression that captures the parts of the line you care about and matches but discards the rest:
import re
pattern = re.compile(r"'(.+)',\s*'(.+)'")
You could then extract the names from the matched groups
with open('servers.list') as f:
    # strip the trailing newline first, otherwise fullmatch returns None
    servers = [pattern.fullmatch(line.strip()).groups() for line in f]
This is just a trivialized example. You can make it as complicated as you wish for your real file format.
Try this:
serverlist = []
with open('/server.list', 'r') as f:
    for line in f:
        serverlist.append(tuple(line.rstrip('\n').split(',')))
Explanation
You want an explicit for loop so you cycle through each line as expected.
You need list.append for each line to append to your list.
You need to use split(',') in order to split by commas.
Convert to tuple as this is your desired output.
List comprehension method
The for loop can be condensed as below:
with open('/server.list', 'r') as f:
    serverlist = [tuple(line.rstrip('\n').split(',')) for line in f]

Stuff spaces at end of lines in file

I am trying to read a fixed-width file that has lines/records of different lengths. I need to stuff spaces at the end of the lines which are shorter than the standard length specified.
Any help appreciated.
You can use str.format to pad a string to a specific length.
The documentation says that < pads to the right so to pad a string with spaces to the right to a specific length you can do something like this:
>>> "{:<30}".format("foo")
'foo                           '
You could consider using the ljust string method.
If line is a line read from your file:
line = line.ljust(50)
will stuff the end of the line with spaces to get a 50-character line. If line is longer than 50 characters, it is simply copied without any change.
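One detail to watch for: a line read from a file still ends with its newline, and ljust counts that newline as a character. A minimal sketch that strips it before padding, using a hypothetical helper pad_line and an arbitrary width of 10 on sample records:

```python
def pad_line(line, width):
    """Strip the trailing newline, then right-pad with spaces to `width`."""
    return line.rstrip('\n').ljust(width)

# Sample records standing in for lines read from the fixed-width file.
records = ["abc\n", "abcdefgh\n", "abcdefghijkl\n"]
padded = [pad_line(record, 10) for record in records]
print([len(p) for p in padded])  # [10, 10, 12]
```

The last record is already longer than the width, so it comes back unchanged apart from the removed newline.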

Adding numbers from a file to a list

OK, so I have a .txt file whose contents I need to add to a list. The problem is that there is only one character per row; for example, if I need to have "2+3", in the .txt it would look like this:
2
+
3
and then I have to add it to a list so that it looks like this: [2,+,3]
The code I have right now adds the contents as strings and leaves a "\n" at the end of every list element. I can't find a way to make it add each character as an int and without the \n.
This is the code:
def readlist():
    count=0
    file=open("readfile.txt","r")
    list1=[]
    line=file.readlines()
    list1.append(line)
    print(list1)
    file.close
(the file it is reading has 1(2+3) in it)
Thanks in advance for the help.
The safest way is to use a try/except:
out = []
with open("in.txt") as f:
    for line in f:
        try:
            out.append(int(line))
        except ValueError:
            out.append(line.rstrip())
print(out)
[2, '+', 3]
You don't need to strip whitespace or newline characters when casting to int; Python is forgiving in that regard, so we only need to rstrip the newline when we catch an exception, because then we have an operator.
Also, with will automatically close your file, something your own code does not do: it is missing the parentheses needed to call the method, so file.close should be file.close().
This problem can be fixed with a few additions.
First, every line has a \n in its string because each one is a separate line in the file. To remove it you can use the rstrip method.
From there, you will want to convert the string into an int using int(line). This turns the line into an integer that you can then add to your list.
The problem now is going to be choosing which line to convert into an int and which ones are arithmetic operations such as the + you have in your example file.
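One possible way to make that choice, as a sketch only: treat a line as a number when it consists purely of digits, and keep anything else as a string. The helper name parse_token is hypothetical, and this assumes non-negative integers written with plain ASCII digits:

```python
def parse_token(line):
    """Return an int for lines that hold only digits, else the stripped text."""
    token = line.strip()
    return int(token) if token.isdigit() else token

# Sample lines standing in for the contents of the file.
lines = ["2\n", "+\n", "3\n"]
print([parse_token(line) for line in lines])  # [2, '+', 3]
```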
You can do:
line.split('\n')

Python: Read in Data from File

I have to read data from a text file from the command line. It is not too difficult to read in each line, but I need a way to separate each part of the line.
The file contains the following in order for several hundred lines:
String (Sometimes more than 1 word)
Integer
String (Sometimes more than 1 word)
Integer
So for example the input could have:
Hello 5 Sample String 10
The current implementation I have for reading in each line is as follows... how can I modify it to separate it into what I want? I have tried splitting the line, but I always end up getting only one character of the first string this way with no integers or any part of the second string.
import sys

with open(sys.argv[1],"r") as f:
    for line in f:
        print(line)
The desired output would be:
Hello
5
Sample String
10
and so on for each line in the file. There could be thousands of lines in the file. I just need to separate each part so I can work with them separately.
The program can't magically split lines the way you want. You will need to read in one line at a time and parse it yourself based on the format.
Since there are two integers and an indeterminate number of (what I assume are) space-delimited words, you may be able to use a regular expression to find the integers then use them as delimiters to split up the line.
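A minimal sketch of that idea, assuming the text parts never contain a standalone integer of their own. The names INT_DELIMITER and split_record are made up for illustration; the capturing group is what makes re.split keep the integers it splits on:

```python
import re

# Whitespace, a run of digits ending at a word boundary, then optional whitespace.
# The capturing group makes re.split return the integers along with the text.
INT_DELIMITER = re.compile(r'\s+(\d+)\b\s*')

def split_record(line):
    """Split 'text int text int' into a list of strings and ints."""
    parts = [part for part in INT_DELIMITER.split(line.strip()) if part]
    return [int(part) if part.isdigit() else part for part in parts]

print(split_record("Hello 5 Sample String 10\n"))
# ['Hello', 5, 'Sample String', 10]
```

Each line from the file can be passed through split_record in the existing loop, and the four parts can then be handled separately.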

Python 2 - How do I import a text file containing a long sequence of digits and convert it to a string of individual numbers?

I want to take a text file that is a long sequence of digits with line breaks,
i.e. something like
38482406847387
85869153438194
96531040384827
43157689643163
but much larger, and convert it to a string that would just read
[3,8,4,8,...,1,6,3]
so that I can iterate over it, manipulate it, visualise it and so on.
I have had a look at the open() function but so far I can only get it to break up the file into separate lines. I know I can use a for loop to go through the giant string of the whole document and form a list that way, but then I get '\n' and spaces showing up everywhere, which is undesirable.
For context, I grabbed a text file from the web of some preposterous number of digits of pi, and I thought it would be instructive and interesting to go through it and look for patterns, plot the distribution of digits, convert to ASCII and other such nonsense. I figured it would be a fun way for me to learn a bit more about Python.
import re
print(re.findall(r'\d', open('file.txt', 'r').read()))
with open("/path/to/file") as f:
    print([int(x) for x in f.read() if x.isdigit()])
This is shorter.
If you get all the numbers as a string then you can use
# here, digits is the numbers as a string including the \n character
digit_list = [digit for digit in digits.replace('\n', '')]  # named digit_list to avoid shadowing the built-in list
For small files:
with open('path/to/file') as infile:
    answer = list(int(i) for i in ''.join(line.strip() for line in infile))
For larger files:
answer = []
with open('path/to/file') as infile:
    for line in infile:
        answer.extend([int(i) for i in line.strip()])
