Find numbers of 8 characters inside a txt file - python

I have a text file of about 1 megabyte that contains numeric strings and letter strings with lengths such as 3, 5, 8, 9, or 10 characters. How can I find all the numbers that are exactly 8 characters long? Once found, these numbers must be extracted and saved to a file named extracted.txt. How can I do this?
Example...
file.txt
91664356
1665
00643
qouytyi
15790008
1567065
abcdeigf
qoiyytgxf
931467846
00851685
150033561246788
074226899
extracted.txt
91664356
15790008
15670654
00851685

Use:
import re
with open('data.txt', 'r') as myfile:
    data = myfile.read()
# Lookarounds (rather than \D on each side) also match at the start or end of the file.
numbers = re.findall(r'(?<!\d)\d{8}(?!\d)', data)
This matches runs of exactly 8 digits, so values such as 478319.3 are skipped.
It returns a list of the matching numbers as strings.
Example
Suppose the file contains:
123.32 is a good number 12 also 12345678 478319.3
The output will be:
['12345678']

Does the source file look like this?
12345678 123456789 1234567 abcdefg abcdefgh abcdefghi
If so, this script may help:
with open('stack.txt') as f, open('save.txt', 'w') as s:
    for i in f.read().split():  # split on any whitespace, including newlines
        if len(i) != 8:
            continue
        try:
            print(int(i))  # int() raises ValueError if i is not a number
            s.write(i + '\n')  # one number per line
        except ValueError:
            pass
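For the one-value-per-line layout shown in the question, a line-based filter is another option. The sketch below is a minimal version under stated assumptions: the helper names `extract_eight_digit_numbers` and `extract_file` are hypothetical, the file names are taken from the question, "number" is taken to mean ASCII digits only, and `str.isascii()` requires Python 3.7 or later.

```python
def extract_eight_digit_numbers(lines):
    """Return the stripped lines that consist of exactly 8 ASCII digits."""
    result = []
    for line in lines:
        token = line.strip()
        # isascii() excludes non-ASCII digit characters that isdigit() accepts
        if len(token) == 8 and token.isascii() and token.isdigit():
            result.append(token)
    return result


def extract_file(src="file.txt", dst="extracted.txt"):
    """Read src line by line and write the 8-digit numbers to dst."""
    with open(src) as fin:
        numbers = extract_eight_digit_numbers(fin)
    with open(dst, "w") as fout:
        fout.writelines(n + "\n" for n in numbers)
    return numbers


# A few lines from the question's sample data
sample = ["91664356\n", "1665\n", "abcdeigf\n", "931467846\n", "00851685\n"]
print(extract_eight_digit_numbers(sample))  # ['91664356', '00851685']
```

Keeping the values as strings preserves leading zeros such as those in 00851685, which converting to int would drop.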

Related

Formatted strings, decimals and commas question

I have a .txt file that I read in, and I want to build formatted strings from its values. Columns 3 and 4 need commas, and the last column needs a percent sign and 2 decimal places. The formatted string should say something like "The overall attendance at Bulls was 894659, average attendance was 21,820 and the capacity was 104.30%".
The shortened .txt file has these lines:
1 Bulls 894659 21820 104.3
2 Cavaliers 843042 20562 100
3 Mavericks 825901 20143 104.9
4 Raptors 812863 19825 100.1
5 NY_Knicks 812292 19812 100
So far my code looks like this, and it's mostly working, minus the commas and decimal places.
file_1 = open('basketball.txt', 'r')
count = 0
list_1 = []
for line in file_1:
    count += 1
    textline = line.strip()
    items = textline.split()
    list_1.append(items)
print('Number of teams: ', count)
for line in list_1:
    print('Line: ', line)
file_1.close()
for line in list_1:  # iterate over the parsed lines and print them as formatted strings
    a, b, c, d, e = line
    print(f'The overall attendance at the {b} game was {c}, average attendance was {d}, and the capacity was {e}%.')
Any help with formatting the output to show the numbers with commas (21820 -> 21,820) and the last column with 2 decimals and a percent sign (104.3 -> 104.30%) is greatly appreciated.
You've got some options for how to tackle this.
Option 1: Using f-strings (Python 3.6+ only)
Since your code already uses f-strings, this solution should work for you. For others reading here, f-strings are only available in Python 3.6 and later.
You can apply string formatting inside an f-string by putting a colon : after the expression within the curly brackets {}, followed by any of the usual Python format specifiers.
So you only need to change one line of your code. Your print line would look like:
print(f'The overall attendance at the {b} game was {int(c):,}, average attendance was {int(d):,}, and the capacity was {float(e):.2f}%.')
The variables are getting interpreted as:
The {b} just prints the string b.
The {int(c):,} and {int(d):,} print the integer versions of c and d, respectively, with commas (indicated by the :,).
The {float(e):.2f} prints the float version of e with two decimal places (indicated by the :.2f).
Option 2: Using str.format()
For others here who are looking for a Python 2 friendly solution (2.7 or later, which added the , separator), you can change the print line to the following:
print("The overall attendance at the {} game was {:,}, average attendance was {:,}, and the capacity was {:.2f}%.".format(b, int(c), int(d), float(e)))
Note that both options use the same formatting syntax; the f-string option just has the benefit of letting you write each variable name right where it will appear in the resulting string.
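As a quick check of the format specifiers described above, the sketch below applies them to one line from the question's sample data. The helper name `format_attendance` is hypothetical, and it assumes every line has exactly five whitespace-separated columns.

```python
def format_attendance(line):
    """Format one 'rank team total average capacity' line as a sentence."""
    _rank, team, total, average, capacity = line.split()
    return (
        f"The overall attendance at the {team} game was {int(total):,}, "
        f"average attendance was {int(average):,}, "
        f"and the capacity was {float(capacity):.2f}%."
    )


print(format_attendance("1 Bulls 894659 21820 104.3"))
# The overall attendance at the Bulls game was 894,659, average attendance was 21,820, and the capacity was 104.30%.
```

The values are converted with int() and float() first because the `,` and `.2f` specifiers apply to numbers, not to the strings produced by split().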
This is how I ended up doing it, very similar to the response from Bibit.
file_1 = open('something.txt', 'r')
count = 0
list_1 = []
for line in file_1:
    count += 1
    textline = line.strip()
    items = textline.split()
    items[2] = int(items[2])
    items[3] = int(items[3])
    items[4] = float(items[4])
    list_1.append(items)
print('Number of teams/rows: ', count)
for line in list_1:
    print('Line: ', line)
file_1.close()
for line in list_1:
    print('The overall attendance at the {:s} games was {:,}, average attendance was {:,}, and the capacity was {:.2f}%.'.format(line[1], line[2], line[3], line[4]))

Why are the sizes of these binary files equal although they should not be?

While writing a simple Python script, I encountered a weird problem: two files with different content have the same size.
I have the same binary data in two forms, one as a string and one as a list of ints:
char_list = '10101010'
int_list = [1, 0, 1, 0, 1, 0, 1, 0]
Then I convert both to bytearray:
bytes_from_chars = bytearray(char_list, "ascii")
bytes_from_ints = bytearray(int_list)
Printing these gives me this result:
bytearray(b'10101010')
bytearray(b'\x01\x00\x01\x00\x01\x00\x01\x00')
So far, this is fine.
Writing this data to disk:
with open("from_chars.hex", "wb") as f:
    f.write(bytes_from_chars)
with open("from_ints.hex", "wb") as f:
    f.write(bytes_from_ints)
The files are the same size, but they contain different data!
ls -l:
hexdump of files:
My question is: why are the file sizes equal? As far as I know, writing the value 0 or 1 needs 1 bit, and writing the hex value 30 or 31 needs 5 bits (1 1110 and 1 1111).
A single bit is not enough to write the value 0 or 1 as a separate item. If values were stored without a fixed width, how could you tell the number 3 (binary 11) apart from two consecutive 1s?
In both cases you are writing an array of 8 bytes. In the first case each byte holds the ASCII code of a character ('0' is 0x30, '1' is 0x31); in the second, each byte holds the integer 0 or 1.
Think of it as writing words made of the letters 0 and 1: the word for 1 is 0000 0001. Without the leading 0s, you would not be able to tell where one word ends and the next begins.
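To make the byte counts concrete, the sketch below checks the lengths of both bytearrays from the question and, as a contrast, packs the same eight bits into a single byte. The packing step via `int(..., 2)` and `int.to_bytes` is an illustration added here, not something the question's script does.

```python
char_list = '10101010'
int_list = [1, 0, 1, 0, 1, 0, 1, 0]

bytes_from_chars = bytearray(char_list, "ascii")  # one byte per character
bytes_from_ints = bytearray(int_list)             # one byte per list element

print(len(bytes_from_chars), len(bytes_from_ints))  # 8 8

# Packing the eight 0/1 values into the bits of one byte
packed = int(char_list, 2).to_bytes(1, "big")
print(packed, len(packed))  # b'\xaa' 1
```

Only the packed form uses one bit per value; both files in the question spend a full byte on each 0 or 1.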

Looking at a list of numbers and getting that number from another file

I don't really know how to word the question, but I have a file with a number and a decimal next to it on each line, like so (the file name is num.txt):
33 0.239
78 0.298
85 1.993
96 0.985
107 1.323
108 1.000
I have a list of numbers that I want to look up in the file; for each match, I want to take the decimal number and append it to a list:
['78','85','108']
Here is my code so far:
chosen_number = ['78','85','108']
memory_list = []
for line in open('path/to/num.txt'):
    checker = line[0:2]
    if not checker in chosen_number: continue
    dec = line.split()[-1]
    memory_list.append(float(dec))
The error I get is that it is not in a list, and the code only accounts for the 3 digit numbers. I don't really understand why this is happening and would like some tips on how to fix it. Thanks.
As for the error, there is no actual error. The only problem is that the script ignores the two digit numbers and only gets the three digit numbers. I want it to get both the 2 and 3 digit numbers. For example, the script would pass over 78 and 85, going to the line with '108'.
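Your checker only looks at the first two characters of each line (line[0:2]), so it cannot equal a three-character entry such as '108'. The code below works.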
N.B. I have used startswith because the number might appear elsewhere in the line.
chosen_number = ['78','85','108']
memory_list = []
with open('path/to/num.txt') as f:
    for line in f:
        if any(line.startswith(i) for i in chosen_number):
            memory_list.append(float(line.split()[1]))
print(memory_list)
Output:
[0.298, 1.993, 1.0]
The following should work:
chosen_number = ['78','85','108']
memory_list = []
with open('num.txt') as f_input:
    for line in f_input:
        v1, v2 = line.split()
        if v1 in chosen_number:
            memory_list.append(float(v2))
print(memory_list)
Giving you:
[0.298, 1.993, 1.0]
Also, it is better to use a with statement when dealing with files so that the file is automatically closed afterwards.
Try this code:
chosen_number = ['78 ', '85 ', '108 ']
memory_list = []
for line in open("num.txt"):
    for num in chosen_number:
        if num in line:
            dec = line.split()[-1]
            memory_list.append(float(dec))
In chosen_number, I declared each number with a trailing space, e.g. '85 '. Otherwise the if condition would also be true for the line containing 0.985, because the values are compared as substrings. I hope that's clear enough.
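Both substring-based approaches above can still match unintended lines (for instance, `startswith('78')` also accepts a line starting with 780). A minimal sketch that compares the whole first column instead is shown below. The helper name `lookup_values` is hypothetical, the `780 2.500` line is added here only to show the difference, and each line is assumed to hold exactly two whitespace-separated fields.

```python
def lookup_values(lines, chosen):
    """Return the second-column floats for lines whose first column is in chosen."""
    wanted = set(chosen)  # set membership is an exact, whole-token comparison
    values = []
    for line in lines:
        key, value = line.split()
        if key in wanted:
            values.append(float(value))
    return values


# The question's data plus one hypothetical line ("780 2.500") that a
# prefix or substring test would wrongly accept.
sample = ["33 0.239", "78 0.298", "85 1.993", "96 0.985",
          "107 1.323", "108 1.000", "780 2.500"]
print(lookup_values(sample, ['78', '85', '108']))  # [0.298, 1.993, 1.0]
```

The same function works on an open file object, since iterating over a file yields its lines.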

How does len function actually work for files?

The Python docs say: Return the length (the number of items) of an object. The argument may be a sequence (string, tuple or list) or a mapping (dictionary).
Code:
from sys import argv
script, from_file = argv
input = open(from_file)
indata = input.read()
print("The input file is %d bytes long" % len(indata))
Contents of the file:
One two three
Upon running this simple program I get as output: The input file is 14 bytes long
Question:
I don't understand: if my file contains only the 11 characters of "One two three", how can len return 14 bytes and not simply 11? (What's with the bytes, by the way?) In the Python interpreter, if I type s = "One two three" and then len(s), I get 13, so I am very confused.
"One two three" is indeed 13 chars (11 letters + 2 spaces).
>>> open("file.txt", 'w').write("one two three")
>>> len(open("file.txt").read())
13
Most likely you have an extra character for the line ending, which explains the 14.
One two three
one = 3 characters
two = 3 characters
three = 5 characters
and then you have two spaces. So a total of 13 characters.
When reading from the file, there is an extra space in your file, so you get 14 characters.
In your Python interpreter, do this:
st = " "
len(st)
output = 1
I used your code and created a file with content similar to yours. The result: you do indeed have an extra non-printable character in your "One two three" file. It's the only explanation. A space or a line break are the most obvious things to look for.
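One way to see the extra character is to print the `repr` of what was read. The sketch below writes the question's text with a trailing newline to a temporary file and inspects the result. The trailing newline is an assumption about what the editor saved, and the temporary file stands in for the asker's actual file.

```python
import os
import tempfile

# Write the text plus a trailing newline, as many editors do on save (assumed here).
# newline="" turns off newline translation so the count is the same on every platform.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write("One two three\n")
    path = tmp.name

with open(path, newline="") as f:
    indata = f.read()
os.remove(path)

print(len(indata))               # 14
print(repr(indata))              # 'One two three\n'
print(len(indata.rstrip("\n")))  # 13
```

Unlike a plain print, repr shows the newline as a visible `\n`, which makes trailing whitespace easy to spot.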

Read integers from file using struct.unpack in Python

Suppose I have a file name num.txt as below:
1 2 3 4 5
6 7 8 9 0
I want to read 3 integers from this file, that is 1 2 3.
I know that struct.unpack might do the trick, but I just cannot get it right.
Here is how I did it:
import struct
fp = open('num.txt', 'rb')
print(struct.unpack('iii', fp.read(12)))  # right?
Can anyone help me with this?
PS
This is how I got the file num.txt:
fp = open('num.txt', 'wb')
fp.write(b'1 2 3 4 5\n6 7 8 9 0')
fp.close()
You don't use struct to read numbers from a text file. It is for reading data from a binary file, where the first byte is actually 0x01 rather than a byte order mark or the encoded value of the character '1'.
With numfile as the open file object, you just want
three_ints = [int(x) for x in numfile.readline().strip().split(' ')[:3]]
if you're only interested in the first three numbers, or
all_ints = [[int(x) for x in line.split()] for line in numfile]
if you want a list of lists of the ints on each line.
struct is used for C-style binary representations of numbers. If you have text representations instead then you should just pass them to int().
>>> [int(x) for x in '1 2 3 4 5'.split()]
[1, 2, 3, 4, 5]
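To illustrate the difference, the sketch below parses the question's text with `int()` and then shows what `struct` is meant for: a round trip through a binary representation. The `'<3i'` format (three little-endian 4-byte ints) is chosen here to get a fixed 12-byte size and is not from the original posts.

```python
import struct

text = '1 2 3 4 5\n6 7 8 9 0'

# Text representation: split on whitespace and convert each token.
first_three = [int(x) for x in text.split()[:3]]
print(first_three)  # [1, 2, 3]

# Binary representation: three little-endian 4-byte ints.
packed = struct.pack('<3i', 1, 2, 3)
print(len(packed))                   # 12
print(struct.unpack('<3i', packed))  # (1, 2, 3)

# Unpacking the first 12 bytes of the text gives unrelated values,
# because those bytes are character codes, not packed integers.
garbage = struct.unpack('<3i', text.encode('ascii')[:12])
print(garbage)
```

The last call succeeds without error, which is why the question's attempt printed numbers that had nothing to do with 1, 2, and 3.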
