I am relatively new to Python and got stuck on the following.
Below is the code I am working with:
import re
handle = open('RegExWeek2.txt')
for line in handle:
    line = line.rstrip()
    x = re.findall('[0-9]+', line)
    if len(x) > 0:
        print x
The return from this code looks like this:
['7430']
['9401', '9431']
['2248', '2047']
['5517']
['3184', '1241']
['9939']
['2185', '9450', '8428']
['369']
['3683', '6442', '7654']
Question: how do I combine this to one list and sum up the numbers?
Please help
You may change your code like this,
import re
handle = open('RegExWeek2.txt')
num = []
for line in handle:
    num.extend(re.findall('[0-9]+', line))
print sum(int(i) for i in num)
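Since you're using re.findall, the line.rstrip() call is not necessary.
The len(x) > 0 check is not needed either: re.findall returns an empty list for a line without digits, and extending num with an empty list changes nothing. The matches themselves can never be empty strings, since we are using + next to [0-9] (repeats the previous token one or more times), not * (zero or more times).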
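A minimal sketch of the point above, assuming a few made-up sample lines in place of RegExWeek2.txt (the list sample_lines and the string 'a1b' are illustrations, not data from the question). It is written with print() so it also runs on Python 3:

```python
import re

# Hypothetical sample lines standing in for the contents of RegExWeek2.txt
sample_lines = ["Why 7430 should", "no digits here", "9401 and 9431"]

num = []
for line in sample_lines:
    # findall returns [] for the digit-free line, so extend adds nothing
    num.extend(re.findall('[0-9]+', line))

total = sum(int(i) for i in num)
print(num)
print(total)

# With * instead of +, zero-length matches show up as empty strings
star_matches = re.findall('[0-9]*', 'a1b')
print(star_matches)
```

The empty strings produced by the * version are what would break int(), which is why + is the right quantifier here.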
There's no need to rstrip, and you should open files using with:
import re
all_numbers = []
with open('RegExWeek2.txt') as file:
    for line in file:
        numbers = re.findall('[0-9]+', line)
        for number in numbers:
            all_numbers.append(int(number))
print(sum(all_numbers))
This is really beginner code, and a direct translation of yours. Here's how I would write it:
with open('RegExWeek2.txt') as file:
    all_numbers = [int(num) for num in re.findall('[0-9]+', file.read())]
print(sum(all_numbers))
Related
I'm a total noob to Python and need some help with my code.
The code is meant to take Input.txt [http://pastebin.com/bMdjrqFE], split it into separate Pokemon (in a list), and then split that into separate values which I use to reformat the data and write it to Output.txt.
However, when I run the program, only the last Pokemon gets outputted, 386 times. [http://pastebin.com/wkHzvvgE]
Here's my code:
f = open("Input.txt", "r")#opens the file (input.txt)
nf = open("Output.txt", "w")#opens the file (output.txt)
pokeData = []
for line in f:
#print "%r" % line
pokeData.append(line)
num = 0
tab = """ """
newl = """NEWL
"""
slash = "/"
while num != 386:
current = pokeData
current.append(line)
print current[num]
for tab in current:
words = tab.split()
print words
for newl in words:
nf.write('%s:{num:%s,species:"%s",types:["%s","%s"],baseStats:{hp:%s,atk:%s,def:%s,spa:%s,spd:%s,spe:%s},abilities:{0:"%s"},{1:"%s"},heightm:%s,weightkg:%s,color:"Who cares",eggGroups:["%s"],["%s"]},\n' % (str(words[2]).lower(),str(words[1]),str(words[2]),str(words[3]),str(words[4]),str(words[5]),str(words[6]),str(words[7]),str(words[8]),str(words[9]),str(words[10]),str(words[12]).replace("_"," "),str(words[12]),str(words[14]),str(words[15]),str(words[16]),str(words[16])))
num = num + 1
nf.close()
f.close()
There are quite a few problems with your program starting with the file reading.
To read the lines of a file into a list you can use file.readlines().
So instead of
f = open("Input.txt", "r")#opens the file (input.txt)
pokeData = []
for line in f:
    #print "%r" % line
    pokeData.append(line)
You can just do this
pokeData = open("Input.txt", "r").readlines() # This returns all the lines as a list.
Next, you are misunderstanding the uses of for and while.
A for loop in Python iterates over a list (or any other iterable), as shown below. I don't know what you were trying to do with for newl in words: a for loop creates a new variable (overwriting any existing one with that name) and assigns each element of the list to it in turn.
array = ["one", "two", "three"]
for i in array: # i is created
    print (i)
The output will be:
one
two
three
So to fix a lot of this code, you can replace the whole while loop with something like this.
(The code below assumes your input file is formatted so that all the words are separated by tabs.)
for line in pokeData:
    words = line.split(tab) # Split the line by tabs
    nf.write('your very long and complicated string')
Other helpers
The formatted string that you write to the output file looks very similar to JSON. Python's standard library includes a module called json that can convert a native Python dict to a JSON string. This will probably make things a lot easier for you, but either way works.
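As a rough sketch of the json suggestion: the record below is hypothetical (the field names mirror the question's format string, the values are made up), and note that json.dumps quotes every key, so the output is strict JSON rather than an exact copy of the hand-built format.

```python
import json

# Hypothetical record; field names follow the question's format string,
# values are invented for illustration only.
pokemon = {
    "num": 1,
    "species": "Bulbasaur",
    "types": ["Grass", "Poison"],
    "baseStats": {"hp": 45, "atk": 49},
}

# Key the record by the lowercased species name, as the original write() does
entry = {pokemon["species"].lower(): pokemon}

# sort_keys gives a stable key order in the output string
line = json.dumps(entry, sort_keys=True)
print(line)
```

Building a dict per input line and dumping it avoids the long positional % format string, where one misplaced index silently shifts every later field.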
Hope this helps
The basic outline of this problem is to read the file, look for integers using the re.findall(), looking for a regular expression of '[0-9]+' and then converting the extracted strings to integers and summing up the integers.
I have finished the problem, but I would like to go extra and condense the code down to two lines.
This is my original code:
import re
fh = raw_input("Enter filename: ")
#returns regex_sum_241882.txt as default when nothing is entered
if len(fh)<1 : fh = "regex_sum_241882.txt"
file = open(fh)
sums = list()
#goes through each line in the file
for line in file:
    #finds the numbers in each line and puts them in a list
    nums = re.findall('[0-9]+',line)
    #adds the numbers to an existing list
    for num in nums:
        sums.append(int(num))
#sums the list
print sum(sums)
Now here's my current compact code:
import re
lst = list()
print sum(for num in re.findall('[0-9]+',open("regex_sum_241882.txt").read())): int(num))
It doesn't work and gives me SyntaxError: invalid syntax
Can anyone point me in the right direction?
I feel I'm doing the same thing, but I'm not sure what the syntaxerror is about.
Try this way:
print sum(int(num) for num in re.findall('[0-9]+', open("regex_sum_241882.txt").read()))
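The SyntaxError comes from the order of the pieces: in a generator expression the value expression (int(num)) comes first and the for clause follows, with no colon. A small sketch, assuming a made-up string in place of the file contents:

```python
import re

text = "abc 12 def 30\nxyz 8"  # hypothetical stand-in for the file contents

# Generator expression: value expression first, then the for clause
total = sum(int(num) for num in re.findall('[0-9]+', text))
print(total)
```

The separate lst = list() line is no longer needed, since nothing is appended to a list.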
In Python, I'm reading a large file with many many lines. Each line contains a number and then a string such as:
[37273738] Hello world!
[83847273747] Hey my name is James!
And so on...
After I read the txt file and put it into a list, I was wondering how I would be able to extract the number and then sort that whole line of code based on the number?
file = open("info.txt","r")
myList = []
for line in file:
    line = line.split()
    myList.append(line)
What I would like to do:
since the number in message one falls between 37273700 and 38000000, I'll sort that (along with all other lines that follow that rule) into a separate list
This does exactly what you need (for the sorting part). Each line[0] looks like '[37273738]', so the slice [1:-1] drops the brackets:
my_sorted_list = sorted(myList, key=lambda line: int(line[0][1:-1]))
Use a tuple with the number as its first element, converted to int so the sort is numeric rather than alphabetical:
for line in file:
    line = line.split()
    keyval = (int(line[0].replace('[','').replace(']','')), line[1:])
    print(keyval)
    myList.append(keyval)
Then sort:
my_sorted_list = sorted(myList, key=lambda line: line[0])
How about:
# ---
# Function which gets a number from a line like so:
# - searches for the pattern: start_of_line, [, sequence of digits
# - if that's not found (e.g. empty line) return 0
# - if it is found, try to convert it to a number type
# - return the number, or 0 if that conversion fails
def extract_number(line):
    import re
    # raw string so the backslashes reach the regex engine unchanged
    search_result = re.findall(r'^\[(\d+)\]', line)
    if not search_result:
        num = 0
    else:
        try:
            num = int(search_result[0])
        except ValueError:
            num = 0
    return num
# ---
# Read all the lines into a list
with open("info.txt") as f:
    lines = f.readlines()
# Sort them using the number function above, and print them
lines = sorted(lines, key=extract_number)
print ''.join(lines)
It's more resilient in the case of lines without numbers, it's more adjustable if the numbers might appear in different places (e.g. spaces at the start of the line).
(Obligatory suggestion not to use file as a variable name because it's a builtin function name already, and that's confusing).
Now there's an extract_number() function, it's easier to filter:
lines2 = [L for L in lines if 37273700 < extract_number(L) < 38000000]
print ''.join(lines2)
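For a quick check of the sort-then-filter idea without needing info.txt, here is a self-contained sketch. The helper name leading_number and the sample lines are assumptions made for this illustration; the helper is a compact variant of the function above, not a replacement for it.

```python
import re

def leading_number(line):
    """Return the integer inside the leading [..], or 0 if there is none."""
    match = re.match(r'\[(\d+)\]', line)
    return int(match.group(1)) if match else 0

# Hypothetical sample lines in the question's format
lines = [
    "[83847273747] Hey my name is James!\n",
    "[37273738] Hello world!\n",
    "no number on this line\n",
]

# Numeric sort: the line without a number gets key 0 and comes first
ordered = sorted(lines, key=leading_number)

# Keep only the lines whose number falls inside the requested range
in_range = [l for l in lines if 37273700 < leading_number(l) < 38000000]
print(''.join(ordered))
```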
basically i'm trying to complete a read file. i have made the "make" file that will generate 10 random numbers and write it to a text file. here's what i have so far for the "read" file...
def main():
    infile = open('mynumbers.txt', 'r')
    nums = []
    line = infile.readline()
    print ('The random numbers were:')
    while line:
        nums.append(int(line))
        print (line)
        line = infile.readline()
    total = sum(line)
    print ('The total of the random numbers is:', total)
main()
i know it's incomplete, i'm still a beginner at this and this is my first introduction to computer programming or python. basically i have to use a loop to gather up the sum of all the numbers that were listed in the mynumbers.txt. any help would be GREATLY appreciated. this has been driving me up a wall.
You don't need to iterate manually in Python (this isn't C, after all):
nums = []
with open("mynumbers.txt") as infile:
    for line in infile:
        nums.append(int(line))
Now you just have to take the sum, but of nums, of course, not of line:
total = sum(nums)
The usual one-liner:
total = sum(map(int, open("mynumbers.txt")))
In Python 2, map does generate a list of integers (albeit very temporarily); in Python 3, map is lazy and no list is built.
Although I would go with Tim's answer above, here's another way if you want to use the readlines method:
# Open a file
infile = open('mynumbers.txt', 'r')
total = 0  # named total so the built-in sum() is not shadowed
lines = infile.readlines()
for num in lines:
    total += int(num)
print total
Just another solution... :-)
with open("x.txt") as file:
    total = sum(int(line) for line in file)
This solution sums the "results" of a generator object so it isn't memory intensive yet short and elegant (pythonic).
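To try the generator approach without an existing file, a sketch along these lines should work. It writes a throwaway temporary file first; the three numbers are made-up values, not output from the "make" script.

```python
import os
import tempfile

# Write a small throwaway file with one integer per line (made-up values)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("3\n14\n15\n")
    path = tmp.name

# The generator expression yields one int at a time, so no list is built
with open(path) as infile:
    total = sum(int(line) for line in infile)

os.remove(path)  # clean up the temporary file
print(total)
```

int() ignores the trailing newline on each line, which is why no strip call is needed.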
I just started learning python 3 weeks ago, I apologize if this is really basic. I needed to open a .txt file and print the length of the longest line of code in the file. I just made a random file named it myfile and saved it to my desktop.
myfile= open('myfile', 'r')
line= myfile.readlines()
len(max(line))-1
# (the "-1" is to remove the \n)
Is this code correct? I put it in interpreter and it seemed to work OK.
But I got it wrong because apparently I was supposed to use a while loop. Now I am trying to figure out how to put it in a while loop. I've read what it says on python.org, watched videos on youtube and looked through this site. I just am not getting it. The example to follow that was given is this:
import os
du = os.popen('du /usr/local')
while 1:
    line = du.readline()
    if not line:
        break
    if list(line).count('/') == 3:
        print line,
print max([len(line) for line in file(filename).readlines()])
Taking what you have and stripping out the parts you don't need
myfile = open('myfile', 'r')
max_len = 0
while 1:
    line = myfile.readline()
    if not line:
        break
    if len(line) # ... something
        # something
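Note that this is a crappy way to loop over a file; iterating with for line in myfile is simpler. The loop ends because readline() returns an empty string at end of file. A blank line in the middle of the file still contains '\n', so it does not stop the loop early. But homework is homework...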
max(['b','aaa']) is 'b'
This lexicographic order isn't what you want to maximise. You can use the key argument to choose a different function to maximise, like len.
max(['b','aaa'], key=len) is 'aaa'
So the solution could be: len(max(['b','aaa'], key=len)), which is 3.
A more elegant solution would be to use a generator expression:
max ( len(line)-1 for line in myfile.readlines() )
As an aside, you should open the file in a with statement, which takes care of closing the file at the end of the indented block:
with open('myfile', 'r') as mf:
    print max ( len(line)-1 for line in mf.readlines() )
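The difference between the default ordering and key=len noted above can be checked directly. This is only a sketch with made-up lines standing in for the file; rstrip("\n") is used instead of -1 so a final line without a newline is not shortened by mistake.

```python
lines = ["b\n", "aaa\n", "zz\n"]  # hypothetical file lines

# Default max compares strings lexicographically, not by length
lexicographic_max = max(lines)

# key=len makes max compare the lengths instead
longest_line = max(lines, key=len)

# Length of the longest line, not counting the trailing newline
longest_length = max(len(line.rstrip("\n")) for line in lines)
print(longest_length)
```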
As others have mentioned, you need to find the line with the maximum length, which means giving the max() function a key= argument to extract the length from each of the lines in the list you pass it.
Likewise, in a while loop you'd need to read each line and see if its length was greater than the longest one you had seen so far, which you could store in a separate variable and initialize to 0 before the loop.
BTW, you would not want to open the file with os.popen() as shown in your second example.
I think it will be easier to understand if we keep it simple:
max_len = -1 # Nothing was read so far
with open("filename.txt", "r") as f: # Opens the file and magically closes at the end
    for line in f:
        max_len = max(max_len, len(line))
print max_len
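As this is homework... I would ask myself if I should count the line feed character or not. If you need to chop the last char, replace len(line) with len(line[:-1]). Note that this also chops a real character if the last line of the file has no trailing newline; len(line.rstrip('\n')) avoids that.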
If you have to use while, try this:
max_len = -1 # Nothing was read
with open("t.txt", "r") as f: # Opens the file
    while True:
        line = f.readline()
        if(len(line)==0):
            break
        max_len = max(max_len, len(line[:-1]))
print max_len
For those still in need. This is a little function which does what you need:
def get_longest_line(filename):
    length_lines_list = []
    # The with block closes the file; a close() after return would never run
    with open(filename, "r") as open_file_name:
        all_text = open_file_name.readlines()
    for line in all_text:
        length_lines_list.append(len(line))
    max_length_line = max(length_lines_list)
    # Return the first line whose length matches the maximum
    for line in all_text:
        if len(line) == max_length_line:
            return line.strip()