Reading a txt file with numbers and summing them in Python

I have a txt file with the following text in it:
2
4 8 15 16 23 42
1 3 5
6
66 77
77
888
888 77
34
23 234 234
1
32
3
23 23 23
365
22 12
I need a way to read the file and sum all the numbers.
I have this code so far but am not sure what to do next. Thanks in advance.
lstComplete = []
fichNbr = open("nombres.txt", "r")
lstComplete = fichNbr
somme = 0
for i in lstComplete:
    i = i.split()

Turn them into a list and sum them:
with open('nombres.txt', 'r') as f:
    num_list = f.read().split()
print(sum(int(n) for n in num_list))
Returns 3227

Open the file, use the read() method to get the contents, convert each string to an int, and use sum() to get the result:
>>> sum(map(int,open('nombres.txt').read().split()))
3227
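For completeness, the asker's own loop could also be finished in place. A minimal sketch reusing their variable names (ligne and mot are my own additions):
somme = 0
with open("nombres.txt", "r") as fichNbr:
    for ligne in fichNbr:
        for mot in ligne.split():
            somme += int(mot)
print(somme)  # 3227 for the sample file above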

Related

How to read one column of data row by row from a csv file using Python

I have a dataset with three inputs: x1, x2, x3. I want to read just the x2 column, and from that column the data row by row.
I wrote the code below, but it only shows letters.
Here is my code:
import pandas as pd

data = pd.read_csv('data6.csv')
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num =+ 1
print(x)
Result: x1, x2, x3
The output I expected is column x2, read row by row:
65
32
14
25
85
47
63
21
98
65
21
47
48
49
46
43
48
25
28
29
37
Subset of my csv file:
x1 x2 x3
6 65 78
5 32 59
5 14 547
6 25 69
7 85 57
8 47 51
9 63 26
3 21 38
2 98 24
7 65 96
1 21 85
5 47 94
9 48 15
4 49 27
3 46 96
6 43 32
5 48 10
8 25 75
5 28 20
2 29 30
7 37 96
Can anyone help me solve this?
If you want a list from x2, use:
x = data['x2'].tolist()
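Used on the sample above, that should give (assuming data6.csv matches the subset shown):
print(data['x2'].tolist())  # [65, 32, 14, 25, 85, 47, 63, 21, 98, 65, 21, 47, 48, 49, 46, 43, 48, 25, 28, 29, 37]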
I am not sure I even get what you're trying to do from your code.
What you're doing (after fixing the indentation to make it somewhat correct):
Iterate through all columns of your dataframe
Take the first character of the column name if row_num is equal to 1.
Based on this guess:
import pandas as pd

data = pd.read_csv("data6.csv")
row_num = 0
x = []
for col in data:
    if row_num == 1:
        x.append(col[0])
    row_num = +1
print(x)
What you probably want to do:
import pandas as pd

data = pd.read_csv("data6.csv")
# Make a list containing the values in column 'x2'
x = list(data['x2'])
# Print all values at once:
print(x)
# Print one value per line:
for val in x:
    print(val)
Since you are already using pandas, you can convert any column straight to a list with list(); no for loop is needed:
import pandas as pd
data = pd.read_csv('data6.csv')
print(list(data['x2']))

Parsing a text file as fast as possible

I have a very large file with lines like the following:
....
0.040027 a b c d e 12 34 56 78 90 12 34 56
0.050027 f g h i l 12 34 56 78 90 12 34 56
0.060027 a b c d e 12 34 56 78 90 12 34 56
0.070027 f g h i l 12 34 56 78 90 12 34 56
0.080027 a b c d e 12 34 56 78 90 12 34 56
0.090027 f g h i l 12 34 56 78 90 12 34 56
....
I need to build a dictionary like the one below, as fast as possible. I am using the following code:
import time

ascFile = open('C:\\example.txt', 'r', encoding='UTF-8')
tag1 = ' a b c d e '
tag2 = ' f g h i l '
tags = [tag1, tag2]
temp = {'k1': [], 'k2': []}
key_tag = {'k1': tag1, 'k2': tag2}
t1 = time.time()
for line in ascFile:
    for path, tag in key_tag.items():
        if tag in line:
            columns = line.strip().split(tag, 1)
            temp[path].append([columns[0], columns[-1].replace(' ', '')])
t2 = time.time()
print(t2 - t1)
Parsing a 360 MB file takes about 6 seconds and gives the result below; I'd like to improve on that time.
temp = {'k1': [['0.040027', '1234567890123456'], ['0.060027', '1234567890123456'], ['0.080027', '1234567890123456']],
        'k2': [['0.050027', '1234567890123456'], ['0.070027', '1234567890123456'], ['0.090027', '1234567890123456']]}
I assume you have a fixed number of words in the file that are your keys. Use split to break the string, then take a slice of the split list to compute your key directly:
import collections

# raw strings don't need \\ for backslash:
FILESPEC = r'C:\example.txt'

lines_by_key = collections.defaultdict(list)
with open(FILESPEC, 'r', encoding='UTF-8') as f:
    for line in f:
        cols = line.split()
        key = ' '.join(cols[1:6])
        pair = (cols[0], ''.join(cols[6:]))  # tuple, not list, could be changed
        lines_by_key[key].append(pair)
print(lines_by_key)
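If the exact {'k1': ..., 'k2': ...} shape from the question is still needed, the keys can be mapped back afterwards (my addition, assuming tag1 and tag2 as defined in the question; the joined key has no surrounding spaces, hence the strip()):
temp = {'k1': lines_by_key[tag1.strip()], 'k2': lines_by_key[tag2.strip()]}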
I used partition instead of split so that the 'in' test and splitting can be done in a single pass.
for line in ascFile:
    for path, tag in key_tag.items():
        val0, tag_found, val1 = line.partition(tag)
        if tag_found:
            temp[path].append([val0, val1.replace(' ', '')])
            break
Is this any better with your 360MB file?
You might also do a simple test where all you do is loop through the file a line at a time:
for line in ascFile:
    pass
This will tell you what your best possible time will be.
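A minimal way to time that baseline pass (my own sketch; time.perf_counter is just one reasonable clock choice, and the path is assumed):
import time

t0 = time.perf_counter()
with open(r'C:\example.txt', encoding='UTF-8') as f:
    for line in f:
        pass
print('baseline read:', time.perf_counter() - t0, 'seconds')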

Python adding cr to lf in binary data?

I was writing a Python script which converts an ASCII file containing one pair of numbers per line to a straight binary representation. Here is my script:
in_file = open("p02_0609.bin", 'r')
out_file = open("sta013.bin", 'w')
out_data = bytearray()
for line in in_file:
    addr, i2c_data = [int(x) for x in line.split(" ")]
    out_data.append(addr)
    out_data.append(i2c_data)
out_file.write(out_data)
out_file.close()
in_file.close()
and a sample of the file it's reading (about 2000 lines total)
58 1
42 4
40 0
41 0
32 0
33 0
34 0
35 0
36 0
37 0
38 0
39 0
40 1
40 2
33 143
40 3
33 0
40 4
40 5
40 6
40 7
40 8
40 9
40 10
40 11
The output file ends on an odd byte, which it shouldn't since all the data is in pairs, and is about 80 bytes longer than expected. After poking around with a hex editor, I finally found the culprit: every instance of 10 (ASCII LF) has had a CR inserted in front of it. How do I make it stop doing that?
Tl;dr: Python is adding a CR before every LF in my binary data, where that makes no sense. How do I fix it?
You are working in text mode, so line endings are automatically translated by the open function. You need to open the files in binary mode instead: 'rb' for reading and 'wb' for writing bytes.
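A corrected version of the script might look like this (a sketch; the key change is the 'wb' mode, which disables newline translation on write):
in_file = open("p02_0609.bin", "r")   # input is ASCII text, so text mode is fine
out_file = open("sta013.bin", "wb")   # binary mode: no CR/LF translation
out_data = bytearray()
for line in in_file:
    addr, i2c_data = [int(x) for x in line.split()]
    out_data.append(addr)
    out_data.append(i2c_data)
out_file.write(out_data)
out_file.close()
in_file.close()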

Add columns from a file to a specific position in another file

I have 3 files. They all have IDs in their headers. What I want to do is find the intersecting IDs (file 1 is the template) and then copy the columns with the matching ID behind that ID in the template file.
Here is an example:
File 1 is the template:
name 123 124 125 128 131 145 156
rdt4 35 12 23 21 36 34 37
gtf2 24 18 18 29 26 12 40
hzt7 40 23 26 25 13 21 28
File 2:
name 123 124 125 126 127 128 131 132 133 145 156
rdt4 F F F T T F T T T F T
gtf2 F F F T T F T T T F T
hzt7 F F F T T F T T T F T
File 3:
name 123_a 123_b 123_c 124_a 124_b 124_c 125_a 125_b 125_c 126_a 126_b 126_c 127_a 127_b 127_c 128_a 128_b 128_c and so on
rdt4 0,087 0,265 0,632 0,220 0,851 0,271 0,436 0,148 0,080 0,899 0,636 0,467 0,508 0,460 0,393 0,689 0,427 0,798
gtf2 0,770 0,971 0,231 0,969 0,494 0,181 0,989 0,155 0,351 0,131 0,204 0,553 0,581 0,138 0,982 0,287 0,702 0,522
hzt7 0,185 0,535 0,093 0,807 0,487 0,786 0,886 0,905 0,966 0,283 0,490 0,190 0,688 0,714 0,577 0,643 0,476 0,738
The final file should look like this:
name 123 123 123_b 124 124 124_b 125 125 125_b 128 128 128_b 131 131 131_b 145 145 145_b 156 156 156_b
rdt4 35 F 0,265 12 F 0,851 23 F 0,148 21 F 0,427 36 T 34 F 37 T
gtf2 24 F 0,971 18 F 0,494 18 F 0,155 29 F 0,702 26 T 12 F 40 T
hzt7 40 F 0,535 23 F 0,487 26 F 0,905 25 F 0,476 13 T 21 F 28 T
Note: I did not type out everything for file 3 because it has the same IDs as file 2, but in file 3 every ID has 3 columns and I need only one of them (column b in the example).
What I have tried so far:
I started by working only with file 1 and file 2.
I copied the IDs to a new list and then looked up the positions of these IDs in file 2 to extract the data, but this turned out to be very tricky (at least for me). The appending works so far, but the problem is that every list stored in the list final is the same. It would be nice if you could help me with this.
This is my code so far:
try:
    Expr_Matrix_1="file1.txt"
    #Expr_Matrix_1=raw_input('Name file with expression data: ')
    Expr_Matrix_2=open(Expr_Matrix_1)
    Expr_Matrix_3=open(Expr_Matrix_1)
except:
    print 'This is a wrong filename!'
    exit()

try:
    Probe_Detect_1="file2.txt"
    #Probe_Detect_1=raw_input('Name of file with probe detection: ')
    Probe_Detect_2=open(Probe_Detect_1)
    Probe_Detect_3=open(Probe_Detect_1)
except:
    print 'This is a wrong filename!'
    exit()

find_list=list()
for b, line2 in enumerate(Expr_Matrix_2):
    line2 = line2.rstrip()
    line2 = line2.split("\t")
    if b == 0:
        for item in line2:
            find_list.append(item)
find_list=find_list[7:]

find_list2=list()
for i, line in enumerate(Probe_Detect_2):
    line = line.rstrip()
    line = line.split("\t")
    if i == 0:
        for item in find_list:
            find_list2.append(line.index(item))
#print find_list2

index1=8
final=list()
for b, line2 in enumerate(Expr_Matrix_3):
    line2 = line2.rstrip()
    line2 = line2.split("\t")
    for c, line in enumerate(Probe_Detect_3):
        line = line.rstrip()
        line = line.split("\t")
        if line2[b]==line[c]:
            for item in find_list2:
                if len(line2)<1551:
                    line2.insert(index1, line[item])
                    index1=index1+2
    final.append(line2)
print final[1]
The first ID column in file 1 is column 7; that's why I used 7 for slicing.
The 1551 is the number of rows to which it should be copied, but I think this is a completely wrong approach. However, I wanted to show you my attempt!
Another note: all files start with the name column, but between this column and the first ID column there are some columns which shouldn't be considered. Because file 1 is the template, those columns should also end up in the final file.
What is the solution?
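One possible sketch of the merge using pandas (my own illustration, not a posted answer; it assumes tab-separated files with the row labels in a 'name' column, and keeps only the '_b' columns of file 3 as the question specifies):
import pandas as pd

# Read all three files, indexing rows by the 'name' column (an assumed layout).
f1 = pd.read_csv("file1.txt", sep="\t", index_col="name")
f2 = pd.read_csv("file2.txt", sep="\t", index_col="name")
f3 = pd.read_csv("file3.txt", sep="\t", index_col="name")

# Keep only the "_b" columns of file 3 and strip the suffix so the
# column names match the bare IDs used in files 1 and 2.
f3b = f3[[c for c in f3.columns if c.endswith("_b")]]
f3b.columns = [c[:-2] for c in f3b.columns]

# File 1 is the template: pass its columns through, and interleave the
# matching columns from file 2 and file 3 where the IDs intersect.
pieces = []
for col in f1.columns:
    pieces.append(f1[col])
    if col in f2.columns:
        pieces.append(f2[col])
    if col in f3b.columns:
        pieces.append(f3b[col].rename(col + "_b"))
result = pd.concat(pieces, axis=1)
result.to_csv("final.txt", sep="\t")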

Python: How to write values to a csv file from another csv file

The fourth column of index.csv has ten numbers ranging from 1 to 5. Each number can be regarded as an index, and each index corresponds to a row of numbers in filename.csv.
The row number of filename.csv represents the index, and each row has three numbers. My question is about using a nested loop to transfer the numbers from filename.csv to index.csv.
from numpy import genfromtxt
import numpy as np
import csv
import collections

data1 = genfromtxt('filename.csv', delimiter=',')
data2 = genfromtxt('index.csv', delimiter=',')
out = np.zeros((len(data2), len(data1)))
for row in data2:
    for ch_row in range(len(data1)):
        if row[3] == ch_row + 1:
            out = row.tolist() + data1[ch_row].tolist()
            print(out)
writer = csv.writer(open('dn.csv','w'), delimiter=',', quoting=csv.QUOTE_ALL)
writer.writerow(out)
For example, the fourth column of index.csv contains 1,2,5,3,4,1,4,5,2,3 and filename.csv contains:
# filename.csv
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
What I need is to write the indexed row from filename.csv to index.csv and store these numbers in the 5th, 6th and 7th columns:
# index.csv
# 4 5 6 7
... 1 20 30 50
... 2 70 60 45
... 5 13 08 55
... 3 35 26 77
... 4 93 37 68
... 1 20 30 50
... 4 93 37 68
... 5 13 08 55
... 2 70 60 45
... 3 35 26 77
If I do "print(out)", it comes out a correct answer. However, when I input "out" in the shell, there are only one row appears like [1.0, 1.0, 1.0, 1.0, 20.0, 30.0, 50.0]
What I need is to store all the values in the "out" variables and write them to the dn.csv file.
This ought to do the trick for you:
Code:
from csv import reader, writer

data = list(reader(open("filename.csv", "r"), delimiter=" "))
out = writer(open("output.csv", "w"), delimiter=" ")
for row in reader(open("index.csv", "r"), delimiter=" "):
    out.writerow(row + data[int(row[3])])
index.csv:
0 0 0 1
0 0 0 2
0 0 0 3
filename.csv:
20 30 50
70 60 45
35 26 77
93 37 68
13 08 55
This produces the output:
0 0 0 1 70 60 45
0 0 0 2 35 26 77
0 0 0 3 93 37 68
Note: there's no need to use numpy here. The standard library csv module will do most of the work for you.
I also had to modify your sample datasets a bit, as what you showed had indexes out of bounds of the sample data in filename.csv.
Please also note that Python (like most languages) uses 0-based indexes, so you may have to fiddle with the above code to fit your needs exactly.
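For a 1-based index column like the one in the question, the lookup would presumably become (my adjustment, not part of the answer as posted):
for row in reader(open("index.csv", "r"), delimiter=" "):
    out.writerow(row + data[int(row[3]) - 1])   # -1 maps the 1-based index to the 0-based list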
# Reuses data1 and data2 from the question's genfromtxt calls.
with open('dn.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',', quoting=csv.QUOTE_ALL)
    for row in data2:
        idx = int(row[3])  # genfromtxt yields floats; list indexes must be ints
        out = [idx] + [x for x in data1[idx - 1]]
        writer.writerow(out)
