The text file looks like this:
421 2 1 8 34 27
421 0 0 8 37 27
435 0 1 9 8 44
435 4 0 9 10 50
for row in file_content[0:]:
    id, place, inout, hour, min, sec = row.split(" ")
    print(id)
In the code I wanted to separate the rows: the first column contains the ids of persons, the second the ids of places, the third whether the person goes in or out (0/1), and the last three the time (hour:min:sec).
Could someone help me correct this code so I can continue practicing for my exam? (I'm a beginner.)
with open("Text.txt", "r") as f:
    id, place, inout, hour, min, sec = zip(*map(str.split, f))
    print(id)
# [OUT] ('421', '421', '435', '435')
See the zip() documentation for details on how this transposes the rows into columns.
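A tiny illustration of the transposition idea (with made-up rows, not the file above):

rows = [["421", "2", "1"], ["435", "0", "1"]]
ids, places, inouts = zip(*rows)
print(ids)     # ('421', '435')
print(places)  # ('2', '0')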
>>> filecontent = open("test.txt", 'r')
>>> for row in filecontent:
...     id, place, inout, hour, min, sec = row.split(" ")
...     print("id is", id)
...
id is 421
id is 421
id is 435
id is 435
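If you later need the fields as numbers rather than strings, a small extension of the loop could look like this (just a sketch; the int conversions and the records list are additions, not part of the original code):

with open("test.txt") as f:
    records = []
    for row in f:
        # split() with no argument also drops the trailing newline and
        # tolerates repeated spaces
        id_, place, inout, hour, minute, sec = row.split()
        records.append((int(id_), int(place), int(inout),
                        int(hour), int(minute), int(sec)))

print(records[0])  # (421, 2, 1, 8, 34, 27)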
I have a pandas DataFrame like this (it represents an investment portfolio):
import pandas as pd

data = {'category': ['stock', 'bond', 'cash', 'stock', 'cash'],
        'name': ['AA', 'BB', 'CC', 'DD', 'EE'],
        'quantity': [2, 2, 10, 4, 3],
        'price': [10, 15, 4, 2, 4],
        'value': [20, 30, 40, 8, 12]}
df = pd.DataFrame(data)
I would like to generate a report in a text file that looks like this :
Stock: Total: 60
Name quantity price value
AA 2 10 20
CC 10 4 40
Bond: Total: 60
Name quantity price value
BB 2 15 30
Cash: Total: 52
Name quantity price value
CC 10 4 40
EE 3 4 12
I found a way to do this by looping through a list of DataFrames, but it is kind of ugly. I think there should be a way with iterrows or iteritems, but I can't make it work.
Thank you for your help!
You can loop over the groupby object and write a custom header before each group's data:
for i, g in df.groupby('category', sort=False):
    with open('out.csv', 'a') as f:
        # header line for the group, e.g. "stock: Total: 28"
        f.write(f'{i}: Total: {g["value"].sum()}\n')
        # write the group's rows (without the category column) below the header
        g.drop('category', axis=1).to_csv(f, index=False, sep='\t')
        f.write('\n')
Output:
stock: Total: 28
name quantity price value
AA 2 10 20
DD 4 2 8
bond: Total: 30
name quantity price value
BB 2 15 30
cash: Total: 52
name quantity price value
CC 10 4 40
EE 3 4 12
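A slight variant (my own sketch, not part of the answer above) opens the output file once in write mode and appends each group inside the loop, so the file is not reopened for every category and old contents are not kept between runs:

with open('out.csv', 'w') as f:
    for i, g in df.groupby('category', sort=False):
        f.write(f'{i}: Total: {g["value"].sum()}\n')
        g.drop('category', axis=1).to_csv(f, index=False, sep='\t')
        f.write('\n')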
I have a question that includes various steps.
I am parsing a file that looks like this:
9
123
0 987
3 890 234 111
1 0 1 90 1 34 1 09 1 67
1 684321
2 352 69
1 1 1 243 1 198 1 678 1 11
2 098765
1 143
1 2 1 23 1 63 1 978 1 379
3 784658
1 43
1 3 1 546 1 789 1 12 1 098
I want to make these lines of the file the keys of a dictionary (ignoring the first number and taking only the second one, because the first just indicates which number of key it should be):
0 987
1 684321
2 098765
3 784658
And these lines the values of the dictionary (again ignoring only the first number, because it just indicates how many elements there are):
3 890 234 111
2 352 69
1 143
1 43
So at the end it has to look like this:
d = {987 : [890, 234, 111], 684321 : [352, 69],
098765 : [143], 784658 : [43]}
So far I have this:
findkeys = re.findall(r"\d\t(\d+)\n", line)
findelements = re.findall(r"\d\t(\d+)", line)
listss.append("".join(findelements))
d = {findkeys: listss}
The regular expressions need more exceptions: the one for the keys also matches lines that I don't want to be keys but which happen to contain just one number as well. In the example file, the number 43 shows up as a key, for instance.
And the regular expression for the elements gives me back all the lines.
I don't know if it would be easier to make the code ignore the lines I don't need information from, but I don't know how to do that either.
I want to keep it as simple as possible.
Thanks!
with open('filename.txt') as f:
    lines = f.readlines()

lines = [x.strip() for x in lines]  # strip trailing newlines
lines = lines[2:]                   # skip the first two header lines
keys = lines[::3]                   # every 3rd line is a key line
values = lines[1::3]                # the line right after each key line holds its values
output lines:
['0 987',
'3 890 234 111',
'1 0 1 90 1 34 1 09 1 67',
'1 684321',
'2 352 69',
'1 1 1 243 1 198 1 678 1 11',
'2 098765',
'1 143',
'1 2 1 23 1 63 1 978 1 379',
'3 784658',
'1 43',
'1 3 1 546 1 789 1 12 1 098']
output keys:
['0 987', '1 684321', '2 098765', '3 784658']
output values:
['3 890 234 111', '2 352 69', '1 143', '1 43']
Now you just have to put it together! Iterate through keys and values, as in the sketch below.
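For instance, a minimal sketch that combines the keys and values lists from above into the dictionary (note that int('098765') becomes 98765, since int does not keep leading zeros):

d = {}
for key_line, value_line in zip(keys, values):
    key = int(key_line.split()[1])                    # drop the leading index number
    vals = [int(v) for v in value_line.split()[1:]]   # drop the leading element count
    d[key] = vals

print(d)
# {987: [890, 234, 111], 684321: [352, 69], 98765: [143], 784658: [43]}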
Once you have the lines in a list (the lines variable), you can simply use re to isolate the numbers and a dictionary/list comprehension to build the desired data structure.
Based on your example data, every 3rd line is a key with its values on the following line. This means you only need to stride by 3 through the list.
findall() will give you the list of numbers (as text) on each line, and you can ignore the first one with simple subscripts.
import re
value = re.compile(r"(\d+)")
numbers = [ [int(v) for v in value.findall(line)] for line in lines]
intDict = { key[1]:values[1:] for key,values in zip(numbers[2::3],numbers[3::3]) }
You could also do it using split(). Note that splitting on an explicit " " would leave empty entries wherever there are multiple spaces; split() without an argument avoids this, so the if v != "" filter below is merely defensive:
numbers = [ [int(v) for v in line.split() if v != ""] for line in lines]
intDict = { key[1]:values[1:] for key,values in zip(numbers[2::3],numbers[3::3]) }
You could build yourself a parser with e.g. parsimonious:
from parsimonious.nodes import NodeVisitor
from parsimonious.grammar import Grammar
data = """
9
123
0 987
3 890 234 111
1 0 1 90 1 34 1 09 1 67
1 684321
2 352 69
1 1 1 243 1 198 1 678 1 11
2 098765
1 143
1 2 1 23 1 63 1 978 1 379
3 784658
1 43
1 3 1 546 1 789 1 12 1 098
"""
grammar = Grammar(
r"""
data = (important / garbage)+
important = keyline newline valueline
garbage = ~".*" newline?
keyline = ws number ws number
valueline = (ws number)+
newline = ~"[\n\r]"
number = ~"\d+"
ws = ~"[ \t]+"
"""
)
tree = grammar.parse(data)
class DataVisitor(NodeVisitor):
    output = {}
    current = None

    def generic_visit(self, node, visited_children):
        return node.text or visited_children

    def visit_keyline(self, node, children):
        # the second number on a keyline becomes the current key
        key = node.text.split()[-1]
        self.current = key

    def visit_valueline(self, node, children):
        # everything after the leading count becomes the value list for the current key
        values = node.text.split()
        self.output[self.current] = [int(x) for x in values[1:]]

dv = DataVisitor()
dv.visit(tree)
print(dv.output)
This yields
{'987': [890, 234, 111], '684321': [352, 69], '098765': [143], '784658': [43]}
The idea here is that every keyline is composed of exactly two numbers, with the second being the soon-to-be key. The line following it is the valueline.
Input:
LineNo word_num left top width text
1 1 322 14 14 My
1 2 304 4 41 Name
1 3 322 5 9 is
1 4 316 14 20 Raghav
2 1 420 129 34 Problem
2 2 420 31 27 just
2 3 420 159 27 got
2 4 431 2 38 complicated
1 1 322 14 14 #40
1 2 304 4 41 #gmail.com
2 1 420 129 34 2019
2 2 420 31 27 January
As you can see there are columns lineNo, left, top and word_num, so I was wondering whether I can build some logic using these; maybe that way I can achieve my solution.
I wanted to make some tweaks to the output. This output actually comes from a PDF after it has been converted to an image, so the whole line is captured at once and the result does not make sense. What I am thinking of doing now is to group the text in a meaningful way. For example:
Let's say I am getting this output by using this:
g = df['line_num'].ne(df['line_num'].shift()).cumsum()
out = '\n'.join(df.groupby(g)['text'].agg(' '.join))
print (out)
Output=
"My name is raghav #40 #gmail.com
Problem just got complicated $2019 January"
Expected Output=
"My name is raghav
*40
#gmail.com
Problem just got complicated
2019 January"
Each group should go on its own line, regardless of whether the words share a physical line, so that the text is logically grouped into separate lines.
In my understanding we could maybe achieve this with the following steps:
a) Words on same line are grouped if x distance < threshold
b) Words on next line are grouped with previous if y distance < threshold
Threshold is width(image)/ 100; x distance is calculated from left; y distance is calculated from top.
Can we do this ?
Let me know if the question is not clear enough!
Thanks!
I have added the image I am trying to get the output from; the data in it is a little complicated, so I have changed it for this example.
To answer your second concern, maybe try iterating through the column like so.
phrase = ""
for i in range(len(df)):
    # df.count is a method and .iat needs integer positions,
    # so use len(df) and label-based .at instead
    if isinstance(df.at[i, 'text'], str):
        phrase = phrase + " " + df.at[i, 'text']
To add the space/..., I agree with jezrael, use the str.cat method.
Use a double join: first with agg, and then on the output Series:
out = '.....'.join(df.groupby('LineNo')['text'].agg(' '.join))
print (out)
My Name is Raghav.....Roll No. # 242
Another solution with str.cat:
out = df.groupby('LineNo')['text'].agg(' '.join).str.cat(sep='.....')
EDIT:
g = df['LineNo'].ne(df['LineNo'].shift()).cumsum()
out = '.....'.join(df.groupby(g)['text'].agg(' '.join))
print (out)
My Name is Raghav.....Roll No. # 242.....hello the problem just.....got more complicated !!!!
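If you also want the distance-based grouping sketched in the question (rules a and b), a rough, untested sketch might look like the following. It assumes the DataFrame has the columns line_num, left, top, width and text as in the sample, and that you supply the image width yourself:

def group_words(df, image_width):
    # group consecutive words using the x/y distance rules from the question
    threshold = image_width / 100
    groups, current, prev = [], [], None
    for row in df.itertuples(index=False):
        if prev is not None:
            if row.line_num == prev.line_num:
                # rule (a): same line, x gap measured from 'left' and 'width'
                close = (row.left - (prev.left + prev.width)) < threshold
            else:
                # rule (b): next line, y gap measured from 'top'
                close = abs(row.top - prev.top) < threshold
            if not close:
                groups.append(' '.join(current))
                current = []
        current.append(row.text)
        prev = row
    if current:
        groups.append(' '.join(current))
    return '\n'.join(groups)

# usage (the image width is an assumption you would take from the source image):
# print(group_words(df, image_width=1200))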
This question already has answers here: Create nice column output in python.
I have a problem: in the output of my code, the elements of each column are not placed exactly beneath each other.
My original code is too busy, so I have reduced it to a simple one; let's explain that simple one first.
Consider the following simple task: write a program that receives a natural number r (the number of rows) and a natural number c (the number of columns), and then prints all natural numbers from 1 to r*c in r rows and c columns.
So the code will be something like the following:
r = int(input("How many Rows? "));  ## here r stands for number of rows
c = int(input("How many columns? "));  ## here c stands for number of columns
for i in range(1, r+1):
    for j in range(1, c+1):
        print(j+c*(i-1)) ,
    print
and the output is as follows:
How many Rows? 5
How many columns? 6
1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
>>>
or:
How many Rows? 7
How many columns? 3
1 2 3
4 5 6
7 8 9
10 11 12
13 14 15
16 17 18
19 20 21
>>>
What should I do, to get an output like this?
How many Rows? 5
How many columns? 6
 1  2  3  4  5  6
 7  8  9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
>>>
or
How many Rows? 7
How many columns? 3
 1  2  3
 4  5  6
 7  8  9
10 11 12
13 14 15
16 17 18
19 20 21
>>>
Now my original code is something like the following:
def function(n):
    R = 0;
    something...something...something...
    something...something...something...
    something...something...something...
    something...something...something...
    return(R)

r = int(input("How many Rows? "));  ## here r stands for number of rows
c = int(input("How many columns? "));  ## here c stands for number of columns
for i in range(0, r+1):
    for j in range(0, c+1):
        n = j+c*(i-1);
        r = function(n);
        print(r)
Now for simplicity, suppose that by some by-hand-manipulation we get:
f(1)=function(1)=17, f(2)=235, f(3)=-8;
f(4)=-9641, f(5)=54278249, f(6)=411;
Now when I run the code the output is as follows:
How many Rows? 2
How many columns? 3
17
235
-8
-9641
54278249
411
>>>
What should I do to get an output like this:
How many Rows? 2
How many columns? 3
   17      235  -8
-9641 54278249 411
>>>
Also note that I did not want to get something like this:
How many Rows? 2
How many columns? 3
      17      235       -8
   -9641 54278249      411
>>>
Use rjust method:
r, c = 5, 5
for i in range(1, r+1):
    for j in range(1, c+1):
        str_to_printout = str(j+c*(i-1)).rjust(2)
        print(str_to_printout),
    print
Result:
 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
UPD.
As for your last example, let's say f(n) is defined in this way:
def f(n):
    my_dict = {1: 17, 2: 235, 3: -8, 4: -9641, 5: 54278249, 6: 411}
    return my_dict.get(n, 0)
Then you can use the following approach:
r, c = 2, 3
# data table with elements in string format
data_str = [[str(f(j+c*(i-1))) for j in range(1, c+1)] for i in range(1, r+1)]
# transposed data table and list of max len for every column in data_str
data_str_transposed = [list(i) for i in zip(*data_str)]
max_len_columns = [max(map(len, col)) for col in data_str_transposed]
# printing out
# the string " " before 'join' is the delimiter between columns
for row in data_str:
    print(" ".join(elem.rjust(max_len) for elem, max_len in zip(row, max_len_columns)))
Result:
   17      235  -8
-9641 54278249 411
With r,c = 3,3:
   17      235  -8
-9641 54278249 411
    0        0   0
Note that the indent in each column corresponds to the maximum length in this column, and not in the entire table.
Hope this helps. Please comment if you need any further clarifications.
# result stores the final matrix
# max_len stores the length of the longest element
result, max_len = [], 0

for i in range(1, r + 1):
    temp = []
    for j in range(1, c + 1):
        n = j + c * (i - 1)
        r = function(n)
        if len(str(r)) > max_len:
            max_len = len(str(r))
        temp.append(r)
    result.append(temp)

# printing the values separately to apply rjust() to each and every element
for i in result:
    for j in i:
        print(str(j).rjust(max_len), end=' ')
    print()
Adapted from MaximTitarenko's answer:
You first look for the minimum and maximum value, then decide which is the longer one and use its length as the value for the rjust(x) call.
import random

r, c = 15, 5
m = random.sample(xrange(10000), 100)
length1 = len(str(max(m)))
length2 = len(str(min(m)))
longest = max(length1, length2)

for i in range(r):
    for j in range(c):
        str_to_printout = str(m[i*c+j]).rjust(longest)
        print(str_to_printout),
    print
Example output:
 937 9992 8602 4213 7053
1957 9766 6704 8051 8636
 267  889 1903 8693 5565
8287 7842 6933 2111 9689
3948  428 8894 7522  417
3708 8033  878 4945 2771
6393   35 9065 2193 6797
5430 2720  647 4582 3316
9803 1033 7864  656 4556
6751 6342 4915 5986 6805
9490 2325 5237 8513 8860
8400 1789 2004 4500 2836
8329 4322 6616  132 7198
4715  193 2931 3947 8288
1338 9386 5036 4297 2903
You need to use the string method .rjust
From the documentation (linked above):
string.rjust(s, width[, fillchar])
This function right-justifies a string in a field of given width. It returns a string that is at least width characters wide, created by padding the string with the character fillchar (default is a space) until the given width on the right. The string is never truncated.
So we need to calculate the width (in characters) to which each number should be padded. That is pretty simple: the number of digits in r*c, plus 1 (the +1 adds a one-space gap between the columns).
Using this, it becomes quite simple to write the code:
r = int(input("How many Rows? "))
c = int(input("How many columns? "))
width = len(str(r*c)) + 1

for i in range(1, r+1):
    for j in range(1, c+1):
        print str(j+c*(i-1)).rjust(width) ,
    print
which for an r, c of 4, 5 respectively, outputs:
  1   2   3   4   5
  6   7   8   9  10
 11  12  13  14  15
 16  17  18  19  20
Hopefully this helps you out and you can adapt this to other situations yourself!
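For completeness, on Python 3 the same right-justification can also be written with a format specifier instead of rjust (a sketch of the same idea, not taken from the answers above):

r = int(input("How many Rows? "))
c = int(input("How many columns? "))
width = len(str(r * c))

for i in range(1, r + 1):
    # '>' right-aligns each number in a field of the computed width
    print(" ".join(f"{j + c * (i - 1):>{width}}" for j in range(1, c + 1)))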
I have read other similar posts but they don't seem to work in my case, hence I'm posting this as a new question.
I have a text file with varying row and column sizes. I am interested in the rows of values which have a specific parameter. E.g. in the sample text file below, I want the last two values of each line which has the number '1' in the second position. That is, I want the values '1, 101', '101, 2', '2, 102' and '102, 3' from the lines starting with the values '101' to '104', because they have the number '1' in the second position.
$MeshFormat
2.2 0 8
$EndMeshFormat
$Nodes
425
.
.
$EndNodes
$Elements
630
.
97 15 2 0 193 97
98 15 2 0 195 98
99 15 2 0 197 99
100 15 2 0 199 100
101 1 2 0 201 1 101
102 1 2 0 201 101 2
103 1 2 0 202 2 102
104 1 2 0 202 102 3
301 2 2 0 303 178 78 250
302 2 2 0 303 250 79 178
303 2 2 0 303 198 98 249
304 2 2 0 303 249 99 198
.
.
.
$EndElements
The problem is that, with the code I have come up with (shown below), it starts from '101' but it keeps reading values from the other lines up to '304' or more. What am I doing wrong, or does someone have a better way to tackle this?
# Here, (additional_lines + anz_knoten_gmsh - 2) are additional lines that need to be skipped
# at the beginning of the .txt file. Initially I find out where the range
# of the lines lies which I need.
# The two_noded_elem_start is the first line having the '1' at the second position
# and four_noded_elem_start is the first line number having '2' in the second position.
# So, basically I'm reading between these two parameters.
input_file = open(os.path.join(gmsh_path, "mesh_outer_region.msh"))
output_file = open(os.path.join(gmsh_path, "mesh_skip_nodes.txt"), "w")

for i, line in enumerate(input_file):
    if i == (additional_lines + anz_knoten_gmsh + two_noded_elem_start - 2):
        break

for i, line in enumerate(input_file):
    if i == additional_lines + anz_knoten_gmsh + four_noded_elem_start - 2:
        break
    elem_list = line.strip().split()
    del elem_list[:5]
    writer = csv.writer(output_file)
    writer.writerow(elem_list)

input_file.close()
output_file.close()
*EDIT: The piece of code used to find the parameters like two_noded_elem_start is as follows:
# anz_elemente_ueberg_gmsh is another parameter that is found out
# from a previous piece of code and '$EndElements' is what
# is at the end of the text file "mesh_outer_region.msh".
input_file = open(os.path.join(gmsh_path, "mesh_outer_region.msh"), "r")

for i, line in enumerate(input_file):
    if line.strip() == anz_elemente_ueberg_gmsh:
        break

for i, line in enumerate(input_file):
    if line.strip() == '$EndElements':
        break
    element_list = line.strip().split()
    if element_list[1] == '1':
        two_noded_elem_start = element_list[0]
        two_noded_elem_start = int(two_noded_elem_start)
        break

input_file.close()
>>> with open('filename') as fh:                  # Open the file
...     for line in fh:                           # For each line in the file
...         values = line.split()                 # Split the values into a list
...         if values[1] == '1':                  # Compare the second value
...             print values[-2], values[-1]      # Print the 2nd from last and last
1 101
101 2
2 102
102 3
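Applied back to the mesh file, a sketch that writes only the last two values of each matching line with csv.writer (reusing gmsh_path and the file names from the question, which are assumed to be defined) could be:

import csv
import os

with open(os.path.join(gmsh_path, "mesh_outer_region.msh")) as infile, \
     open(os.path.join(gmsh_path, "mesh_skip_nodes.txt"), "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for line in infile:
        values = line.split()
        # element lines have several fields and a '1' in the second position;
        # depending on the mesh you may also want to restrict this to the
        # $Elements section first
        if len(values) > 1 and values[1] == "1":
            writer.writerow(values[-2:])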