How can I print the first 52 numbers in one column, the next 52 in the second column, and so on, for 6 columns in total, with the layout repeating after that? I have a lot of float numbers, and each column has to be filled with 52 numbers before a new column starts, which in turn has to contain the next 52 numbers. The numbers are listed in lines, separated by one space, in a file.txt document. So in the end I want to have:
1 53 105 157 209 261
2
...
52 104 156 208 260 312
313 ... ... ... ... ...
...(another 52 numbers and so on)
I have tried this:
with open('file.txt') as f:
    line = f.read().split()
line1 = "\n".join("%-20s %s"%(line[i+len(line)/52],line[i+len(line)/6]) for i in range(len(line)/6))
print(line1)
However, this of course prints only 2 columns of numbers. I have tried adding line[i+len(line)/52] six times, but the code still does not work.
for row in range(52):
    for col in range(6):
        print line[row + 52*col],   # Dangling comma to stay on this line
    print   # Now go to the next line
Granted, you can do this in more Pythonic ways, but this will show you the algorithm structure and let you tighten the code as you wish.
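For what it's worth, here is a rough Python 3 sketch of the same idea that also repeats the layout for every further block of 312 numbers. The function name to_columns and its parameters are made up for illustration, and it assumes the values have already been read with f.read().split() as in the question.

```python
def to_columns(values, rows=52, cols=6):
    """Lay values out column by column, in blocks of rows*cols items."""
    block = rows * cols
    lines = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        for row in range(rows):
            # every `rows`-th item of the chunk, starting at index `row`
            cells = chunk[row::rows]
            if cells:  # the last block may be incomplete
                lines.append(" ".join(cells))
    return lines

# Small demonstration with 3 rows and 2 columns instead of 52 and 6
print("\n".join(to_columns([str(n) for n in range(1, 9)], rows=3, cols=2)))
```

With the defaults, the first printed line would hold items 1, 53, 105, 157, 209 and 261, and the layout starts over at item 313.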
I have the following task: I have to find a specific pattern (word) in my file.txt (a song, centered on the page) and print the row number plus the row that contains the pattern, with the leading spaces removed.
You can see the correct output here:
 92 Meant in croaking "Nevermore."
 99 She shall press, ah, nevermore!
107 Quoth the Raven, "Nevermore."
115 Quoth the Raven, "Nevermore."
and without this: my_str += ' '+str(count)+ ' ' + line.lstrip(), it will print:
92 Meant in croaking "Nevermore."
99 She shall press, ah, nevermore!
107 Quoth the Raven, "Nevermore."
115 Quoth the Raven, "Nevermore."
This is my code, but I want to have only 4 lines of code:
```python
def find_in_file(pattern, filename):
    my_str = ''
    with open(filename, 'r') as file:
        for count, line in enumerate(file):
            if pattern in line.lower():
                if count >= 10 and count <= 99:
                    my_str += ' ' + str(count) + ' ' + line.lstrip()
                else:
                    my_str += str(count) + ' ' + line.lstrip()
    print(my_str)
```
In fact, it can be done in one line:
''.join(f' {count} {line.lstrip()}' if 10 <= count <= 99 else f'{count} {line.lstrip()}' for count, line in enumerate(file) if pattern in line.lower())
However, this seems a little too long...
As suggested in the comments, it can be simplified:
''.join(f'{count:3} {line.lstrip()}' for count, line in enumerate(file) if pattern in line.lower())
def find_in_file(pattern, filename):
    with open(filename, 'r') as file:
        # 0 based line numbering, for 1 based use enumerate(file,1)
        for count, line in enumerate(file):
            if pattern in line.lower():
                print(f"{count:>3} {line.strip()}")
would be 4 lines of code (inside the function) and should be equivalent to what you got.
Possible as a single expression as well, once the file is open:
def find_in_file(pattern, filename):
    # 1 based line numbering
    with open(filename) as file:
        return '\n'.join(f'{count:>3} {line.strip()}' for count, line in enumerate(file, 1) if pattern in line.lower())
See Python's format specification mini-language.
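As a quick illustration of what the :>3 part of the format spec does, here is a small sketch. The counts and texts are made-up sample values, not taken from the real file.

```python
# '>' means right-align, 3 is the minimum field width
samples = [(7, "first"), (92, "second"), (107, "third")]
rows = [f"{count:>3} {text}" for count, text in samples]
for row in rows:
    print(row)
```

Every number ends up occupying three characters, so the text after it lines up no matter how many digits the line number has.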
You can use formatted strings to make sure the numbers always use three characters, even when they have only 1 or 2 digits.
I also prefer to use str.strip rather than str.lstrip, to get rid of trailing whitespace; in particular, lines read from the file will typically end with a linebreak, and then print will add a second linebreak, and we end up with too many linebreaks if we don't strip them away.
def find_in_file(pattern, filename):
    with open(filename, 'r') as file:
        for count, line in enumerate(file):
            if pattern in line.lower():
                print('{:3d} {}'.format(count, line.strip()))
find_in_file('nevermore','theraven.txt')
#  55 Quoth the Raven "Nevermore."
#  62 With such name as "Nevermore."
#  69 Then the bird said "Nevermore."
#  76 Of 'Never—nevermore'."
#  83 Meant in croaking "Nevermore."
#  90 She shall press, ah, nevermore!
#  97 Quoth the Raven "Nevermore."
# 104 Quoth the Raven "Nevermore."
# 111 Quoth the Raven "Nevermore."
# 118 Quoth the Raven "Nevermore."
# 125 Shall be lifted—nevermore!
I have a fixed-width text file that I must convert to a .csv where all numbers have to be converted to integers (no commas, dollar signs, quotes, etc.). I have currently parsed the text file using plain Python, but when it comes to choosing the right package for the output I seem to be at an impasse.
With csv, I can use writer.writerows in place of my print statement to write the output into my csv file, but the problem is that I have more columns (such as the date and time) that I must add after these rows that I cannot seem to do with csv. I also cannot seem to find a way to translate the blank columns in my text document to blank columns in output. csv seems to write in order.
I was reading the documentation on xlsxwriter and I see how you can write to individual columns with a set formatting, but I am unsure whether it would work with my .csv requirement.
My input text has a series of random groupings throughout the 50k-line document but follows the format below:
* START ******************************************************************************************************************** START *
* START ******************************************************************************************************************** START *
* START ******************************************************************************************************************** START *
1--------------------
1ANTECR09 CHEK DPCK_R_009
TRANSIT EXTRACT SUB-SYSTEM
CURRENT DATE = 08/03/2017 JOURNAL REPORT PAGE 1
PROCESS DATE =
ID = 022000046-MNT
FILE HEADER = H080320171115
+____________________________________________________________________________________________________________________________________
R T SEQUENCE CR BT A RSN ITEM ITEM CHN USER REASO
NBR NBR OR PIC NBR DB NBR NBR COD AMOUNT SERIAL IND .......FIELD.. DESCR
5,556 01 7450282689 C 538196640 9835177743 15 $9,064.81 00 CREDIT
5,557 01 7450282690 D 031301422 362313705 38 $592.35 43431 DR CR
5,558 01 7450282691 D 021309379 601298839 38 $1,491.04 44896 DR CR
5,559 01 7450282692 D 071108834 176885 38 $6,688.00 1454 DR CR
5,560 01 7450282693 D 031309123 1390001566241 38 $293.42 6878 DR CR
My code currently parses this document, pulls the date, time, and prints only the lines where the sequence number starts with 42 and the CR is "C"
lines = []
a = 'PRINT DATE:'
b = 'ARCHIVE'
c = 'PRINT TIME:'
with open(r'textfile.txt') as in_file:
    for line in in_file:
        values = line.split()
        if 'PRINT DATE:' in line:
            dtevalue = line.split(a,1)[-1].split(b)[0]
            lines.append(dtevalue)
        elif 'PRINT TIME:' in line:
            timevalue = line.split(c,1)[-1].split(b)[0]
            lines.append(timevalue)
        elif (len(values) >= 4 and values[3] == 'C'
              and len(values[2]) >= 2 and values[2][:2] == '41'):
            print(line)
print (lines[0])
print (lines[1])
What would be the cleanest way to achieve this result, and am I headed in the right direction by writing out the parsing first or should I have just done everything within a package first?
Any help is appreciated.
Edit:
The header block (between 1---------- and +___________) is repeated throughout the document, as are groupings of different sizes separated by -------
--------------------
34,615 207 4100223726 C 538196620 9866597322 10 $645.49 00 CREDIT
34,616 207 4100223727 D 022000046 8891636675 31 $645.49 111583 DR ON-
--------------------
34,617 208 4100223728 C 538196620 11701364 10 $756.19 00 CREDIT
34,618 208 4100223729 D 071923828 00 54 $305.31 11384597 BAD AC
34,619 208 4100223730 D 071923828 35110011 30 $450.88 10913052 6 DR SEL
--------------------
I would recommend slicing fixed width blocks to parse through the fixed width fields. Something like the following (incomplete) code:
# sample rows; the padding is illustrative and chosen to match data_layout below
data = """\
       5,556        01       4250282689  C  538196640  9835177743  15  $9,064.81  00  CREDIT
       5,557        01       7450282690  D  031301422  362313705  38  $592.35  43431  DR CR
       5,558        01       7450282691  D  021309379  601298839  38  $1,491.04  44896  DR CR
"""
# list of data layout tuples (start_index, stop_index, field name)
# TODO add missing data layout tuples
data_layout = [(0, 12, 'r_nbr'), (12, 22, 't_nbr'), (22, 39, 'seq'), (39, 42, 'cr_db')]
for line in data.splitlines():
    # skip "separator" lines
    # NOTE this may be an iterative process to identify these
    if line.startswith('-----'):
        continue
    record = {}
    for start_index, stop_index, name in data_layout:
        record[name] = line[start_index:stop_index].strip()
    # your conditional (seems inconsistent with text)
    if record['seq'].startswith('42') and record['cr_db'] == 'C':
        # perform any special handling for each column
        record['r_nbr'] = record['r_nbr'].replace(',', '')
        # TODO other special handling (like $)
        print('{r_nbr},{t_nbr},{seq},{cr_db},...'.format(**record))
Output is:
5556,01,4250282689,C,...
Update based on seemingly spurious values in undefined columns
Based on the new information provided about the "spurious" columns/fields (appear only rarely), this will likely be an iterative process.
My recommendation would be to narrow (appropriately!) the width of the desired fields. For example, if spurious data is in line[12:14] above, you could change the tuple for (12, 22, 't_nbr') to (14, 22, 't_nbr') to "skip" the spurious field.
An alternative is to add a "garbage" field in the list of tuples to handle those types of lines. Wherever the "spurious" fields appear, the "garbage" field would simply consume it.
If you need these fields, the same general approach to the "garbage" field approach still applies, but you save the data.
Update based on random separators
If they are relatively consistent, I'd simply add some logic (as I did above) to "detect" the separators and skip over them.
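To tie this back to the .csv requirement, here is a minimal sketch of how sliced records could be fed to csv.writer while keeping blank fields as empty columns and appending extra columns such as the date and time. The offsets in LAYOUT, the helper names parse_line and to_csv_row, and the date/time values are all assumptions for illustration; the real layout has to be worked out from the file.

```python
import csv
import io

# (start, stop, name): illustrative offsets, not the real file layout
LAYOUT = [(0, 12, "r_nbr"), (12, 22, "t_nbr"), (22, 39, "seq"), (39, 42, "cr_db")]

def parse_line(line):
    """Slice one fixed-width line into a dict of stripped fields."""
    return {name: line[start:stop].strip() for start, stop, name in LAYOUT}

def to_csv_row(record, date, time):
    """Order the fields, drop thousands separators, append date and time."""
    row = [record[name].replace(",", "") for _, _, name in LAYOUT]
    return row + [date, time]

# Build a sample line whose second field (t_nbr) is blank
sample = "5,556".rjust(12) + "".rjust(10) + "4250282689".rjust(17) + "C".rjust(3)

out = io.StringIO()  # stands in for an open .csv file
writer = csv.writer(out, lineterminator="\n")
writer.writerow(to_csv_row(parse_line(sample), "08/03/2017", "1115"))
print(out.getvalue())
```

Because each row is built as a plain list before it is written, a blank slice simply becomes an empty string and therefore an empty column, and any extra columns can be appended to the list in whatever order is needed.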
I know there are questions on how to extract numbers from a text file, which have helped partially. Here is my problem. I have a text file that looks like:
Some crap here: 3434
A couple more lines
of crap.
34 56 56
34 55 55
A bunch more crap here
More crap here: 23
And more: 33
54 545 54
4555 55 55
I am trying to write a script that extracts the lines with the three numbers and puts them into separate text files. For example, I'd have one file:
34 56 56
34 55 55
And another file:
54 545 54
4555 55 55
Right now I have:
for line in file_in:
    try:
        float(line[1])
        file_out.write(line)
    except ValueError:
        print "Just using this as placeholder"
This successfully puts both chunks of numbers into a single file. But I need it to put one chunk in one file, and another chunk in another file, and I'm lost on how to accomplish this.
You didn't specify what version of Python you were using, but you might approach it this way in Python 2.7.
string.translate takes a translation table (which can be None) and a group of characters to translate (or delete if table is None).
You can set your delete_chars to everything but 0-9 and space by slicing string.printable correctly:
>>> import string
>>> remove_chars = string.printable[10:-6] + string.printable[-4:]
>>> string.translate('Some crap 3434', None, remove_chars)
'  3434'
>>> string.translate('34 45 56', None, remove_chars)
'34 45 56'
Adding a strip to trim white space on the left and right and iterating over a testfile containing the data from your question:
>>> with open('testfile.txt') as testfile:
...     for line in testfile:
...         trans = line.translate(None, remove_chars).strip()
...         if trans:
...             print trans
...
3434
34 56 56
34 55 55
23
33
54 545 54
4555 55 55
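Note that string.translate and the two-argument form of str.translate only exist in Python 2. In Python 3 the same idea could be sketched with str.maketrans, whose third argument lists the characters to delete. The helper name digits_only is made up for this example.

```python
import string

# Everything printable except digits, space and tab gets deleted
remove_chars = string.printable[10:-6] + string.printable[-4:]
table = str.maketrans("", "", remove_chars)

def digits_only(line):
    """Delete the unwanted characters and trim surrounding whitespace."""
    return line.translate(table).strip()

print(digits_only("Some crap here: 3434"))
print(digits_only("34 56 56"))
```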
You can use a regex here, but this requires reading the whole file into a variable with file.read() or similar (fine if the file is not huge).
((?:(?:\d+ ){2}\d+(?:\n|$))+)
See demo.
https://regex101.com/r/tX2bH4/20
import re
p = re.compile(r'((?:(?:\d+ ){2}\d+(?:\n|$))+)', re.IGNORECASE)
test_str = "Some crap here: 3434\nA couple more lines\nof crap.\n34 56 56\n34 55 55\nA bunch more crap here\nMore crap here: 23\nAnd more: 33\n54 545 54\n4555 55 55"
re.findall(p, test_str)
re.findall returns a list. You can easily write each item of the list to a new file.
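Writing each match to its own file could look roughly like this. The function name write_chunks, the chunk prefix and the temporary output directory are assumptions made for the sketch; the pattern is the one from above.

```python
import os
import re
import tempfile

CHUNK = re.compile(r'((?:(?:\d+ ){2}\d+(?:\n|$))+)')

def write_chunks(text, out_dir, prefix="chunk"):
    """Write every block of three-number lines to its own numbered file."""
    paths = []
    for index, chunk in enumerate(CHUNK.findall(text), 1):
        path = os.path.join(out_dir, "%s%d.txt" % (prefix, index))
        with open(path, "w") as f:
            f.write(chunk)
        paths.append(path)
    return paths

text = "Some crap here: 3434\n34 56 56\n34 55 55\nAnd more: 33\n54 545 54\n4555 55 55"
with tempfile.TemporaryDirectory() as out_dir:
    for path in write_chunks(text, out_dir):
        print(os.path.basename(path))
```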
To know if a string is a number you can use str.isdigit:
split = 0            # number of the current output file
should_split = True  # start a new file at the first numeric line
for line in file_in:
    # split line to parts
    parts = line.strip().split()
    # check the line is not empty and all parts are numbers
    if parts and all(part.isdigit() for part in parts):
        if should_split:
            split += 1
            # don't split until we skip a line
            should_split = False
        with open('split%d' % split, 'a') as f:
            f.write(line)
    else:
        # skipped line means we should split
        should_split = True
I have a tab-delimited txt that looks like
11 22 33 44
53 25 36 25
74 89 24 35 and
But there is no "tab" after 44 and 25. So the 1st and 2nd rows have 4 columns, 3rd row has 5 columns.
To rewrite it so that tabs are shown,
11\t22\t33\t44
53\t25\t36\t25
74\t89\t24\t35\tand
I need to have a tool to mass-add tabs where there are no entries.
If the maximum length of column is n (n=5 in the above example), then I want to fill tabs until that nth column for all rows to make
11\t22\t33\t44\t
53\t25\t36\t25\t
74\t89\t24\t35\tand
I tried to do it by notepad++, and python by using replacer code like
map_dict = {'':'\t'}
but it seems I need more logic to do it.
I am assuming your file also contains newlines so it would actually look like this:
11\t22\t33\t44\n
53\t25\t36\t25\n
74\t89\t24\t35\tand\n
If you know for sure that the maximum length of your columns is 5, you can do it like this:
with open('my_file.txt') as my_file:
    y = lambda x: len(x.strip().split('\t'))
    a = [line if y(line) == 5 else '%s%s\n' % (line.strip(), '\t'*(5 - y(line)))
         for line in my_file.readlines()]
# ['11\t22\t33\t44\t\n', '53\t25\t36\t25\t\n', '74\t89\t24\t35\tand\n']
This will add trailing tabs until you reach 5 columns. You will get a list of lines that you need to write back to a file (I use 'my_file2.txt', but you can write back to the original one if you want).
with open('my_file2.txt', 'w+') as out_file:
    for line in a:
        out_file.write(line)
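If the maximum number of columns is not known in advance, one possible variation is to compute it from the data first. This is only a sketch; the helper name pad_rows is made up, and it assumes the whole file fits in memory.

```python
def pad_rows(lines):
    """Pad every tab-separated line to the width of the widest line."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    width = max((len(row) for row in rows), default=0)
    # joining with empty strings appended yields the trailing tabs
    return ["\t".join(row + [""] * (width - len(row))) + "\n" for row in rows]

print(pad_rows(["11\t22\t33\t44\n", "53\t25\t36\t25\n", "74\t89\t24\t35\tand\n"]))
```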
If I understood it correctly, you can achieve this in Notepad++ only using following:
And yes, if you have several files on which you want to perform this, you can record this as a macro and bind it on to key as a shortcut
Hello. I am very new to Python and programming in general.
I have 3 columns from CSV file
X,CH1,CH2,
Second,Volt,Volt,
2.66400e-02,4.00e-03,1.04e-03,
-2.66360e-02,4.00e-03,7.20e-04,
-2.66320e-02,4.00e-03,5.60e-04,
-2.66280e-02,4.00e-03,3.20e-04,
-2.66240e-02,4.00e-03,8.00e-05,
-2.66200e-02,4.00e-03,-2.40e-04,
-2.66160e-02,4.00e-03,-5.60e-04,
-2.66120e-02,4.00e-03,-7.20e-04,
-2.66080e-02,4.00e-03,-1.04e-03,
for example.
I am using:
with open('maximum.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for _ in xrange(2):
        next(f)
to skip first two lines, as this is just text, and then
    for row in reader:
        x=(float(row[2]))
        print(x)
gives me
0.00104
0.00072
0.00056
0.00032
8e-05
-0.00024
-0.00056
-0.00072
-0.00104
So there is the question:
What should I write, so that it will give me an integer number instead of decimals, like
104
72
56
24
8
24
56
72
104
P.S I do not want just to multiply by 10^5
Thanks
You have to multiply by 10^5, because you actually want a bigger number.
Then apply int() to get 104 instead of 104.0.
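One caveat: int() truncates, and a floating-point product can land a hair above or below the whole number, so it is safer to round first. A small sketch (the helper name to_scaled_int is made up; the sample values are taken from the CH2 column in the question):

```python
def to_scaled_int(text, factor=1e5):
    """Parse a float, scale it, and round to the nearest integer."""
    return int(round(float(text) * factor))

print([to_scaled_int(v) for v in ["1.04e-03", "8.00e-05", "-2.40e-04"]])
```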