I have a file with many lines like this:
>6_KA-RFNB-1505/2021-EPI_ISL_8285588-2021-12-02
I need to convert each of them to
>6_KA_2021-1202
All of the lines that require this change start with a >.
The 6_KA and the 2021-12-02 parts are different on every line.
I also need to add an empty line before every line that I change in this manner.
UPDATE: You changed the requirements from when I originally answered your post, but the code below does what you are looking for. The principle remains the same: use a regex to identify the parts of the string you are looking to replace, and then, as you go through each line of the file, create a new string based on the values you parsed out from the regex.
import re

# Raw string avoids backslash-escaping issues; named groups capture the pieces we keep.
regex = re.compile(r'>(?P<first>[0-9a-zA-Z]{1,3}_[0-9a-zA-Z]{1,3}).*(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})\n')

def convert_file(inputFile):
    with open(inputFile, 'r') as input, open('Output.txt', 'w') as output:
        for line in input:
            text = regex.match(line)
            if text:
                # Blank line first, then the reassembled ">6_KA_2021-1202"-style header
                output.write("\n>" + text.group("first") + '_' + text.group("year") + "-" + text.group("month") + text.group("day") + "\n")
            else:
                output.write(line)

convert_file('data.txt')
-
Hi friends.
I have a lot of files containing text information. I want to search only for specific lines, then, within those lines, find the values at specific positions and multiply them by a fixed value (or one entered via input).
Example text:
1,0,0,0,1,0,0
15.000,15.000,135.000,15.000
7
3,0,0,0,2,0,0
'holep_str',50.000,-15.000,20.000,20.000,0.000
3
3,0,0,100,3,-8,0
58.400,-6.600,'14',4.000,0.000
4
3,0,0,0,3,-8,0
50.000,-15.000,50.000,-15.000
7
3,0,0,0,4,0,0
'holep_str',100.000,-15.000,14.000,14.000,0.000
3
3,0,0,100,5,-8,0
108.400,-6.600,'14',4.000,0.000
And I want to identify and modify only lines with "holep_str" text:
'holep_str',50.000,-15.000,20.000,20.000,0.000
'holep_str',100.000,-15.000,14.000,14.000,0.000
Each line that begins with the string "holep_str" contains two numbers of interest, the 3rd and 4th values:
20.000 20.000
14.000 14.000
These can be identified as:
1./ the number after the 3rd comma on a line beginning with "holep_str"
2./ the number after the 4th comma on a line beginning with "holep_str"
A regex alone cannot do it; Python probably can, but I'm pressed for time and can't go any further with the language...
Could somebody explain how to write this relatively simple code, which finds all lines containing the search string ("holep_str") and multiplies the values after the 3rd and 4th commas by FIXVALUE (or a value from input, for example 2)?
The code should walk through all files with a given extension (chosen by input, for example txt) in the directory where it is executed, find the values on the relevant lines, multiply them, and write them back...
So, with FIXVALUE = 2, those lines would look like:
'holep_str',50.000,-15.000,40.000,40.000,0.000
'holep_str',100.000,-15.000,28.000,28.000,0.000
And the whole text then looks like:
1,0,0,0,1,0,0
15.000,15.000,135.000,15.000
7
3,0,0,0,2,0,0
'holep_str',50.000,-15.000,40.000,40.000,0.000
3
3,0,0,100,3,-8,0
58.400,-6.600,'14',4.000,0.000
4
3,0,0,0,3,-8,0
50.000,-15.000,50.000,-15.000
7
3,0,0,0,4,0,0
'holep_str',100.000,-15.000,28.000,28.000,0.000
3
3,0,0,100,5,-8,0
108.400,-6.600,'14',4.000,0.000
Thank You.
with open(file_path) as f:
    lines = f.readlines()

for line in lines:
    if line.startswith("'holep_str'"):
        split_line = line.split(',')
        num1 = float(split_line[3])
        num2 = float(split_line[4])
        print(num1, num2)
        # do stuff with num1 and num2
Once you .split() the line on ',', you get a list. Then you can pick out the values you want by index, which are 3 and 4 in your case. I also convert them to float at the end.
And the final solution, the whole program (version: python-3.6.0-amd64):
# import external functions / extensions ...
import os
import glob

# functions definition section
def fnc_walk_through_files(path, file_extension):
    for (dirpath, dirnames, filenames) in os.walk(path):
        for filename in filenames:
            if filename.endswith(file_extension):
                # join with dirpath so files in subdirectories get a valid path
                yield os.path.join(dirpath, filename)

# some variables for counting
line_count = 0

# Feed data to the program by entering it on the keyboard
print("Enter work path (e.g. d:\\test) :")
workPath = input("> ")
print("File extension to perform Search-Replace on [spf] :")
fileExt = input("> ")
print("Enter multiplier value :")
multiply_value = input("> ")
print("Text to search for :")
textToSearch = input("> ")

# temporary variable with path and mask for deleting all ".old" files
delPath = os.path.join(workPath, "*.old")

# delete old ".old" files to allow creating backups
for file_to_delete in glob.glob(delPath, recursive=False):
    os.remove(file_to_delete)

# do some needed operations...
print("\r")  # new line
multiply_value = float(multiply_value)  # convert multiplier to float
textToSearch_mod = "'" + textToSearch   # prepend apostrophe to the searched text

# print an information line showing what will be searched for
print("This is what will be searched for, to identify the right line: ", textToSearch_mod)
print("\r")  # new line

# walk through all files with the specified extension <-- CALLED FUNCTION !!!
for fname in fnc_walk_through_files(workPath, fileExt):
    print("\r")  # new line
    # print filename of the processed file
    print(" Filename processed:", fname)
    # process every file and print out the numbers to be multiplied,
    # located at the 3rd and 4th positions
    with open(fname, 'r') as f:            # open fname for reading
        temp_file = open('tempfile', 'w')  # open (create) tempfile for writing
        lines = f.readlines()              # read lines from f
        line_count = 0                     # reset counter
        # loop through all lines
        for line in lines:
            # line counter increment
            line_count = line_count + 1
            # if the line starts with the defined string, it will be processed
            if line.startswith(textToSearch_mod):
                # split the line into parts delimited by ","
                split_line = line.split(',')
                # take the 3rd and 4th parts and make them float numbers
                old_num1 = float(split_line[3])
                old_num2 = float(split_line[4])
                # multiply both values
                new_num1 = old_num1 * multiply_value
                new_num2 = old_num2 * multiply_value
                # replace the old values with the new multiplied values as strings
                split_line[3] = str(new_num1)
                split_line[4] = str(new_num2)
                # join the line back with the same "," delimiter used for splitting
                line = ','.join(split_line)
                # report where the searched string occurred
                print("Changed from old:", old_num1, old_num2, "to new:", new_num1, new_num2, "at line:", line_count)
                # write the changed line with multiplied numbers to the temporary file
                temp_file.write(line)
            else:
                # write all other lines to the temporary file unchanged
                temp_file.write(line)
    # create a backup name by appending ".old" to the filename
    new_name = fname + '.old'
    # rename the original file to the backup name
    os.rename(fname, new_name)
    # close the temporary file to enable the following rename
    temp_file.close()
    # rename the temporary file to the original filename
    os.rename('tempfile', fname)
Two days after asking, with the help of good people, some hard study of the language :-D (indentation was my nightmare), and some snippets of code from this site, I have created something that works... :-) I hope it helps other people with a similar question...
At the beginning the idea was clear, but I had no knowledge of the language...
Now it can all be done; the only border is what one can imagine :-)
I miss GOTO in Python :'( ... I love spaghetti, just not spaghetti code, but sometimes it would be good to have some label<--goto jumps... (not that this is one of those cases...)
I am currently trying to extract the raw data from a .txt file of 10 URLs, saving the raw data for each line (URL) of the .txt file, and then to repeat the process with the processed data (the raw data from the same original .txt file stripped of the HTML), using Python.
import commands
import os
import json

# RAW DATA
input = open('uri.txt', 'r')
t_1 = open('command', 'w')
counter_1 = 0
for line in input:
    counter_1 += 1
    if counter_1 < 11:
        filename = str(counter_1)
        print str(line)
        command = 'curl ' + '"' + str(line).rstrip('\n') + '"' + ' > ./rawData/' + filename
        output_1 = commands.getoutput(command)
input.close()

# PROCESSED DATA
counter_2 = 0
input = open('uri.txt', 'r')
t_2 = open('command', 'w')
for line in input:
    counter_2 += 1
    if counter_2 < 11:
        filename = str(counter_2) + '-processed'
        command = 'lynx -dump -force_html ' + '"' + str(line).rstrip('\n') + '"' + ' > ./processedData/' + filename
        print command
        output_2 = commands.getoutput(command)
input.close()
I am attempting to do all of this with one script. Can anyone help me refine my code so I can run it? It should loop through the code completely once for each line in the .txt file. For example, I should end up with 1 raw and 1 processed file for every URL line in my .txt file.
Break your code up into functions. Currently the code is hard to read and debug. Make a function called get_raw() and a function called get_processed(). Then, for your main loop, you can do:
for line in file:
    get_raw(line)
    get_processed(line)
Or something similar. You should also avoid 'magic numbers' like counter < 11. Why is it 11? Is it the number of lines in the file? If so, you can get the number of lines by calling len() on the list returned by readlines().
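Following that advice, here is a minimal sketch of the refactor, assuming Python 3 (where the commands module no longer exists, so subprocess replaces it); the filenames and output directories are taken from the question, and the helper names are mine:

```python
import subprocess

def curl_command(url):
    # Build the argument list for fetching raw HTML with curl.
    return ['curl', '-s', url]

def lynx_command(url):
    # Build the argument list for dumping rendered text with lynx.
    return ['lynx', '-dump', '-force_html', url]

def get_raw(url, index):
    # Fetch one URL into ./rawData/<index>.
    with open('./rawData/%d' % index, 'w') as out:
        subprocess.run(curl_command(url), stdout=out, check=False)

def get_processed(url, index):
    # Dump the rendered text of one URL into ./processedData/<index>-processed.
    with open('./processedData/%d-processed' % index, 'w') as out:
        subprocess.run(lynx_command(url), stdout=out, check=False)

def main():
    # One raw and one processed file per non-empty line of uri.txt.
    with open('uri.txt') as urls:
        for index, line in enumerate(urls, start=1):
            url = line.strip()
            if url:
                get_raw(url, index)
                get_processed(url, index)

if __name__ == '__main__':
    main()
```

Because enumerate drives the numbering, the magic number 11 disappears: the loop naturally runs once per line.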
I have this table of data in Notepad
But it's not really a table, because there aren't official columns. It just looks like a table; the data is aligned using spaces.
I want to convert it into a CSV format. How should I go about doing this?
The pandas Python package I am using for data analysis works best with CSV, as far as I understand.
Here is a hackjob Python script to do exactly what you need. Just save the script as a Python file and run it with the path of your input file as the only argument.
UPDATED: After reading the comments to my answer, my script now uses regular expressions to account for any number of spaces.
import re
from sys import argv

output = ''
with open(argv[1]) as f:
    for i, line in enumerate(f.readlines()):
        if i == 0:
            # header row: any run of whitespace separates columns
            line = line.strip()
            line = re.sub(r'\s+', ',', line) + '\n'
        else:
            # data rows: two or more spaces separate columns
            line = re.sub(r'\s\s+', ',', line)
        output += line

with open(argv[1] + '.csv', 'w') as f:
    f.write(output)
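For comparison, the same row-splitting idea can be sketched with the standard library's csv module, which takes care of quoting; like the header handling above, this assumes no field contains internal spaces (spaces_to_csv is a name I made up):

```python
import csv
import io
import re

def spaces_to_csv(text):
    """Convert a whitespace-aligned table to CSV by splitting each
    non-empty row on runs of whitespace."""
    out = io.StringIO()
    writer = csv.writer(out)
    for row in text.splitlines():
        if row.strip():
            writer.writerow(re.split(r'\s+', row.strip()))
    return out.getvalue()

print(spaces_to_csv('name   age\nann    31\nbob    42\n'))
```

The csv.writer handles any commas inside fields by quoting them, which a plain re.sub cannot do.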
So this is put into a file (if you call it csvify.py) and executed as:
python csvify.py <input_file_name>
csvify.py:
from sys import argv
from re import finditer
#Method that returns fields separated by commas
def comma_delimit(line, ranges):
return ','.join(get_field(line, ranges))
#Method that returns field info in appropriate format
def get_field(line, ranges):
for span in ranges: #Iterate through column ranges
field = line[slice(*span)].strip() #Get field data based on range slice and trim
#Use str() function if field doesn't contain commas, otherwise use repr()
yield (repr if ',' in field else str)(field)
#Open the input text file from command line (readonly, closed automatically)
with open(argv[1], 'r') as inp:
#Convert the first line (assumed header) into range indexes
#Use finditer to split the line by word border until the next word
#This assumes no spaces within header names
columns = map(lambda match: match.span(), finditer(r'\b\w+\s*', inp.readline()))
inp.seek(0) #Reset file pointer to beginning to include header line
#Create new CSV based on input file name
with open(argv[1] + '.csv', 'w') as txt:
#Writes to file and join all converted lines with newline
txt.write('\n'.join(comma_delimit(line, columns) for line in inp.readlines()))
I am still a learner in Python. I was not able to find a specific string in a file and insert multiple strings after it. I want to search for a line in the file and insert the content passed to the write function.
I have tried the following, which inserts at the end of the file instead.
line = '<abc hij kdkd>'
dataFile = open('C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml', 'a')
dataFile.write('<!--Delivery Date: 02/15/2013-->\n<!--XML Script: 1.0.0.1-->\n')
dataFile.close()
You can use fileinput to modify the same file in place and re to search for a particular pattern:
import fileinput
import re
import sys

def modify_file(file_name, pattern, value=""):
    fh = fileinput.input(file_name, inplace=True)
    for line in fh:
        replacement = value + line
        line = re.sub(pattern, replacement, line)
        sys.stdout.write(line)
    fh.close()
You can call this function something like this:
modify_file("C:\\Users\\Malik\\Desktop\\release_0.5\\release_0.5\\5075442.xml",
            "abc..",
            "!--Delivery Date:")
Python strings are immutable, which means that you wouldn't actually modify the input string. Instead you create a new one consisting of the first part of the input string, then the text you want to insert, then the rest of the input string.
You can use the find method on Python strings to locate the text you're looking for:
def insertAfter(haystack, needle, newText):
    """ Inserts 'newText' into 'haystack' right after 'needle'. """
    i = haystack.find(needle)
    return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]
You could use it like this:
print(insertAfter("Hello World", "lo", " beautiful"))  # prints 'Hello beautiful World'
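One caveat: str.find returns -1 when the needle is absent, and the slice arithmetic above would then splice the text at the wrong place. A guarded variant (a sketch; the name insert_after_safe is mine):

```python
def insert_after_safe(haystack, needle, new_text):
    """Like insertAfter, but returns haystack unchanged if needle is absent."""
    i = haystack.find(needle)
    if i == -1:
        # Needle not found: avoid corrupting the string, return it as-is.
        return haystack
    end = i + len(needle)
    return haystack[:end] + new_text + haystack[end:]

print(insert_after_safe("Hello World", "lo", " beautiful"))  # Hello beautiful World
print(insert_after_safe("Hello World", "xyz", " beautiful"))  # Hello World
```

Depending on your use case, raising an exception instead of returning the string unchanged may be the better design choice.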
Here is a suggestion for dealing with files. I suppose the pattern you search for is a whole line (there is nothing more on the line than the pattern, and the pattern fits on one line).
line = ...             # what to match
input_filepath = ...   # input full path
output_filepath = ...  # output full path (must differ from the input)

with open(input_filepath, "r", encoding=encoding) as fin, \
     open(output_filepath, "w", encoding=encoding) as fout:
    pattern_found = False
    for theline in fin:
        # Write input to output unmodified
        fout.write(theline)
        # if you want to get rid of surrounding spaces
        theline = theline.strip()
        # Find the matching pattern
        if pattern_found is False and theline == line:
            # Insert the extra data into the output file
            fout.write(all_data_to_insert)
            pattern_found = True

# Final check
if pattern_found is False:
    raise RuntimeError("No data was inserted because the line was not found")
This code is for Python 3; some modifications may be needed for Python 2, especially the with statement (see contextlib.nested). If your pattern fits on one line but is not the entire line, you may use "line in theline" instead of "theline == line". If your pattern can spread over more than one line, you need a stronger algorithm. :)
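For the multi-line case mentioned above, one simple approach (a sketch; insert_after_block is a name I made up) is to read the whole file into a single string, so the pattern is free to span line breaks:

```python
def insert_after_block(text, block, new_text):
    """Insert new_text right after the first occurrence of block,
    which may span several lines, inside text."""
    i = text.find(block)
    if i == -1:
        raise RuntimeError("block not found")
    end = i + len(block)
    return text[:end] + new_text + text[end:]

# Usage on a file: read everything, transform, write back out.
# with open("in.txt") as fin:
#     content = insert_after_block(fin.read(), "first\nsecond\n", "inserted\n")
# with open("out.txt", "w") as fout:
#     fout.write(content)
```

Reading the whole file is fine for files that fit in memory; for very large files a streaming approach with a rolling buffer would be needed.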
To write to the same file, you can write to another file and then move the output file over the input file. I didn't plan to release this code, but I was in the same situation some days ago. So here is a class that inserts content in a file between two tags and supports writing to the input file: https://gist.github.com/Cilyan/8053594
Frerich Raabe...it worked perfectly for me...good one...thanks!!!
import shutil

def insertAfter(haystack, needle, newText):
    """ Inserts 'newText' into 'haystack' right after 'needle'. """
    i = haystack.find(needle)
    return haystack[:i + len(needle)] + newText + haystack[i + len(needle):]

with open(sddraft) as f1:
    tf = open("<path to your file>", 'a+')
    # Read lines in the file and insert the required content
    for line in f1.readlines():
        build = insertAfter(line, "<string to find in your file>", "<new value to be inserted after the string is found in your file>")  # inserts value
        tf.write(build)
    tf.close()

shutil.copy("<path to the source file --> tf>", "<path to the destination where tf needs to be copied with the file name>")
Hope this helps someone :)