I want to read a file on my drive at C:\Users\PITA SHIVAYA\Desktop\BIGDATA\test.txt. How can I run the code below so that it uses this .txt file as its input?
import sys

for line in sys.stdin:
    line = line.strip()
    items = line.split(' ')
    print(items[2] + "\t" + items[4] + "\t" + items[6] + "\t" + items[9] + "\t1")
Can you change the code?
You can open your file instead of using stdin:
file_path = r'C:\Users\PITA SHIVAYA\Desktop\BIGDATA\test.txt'

with open(file_path, 'r') as file:
    for line in file:
        line = line.strip()
        items = line.split(' ')
        print(items[2] + "\t" + items[4] + "\t" + items[6] + "\t" + items[9] + "\t1")
Note that I use a raw string for file_path (an r before the opening quote), so that the backslashes are not interpreted as escape sequences.
Another solution is to redirect the file to stdin when you execute python:
python my_file.py < C:\Users\PITA SHIVAYA\Desktop\BIGDATA\test.txt
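If you want the script to work both ways (a filename argument or redirected stdin), the standard-library fileinput module covers both; a minimal sketch of the same mapper:

import fileinput

# Reads from any filenames given on the command line, or from stdin
# when none are given, so both of these invocations work:
#   python my_file.py test.txt
#   python my_file.py < test.txt
for line in fileinput.input():
    items = line.strip().split(' ')
    print(items[2] + "\t" + items[4] + "\t" + items[6] + "\t" + items[9] + "\t1")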
I am trying to replace a line when a pattern is found (the file contains only one occurrence of the pattern) with the code below, but it erased the whole content of the file instead.
Could you please advise, or suggest a better way with pathlib?
import datetime

def insert_timestamp():
    """ To update the current date in DNS files """
    pattern = '; serial number'
    current_day = datetime.datetime.today().strftime('%Y%m%d')
    subst = "\t" + str(current_day) + "01" + " ; " + pattern
    print(current_day)
    with open(lab_net_file, "w+") as file:
        for line in file:
            file.write(line if pattern not in line else line.replace(pattern, subst))

lab_net_file = '/Users/kams/nameserver/10_15'
insert_timestamp()
Opening the file in "w+" mode truncates it immediately, which is why the whole content disappeared. What you want to do is read the file, replace the pattern, and write it back again, like this:
with open(lab_net_file, "r") as file:
    read = file.read()

read = read.replace(pattern, subst)

with open(lab_net_file, "w") as file:
    file.write(read)
The reason you don't need an if/else is that if pattern does not occur in read, .replace() does nothing, so there is nothing to worry about. If pattern does occur in read, .replace() replaces it throughout the entire string.
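Since you asked about pathlib: the same read-replace-write round trip can be written with Path.read_text and Path.write_text. A minimal sketch, using the same pattern and subst construction as your original function:

import datetime
from pathlib import Path

lab_net_file = Path('/Users/kams/nameserver/10_15')
pattern = '; serial number'
current_day = datetime.datetime.today().strftime('%Y%m%d')
subst = "\t" + current_day + "01" + " ; " + pattern

# read_text/write_text replace the open/read/write boilerplate
text = lab_net_file.read_text()
lab_net_file.write_text(text.replace(pattern, subst))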
I am able to get the output I wanted with this block of code.
import datetime
import re

def insert_timestamp():
    """ To update the current date in DNS files """
    pattern = re.compile(r'\s[0-9]*\s;\sserial number')
    current_day = datetime.datetime.today().strftime('%Y%m%d')
    subst = "\t" + str(current_day) + "01" + " ; " + 'serial number'
    with open(lab_net_file, "r") as file:
        reading_file = file.read()
    # search() finds the current " <serial> ; serial number" text,
    # so replace() can swap in the fresh date
    pattern = pattern.search(reading_file).group()
    reading_file = reading_file.replace(pattern, subst)
    with open(lab_net_file, "w") as file:
        file.write(reading_file)
Thank you @Timmy
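One caveat with that version: pattern.search() returns None when there is no match, so .group() would raise an AttributeError on a file without a serial-number line. re.sub performs the search and the replacement in one step and simply leaves the text unchanged when nothing matches. A sketch, reusing lab_net_file and subst from above:

import re

pattern = re.compile(r'\s[0-9]*\s;\sserial number')

with open(lab_net_file, "r") as file:
    reading_file = file.read()

# If the pattern is absent, sub() returns the text unchanged
# instead of raising, so no guard is needed.
with open(lab_net_file, "w") as file:
    file.write(pattern.sub(subst, reading_file))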
This is my code:
org = "na"
OutputFile = open("F&FHOutput.txt", "a")
#Part 1
with open("input.txt") as file:
for line in file:
string,letter = line.strip().split(",")
print(string + "," + letter + "," + string.replace(letter, ""))
OutputFile.write(string + "," + letter + "," + string.replace(letter, ""))
#Part 2
def remove_strings_recursive(lines):
if not lines:
return ""
word,letter = lines[0].rstrip().split(',')
org = word
word = word.replace(letter, '')
print(org + "," + letter + "," + word)
OutputFile.write(org + "," + letter + "," + word)
return word + '\n' + remove_strings_recursive(lines[1:])
with open('input.txt', 'r') as file:
lines = file.readlines()
result = remove_strings_recursive(lines)
OutputFile.close()
I am trying to have it take the same things that are being printed and put them into a new file that the program creates if the file doesn't exist. Every time I run the code, everything works fine but the output file is nowhere to be found. Could someone please help? (Sorry about the messy code)
Your file name has a special character (&), which can cause problems. Try changing the file name to a more standard one.
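It is also worth checking where the file actually ends up: open() with a relative name creates the file in the current working directory, which is not necessarily the folder the script lives in. A quick diagnostic sketch (it only prints paths, it changes nothing):

import os

# Shows where the script is running from and the absolute path
# the output file will be created at.
print(os.getcwd())
print(os.path.abspath("F&FHOutput.txt"))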
While running the following code for converting CSV to XML, I'm getting an index out of range error.
I used the code below on a small subset of the file with 16 columns and it works fine, but when I try it on more than 30 columns it gives the following error:
Traceback (most recent call last):
  File "csv2xml.py", line 40, in <module>
    + rowData[i] + '</' + tags[i] + '>' + "\n")
IndexError: list index out of range
#!/usr/bin/python
import sys
import os
import glob

delimiter = ","  # "\t" "|" # delimiter used in the CSV file(s)

# the optional command-line argument may be a CSV file or a folder
if len(sys.argv) == 2:
    arg = sys.argv[1].lower()
    if arg.endswith('.csv'):  # if a CSV file then convert only that file
        csvFiles = [arg]
    else:  # if a folder path then convert all CSV files in that folder
        os.chdir(arg)
        csvFiles = glob.glob('*.csv')
# if no command-line argument then convert all CSV files in the current folder
elif len(sys.argv) == 1:
    csvFiles = glob.glob('*.csv')
else:
    os._exit(1)

for csvFileName in csvFiles:
    xmlFile = csvFileName[:-4] + '.xml'
    # read the CSV file as binary data in case there are non-ASCII characters
    csvFile = open(csvFileName, 'rb')
    csvData = csvFile.readlines()
    csvFile.close()
    tags = csvData.pop(0).strip().replace(' ', '_').split(delimiter)
    xmlData = open(xmlFile, 'w')
    xmlData.write('<?xml version="1.0" encoding="UTF-8" ?>' + "\n")
    # there must be only one top-level tag
    xmlData.write('<CTS>' + "\n")
    for row in csvData:
        rowData = row.strip().split(delimiter)
        xmlData.write('<Product>' + "\n")
        for i in range(len(tags)):
            xmlData.write(' ' + '<' + tags[i] + '>'
                          + rowData[i] + '</' + tags[i] + '>' + "\n")
        xmlData.write('</Product>' + "\n")
    xmlData.write('</CTS>' + "\n")
    xmlData.close()
It sounds like some rows have fewer fields than the header, so your for loop over the data rows should check the length of rowData, like this:
tags_length = len(tags)
for row in csvData:
    rowData = row.strip().split(delimiter)
    xmlData.write('<Product>' + "\n")
    if len(rowData) >= tags_length:
        for i in range(tags_length):
            xmlData.write(' ' + '<' + tags[i] + '>'
                          + rowData[i] + '</' + tags[i] + '>' + "\n")
    xmlData.write('</Product>' + "\n")
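A likely root cause of the short rows is a delimiter character inside a quoted field, which plain str.split cannot handle. The standard csv module parses quoting correctly. A Python 3 sketch of the row-reading part rewritten with it, reusing csvFileName, delimiter, and xmlData from the script above:

import csv

# csv.reader understands quoting, so a comma inside a quoted field
# such as "Smith, John" stays in one column instead of two.
with open(csvFileName, 'r', newline='') as csvFile:
    reader = csv.reader(csvFile, delimiter=delimiter)
    tags = [t.strip().replace(' ', '_') for t in next(reader)]
    for rowData in reader:
        if len(rowData) != len(tags):
            continue  # or log the malformed row
        xmlData.write('<Product>' + "\n")
        for tag, value in zip(tags, rowData):
            xmlData.write(' <' + tag + '>' + value + '</' + tag + '>' + "\n")
        xmlData.write('</Product>' + "\n")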
I'm having some trouble getting my if statement to work properly. At the end of iplist.txt there is an empty line that I don't want processed, but for some reason it still runs. I tried removing the last line of the file, but that only removed the last legitimate line, not the blank one. I even tried removing spaces, empty lines, and null lines, but it still ran with a blank line. And the weirdest part is that the blank line shows up in both the if block and the else block.
with open ("iplist.txt", "r") as file:
filecontents = file.read()
for line in filecontents.split('\n'):
filename = (line) + ".txt "
command = "nmap -O -oG " + ".\\ips\\" + (filename) + (line)
print(command)
print(filename)
strlen = int (len(filename))
print(strlen)
compareline = line[:4]
print(compareline)
if compareline == beginline: #beginline is declared as 10.9 earlier in the file
print("Testing 1..2...")
os.system(command)
filenameforos = (line + ".txt")
#detailedosdetection = open(filenameforos)
#next(filecontents)
else:
print("Testing...")
del line
#next(StopIteration)
Here are the contents of iplist.txt
10.9.10.38
10.9.10.45
10.9.11.10
#extra line
Edit: I tried that but it didn't run the loop; I'm sure I'm doing it wrong.
with open ("iplist.txt", "r") as file:
filecontents = file.read()
lines = file.readlines()
lines = [x.strip() for x in lines]
print("Creating list")
for line in lines:
filename = (line) + ".txt "
command = "nmap -O -oG " + ".\\ips\\" + (filename) + (line)
print(command)
print(filename)
strlen = int (len(filename))
print(strlen)
compareline = line[:4]
print(compareline)
if compareline == beginline: #beginline is declared as 10.9 earlier in the file
print("Testing 1..2...")
os.system(command)
filenameforos = (line + ".txt")
#detailedosdetection = open(filenameforos)
#next(filecontents)
else:
print("Testing...")
del line
#next(StopIteration)
Python has a method to read separate lines. Try

lines = file.readlines()
lines = [x.strip() for x in lines]  # remove unnecessary whitespace and "\n"

instead of splitting the file on "\n".
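As for the edited attempt: the loop never runs because file.read() has already consumed the whole file, so the file.readlines() that follows it returns an empty list. Read the file once, one way or the other, and skip blank lines explicitly, which also solves the original problem. A condensed sketch, reusing beginline from your script:

import os

with open("iplist.txt", "r") as file:
    lines = [x.strip() for x in file.readlines()]

for line in lines:
    if not line:  # the trailing blank line is skipped here
        continue
    filename = line + ".txt"
    command = "nmap -O -oG " + ".\\ips\\" + filename + " " + line
    if line[:4] == beginline:
        os.system(command)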
I ran into a curious problem while parsing json objects in large text files, and the solution I found doesn't really make much sense. I was working with the following script. It copies bz2 files, unzips them, then parses each line as a json object.
import os, sys, json

# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# USER INPUT
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
args = sys.argv
extractDir = outputDir = ""

if (len(args) >= 2):
    extractDir = args[1]
else:
    extractDir = raw_input('Directory to extract from: ')

if (len(args) >= 3):
    outputDir = args[2]
else:
    outputDir = raw_input('Directory to output to: ')

# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
# RETRIEVE FILE
# =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
tweetModel = [u'id', u'text', u'lang', u'created_at', u'retweeted', u'retweet_count', u'in_reply_to_user_id', u'coordinates', u'place', u'hashtags', u'in_reply_to_status_id']

filenames = next(os.walk(extractDir))[2]
for file in filenames:
    if file[-4:] != ".bz2":
        continue
    os.system("cp " + extractDir + '/' + file + ' ' + outputDir)
    os.system("bunzip2 " + outputDir + '/' + file)

    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # PARSE DATA
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    input = open(outputDir + '/' + file[:-4], 'r')
    output = open(outputDir + '/p_' + file[:-4], 'w+')

    for line in input.readlines():
        try:
            tweet = json.loads(line)
            for field in enumerate(tweetModel):
                if tweet.has_key(field[1]) and tweet[field[1]] != None:
                    if field[0] != 0:
                        output.write('\t')
                    fieldData = tweet[field[1]]
                    if not isinstance(fieldData, unicode):
                        fieldData = unicode(str(fieldData), "utf-8")
                    output.write(fieldData.encode('utf8'))
                else:
                    output.write('\t')
        except ValueError as e:
            print("Parse Error: " + str(e))
            print line
            line = input.readline()
            quit()
            continue
        print "Success! " + str(len(line))
        input.flush()
        output.write('\n')

    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    # REMOVE OLD FILE
    # =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
    os.system("rm " + outputDir + '/' + file[:-4])
While reading in certain lines in the for line in input.readlines(): loop, the lines would occasionally be truncated at inconsistent locations. Since the newline character was truncated as well, it would keep reading until it found the newline character at the end of the next json object. The result was an incomplete json object followed by a complete json object, all considered one line by the parser. I could not find the reason for this issue, but I did find that changing the loop to
filedata = input.read()
for line in filedata.splitlines():
worked. Does anyone know what is going on here?
After looking at the source code for file.readlines and string.splitlines, I think I see what's up. Note: this is Python 2.7 source code, so if you're using another version this answer may or may not apply.
readlines uses the function Py_UniversalNewlineFread to test for a newline, while splitlines uses a constant STRINGLIB_ISLINEBREAK that just tests for \n or \r. I suspect Py_UniversalNewlineFread is picking up some character in the file stream as a line break when it's not really intended as one; it could be from the encoding, I don't know. But when you dump all that same data to a string, splitlines checks it against \r and \n, there's no match, so splitlines moves on until the real line break is encountered and you get your intended line.
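One way to test that theory (a hedged sketch, not a confirmed fix): open the file in binary mode so no newline translation happens at the C level, then split on the exact byte you care about. If the truncation disappears, the universal-newline handling was indeed the culprit.

# Binary mode delivers the raw bytes untouched, so only the
# explicit '\n' split below decides where a line ends.
input = open(outputDir + '/' + file[:-4], 'rb')
filedata = input.read()
input.close()
for line in filedata.split('\n'):
    if not line.strip():
        continue
    tweet = json.loads(line)
    # ... same per-field processing as in the original script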