My problem is a simple one (too simple...). I am opening a new text file via a with statement and attempting to write each row of a pandas.DataFrame to the file. Specifically, I'm trying to place column entries at very specific character positions on each line, as that is the required format for the people receiving my file.
df represents my pandas.DataFrame in the code below.
with open(os.path.join(a_directory_var, 'folder/myfile.txt'), 'x') as file:
    for index, row in df.iterrows():
        file.seek(1)
        file.write(row['col1'])
        file.seek(56)
        file.write('|')
        file.seek(61)
        file.write(row['col2'])
        file.seek(76)
        file.write('|')
        file.seek(81)
        file.write(row['col3'])
        file.seek(96)
        file.write('|\n')
Expected Output:
I expected my last line to place a pipe and then send the file to the next line with '\n', so that the next call to file.write() would begin writing on a new line.
Actual Output: Characters from each row are written over one another on the first line, again and again. It may be worth noting that the resulting text file does have an empty second line.
In summation, I'm simply trying to write to a line, go to the next, write to that line, go to the next, etc, etc.
It looks like you're trying to write a fixed-width column format, with additional | characters as separators. The reason your version overwrites itself is that file.seek() moves to an absolute byte offset from the start of the file, not to a column on the current line, so every iteration seeks back to the same positions and overwrites them. Since fixed-width output is not a simple built-in option in Pandas (unlike, say, df.to_csv(fp, sep='|')), you do have to iterate over the rows, as you do, and write them one by one. But don't write each part separately: format each whole line using Python string formatting.
For example, something like this should get close to what you want (give or take a slight offset due to me not counting properly):
sep = "|"
with open(os.path.join(a_directory_var, 'folder/myfile.txt'), 'x') as fp:
for index, row in df.iterrows():
fp.write("{:56s}{:15s}{:15s}{:15s}{:15s}\n".format(
row['col1'], sep, row['col2'], sep, row['col3'], sep)
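Here each {:Ns} field left-justifies its argument and pads it with spaces to a minimum width of N, so with the widths above the pipes land at offsets 56, 76 and 96 and the columns start at offsets 0, 61 and 81, matching the positions in your seek() calls. One caveat: format() pads but never truncates, so a value longer than its field width will push everything after it to the right.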
Because of the way the dataset I have is formatted, each hourly timestamp is written as 18 0 instead of 1800 (for example), and the extra space instead of a zero is messing up the way that Excel is converting the dataset from a TAB file to a CSV. There are > 600,000 lines, and this happens every 4th line.
I'm reading the text file in, going through it line by line, and trying to replace the 18th character of every 4th line (the wretched space) with a 0.
I think I'm misunderstanding how to treat each line as a string, and I'm also not sure how to correct a line and then save it back into the file so it's ready to convert to a CSV.
Python strings are immutable, so they do not support item or slice assignment. You'll have to build a new string instead, e.g. someString[:18] + '0' + someString[19:], and then write the result back to the file.
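A minimal sketch of that approach, assuming the whole file fits in memory, the space sits at index 18, and the affected lines are every 4th one starting with the 4th (shift the start of the range if yours start elsewhere; 'dataset.txt' is a placeholder name):

with open('dataset.txt') as f:                  # 'dataset.txt' is a placeholder name
    lines = f.readlines()
for i in range(3, len(lines), 4):               # every 4th line: 0-based indices 3, 7, 11, ...
    if lines[i][18] == ' ':                     # only touch it if it really is a space
        lines[i] = lines[i][:18] + '0' + lines[i][19:]
with open('dataset.txt', 'w') as f:
    f.writelines(lines)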
I am having trouble simply saving items into a file for later reading. When I save the file, instead of listing the items as single items, it runs all the data together as one long string. According to my Google searches, this should not be happening.
What am I doing wrong?
Code:
with open('Ped.dta', 'w+') as p:
    p.write(str(recnum))        # Add record number to top of file
    for x in range(recnum):
        p.write(dte[x])         # Write date
        p.write(str(stp[x]))    # Write steps number
Since you do not show your data or your output, I cannot be sure, but it seems you are trying to use the write method like the print function. There are important differences.
Most importantly, write does not follow the characters it writes with any separator (like the space print uses by default) or line end (like the '\n' print uses by default).
Therefore there is no space between your date and steps number, and no break between the lines, because you did not write them and Python did not add them.
So add those. Try the lines
p.write(dte[x]) # Write date
p.write(' ') # space separator
p.write(str(stp[x])) # Write Steps number
p.write('\n') # line terminator
Note that I do not know the format of your "date" that is written, so you may need to convert that to text before writing it.
Now that I have the time, I'll implement @abarnert's suggestion (from a comment) and show you how to get the advantages of the print function while still writing to a file. Just use the file= parameter, available directly in Python 3, or in Python 2 after executing the statement
from __future__ import print_function
Using print you can do my four lines above in one line, since print automatically adds the space separator and newline end:
print(dte[x], str(stp[x]), file=p)
This does assume that your date datum dte[x] is to be printed as text.
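If dte[x] is instead, say, a datetime.date or datetime.datetime object, you could format it explicitly before printing (a guess at your data; substitute whatever format your file needs):

print(dte[x].strftime('%Y-%m-%d'), stp[x], file=p)

Note that print converts its arguments to strings itself, so the str() around stp[x] is not needed either way.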
Try adding a newline character ('\n') at the end of your lines, as shown in the docs. That should solve the problem of the items not being listed as single items, though the file you create may still not be particularly well structured.
As a follow-up to your Google searches, you may also want to look into serialization, and into the json and csv formats, all covered in the Python standard library.
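For instance, the csv module would keep each record on its own line with an unambiguous separator; a sketch of what that could look like for your data ('Ped.csv' is a placeholder name):

import csv

with open('Ped.csv', 'w', newline='') as f:     # 'Ped.csv' is a placeholder name
    writer = csv.writer(f)
    writer.writerow([recnum])                   # record count at the top of the file
    for x in range(recnum):
        writer.writerow([dte[x], stp[x]])       # one record per line: date, steps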
Your question would also have benefited from a very small example of the recnum variable. Finally, the original f.close() is not necessary, as your with statement already closes the file for you.
I am attempting to analyze a CSV file containing a TCP stack trace. I'm checking whether each line of the file contains a certain string and, if so, adding it to a dictionary.
The strings I'm looking for are:
[SYN]
[SYN, ACK]
I have checked the file multiple times. Python can find the first string no problem, but cannot find the second. Here's the code that checks:
# variable declaration
synString = '[SYN]'
ackString = '[SYN, ACK]'

# some code

# iterate through csv:
with open('EECS325Hw3Lab3', newline='') as captureFile:
    captureReader = csv.reader(captureFile, delimiter=' ')
    for row in captureReader:
        # code that doesn't work:
        if synString in row or ackString in row:
            serverDict[currentServer].append(row)
And I know this doesn't work because when I print serverDict, I only see the [SYN] expression. What is happening here?
When you read the file with a csv.reader and delimiter=' ', each row is a list of fields split on spaces, so '[SYN, ACK]' ends up as two consecutive elements, '[SYN,' and 'ACK]', and the membership test ackString in row can never match ('[SYN]' contains no space, which is why it still works). Since all you need is a substring test, skip the csv module and just use open:
for line in open('EECS325Hw3Lab3'):
    if synString in line or ackString in line:
        serverDict[currentServer].append(line)
Each line will be a string, as you want it to be. You may want to strip the newlines, though.
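To see why the membership test fails, look at what a space-delimited csv.reader row actually contains (a small illustration; the sample line is made up):

import csv
import io

row = next(csv.reader(io.StringIO('Flags: [SYN, ACK]'), delimiter=' '))
print(row)   # ['Flags:', '[SYN,', 'ACK]'] -- '[SYN, ACK]' is never a single element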
I have done a regex substitution on a CSV file, and printing the result gives the following output, exactly as expected:
H1,H2,H3
A1,GG,98
B3,KLK,Oe
But when I write it to a CSV file, each complete line ends up in a single cell (the commas are not used as delimiters, even though the delimiter is specified). I used writer.writerow(row.split("\n")) to write, where row is the data obtained from re.sub (i.e. the output posted above).
From the docs:
A row must be a sequence of strings or numbers
You are passing a list of rows, not individual values. You have to split each row by commas:
for line in row.split('\n'):
    writer.writerow(line.split(','))
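Putting it together, and opening the output file with newline='' so that the csv module controls the line endings (as the docs recommend), the write loop might look like this sketch ('output.csv' is a placeholder name):

import csv

with open('output.csv', 'w', newline='') as f:   # 'output.csv' is a placeholder name
    writer = csv.writer(f)
    for line in row.split('\n'):                 # row: the text produced by re.sub
        writer.writerow(line.split(','))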
I have to read data from a text file from the command line. It is not too difficult to read in each line, but I need a way to separate each part of the line.
The file contains the following in order for several hundred lines:
String (Sometimes more than 1 word)
Integer
String (Sometimes more than 1 word)
Integer
So for example the input could have:
Hello 5 Sample String 10
The current implementation I have for reading in each line is below; how can I modify it to separate each line into the parts I want? I have tried splitting the line, but that way I always end up with only one character of the first string, and no integers or any part of the second string.
with open(sys.argv[1], "r") as f:
    for line in f:
        print(line)
The desired output would be:
Hello
5
Sample String
10
and so on for each line in the file. There could be thousands of lines in the file. I just need to separate each part so I can work with them separately.
The program can't magically split lines the way you want. You will need to read in one line at a time and parse it yourself based on the format.
Since there are two integers and an indeterminate number of (what I assume are) space-delimited words, you may be able to use a regular expression to find the integers and then use them as delimiters to split up the line, as in the sketch below.
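A minimal sketch of that idea, assuming digits never appear inside the words themselves:

import re
import sys

with open(sys.argv[1], "r") as f:
    for line in f:
        # Split on runs of digits while keeping the digits as their own fields;
        # "Hello 5 Sample String 10" yields ['Hello', '5', 'Sample String', '10'].
        parts = [p.strip() for p in re.split(r'(\d+)', line) if p.strip()]
        for part in parts:
            print(part)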