So I have this code here, which I am sure is trivial to my question, as it is just a question about readlines in general:
lines = file_in.readlines()
self.inputText.insert(1.0, lines,)
If I were to read in a text file, it would be written out like this:
['Example Text\n']
or something of that nature, instead of what we really want, which is:
Example Text
How do I avoid this problem?
readlines() gives you a list of lines. When you try to turn a list into a string, you get brackets and commas and the repr of each of its elements (the for-a-programmer representation, rather than the str, the for-an-end-user representation), which is why you see the quotes and the \n instead of a newline. That's not what you want.
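A quick way to see the difference is to compare str() on the list with the joined text. This is only a minimal sketch; the sample list is made up for illustration and stands in for whatever readlines() returned.

```python
# Hypothetical sample of what readlines() might return for a two-line file.
lines = ["Example Text\n", "Second line\n"]

# str() on a list uses repr() for each element, so quotes and \n escapes appear.
as_list_text = str(lines)

# Joining the elements gives the plain text instead.
as_plain_text = "".join(lines)

print(as_list_text)
print(as_plain_text)
```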
You could fix it up after the fact, or add each of the lines one by one, but there's a much simpler way: just use read(), which gives you the whole file as one big string:
contents = file_in.read()
self.inputText.insert(1.0, contents,)
On top of being less code, and harder to get wrong, this means you're not making Python waste time splitting the contents up into a bunch of separate lines just so you can put them back together.
As a side note, there's almost never a good reason to call readlines().
readlines returns a list of lines, each still ending in its newline. To insert this into a text widget you can join those items with an empty string (joining with "\n" would double every line break), like so:
lines = file_in.readlines()
self.inputText.insert("1.0", "".join(lines))
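To check how the join behaves, here is a small sketch that uses io.StringIO as a stand-in for the open file (an assumption, since the original file is not shown). It suggests that readlines() keeps the trailing newlines, so joining with an empty string reproduces what read() returns, while joining with "\n" adds an extra blank line between lines.

```python
import io

text = "first\nsecond\n"

# io.StringIO stands in for an open text file here.
lines = io.StringIO(text).readlines()
whole = io.StringIO(text).read()

joined_plain = "".join(lines)      # identical to what read() returns
joined_newline = "\n".join(lines)  # adds a second newline between lines

print(repr(lines))
print(repr(joined_plain))
print(repr(joined_newline))
```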
I'm aware this is a much discussed topic and even though there are similar questions I haven't found one that covers my particular case.
I have a csv file that is as follows:
alarm_id,alarm_incident_id,alarm_sitename,alarm_additionalinfo,alarm_summary
"XXXXXXX","XXXXXXXXX","XXXXX|4G_Availability_Issues","TTN-XXXX","XXXXXXX;[{"severity":"CRITICAL","formula":"${XXXXX} < 85"}];[{"name":"XXXXX","value":"0","updateTimestamp":"Oct 27, 2021, 2:00:00 PM"}];[{"coName":{"XXXX/XXX":"MRBTS-XXXX","LNCEL":"XXXXXX","LNBTS":"XXXXXXX"}}]||"
It has more lines, but this is the trouble line. As you can see, the fifth field contains several quotes and commas, and the comma is also the separator. The quotes are also single rather than doubled (""), which is how a quote character that should be kept in the field is normally signalled. As a result, pandas.read_csv() splits this last field into several, which throws an error about extra fields. I've tried several configurations and parameters regarding quoting in pandas.read_csv(), but none works...
The csv is badly formatted, I just wanted to know if there is a way to still read it, even if using a roundabout way or it really is just hopeless.
Edit: This can happen to more than one column and I never know in which column(s) this may happen
Thank you for your help.
I think I've got what you're looking for, at least I hope.
You can read the file as a regular text file, creating a list of the lines in the csv file.
Then iterate through the lines variable and split each line into at most 5 parts (maxsplit=4), since you have 5 columns in the csv.
with open("test.csv", "r") as f:
    lines = f.readlines()
for item in lines:
    # maxsplit=4 yields at most 5 parts, one per column
    new_ls = item.strip().split(",", 4)
    for new_item in new_ls:
        print(new_item)
Now you can iterate through each line's column items and do whatever you have/want to do.
If all your lines' fields are consistently enclosed in quotes, you can try splitting each line on "," (quote, comma, quote) and removing the initial and terminating quote. The current line is correctly separated with:
row = line.rstrip('\n').strip('"').split('","', 4)
But because of the incorrect formatting of your initial file, you will have to manually check that it matches all the lines...
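As a rough sketch of that idea, the helper below splits a fully quoted line on the `","` delimiter. The function name split_quoted_line and the shortened sample line are made up for illustration. It assumes every field is quoted and that any field with embedded quotes and commas is the last one, so it would not cover the case where the problem shows up in an arbitrary column.

```python
def split_quoted_line(line, n_fields):
    """Split a line whose fields are all wrapped in double quotes."""
    body = line.rstrip("\r\n")
    # Drop exactly one leading and one trailing quote, if both are present.
    if body.startswith('"') and body.endswith('"'):
        body = body[1:-1]
    # Limit the number of splits so the last field keeps its inner '","' runs.
    return body.split('","', n_fields - 1)


# Shortened, hypothetical version of the trouble line from the question.
sample = (
    '"A1","I1","SITE|4G_Availability_Issues","TTN-1",'
    '"SUM;[{"severity":"CRITICAL","formula":"${X} < 85"}]||"\n'
)

fields = split_quoted_line(sample, 5)
for field in fields:
    print(field)
```

The resulting lists could then be passed to pandas.DataFrame together with the header names, if a DataFrame is what you need in the end.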
Can't post a comment so just making a post:
One option is to escape the internal quotes / commas, or use a regex.
Also, pandas.read_csv has a quoting parameter where you can adjust how it reacts to quotes, which might be useful.
I am having trouble simply saving items into a file for later reading. When I save the file, instead of listing the items as single items, it appends the data together as one long string. According to my Google searches, this should not be appending the items.
What am I doing wrong?
Code:
with open('Ped.dta','w+') as p:
    p.write(str(recnum)) # Add record number to top of file
    for x in range(recnum):
        p.write(dte[x]) # Write date
        p.write(str(stp[x])) # Write Steps number
Since you do not show your data or your output I cannot be sure. But it seems you are trying to use the write method like the print function, but there are important differences.
Most important, write does not follow its written characters with any separator (like space by default for print) or end (like \n by default for print).
Therefore there is no space between your date and steps number, or between the lines, because you did not write them and Python did not add them.
So add those. Try the lines
p.write(dte[x]) # Write date
p.write(' ') # space separator
p.write(str(stp[x])) # Write Steps number
p.write('\n') # line terminator
Note that I do not know the format of your "date" that is written, so you may need to convert that to text before writing it.
Now that I have the time, I'll implement @abarnert's suggestion (in a comment) and show you how to get the advantages of the print function and still write to a file. Just use the file= parameter in Python 3, or in Python 2 after executing the statement
from __future__ import print_function
Using print you can do my four lines above in one line, since print automatically adds the space separator and newline end:
print(dte[x], str(stp[x]), file=p)
This does assume that your date datum dte[x] is to be printed as text.
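To make the difference concrete, here is a small Python 3 sketch that writes the same records once with write() and once with print(). The sample values in dte and stp are invented, and io.StringIO stands in for the real file so the result can be inspected directly.

```python
import io

dte = ["2021-10-27", "2021-10-28"]  # hypothetical dates, already text
stp = [1200, 3400]                  # hypothetical step counts

with_write = io.StringIO()
with_print = io.StringIO()

for x in range(len(dte)):
    # write() emits exactly what it is given, nothing more.
    with_write.write(dte[x])
    with_write.write(str(stp[x]))
    # print() inserts sep=' ' between arguments and end='\n' after them.
    print(dte[x], stp[x], file=with_print)

print(repr(with_write.getvalue()))
print(repr(with_print.getvalue()))
```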
Try adding a newline character ('\n') at the end of each line you write, as shown in the docs. This should solve the problem of 'listing the items as single items', although the file you create may still not be well structured.
For further Google searches, you may want to look into serialization, as well as the json and csv formats, both covered by the Python standard library.
Your question would have benefited from a very small example of the recnum variable. Also, the f.close() in your original code is not necessary since you have a with statement; see here at SO.
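As one possible direction for the csv suggestion, the sketch below writes the records with the standard csv module and reads them back. The column names and sample values are assumptions, and io.StringIO is used in place of a real file opened with newline=''.

```python
import csv
import io

dte = ["2021-10-27", "2021-10-28"]  # hypothetical dates
stp = [1200, 3400]                  # hypothetical step counts

# Stands in for open('Ped.csv', 'w', newline='') in a real program.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["date", "steps"])  # header row
for date, steps in zip(dte, stp):
    writer.writerow([date, steps])  # one record per row

# Read the records back; csv.reader returns every field as a string.
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows)
```

With one record per row there is no need to store the record count at the top of the file, since it is simply the number of rows after the header.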
I am new to programming and have already checked other people's questions to make sure that I am using a good method to replace tabs with spaces, that my regex is correct, and that I understand what exactly my error is ("unhashable type: 'list'"). But even so, I'm at a loss as to what to do. Any help would be great!
I have a large file that I have broken up into lines. Ultimately I will need to access the first 3 elements of each line. Currently when I print a line, without the additional re.sub line of code, I get something like this: ['blah\tblah\tblah'], when I want ['blah blah blah'].
My code to do this is
f = open('text.txt')
raw = f.read()
raw = raw.lower()
lines = raw.splitlines()
lines = re.sub(r'\t', lines, '\s')
print lines[0:2] #just to see the first few examples
f.close()
When I print the first few lines without the regex sub bit, it works fine. And then when I add that line in an attempt to change the lines, I get the error. I understand that lists are changeable and thus can't be hashed... but I'm not trying to work with a hash. I'm just trying to replace \t with \s in a large text file to make the program easier to work with. I don't think there is a problem with how I am changing \t's to \s's, because according to this error, any way I change it will break my code. What do I do?! Any help is super appreciated. :')
You need to change the order of the parameters passed to re.sub. Also note that you can't use the regex \s as the replacement (second) parameter; use a literal space instead. The syntax is re.sub(regex, replacement, string).
lines = raw.splitlines()
lines = [re.sub(r'\t', ' ', line) for line in lines]
raw.splitlines() returns a list, which is then assigned to the variable lines. re.sub works on a single string, not on a list, so you need to apply it to each item in the list.
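Here is a self-contained sketch of that per-line substitution on invented sample text. It also shows str.replace, which does the same job for a fixed character like a tab without needing a regex, and then takes the first three elements of each line, as the question ultimately wanted.

```python
import re

# Hypothetical file contents; the real file was not shown.
raw = "Blah\tblah\tblah\textra\nFoo\tbar\tbaz"

lines = raw.lower().splitlines()

# Apply re.sub to each line: pattern first, then replacement, then the string.
cleaned = [re.sub(r"\t", " ", line) for line in lines]

# For a fixed character, str.replace gives the same result without a regex.
also_cleaned = [line.replace("\t", " ") for line in lines]

# First three whitespace-separated elements of each line.
first_three = [line.split()[:3] for line in cleaned]

print(cleaned)
print(first_three)
```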
I am trying to convert a multiline string to a single list, which should be possible using splitlines(), but for some reason it continues to convert each line into a list instead of processing all the lines at once. I tried to do it outside the for loop, but that doesn't seem to have any effect. I need the lines as a single list to use in another function. Below is how I get the multiline string into a single variable. What am I missing???
multiline_string_final = []
for match_multiline in re.finditer(r'(^(\w+):\sThis particular string\s*|This particular string\s*)\{\s(\w+)\s\{(.*?)\}', string, re.DOTALL):
    multiline_string = match_multiline.group(4)
    print multiline_string
This last print statement prints out the strings like this:
blah=0; blah_blah=1; Foo=3;
blah=4; blah_blah=5; Foo=0;
However I need:
['blah=0; blah_blah=1; Foo=3;''blah=4; blah_blah=5; Foo=0;']
I understand it has to be something with the finditer, but I can't seem to rectify it.
Your new problem also has nothing to do with finditer. (Also, your code is still not an MCVE, you still haven't shown us the sample input data, etc., making it harder to help you.)
From this desired output:
['blah=0; blah_blah=1; Foo=3;''blah=4; blah_blah=5; Foo=0;']
I'm pretty sure what you're looking for is to get a list of the matches, instead of printing out each match on its own. That isn't a valid list, because it's missing the comma between the elements,* but I'll assume that's a typo from you making up data instead of building an MCVE and copying and pasting the real output.
Anyway, to get a list, you have to build a list. Printing things to the screen doesn't build anything. So, try this:
multiline_string_final.append(multiline_string)
Then, at the end—not inside the loop, only after the loop has finished—you can print that out:
print multiline_string_final
And it'll look like this:
['blah=0; blah_blah=1; Foo=3;',
'blah=4; blah_blah=5; Foo=0;']
* Actually, it is a valid list, because adjacent strings get concatenated… but it's not the string you wanted, and not a format Python would ever print out for you.
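Put together, the collect-then-print approach might look like the Python 3 sketch below. The input text and the simplified pattern are stand-ins invented for illustration, since the real data was never shown. The last lines also illustrate the footnote about adjacent string literals.

```python
import re

# Hypothetical input; the real data was not shown in the question.
text = (
    "This particular string { first {blah=0; blah_blah=1; Foo=3;} }\n"
    "This particular string { second {blah=4; blah_blah=5; Foo=0;} }\n"
)

# Simplified stand-in for the question's pattern: group 2 is the inner block.
pattern = r"This particular string\s*\{\s(\w+)\s\{(.*?)\}"

multiline_string_final = []
for match in re.finditer(pattern, text, re.DOTALL):
    # Build the list inside the loop...
    multiline_string_final.append(match.group(2))

# ...and print it once, after the loop has finished.
print(multiline_string_final)

# Adjacent string literals are concatenated, so this list has ONE element.
concatenated = ['blah=0;''blah=4;']
print(len(concatenated))
```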
The problem has nothing to do with the finditer, it's that you're doing the wrong thing:
for line in multiline_string:
    print multiline_string.splitlines()
If multiline_string really is a multiline string, then for line in multiline_string will iterate over the characters of that string.
Then, within the loop, you completely ignore line anyway, and instead print multiline_string.splitlines().
So, if multiline_string is this:
abc
def
Then you'll print ['abc', 'def'] 8 times in a row (once per character, newlines included). That's not what you want (or what you described).
What you want to do is:
split the string into lines
loop over those lines, not over the original un-split string
print each line, not the whole thing
So:
for line in multiline_string.splitlines():
    print line
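The difference between the two loops can be checked with a short Python 3 sketch, using the same two-line sample string as above.

```python
multiline_string = "abc\ndef\n"

# Iterating over a string visits every character, newlines included.
chars_seen = [ch for ch in multiline_string]

# splitlines() returns the lines themselves, without the line endings.
lines_seen = multiline_string.splitlines()

print(len(chars_seen))
for line in lines_seen:
    print(line)
```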
I have a function that can only accept strings. (it creates the image with the string, but the string has little formatting and no word wrapping, so a long string will just bleed right through the edge of the image and keep going into the abyss, when in reality I would have liked it to create a paragraph, instead of a one line infinity).
I need it to print with line breaks. Currently the file is being read in using
inputFiles.readlines()
so that this reads the entire file. Storing the result of readlines() creates a list, so it cannot be passed to my function, which expects a string.
I used
inputFileContent = ' \n'.join(inputFiles.readlines())
in an attempt to force hard line breaks into the string between each list item. This does not work (edit: elaboration here), which means that the inputFileContent string does not have line breaks even though I put '\n' between the list elements. From my understanding, the readlines() function puts the individual lines into individual elements of a list.
any suggestions? Thank you
Use inputFiles.read() which creates a string. Does that help?
The 'join' should have worked. Your problem may be that the writing of the string ignores newline characters. You could maybe try '\r\n'.join(...)
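One way to check whether the newlines are really in the string, before blaming the join, is to look at its repr. The sketch below uses io.StringIO with made-up contents in place of the real input file.

```python
import io

# Stand-in for the real input file, with hypothetical contents.
input_file = io.StringIO("first line\nsecond line\n")

# read() returns the whole file as one string, line breaks included.
content = input_file.read()

# repr() makes the newline characters visible as \n escapes.
print(repr(content))
newline_count = content.count("\n")
print(newline_count)
```

If the count is non-zero, the string does contain line breaks, and the function that renders the text is the more likely place where they get lost.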