Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I have a CSV data file whose first few rows store metadata.
The format looks like this:
info1, aa
info2, bb
info3, cc
col1, col2, col3
x1, y1, z1
x2, y2, z2
If I use numpy.genfromtxt() on the whole file, it raises an error because the first three lines have a different number of columns than the rest.
I can use numpy.genfromtxt(skip_header=3) to read the data, and numpy.genfromtxt(skip_footer= ) to read the metadata.
Is there a better way to do this?
When I need a solution like this and I don't know the number of lines in the header block beforehand, I read only the first column. Then I look in that column for the blank lines, which tells me where the section boundaries are. Finally I read the full data by passing the appropriate number of lines to skip and read each time.
If the file is large and I care about efficiency, I open() it once and pass that file handle to genfromtxt() with the number of lines in each section, which means the whole operation takes just two passes over the file (because the file handle remains open, all we need to do is call readline() on it to skip blank lines between sections).
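A minimal sketch of the single-open-handle approach, assuming the fixed three-line metadata block from the question (the file name and sample numbers are made up):

```python
import numpy as np

# Write a hypothetical data.csv matching the layout in the question.
with open('data.csv', 'w') as f:
    f.write("info1, aa\n"
            "info2, bb\n"
            "info3, cc\n"
            "col1, col2, col3\n"
            "1, 2, 3\n"
            "4, 5, 6\n")

with open('data.csv') as f:
    # Read the three metadata lines off the handle first...
    info = [next(f).strip().split(', ') for _ in range(3)]
    next(f)  # ...skip the column-header line...
    # ...so genfromtxt() sees only the homogeneous rows.
    data = np.genfromtxt(f, delimiter=',')

print(info)        # [['info1', 'aa'], ['info2', 'bb'], ['info3', 'cc']]
print(data.shape)  # (2, 3)
```

Because the handle keeps its position, the metadata and the numeric block are each read exactly once, with no second pass to find the boundary.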
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Let's say I have the following .txt file:
"StringA1","StringA2","StringA3"
"StringB1","StringB2","StringB3"
"StringC1","StringC2","StringC3"
And I want a nested list in the format:
nestedList = [["StringA1","StringA2","StringA3"],["StringB1","StringB2","StringB3"],["StringC1","StringC2","StringC3"]]
so I can access StringB2 for example like this:
nestedList[1][1]
What would be the best approach? I don't have a tremendous amount of data, maybe 100 lines at most, so I don't need a database or anything like that.
You can use this sample code:

with open('file.txt') as f:
    nestedList = [line.rstrip('\n').split(',') for line in f]

print(nestedList[1][1])

Note that this keeps the surrounding double quotes as part of each string.
text = open('a.txt').read()

l = []
for row in text.split('\n'):
    l.append(row.split(','))

print(l[1][1])
Assuming your file a.txt has data in the same format as specified in the question, i.e. newline-separated rows (so we can build a nested list) with comma-separated values inside each row, the code above will print the right output.
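Both snippets above keep the double quotes as part of each string. The standard-library csv module strips the quoting for you; a short sketch, assuming a hypothetical file.txt with the three lines from the question:

```python
import csv

# Write a hypothetical file.txt matching the question's layout.
with open('file.txt', 'w') as f:
    f.write('"StringA1","StringA2","StringA3"\n'
            '"StringB1","StringB2","StringB3"\n'
            '"StringC1","StringC2","StringC3"\n')

with open('file.txt', newline='') as f:
    # csv.reader handles the quoting and newlines for you.
    nestedList = list(csv.reader(f))

print(nestedList[1][1])  # StringB2 (no quotes attached)
```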
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I receive many different words from a for loop. I need to check whether each word has already appeared in the loop: if not, the unique word must be saved into a txt file; if it has already appeared, control should return to the for loop for the next word.
I will receive a lot of words, so this logic needs to be light on RAM.
Use a set. It will prevent duplicate entries.
If you already have a list of words:
word_list = [...]  # a list of words
output = set(word_list)
If you're reading from an input stream, like from a file:

output = set()
for line in f:
    output.add(line.strip())

(Iterating over the file handle yields one line at a time; f.readline() returns only the first line, and looping over that string would iterate character by character.)
You can then write your set to a text file, just like you would with a list.
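A sketch of the whole round trip, assuming a hypothetical words.txt with one word per line:

```python
# Hypothetical input: one word per line, with duplicates.
with open('words.txt', 'w') as f:
    f.write("apple\nbanana\napple\ncherry\nbanana\n")

seen = set()
with open('words.txt') as f:
    for line in f:
        seen.add(line.strip())  # membership is O(1); duplicates are dropped

# Write the unique words out, sorted for a stable order.
with open('unique.txt', 'w') as f:
    for word in sorted(seen):
        f.write(word + '\n')

print(sorted(seen))  # ['apple', 'banana', 'cherry']
```

Memory use is proportional to the number of *unique* words, not the total number seen, which is what makes a set the right fit here.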
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I am trying to do a tutorial on edX. The file I am working with is a CSV. I have pandas imported and the working directory set to where the file is stored, but it always says:
Files does not exist
or
Error tokenizing data. C error: Expected 1 fields in line 108, saw 3
What do I have to do so that I don't have to give the full file path when importing in PyCharm?
That is an error that can occur if your file is not comma-delimited or if some field in your data also contains commas, for example numerical data that uses commas as thousands separators.
This will fail with pd.read_csv(filename):
108
1
2
108,109,104
Likewise, this will also fail with pd.read_csv(filename):
108, [23]
2, [15]
3, [15, 17]
If your data is not comma separated you need to specify the separator with the sep= kwarg. For example:
some_file.csv
108|[23]
2|[15,17]
Trying to load this with pd.read_csv('some_file.csv') will fail on line 2 as it expects only one column based on the first line, and finds two values on line 2. The correct way to read this file is pd.read_csv('some_file.csv', sep='|').
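A short sketch of the fix, using io.StringIO in place of the hypothetical some_file.csv so the example is self-contained:

```python
import pandas as pd
from io import StringIO

# Same pipe-delimited contents as some_file.csv above.
data = "108|[23]\n2|[15,17]\n"

# The default sep=',' would raise "Error tokenizing data" on line 2
# (1 field expected from line 1, 2 found); naming the real delimiter fixes it.
df = pd.read_csv(StringIO(data), sep='|', header=None)

print(df.shape)  # (2, 2)
```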
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Any tips on how to count the number of characters in each line of a text file, and then compare them using Python?
It would be helpful to have an idea of what the end goal of your code is. What information do you want to gain from comparing the number of characters on a line? I would have written this as a comment, but it's not yet an option for me since I just joined.
If you're completely lost and don't know where to begin, here are some general bits of code to get you started (this is using Python 3.x):
with open("YourFileName.txt", "r") as f:
    stringList = f.readlines()

The first line opens the file in read mode (hence the "r"); the with block ensures it is closed when you're done. The second line reads every line of the file into a variable I called stringList. stringList is now a list, in which each element is a string corresponding to one line of your text file.
So,
print(stringList)
should return
['line0', 'line1', 'line2', 'line3', etc...]
It's possible that stringList could look like
['line0\n', 'line1\n', 'line2\n', 'line3\n', etc...]
depending on how your file is formatted. In case you didn't know, the '\n' is a newline character, equivalent to hitting enter on the keyboard.
From there you can create another list to hold the length of each line, and then loop through each element of stringList to store the lengths.
lengthList = []
for line in stringList:
    lengthList.append(len(line))
len(line) returns the number of characters in the string as an integer. Your lengthList will then contain how many characters are on each line, stored as ints. If the lines end in '\n', you may want to use len(line) - 1, depending on what you want to do with the lengths.
I hope this is helpful; I can't help with the comparisons until you provide some code and explain more specifically what you want to accomplish.
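Putting the pieces together, a minimal end-to-end sketch (the file name and its contents are made up, and finding the longest line stands in for whatever comparison you actually need):

```python
# Hypothetical input file.
with open('sample.txt', 'w') as f:
    f.write("short\na longer line\nmid\n")

with open('sample.txt') as f:
    stringList = f.readlines()

# Strip the trailing newlines, then measure each line.
lengthList = [len(line.rstrip('\n')) for line in stringList]
print(lengthList)  # [5, 13, 3]

# One possible comparison: find the index of the longest line.
longest = max(range(len(lengthList)), key=lengthList.__getitem__)
print(stringList[longest].rstrip('\n'))  # a longer line
```

Using rstrip('\n') instead of len(line) - 1 avoids special-casing a last line that has no trailing newline.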
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a function written by a colleague working in the same field, so I know I should write a script to execute the Python code, but the format of the input bbfile bothers me. As I see it, fidlines holds all of the file's content, correct? My main concern is the bbfile (tab-delimited in my case): should it have three columns, one for freq, one for breal, and a third for bimag?
import numpy as np
from scipy import interpolate

def bbcalfunc(bbfile, nfreqlst):
    fid = open(bbfile, 'r')  # file() is Python 2 only; open() works in both
    fidlines = fid.readlines()

    # define the delimiter
    if bbfile.find('.txt') >= 0:
        delimiter = '\t'
    elif bbfile.find('.csv') >= 0:
        delimiter = ','

    freq = []
    breal = []
    bimag = []
    for ii in range(1, len(fidlines)):
        linestr = fidlines[ii]
        linestr = linestr.rstrip()
        linelst = linestr.split(delimiter)
        if len(linelst) > 2:
            freq.append(float(linelst[0]))
            breal.append(float(linelst[1]))
            bimag.append(float(linelst[2]))
        else:
            pass

    freq = np.array(freq)
    breal = np.array(breal)
    bimag = np.array(bimag)

    nfreq = np.log10(np.array(nfreqlst))

    brinterp = interpolate.splrep(freq, breal)
    brep = 1E3 * interpolate.splev(nfreq, brinterp)

    biinterp = interpolate.splrep(freq, bimag)
    bip = 1E3 * interpolate.splev(nfreq, biinterp)

    return brep, bip
The format of the input file depends on the extension you use: a .txt file is expected to be a Tab Separated Values (TSV) file, while a .csv file is expected to be a Comma Separated Values (CSV) file. (Note that this is not a general convention; it is something decided by the colleague of yours who wrote the function, or maybe a local convention.)
Each line of the file is usually composed of three {tab, comma}-separated values, i.e., the frequency, the real part, and the imaginary part of a complex value. I said usually because the code silently discards every line with fewer than three elements.
There is something here and there that could be streamlined in the code, but it's inessential. Rather, to answer your question re closing the file, change the first part of the function to
def bbcalfunc(bbfile, nfreqlst):
    # define the delimiter
    if bbfile.find('.txt') >= 0:
        delimiter = '\t'
    elif bbfile.find('.csv') >= 0:
        delimiter = ','

    # slurp the file; the with block closes it automatically
    with open(bbfile, 'r') as fid:
        fidlines = fid.readlines()
    ...