Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
I am trying to work through a tutorial on edX. The file I am working with is a CSV. I have pandas imported and the working directory set to where the file is stored, but it always says:
Files does not exist
or
Error tokenizing data. C error: Expected 1 fields in line 108, saw 3
What do I have to do so that I don't need to give the full file path when importing in PyCharm?
The tokenizing error can occur if your file is not comma-delimited, or if some field in your data also contains commas, for example numerical data that uses commas as thousands separators.
This will fail with pd.read_csv(filename):
108
1
2
108,109,104
Likewise, this will also fail with pd.read_csv(filename):
108, [23]
2, [15]
3, [15, 17]
If your data is not comma-separated, you need to specify the separator with the sep= kwarg. For example:
some_file.csv
108|[23]
2|[15,17]
Trying to load this with pd.read_csv('some_file.csv') will fail on line 2 as it expects only one column based on the first line, and finds two values on line 2. The correct way to read this file is pd.read_csv('some_file.csv', sep='|').
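One quick way to check which separator a file really uses is to count the fields on each line for a candidate delimiter before handing the file to pandas. The sketch below uses only the standard-library csv module; the helper name field_counts and the inline sample are hypothetical, chosen to mirror some_file.csv above.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return the number of fields found on each line for the given delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [len(row) for row in reader]


# Hypothetical sample mirroring the contents of some_file.csv.
sample = "108|[23]\n2|[15,17]\n"

comma_counts = field_counts(sample, ",")  # counts differ between lines
pipe_counts = field_counts(sample, "|")   # counts agree on every line

print("comma:", comma_counts)
print("pipe: ", pipe_counts)
```

A delimiter that gives the same count on every line is a reasonable candidate to pass as sep=; uneven counts are what trigger the "Expected N fields" error.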
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am creating an Excel file based on another Excel file. The file I create has three columns: serial number, class, and student name. The names in the student name column all have different lengths, but I want them all to have the same length so that they are displayed with proper alignment. I am using the pandas module to create the Excel file.
You can apply the len function to get the length of each name:
df['student name_length'] = df['student name'].apply(len)
You can use ljust like so:
df['student name'].str.ljust(20)
20 can be replaced by the length of the longest name.
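To make the "length of the longest name" idea concrete, here is a minimal sketch in plain Python. The names are made up for illustration. With a DataFrame, the same two steps would be df['student name'].str.len().max() followed by df['student name'].str.ljust(width).

```python
# Hypothetical student names of different lengths.
names = ["Ann", "Benedict", "Chloe"]

# Width of the longest name; every other name is padded up to it.
width = max(len(name) for name in names)

# str.ljust pads on the right with spaces until the string is `width` long.
padded = [name.ljust(width) for name in names]

for name in padded:
    print(repr(name))
```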
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
I have a CSV data file whose first few rows store information.
The format is like this:
info1, aa
info2, bb
info3, cc
col1, col2, col3
x1, y1, z1
x2, y2, z2
If I use numpy.genfromtxt(), it raises an error because the first three lines have a different number of columns from the rest.
I can use numpy.genfromtxt(skip_header=3) to read the data, and numpy.genfromtxt(skip_footer= ) to read the information.
I wonder if there is a better way to do this?
When I need a solution like this and I don't know the number of lines in the header block beforehand, I read only the first column. Then I look in that column for the blank lines, which tells me where the section boundaries are. Finally I read the full data by passing the appropriate number of lines to skip and read each time.
If the file is large and I care about efficiency, I open() it once and pass that file handle to genfromtxt() with the number of lines in each section, which means the whole operation takes just two passes over the file (because the file handle remains open, all we need to do is call readline() on it to skip blank lines between sections).
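The single-handle idea can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard-library csv module instead of genfromtxt(), it assumes the number of information lines is already known, and the function name read_sections is hypothetical. The sample text is the one from the question.

```python
import csv
import io


def read_sections(handle, n_info):
    """Read n_info 'key, value' lines, then a header row and the data rows."""
    info = {}
    for _ in range(n_info):
        # readline() advances the handle, so the data parser starts after the info block.
        key, value = [part.strip() for part in handle.readline().split(",", 1)]
        info[key] = value
    reader = csv.reader(handle, skipinitialspace=True)
    columns = [name.strip() for name in next(reader)]
    rows = [[cell.strip() for cell in row] for row in reader if row]
    return info, columns, rows


# In-memory stand-in for the file shown in the question.
sample = io.StringIO(
    "info1, aa\ninfo2, bb\ninfo3, cc\n"
    "col1, col2, col3\nx1, y1, z1\nx2, y2, z2\n"
)
info, columns, rows = read_sections(sample, n_info=3)
print(info)
print(columns)
print(rows)
```

Because the handle stays open and keeps its position, each section is read exactly once; no line is parsed twice.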
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
import nltk
file = open('SMSm.txt', 'r')
file2 = open('SMSw.txt', 'w')
for line in file.readlines():
    if 'Rs' in line:
        line.append(file2)
I am getting an AttributeError on the last line of my code. I basically want all the lines with 'Rs' in them. Some of the lines have the format Rs5000 and some have Rs 5000. I want both kinds of line to be written to the new file. Any help would be appreciated.
You have your understanding about methods all mixed up.
If you want to write to a file object, then you must use the file.write() method; it is a member of the file object. Strings know nothing about files and don't care about files, so strings do not have any methods to do with files.
To add your selected lines to file2 then, you need to call file2.write(line):
for line in file.readlines():
    if 'Rs' in line:
        file2.write(line)
You may have gotten confused with lists; list objects do have a list.append() method.
Strings have no append() method. To concatenate strings, use the + operator:
"string" + "string"
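Putting the file2.write(line) fix together with with blocks, so that both files are closed automatically, might look like the sketch below. The function name copy_matching_lines and the sample messages are hypothetical; the demo writes to a temporary directory rather than touching real files.

```python
import os
import tempfile


def copy_matching_lines(src_path, dst_path, needle="Rs"):
    """Write every line of src_path that contains needle to dst_path."""
    count = 0
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            if needle in line:  # matches both 'Rs5000' and 'Rs 5000'
                dst.write(line)
                count += 1
    return count


# Demo with made-up messages in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "SMSm.txt")
    dst_path = os.path.join(tmp, "SMSw.txt")
    with open(src_path, "w") as f:
        f.write("Paid Rs5000 today\nHello there\nBalance is Rs 5000\n")
    copied = copy_matching_lines(src_path, dst_path)
    with open(dst_path) as f:
        result = f.read()

print(copied)
print(result)
```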
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a function written by a colleague working in the same field. I know I should write a script to call it, but I am unsure what the input bbfile should look like. As far as I can see, fidlines reads all the content, correct? My main concern is the bbfile itself (tab-delimited in my case): should it have three columns, one for freq, one for breal and one for bimag?
import numpy as np
from scipy import interpolate

def bbcalfunc(bbfile,nfreqlst):
    fid=open(bbfile,'r')
    fidlines=fid.readlines()
    #define the delimiter
    if bbfile.find('.txt')>=0:
        delimiter='\t'
    elif bbfile.find('.csv')>=0:
        delimiter=','
    freq=[]
    breal=[]
    bimag=[]
    # start at index 1, so the first line of the file is skipped
    for ii in range(1,len(fidlines)):
        linestr=fidlines[ii]
        linestr=linestr.rstrip()
        linelst=linestr.split(delimiter)
        if len(linelst)>2:
            freq.append(float(linelst[0]))
            breal.append(float(linelst[1]))
            bimag.append(float(linelst[2]))
        else:
            pass
    freq=np.array(freq)
    breal=np.array(breal)
    bimag=np.array(bimag)
    nfreq=np.log10(np.array(nfreqlst))
    brinterp=interpolate.splrep(freq,breal)
    brep=1E3*interpolate.splev(nfreq, brinterp)
    biinterp=interpolate.splrep(freq,bimag)
    bip=1E3*interpolate.splev(nfreq, biinterp)
    return brep,bip
The format of the input file depends on the extension you use: a .txt file is read as tab-separated values (TSV), while a .csv file is read as comma-separated values (CSV). Please note that this is not a general convention; it is something decided by the colleague who wrote the function, or maybe it's a local convention.
The first line is skipped (the loop starts at index 1), so it can hold a header. Each remaining line is usually composed of three tab- or comma-separated values, i.e., frequency, real part and imaginary part of a complex value.
I said "usually composed" because the code silently discards all the lines for which the element count is less than three.
There is something here and there that can be streamlined in the code, but it's inessential.
Rather, to answer your question re closing the file, change the first part of the function to
def bbcalfunc(bbfile,nfreqlst):
    #define the delimiter
    if bbfile.find('.txt')>=0:
        delimiter='\t'
    elif bbfile.find('.csv')>=0:
        delimiter=','
    # slurp the file; the with block closes it automatically
    with open(bbfile,'r') as fid:
        fidlines=fid.readlines()
    ...
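To make the expected input format concrete, here is a minimal sketch of just the parsing step, without the numpy and scipy interpolation. The function name read_columns, the file name cal.txt and the numbers in it are all hypothetical. The extension check is simplified to endswith(), which is an assumption and not what the original function does.

```python
import os
import tempfile


def read_columns(path):
    """Parse a three-column file into lists of frequency, real and imaginary parts."""
    # Simplified rule (assumption): .txt means tab-separated, anything else comma-separated.
    delimiter = "\t" if path.endswith(".txt") else ","
    freq, breal, bimag = [], [], []
    with open(path, "r") as fid:
        next(fid, None)  # skip the first line, as the original loop starts at index 1
        for line in fid:
            fields = line.rstrip().split(delimiter)
            if len(fields) > 2:  # shorter lines are silently discarded
                freq.append(float(fields[0]))
                breal.append(float(fields[1]))
                bimag.append(float(fields[2]))
    return freq, breal, bimag


# Demo with a made-up tab-delimited file: a header, two data rows and a short line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cal.txt")
    with open(path, "w") as f:
        f.write("freq\tbreal\tbimag\n-1.0\t0.5\t0.1\n0.0\t0.7\t0.2\nincomplete\n")
    freq, breal, bimag = read_columns(path)

print(freq, breal, bimag)
```

So a tab-delimited file with a header line followed by rows of three numbers is the shape this kind of parser expects.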
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 8 years ago.
Restaurant sorting. The text file scores.txt contains a series of local restaurant ratings. Each line looks like this:
Restaurant Name: Rating
I'm trying to write a program that reads the file and then prints the ratings in alphabetical order by restaurant.
These are the contents of the scores.rtf file:
Pancho Villa:3
Andalu:3
Urbun Burger:1
El Toro:5
Casa Thai:2
Taqueria Cancun:2
Little Baobab:1
Charanga:3
Irma's Pampanga:5
Bay Blend Coffee and Tea:3
Giordano Bros:2
Two Amy's: 5
Chef Geoff: 3
I'm not sure where to start with this.
Let's think through this. You have an input with a regular format: a name and a value, separated by a colon :. You'll need to open the file and read each line, then split the line into two parts, name and value. Think about what kind of data structure would be best for holding these values. Once you've read through the file and closed it, you just need to sort your data structure alphabetically, and print out the contents. Easy enough?
import operator
with open('scores.txt') as infile:
    for stuff in sorted([line.strip().split(":") for line in infile], key=lambda iGotThisFromStackOverflow: [operator.itemgetter(0)(iGotThisFromStackOverflow)][0]):
        print(stuff)
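The steps described in prose (read each line, split on the colon, store the pairs, sort, print) can also be spelled out more explicitly. This is only one possible sketch: the function name parse_scores is hypothetical, a dictionary is assumed as the data structure, and a few lines from the question are used inline instead of opening the file.

```python
def parse_scores(lines):
    """Turn 'Restaurant Name: Rating' lines into a {name: rating} dictionary."""
    ratings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last colon so the rating is always the final part.
        name, _, rating = line.rpartition(":")
        ratings[name.strip()] = int(rating)  # int() tolerates the space in ': 5'
    return ratings


# A few lines taken from the question's sample data.
sample = ["Pancho Villa:3", "Andalu:3", "El Toro:5", "Two Amy's: 5", "Chef Geoff: 3"]
ratings = parse_scores(sample)

# Sorting the (name, rating) pairs orders them alphabetically by name.
ordered = sorted(ratings.items())
for name, rating in ordered:
    print(f"{name}: {rating}")
```

To read from the real file, the same function can be given the open file object, for example parse_scores(infile) inside a with open('scores.txt') block.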