Pandas Read Irregular Data From Clipboard - python

I am trying to copy data from this CSV file and read it via pandas.read_clipboard().
I keep getting this error:
ParserError: Error tokenizing data. C error: Expected 5 fields in line 6, saw 7
Is it possible to read in data like this? It works with read_csv (encoding='latin-1') but not read_clipboard.
Thanks in advance!

Use the skiprows=6 parameter to ignore the lines at the top of the file, which look like a header block rather than part of the core dataframe:
df = pd.read_clipboard(sep='\t', skiprows=6)
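pandas documents read_clipboard as reading the clipboard text and passing it to read_csv, so sep and skiprows behave the same way in both. The minimal sketch below therefore uses io.StringIO as a stand-in for the clipboard so it runs without one. The six preamble lines and the table are hypothetical, not the asker's data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the copied text: 6 preamble lines (no tabs),
# followed by the tab-separated table we actually want.
clipboard_text = (
    "Station report\n"
    "Location: somewhere\n"
    "Units: metric\n"
    "Period: January\n"
    "Logger: 42\n"
    "Notes: none\n"
    "date\ttemp\thumidity\n"
    "2020-01-01\t3.5\t80\n"
    "2020-01-02\t4.1\t75\n"
)

# Without skiprows, the first line (1 field) is taken as the header, and the
# C tokenizer fails when it reaches the 3-field table header on line 7.
try:
    pd.read_csv(io.StringIO(clipboard_text), sep="\t")
    failed_without_skiprows = False
except pd.errors.ParserError as exc:
    failed_without_skiprows = True
    print("without skiprows:", exc)

# Skipping the 6 preamble lines makes line 7 the header row.
df = pd.read_csv(io.StringIO(clipboard_text), sep="\t", skiprows=6)
print(df)
```

With the real clipboard the equivalent call is pd.read_clipboard(sep='\t', skiprows=6); the number of lines to skip depends on how many preamble lines the copied data actually has.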

Related

Pandas Error Tokenizing but when troubleshooting the data points are not separated

When I try to open the file, I get ParserError: Error tokenizing data. C error: Expected 1 fields in line 8, saw 22152
What I tried:
mat = pd.read_csv('/Users/csb/Desktop/Zebrafish_scRNA/sample_Control_WTA_1_RSEC_MolsPerCell.csv', sep = '\t')
Output:
Instead of each data point being in its own column, everything was combined into a single column.
The output I want:
The same dataframe with its first 7 rows dropped. I want to automate this, since there will be several files in that output format.
How can I resolve this issue and automate it to get the output I want?
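A possible way to automate this is sketched below. It assumes the data are comma-separated (everything landing in one column with sep='\t' suggests tabs are not the delimiter) and that the first 7 lines of every file are metadata rather than table rows, so the default separator is kept and only skiprows is added. The helper names, file names and sample contents are hypothetical; adjust n_preamble if the real files differ.

```python
import glob
import os
import tempfile

import pandas as pd


def read_skipping_preamble(path, n_preamble=7):
    """Read a CSV whose first n_preamble lines are metadata, not table rows."""
    return pd.read_csv(path, skiprows=n_preamble)


def read_all(pattern, n_preamble=7):
    """Read every file matching a glob pattern into a dict keyed by file name."""
    return {
        os.path.basename(p): read_skipping_preamble(p, n_preamble)
        for p in sorted(glob.glob(pattern))
    }


# Demo with two hypothetical files in a temporary directory.
preamble = "".join(f"## metadata line {i}\n" for i in range(7))
table = "Cell_Index,GeneA,GeneB\n1,0,3\n2,5,1\n"
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample_1.csv", "sample_2.csv"):
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(preamble + table)
    frames = read_all(os.path.join(tmp, "*.csv"))

for name, frame in frames.items():
    print(name, frame.shape)
```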

Unable to read raw file view in dataframe as structured view

I am reading a .csv file into Databricks. When I display the result, it looks exactly like the .csv file, pipe characters included, with everything in a single column. This allows me to work on the data. However, I now want to take this raw view and read the data as a structured table.
The raw data displayed in my dataframe is as follows:
|Name|Surname|Age|Gender|
|John|Doe|32|M
|Lisa|Doe|53|F
I would like to take the above and have my output as follows:
|Name|Surname|Age|Gender|
|----|-------|---|------|
|John|Doe|32|M
|Lisa|Doe|53|F
The following is what I do to get the initial output in my dataframe:
df = rdd_df.toDF()
df = df.withColumn('Line', df['_1'].getItem("_c0"))
df.show()
I would appreciate any help.
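The parsing step itself amounts to trimming the outer pipe characters from each line and splitting on the remaining ones, using the first line as the header. The sketch below shows that logic in pandas on the sample lines from the question, as an illustration rather than a Databricks-specific solution; it assumes the lines are few enough to handle on the driver, and the function name is hypothetical.

```python
import pandas as pd

# The three raw lines shown in the question, one per row of the single-column dataframe.
raw_lines = [
    "|Name|Surname|Age|Gender|",
    "|John|Doe|32|M",
    "|Lisa|Doe|53|F",
]


def parse_pipe_lines(lines):
    """Trim outer pipes, split on '|', and use the first line as the header.

    Note: strip("|") would also drop an empty last field (e.g. "|John|Doe|32|").
    """
    rows = [line.strip().strip("|").split("|") for line in lines]
    header, data = rows[0], rows[1:]
    return pd.DataFrame(data, columns=header)


df = parse_pipe_lines(raw_lines)
df["Age"] = df["Age"].astype(int)  # every field arrives as a string
print(df)
```

In Spark itself, the same strip-then-split idea can be expressed with pyspark.sql.functions.regexp_replace and split; note that split takes a regular expression, so the pipe has to be escaped in the pattern (\|).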

Comma in numbers causing problem reading csv

When reading a csv file, I get the following error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 12, saw 2
I opened the csv file, went to that line, and saw that the error occurs because one of the numbers has a decimal part separated by a comma.
That entire column of my csv file contains whole numbers, but also decimal numbers like the following:
385433,4
I'm not sure how to resolve this error when reading the csv file with pandas.
It sounds like you have European-formatted CSV. Since you haven't provided a real sample of your CSV as requested, I will guess. If this doesn't solve your issue, edit your question to provide an actual sample:
Given test.csv:
c1;c2;c3
1,2;3,4;5,6
3,4;5,6;7,8
Then:
import pandas as pd
data = pd.read_csv('test.csv',decimal=',',delimiter=';')
print(data)
Produces:
    c1   c2   c3
0  1.2  3.4  5.6
1  3.4  5.6  7.8
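If the file also uses '.' to group thousands (part of the same European convention, though not visible in the single value quoted in the question), read_csv's thousands parameter handles that alongside decimal. A minimal sketch on a hypothetical in-memory sample:

```python
import io

import pandas as pd

# Hypothetical European-style sample: ';' between fields, ',' as the decimal
# mark and '.' grouping thousands.
sample = "c1;c2\n385433,4;1.234,5\n12;2.000,25\n"

data = pd.read_csv(io.StringIO(sample), sep=";", decimal=",", thousands=".")
print(data)
print(data.dtypes)
```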

Trying to import an Excel CSV (?!) file with pandas

I am new to Python/pandas and I am trying to import the following file in a Jupyter notebook via pd.read_
Initial file lines:
Both pd.read_excel and pd.read_csv returned an error.
Removing the first row allowed me to read the file, but the CSV data were not split into separate columns.
Could you share the line of code you have used so far to import the data?
Maybe try this one here:
data = pd.read_csv(filename, delimiter=',')
It is always easier for people to help you if you share the relevant code accompanied by the error you are getting.
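If the delimiter is not actually a comma (Excel in some locales writes ';' instead), delimiter=',' will still leave everything in one column. One option, sketched below under that assumption, is to skip the first line and let the Python engine detect the separator: with sep=None, pandas uses Python's csv.Sniffer. The sample content is hypothetical, since the question's file lines are not shown.

```python
import io

import pandas as pd

# Hypothetical file: one non-tabular first line, then ';'-separated data.
raw = "Exported from Excel\nid;name;value\n1;alpha;2.5\n2;beta;3.0\n"

# sep=None asks the Python engine to sniff the delimiter from the first
# line after the skipped rows.
df = pd.read_csv(io.StringIO(raw), sep=None, engine="python", skiprows=1)
print(df)
```

Once the separator is known, passing it explicitly (e.g. sep=';') is simpler and lets the faster C engine be used again.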

Error When Reading CSV With C Engine

I have a large data file that I'm trying to read into a Pandas Dataframe.
If I try to read it using the following code:
df = pd.read_csv(file_name,
                 sep='|',
                 compression='gzip',
                 skiprows=54,
                 comment='#',
                 names=column_names,
                 header=None,
                 usecols=column_numbers,
                 engine='python',
                 nrows=15347,
                 na_values=["None", " "])
It works perfectly, but not quickly. If I try to use the C engine to speed the import up though, I get an error message:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 0 fields in line 55, saw 205
It looks like something goes wrong when I switch engines: the parser can't work out how many columns, or which ones, it should use. What I can't figure out is why, since none of the input arguments are supported only by the Python engine.
The problem only occurred after I upgraded from version 14.1 to 16.0.
I can't attach a copy of the data, because it contains confidential information.
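Without sharing the confidential file, one way to narrow this down is to look at the raw text around the line the C parser reports (line 55, right after the 54 skipped rows), printing each line with repr so that stray quotes, tabs or '\r' characters become visible. Below is a small helper for a gzip-compressed file, treating line numbers as 1-based and assuming UTF-8 text; the function name and the demo file are hypothetical.

```python
import gzip
import itertools
import os
import tempfile


def show_lines(path, start, stop):
    """Return (line_number, text) pairs for lines start..stop inclusive (1-based).

    newline="" keeps the original line endings instead of translating them.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace", newline="") as fh:
        selected = itertools.islice(fh, start - 1, stop)
        return list(enumerate(selected, start=start))


# Demo on a hypothetical 60-line pipe-delimited gzip file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt.gz")
    with gzip.open(path, "wt", encoding="utf-8", newline="") as out:
        for i in range(1, 61):
            out.write(f"{i}|a|b\n")
    window = show_lines(path, 55, 56)

for n, line in window:
    print(n, repr(line))
```

On the real file, something like show_lines(file_name, 50, 60) shows whether line 55 differs from its neighbours, for example in its number of '|' separators.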
