How can I fix the CSV import issue? - python

I am trying to import this data from a CSV file
location  scale
 0.90109  0.63551
 0.59587  0.65525
 0.80460  0.64227
 0.65178  0.65198
 0.76307  0.64503
 0.52575  0.65915
 0.41322  0.66496
 0.30059  0.67022
 0.21620  0.67382
 0.17404  0.67552
-0.05027  0.68363
-0.0782   0.68454
Using this code:
import csv

test = []
f = open("data.csv")
for row in csv.reader(f):
    test.append(row)
But when I inspect the test variable, I see some \xa0 characters in the values. Can you tell me how to fix this?
All I want to do is perform some operations on the data after importing into a variable.

Your input file appears to contain non-breaking space characters (U+00A0, which Python displays as \xa0). Remove them from the file, or strip them as you read, and try again.
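One way to handle this at read time, rather than editing the file by hand, is to replace the U+00A0 characters as each line is read. A minimal sketch (the sample file written below stands in for the real data.csv):

```python
import csv

# Write a tiny stand-in for the problem file, including a
# non-breaking space (U+00A0) between two values.
with open("data.csv", "w", encoding="utf-8") as f:
    f.write("location scale\n0.90109\xa00.63551\n0.59587 0.65525\n")

test = []
with open("data.csv", encoding="utf-8") as f:
    # Swap non-breaking spaces for ordinary spaces before parsing,
    # then split on spaces and drop any empty cells.
    cleaned = (line.replace("\xa0", " ") for line in f)
    for row in csv.reader(cleaned, delimiter=" ", skipinitialspace=True):
        test.append([cell for cell in row if cell])
```

Each row of test is then a list of plain strings that float() can parse.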

Related

Picking out a specific column in a table

My goal is to import a table of astrophysical data that I have saved to my computer (obtained from matching 2 other tables in TOPCAT, if you know it), and extract certain relevant columns. I hope to then do further manipulations on these columns. I am a complete beginner in python, so I apologise for basic errors. I've done my best to try and solve my problem on my own but I'm a bit lost.
This is the script I have written so far:
import pandas as pd
input_file = "location\\filename"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])
The file that I'm trying to import is listed as having file type "File", in my drive. I've looked at this file in Notepad and it has a lot of descriptive bumf in the first few rows, so to try and get rid of this I've used "skiprows" as you can see. The data in the file is separated column-wise by lines--at least that's how it appears in Notepad.
The problem is that when I try to extract the first column using "usecols", it instead returns what appears to be the first row in the command window, along with a load of vertical bars between each value. I assume it is somehow not interpreting the table correctly, not understanding what's a column and what's a row.
What I've tried: Modifying the file and saving it in a different filetype. This gives the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'location\\filename'
Despite the fact that the new file is saved in exactly the same location.
I've tried using "pd.read_table" instead of csv, but this doesn't seem to change anything (nor does it give me an error).
When I've tried to extract multiple columns (i.e. "usecols=[1,2]") I get the following error:
ValueError: Usecols do not match columns, columns expected but not found: [1, 2]
My hope is that someone with experience can give some insight into what's likely going on to cause these problems.
Maybe you can try dataset.iloc[:, 0]. With iloc you can extract a column or row by integer position; [:, 0] selects all rows of the first column.
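A quick sketch of that, using a made-up three-column frame in place of the real table:

```python
import pandas as pd

# Hypothetical frame standing in for the imported astrophysical table.
df = pd.DataFrame({"ra": [10.1, 10.2], "dec": [-5.0, -5.1], "flux": [1.2, 3.4]})

first_col = df.iloc[:, 0]    # all rows of the first column
first_two = df.iloc[:, 0:2]  # all rows of the first two columns
```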
The file path is incorrect: 'location\\filename' is a placeholder, not a real path.
I expect that you are reading a csv file or an xlsx or txt file. So the (windows) path would look similar to this:
import pandas as pd
input_file = "C:\\python\\tests\\test_csv.csv"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])
The error message tells you this:
No such file or directory: 'location\\filename'
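Separately, if the file really is separated by whitespace rather than commas, passing an explicit sep to read_csv usually fixes the one-giant-column symptom. A sketch under that assumption, with made-up column names and header line:

```python
import pandas as pd
from io import StringIO

# Stand-in for the TOPCAT export: a descriptive header line, then
# whitespace-separated columns (assumed layout).
raw = """# descriptive header bumf
col_a col_b col_c
1.0 2.0 3.0
4.0 5.0 6.0
"""

# sep=r"\s+" splits on any run of whitespace; skiprows drops the header.
dataset = pd.read_csv(StringIO(raw), sep=r"\s+", skiprows=1)
second_col = dataset.iloc[:, 1]
```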

Reading .ASC format file in python

I am dealing with a certain .asc file format which contains some data regarding weight and height. I just want to compute BMI values for people from this data. I am not able to make sense of the dataframe formed after reading the data.
import pandas as pd
df = pd.read_table("data.asc")
I am not able to make sense of the result that I get. Please help me out
I recently had to work with a file with the extension ".asc". My solution was the following:
I opened the file in a text editor and checked which separator it used, then transformed it into a spreadsheet. In my case, I converted the document into a CSV file.
After that I ran:
import pandas as pd
df = pd.read_csv('path to your file')
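If the .asc file turns out to be whitespace-separated, you can often skip the manual conversion and pass sep=r"\s+" directly. A sketch with made-up weight/height data standing in for the real file:

```python
import pandas as pd
from io import StringIO

# Stand-in for a whitespace-separated .asc file (assumed layout).
asc_text = "weight height\n70 1.75\n60 1.62\n"

# read_csv handles any extension; only the separator matters.
df = pd.read_csv(StringIO(asc_text), sep=r"\s+")

# BMI = weight (kg) / height (m) squared.
df["bmi"] = df["weight"] / df["height"] ** 2
```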

How to create a template for "could not convert string to float"?

Is there a way to test a CSV file for errors? For example, I have a CSV file downloaded from Kaggle. When I try to run it in Anaconda, it throws an error.
a) How do you test files before you run them for string to float errors?
b) Is there a way to set up a template to do this for all files moving forward?
Here is the text from Notepad. I have converted all the text to numbers and it still throws an error.
My code:
from numpy import loadtxt
from keras.models import Sequential
from keras.layers import Dense
# load the dataset
dataset = loadtxt('data.csv', delimiter=',')
data.csv file
15,1,14,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
34,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
52,5,16,4,1,37,37,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0
46,3,21,4,0,0,0,1,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
42,3,23,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
51,3,17,6,1,34,3,0,0,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1
26,1,26,3,0,0,0,1,2,1,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
45,1,20,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0
44,3,15,0,1,1,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44,3,26,4,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
27,1,17,3,0,0,0,1,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
45,4,14,6,0,0,0,1,10,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
44,2,25,2,0,0,0,1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
43,2,18,5,0,0,0,0,0,1,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
40,3,18,2,0,0,0,1,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Seems like certain CSV files from Kaggle & others have encoding issues.
Instead of opening the file with the default encoding (which is 'utf-8'), use 'utf-8-sig'.
dataset = loadtxt('data.csv', delimiter=',', encoding='utf-8-sig')
Once I create some code to scan for this prior to feeding the data into a deep-learning algorithm, I will post it as a follow-on.
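As a starting point for such a pre-scan, this sketch walks every cell and reports the ones float() rejects (find_bad_values is a made-up helper name, and the file written below is a deliberately broken example):

```python
import csv

def find_bad_values(path, delimiter=","):
    """Return (row, col, value) for every cell that won't parse as float."""
    bad = []
    # utf-8-sig also strips a BOM if the file has one.
    with open(path, newline="", encoding="utf-8-sig") as f:
        for r, row in enumerate(csv.reader(f, delimiter=delimiter)):
            for c, value in enumerate(row):
                try:
                    float(value)
                except ValueError:
                    bad.append((r, c, value))
    return bad

# Example with a deliberately broken file.
with open("check.csv", "w", encoding="utf-8") as f:
    f.write("1,2,3\n4,oops,6\n")

print(find_bad_values("check.csv"))  # → [(1, 1, 'oops')]
```

Running this over a downloaded CSV before loadtxt points you straight at the offending cells.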

Code in Python that converts .h5 files to .csv

I have a .h5 file that I need to convert to .csv and this is what I have done.
#coding: utf-8
import numpy as np
import sys
import h5py
file = h5py.File('C:/Users/Sakib/Desktop/VNP46A1.A2020086.h01v01.001.2020087082319.h5','r')
a = list(file.keys())
np.savetxt(sys.stdout, file[a[0:]], '%g', ',')
But this generates an error saying 'list' object has no attribute 'encode'
[P.S. I have not worked with the sys module before. Where will my new CSV file be written, and under which name?]
First, you have a small error in the arrangement of the []: there is no need to create a list.
Also, sys.stdout is your process's standard output; for an interactive process it goes to the screen. You should create a file and write to it if you want to capture the output. Your format string ('%g') also needs to match the data in the HDF5 dataset.
Try this:
h5f = h5py.File('C:/Users/.....h5', 'r')
for a in h5f.keys():
    outf = open('./save_' + a + '.txt', 'w')
    np.savetxt(outf, h5f[a][:], '%g', ',')
    outf.close()

trying to import an Excel csv (?!) file with pandas

I am new to Python/pandas and I am trying to import the following file in a Jupyter notebook via pd.read_
Initial file lines:
Either pd.read_excel or pd.read_csv returned an error.
Eliminating the first row allowed me to read the file, but the csv data were not separated into columns.
Could you share the line of code you have used so far to import the data?
Maybe try this one here:
data = pd.read_csv(filename, delimiter=',')
It is always easier for people to help you if you share the relevant code accompanied by the error you are getting.
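If you are unsure of the delimiter (Excel exports often use ';'), the standard library's csv.Sniffer can guess it from a sample before you hand the file to pandas. A sketch with made-up data standing in for the real file:

```python
import csv
from io import StringIO

import pandas as pd

# Stand-in for an Excel-exported, semicolon-delimited CSV.
sample = "a;b;c\n1;2;3\n4;5;6\n"

# Sniffer inspects the sample and guesses which delimiter is in use.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
data = pd.read_csv(StringIO(sample), delimiter=dialect.delimiter)
```

With a real file you would sniff the first few lines (e.g. f.read(2048)) and then pass the path to pd.read_csv.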
