Reading csv-file in Python - python

I know this question has been asked a lot, but none of the solutions I can find seems to work.
I'm trying to read a csv in python using pandas. The csv file 'data.csv' contains 8 comma separated and no header in the format:
T,000027E7,24.56,3.41,5.03,12,1260497437.817,4,0.18
T,00006726,28.84,8.24,5.03,14,1260497437.818,4,3.62
However, when using the command below, only a single column containing all values is outputted.
import pandas as pd
data2=pd.read_csv('data.csv',header=None)
I've also tried specifying names of each column to no avail.
data2=pd.read_csv('data.csv',header=None, names=['Type','TagID','x','y','z','BatLvl','TimeStamp','Unit','DQI'])
Does anybody know of a way to solve this?

Related

Picking out a specific column in a table

My goal is to import a table of astrophysical data that I have saved to my computer (obtained from matching 2 other tables in TOPCAT, if you know it), and extract certain relevant columns. I hope to then do further manipulations on these columns. I am a complete beginner in python, so I apologise for basic errors. I've done my best to try and solve my problem on my own but I'm a bit lost.
This script I have written so far:
import pandas as pd
input_file = "location\\filename"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])
The file that I'm trying to import is listed as having file type "File", in my drive. I've looked at this file in Notepad and it has a lot of descriptive bumf in the first few rows, so to try and get rid of this I've used "skiprows" as you can see. The data in the file is separated column-wise by lines--at least that's how it appears in Notepad.
The problem is when I try to extract the first column using "usecol" it instead returns what appears to be the first row in the command window, as well as a load of vertical bars between each value. I assume it is somehow not interpreting the table correctly? Not understanding what's a column and what's a row.
What I've tried: Modifying the file and saving it in a different filetype. This gives the following error:
FileNotFoundError: \[Errno 2\] No such file or directory: 'location\\filename'
Despite the fact that the new file is saved in exactly the same location.
I've tried using "pd.read_table" instead of csv, but this doesn't seem to change anything (nor does it give me an error).
When I've tried to extract multiple columns (ie "usecol=[1,2]") I get the following error:
ValueError: Usecols do not match columns, columns expected but not found: \[1, 2\]
My hope is that someone with experience can give some insight into what's likely going on to cause these problems.
Maybie you can try dataset.iloc[:,0] . With iloc you can extract the column or line you want by index(not only). [:,0] for all the lines of 1st column.
The file is incorrectly named.
I expect that you are reading a csv file or an xlsx or txt file. So the (windows) path would look similar to this:
import pandas as pd
input_file = "C:\\python\\tests\\test_csv.csv"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])
The error message tell you this:
No such file or directory: 'location\\filename'

Replace newline chars with space in python

Am using SQL to get the data from salesforce api using python.
The output is writing into csv file.
When I tried with the below statement, to replace the newlines, its not replacing all of them. For example, for one record we are getting multiple lines of data in multiple fields, leads to lot more records, instead of actual records that to in a wrong format.
tmp.append(str(record['Description']).replace('\r\n',''))
Used this link to write the json data into csv
Any help would be appreciated.
Thanks
Venkat
Have you tried to use pandas to read the data?
you can do
import pandas as pd
pd.read_csv('filepath.csv', sep=',')
or you can do
pd.read_json('data.json', orient='records')
Do any of these work?

How do you read rows from a csv file and store it in an array using Python codes?

I have a CSV file, diseases_matrix_KNN.csv which has excel table.
Now, I would like to store all the numbers from the row like:
Hypothermia = [0,-1,0,0,0,0,0,0,0,0,0,0,0,0]
For some reason, I am unable to find a solution to this. Even though I have looked. Please let me know if I can read this type of data in the chosen form, using Python please.
most common way to work with excel is use Pandas.
Here is example:
import pandas as pd
df = pd.read_excel(filename)
print (df.iloc['Hypothermia']). # gives you such result

Fixed Width File manipulation in Pandas

I have a fixed-width file with the following format:
5678223313570888271712000000024XAXX0101010006461801325345088800.0784001501.25abc#yahoo.com
5678223324686600271712000000070XAXX0101010006461801325390998280.0784001501.25abcde.12345#gmail.com 5678123422992299
Here's what i tried :
import pandas as pd
ColSpecs = [(0,16),(16,31),(31,44),(44,62),(62,70),(70,73),(73,77),(77,127),(127,143)]
df = pd.read_fwf("~/filename.txt",colspecs=ColSpecs,Header=True)
Now this surely helps me to convert cleanly in Pandas format. However, the blank(or fixed white spaces) get trimmed off. For Eg: the Email field(#8) has 50 characters set fixed. They get truncated as soon as they're imported to Pandas dataframe.
For the data manipulation, I am creating 3 new fields that are extracted from the values of the previously imported fields.
Final Output file structure:
[(0,16),(16,31),(31,44),(44,62),(62,70),(70,73),(73,77),(77,127),(127,143),(143,153),(153,163),(164,165)]
Since, I haven't found any to_fwf method on dataframes or any other alternative for Pandas -> Flat File (keeping original lengths intact) , I would really appreciate if anyone has a better solution.
P.S. : I read that awk/sed in Unix works better, but still would like to know for Python

Pandas read csv - dealing with mixed named/nameless columns

I am trying to open a csv file using pandas.
This is a screenshot of the file opened in excel.
Some columns have names and some do not. When trying to read this in with pandas I get the "ValueError: Passed header names mismatches usecols" error.
When I open part of the file in excel, add column names, save, and then import with pandas it works.
The problem is the files are large and cannot fully open in excel (plus I'd prefer a more elegant solution anyway).
Is there a way to deal with this issue in pandas?
I have read answers to other questions regarding this error but none were relevant.
Thanks so much in advance!
In names you can provide column names:
df = pd.read_csv('pandas_dataframe_importing_csv/example.csv', names=['col1', 'col2', 'col3'], engine='python')

Categories