Pandas Can't Read Timestamp Column in CSV File - python

I'm working with quite a big file, and when using the pd.read_csv command I get the following results:
I've already tried converters={1: str} and setting dtype, without any luck. I used print(data.to_string()) and got the following:
I can't reformat the file, but when I tried using only a small portion of it, everything worked fine.
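For reference, a minimal sketch of the two usual approaches, assuming the timestamp sits in the second column (index 1); the file name and column index are placeholders, not taken from the question, and note that the keyword argument is converters= (plural), passed as a dict:
import pandas as pd

# Option 1: force column 1 to be read as plain strings
data = pd.read_csv('bigfile.csv', converters={1: str})

# Option 2: let pandas build a datetime64 column during the read
data = pd.read_csv('bigfile.csv', parse_dates=[1])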

Related

Python reads only one column from my CSV file

First post here.
I am very new to programming, so sorry if this is confusing.
I made a database by collecting different data online. All these data are in one xlsx file (one column per variable), which I converted to CSV afterwards because my teacher only showed us how to use CSV files in Python.
I installed pandas and made it read my CSV file, but it seems it doesn't understand that I have multiple columns; it reads only one column. Thus, I can't get the info on each variable (and so I can't transform the data).
I tried df.info() and df.info(verbose=True, show_counts=True), but they show the same thing:
len(df.columns) = 1, which shows it doesn't see that each variable has its own column
len(df) = 1923, which is right
I was expecting this: https://imgur.com/a/UROKtxN (different project, not the same database)
database used: https://imgur.com/a/Wl1tsYb
And I get this instead: https://imgur.com/a/iV38YNe
database used: https://imgur.com/a/VefFrL4
I don't know, it looks pretty similar, so why doesn't it work? :((
Thanks.
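One common cause when an Excel-exported CSV comes back as a single column is that the export used semicolons (or some other character) as the delimiter. That is only an assumption here, since the raw file isn't shown; a minimal sketch with a placeholder file name:
import pandas as pd

# Look at the first raw line to see which character actually separates the values
with open('mydata.csv') as f:
    print(f.readline())

# If it is a semicolon, pass it explicitly ...
df = pd.read_csv('mydata.csv', sep=';')

# ... or let pandas sniff the delimiter (requires the slower Python engine)
df = pd.read_csv('mydata.csv', sep=None, engine='python')

print(len(df.columns))   # should now be greater than 1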

Picking out a specific column in a table

My goal is to import a table of astrophysical data that I have saved to my computer (obtained from matching 2 other tables in TOPCAT, if you know it), and extract certain relevant columns. I hope to then do further manipulations on these columns. I am a complete beginner in python, so I apologise for basic errors. I've done my best to try and solve my problem on my own but I'm a bit lost.
This is the script I have written so far:
import pandas as pd
input_file = "location\\filename"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])
The file that I'm trying to import is listed as having file type "File" in my drive. I've looked at this file in Notepad and it has a lot of descriptive bumf in the first few rows, so to try and get rid of this I've used skiprows, as you can see. The data in the file is separated column-wise by vertical lines, at least that's how it appears in Notepad.
The problem is that when I try to extract the first column using usecols, it instead returns what appears to be the first row in the command window, along with a load of vertical bars between the values. I assume it is somehow not interpreting the table correctly, not understanding what's a column and what's a row?
What I've tried: modifying the file and saving it in a different filetype. This gives the following error:
FileNotFoundError: [Errno 2] No such file or directory: 'location\\filename'
Despite the fact that the new file is saved in exactly the same location.
I've tried using pd.read_table instead of read_csv, but this doesn't seem to change anything (nor does it give me an error).
When I've tried to extract multiple columns (i.e. usecols=[1,2]) I get the following error:
ValueError: Usecols do not match columns, columns expected but not found: [1, 2]
My hope is that someone with experience can give some insight into what's likely going on to cause these problems.
Maybe you can try dataset.iloc[:, 0]. With iloc you can extract a column or row by its index. [:, 0] selects all rows of the first column.
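A slightly fuller sketch of the same idea, reusing the dataset variable from the question:
first_column = dataset.iloc[:, 0]       # all rows, first column
first_row = dataset.iloc[0, :]          # first row, all columns
some_columns = dataset.iloc[:, [1, 3]]  # all rows, second and fourth columns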
The file is incorrectly named.
I expect that you are reading a CSV, xlsx, or txt file, so the (Windows) path should look similar to this:
import pandas as pd
input_file = "C:\\python\\tests\\test_csv.csv"
dataset = pd.read_csv(input_file,skiprows=12,usecols=[1])
The error message tells you this:
No such file or directory: 'location\\filename'
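Putting the two points together, a sketch of what the read could look like once the real path (with its extension) is used; the sep='|' argument is an assumption based on the vertical bars described in the question:
import pandas as pd

input_file = "C:\\python\\tests\\test_csv.csv"   # the real, full path including the extension

# skiprows drops the descriptive lines at the top; sep='|' handles pipe-separated columns
dataset = pd.read_csv(input_file, skiprows=12, sep='|', usecols=[1])
print(dataset.head())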

Getting an error code in Python that df is not defined

I am new to data, so after a few lessons on importing data in Python, I tried the following code in my Jupyter notebook but keep getting an error saying df is not defined. I need help.
The code I wrote is as follows:
import pandas as pd
url = "https://api.worldbank.org/v2/en/indicator/SH.TBS.INCD?downloadformat=csv"
df = pd.read_csv(url)
After running the third line, I got a series of messages in the Jupyter notebook, but the one that stood out was "df not defined".
The problem here is that your data is a ZIP file containing multiple CSV files. You need to download the data, unpack the ZIP file, and then read one CSV file at a time.
If you can give more details on the problem (e.g. screenshots), debugging will become easier.
One possibility for the error is that the content served by the URL (https://api.worldbank.org/v2/en/indicator/SH.TBS.INCD?downloadformat=csv) is a ZIP file, which may prevent pandas from processing it further.
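A sketch of that approach, assuming the download really is a ZIP archive, using requests together with the standard zipfile module; the name of the CSV inside the archive will differ:
import io
import zipfile

import pandas as pd
import requests

url = "https://api.worldbank.org/v2/en/indicator/SH.TBS.INCD?downloadformat=csv"

response = requests.get(url)                          # download the archive into memory
archive = zipfile.ZipFile(io.BytesIO(response.content))

print(archive.namelist())                             # list the files inside the archive
csv_name = archive.namelist()[0]                      # pick the file you actually need

# World Bank indicator CSVs usually start with a few metadata rows; adjust skiprows as needed
df = pd.read_csv(archive.open(csv_name), skiprows=4)
print(df.head())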

Reading a CSV file in Python

I know this question has been asked a lot, but none of the solutions I can find seems to work.
I'm trying to read a CSV file in Python using pandas. The CSV file 'data.csv' contains comma-separated values and no header, in the format:
T,000027E7,24.56,3.41,5.03,12,1260497437.817,4,0.18
T,00006726,28.84,8.24,5.03,14,1260497437.818,4,3.62
However, when using the command below, only a single column containing all values is outputted.
import pandas as pd
data2=pd.read_csv('data.csv',header=None)
I've also tried specifying names of each column to no avail.
data2=pd.read_csv('data.csv',header=None, names=['Type','TagID','x','y','z','BatLvl','TimeStamp','Unit','DQI'])
Does anybody know of a way to solve this?
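One thing worth checking, purely as an assumption since the raw bytes aren't shown, is whether the file really uses plain commas and a standard encoding; a short sketch:
import pandas as pd

# Print the first raw line to confirm the delimiter and spot unusual encodings
with open('data.csv', 'rb') as f:
    print(f.readline())

# Let pandas sniff the delimiter instead of assuming a comma
data2 = pd.read_csv('data.csv', header=None, sep=None, engine='python',
                    names=['Type', 'TagID', 'x', 'y', 'z', 'BatLvl', 'TimeStamp', 'Unit', 'DQI'])
print(data2.head())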

Export SAS lib to csv with correct date format (in CSV file)

I use:
Python 3.7
SAS v7.1 Enterprise
I want to export some data (from a library) from SAS to CSV. After that I want to import this CSV into a pandas DataFrame and use it.
I have a problem, because when I export data from SAS with this code:
proc export data=LIB.NAME
outfile='path\to\export\file.csv'
dbms=csv
replace;
run;
every column was exported correctly except the date column. In SAS I see something like:
06NOV2018
16APR2018
and so on. In the CSV it looks the same. But if I import this CSV into a DataFrame, unfortunately, Python sees the date column as object/string instead of a date type.
So here is my question: how can I export a whole library to CSV from SAS with the correct column types (especially the date column)? Maybe I should convert something before the export? Please help me with this; I'm new to SAS and just want to get the data out of it and use it in Python.
Before you answer, keep in mind that I already tried the pandas read_sas function, but that command raised the following exception:
df1 = pd.read_sas(path)
ValueError: Unexpected non-zero end_of_first_byte
Exception ignored in: 'pandas.io.sas._sas.Parser.process_byte_array_with_data'
Traceback (most recent call last):
  File "pandas\io\sas\sas.pyx", line 31, in pandas.io.sas._sas.rle_decompress
I added the fillna function and it shows the same error:
df = pd.DataFrame.fillna((pd.read_sas(path)), value="")
I tried the sas7bdat module in Python, but I got the same error.
Then I tried the sas7bdat_converter module, but the CSV has the same values in the date column, so the dtype problem comes back after converting the CSV to a DataFrame.
Have you got any suggestions? I've spent two days trying to figure it out, without any positive results.
Regarding the read_sas error, a GitHub issue has been reported but was closed for lack of a reproducible example. However, I can easily import SAS data files with pandas using .sas7bdat files generated from SAS 9.4 Base (possibly the v7.1 Enterprise version is the issue).
However, consider using the parse_dates argument of read_csv, as it can convert your DDMMMYYYY date format to datetime during the import. No change is needed to your SAS-exported dataset.
sas_df = pd.read_csv(r"path\to\export\file.csv", parse_dates = ['DATE_COLUMN'])
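If the automatic parsing does not pick up the DDMMMYYYY style on its own, converting the column explicitly after the read also works; DATE_COLUMN is the same placeholder name as above:
import pandas as pd

sas_df = pd.read_csv(r"path\to\export\file.csv")

# SAS DATE9.-style values such as 06NOV2018 match the %d%b%Y format
sas_df['DATE_COLUMN'] = pd.to_datetime(sas_df['DATE_COLUMN'], format='%d%b%Y')
print(sas_df.dtypes)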
