How to extract data after a specific string in csv files with pandas - python

I want to read the data that appears after a specific string in a csv file with pandas. I know this can be achieved through indexing, but the data length changes every time. How do I achieve this using pandas?
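A minimal sketch of one way to do this, assuming the data block starts on the line after a marker string (the file name and marker below are placeholders): scan the file for the marker first, then hand that line number to read_csv via skiprows, so the length of the block no longer matters.
import pandas as pd

marker = "BEGIN DATA"      # hypothetical marker string
path = "measurements.csv"  # hypothetical file name

with open(path) as f:
    # 0-based index of the line containing the marker
    start = next(i for i, line in enumerate(f) if marker in line)

# skip everything up to and including the marker line;
# read_csv reads until the end of the file, so the data length can vary
df = pd.read_csv(path, skiprows=start + 1)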

Related

Skipping rows and columns when reading csv with Pandas

I need help reading a csv file with pandas.
I have a .csv file that records machine parameters, and I want to read it with pandas and analyze it. The problem is that the file is not in a proper table format: there are a lot of empty rows and columns, and the parameter values only start at the 301st line (for example).
How can I read this csv file properly?
You can use skiprows:
pd.read_csv(csv_file, skiprows=301)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
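Since the question also mentions a lot of empty rows and columns, one possible follow-up (the file name below is a placeholder) is to drop anything that is completely empty after reading:
import pandas as pd

df = pd.read_csv("machine_parameters.csv", skiprows=301)  # hypothetical file name
df = df.dropna(how="all")           # drop rows that are entirely empty
df = df.dropna(axis=1, how="all")   # drop columns that are entirely empty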

How do I convert the following .csv data into bi-grams?

I have some data in the .csv format file given in the link below.
https://drive.google.com/file/d/1kBtK-uBhZEyCMQ2ndHpQ1Rqd6LZ3sVJJ/view?usp=sharing
I have converted it into a pandas dataframe. My question is: how do I convert it into bi-grams, i.e. a pandas dataframe of bigrams?
(Usually we use [i:i+n] for text but here I am dealing with columns)
Picture of the pandas dataframe I currently have, to make it easier for you.
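Since the linked file and the picture aren't reproduced here, this is only a rough sketch under the assumption that "bi-grams" means pairs of adjacent column values within each row; the toy dataframe stands in for the real data:
import pandas as pd

# toy stand-in for the real dataframe
df = pd.DataFrame({
    "w1": ["the", "a"],
    "w2": ["quick", "lazy"],
    "w3": ["fox", "dog"],
})

cols = df.columns
# pair each column with the next one, analogous to [i:i+2] over text
bigrams = pd.DataFrame({
    f"{cols[i]}_{cols[i + 1]}": list(zip(df[cols[i]], df[cols[i + 1]]))
    for i in range(len(cols) - 1)
})
print(bigrams)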

How to make pandas not read display values from csv?

I have a date column in a csv file, like the one shown below:
23/6/2011 7:00
21/4/1998 05:00
17/02/1990
11/01/1985 30:30:01
26/02/1976
45:42:7
But the problem here is that when I double-click a row in the csv, the actual date value is displayed correctly, e.g. 15/02/2010 10:30:00.
My csv looks like the sample above.
But I cannot fix this manually because, as you can imagine, I have 20-30 csv files and there are a lot of rows like this.
So when I read the column into a pandas dataframe and apply the datetime function as below,
df['Date'] = pd.to_datetime(df['Date'])
ParserError: hour must be in 0..23: 55:45.0
But how can I make pandas read the actual value and not the csv display value?
I tried changing the format in the Excel csv file, but that doesn't help.
Basically, I want pandas to read the double-clicked value from the csv, not the display value.
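One caveat worth stating plainly: a plain-text csv contains exactly the strings Excel wrote into it, so pandas cannot see any underlying cell value that isn't in the file; if the original workbook is available, reading it with pd.read_excel avoids the problem entirely. As a sketch for the csv case (the file name is an assumption), the unparseable entries can at least be isolated instead of raising:
import pandas as pd

df = pd.read_csv("dates.csv")  # hypothetical file name

# dayfirst=True matches the 23/6/2011-style values shown above;
# errors="coerce" turns values like 45:42:7 into NaT instead of raising
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True, errors="coerce")

bad_rows = df[df["Date"].isna()]  # rows whose stored string was not a date
print(bad_rows)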

How to read an excel file with nested columns in pandas

Using Pandas, I'm trying to read an excel file with nested column headers (two sample screenshots were attached).
I tried to read the excel file using the regular approach by running: df = pd.read_excel('filename.xlsx', skiprows=6).
But the problem is that I don't get all the column names I need, and most of them come out as Unnamed: 1.
Is there a way to solve this and read all the columns? Or an approach where I can convert it to a json file?
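A sketch of one approach, assuming the "nested" columns are really two header rows stacked on top of each other (the screenshots aren't reproduced here, so the header row numbers below are guesses): read_excel can build a MultiIndex from several header rows, and the result can then be flattened or dumped to JSON.
import pandas as pd

# header=[6, 7] tells read_excel to build column names from rows 7 and 8
# of the sheet (0-based); adjust to where the two header rows actually sit
df = pd.read_excel("filename.xlsx", header=[6, 7])

# flatten the MultiIndex into single column names, then export to JSON
df.columns = ["_".join(str(part) for part in col).strip("_") for col in df.columns]
df.to_json("filename.json", orient="records")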

Creating Arrays from csv files in python

So I have a data file from which I must extract specific data. Using:
import numpy
x=15          # need a way for the code to assess how many lines to skip in the given data
maxcol=2000   # need a way to find the final row in the data
data=numpy.genfromtxt('data.dat.csv', skip_header=x, delimiter=',')
column_one=data[0:maxcol, 0]
column_two=data[0:maxcol, 1]
this gives me arrays for the specific case where there are (x=) 15 lines of metadata above the required data and the number of rows of data is (maxcol=) 2000. How do I change the code so it works for any value of x and maxcol?
Use pandas. Its read_csv function does all that you want (I don't include its equivalent of delimiter, sep=',', because comma-delimited is the default):
import pandas as pd
data = pd.read_csv('data.dat.csv', skiprows=x, nrows=maxcol)
If you really want that as a numpy array, you can do this:
data = data.values
But you can probably just leave it as a pandas DataFrame.
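If x itself shouldn't be hard-coded, a sketch of one heuristic (assuming the metadata lines never parse as comma-separated numbers) is to scan for the first numeric line and use that as skiprows; omitting nrows then lets pandas read every remaining row, so maxcol isn't needed either:
import pandas as pd

def first_data_line(path):
    # return the 0-based index of the first line whose fields are all numeric
    with open(path) as f:
        for i, line in enumerate(f):
            try:
                [float(v) for v in line.strip().split(",")]
                return i
            except ValueError:
                continue
    raise ValueError("no numeric data found")

x = first_data_line("data.dat.csv")
data = pd.read_csv("data.dat.csv", skiprows=x, header=None)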
