I have a file with nearly 10,000 rows of data, and I want to extract the rows whose values fall in a certain range. For example,
I want to extract the values of xErr for, say, x > 22.1 and x < 22.3.
The data are in a CSV file.
How can I do this?
I have tried using np.where() but was unsuccessful.
Use pandas and it will be very simple. Just Google how to read a CSV with pandas and then look for examples of how to filter your data frame.
There are a lot of other posts on here that cover this problem.
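A minimal sketch of that approach, assuming the CSV has columns named x and xErr and lives in a file called data.csv (adjust both to your real data):

import pandas as pd

# Read the whole file; 10,000 rows is small for pandas.
df = pd.read_csv("data.csv")

# Boolean mask: keep rows where 22.1 < x < 22.3, then take their xErr values.
xerr_subset = df.loc[(df["x"] > 22.1) & (df["x"] < 22.3), "xErr"]
print(xerr_subset)

The same filter works in plain NumPy too, but the pandas version keeps the column names and reads more directly than juggling np.where() indices.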
I have two Excel files in general ledger format, and I am trying to open them as DataFrames so I can do some analysis, specifically to look for duplicates. I tried opening them using
pd.read_excel(r"Excelfile.xls") in pandas. The files are being read, but when I use df.head() I get NaNs for all the records and columns. Is there a way to load data in general ledger format into a DataFrame?
This is what the dataset looks like in the Jupyter notebook
This is what the dataset looks like in Excel
I am new to Stack Overflow and haven't yet learnt how to upload part of a dataset,
so I hope the images help describe my situation.
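One common cause of an all-NaN read is that a ledger's real header row sits a few rows down, below titles and merged cells. A hedged sketch, where sheet_name, skiprows, and header are assumptions to tune against the actual file:

import pandas as pd

# Hypothetical parameters: skip the ledger's title block and point pandas
# at the row that holds the real column headers.
df = pd.read_excel(
    r"Excelfile.xls",
    sheet_name=0,  # first sheet; use the sheet's name if known
    skiprows=4,    # rows of title/metadata above the table (assumption)
    header=0,      # header row, counted after skiprows
)

# Drop the fully empty rows/columns that ledger formatting often leaves behind.
df = df.dropna(how="all").dropna(axis=1, how="all")
print(df.head())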
I've been through hours of research trying to solve this seemingly simple issue, and I'm not sure why it has been so hard to find an answer. I'm trying to plot stock data for AAPL. When I extract the data from Ameritrade, it's a nested JSON dictionary. I came from MATLAB, where this was very simple, but I'm not sure how to extract the nested JSON here. I used pd.read_json to parse the outer JSON, but there is still one level left inside the resulting DataFrame that holds the data I need to plot. Any help would be greatly appreciated. Below is what it looks like:
df = pd.read_json(aapldata)
And the df looks like this; I'm trying to extract the data within the 'candles' column.
Dataframe Picture Showing Candle Column
As long as there is only one level of nesting, you should be able to do this:
from pandas.io.json import json_normalize
df = json_normalize(aapldata)
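Note that newer pandas versions (1.0+) expose this helper as pd.json_normalize. If the rows you need sit one level down, the record_path argument pulls them out directly. A sketch assuming the Ameritrade payload keeps the rows under a top-level "candles" key next to a "symbol" field (both assumptions about the response shape):

import json
import pandas as pd

payload = json.loads(aapldata)  # aapldata: the raw JSON string from the API

# record_path flattens each dict in payload["candles"] into one row;
# meta repeats the top-level symbol alongside every candle.
candles = pd.json_normalize(payload, record_path="candles", meta=["symbol"])
print(candles.head())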
I want to convert a CSV file of time-series data with multiple sensors.
This is what the data currently looks like:
The different sensors are identified by numbers and have different numbers of axes. When a new activity is labeled, everything below it belongs to that new label. The label sits in the same column as the first entry of each sensor.
This is the way I would like the data to be:
Each sensor axis has its own column, and the corresponding label is added in the last column.
So far, I have created a DataObject class to access the timestamp, sensortype, sensorvalues, and the corresponding parent_label for each row of the CSV.
I thought the most convenient way to solve this would be with a pandas DataFrame, but simply calling pd.DataFrame(timestamp, sensortype, sensorvalues, label)
won't work.
Any ideas/hints? Maybe other ways to solve this problem?
I am fairly new to programming, especially Python, so I have already run out of ideas.
Thanks in advance
Try creating a NumPy matrix of the columns you require, then convert it to a pandas DataFrame.
Otherwise, you can also import the CSV with pandas from the start.
Also, for the following call:
pd.DataFrame(timestamp, sensortype, sensorvalues, label)
try referring to the pd.concat function as well. You would need to convert each array to a DataFrame, put them in a list and then concat them with pandas.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
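To make the concat suggestion concrete: the direct pd.DataFrame(timestamp, sensortype, sensorvalues, label) call fails because the constructor's positional arguments are (data, index, columns, ...), not a series of column arrays. A hedged sketch with hypothetical sample values:

import pandas as pd

# Hypothetical columns collected from the DataObject instances.
timestamps = ["2019-01-01 00:00:00", "2019-01-01 00:00:01"]
sensortypes = [1, 2]
sensorvalues = [[0.1, 0.2, 0.3], [9.8, 0.0]]  # different axis counts per sensor
labels = ["walking", "walking"]

# One dict -> one DataFrame; the keys become the column names.
df = pd.DataFrame({
    "timestamp": timestamps,
    "sensortype": sensortypes,
    "sensorvalues": sensorvalues,
    "label": labels,
})

# Or build one frame per sensor and stack them, as suggested above:
# df = pd.concat([df_sensor1, df_sensor2], ignore_index=True)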
I want to create a "presentation ready" excel document with embedded pandas DataFrames and additional data and formatting
A typical document will include some titles and meta data, several Data Frames with sum row\column for each data frame.
The DataFrame itself should be formatted
The best thing I found was this, which explains how to use pandas with XlsxWriter.
The main problem is that there is no apparent method to get the exact location of the embedded DataFrame in order to add the summary row below it (the shape of the DataFrame is a good estimate, but it might not be exact when rendering complex DataFrames).
If there's a solution that relies on some kind of template rather than hard-coding, that would be even better.
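One workaround sketch: if you pass startrow to to_excel yourself, the frame's location is fully determined for flat layouts, so the summary row can be written at startrow + 1 header row + len(df). The file name, sheet name, title, and data below are assumptions:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # hypothetical data

with pd.ExcelWriter("report.xlsx", engine="xlsxwriter") as writer:
    startrow = 2  # leave rows 0-1 free for the title and metadata
    df.to_excel(writer, sheet_name="Report", startrow=startrow)

    workbook = writer.book
    worksheet = writer.sheets["Report"]
    bold = workbook.add_format({"bold": True})

    worksheet.write(0, 0, "Quarterly report", bold)  # hypothetical title

    # For a flat index and flat columns, to_excel emits exactly one header
    # row, so the first free row below the frame is startrow + 1 + len(df).
    sum_row = startrow + 1 + len(df)
    worksheet.write(sum_row, 0, "Total", bold)
    for i, col in enumerate(df.columns, start=1):  # column 0 holds the index
        worksheet.write(sum_row, i, df[col].sum(), bold)

MultiIndex headers add extra rows, which is exactly the estimation problem raised above, so this only pins down the location for simple frames.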
I am fairly new to programming, especially Python, and I'm not sure of the best way to approach this issue. I have a text file with 400,000 rows of data, each with 21 columns. My task is to classify or sort the data by columns 4 and 5, which are latitude and longitude. I want the sorted data stored or appended somewhere (i.e. a variable or an empty array) so that I can extract it for further processing. First I need to import the data, and I'm not sure of the best way to do that.
Overview:
Import the text file data
Convert it to an array or matrix for manipulation
Classify on certain keys
Store the classified data (in bins, empty arrays, or variables)
I would highly suggest using the pandas library for this: you can easily import the file, convert it to a DataFrame, and then sort it by your two columns using http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.sort.html
If you can convert your file into a CSV, it will be even easier, as you can just do dataframe = pandas.read_csv(file). You can then resave the file with dataframe.to_csv(file).
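A sketch of that route, assuming a whitespace-delimited file with no header row (adjust sep and header to the real format). Note that the DataFrame.sort method from the linked docs was later replaced by sort_values:

import pandas as pd

# Whitespace-delimited, no header row -- both assumptions to adjust.
df = pd.read_csv("data.txt", sep=r"\s+", header=None)

# Columns 4 and 5 from the question, taken as 0-based positions 3 and 4
# here (an assumption; shift them if the count was already 0-based).
df = df.rename(columns={3: "lat", 4: "lon"})

# sort_values is the current replacement for the deprecated DataFrame.sort.
df_sorted = df.sort_values(["lat", "lon"])

# Keep the sorted result for further processing, e.g. write it back out:
df_sorted.to_csv("data_sorted.csv", index=False)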