Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 10 months ago.
I have an Excel sheet and I want to extract values from different columns into a single column.
desired excel sheet format
First of all, I want to figure out how to deal with sub-headers like "astro" and "athens grey", and then how to extract information in this pattern. Thanks.
sample output
I have managed to resolve the sub-header issue. Now I just want help with a regex to extract the information in the desired format.
Here is what I have done so far: Subheaders
See if it helps:
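import numpy as np
import pandas as pd

data = pd.read_excel('Sample.xlsx')
# Show the rows that have exactly six empty cells
data[data.isna().sum(axis=1) == 6]
# Drop rows that are entirely empty
data = data.dropna(how='all')
# SKU text before any parenthesis; cells containing digits become NaN and are forward-filled
name = data['SKU'].astype(str).str.extract(r'([^\(\)]*)')[0].str.strip().replace(r'\d+', np.nan, regex=True).ffill()
# Leading part of SIZE that contains no digits and no "x"
size = data['SIZE'].str.extract(r'([^0-9x]+)').fillna('')[0]
name + ' ' + data['DESCRIPTION'] + ' ' + size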
Output:
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have already used pandas to read the CSV file.
I have a question: is the CSV file now held as some sort of list, or do I have to store the data myself?
I used df = pd.read.cv bla2
Your df would be a pandas dataframe object that includes all of the data.
As others have mentioned, the data will be loaded as a DataFrame. I believe the correct syntax you are after is:
df = pd.read_csv('data.csv')
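To see what read_csv hands back, here is a minimal sketch. The CSV text and the column names "name" and "score" are made up for illustration, and io.StringIO stands in for a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = "name,score\nalice,90\nbob,85\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)  # DataFrame
print(df.shape)           # (2, 2)

# Converting to plain Python lists is optional; the DataFrame already holds the data
rows = df.values.tolist()
scores = df["score"].tolist()
```

In most cases the data can stay in the DataFrame; the list conversions are only there in case plain lists are really needed.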
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a csv file that contains the attendance of a few students on particular dates.
Here is my csv file
Name,RollNumber,Attendance,Date,Day,Time
student1,1,Present,1/30/2019,Wednesday,12:34:05
student2,2,Present,1/30/2019,Wednesday,12:34:05
student3,3,Present,1/30/2019,Wednesday,12:34:05
student4,4,Present,1/30/2019,Wednesday,12:34:05
student1,1,Absent,1/31/2019,Thursday,23:34:05
student2,2,Present,1/31/2019,Thursday,23:34:05
student3,3,Present,1/31/2019,Thursday,23:34:05
student4,4,Present,1/31/2019,Thursday,12:34:05
student1,1,Present,2/1/2019,Friday,12:34:05
student2,2,Absent,2/1/2019,Friday,12:34:05
student3,3,Absent,2/1/2019,Friday,12:34:05
student4,4,Present,2/1/2019,Friday,12:34:05
student1,1,Absent,2/2/2019,Saturday,12:34:05
student2,2,Absent,2/2/2019,Saturday,12:34:05
student3,3,Absent,2/2/2019,Saturday,12:34:05
student4,4,Absent,2/2/2019,Saturday,12:34:05
I want to plot a graph that shows the number of students present and absent on each date from the CSV file. How do I do this with matplotlib?
The easiest way, in my opinion, is to work with the pandas pivot_table, as follows:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('your_csv_filepath_here')
# Create a duplicate of your target value
df['attendance'] = df.Attendance
# Pivot your dataframe
df_pivot = df.pivot_table(index=['Date'], columns='Attendance', values='attendance', aggfunc='count')
# Plot it using pandas (barplot is probably what you want)
df_pivot.plot(kind='bar')
plt.show()
Of course, further plot customizations are possible, and other methods would achieve the same result.
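One such alternative, sketched here rather than taken from the answer above, is pd.crosstab, which counts the Present/Absent rows per date without needing the duplicated column. The snippet uses the first eight rows of the sample data from the question, read through io.StringIO so it runs on its own.

```python
import io

import pandas as pd

# First two dates of the sample data from the question
csv_text = """Name,RollNumber,Attendance,Date,Day,Time
student1,1,Present,1/30/2019,Wednesday,12:34:05
student2,2,Present,1/30/2019,Wednesday,12:34:05
student3,3,Present,1/30/2019,Wednesday,12:34:05
student4,4,Present,1/30/2019,Wednesday,12:34:05
student1,1,Absent,1/31/2019,Thursday,23:34:05
student2,2,Present,1/31/2019,Thursday,23:34:05
student3,3,Present,1/31/2019,Thursday,23:34:05
student4,4,Present,1/31/2019,Thursday,12:34:05
"""

df = pd.read_csv(io.StringIO(csv_text))

# Rows are dates, columns are attendance values, cells are counts (0 where absent)
counts = pd.crosstab(df["Date"], df["Attendance"])
print(counts)
```

Calling counts.plot(kind='bar') and then matplotlib's plt.show() should draw the same kind of grouped bar chart as the pivot_table version. Note that Date is still a string here, so the dates sort alphabetically unless they are parsed first.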
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a csv file with four columns (no header). I would like to sort the file
by the first, then second column, and store back to disk.
I can read the file in using pandas or numpy, no problem, but I am not sure how to sort it and store it.
Just follow the process you described:
read / parse CSV into a DF
sort DF
export DF to CSV and write it to disk
If we chain all steps together, then we don't even need to create a variable for the DataFrame...
Demo:
import pandas as pd

(pd.read_csv('/path/to/file.csv', header=None)
   .sort_values([0, 1])
   .to_csv('/path/to/result.csv', index=False, header=False))
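The same three steps can also be written out one at a time, which makes the intermediate result easier to inspect. This is only a sketch: the four-column CSV text is invented, and io.StringIO replaces the real input and output files.

```python
import io

import pandas as pd

# Hypothetical four-column CSV without a header row
csv_text = "b,2,x,1.0\na,2,y,2.0\nb,1,z,3.0\na,1,w,4.0\n"

# 1. Read / parse the CSV; header=None makes pandas label the columns 0..3
df = pd.read_csv(io.StringIO(csv_text), header=None)

# 2. Sort by the first column, then by the second
df_sorted = df.sort_values([0, 1])

# 3. Export without the index or the generated column labels
out = io.StringIO()
df_sorted.to_csv(out, index=False, header=False)
result = out.getvalue()
print(result)
```

With real files, the two io.StringIO objects would simply be replaced by the input and output paths.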
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have some large matrices and vectors calculated in R. I want to transfer this data to Python (2.7) in order to do some further data analysis.
What is a recommended way to do this?
I am very familiar with R, but a beginner in Python.
Use write.csv(matrix, "~/filename.csv") in R and then in Python either (if you want to use pandas)
import pandas as pd
new_matrix = pd.read_csv("~/filename.csv")
or (if you want to use numpy)
import os
import numpy as np
# Expand "~" explicitly before handing the path to NumPy
new_matrix = np.genfromtxt(os.path.expanduser("~/filename.csv"), delimiter=",")
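One thing to watch for: by default, R's write.csv also writes a header row and a first column of row names, so the file holds more than the bare numbers. The sketch below assumes that default layout; the csv_text string is a hand-written imitation of such a file for a 2x2 matrix, not real output, and io.StringIO stands in for the file on disk.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical text imitating R's default write.csv layout for a 2x2 matrix
csv_text = '"","V1","V2"\n"1",1.5,2.5\n"2",3.5,4.5\n'

# pandas: treat the first column (R's row names) as the index
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
matrix_from_pandas = df.to_numpy()

# numpy: skip the header line and read only the numeric columns
matrix_from_numpy = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", skip_header=1, usecols=(1, 2)
)

print(matrix_from_pandas)
print(matrix_from_numpy)
```

Alternatively, passing row.names = FALSE to write.csv on the R side leaves only the header row to skip.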