Extract certain values from different columns in dataframe [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 10 months ago.
I have an Excel sheet and I want to extract values from several columns into a single column.
desired excel sheet format
First, I want to figure out how to deal with subheaders such as astro and athens grey, and then how to extract the information in this pattern. Thanks.
sample output
I have managed to resolve the subheader issue. Now I just need help with a regex to extract the information in the desired format.
Here is what I have done so far: Subheaders

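See if this helps:
import numpy as np
import pandas as pd
data = pd.read_excel('Sample.xlsx')
# Show the rows with six missing values (presumably the subheader rows)
data[data.isna().sum(axis=1) == 6]
data = data.dropna(how='all')
# Subheader name: SKU text before any parentheses, with numeric SKUs blanked out and the names forward-filled
group = data['SKU'].astype(str).str.extract(r'([^()]*)')[0].str.strip().replace(r'\d+', np.nan, regex=True).ffill()
# Append the description and the first run of SIZE characters that are not digits or 'x'
group + ' ' + data['DESCRIPTION'] + ' ' + data['SIZE'].str.extract(r'([^0-9x]+)').fillna('')[0]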
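To see what the expression above does, here is a minimal sketch on made-up data. The rows, the values and the names group, unit and LABEL are assumptions for illustration; only the column names SKU, DESCRIPTION and SIZE come from the answer. It blanks out numeric SKUs with where and str.fullmatch rather than replace, which is a swapped-in step that should behave the same on this data.

```python
import numpy as np
import pandas as pd

# Hypothetical sheet: subheader rows carry only a name in SKU
data = pd.DataFrame({
    "SKU": ["astro", "101", "102", "athens grey (new)", "201"],
    "DESCRIPTION": [np.nan, "chair", "table", np.nan, "sofa"],
    "SIZE": [np.nan, "cm 10x20", "cm 5x5", np.nan, "in 3x3"],
})

# SKU text before any parentheses, trimmed
sku = data["SKU"].astype(str).str.extract(r"([^()]*)")[0].str.strip()
# Keep only non-numeric SKUs (the subheaders) and carry them down
group = sku.where(~sku.str.fullmatch(r"\d+")).ffill()
# First run of SIZE characters that are not digits or 'x'
unit = data["SIZE"].str.extract(r"([^0-9x]+)")[0].fillna("").str.strip()

data["LABEL"] = group + " " + data["DESCRIPTION"] + " " + unit
# Subheader rows have no description, so their label is NaN and they drop out
items = data.dropna(subset=["DESCRIPTION"])
```

Each remaining row in items then carries the subheader it belongs to, followed by its description and size unit.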

Related

How can I best split this entry into searchable words? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 months ago.
This is a picture of my dataset. The dataset is from Kaggle; here is the link to it: https://www.kaggle.com/code/gleblevankov/exploring-spotify-data
The type of the column is object, and I want to split/transform this column into a list of words that are also searchable.
I want to split the column after every "," to get the individual words. I am looking for a function that turns the words in the column into a list of searchable words per row. For example, if I plot the column to see which genre is used most, I do not want to see a genre like "rap,pop,kpop" but rather "rap", "pop" and "k-pop" separately.
I tried to change the type to list, but then the whole column is aggregated into a single list.
Is there another way to transform this column?
Try running this command:
import pandas as pd
# Split each comma-separated entry into words before counting them
pd.Series([x.strip() for item in df.Genere for x in item.split(',')]).value_counts()
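A pandas-native way to get both a searchable list per row and per-genre counts is str.split followed by explode. The sketch below runs on made-up rows; the column name Genere is taken from the answer above and the new column genre_list is a hypothetical name.

```python
import pandas as pd

# Made-up sample; the real column comes from the Kaggle dataset
df = pd.DataFrame({"Genere": ["rap,pop,kpop", "pop", "rap, rock"]})

# One list of trimmed words per row
df["genre_list"] = df["Genere"].str.split(",").apply(
    lambda words: [w.strip() for w in words]
)

# One row per word, then count how often each genre occurs
counts = df["genre_list"].explode().value_counts()

# Search: keep the rows whose list contains a given genre
rap_rows = df[df["genre_list"].apply(lambda words: "rap" in words)]
```

counts can be plotted directly, for example with counts.plot(kind='bar').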

Read two SQL using two different dataframe and compare the two resultant datasets [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I want to run two SQL queries (both return the same 7-column structure), load each result into its own DataFrame, and check whether the two resulting datasets match.
I tried the .equals method, but I got:
ValueError: too many values to unpack (expected 2)
I am writing the code in Python with pandas. Please let me know if something like this is possible. I am new to Python, so any help or advice would be appreciated.
Thanks in advance.
You can check the (exact) equality of two DataFrames like this:
import pandas as pd
df1 = pd.DataFrame({1: [20], 2: [30]}) # this would be the result of your first SQL query
df2 = pd.DataFrame({1: [20], 2: [30]}) # this would be the result of your second SQL query
df1.equals(df2) # results in True/False
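As a fuller sketch, the two frames can come straight from pd.read_sql_query. The example below uses an in-memory SQLite database; the tables t1 and t2, their columns and the helper normalized are all made up for illustration, and your own connection and queries would take their place.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for the real one
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, amount INTEGER);
    CREATE TABLE t2 (id INTEGER, amount INTEGER);
    INSERT INTO t1 VALUES (1, 20), (2, 30);
    INSERT INTO t2 VALUES (2, 30), (1, 20);
""")

df1 = pd.read_sql_query("SELECT id, amount FROM t1", conn)
df2 = pd.read_sql_query("SELECT id, amount FROM t2", conn)
conn.close()

# equals compares position by position, so a different row order counts as a mismatch
same_as_returned = df1.equals(df2)


def normalized(df):
    """Sort by all columns and reset the index so row order is ignored."""
    return df.sort_values(list(df.columns)).reset_index(drop=True)


same_content = normalized(df1).equals(normalized(df2))
```

Note that equals returns a single boolean, so its result should be assigned to one name. A "too many values to unpack" error usually comes from unpacking a value into more than one name somewhere else in the code, though without the original code that is only a guess.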

what does the pd.read.csv in python turn your data to [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I have already used pandas to read the CSV file.
I have a question: is the CSV file turned into some sort of list, or do I have to store the data myself?
I used df = pd.read.cv bla2
Your df would be a pandas DataFrame object that includes all of the data.
As others have mentioned, the data will be loaded as a DataFrame. I believe the correct syntax you are after is:
df = pd.read_csv('data.csv')
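To make that concrete, here is a small sketch that reads CSV text from memory (the columns name and score are made up) and shows that the result is a DataFrame, which can be turned into plain Python lists if that is what you need.

```python
import io

import pandas as pd

# Made-up CSV content; with a real file you would pass its path instead
csv_text = "name,score\nalice,90\nbob,85\n"
df = pd.read_csv(io.StringIO(csv_text))

# The result is a DataFrame, not a list
is_frame = isinstance(df, pd.DataFrame)

# Optional conversions to plain Python structures
rows = df.values.tolist()        # list of rows
records = df.to_dict("records")  # list of dicts keyed by column name
```

The data lives in df for as long as the variable exists, so nothing else needs to be stored unless you want to write it back out.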

how to plot graph based on attendance [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a csv file that contains the attendance of a few students on particular dates.
Here is my csv file
Name,RollNumber,Attendance,Date,Day,Time
student1,1,Present,1/30/2019,Wednesday,12:34:05
student2,2,Present,1/30/2019,Wednesday,12:34:05
student3,3,Present,1/30/2019,Wednesday,12:34:05
student4,4,Present,1/30/2019,Wednesday,12:34:05
student1,1,Absent,1/31/2019,Thursday,23:34:05
student2,2,Present,1/31/2019,Thursday,23:34:05
student3,3,Present,1/31/2019,Thursday,23:34:05
student4,4,Present,1/31/2019,Thursday,12:34:05
student1,1,Present,2/1/2019,Friday,12:34:05
student2,2,Absent,2/1/2019,Friday,12:34:05
student3,3,Absent,2/1/2019,Friday,12:34:05
student4,4,Present,2/1/2019,Friday,12:34:05
student1,1,Absent,2/2/2019,Saturday,12:34:05
student2,2,Absent,2/2/2019,Saturday,12:34:05
student3,3,Absent,2/2/2019,Saturday,12:34:05
student4,4,Absent,2/2/2019,Saturday,12:34:05
I want to plot a graph that shows the number of students present and absent on each date in the CSV file. How do I do this with matplotlib?
The easiest way, in my opinion, is to use pandas pivot_table as follows:
import pandas as pd
df = pd.read_csv('your_csv_filepath_here')
# Create a duplicate of your target column so there is a value to count
df['attendance'] = df.Attendance
# Pivot your dataframe: one row per date, one column per attendance status
df_pivot = df.pivot_table(index=['Date'], columns='Attendance', values='attendance', aggfunc='count')
# Plot it using pandas (a bar plot is probably what you want)
df_pivot.plot(kind='bar')
Of course, further plot customizations are possible, and other methods would achieve the same result.
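One such alternative, sketched here on the CSV from the question, is pd.crosstab, which counts the Present/Absent rows per date directly and fills missing combinations with 0. The Agg backend line is an assumption so that the sketch runs without a display; drop it to use an interactive window.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

csv_text = """Name,RollNumber,Attendance,Date,Day,Time
student1,1,Present,1/30/2019,Wednesday,12:34:05
student2,2,Present,1/30/2019,Wednesday,12:34:05
student3,3,Present,1/30/2019,Wednesday,12:34:05
student4,4,Present,1/30/2019,Wednesday,12:34:05
student1,1,Absent,1/31/2019,Thursday,23:34:05
student2,2,Present,1/31/2019,Thursday,23:34:05
student3,3,Present,1/31/2019,Thursday,23:34:05
student4,4,Present,1/31/2019,Thursday,12:34:05
student1,1,Present,2/1/2019,Friday,12:34:05
student2,2,Absent,2/1/2019,Friday,12:34:05
student3,3,Absent,2/1/2019,Friday,12:34:05
student4,4,Present,2/1/2019,Friday,12:34:05
student1,1,Absent,2/2/2019,Saturday,12:34:05
student2,2,Absent,2/2/2019,Saturday,12:34:05
student3,3,Absent,2/2/2019,Saturday,12:34:05
student4,4,Absent,2/2/2019,Saturday,12:34:05
"""

df = pd.read_csv(io.StringIO(csv_text))
# Parse the dates so they sort chronologically rather than as text
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Rows: dates, columns: Absent / Present, values: number of students
counts = pd.crosstab(df["Date"], df["Attendance"])

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_xticklabels(counts.index.strftime("%Y-%m-%d"), rotation=45)
ax.set_ylabel("Number of students")
fig.tight_layout()
```

From here, plt.show() displays the chart and fig.savefig with a file name of your choice writes it to disk.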

Transforming a csv data frame into an html table [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I would like to transform my dataframe into an html table. I don't know how I can do it. Please assist me
import numpy as np
import pandas as pd
from prettytable import PrettyTable
data=pd.read_csv('C:/Users/ABDILLAH/Desktop/datasets/Angular/AngularDataset.csv')
pandas has a handy DataFrame method, to_html, that does exactly what you describe: it converts a dataframe into an HTML table. The exact documentation is available here, but something like the following should work, depending on your CSV structure:
data.to_html()
It's simple, since pandas can render a DataFrame as an HTML table.
import pandas as pd
data = pd.read_csv("myfile.csv")
# Passing a path writes the HTML table to that file
data.to_html('newFile.html')
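For a quick check without any files, the following sketch builds a small made-up frame (the columns name and value are assumptions) and captures the HTML as a string. When no path is passed, to_html returns the markup instead of writing it.

```python
import pandas as pd

# Made-up data standing in for the CSV contents
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# No path given: the HTML table comes back as a string
html = df.to_html(index=False)  # index=False leaves out the row-number column
```

The string can then be embedded in a larger page or template as needed.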
