Split Multiline/Wrapped Cell Contents Into Separate Rows in Python - python

I have a dataset in which some cells contain wrapped (multiline) content.
Wrapped data cell
My aim is to split each wrapped cell into multiple rows while retaining the UID, filling it down so that every row still identifies the customer who bought the fruits and vegetables. The result should look something like this:
Multi-Row
Previously, I tried using a lambda to split the cell contents, but it didn't work. I've been stuck for quite a while and would be glad if someone could give me some guidance on this.
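One possible approach, sketched here under assumptions: the wrapped cells are taken to hold values separated by newline characters, and the column names `UID` and `Items` are hypothetical stand-ins for the real headers. Splitting the string into a list and then calling `DataFrame.explode` repeats the UID for every resulting row.

```python
import pandas as pd

# Hypothetical sample data: each "Items" cell holds several values
# separated by a newline, as a wrapped Excel cell typically does.
df = pd.DataFrame({
    "UID": [1, 2],
    "Items": ["Apple\nBanana", "Carrot\nPotato\nOnion"],
})

out = (
    df.assign(Items=df["Items"].str.split("\n"))  # string -> list of values
      .explode("Items")                           # one row per list element
      .reset_index(drop=True)                     # clean 0..n-1 index
)

print(out)
```

If the real cells use a different separator (for example `\r\n` or a comma), the argument to `str.split` would need to change accordingly.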

Related

Merging multiple Excel files into a master file using Python without any repeated values

I have multiple Excel files with different columns, and some of them share columns while adding further data in additional columns. I created a master file that contains all the column headers from each Excel file, and now I want to export the data from the individual Excel files into the master file. Ideally, each row would represent all the information about one single item.
I tried merging and concatenating the files, but that adds all the data as new rows, so now I have some columns with repeated data while the additional data sits in different columns.
What I want now is to recognize the columns that are already present and fill in the new data instead of repeating all the columns, using Python. I cannot share the data or the code, so I'm looking for some help or an idea to get this done. Any help would be appreciated. Thanks in advance!
You are probably merging the wrong way.
I'm not sure about your master file; it doesn't sound very intuitive.
Make sure each row has a specific ID that identifies it.
Then always perform the merge on that ID with the 'inner' merge type.
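A minimal sketch of that suggestion, assuming two small frames that share a hypothetical `item_id` column (all names and values here are made up for illustration). It also shows `how="outer"` for comparison, since an inner merge keeps only the IDs present in both files.

```python
import pandas as pd

# Hypothetical stand-ins for two of the Excel files, already loaded
# (for real files, pd.read_excel would produce frames like these).
a = pd.DataFrame({"item_id": [1, 2, 3], "name": ["bolt", "nut", "washer"]})
b = pd.DataFrame({"item_id": [2, 3, 4], "price": [0.10, 0.05, 0.20]})

# Inner merge: one row per ID found in BOTH frames, columns side by side.
inner = a.merge(b, on="item_id", how="inner")

# Outer merge: one row per ID found in EITHER frame, gaps filled with NaN.
outer = a.merge(b, on="item_id", how="outer")

print(inner)
print(outer)
```

Because the merge is keyed on the ID, the extra columns from the second file are filled in next to the existing ones instead of being appended as new rows.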

Why is Python/Pandas "losing" thousands of rows of data despite apparently knowing they are there?

I am trying to work with a (British) police data set via an API regarding stop and search incidents.
I'm using Jupyter Notebook. I have imported the data via an API and then output a local copy to a CSV for reference purposes (this contains the full 6249 rows). I know that the dataset is 6249 rows in length too (see image), but when I enter the name of the dataset into a cell and run it to check it, it shows the first and last 5 rows as normal, but the tail end is telling me the dataset runs to only 541 rows?
I tried to output all of the values in one of the columns to a list to see if it would enter all 6250 to the list, but when I checked, it had only put 541 in there. It's almost like Pandas knows these thousands of "missing" rows are there but is "ignoring" them for some reason?
I have dropped certain columns from the dataframe, but no rows and when I run the output to CSV function, the "Missing" rows are all present in the CSV file.
I tried using the display max rows function but it didn't work, it still only displays 541 rows:
pd.set_option('display.max_rows', None)
I've tried running the .shape function and that is telling me that there are 6249 rows as well:
I've been working for hours on this but can't seem to find anything like this that has happened to others. It basically means that I cannot continue with my project because Pandas is refusing to recognise most of my data.
Any help or suggestions would be really appreciated!
Many thanks,
T
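A small diagnostic sketch, not a diagnosis of the problem above: on a hypothetical frame with a made-up `outcome` column, the row count reported by `.shape`, by `len()`, and by converting a column to a list should all agree. If they disagree in a notebook, one thing worth checking is whether the different cells are really looking at the same variable.

```python
import pandas as pd

# Hypothetical frame standing in for the stop-and-search data.
df = pd.DataFrame({"outcome": range(6249)})

n_shape = df.shape[0]                 # rows according to .shape
n_len = len(df)                       # rows according to len()
n_list = len(df["outcome"].tolist())  # rows when a column is dumped to a list

print(n_shape, n_len, n_list)

# The footer of the truncated display reports the same row count.
print(df)
```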

Reading survey data CSV with multiple selection sub-columns?

I would like to import the data from the Navigraph survey results.
https://navigraph.com/blog/survey2022
The dataset is here:
https://download.navigraph.com/docs/flightsim-community-survey-by-navigraph-2022-data.zip
However, I noticed the structure is something I'm not quite used to, and perhaps this is how a lot of polling data is shared. The semicolons being separators is not an issue. It's the fact there's a mix of "select multiple" responses as columns. The tidiest thing is starting at the third row, each row is a single respondent.
How can I clean up this data so it is as "tidy" as possible? How would I melt() these columns into rows? How do I handle the multiple selection responses in the sub-columns?
I'd like the questions and responses to simply be two columns respectively.
Hello, how are you? I don't have full knowledge of this type of work, but I believe you will have to:
1- Read the file as is
2- Concatenate the columns of questions and answers
3- Create the dataset that will be used
I believe that pandas has some commands that will help you; you just need to find the patterns that define what the "questions" and "answers" are in this dataset.
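The three steps above could be sketched roughly as follows. Everything about the layout is an assumption made for illustration, not taken from the actual Navigraph file: a first row of question texts (left blank over the extra sub-columns of a multiple-selection question), a second row of option labels, and one respondent per row after that. The sample text and its column contents are invented.

```python
import io
import pandas as pd

# Invented sample mimicking the assumed layout (semicolon-separated).
text = (
    "respondent;Which sims do you use?;;Age group\n"
    ";MSFS;X-Plane;\n"
    "1;1;0;25-34\n"
    "2;1;1;35-44\n"
)

# 1. Read the file as is, without treating any row as a header.
raw = pd.read_csv(io.StringIO(text), sep=";", header=None, dtype=str)

# 2. Separate the two header rows from the respondent rows.
questions = raw.iloc[0].ffill()   # carry each question across its sub-columns
options = raw.iloc[1]             # option label per sub-column (NaN if none)
data = raw.iloc[2:].reset_index(drop=True)

# 3. Melt to long form, then attach the question and option to every row.
long = data.melt(id_vars=[0], var_name="col", value_name="response")
long = long.rename(columns={0: "respondent"})
long["question"] = long["col"].map(questions)
long["option"] = long["col"].map(options)
long = long[["respondent", "question", "option", "response"]]

print(long)
```

For the real file, the `io.StringIO(text)` argument would be replaced by the path to the CSV, and the header rows would need to be checked against the actual data first.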

How do I create a column with certain conditions built in (not the same as a conditional column) in python

I've attached a screenshot of a table in Excel, but I'm doing this in Python.
I'm trying to recreate the column "predict" in Python; I have the other columns already. I am trying to get the first row of "predict" to be equal to the first row of "ytd", and then for every value following that one, I want it to be the result of the "nc" value multiplied by the previous value in the "predict" column. It doesn't have to be done in this particular order or in this way; I just want that to be the end result, and any clear help to achieve that would be much appreciated. I feel like there should be a way to do this with conditionals, but I am struggling to find the right combination of information.
Have you got any code in Python? Is the information already there, or are you reading it from the Excel file and then printing it out or saving it to a file? I didn't quite understand the question.
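One way to read the requirement, sketched with made-up numbers: if each "predict" value is the "nc" value of the same row times the previous "predict" value (that row alignment is an assumption), the column is the first "ytd" value times a running product of "nc". A cumulative product then avoids an explicit loop.

```python
import pandas as pd

# Made-up data; only the column names come from the question.
df = pd.DataFrame({
    "ytd": [100.0, 110.0, 120.0, 130.0],
    "nc": [1.0, 1.1, 1.2, 0.5],
})

# The first row contributes no growth factor, so neutralise its "nc".
factors = df["nc"].copy()
factors.iloc[0] = 1.0

# predict[0] = ytd[0]; predict[i] = nc[i] * predict[i - 1]
df["predict"] = df["ytd"].iloc[0] * factors.cumprod()

print(df)
```

If "nc" from the previous row is meant instead, shifting the `factors` series by one row before the cumulative product would give that variant.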

A CNN that takes several rows from a CSV file as a single input

I have extracted facial features from several videos in the form of facial action units (AU) using OpenFace. These features span several seconds and hence take up several rows in a CSV file (each row contains the AU data for one frame of the video). Originally, I had multiple CSV files as input for the CNN but, as advised by others, I have concatenated and condensed the data into a single file. My CSV columns look like this:
Filename | Label | the other columns contain AU related data
Filename contains an individual "ID" that helps keep track of a single "example". The Label column contains two possible values, either "yes" or "no". I'm also considering adding a "Frames" column to keep track of the frame number within a given "example".
The most likely scenario is that I will require some form of 3D CNN, but so far the only code or help that I have found for 3D CNNs is specific to videos, while I require code for either a single CSV file or several CSV files. I've been unable to find any code that can help me in this scenario. Can someone please help me out? I have no idea how or where to move forward.
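Whatever network is chosen, a likely first step is turning the rows into one fixed-size array per example. The sketch below covers only that data-preparation step, under assumptions: the feature column names `AU01` and `AU02`, the sample values, and the `max_frames` cut-off are all hypothetical. Rows are grouped by Filename, each group is zero-padded or truncated to the same number of frames, and the groups are stacked into an array of shape (examples, frames, features).

```python
import numpy as np
import pandas as pd

# Hypothetical sample in the described layout: Filename | Label | AU columns.
df = pd.DataFrame({
    "Filename": ["a", "a", "a", "b", "b"],
    "Label": ["yes", "yes", "yes", "no", "no"],
    "AU01": [0.1, 0.2, 0.3, 0.4, 0.5],
    "AU02": [1.0, 1.1, 1.2, 1.3, 1.4],
})
feature_cols = ["AU01", "AU02"]  # assumed names for the AU columns
max_frames = 4                   # assumed fixed sequence length

sequences, labels = [], []
for name, group in df.groupby("Filename", sort=False):
    # Frames of one example, truncated to max_frames if longer.
    frames = group[feature_cols].to_numpy(dtype="float32")[:max_frames]
    # Zero-pad shorter examples so every sequence has the same shape.
    padded = np.zeros((max_frames, len(feature_cols)), dtype="float32")
    padded[: len(frames)] = frames
    sequences.append(padded)
    labels.append(1 if group["Label"].iloc[0] == "yes" else 0)

X = np.stack(sequences)  # shape: (examples, frames, features)
y = np.array(labels)     # one label per example

print(X.shape, y)
```

An array shaped like this, frames by features per example, may be a better fit for a 1D convolution over the frame axis than for a 3D CNN, though that choice depends on the data.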
