Plotting the data from a csv file in Python

| 5.30-420462 | 100 | SAT-Synergy-gen2 |
| 5.30-42 | 92 | Scale |
| 5.30-423 | 90 | Scale |
| 5.30-420 | 76 | Scale |
| 5.30-420462 | 85 | Scale |
| 5.30-4205 | 88 | Scale |
| 5.30-420664 | 88 | Scale |
| 5.30-421187 | 90 | Scale |
| 5.30-421040 | 93 | Scale |
| 5.30-421225 | 100 | Scale-DCS-VET |
| 5.30-421069 | 100 | UPT_C7000 |
| 5.30-420664 | 0 | UPT_C7000 |
| 5.30-421040 | 100 | UPT_C7000 |
| 5.30-420693 | 100 | UPT_C7000 |
| 5.30-420543 | 88 | UPT_C7000 |
| 5.30-421225 | 76 | UPT_C7000 |
| 5.30-420462 | 96 | UPT_C7000 |
The above is the data from the database in the CSV file. I want to use the first and second columns for plotting the graph, and the third column will be the reference (label) for the first and second columns. Can someone help me plot this data using pandas or any other module?

Try using matplotlib for plotting and pandas for reading the data in. You can then set the labels and other plot options.
Reading in the data from your file -> https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Plotting the data -> https://datatofish.com/plot-dataframe-pandas/
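A minimal sketch, assuming the CSV has no header row and the three columns are a build string, a numeric score and a suite name; the file name data.csv and the column names below are placeholders for illustration:
import pandas as pd
import matplotlib.pyplot as plt

# Assumed column names -- the data shown above has no header row
df = pd.read_csv('data.csv', header=None, names=['build', 'score', 'suite'])

# Plot score against build, one line per suite (the third column acts as the reference/label)
fig, ax = plt.subplots(figsize=(10, 5))
for suite, group in df.groupby('suite'):
    ax.plot(group['build'], group['score'], marker='o', label=suite)

ax.set_xlabel('build')
ax.set_ylabel('score')
ax.legend(title='suite')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()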

Related

Creating multiindex from 2 rows with duplicated columns

I have an excel file that I read with pandas and convert to a dataframe. Here is a sample of the dataframe:
| | salads_count | salads_count | salads_count | carrot_counts | carrot_counts | carrot_counts |
|---------------|--------------|--------------|--------------|---------------|---------------|---------------|
| | 01.2016 | 02.2016 | 03.2016 | 01.2016 | 02.2016 | 03.2016 |
| farm_location | | | | | | |
| sweden | 42 | 41 | 43 | 52 | 51 | 53 |
It's a very weird formatting, but that's what is in the Excel file. Initially, the first two rows are not even in MultiIndex form.
I managed to get it into a MultiIndex with the code below, but some columns are duplicated (salads_count appears several times, for example):
arrays = [df.columns.tolist(), df.iloc[0].tolist()]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples)
df.columns = index
I would like to convert the columns to a MultiIndex, something like this:
| | salads_count | | | carrot_counts | | |
|---------------|--------------|---------|---------|---------------|---------|---------|
| | 01.2016 | 02.2016 | 03.2016 | 01.2016 | 02.2016 | 03.2016 |
| farm_location | | | | | | |
| sweden | 42 | 41 | 43 | 52 | 51 | 53 |
Or even better, like this:
| | 01.2016 | | 02.2016 | |
|---------------|--------------|--------------|--------------|--------------|
| | carrot_count | salads_count | carrot_count | salad_count |
| farm_location | | | | |
| sweden | 52 | 42 | 51 | 41 |
How can I do this?
The best approach is to convert the columns to a MultiIndex directly in read_excel with the parameter header=[0,1]:
df = pd.read_excel(file, header=[0,1])
Then use swaplevel with sort_index:
df = df.swaplevel(0,1, axis=1).sort_index(axis=1, level=0)
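A minimal end-to-end sketch; the file name data.xlsx and the use of index_col=0 for farm_location are assumptions for illustration:
import pandas as pd

# header=[0, 1] builds the column MultiIndex from the first two rows;
# index_col=0 makes farm_location the row index (assumed layout)
df = pd.read_excel('data.xlsx', header=[0, 1], index_col=0)

# Swap the two column levels so the month is on top, then sort the columns
df = df.swaplevel(0, 1, axis=1).sort_index(axis=1, level=0)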

How to create duplicate rows based on columns?

Consider this data frame
| order number | Item | column 0 | column 1 | column 2 |
|--------------|--------------------|----------|----------|----------|
| 12 | [abcd][efgh] | [abcd | [efgh] | |
| 34 | [mnop] | | [mnop] | |
| 56 | [xyzz][zzyx][mnoq] | [xyzz] | [zzyx] | [mnoq] |
How do I turn it into this?
| order number | Item | column 0 |
|--------------|--------------------|----------|
| 12 | [abcd][efgh] | [abcd |
| 12 | [abcd][efgh] | [efgh] |
| 34 | [mnop] | [mnop] |
| 56 | [xyzz][zzyx][mnoq] | [xyzz] |
| 56 | [xyzz][zzyx][mnoq] | [zzyx] |
| 56 | [xyzz][zzyx][mnoq] | [mnoq] |
This is my first time posting on Stack Overflow, so apologies for any mistakes. I've tried searching the blogs but have not had any luck with this kind of problem. Any help is really appreciated.
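A minimal sketch of one way to get there with pandas melt, using a hand-built frame that approximates the sample above (the column names and cell values are illustrative, not taken verbatim):
import pandas as pd

# Illustrative reconstruction of the sample frame
df = pd.DataFrame({
    'order number': [12, 34, 56],
    'Item': ['[abcd][efgh]', '[mnop]', '[xyzz][zzyx][mnoq]'],
    'column 0': ['[abcd]', '[mnop]', '[xyzz]'],
    'column 1': ['[efgh]', None, '[zzyx]'],
    'column 2': [None, None, '[mnoq]'],
})

# Melt the "column N" columns into rows, drop the empty cells,
# and keep a single value column named "column 0" as in the desired output
out = (df.melt(id_vars=['order number', 'Item'],
               value_vars=['column 0', 'column 1', 'column 2'],
               var_name='source', value_name='item_part')
         .dropna(subset=['item_part'])
         .drop(columns='source')
         .rename(columns={'item_part': 'column 0'})
         .sort_values('order number')
         .reset_index(drop=True))

print(out)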

Generating multiple csv files from a list in pandas, python

I'm trying to create a new dataframe for each possible combination in 'combinations', reading in some values from a dataframe. An example of the dataframe:
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Species | OGT | Domain | A | C | D | E | F | G | H | I | K | L | M | N | P | Q | R | S | T | V | W | Y |
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| Aeropyrum pernix | 95 | Archaea | 9.7659115711 | 0.6720465616 | 4.3895390781 | 7.6501943794 | 2.9344881615 | 8.8666657183 | 1.5011817208 | 5.6901432494 | 4.1428307243 | 11.0604191603 | 2.21143353 | 1.9387130928 | 5.1038552753 | 1.6855017182 | 7.7664358772 | 6.266067034 | 4.2052190807 | 9.2692433532 | 1.318690698 | 3.5614200159 |
| Argobacterium fabrum | 26 | Bacteria | 11.5698896021 | 0.7985475923 | 5.5884500155 | 5.8165463343 | 4.0512504104 | 8.2643271309 | 2.0116736244 | 5.7962804605 | 3.8931525401 | 9.9250463349 | 2.5980609708 | 2.9846761128 | 4.7828063605 | 3.1262365491 | 6.5684282943 | 5.9454781844 | 5.3740045968 | 7.3382308193 | 1.2519739683 | 2.3149400984 |
| Anaeromyxobacter dehalogenans | 27 | Bacteria | 16.0337898849 | 0.8860252895 | 5.1368827707 | 6.1864992608 | 2.9730203513 | 9.3167603253 | 1.9360386851 | 2.940143349 | 2.3473650439 | 10.898494736 | 1.6343905351 | 1.5247123262 | 6.3580285706 | 2.4715303021 | 9.2639057482 | 4.1890063803 | 4.3992339725 | 8.3885969061 | 1.2890166336 | 1.8265589289 |
| Aquifex aeolicus | 85 | Bacteria | 5.8730327277 | 0.795341216 | 4.3287799008 | 9.6746388172 | 5.1386954322 | 6.7148035486 | 1.5438364179 | 7.3358775924 | 9.4641440609 | 10.5736658776 | 1.9263080969 | 3.6183861236 | 4.0518679067 | 2.0493569604 | 4.9229955632 | 4.7976564501 | 4.2005259246 | 7.9169763709 | 0.9292167138 | 4.1438942987 |
| Archaeoglobus fulgidus | 83 | Archaea | 7.8742687687 | 1.1695110027 | 4.9165979364 | 8.9548767369 | 4.568636662 | 7.2640358917 | 1.4998752909 | 7.2472039919 | 6.8957233203 | 9.4826333048 | 2.6014466253 | 3.206476915 | 3.8419576418 | 1.7789787933 | 5.7572748236 | 5.4763351139 | 4.1490633048 | 8.6330814159 | 1.0325605451 | 3.6494619148 |
+-------------------------------+-----+----------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+---------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
Here is my code at the moment.
import itertools
import pandas as pd
letters = ['G','A','L','M','F','W','K','Q','E','S','P','V','I','C','Y','H','R','N','D','T']
combinations = [''.join(i) for j in range(1,len(letters) + 1) for i in itertools.combinations(letters,r=j)]
df = pd.read_csv('COMPLETECOPYFORR.csv')
for combination in combinations:
    new_df = df[['Species', 'OGT']]
    new_df['Sum of percentage'] = df[list(combination)]
    new_df.to_csv(combination + '.csv')
The desired output is something along the lines of 10 million CSV files, each named after a different combination, so
G.csv, A.csv, through to GALMFWKQESPVICYHRNDT.csv
Species OGT Sum of percentage
------------------------------- ----- -------------------
Aeropyrum pernix 95 23.4353
Anaeromyxobacter dehalogenans 26 20.3232
Argobacterium fabrum 27 14.2312
Aquifex aeolicus 85 15.0403
Archaeoglobus fulgidus 83 34.0532
It looks like you need:
new_df['Sum of percentage'] = df[list(combination)].sum(axis=1)
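A sketch of the corrected loop with that line dropped in; the rest mirrors the question's code (the filename COMPLETECOPYFORR.csv comes from the question):
import itertools
import pandas as pd

letters = ['G','A','L','M','F','W','K','Q','E','S','P','V','I','C','Y','H','R','N','D','T']
combinations = [''.join(i) for j in range(1, len(letters) + 1)
                for i in itertools.combinations(letters, r=j)]

df = pd.read_csv('COMPLETECOPYFORR.csv')

for combination in combinations:
    new_df = df[['Species', 'OGT']].copy()  # .copy() avoids SettingWithCopyWarning
    # Sum the selected percentage columns row-wise
    new_df['Sum of percentage'] = df[list(combination)].sum(axis=1)
    new_df.to_csv(combination + '.csv')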

pandas column shift with day 0 value as 0

I've got a pandas dataframe (pivoted) with columns customer_name, current_date, current_day_count:
+----------+--------------+-------------------+
| customer | current_date | current_day_count |
+----------+--------------+-------------------+
| Mark | 2018_02_06 | 15 |
| | 2018_02_09 | 42 |
| | 2018_02_12 | 33 |
| | 2018_02_21 | 82 |
| | 2018_02_27 | 72 |
| Bob | 2018_02_02 | 76 |
| | 2018_02_23 | 11 |
| | 2018_03_04 | 59 |
| | 2018_03_13 | 68 |
| Shawn | 2018_02_11 | 71 |
| | 2018_02_15 | 39 |
| | 2018_02_18 | 65 |
| | 2018_02_24 | 38 |
+----------+--------------+-------------------+
Now, I want another new column with the previous day counts for each customer, but the value for each customer's first day should be 0. Something like this: customer, current_date, current_day_count, previous_day_count (with the first day value as 0):
+----------+--------------+-------------------+--------------------+
| customer | current_date | current_day_count | previous_day_count |
+----------+--------------+-------------------+--------------------+
| Mark | 2018_02_06 | 15 | 0 |
| | 2018_02_09 | 42 | 33 |
| | 2018_02_12 | 33 | 82 |
| | 2018_02_21 | 82 | 72 |
| | 2018_02_27 | 72 | 0 |
| Bob | 2018_02_02 | 76 | 0 |
| | 2018_02_23 | 11 | 59 |
| | 2018_03_04 | 59 | 68 |
| | 2018_03_13 | 68 | 0 |
| Shawn | 2018_02_11 | 71 | 0 |
| | 2018_02_15 | 39 | 65 |
| | 2018_02_18 | 65 | 38 |
| | 2018_02_24 | 38 | 0 |
+----------+--------------+-------------------+--------------------+
Try this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'name': ['Mark','Mark','Mark','Mark','Bob','Bob','Bob','Bob'],
                   'current_day_count': [18,28,29,10,19,92,7,43]})
# shift(-1) pulls the next row's count up within each customer group (NaN for the last row)
df['previous_day_count'] = df.groupby('name')['current_day_count'].shift(-1)
# Blank out the first row of each group, then replace every remaining NaN with 0
df.loc[df.groupby('name', as_index=False).head(1).index, 'previous_day_count'] = np.nan
df['previous_day_count'].fillna(0, inplace=True)

Parsing out indices and values from a pandas multi index dataframe

I have a dataframe in a similar format to this:
+--------+--------+----------+------+------+------+------+
| | | | | day1 | day2 | day3 |
+--------+--------+----------+------+------+------+------+
| id_one | id_two | id_three | date | | | |
| 18273 | 50 | 1 | 3 | 9 | 11 | 3 |
| | | | 4 | 26 | 27 | 68 |
| | | | 5 | 92 | 25 | 4 |
| | | | 6 | 60 | 72 | 83 |
| | 60 | 2 | 5 | 69 | 93 | 84 |
| | | | 6 | 69 | 30 | 12 |
| | | | 7 | 65 | 65 | 59 |
| | | | 8 | 57 | 88 | 59 |
| | 70 | 3 | 5 | 22 | 95 | 7 |
| | | | 6 | 40 | 24 | 20 |
| | | | 7 | 73 | 81 | 57 |
| | | | 8 | 43 | 8 | 66 |
+--------+--------+----------+------+------+------+------+
I am trying to create a tuple that contains id_one, id_two, and the values that each grouping contains.
To test this, I am simply trying to print the ids and values like this:
for id_two, data in df.head(100).groupby(level='id_two'):
    print id_two, data.values.ravel()
This gives me id_two and the data exactly as it should.
I am running into problems when I try to incorporate id_one. I tried the following, but was met with an error: ValueError: need more than 2 values to unpack
for id_one, id_two, data in df.head(100).groupby(level='id_two'):
    print id_one, id_two, data.values.ravel()
How can I print id_one, id_two and the data?
You can pass a list of level names to the level parameter:
df.head(100).groupby(level=['id_one', 'id_two'])
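When grouping by two levels, the group key is a tuple, so a sketch of the loop (following the question's Python 2 print style) would be:
for (id_one, id_two), data in df.head(100).groupby(level=['id_one', 'id_two']):
    print id_one, id_two, data.values.ravel()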
