Concatenate Lists in one DataFrame

I have the following lists:
rBruta = [76843339.93, 68564200.34, 114946898.37, 75687842.36, 34530505.68, 116481217.14, 95696528.10000002, 40015273.68, 33416618.4, 34530505.68, 33416618.4, 81118744.08]
rLiquida = [417648532.25, 362509251.24, 410746539.59, 365572296.03, 335338029.26, 416780171.86, 423577376.06, 385353312.36, 380507243.23, 404170649.16, 380269620.17, 426637510.38]
rEmpres = [1169415.89, 1015025.9, 1150090.31, 1023602.43, 938946.48, 1166984.48, 1186016.65, 1078989.27, 1065420.28, 1131677.82, 1064754.94, 1194585.03]
I need to concatenate these 3 lists into a single DataFrame, stacking one on top of another.
I tried turning each one into a DataFrame column and then using .T to transpose the columns.
That worked, but I have 16 lists with different names to concatenate.

The following should do the job:
df = pd.DataFrame([rBruta, rLiquida, ... all lists], columns = ["a1", "a2", ... "a12"])
# as you have 12 columns in your data
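As a minimal sketch of the same idea for many lists: collecting them in a dict keyed by name lets each list become one labelled row, so the names only have to be written once. The values below are shortened stand-ins for the lists in the question, and the column labels `m1` to `m3` are hypothetical.

```python
import pandas as pd

# Shortened stand-ins for the lists in the question (first three values only).
rBruta = [76843339.93, 68564200.34, 114946898.37]
rLiquida = [417648532.25, 362509251.24, 410746539.59]
rEmpres = [1169415.89, 1015025.9, 1150090.31]

# Map each name to its list; the remaining lists would be added here.
lists = {"rBruta": rBruta, "rLiquida": rLiquida, "rEmpres": rEmpres}

# orient="index" turns every dict key into a row label, stacking the lists.
df = pd.DataFrame.from_dict(lists, orient="index", columns=["m1", "m2", "m3"])
print(df)
```

Each list name ends up as the row index, so a row can later be selected with `df.loc["rBruta"]`.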

Related

Check for existence of data from two Dataframe's columns in List

I'm looking for the rows of a DataFrame whose column values do not appear in a list.
I'm doing it this way:
# pickled_data => list of dics
pickled_names = [d['company'] for d in pickled_data] # get values from dictionary to list
diff = df[~df['company_name'].isin(pickled_names)]
This works fine, but I realized that I need to check not only company_name but also place, because there could be two companies with the same name.
df also has a place column, and each dictionary in pickled_data has a place key.
I would like to be able to do something like this:
pickled_data = [(d['company'], d['place']) for d in pickled_data]
diff = df[~df['company_name', 'place'].isin(pickled_data)] # For two values in same row

You can convert the values to a MultiIndex with MultiIndex.from_tuples, build a MultiIndex from both columns as well, and compare the two with isin:
pickled_data = [(d['company'], d['place']) for d in pickled_data]
mux = pd.MultiIndex.from_tuples(pickled_data)
diff = df[~df.set_index(['company_name', 'place']).index.isin(mux)]
Sample:
data = {'company_name':['A1','A2','A2','A1','A1','A3'],
'place':list('sdasas')}
df = pd.DataFrame(data)
pickled_data = [('A1','s'),('A2','d')]
mux = pd.MultiIndex.from_tuples(pickled_data)
diff = df[~df.set_index(['company_name', 'place']).index.isin(mux)]
print(diff)
  company_name place
2           A2     a
4           A1     a
5           A3     s

You can build a set of tuples from pickled_data for fast lookup. A list comprehension over the company_name and place columns of the frame then gives a boolean list that is True where the (company, place) pair is not in the set. That list is used to index into the frame:
comps_and_places = set((d["company"], d["place"]) for d in pickled_data)
not_in_list = [(c, p) not in comps_and_places
               for c, p in zip(df.company_name, df.place)]
diff = df[not_in_list]
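Another way to express the same check, offered here only as a sketch and not as part of the answers above, is an anti-join: a left merge with `indicator=True` marks which rows found a match. The sample data is reused from the earlier answer, with `pickled_data` written as a list of dicts as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "company_name": ["A1", "A2", "A2", "A1", "A1", "A3"],
    "place": list("sdasas"),
})
pickled_data = [{"company": "A1", "place": "s"}, {"company": "A2", "place": "d"}]

# Known (company, place) pairs as a frame, renamed to match df's columns.
known = (
    pd.DataFrame(pickled_data)[["company", "place"]]
    .rename(columns={"company": "company_name"})
    .drop_duplicates()
)

# A left merge with indicator=True labels each row "both" or "left_only".
merged = df.merge(known, on=["company_name", "place"], how="left", indicator=True)

# Keep rows with no match. merge() resets the index, so the mask is applied
# by position to preserve df's original index.
diff = df[(merged["_merge"] == "left_only").to_numpy()]
print(diff)
```

Dropping duplicates from `known` keeps the merged frame the same length as `df`, which is what makes the positional mask safe.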

Create a dataframe from a dictionary with multiple keys and values

So I have a dictionary with 20 keys, all structured like so (same length):
{'head': X Y Z
0 -0.203363 1.554352 1.102800
1 -0.203410 1.554336 1.103019
2 -0.203449 1.554318 1.103236
3 -0.203475 1.554299 1.103446
4 -0.203484 1.554278 1.103648
... ... ... ...
7441 -0.223008 1.542740 0.598634
7442 -0.222734 1.542608 0.599076
7443 -0.222466 1.542475 0.599520
7444 -0.222207 1.542346 0.599956
7445 -0.221962 1.542225 0.600375
I'm trying to convert this dictionary to a dataframe, but I'm having trouble with getting the output I want. What I want is a dataframe structured like so: columns = [headX, headY, headZ etc.] and rows being the 0-7445 rows.
Is that possible? I've tried:
df = pd.DataFrame.from_dict(mydict, orient="columns")
And different variations of that, but can't get the desired output.
Any help will be great!
EDIT: The output I want has 60 columns in total, i.e. from each of the 20 keys, I want an X, Y, Z for each of them. So columns would be: [key1X, key1Y, key1Z, key2X, key2Y, key2Z, ...]. So the dataframe will be 60 columns x 7446 rows.

Use concat with axis=1 and then flatten the resulting MultiIndex columns with f-strings:
df = pd.concat(d, axis=1)
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
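To make the mechanics concrete, here is a small sketch with two made-up keys (`head` and `neck`) standing in for the 20 in the question. It joins the two column levels without a separator, which gives the `headX`, `headY`, ... names the question asks for; the underscore variant above works the same way.

```python
import numpy as np
import pandas as pd

# Two hypothetical keys standing in for the 20 in the question.
d = {
    "head": pd.DataFrame(np.zeros((3, 3)), columns=["X", "Y", "Z"]),
    "neck": pd.DataFrame(np.ones((3, 3)), columns=["X", "Y", "Z"]),
}

# Dict keys become the outer column level: ("head", "X"), ("head", "Y"), ...
df = pd.concat(d, axis=1)

# Join the two levels without a separator to get headX, headY, ...
df.columns = [f"{key}{axis}" for key, axis in df.columns]
print(df.columns.tolist())
```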

How to split a column into many columns where the name of this columns change

I build DataFrames inside a function, and the column names change from one call to the next, so I can't refer to a column by name (for example df['name']) and then split it. The number of columns and rows is not constant either. I need to split every column that contains more than one item into separate component columns.
For example:
This is one of the DataFrames I have:
name/one name/three
(192.26949,) (435.54,436.65,87.3,5432)
(189.4033245,) (45.51,56.612, 54253.543, 54.321)
(184.4593252,) (45.58,56.6412,654.876,765.66543)
I want to convert it to:
name/one name/three1 name/three2 name/three3 name/three4
192.26949 435.54 436.65 87.3 5432
189.4033245 45.51 56.612 54253.543 54.321
184.4593252 45.58 56.6412 654.876 765.66543

If every value in every row and column is a tuple, use concat with the DataFrame constructor and DataFrame.add_prefix:
df = pd.concat([pd.DataFrame(df[c].tolist()).add_prefix(c) for c in df.columns], axis=1)
print(df)
    name/one0  name/three0  name/three1  name/three2  name/three3
0  192.269490       435.54     436.6500       87.300   5432.00000
1  189.403324        45.51      56.6120    54253.543     54.32100
2  184.459325        45.58      56.6412      654.876    765.66543
If the values may be string representations of tuples, parse them with ast.literal_eval first:
import ast
L = [pd.DataFrame([ast.literal_eval(y) for y in df[c]]).add_prefix(c) for c in df.columns]
df = pd.concat(L, axis=1)
print(df)
    name/one0  name/three0  name/three1  name/three2  name/three3
0  192.269490       435.54     436.6500       87.300   5432.00000
1  189.403324        45.51      56.6120    54253.543     54.32100
2  184.459325        45.58      56.6412      654.876    765.66543
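The output above numbers every new column from 0, while the question asks for `name/one` to stay unchanged and for `name/three1` to `name/three4`. A possible variant, sketched under the assumption that all values are real tuples, names the columns explicitly instead of using add_prefix:

```python
import pandas as pd

df = pd.DataFrame({
    "name/one": [(192.26949,), (189.4033245,), (184.4593252,)],
    "name/three": [
        (435.54, 436.65, 87.3, 5432),
        (45.51, 56.612, 54253.543, 54.321),
        (45.58, 56.6412, 654.876, 765.66543),
    ],
})

parts = []
for col in df.columns:
    # Expand each tuple into one column per component.
    expanded = pd.DataFrame(df[col].tolist(), index=df.index)
    if expanded.shape[1] == 1:
        # Single-item tuples keep the original column name.
        expanded.columns = [col]
    else:
        # Multi-item tuples get a 1-based suffix: name/three1, name/three2, ...
        expanded.columns = [f"{col}{i}" for i in range(1, expanded.shape[1] + 1)]
    parts.append(expanded)

out = pd.concat(parts, axis=1)
print(out.columns.tolist())
```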

Separate column data with a comma to two columns for dataframe

The data set I pulled from an API return looks like this:
([['Date', 'Value']],
[[['2019-08-31', 445000.0],
['2019-07-31', 450000.0],
['2019-06-30', 450000.0]]])
I'm trying to create a DataFrame with two columns from the data:
Date & Value
Here's what I've tried:
df = pd.DataFrame(city_data, index =['a', 'b'], columns =['Names'] .
['Names1'])
city_data[['Date','Value']] =
city_data['Date'].str.split(',',expand=True)
city_data
city_data.append({"header": column_value,
"Value": date_value})
city_data = pd.DataFrame()
This code was used to create the dataset. I pulled the lists from the API return:
column_value = data["dataset"]["column_names"]
date_value = data["dataset"]["data"]
city_data = ([column_value], [date_value])
city_data
Instead of creating a dataframe with two columns from the data, in most cases I get the "TypeError: list indices must be integers or slices, not str"

Is this what you are looking for?
d = ([['Date', 'Value']],
[[['2019-08-31', 445000.0],
['2019-07-31', 450000.0],
['2019-06-30', 450000.0]]])
pd.DataFrame(d[1][0], columns=d[0][0])
This returns a DataFrame with the columns Date and Value and one row per date.
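The extra nesting comes from wrapping both lists in another list when building city_data. A sketch that skips that step and passes the two lists straight to the constructor is shown below; the `data` dict is a hypothetical stand-in for the parsed API response, shaped like the lookups in the question.

```python
import pandas as pd

# Hypothetical stand-in for the parsed API response used in the question.
data = {
    "dataset": {
        "column_names": ["Date", "Value"],
        "data": [
            ["2019-08-31", 445000.0],
            ["2019-07-31", 450000.0],
            ["2019-06-30", 450000.0],
        ],
    }
}

column_value = data["dataset"]["column_names"]
date_value = data["dataset"]["data"]

# Pass the rows and the header straight to the constructor, without wrapping.
city_data = pd.DataFrame(date_value, columns=column_value)

# Optional: parse the date strings into datetime64 values.
city_data["Date"] = pd.to_datetime(city_data["Date"])
print(city_data)
```

The TypeError in the question is what Python raises when a plain list or tuple is indexed with a string such as 'Date'; once city_data is a DataFrame, that kind of column access works.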

Python: Merging/joining two dataframes

I'm trying to merge/join two dataframes, each with three keys (Age, Gender and Signed_In). Both dataframes have the same parent and were created by groupby, but have unique value columns.
It seems like the merge/join should be painless given the unique combined keys are shared across both dataframes. Thinking there must be some simple error with my attempt at 'merge' and 'join' but can't for the life of me resolve it.
times = pd.read_csv('nytimes.csv')
# Produces times_mean table consisting of two value columns, avg_impressions and avg_clicks
times_mean = times.groupby(['Age','Gender','Signed_In']).mean()
times_mean.columns = ['avg_impressions', 'avg_clicks']
# Produces times_max table consisting of two value columns, max_impressions and max_clicks
times_max = times.groupby(['Age','Gender','Signed_In']).max()
times_max.columns = ['max_impressions', 'max_clicks']
# Following intended to produce combined table with four value columns
times_join = times_mean.join(times_max, on = ['Age', 'Gender', 'Signed_In'])
times_join2 = pd.merge(times_mean, times_max, on=['Age', 'Gender', 'Signed_In'])

You don't need the on kwarg when joining on equivalently structured MultiIndexes.
Here's an example demonstrating this:
import numpy as np
import pandas
a = np.random.normal(size=10)
b = a + 10
index = pandas.MultiIndex.from_product([['A', 'B'], list('abcde')])
df_a = pandas.DataFrame(a, index=index, columns=['colA'])
df_b = pandas.DataFrame(b, index=index, columns=['colB'])
df_a.join(df_b)
Which gives me:
         colA       colB
A a -1.525376   8.474624
  b  0.778333  10.778333
  c  1.153172  11.153172
  d  0.966560  10.966560
  e  0.089765  10.089765
B a  0.717717  10.717717
  b  0.305545  10.305545
  c  0.123548  10.123548
  d -1.018660   8.981340
  e -0.635103   9.364897
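Applied to the groupby case from the question, the same idea might look like the sketch below. The tiny `times` frame and its `Impressions` and `Clicks` column names are assumptions made for illustration, since the contents of nytimes.csv are not shown.

```python
import pandas as pd

# Hypothetical miniature of nytimes.csv; the value column names are assumed.
times = pd.DataFrame({
    "Age": [30, 30, 45, 45],
    "Gender": [0, 0, 1, 1],
    "Signed_In": [1, 1, 1, 1],
    "Impressions": [3, 5, 4, 8],
    "Clicks": [0, 1, 2, 0],
})

keys = ["Age", "Gender", "Signed_In"]
times_mean = times.groupby(keys).mean().add_prefix("avg_")
times_max = times.groupby(keys).max().add_prefix("max_")

# Both frames share the same MultiIndex, so join aligns on it without `on`.
times_join = times_mean.join(times_max)
print(times_join)
```

After the groupby, Age, Gender and Signed_In live in the index and are no longer columns, which is why the join needs no `on` argument here.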

We Keep Coding
