Let's say I have a large pandas DataFrame:
df = pd.read_csv("temp.csv")
Now I want to get the HTML representation of this DataFrame in memory, not by writing it to a file.
Is there something like the following?
html_object=df.to_html()
Writing the HTML to a file is incredibly slow.
Yes, there is. Just use the code you wrote:
html_object=df.to_html()
Yes, there literally is df.to_html(). By default, this will return a string.
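A minimal sketch (the frame here is made up) showing that without a buffer or path argument, to_html() returns the markup as a plain string:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# With no file argument, to_html() returns the HTML as a string
html_object = df.to_html()
```

Nothing touches the filesystem here; html_object can be passed straight to a template or HTTP response.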
Related
I'm developing an API which should, ideally, export a comma-separated list as a .txt file which should look like
alphanumeric1, alphanumeric2, alphanumeric3
The data to be exported comes from a column of a pandas DataFrame, so I guess I can get it, but all my attempts to get it as a single-line string literal haven't worked. Instead, the text file I receive is
,ColumnHeader
0,alphanumeric1
0,alphanumeric2
0,alphanumeric3
I've tried using string literals with backticks, writing to multiple lines, and appending commas to each value in the list, but it all comes out in the form of a CSV, which won't work for my purposes.
How would y'all achieve this effect?
I am not sure, but perhaps what you need is:
csvList = ','.join(df.ColumnHeader)
where df is, of course, your pandas DataFrame.
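As a sketch (the frame and the output filename are made up), joining with ", " reproduces the single-line format from the question, and the result can be written straight to a .txt file:

```python
import pandas as pd

df = pd.DataFrame({"ColumnHeader": ["alphanumeric1", "alphanumeric2", "alphanumeric3"]})

# Join the column's values into one comma-separated line
csv_list = ", ".join(df["ColumnHeader"])

# Write the single line to a text file (hypothetical path)
with open("export.txt", "w") as f:
    f.write(csv_list)
```

Because the values are joined into one string before writing, no index or header ever reaches the file.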
I want to get the discord.user_id. I am VERY new to Python and just need help getting this data.
I have tried everything, and there is no clear answer online.
Currently, this works to get a data point in the attributes section:
pledge.relationship('patron').attribute('first_name')
You should try this:
import pandas as pd
df = pd.read_json("path_to_your/file.json")
The output will be a DataFrame, i.e. a matrix in which the JSON attributes become the names of the columns. You will have to manipulate it afterwards, which is preferable, since operations on DataFrames are optimized for processing time.
Here is the official documentation, take a look.
Assuming the whole object is called myObject, you can obtain the discord.user_id by calling myObject.json_data.attributes.social_connections.discord.user_id.
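If the data has already been loaded as a plain dict, the nested lookup is just chained indexing. A sketch (the payload below is hypothetical, modeled on the attribute path in the question):

```python
import json

# Hypothetical payload shaped like the nested structure in the question
payload = json.loads(
    '{"attributes": {"social_connections": {"discord": {"user_id": "123456789"}}}}'
)

# Walk down one key at a time to reach the nested value
user_id = payload["attributes"]["social_connections"]["discord"]["user_id"]
```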
The conversion from XML to CSV is done by some code and the specifications I have added.
As a result I get a CSV file, and once I open it I see some weird numbers that look something like this:
1,25151E+21
Is there any way to eliminate this and show the whole numbers? The code itself that parses XML to CSV works fine, so I'm assuming it is an Excel thing.
I don't want to have to do something manually every time I generate a new CSV file.
Additional
The entire code can be found HERE, and I have long numbers only in Quality:
for qu in sn.findall('.//Qualify'):
repeated_values['qualify'] = qu.text
CSV doesn't pass any cell-formatting rules to Excel. Hence, if you open a CSV that has very large numbers in it, the default cell formatting will likely be Scientific. You can try changing the cell formatting to Number; if that shows the entire number like you want, consider using XlsxWriter to apply cell formatting to the document while writing to XLSX instead of CSV.
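A minimal sketch of that XlsxWriter route through pandas (the column name, filename, and width are assumptions, and the xlsxwriter package must be installed):

```python
import pandas as pd

# Hypothetical frame holding one very large number
df = pd.DataFrame({"Qualify": [1251510000000000000000]})

with pd.ExcelWriter("out.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Sheet1", index=False)
    # Plain-integer number format so Excel doesn't fall back to Scientific
    num_fmt = writer.book.add_format({"num_format": "0"})
    writer.sheets["Sheet1"].set_column(0, 0, 25, num_fmt)
```

The format is attached to the column when the workbook is written, so the file opens in Excel already displaying full digits.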
I often end up running a lambda on dataframes with this issue when I bring in csv, fwf, etc, for ETL and back out to XLSX. In my case they are all account numbers, so it's pretty bad when Excel helpfully overrides it to scientific notation.
If you don't mind the long number being a string, you can do this:
import numpy as np

# First force it to be an int column (I import everything as objects for unrelated reasons)
df.thatlongnumber = df.thatlongnumber.astype(np.int64)
# Then convert it to a string
df.thatlongnumber = df.thatlongnumber.apply(lambda x: '{:d}'.format(x))
Let me know if this is useful at all.
Scientific notation is a pain. What I've used before to handle situations like this is to cast the value to a float and then use a format specifier; something like this should work:
a = "1,25151E+21"
print(f"{float(a.replace(',', '.')):.0f}")
>>> 1251510000000000065536
I'm trying to implement a Persian text classifier with Python. I use Excel to read my data and build my data set.
I would be thankful for any suggestions on how to implement this better.
I tried the code below to access the body of the messages that meet my conditions and store them. I took a screenshot of my Excel file to help explain.
For example, I want to store the body of the messages whose "foolish" column (I mean the F column) has a value of 1 (true).
https://ibb.co/DzS1RpY "screenshot"
import pandas as pd
file='1.xlsx'
sorted=pd.read_excel(file,index_col='foolish')
var=sorted[['body']][sorted['foolish']=='1']
print(var.head())
The expected result is the body of rows 2, 4, 6, and 8.
Try assigning it like this:
df_data=df["body"][df["foolish"]==1.0]
Don't use - (which is a Python operator) in variable names; use _ (underscore) instead.
Also note that this will return a Series.
For a DataFrame, use:
df_data = pd.DataFrame(df['body'][df["foolish"]==1.0])
I'm trying to parse a JSON blob with Pandas without parsing the nested JSON structures. Here's an example of what I mean.
import json
import pandas as pd
x = json.loads('{"test":"something", "yes":{"nest":10}}')
df = pd.DataFrame(x)
When I do df.head() I get the following:
           test  yes
nest  something   10
What I really want is ...
        test           yes
1  something  {"nest": 10}
Any ideas on how to do this with Pandas? I have workaround ideas, but I'm parsing GBs of JSON files and do not want to be dependent on a slow for loop to convert and prep the information for Pandas. It would be great to do this efficiently while still utilizing the speed of Pandas.
Note: There's been a correction to this question to fix an error in my reference to JSON objects.
I'm trying to parse a JSON blob with Pandas
No you're not. You're just constructing a DataFrame out of a plain old Python dict. That dict might have been parsed from JSON somewhere else in your code, or it may never have been JSON in the first place. It doesn't matter; either way, you're not using Pandas's JSON parsing. In fact, if you did try to construct a DataFrame directly out of a JSON string, you would get a PandasError.
If you do use Pandas parsing, you can use its options, as documented in pandas.read_json. For example:
>>> j = '{"test": "something", "yes": {"nest": 10}}'
>>> pd.read_json(j, typ='series')
test something
yes {u'nest': 10}
dtype: object
(Of course that's obviously a Series, not a DataFrame. But I'm not sure exactly what you want your DataFrame to be here…)
But if you've already parsed the JSON elsewhere, you obviously can't use Pandas's data parsing on it.
Also:
… and do not want to be dependent on a slow for loop to convert and prep the information for Pandas …
Then use, e.g., a dict comprehension, generator expression, itertools function, or something else that can do the looping in C instead of in Python.
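For example, a sketch of that idea: re-serialize any nested dicts back to JSON strings with a dict comprehension before handing the row to pandas, which yields the "stringified nested object" column from the question:

```python
import json
import pandas as pd

x = json.loads('{"test": "something", "yes": {"nest": 10}}')

# Dump nested containers back to JSON strings; leave scalar values alone
flat = {k: json.dumps(v) if isinstance(v, dict) else v for k, v in x.items()}

# Wrap in a list so the dict becomes one row rather than per-key columns of a mapping
df = pd.DataFrame([flat])
```

The comprehension runs per key in C-level dict machinery rather than a hand-written row loop, though as noted above, profiling should confirm where the time actually goes.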
However, I doubt that the speed of looping over the JSON strings is actually a real performance issue here, compared to the cost of parsing the JSON, building the Pandas structures, etc. Figure out what's actually taking the time by profiling, then optimize that, instead of just picking some random part of your code and hoping it makes a difference.