Python/Pandas adding quotes to string - python

I'm using Python/Pandas to edit a csv file created by another program.
One of the columns contains values enclosed in double quotes:
"RGB(0,255,255)"
for example.
This is just how the program outputs it, and I need to preserve these quotes so that the file can be read back into the program once I have edited it. Currently, when I export the edited data frame to a .csv, the quotes around the values disappear, so the values look like this:
RGB(0,255,255)
I tried adding quotes manually to the values in the column before exporting, but now the .csv file has triple quotes, so it looks like this:
"""RGB(0,255,255)"""
I'm not doing anything with this particular column; I just need it to retain the format it had before being read into my Python script. I'm assuming there are some arguments for my read_csv or to_csv calls that control this, but I'm not sure where to start. Any help gratefully appreciated!

Save the DataFrame as a pickle instead.
df.to_pickle('test.pkl')
# To load the dataframe again
df = pd.read_pickle('test.pkl')
This will preserve the structures!
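If the output has to stay a CSV so that the other program can read it, the quoting argument of to_csv may be enough. The following is a minimal sketch, not a confirmed fix for the asker's file: the column names, the sample values and the semicolon separator are assumptions. It also shows where the triple quotes come from: quotes added by hand become part of the value, so the writer escapes each one by doubling it and then quotes the whole field.

```python
import csv
import io

import pandas as pd

# Hypothetical data standing in for the file written by the other program.
df = pd.DataFrame({"id": [1, 2], "colour": ["RGB(0,255,255)", "RGB(255,0,0)"]})

# QUOTE_NONNUMERIC wraps every string field (header included) in double
# quotes and leaves numeric fields bare.
buf = io.StringIO()
df.to_csv(buf, index=False, sep=";", quoting=csv.QUOTE_NONNUMERIC)
quoted_lines = buf.getvalue().splitlines()

# Quotes added by hand are data: with the default quoting each one is
# doubled and the whole field is quoted again.
manual = pd.DataFrame({"colour": ['"RGB(0,255,255)"']})
buf = io.StringIO()
manual.to_csv(buf, index=False, sep=";")
manual_lines = buf.getvalue().splitlines()

print(quoted_lines)
print(manual_lines)
```

csv.QUOTE_ALL is the variant to try if the other program expects numbers to be quoted as well.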

Related

Trying to write/read CSV file with None objects for empty cells [Python]

I'm trying to read CSV data into a pandas DataFrame so that empty cells are recognized as None values.
The delimiter is ',' and two delimiters appear in a row wherever I need a None value. For example, the row:
12345,'abc','abc',,,12,'abc'
should be converted to the tuple:
(12345,'abc','abc',None,None,12,'abc',)
I need this in order to insert the data into MySQL later; I'm using the cursor.execute() function with the query and the data.
I have tried to load the CSV file into a DataFrame and replace the NaN values, but this is not supported:
chunk = chunk.replace(np.nan, None, regex=True)
Any suggestions?
Sorry, I did not fully understand the question, but if it is about CSV, why not use placeholder values of your choice, or even empty strings, and convert them later in the program when you read the data or write it out?
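One way to get None in place of the empty cells is to cast the frame to object dtype before replacing the missing values. This is a minimal sketch, assuming the file has no header row and uses single quotes as the quote character, as in the sample row from the question:

```python
import io

import pandas as pd

# The sample row from the question; a real file would be passed by path.
raw = "12345,'abc','abc',,,12,'abc'\n"
df = pd.read_csv(io.StringIO(raw), header=None, quotechar="'")

# Numeric columns cannot hold None, so cast to object first, then keep
# every non-missing value and put None everywhere else.
clean = df.astype(object).where(df.notna(), None)

# Plain tuples, one per row, ready to pass to cursor.execute()/executemany().
rows = list(clean.itertuples(index=False, name=None))
print(rows)
```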

Pandas Read csv just read a line of a row

I have a classic pandas DataFrame made of ID and Text columns. I would like to get just one column, so I use the typical df["columnname"]. But at that point it becomes a pandas Series. Is there a way to make a new DataFrame with just that single column?
I'm asking because if I cast the pandas Series to a string (columnname = columnname.astype("string")) and save it to a text file, it only saves the first sentence of each line and not the entire textual content, as I would like.
If there is any other solution, I'm open to learning :)
Try this: pd.DataFrame(dfname["columnname"])
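Indexing with a list of column names is an equivalent way to keep the result a DataFrame. A small sketch with made-up data (the ID and Text values are placeholders):

```python
import pandas as pd

# Made-up frame with the two columns described in the question.
df = pd.DataFrame(
    {
        "ID": [1, 2],
        "Text": ["First sentence. Second sentence.", "Another line of text."],
    }
)

series = df["Text"]    # single brackets return a Series
frame = df[["Text"]]   # a list of names returns a one-column DataFrame

print(type(series).__name__, type(frame).__name__)
```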

How do I stop Excel from converting numbers to date?

I save my DataFrame as a CSV and try to open it in Excel. The problem is that Excel converts some of my float data to date format. I use Excel 2016.
This is what my DataFrame looks like in Excel.
Does anyone have an idea how to stop this ?
Select the affected column, press Ctrl + 1 and choose the correct format. Because you are saving the file as CSV, you have to repeat this every time you open it: CSV files don't store formatting information, and by default Excel reads every cell with the General format. You can find more details here
If you use Excel to open a CSV file it will attempt to interpret each cell. Something that resembles a date will be formatted as a date. Excel has the same behaviour if you type or paste something that resembles a date into a cell formatted as General.
However, if you paste the same data into a cell that has already been formatted other than General it will no longer be re-interpreted.
Format a blank Excel sheet as you expect the data to appear. Open the CSV file in a text editor such as Notepad. Copy the data then paste it into the Excel sheet.
If you aren't sure how the data should appear, for example because you aren't sure about the number of columns, you can format all of the cells as Text. That will suppress interpretation but you can change the formatting afterwards.
Incidentally, I discovered a bug in Excel that relates to this. When you add a new row to the bottom of a table it inherits the formatting of the row above, however Excel does this in the wrong order. To see this, format a table column as Text. In the row below the last row of the table, formatted General, type '1/1/2022'. Excel misinterprets this as 44562. That is because it interpreted 1/1/2022 as a date then changed the formatting to Text to match the row above.
Consequently, when applying the initial formatting you should select at least as many rows as in your CSV file. The easiest way to achieve this is simply to format entire columns.
In your particular case you probably want to pre-format certain columns as Number.

Pyspark: how to read a .csv file?

I am trying to read a .csv file that has a strange format.
This is what I am doing
df = spark.read.format('csv').option("header", "true").option("delimiter", ',').load("muyFile.csv")
df.show(5)
I do not understand why the lonlat entry of the third id is transposed. It seems that the file has two different delimiters. Your help would be much appreciated!
Your tag field probably contains a comma as part of its value, and that comma is being treated as a delimiter.
Enclose the field values in double quotes or another quote character (remember to set .option('quote', ...) to that character) and read the data again. It should work.
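The column shift can be reproduced without Spark using Python's csv module. This is only an illustration: the original file is not shown, so the column names and values below are made up.

```python
import csv
import io

# Made-up rows: the tag value itself contains a comma.
unquoted = "id,tag,lonlat\n3,bar,pub,12.5 41.9\n"
quoted = 'id,tag,lonlat\n3,"bar,pub",12.5 41.9\n'

rows_unquoted = list(csv.reader(io.StringIO(unquoted)))
rows_quoted = list(csv.reader(io.StringIO(quoted)))

# Without quotes the extra comma creates a fourth field and shifts lonlat.
print(rows_unquoted[1])
# With quotes the comma stays inside the tag value.
print(rows_quoted[1])
```

Spark's CSV reader treats the double quote as its quote character by default, so a file quoted this way should load without an extra option; .option("quote", ...) is only needed for a different character.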

Pandas, remove outer quote marks from specific columns on export?

I have a specific problem: we are moving from an old system to a new one. The old database was adapted to the new one with pandas. However, I am facing a problem.
When the file is opened as SQL or CSV, the value has outer quotes:
"UUID_TO_BIN('5e6f7922-8ae9-11ea-a3bd-888888888788', true)"
I need to make sure it has no outer quotes, like this:
UUID_TO_BIN('5e6f7922-8ae9-11ea-a3bd-888888888788', true)
What would be a pandas solution to do this for specific columns when exporting to SQL or CSV? Right now the value is a string and is written out with the quotes.
If your problem is that the old system produces files like .csv with the quotes, you might just want to edit the .csv file itself as described here
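If your problem is that pandas saves it as a string with double quotes, you can either run the same thing on the CSV output of pandas, or you could pass the .to_csv() function the argument
quoting=csv.QUOTE_NONE
(quotechar has to be a single character, so it cannot be set to an empty string; with QUOTE_NONE, fields that contain the separator also need an escapechar), for which you can find more info on this page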
Try to read your data in Pandas using:
df = pd.read_csv(filename, sep=',').replace('"','', regex=True)
This reading line should remove the " " from your data.
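On the export side, quoting can be switched off with the csv module's QUOTE_NONE constant. A minimal sketch, assuming a pipe separator is acceptable to the target system: with QUOTE_NONE the writer raises an error if a value contains the separator and no escapechar is set, which is why the comma is avoided here. The column names are placeholders.

```python
import csv
import io

import pandas as pd

# Placeholder frame holding the value from the question.
df = pd.DataFrame(
    {
        "id": [1],
        "uuid": ["UUID_TO_BIN('5e6f7922-8ae9-11ea-a3bd-888888888788', true)"],
    }
)

# QUOTE_NONE writes every field as-is; the pipe separator does not occur
# in the data, so no escaping is needed.
buf = io.StringIO()
df.to_csv(buf, index=False, sep="|", quoting=csv.QUOTE_NONE)
lines = buf.getvalue().splitlines()
print(lines[1])
```

This setting applies to the whole file, not to specific columns.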
