I wrote some code to open a CSV file with pandas:
from pandas import read_csv
print(read_csv("weather_data.csv"))
But when I try to run it in PyCharm, a completely different script that I wrote weeks ago gets executed.
I checked that I am executing the right file, and even deleted the project multiple times and recreated it in different locations.
Another thing that confuses me: when I edit the code to
print("hello")
from random import randint
from pandas import read_csv
print("hello")
print(randint(1,10))
print(read_csv("weather_data (1).csv"))
the output is:
hello
Pls enter a number: (this is the prompt from the game I made weeks ago)
That old code only seems to get executed when I import packages, because the print call before the imports works fine.
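One plausible cause (an assumption, not confirmed by the question) is that a file in the project folder shares a name with an imported module, e.g. a local random.py or pandas.py, which Python then imports and executes instead of the real library. A quick way to check where an import actually resolves:

```python
# Diagnostic sketch: print where Python resolves each import from.
# If a path points into your project folder rather than the standard
# library or site-packages, a local file is shadowing the module.
import importlib.util

for name in ("random", "pandas"):
    spec = importlib.util.find_spec(name)
    if spec is None:
        print(f"{name}: not installed")
    else:
        print(f"{name} resolves to: {spec.origin}")
```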
My question is: does anyone recognise this pandas/Jupyter problem? I did what I always do, but suddenly things are not going the way they should:
why does the column type change back to the original type, and
why, after writing to CSV from the notebook and reloading, is the type changed back to the original type?
I must be overheated or something; I can't figure it out. Either Jupyter has a bug or I have forgotten some basic coding. I tried this in the base environment as well as in a new Conda virtual environment. Thanks for any help or suggestions!
I have a csv, I import this with:
df=pd.read_csv('df_test.csv')
df.info() (showing only the lines relevant to my question)
I make a copy of my df:
df2=df.copy()
df2.info() shows the same output:
I want to make Opnameduur, OK TIJD and Snijtijd integers. I do this one by one:
df2.astype({'Opnameduur': 'int32'}).dtypes
This works:
Then I change OK TIJD:
df2.astype({'OK TIJD': 'int32'}).dtypes
This works, BUT... Opnameduur has changed back to float.
The same happens when I change the types in one batch.
Also: after writing df2 to CSV (with, for example, Opnameduur changed to int32) and reloading the CSV in my notebook, .info() shows that all three columns are back to float again.
df2.info():
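For reference, the behaviour above can be reproduced in a few lines (column names taken from the question, data made up): DataFrame.astype returns a new DataFrame rather than modifying the original in place, so its result has to be assigned back for a change to persist.

```python
# Minimal reproduction (made-up data): astype returns a new DataFrame.
import pandas as pd

df2 = pd.DataFrame({"Opnameduur": [1.0, 2.0], "OK TIJD": [3.0, 4.0]})

df2.astype({"Opnameduur": "int32"}).dtypes  # shows int32, but df2 is untouched
print(df2.dtypes["Opnameduur"])             # still float64

# assigning the result back makes the change stick
df2 = df2.astype({"Opnameduur": "int32", "OK TIJD": "int32"})
print(df2.dtypes["Opnameduur"])             # now int32
```

The CSV round trip is a separate effect: the CSV format stores no dtype information, so read_csv re-infers types from the text on reload.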
I have a CSV file on my computer that updates automatically every minute, e.g. after 08:01 it updates, after 08:02 it updates, and so on.
Importing this file into Python is easy:
import pandas as pd
myfile=pd.read_csv(r'C:\Users\HP\Desktop\levels.csv')
I want to update/re-import this file every minute based on my PC clock. I want to use threading, since I want to run other cells while the import function is running at all times.
So basically the code might look like this (other suggestions are welcome):
import pandas as pd
import threading
import datetime
import time
# code to import the csv file based on the pc clock, automatically every minute
I want this to run in a way that I can still run other functions in other cells. (I tried using schedule, but I can't run other functions after that, since the cell keeps showing the asterisk symbol (*).)
Meaning that if I evaluate the variable myfile in another cell:
myfile
it shows a DataFrame with updated values each time.
I installed pandas through pip, but when I import it, the script stops producing any output right after the import statement.
Here's a sample of my code
import xlrd, xlwt
print("1")
import pandas as pd
print("2")
from math import trunc
1 is printed, but 2 isn't. After 1 is printed, the script just hangs for a few seconds and then terminates. This happens regardless of the code below the import statement. I also seem to get the same behaviour with the openpyxl module. Does anyone know a fix for this?
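One possible cause (an assumption; the question doesn't confirm it) is a file in the script's folder shadowing one of the imported modules, e.g. a local pandas.py, which gets imported instead of the installed package. A small check for that:

```python
# Sketch: look for files in the working directory whose names collide
# with the imported modules; such a file would be imported instead of
# the installed package and can hang or silently misbehave.
import os

suspects = {"xlrd.py", "xlwt.py", "pandas.py", "openpyxl.py", "math.py"}
found = sorted(f for f in os.listdir(".") if f in suspects)
print("possible shadowing files:", found if found else "none")
```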
In PyCharm, the console history lists entries from newest (top) to oldest (bottom). That is fine in a way, but it's horrible for copy-pasting several lines of code from the history.
What happens is that you get your code flow upside down when copying from the history. Basically, you have to copy and paste one line at a time, at the cost of opening the history and scrolling to the desired line every single time.
It doesn't matter if you Ctrl-select the lines in the order you want them re-entered: the console history pop-up sorts them in the order shown (i.e., newest at the top, oldest at the bottom).
Example:
Say you ran the following two lines on console
import pandas as pd
df = pd.read_csv('path_to_file')
When you look it up on history, this is what you'll see:
1 df = pd.read_csv('path_to_file')
2 import pandas as pd
So, if you select those two lines to paste them into the console or into your script, they'll be in the wrong order, breaking the code flow.
I have searched for a way to either:
(1) invert how console history is displayed (i.e., oldest (top) to newest (bottom)).
(2) preserve the selection order (i.e., ignore the position in the history and order by Ctrl+click, so that in the example above I could select line #2 first and line #1 second, and that order would be preserved for pasting).
Applications:
a) Rerun previously entered code slices in console;
b) copy from console history to script file.
Another option: if you have IPython installed, your Python console will use IPython by default, and you can use IPython's %history magic command to print out your history for copying.
c.f. http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-history
just write a short snippet to reverse the order:
# paste the history lines into a triple-quoted string (newest first)
code = """
df = pd.read_csv('path_to_file')
import pandas as pd
"""
split_by_line = code.split("\n")
split_by_line.reverse()
print("\n".join(split_by_line))
note: I have never worked with PyCharm, so this may not work properly for multi-line blocks (if, for, etc.)
I have a Python script, let's call it MergeData.py, where I merge two data files. Since I have a lot of pairs of data files that have to be merged, I thought it would be good for readability to put the code in MergeData.py into a function, say merge_data(), and call this function in a loop over all my pairs of data files from a different Python script.
Two questions:
Is it wise, in terms of speed, to call the function from a different file instead of running the code directly in the loop? (I have thousands of pairs that have to be merged.)
I thought that to use the function in MergeData.py I have to put from MergeData import merge_data at the top of my script. Within the function merge_data I make use of pandas, which I import in the main file with import pandas as pd. When calling the function I get the error NameError: global name 'pd' is not defined. I have tried all possible places to import the pandas module, even within the function, but the error keeps popping up. What am I doing wrong?
In MergeData.py I have
def merge_data(myFile1, myFile2):
    df1 = pd.read_csv(myFile1)
    df2 = pd.read_csv(myFile2)
    # ... my code
and in the other file I have
import pandas as pd
from MergeData import merge_data
# then some code to get my file names followed by
FileList = zip(FileList1, FileList2)
for myFile1, myFile2 in FileList:
    # run the merging algorithm
    dataEq = merge_data(myFile1, myFile2)
I am aware of What is the best way to call a Python script from another Python script?, but cannot really see whether it relates to my case.
You need to put the line
import pandas as pd
into the module in which the name pd is actually used, i.e. at the top of MergeData.py. Each module has its own namespace, so the import in your "other file" does not make pd visible inside MergeData.py.
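A sketch of what MergeData.py might then look like (the merge itself is a placeholder; the question elides the real merge logic):

```python
# MergeData.py (sketch): the module imports pandas itself, since imports
# in the calling script do not carry over into this module's namespace.
import pandas as pd

def merge_data(myFile1, myFile2):
    df1 = pd.read_csv(myFile1)
    df2 = pd.read_csv(myFile2)
    # placeholder: merge the two frames row-by-row on their index;
    # the real merge keys/logic come from your own code
    return pd.merge(df1, df2, left_index=True, right_index=True)
```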