Integrating a Python monorepo application into Yocto

A little context before the actual questions. As the project I have been working on grew, the number of repositories used for the application we're developing grew as well, to the point where it has become unbearable to keep making modifications to the codebase: when something changes in the core library in one repository, we need to make adjustments to the other Python middleware projects that use this library. While this doesn't sound that terrifying, managing it in Yocto has become a burden: on every version bump of the Python library we need to bump all of the dependent projects as well. After some thinking, we decided to try a monorepo approach to deal with the complexity. I'm omitting the reasoning behind choosing this approach, but I can delve into it if you think it is wrong.
1) Phase 1 was easy. Just bring every component of our middleware into one repository. Right now we have a big repository with more than 20 git submodules. The submodules will be gone once we finish the transition, but since the project doesn't stop while we're doing the transition, submodules were chosen to track the changes and keep the monorepo up to date. Every submodule is Python code with a setup.py/Pipfile etc. that does the bootstrapping.
2) Phase 2 is to integrate everything in Yocto. It has turned out to be more difficult than I had anticipated.
Right now we have this:
Application monorepo
Pipfile
python-library1/setup.py
python-library1/Makefile
python-library1/python-library1/<code>
python-library2/setup.py
python-library2/Makefile
python-library2/python-library2/<code>
...
python-libraryN/setup.py
python-libraryN/Makefile
python-libraryN/python-libraryN/<code>
Naturally, right now we have python-library1_<some_version>.bb, python-library2_<some_other_version>.bb, and so on in Yocto. Now we want to get rid of this version hell and stick to monorepo versions, so that we have just a single monorepo_1.0.bb that is updated whenever anything changes in any part.
Right now I have this monorepo_1.0.bb (or rather, would like to have, since this approach doesn't work):
SRCREV=<hash>
require monorepo.inc
monorepo.inc is as follows:
SRC_URI = "gitsm://<the location of the monorepo>;branch=<...>"
S = "${WORKDIR}/git"
require python-library1.inc
require python-library2.inc
...
require python-libraryN.inc
python-libraryN.inc:
S = "${WORKDIR}/git/python-libraryN"
inherit setuptools
This approach doesn't work for multiple reasons. Basically, every new require overrides ${S} to point at another git/python-libraryN directory, so the only package that actually gets built is the last one.
Even though I know this approach won't end up being the final solution, I just wanted to show what I'm striving for: very easy updates. The only thing I should need to do is update SRCREV (or ${PV} in the future, once this is deployed), and the whole stack is brought up to date.
So, well, the question is: how should the Yocto recipes be structured to handle a Python monorepo?
P.S.
1) The monorepo structure is not set in stone so if you have any suggestions, I'm more than open to any criticism
2) The .inc file structure might not be suitable for another reason. This stack is deployed across a dozen different devices, and some of those python-libraryN.bb recipes have .bbappends in other layers that are custom to specific devices. I mean that some devices might require different system components to be installed, or additional configs, files, or modifications.
Any suggestion on how to deal with the complexity of a big Python application in Yocto will be welcome. Thanks in advance.

I'm currently looking into the same issue as you.
I found that the recipe for the boost library also builds several packages from a "monorepo".
This recipe can be found in the poky layer:
http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-support/boost
Look especially at the boost.inc file, which builds the actual package list, while the other files define the SRC_URI and various patches.
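Building on that pattern, here is a rough, untested sketch of what a single monorepo.inc could look like: keep ${S} at the repository root, loop over the sub-projects in do_compile/do_install instead of relying on the class defaults, and split the output into one binary package per library (boost-style) so the device-specific .bbappends still have individual packages to hook into. The sub-project list, package names and file globs are placeholders, ${PYTHON} and ${PYTHON_SITEPACKAGES_DIR} are assumed to come from the python/setuptools classes your current recipes already inherit, and exact names (setuptools vs setuptools3, FILES_ vs FILES:) depend on your Yocto release:
SRC_URI = "gitsm://<the location of the monorepo>;branch=<...>"
S = "${WORKDIR}/git"
inherit setuptools
# Sub-projects to build; extend this list as libraries are added.
PYTHON_SUBPROJECTS = "python-library1 python-library2"
do_compile() {
    for sub in ${PYTHON_SUBPROJECTS}; do
        cd ${S}/${sub}
        ${PYTHON} setup.py build
    done
}
do_install() {
    for sub in ${PYTHON_SUBPROJECTS}; do
        cd ${S}/${sub}
        ${PYTHON} setup.py install --root=${D} --prefix=${prefix} \
            --install-lib=${PYTHON_SITEPACKAGES_DIR}
    done
}
# One binary package per library, boost-style, so per-device .bbappends
# can still add RDEPENDS, configs, etc. for individual libraries.
# Adjust the globs to whatever the setup.py files actually install.
PACKAGES =+ "${PN}-library1 ${PN}-library2"
FILES_${PN}-library1 = "${PYTHON_SITEPACKAGES_DIR}/python_library1*"
FILES_${PN}-library2 = "${PYTHON_SITEPACKAGES_DIR}/python_library2*"
With this kind of layout, a version bump is still just an SRCREV (or PV) update, while the per-device .bbappends can target ${PN}-libraryN instead of the old standalone python-libraryN recipes.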
Hope that helps!

Related

How do I track and reuse small projects I've written across many large projects I'm writing?

I have written a single, monolithic project (let's call it Monolith.py) over the course of about 3 years in Python. During that time I have created a lot of large and small utilities that would be useful in other projects.
Here are a couple simple examples:
Color.py: A small script that easily allows me to essentially colorize text. I use this a lot for print().
OExplorer.py: A larger script that is a CLI object explorer that I can use to casually browse classes and objects interactively.
Stuff like that. There's probably 20 of them, mostly modules. They are all under constant development.
My question is: what is the best way to use these in another project while keeping everything up to date?
I am using Visual Studio Code and I use it to handle all my git stuff. I'm guessing there's a way to, like, nest git folders? I'm worried about screwing up my main repo.
I also do not want to separate these projects out into their own VS Code workspace. It is better if they are left where they are. I also do not want to pull in all the code from the monolith for another project; that would be silly.
I think there is a simple git solution here. I would appreciate it if someone could give me some direction and hand-hold a bit; git is clearly not my strong suit.

Central control of importing different versions of script as I save up

As I improve my code, for big structural changes in flow and other features I save it as a new file with a new number...
so when I use the code it looks like this:
import custom3 as c
function = c.do_thing()
as I save up to custom4, I change it to
import custom4 as c
function = c.do_thing()
very simple update.
My problem is that I have many scripts where I'm using import custom# as c, so when I update the version number I have to go back and change the number everywhere.
Is there a way to centrally control this? Basically dynamically importing a library using another script? I guess I can use something like modules = map(__import__, moduleNames) and keep a spreadsheet of latest version? And write a script to access that file first every time?
Has anybody implemented anything else more elegant?
The way to do this that the pros use is not to create different modules for different versions, but to use a version control system to manage and track changes to the same module.
A good version control system will do the following:
allow you to keep and view a history of changes to your module
allow you to mark your versions with a meaningful annotation e.g. "develop", "release"
allow you to recover from mistakes and revert back to another earlier version without having to rewrite code
allow you to share your work with other developers.
There are many version control systems available; some are proprietary and licensed, but others are available for free. Git is probably the most popular open source system at the moment, and can scale from a lone developer to a large team. Plus there is already a whole ecosystem of code sharing available with GitHub.
As you learn programming, take the time to learn and use version control. You won't regret it.
You can use importlib.
import importlib

# Build the module name from a version string and import it dynamically.
version = "3"
c = importlib.import_module("custom" + version)
function = c.do_thing()
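If you do keep the numbered-module scheme for now, you can also avoid hard-coding the version string by reading it from one central place. A minimal sketch, assuming modules named custom3.py, custom4.py, ... as in your question; the file name active_version.txt is made up:
import importlib
from pathlib import Path

def load_custom():
    # One shared text file containing just the active version number, e.g. "4".
    version = Path("active_version.txt").read_text().strip()
    return importlib.import_module("custom" + version)

c = load_custom()
function = c.do_thing()
Bumping every script to a new version then means editing that single file.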
But yeah, as suggested in the comments, use a version control system like git. The learning curve is a bit steep, but it will make your life a lot easier.

rule of thumb to group/split your own functions/classes into modules

Context
I write my own library for data analysis purposes. It includes classes to import data from the server, procedures to clean and analyze data and display results, and functions to compare results.
Concerns
I put all of this in a single module and import it when I start a new project.
I put any newly developed classes/functions in the same file.
My concern is that the module is becoming longer and harder to browse and explain.
Questions
I started Python six months ago and want to know common practices:
How do you group your functions/classes and split them into separate files?
By purpose? By project? By class/function?
Or do you not do it at all?
In general, how many lines of code do you keep in a single module?
What is the way to track the dependencies among your own libraries?
Feel free to suggest any thoughts.
I believe the best way to answer this question is to look at what the leaders in this field are doing. There is a very healthy ecosystem of modules available on PyPI whose authors have wrestled with this question. Take a look at some of the modules you use frequently, which are therefore already installed on your system. Or better yet, many of those modules have their development versions hosted on GitHub (the PyPI page usually has a pointer). Go there and look around.

PyCharm: save refactoring information for application on dependent code

If I do a refactoring in a library, PyCharm handles all dependent applications that are known to the currently running PyCharm instance.
But code which is not known to the current PyCharm instance does not get updated.
Is there a way to store the refactoring information in version control, so that dependent applications can be updated when they get the update to the new version of the library?
Use Case:
class Server:
    pass
gets renamed to
class ServerConnection:
    pass
If a teammate updates the code of my library, his usage of Server needs to be changed to ServerConnection.
It would be very nice if PyCharm (or another tool) could help my teammate update his code automatically.
As far as I can tell, this is not possible with vanilla PyCharm, with a plugin, or with a third-party tool.
It is not mentioned in the official documentation.
There is no such plugin in the JetBrains Plugin Repositories.
If PyCharm writes refactoring information to its internal logs, you could build this yourself (but would you really want to?).
I am also not aware of any Python-specific refactoring tool that does that. You can check for yourself: there is another SO question about the most popular refactoring tools.
But ...
I am sure there are reasons why your situation is the way it is - there always are good reasons (and most of the time the terms 'historic' and 'grown' turn up in explanations of these reasons) - but I still feel obligated to point out what qarma already mentioned in his comment: the fact that you want to do something like replaying a refactoring on a different code base points towards a problem that should be solved in a different way.
Alternative 1: introduce an API
If you have different pieces of software that depend on each other at such a deep level, it might be a good idea to define an API that decouples the code bases from each other's internals. With an API it is clear which parts have to be stable. If changes have to be made at the API level, they must be communicated and coordinated with the teams involved.
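For example, a thin "public API" module that only re-exports the names you promise to keep stable lets the internals be renamed freely. A toy, self-contained illustration (all names are made up):
# Internal implementation; free to be renamed or restructured at any time.
class ServerConnection:
    def ping(self):
        return True

# Public, stable surface: client code imports only these names, so a rename
# like Server -> ServerConnection only has to be reflected in this one alias.
Server = ServerConnection
__all__ = ["Server"]

if __name__ == "__main__":
    conn = Server()  # callers keep using the stable name
    print(conn.ping())
In a real code base the alias would live in a dedicated module (e.g. a hypothetical mylib/api.py) that the other teams import from.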
Alternative 2: Make it what it actually is: one code base
If Alternative 1 is not possible for whatever reason, I would conclude that you actually have one system distributed over different code bases, and those should then be merged into one code base. Different teams can still work on the same code base (hopefully using a DVCS), but global refactorings can be done with tooling help and they reach all parts of the system.
Alternative 3: Make these refactorings in PyCharm over all involved code bases
Even if you can't merge them into one code base, you can combine them easily in PyCharm by loading different projects into the same window. I do this without problems with two git projects that have to be in different repositories but still share certain aspects. PyCharm handles commits to these repositories transparently: if you make changes in several repositories and commit them, you write one commit message and the commits are made to all repositories.

Git: Master-thesis subprojects as submodules or stand-alone repositories

I just started using git to get the code I write for my Master's thesis more organized. I have divided the tasks into 4 sub-folders, each one containing data and programs that work with that data. The 4 sub-projects do not necessarily need to be connected; none of the programs use functions from the other sub-projects. However, the output files produced by the programs in one sub-folder are used by programs in another sub-folder.
In addition some programs are written in Bash and some in Python.
I use git in combination with Bitbucket. I am really new to the whole concept, so I wonder if I should create one "Master-thesis" repository or rather one repository for each of the (so far) 4 sub-projects. Thank you for your help!
Well, as devnull says, answers would be highly opinion based, but given that I disagree that that's a bad thing, I'll go ahead and answer if I can type before someone closes the question. :)
I'm always inclined to treat git repositories as separate units of work or projects. If I'm likely to work on various parts of something as a single project or toward a common goal (e.g., Master's thesis), my tendency would be to treat it as a single repository.
And by the way, since the .git repository will be in the root of that single repository, if you need to spin off a piece of your work later and track it separately, you can always create a new repository if needed at that point. Meantime it seems "keep it simple" would mean one repo.
I recommend a single master repository for this problem. You mentioned that the output files of certain programs are used as input to the others. These programs may not have run-time dependencies on each other, but they do have dependencies. It sounds like they will not work without each other being present to create the data. Especially if file location (e.g. relative path) is important, then a single repository will help you keep them better organized.
