---
GLEP: 47
Title: Creating 'safe' environment variables
Author: Diego Pettenò,
        Fabian Groffen
Type: Standards Track
Status: Rejected
Version: 1
Created: 2005-10-14
Last-Modified: 2014-01-23
Post-History: 2006-02-09
Content-Type: text/x-rst
---


Credits
=======

The text of this GLEP is the result of discussion with, and input from,
the following persons, in no particular order: Mike Frysinger, Diego
Pettenò, Fabian Groffen and Finn Thain.


Abstract
========

For ebuilds and eclasses to be able to make host-specific decisions, a
number of environment variables are needed on which such decisions can
be based.  This GLEP introduces measures to make these decisions 'safe'
by making sure that the variables they are based on are 'safe'.  There
is a small overlap with GLEP 22 [1]_, which this GLEP handles by keeping
2-tuple keywords instead of adopting 4-tuple keywords.  Additionally,
the ``ELIBC``, ``KERNEL`` and ``ARCH`` variables are filled in
automatically from ``CHOST`` and the 2-tuple keyword, instead of solely
from the 4-tuple keyword as proposed in GLEP 22.
   
The fate of the ``USERLAND`` variable is outside the scope of this
GLEP.  Depending on whether it remains in the tree, it may be decided to
set this variable the same way we propose to set ``ELIBC``, ``KERNEL``
and ``ARCH``, or alternatively by other means, e.g. via the profiles.


Motivation
==========

The Gentoo/Alt project is emerging and getting ready to serve a plethora
of 'alternative' configurations such as FreeBSD, NetBSD, DragonflyBSD,
GNU/kFreeBSD, Mac OS X, (Open)Darwin, (Open)Solaris and so on.  As such,
the project needs a better grip on the actual host being built on.  This
information on the host environment is necessary to make proper
(automated) decisions on settings that are highly dependent on the build
environment, such as the platform or the C-library implementation.


Rationale
=========

Gentoo's unique Portage system allows easy installation of applications
from source packages.  Compiling sources is sensitive to many
environment settings and to the availability of certain tools.  Only
recently the Gentoo for FreeBSD project started, as the second Gentoo
project that operates on a foreign host operating system using foreign
(non-GNU) C-libraries and userland utilities.  Such projects suffer from
the implicit assumption currently made within Gentoo Portage's ebuilds
that there is a single type of operating system, C-library and set of
system utilities.  In order to enable ebuilds -- and also eclasses -- to
be aware of these environmental differences, information about them
should be supplied.  Since decisions based on this information can be
vital, it is of high importance that this information can be trusted and
that the values can be considered 'safe' and correct.


Backwards Compatibility
=======================

The keywording scheme proposed in this GLEP is fully compatible with the
current situation of the Portage tree, in contrast to GLEP 22.  The
variables provided by GLEP 22 can't be extracted from the new keyword,
but since GLEP 22-style keywords aren't in the tree at the moment, that
is not a problem.  The same information can be extracted from the
``CHOST`` variable, if necessary.  No modifications to ebuilds will have
to be made.


Specification
=============

Unlike GLEP 22, this GLEP does not change the currently used keyword
scheme.  Instead of proposing a 4-tuple [2]_ keyword, a 2-tuple keyword
is chosen for archs that require one.  Archs for which a 1-tuple keyword
suffices can keep that keyword.  Since this changes nothing about the
current situation in the tree, it is considered to be a big advantage
over the 4-tuple keyword from GLEP 22.  This GLEP is an official
specification of the syntax of the keyword.

Keywords will consist of two parts separated by a hyphen ('-').  The
part up to the first hyphen from the left of the keyword 2-tuple is the
architecture, such as ppc64, sparc and x86.  Allowed characters for the
architecture name are in ``a-z0-9``.  The remaining part, to the right
of the first hyphen from the left, indicates the operating system or
distribution, such as linux, macos, darwin, obsd, et cetera.  If the
right-hand part is omitted, the operating system/distribution type is
implied to be Gentoo GNU/Linux.  In that case the hyphen is also
omitted, and the keyword consists solely of the architecture.  The
operating system or distribution name can consist of characters in
``a-zA-Z0-9_+:-``.  Please note that the hyphen is an allowed character,
and therefore the separation of the two fields in the keyword can only
be determined by scanning for the first hyphen character from the start
of the keyword string.  Examples of keywords following this
specification are ppc-darwin and x86.  This is fully compatible with the
current use of keywords in the tree.
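
The splitting rule above can be sketched in a few lines of Python.  This
is only an illustration and not part of the specification: the function
name ``parse_keyword`` is hypothetical, and returning ``None`` for an
omitted operating system part (meaning Gentoo GNU/Linux) is an
assumption made for this sketch.

```python
import re

# Character sets taken from the specification text above.
ARCH_RE = re.compile(r"[a-z0-9]+")
OS_RE = re.compile(r"[a-zA-Z0-9_+:-]+")


def parse_keyword(keyword):
    """Split a keyword into (arch, os) at the first hyphen from the left.

    The os part is None when it is omitted, which the specification
    defines to mean Gentoo GNU/Linux.
    """
    arch, sep, os_name = keyword.partition("-")
    if not ARCH_RE.fullmatch(arch):
        raise ValueError(f"invalid architecture in keyword: {keyword!r}")
    if not sep:
        # 1-tuple keyword: architecture only.
        return arch, None
    if not OS_RE.fullmatch(os_name):
        raise ValueError(f"invalid operating system in keyword: {keyword!r}")
    return arch, os_name


print(parse_keyword("ppc-darwin"))
print(parse_keyword("x86"))
```

Because only the first hyphen separates the fields, a keyword with
further hyphens keeps them in the operating system part.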

The variables ``ELIBC``, ``KERNEL`` and ``ARCH`` are currently set in
the profiles when they differ from their defaults for a GNU/Linux
system.  As such, they can easily be overridden and defined by the user.
To prevent this from happening, the variables should be filled in
automatically by Portage itself, based on the ``CHOST`` variable.  While
the ``CHOST`` variable can be set by the user just as easily as the
others, it is still assumed to be 'safe'.  This assumption is grounded
in the fact that the variable itself is being used in various other
places with the same intention, and that an invalid ``CHOST`` will cause
major malfunctioning of the system.  A user who changes the ``CHOST``
into something that is not valid for the system is already warned that
this might render the system unusable.  In conclusion, the 'safeness' of
the ``CHOST`` variable is based on externally assumed 'safeness', the
discussion of which falls outside the scope of this GLEP.

The current USE-expansion of the variables is maintained, as this
results in full backward compatibility.  Since the variables themselves
don't change in what they represent, but only in how they are assigned,
there should be no problem in maintaining them.  Using USE-expansion,
conditional code can be written in ebuilds exactly as with the existing
methods::

    ...
    RDEPEND="elibc_FreeBSD? ( sys-libs/com_err )"
    ...
    src_compile() {
        ...
        use elibc_FreeBSD && myconf="${myconf} -Dlibc=/usr/lib/libc.a"
        ...
    }

Alternatively, the variables ``ELIBC``, ``KERNEL`` and ``ARCH`` are
available in the ebuild environment, and they can be used instead of
testing USE flags of the form ``xxx_Xxxx``, or in switch statements
where they are actually necessary.

A map file can be used to translate the various ``CHOST`` values to the
correct values for the three variables.  This change is invisible to
ebuilds and eclasses, but allows them to rely on these variables because
they are derived from a 'safe' value -- the ``CHOST`` variable.  Ebuilds
should not be sensitive to the keyword value, but should use the
aforementioned three variables instead, as these allow tests for
specific properties.  If this is undesirable, the full ``CHOST``
variable can be used to match a complete operating system.


Variable Assignment
-------------------

The ``ELIBC``, ``KERNEL`` and ``ARCH`` variables are filled from a
profile file.  The file can be overlaid, such that the following entries
in the map file (on the left of the arrow) will result in the assigned
variables on the right-hand side of the arrow::

    *-*-linux-*      -> KERNEL="linux"  
    *-*-*-gnu        -> ELIBC="glibc"
    *-*-kfreebsd-gnu -> KERNEL="FreeBSD" ELIBC="glibc"
    *-*-freebsd*     -> KERNEL="FreeBSD" ELIBC="FreeBSD"
    *-*-darwin*      -> KERNEL="Darwin" ELIBC="Darwin"
    *-*-netbsd*      -> KERNEL="NetBSD" ELIBC="NetBSD"
    *-*-solaris*     -> KERNEL="Solaris" ELIBC="Solaris"

One way to achieve this was proposed by Mike Frysinger, who suggests
having an ``env-map`` file, for instance filled with::

    % cat env-map
    *-linux-* KERNEL=linux
    *-gnu ELIBC=glibc
    x86_64-* ARCH=amd64

The following bash script can then be used to set the three variables to
their correct values::

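    % cat readmap
    #!/bin/bash
    # Proof of concept: derive ARCH, KERNEL and ELIBC from a CHOST string.

    # Use CBUILD if set; otherwise CHOST, falling back to the first argument.
    CBUILD=${CBUILD:-${CHOST=${CHOST:-$1}}}
    if [[ -z ${CBUILD} ]] ; then
        echo "need chost" >&2
        exit 1
    fi

    unset KERNEL ELIBC ARCH

    # Scan every entry; a later match overrides an earlier assignment.
    # Reading the fields directly avoids filename expansion of the pattern.
    while read -r targ assignments ; do
        # ${targ} is left unquoted so that it is treated as a glob pattern.
        [[ ${CBUILD} == ${targ} ]] && eval "${assignments}"
    done < env-map

    echo ARCH=${ARCH} KERNEL=${KERNEL} ELIBC=${ELIBC}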

Given the example env-map file, this script would result in::

    % ./readmap x86_64-pc-linux-gnu 
    ARCH=amd64 KERNEL=linux ELIBC=glibc

The entries in the ``env-map`` file will be evaluated in a forward
linear full scan.  A side effect of this exhaustive search is that the
variables can be re-assigned if multiple entries match the given
``CHOST``.  Because of this, the order of the entries matters: the last
matching entry that assigns a variable determines its value.  Because
the ``env-map`` file size is assumed not to exceed the block size of the
file system, the performance penalty of a full scan versus a
'first-hit-stop' technique is assumed to be minimal.
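
The effect of entry order can be illustrated with a minimal Python
sketch.  It assumes the standard ``fnmatch`` module as a stand-in for
bash pattern matching; the function name ``scan_env_map`` and the two
entries are hypothetical and serve only to show how a later matching
entry re-assigns a variable.

```python
from fnmatch import fnmatchcase


def scan_env_map(chost, entries):
    """Full linear scan: apply every matching entry, in file order."""
    result = {}
    for pattern, assignments in entries:
        if fnmatchcase(chost, pattern):
            # A later match overwrites values set by an earlier one.
            result.update(assignments)
    return result


# Hypothetical entries: a general pattern followed by a specific one.
general_first = [
    ("*-linux-*", {"KERNEL": "linux", "ELIBC": "glibc"}),
    ("*-linux-uclibc", {"ELIBC": "uclibc"}),
]
specific_first = list(reversed(general_first))

print(scan_env_map("i386-pc-linux-uclibc", general_first))
print(scan_env_map("i386-pc-linux-uclibc", specific_first))
```

With the general entry first, the specific entry refines ``ELIBC``
afterwards; with the order reversed, the general entry overwrites the
more specific value.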

It should be noted, however, that the above bash script is a
proof-of-concept implementation.  Since Portage is largely written in
Python, it will be more efficient to write an equivalent of this code in
Python as well.  In terms of coding, this is considered to be a
non-issue, but the format of the ``env-map`` file, and especially its
wildcard characters, might not be the best match for Python.  For this
reason, the format specification of the ``env-map`` file is deferred to
the Python implementation, and only the requirements are given here.

The ``env-map`` file should be capable of encoding a ``key``, ``value``
pair, where ``key`` is a (regular) expression that matches a
chost-string, and ``value`` contains one or more distinct variable
assignments for the variables ``ARCH``, ``KERNEL`` and ``ELIBC``.  The
interpreter of the ``env-map`` file must scan the file linearly and
continue trying to match the ``key``\s and assign variables where
appropriate until the end of the file.
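
As an illustration of these requirements only, the following Python
sketch assumes a hypothetical line format: a regular expression,
followed by whitespace-separated, unquoted ``NAME=value`` assignments.
The names ``parse_entry`` and ``resolve_chost`` are likewise
assumptions; the actual format remains up to the Portage
implementation.

```python
import re

# Only these variables may be assigned by an entry.
ALLOWED = {"ARCH", "KERNEL", "ELIBC"}


def parse_entry(line):
    """Parse 'REGEX NAME=value [NAME=value ...]' into (pattern, dict)."""
    key, *pairs = line.split()
    if not pairs:
        raise ValueError(f"entry needs at least one assignment: {line!r}")
    assignments = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or name not in ALLOWED:
            raise ValueError(f"bad assignment: {pair!r}")
        if name in assignments:
            raise ValueError(f"duplicate variable in one entry: {name}")
        assignments[name] = value
    return re.compile(key), assignments


def resolve_chost(chost, lines):
    """Scan all lines linearly, up to the end, applying every match."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are skipped
        pattern, assignments = parse_entry(line)
        if pattern.fullmatch(chost):
            result.update(assignments)
    return result


# The example env-map from above, rewritten with regular expressions.
ENV_MAP = [
    r".*-linux-.* KERNEL=linux",
    r".*-gnu ELIBC=glibc",
    r"x86_64-.* ARCH=amd64",
]

print(resolve_chost("x86_64-pc-linux-gnu", ENV_MAP))
```

Rejecting a repeated variable within one entry is one possible reading
of the 'distinct' requirement; re-assignment across entries is still
permitted, as the linear scan demands.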

Since Portage will use the ``env-map`` file, the location of the file is
beyond the scope of this GLEP and up to the Portage implementors.


References
==========

.. [1] GLEP 22, New "keyword" system to incorporate various
   userlands/kernels/archs, Goodyear,
   (https://www.gentoo.org/glep/glep-0022.html)

.. [2] For the purpose of readability, we will refer to 1, 2 and
   4-tuples, even though tuple in itself suggests a field consisting of
   two values.  For clarity: a 1-tuple describes a single value field,
   while a 4-tuple describes a field consisting of four values.


Copyright
=========

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License.  To view a copy of this license, visit
https://creativecommons.org/licenses/by-sa/3.0/.