summaryrefslogtreecommitdiff
blob: a35591fa8d917037af40aec2d57b00518fc98f51 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
1455
1456
1457
1458
1459
1460
1461
1462
1463
1464
1465
1466
1467
1468
1469
1470
1471
1472
1473
1474
1475
1476
1477
1478
1479
1480
1481
1482
1483
1484
1485
1486
1487
1488
1489
1490
1491
1492
1493
1494
1495
1496
1497
1498
1499
1500
1501
1502
1503
1504
1505
1506
1507
1508
1509
1510
1511
1512
1513
1514
1515
1516
1517
1518
1519
1520
1521
1522
1523
1524
1525
1526
1527
1528
1529
1530
1531
1532
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
1548
1549
1550
1551
1552
1553
1554
---
GLEP: 73
Title: Automated enforcing of REQUIRED_USE constraints
Author: Michał Górny <mgorny@gentoo.org>
Type: Standards Track
Status: Deferred
Version: 1
Created: 2017-06-11
Last-Modified: 2019-06-17
Post-History: 2017-07-08
Content-Type: text/x-rst
---

Status
======

Marked as deferred by GLEP editor Ulrich Müller on 2019-06-17, due to
inactivity.


Abstract
========

This GLEP proposes using automated solving to satisfy REQUIRED_USE
constraints. It lists the problems with the current handling of REQUIRED_USE,
and explains how auto-solving would solve them. It specifies the algorithms
that can be used to verify the constraints, automatically solve them and check
whether they can be solved. It provides the rationale for all the design
decisions, and considers the compatibility with the PMS, and between
the constraints and the package managers before and after the GLEP is used.


Motivation
==========

The issues with REQUIRED_USE
----------------------------

REQUIRED_USE [#REQUIRED_USE]_ has been introduced in EAPI 4 as a solution to
the problem of enforcing specific relations between USE flags. According to
the Portage documentation on REQUIRED_USE [#PORTAGE-REQUIRED_USE]_, it has
been specifically targeted as a more data-oriented and machine-friendly
alternative to verifying the validity of USE flag choice in ebuild phases.

At the moment of writing, REQUIRED_USE is used in around 25% of the ebuilds
in Gentoo. It is an obligatory part of some eclasses, e.g. in the Python
ecosystem. Its uses include improving clarity of user choices, simplifying
ebuilds via copying upstream feature dependencies and enforcing valid data
for USE dependencies. Nevertheless, a number of developers raise strong
arguments against using REQUIRED_USE.

The commonly noted disadvantages of REQUIRED_USE are:

1. Unsatisfied REQUIRED_USE constraints unnecessarily (and sometimes
   frequently) require explicit user action, even if there is no real gain
   from the user explicitly selecting. For example, if a package supports
   building either against Qt4 or Qt5, and user has enabled the flags for
   both, the package manager would request him to disable one of the flags for
   the package.  For most of the cases, just using the newer version would be
   more friendly.

2. Satisfying REQUIRED_USE usually requires altering flags via permanent
   configuration. Those alterations can become obsolete over time and without
   proper maintenance can put system into a suboptimal configuration.
   For example, if a Python package requires enabling a non-default Python
   target, then the leftover flag can keep forcing an obsolete Python version
   when the package gains support for the default target.

3. The machine-oriented form of REQUIRED_USE constraints can result
   in confusing and unreadable output to the user, especially for complex
   constructs and cross-dependent constraints. Bad output can result
   in the user being unable to solve the problem at all, or to solve it
   in a satisfactory way (i.e. without disabling the features he needs).
   It can also cause frustration if satisfying REQUIRED_USE requires more than
   one attempt.

Those arguments have resulted in a number of developers avoiding REQUIRED_USE.
For example, the Qt team policies [#QT-POLICY]_ discourage using it unless
absolutely necessary. The attempts of avoiding REQUIRED_USE sometimes result
in suboptimal descriptions of USE flags or even inconsistent use of them.

The providers problem
---------------------

A very specific case of a problem where REQUIRED_USE has some use is the
*providers* problem. That is, whenever a package has a feature that can be
supplied by more than one library of choice, and the user needs to choose
between the providers. The exact form of this problem depends on the number
of providers and whether the feature is optional.

The commonly used solutions include:

- Using one or more binary flags to toggle between the providers (with number
  of the flags < number of providers). This is most readable with only two
  providers, e.g. with ``USE=libressl`` meaning *use LibreSSL instead of
  OpenSSL*, and ``USE=-libressl`` meaning *use OpenSSL*. For packages with
  optional SSL/TLS feature, there is also an additional ``USE=ssl`` to toggle
  that feature, and with ``USE=-ssl``, the ``libressl`` flag is meaningless
  (ignored). This is usually the least intrusive method but it's unreadable
  and causes the flags to be confusing.

- Using unary flags for providers along with REQUIRED_USE. In this case, each
  provider gets an explicit flag and REQUIRED_USE is used to force selecting
  exactly one of them. For optional feature, there is either an additional
  feature flag or it is disabled when all providers are disabled. This is
  usually the most readable solution although it frequently requires adjusting
  flags.

- Using unary flags without REQUIRED_USE. In this case, if user selects more
  than one provider (or does not select any), the package decides which one is
  preferred and uses that. For optional feature, again there could be either
  an additional feature flag or it could be disabled by disabling all
  the providers. This is less intrusive than the previous solution but it's
  less readable (it is unclear which provider is actually used) and unsuitable
  for USE dependencies.

As noted, all of the mentioned solutions have their specific disadvantages.
This causes different developers to use different solutions for specific
problems. Sometimes, it could go as bad as to have more than one solution
applied to a single problem, or different concepts used inconsistently
by different developers.

Automatic solving as the solution
---------------------------------

This GLEP focuses on the idea of establishing automated solving of
REQUIRED_USE as a solution to the aforementioned issues. In this context,
REQUIRED_USE is extended not only to specify what combinations of USE flags
are valid but also how to proceed from a disallowed flag set to one that
satisfies the constraints.

This clearly resolves the first two issues with REQUIRED_USE. Since
REQUIRED_USE mismatches are solved automatically, there is no explicit user
interaction required. No changes are done in configuration files — since
the solving is meant to be deterministic, the package manager can recalculate
the effective USE flag set using the input USE flag set and the REQUIRED_USE
constraint.

The third disadvantage is partially solved. Since there is no necessity
for the user to perform any action, there is also no necessity of explaining
the constraints to the user. However, for practical uses it may be deemed
appropriate to explain to the user why a particular flag has been enabled
or disabled.

Solving the most common problems with REQUIRED_USE makes it possible to extend
its use cases to the areas where developers so far rejected to use it, or did
not even think of using it. This includes working towards a single solution
to the providers problem. Given that REQUIRED_USE no longer requires altering
the configuration to select between multiple allowed providers, we can
reasonably work towards using the middle solution consistently — that is,
having clear unary flags for every provider, and using REQUIRED_USE to
automatically transform inconclusive input into a single implementation.

Furthermore, the non-intrusive version of REQUIRED_USE could be used
extensively to conditionally mask meaningless flags and map equivalent flag
sets into a single common set of choice. This can further improve readability
(by making flags clearly indicate what it used, e.g. by disabling all SSL/TLS
provider flags when SSL/TLS is disabled) and improve compatibility between
binary packages (by reducing the number of incompatible USE flag sets).


Specification
=============

Restrictions on REQUIRED_USE format
-----------------------------------

The REQUIRED_USE format is defined by the PMS [#PMS]_. This specification
requires the following additional restrictions being enforced:

- An any-of group (||), at-most-one-of (??) and an exactly-one-of (^^) group
  can contain only flat USE flag items. In particular, no other group can
  be nested inside it.

- All-of groups are forbidden inside REQUIRED_USE (they have no use now).

- An any-of group (||), at-most-one-of (??) and an exactly-one-of (^^) group
  must contain at least one subitem (can not be empty).

As a result, unlimited nesting is allowed only for use-conditional groups.
All other constructs are kept flat. This serves the following goals:

- avoiding surprising results of automatic flag adjustments,
- improving readability of REQUIRED_USE constraints,
- keeping the specification and implementation relatively simple.

The algorithm for satisfying REQUIRED_USE constraints
-----------------------------------------------------
Processing algorithm
~~~~~~~~~~~~~~~~~~~~

The existing package managers have to validate REQUIRED_USE constraints while
evaluating the dependency graph. The current validation action is replaced
by the following algorithm:

1. Check whether the REQUIRED_USE constraint is satisfied by the USE flags
   enabled by the current user configuration. If it is, accept the package
   (the algorithm stops).

2. Check whether the REQUIRED_USE constraint matches restrictions set
   in `restrictions on REQUIRED_USE format`_. If it does not, report
   a REQUIRED_USE mismatch and abort.

3. Find all any-of (||), at-most-one-of (??) and exactly-one-of (^^) groups
   inside REQUIRED_USE and reorder (sort) them according to the algorithm
   defined below.

4. Attempt to solve the REQUIRED_USE constraint using the algorithm defined
   below. If the attempt succeeds, accept the package with the set of USE
   flags determined by the solver.

5. If the attempt at solving failed, report a REQUIRED_USE mismatch and abort.

REQUIRED_USE verification algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The verification algorithm is implied by the meanings of REQUIRED_USE
constructs as defined by the PMS. It is repeated here for completeness
and for reuse in further algorithms.

The REQUIRED_USE constraint is considered satisfied if *all* the top-level
items evaluate to true. An item evaluates to true if, depending on the item
type:

- A **USE flag name** that is not prefixed by an exclamation mark evaluates
  to true if the named flag is enabled. Accordingly, a USE flag name that
  is prefixed by an exclamation mark evaluates to true if the named flag
  is disabled.

- For a **USE-conditional group** the condition needs to be tested first
  (according to the same rule). If the condition evaluates to true,
  the USE-conditional group is true only if all items in it evaluate to true.
  If the condition evaluates to false, the USE-conditional group always
  evaluates to true and the items inside it need not to be tested.

- An **any-of group** (||) evaluates to true if at least one of the items
  in it evaluates to true.

- An **exactly-one-of group** (^^) evaluates to true if exactly one
  of the items in it evaluates to true, and all the remaining items evaluate
  to false.

- An **at-most-one-of group** (??) evaluates to true if at most one
  of the items in it evaluates to true.

Constraint group reordering algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The constraint solving algorithm is built on *prefer leftmost* assumption
for all any-of, exactly-one-of and at-most-one-of groups. That is,
if the constraint is not satisfied by the current set of enabled USE flags,
the algorithm prefers enforcing the leftmost constraints and disabling
rightmost.

Due to different system profiles, it might be impossible to automatically
solve the constraint using the leftmost flag specified by ebuild (e.g. when it
is masked). In order to account for this, the specification provides a group
reordering (sorting) phase before the solving algorithm.

The reordering applies to any-of, exactly-one-of and at-most-one-of groups.
Per the format restriction, each group can only contain flat USE flags.

For each of the items in the group, if the item names a forced/masked USE
flag:

- if the item evaluates to true according to the flag's value, it is moved to
  the leftmost position in the group,

- if the item evaluates to false according to the flag's value, it is moved to
  the rightmost position in the group,

Relative positions of multiple forced/masked flags are of no relevance since
those flags are not altered.

This reordering ensures that if a flag is forced, it is always preferred over
other choices; and if it is masked, it is never preferred. This makes it
possible to easily account for all possible cases without having to provide
a detailed algorithm to handle various possible results.

REQUIRED_USE solving algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the REQUIRED_USE constraint is not satisfied according to the initial set
of USE flags implied by the configuration, the package manager attempts
to alter the USE flags according to REQUIRED_USE.

Before solving, a set of **immutable flags** is determined based on forced
and masked USE flags. If a flag is either forced or masked, it is marked
immutable and the algorithm can not alter its value. If a particular rule
would cause the flag to be altered, the solving is aborted and an error is
reported.

The solving algorithm is applied at least once, and the REQUIRED_USE is
rechecked after each application. The package manager may support running
multiple iterations of the algorithm, in which case it needs to either limit
the allowed number of iterations or abort after obtaining one of the values
previously given by the algorithm (hitting an infinite loop).

In order to enforce REQUIRED_USE, each top-level item in REQUIRED_USE that did
not evaluate to true needs to be enforced. All items are enforced in order,
left to right. Depending on the item type, enforcing implies:

- For a **USE flag name** that is not prefixed by an exclamation mark,
  the named flag is enabled. If it is prefixed by an exclamation mark,
  the named flag is disabled.

- For a **USE-conditional group**, the condition (LHS) is evaluated first.
  If the condition evaluates to true, all the items inside the group
  are enforced, in order. If it evaluates to false, the group is skipped.

- For an **any-of group** that did evaluate to false, the first (left-most)
  item in the group is enforced.

- For an **at-most-one-of group** that did evaluate to false, the first
  (left-most) item that evaluates to true needs to be determined first.
  Afterwards, all items following it are negatively-enforced (forced to
  evaluate to false).

- An **exactly-one-of group** is equivalent to a conjunction of an
  at-most-one-of group and an any-of group. That is, if all items evaluate
  to false, the rule for any-of is applied. If more than one item evaluates
  to true, the rule for at-most-one-of is applied.

The negative enforcing action can be applied to plain **USE flag names** only.
If the name is not prefixed by an exclamation mark, then the flag is disabled.
If the name is prefixed by an exclamation mark, it is enabled appropriately.


QA checks to verify REQUIRED_USE solutions
------------------------------------------

Context to QA checks
~~~~~~~~~~~~~~~~~~~~

All of the QA checks are performed in context of a specific set of forced
and masked USE flags, called *immutable flags*. All of the checks need to be
repeated for every set. Since they can alter the preferences inside any-of,
at-most-one-of and exactly-one-of groups, it may also be necessary to perform
a separate transformation for each set.

The complete set of immutable flag combinations can be obtained using
the following algorithm:

1. let **U** be the set of all USE flags (both explicit IUSE and implicit)
   that are used in REQUIRED_USE,

2. for every enabled profile:

   1. let **I1** be the effective ``use.force``, ``use.mask``,
      ``package.use.force``, ``package.use.mask`` values that apply
      to the package and affect flags in **U**,

   2. let **I2** be the effective ``use.stable.force``, ``use.stable.mask``,
      ``package.use.stable.force``, ``package.use.stable.mask`` values that
      apply to the package and affect flags in **U**,

   3. add **I1** to the result set,

   4. if package has any stable keywords, combine **I1** and **I2**,
      and add the result to the result set.

Afterwards, all checks should be performed for all unique values in the result
set.

Requirements for REQUIRED_USE constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to verify the ability to solve REQUIRED_USE reliably, the QA check
tools should ensure that the following conditions are met:

1. no valid combination of USE flags can result in the constraint requesting
   the same flag to be simultaneously both enabled and disabled;

2. no valid combination of USE flags (that is, not prohibited by immutable
   flags) can attempt to alter immutable flags;

3. no constraint in REQUIRED_USE may alter flags in such a way that any
   of the constraints preceding it would start to apply and change
   the resulting flags in a second iteration.

Concept for transforming REQUIRED_USE into implications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The algorithms used to verify REQUIRED_USE rely on them being expressed
in a *flat implication form*. In this form, the constraints are expressed
as zero or more *implications*. Each implication specifies zero or more
conjunctive *conditions*, and one or more *effects*. It is equivalent
to a nested USE-conditional group. If all of the *conditions* are met,
the *effects* are applied.

If a constraint is valid, then the solutions of its transformation
are the same as of the original.

By idea, the transformation consists of the following steps:

1. Reordering all any-of (||), at-most-one-of (??) and exactly-one-of (^^)
   groups according to the `Constraint group reordering algorithm`_.

2. Replacing all any-of (||), at-most-one-of (??) and exactly-one-of (^^)
   groups according to the following transformations:

   - ``^^ ( a b c… )````|| ( a b c… ) ?? ( a b c… )``,
   - ``|| ( a b c… )````!b? ( !c? ( !…? ( a )… ) )``,
   - ``?? ( a b c… )````a? ( !b !c… ) b? ( !c… ) c? ( … ) …``.

3. Creating an ordered directed graph linking all nested conditions to their
   effects.

4. Traversing all the paths from the topmost graph nodes to the deepest,
   in order.

For example, an ordered graph is provided for the following REQUIRED_USE
constraint::

    a b? ( c? ( d !b ) d? ( e ) ) b? ( f )

Nodes and edges are numbered to explain the ordering. Furthermore, the final
(effect) nodes are colored red.

.. figure:: glep-0073-extras/required-use-example-graph.svg

   Example graph for REQUIRED_USE

Traversing this graph produces the following paths, in order:

1. **a(1)**
2. b(2) → c(3) → **d(4)**
3. b(2) → c(3) → **!b(5)**
4. b(2) → d(6) → **e(7)**
5. b(8) → **g(9)**

Those paths are roughly equivalent to the following USE-conditional group
constructs:

1. ``a``
2. ``b? ( c? ( d ) )``
3. ``b? ( c? ( !b ) )``
4. ``b? ( d? ( f ) )``
5. ``b? ( g )``

Except that the value of *b* for constraint 4 is considered from the initial
value rather than the one possibly altered by constraint 3. Constraint 5 uses
a separate condition, and so uses the new value of *b*.

Algorithm for transforming REQUIRED_USE into implications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Steps 2 through 4 of the fore-mentioned transformation can be performed using
the following recursive function. It should be applied to every top-level
REQUIRED_USE item, in order.

It should be noted that for the purpose of distinguishing separate branches,
all the condition objects need to have an unique identity. In Python this
occurs naturally via instantiating an object. In other languages an explicit
unique identifier may need to be included.

::

    function transform(item, conditions=[]):
      if item is a USE flag:
        append (conditions, item) to the results
      if item is a USE-conditional group:
        new_conditions := conditions + [item.condition]
        for subitem in item.subitems:
          call transform(subitem, new_conditions)
      if item is an any-of (||) group:
        n := len(item.subitems) - 1  # (last index)
        new_conditions := conditions
        for f in item.subitems[1..n-1]:
          new_conditions += [!f]
        append (new_conditions, item.subitems[0]) to the results
      if item is an at-most-one-of (??) group:
        n := len(item.subitems) - 1  # (last index)
        for i := 0 .. n-1:
          new_conditions := conditions + [item.subitems[i]]
          for f in item.subitems[i+1..n]:
            append (new_conditions, !f) to the results
      if item is an exactly-one-of (^^) group:
        apply the logic for an any-of (||) group
        apply the logic for an at-most-one of (??) group

QA check logic
~~~~~~~~~~~~~~

The logic for the reference algorithm is split into four split functions:

1. Verifying that the constraints do not alter immutable flags,

2. Verifying that the conditions for the constraints are not self-conflicting,

3. Verifying that no two constraints will attempt to force opposite values
   for a single flag,

4. Verifying that no constraint will meaningfully enable
   any of the constraints preceding it.

In the following descriptions, *C* will indicate zero or more conditions
(*ci* being the sub-conditions) of the flat constraint, and *E*
will indicate the enforcement.

The check for alteration of immutable flags is done for every constraint
separately. A flat constraint is determined to alter immutable flags if both
of the following conditions occur:

- *C* can evaluate to true — that is, none of *ci* refer to an immutable
  flag whose value is *¬ci*,

- *E* references an immutable flag whose immutable state is *¬E*.

The check for self-conflicting constraints is performed for every constraint
separately. A flat constraint is determined to be self-conflicting
if the following condition occurs:

- For any pair of sub-conditions *ci*, *cj* (*i ≠ j*), *ci = ¬cj*.

The check for attempting to force opposite values for a single flag is
performed for every pair of constraints. Since it is symmetric, it is only
necessary to perform it for unique pairs. For practical reasons, let's assume
it is performed for every pair *((Ci, Ei), (Cj, Ej))*, where *j > i*. The pair
is determined to force opposite values for a single flag if all of the
following conditions are met:

- *Ei = ¬Ej*,

- *Ci* and *Cj* can simultaneously evaluate to true,

- *Ci* can evaluate to true after applying all the constraints preceding it,
  with flags *F = Ci ∪ Cj*,

- *Cj* can evaluate to true after applying all the constraints preceding it,
  with flags *F = Ci ∪ Cj*.

The check for enabling the previous constraints is performed for every pair
*((Ci, Ei), (Cj, Ej))*, where *j > i*. The constraint *(Cj, Ej)* is determined
to meaningfully enable the constraint *(Ci, Ei)* if all of the following
conditions are met:

- *Ej* matches any of the conditions in *Ci* (*Ej = ci,k*, for any *k*),

- *Ci* and *Cj* can simultaneously evaluate to true,

- *Ei* does not always evaluate to true after applying all of the constraints,
  with flags *F = Cj*.

Two flat constraints *Ci* and *Cj* can simultaneously evaluate to true
if the following condition is met:

- For every *ci,k*, *cj,l* (where *k* and *l* are all possible indexes
  of the condition of the first and second constraint appropriately),
  *ci,k ≠ ¬cj,l*.

A constraint *C* can evaluate to true if and only if all sub-constraints can
evaluate to true. A sub-constraint *ci* can evaluate to true if the current
set of flags does not include its negation (for every *fj*, *fj ≠ ci*).

A constraint *C* always evaluates to true if and only if all sub-constraints
always evaluate to true. A sub-constraint *ci* always evaluates to true if the
current set of flags includes the condition (there exists at least one *fj*
that *fj = ci*).

In order to determine whether a condition *Ci* can evaluate to true after
applying a specific set of constraints, with initial flags *F1*, determine
the final set of flags *Fn* and afterwards test if the constraint can evaluate
to true with flags *Fn*.

In order to determine whether a condition *Ci* always evaluates to true after
applying a specific set of constraints, with initial flags *F1*, determine
the final set of flags *Fn* and afterwards test if the constraint always
evaluates to true with flags *Fn*.

In order to determine the final set of flags *Fn*, with specific set
of constraints *(Ci, Ei)* and initial flags *F1*:

- For every flat constraint *(Ci, Ei)* in the set:

  - If the condition *Ci* always evaluates to true, update *F* with *Ei*
    (*Fi+1 = Fi ∪ {Ei} ∖ {¬Ei}*).

Limitations of the algorithm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The presented check algorithm has a limitation which could result in false
positives. However, the testing against all real Gentoo uses of REQUIRED_USE
has shown that none of those occur at the moment of writing this GLEP,
and that is quite unlikely for them to become a major issue in the future.

The algorithm is unable to infer indirect implications of the constraints.
For example, given the following constraint::

    a? ( !b ) !a? ( !b ) b? ( c )

The algorithm is unable to correctly infer that due to the first two
constraints, *b* will never be true. As a result, it will e.g. report
an immutability error on ``b? ( c )`` if *c* is masked even though this
condition could never evaluate to true.

However, it is considered that a natural occurrence of such a constraint
is quite unlikely, and usually indicates a problem with the constraint anyway.
Therefore, reporting a false positive here could serve as an indication
of another problem.

Policy implications
-------------------

This GLEP does not directly add, alter or remove any of the Gentoo policies.
Any policy changes related to it need to be done independently of its
approval, using the appropriate Gentoo procedures.


Rationale
=========

Restrictions for allowed REQUIRED_USE syntax
--------------------------------------------

The specification imposes a number of arbitrary restrictions to REQUIRED_USE
syntax, in particular by restricting the possible nesting and disallowing
other complex constructs. The main goal is to simplify the algorithms used
and make the results more obvious. This is at cost of prohibiting constructs
that are rarely used, and usually could be replaced by simpler and more
readable constructs.

Nested any-of, at-most-one-of, exactly-one-of groups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first and most important restriction is that nesting of any-of,
at-most-one-of and exactly-one-of groups is forbidden. While technically such
constructs could work, some of them are not really meaningful and others
are really confusing. At the time of writing, nested ||/??/^^ groups were used
in exactly two Gentoo packages. The specific uses were:

1. app-admin/bacula::

    || ( ^^ ( mysql postgres sqlite ) bacula-clientonly )

2. dev-games/ogre::

    ?? ( gl3plus ( || ( gles2 gles3 ) ) )

The first use is not very complex, and indicates that either exactly one
of the database providers need to be selected, or the *bacula-clientonly* flag
needs to be used. However, at a first glance a user might be confused that
the database ^^ constraint needs to be applied independently
of the *bacula-clientonly* flag. The same construct can be expressed in a more
straightforward way::

    !bacula-clientonly? ( ^^ ( mysql postgres sqlite ) )

The second use is much more confusing. It means that both *gl3plus* and either
of the *gles2* or *gles3* flags can not be enabled at the same time. However,
*gles2* and *gles3* can be enabled simultaneously. The same construct can be
expressed in a more straightforward way as::

    gl3plus? ( !gles2 !gles3 )

As can be seen, in both cases the alternative constructs were both more
readable and shorter than the nested expressions. In the first case, it is
also the more natural way of expressing the problem. While replacing
expressions that have more than two subexpressions would be harder, there were
no uses of such expressions so far, and the potential ambiguity makes them
unlikely to appear.

All-of groups
~~~~~~~~~~~~~

The second restriction imposed by this GLEP is disallowing all-of groups.
The PMS allows them anywhere but in reality they are only meaningful inside
||, ??  and ^^ groups (elsewhere they do not have any effect, and can be
inlined into parent block). Inside those groups, they imply that the item is
considered matched only if all items inside the all-of group match.

The meaning of all-of groups inside || is pretty clear. However, inside ??
and ^^ some confusion may occur. In particular, for a general case of::

    ?? ( a ( b c ) )

the constraint only affects the combination of all flags inside the all-of
group. In this case, enabling *a* prohibits having the combination of both *b*
and *c* enabled. However, either *b* or *c* can be enabled separately without
affecting *a*. This makes this constraint unlikely to have real use cases,
and if it has, they are unlikely to be the most natural way of expressing
the problem.

Furthermore, automatic solving of such constraints forces some implicit
ambiguity. Since both (multiple) flags have to be enabled together to cause
a particular item to match, there are multiple solutions of forcing an item
not to match. For the fore-mentioned sample, having *a* enabled would require
the solver to force *( b c )* not to match. To do this, the solver could
either disable *b*, disable *c* or disable both flags.

There are arguments for both options — disabling only one flag follows
the idea of 'smallest change needed'. Disabling both can be considered more
consistent. In either case, there will be developers and user confused
by the package manager relying on either behavior.

The all-of groups inside || do not suffer from the same issue since solving
them does not require disabling anything. However, they also have seemingly
low value and banning all-of groups altogether improves symmetry between
the different group types.

Furthermore, the nested all-of groups make transformation into implication
graph much more complex. Without them, the conditions are purely conjunctive.
If we were to support all-of groups inside ||, ??, ^^ we would have to support
disjunctive conditions, and transform them into conjunctive form.

The all-of groups were used in 5 different packages at the time of writing.
Two of them were outside ||, ??, ^^, rendering them meaningless and probably
accidental. The three remaining cases were:

1. sci-chemistry/icm::

    ^^ ( ( !32bit 64bit ) ( 32bit !64bit ) ( 32bit 64bit ) )

2. media-sound/snd::

    ^^ ( ( !ruby !s7 ) ( ruby !s7 ) ( !ruby s7 ) )

3. app-i18n/ibus::

    || ( deprecated ( gtk3 introspection ) ) )

Of those cases, the first two can be replaced by pure, flat || and ?? groups
appropriately. It furthermore indicates that all uses of all-of groups inside
^^ in Gentoo were purely mistaken.

The third case is potentially valid. It indicates that either *deprecated*
or both *gtk3* and *introspection* flags need to be enabled. However, it does
not clearly indicate the preferred course of action. After investigating
the ebuild in question, it is most likely that the following constraint would
be more correct, and clearer to the user::

    || ( deprecated gtk3 ) gtk3? ( introspection )

That is, if user enables *gtk3* and *gtk3* requires *introspection*, then it
seems more reasonable to enable *introspection* than to ignore the *gtk3* flag
and force *deprecated* module instead.

USE-conditionals inside ||, ??, ^^ groups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The last restriction forbids using USE-conditional groups inside any-of,
at-most-one-of and exactly-one-of groups. Those indicate that some
of the items inside the group are to be considered its members only
if the relevant flags are enabled. They are logically equivalent to all-of
groups, i.e. ``|| ( foo? ( bar ) ... )`` and ``|| ( ( foo bar ) ... )``,
except they have a different semantic — the latter form suggests enabling both
flags, the former suggests considering *bar* only if *foo* is already enabled.

Supporting USE-conditional groups properly would most likely require splitting
the parent group into multiple variants for different initial values of USE
conditionals. Considering the above equality, it would also be inconsistent
with the ban on all-of groups. Finally, those groups have little real value.

The only use case in Gentoo was in media-video/mpv::

    opengl? ( || ( aqua egl X raspberry-pi !cli? ( libmpv ) ) )

It indicates that the OpenGL video output requires selecting one of the
variants, with the *libmpv* variant being allowed only without CLI enabled.
While this may be technically valid, it is confusing. Furthermore, other
REQUIRED_USE constraints already require that either *cli* or *libmpv* is
enabled, making *!cli* imply *libmpv*. Therefore, the USE-conditional
in the constraint is redundant.

Empty any-of, at-most-one-of, exactly-one-of groups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As the first mailing list review indicated, the PMS explicitly specifies
a special case that empty any-of, at-most-one-of and exactly-one-of groups all
evaluate to true.

This behavior has been explained as a historical behavior associated with
Portage removing unmatched USE-conditional groups inside any-of dependency
groups which could result in the group becoming effectively empty.
As REQUIRED_USE was introduced, the rule was effectively extended into the new
operators.

It is unclear whether this is the most correct behavior logically though.
Alexis Ballier pointed out:

> I mean, in every context I've ever seen, applying a rule to the empty set is
> the neutral of that rule, so that it preserves associativity.
>
> That'd mean: ``|| ( )`` is false, ``&& ( )`` is true, ``^^ ( )`` is false,
> ``?? ( )`` is false.

(the thread afterwards develops that the more correct result for ``?? ( )``
could be to be true)

Since the original use case does not apply here (USE-conditional groups
are banned inside those operators), the correct behavior is unclear and this
has no real use case, banning it seems like the best course of action.

There is not a single use of such groups at the time of writing, and their
natural occurrence is extremely unlikely. It has some potential of occurring
due to eclass-generated strings but it is doubtful whether any of such cases
would not be more appropriately reported as an error.

Solving algorithm
-----------------

The solving algorithm attempts to enforce REQUIRED_USE in the most natural
way, interpreting the constraints as developer suggestions on how to make
the constraint apply.

Application of different types of constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The algorithm aims to solve mismatched constraints in the most natural way,
presuming that this interpretation is the most likely to be correct.

For the USE-conditional groups, it assumes that they mean *if X is true, then
Y should also be true*. Appropriately, the algorithm does not alter the flag
in the condition (*X*); instead, if the condition is true, it enforces
the expression inside the group (*Y*).

For other groups, the algorithm applies the natural interpretation presuming
that the items in group are stated in decreasing preference order, with
the left-most item being most preferred. That is, if the group evaluates to
false, it enforces a solution that either disables all enabled items except
for the left-most already enabled, or enables the first item if no item
is enabled.

Reordering of ||, ??, ^^ groups
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The left-most-preferred assumption about the groups results in the solving
algorithm relying on the ability to enable the item and disable other items.
This is not possible if the relevant flag is masked, or (in cases of ??, ^^)
some other flag is forced. If that were the case, the ordering inside those
groups would have to be strictly limited by the 'common denominator' between
the profiles. This would sometimes result in less preferred options being
encouraged, or even impossible to express constraints — e.g. if the preferred
implementation would not be stable but the package were stabilized.

To account for this, the groups are transformed to account for forced/masked
(immutable) flags. The transformation is done through reordering the items
because this keeps the specification as simple as possible. It does not to
cover specifically how to interpret immutable flags in different kind
of groups, and how to handle the groups afterwards. Instead, reordering
results in the forced flags being preferred naturally, and the masked flags
being discouraged naturally.

It also naturally handles the case when forced/masked flags result
in impossible to satisfy constraints. Those cases do not need to be detected
by the reordering algorithm implicitly, and instead just cause solver to fail
early.

Left-to-right constraint application
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The solving algorithm applies all changes necessary to enforce the constraints
in order, left to right. Enforcing a specific ordering, combined with the PMS
specifying how ebuild and eclass values for REQUIRED_USE are combined, makes
the algorithm deterministic. Applying left-to-right is also the most natural
way of doing it, making it easy for developers to predict the results.

Originally I had considered making the algorithm work independently
of constraint order. However, this would clearly defining what the desired
solution is, and finding an algorithm to enforce that. To achieve
a deterministic solution, we would most likely have to require developers
to provide groups that do not overlap. That is, for example::

    a? ( !b ) b? ( c )

would be unacceptable since with both *a* and *b* flags enabled,
the constraint would either enforce *c* or not, depending on the processing
order. The developer would have to write::

    a? ( !b ) !a? ( !b? ( c ) )

While this is a possible solution, expressing complex constraints would be
very hard. Developers would no longer be able to naturally express
the constraints, and instead would have to determine the correct sets
of conditions for each requested result.

Single vs multiple iterations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This GLEP does not specifically restrict the implementations to doing simple
or multiple iterations. Both options have their advantages.

A single iteration can successfully solve all valid REQUIRED_USE constraints,
as long as they are properly ordered. An implementation using a single
iteration has simpler error handling — it is only necessary to verify whether
the REQUIRED_USE actually matches after enforcing it. It is also reasonable
to request developers to order their constraints for a single iteration
solving.

The advantage of using multiple iterations is that they can also solve wrongly
ordered constraints. However, the implementation needs to account
for the possibility of invalid (circular) constraints putting the solver
in an infinite loop. For this reason, the solver needs to either limit
the maximum number of iterations or store previous results and detect when
the algorithm gives one of the previous results again.

For most of the real-life use cases, two iterations should be able to solve
all the constraints. A large number of iterations is unlikely to be required
by naturally written REQUIRED_USE constraints. It could be artificially caused
by writing constructs like::

    c? ( d ) b? ( c ) a? ( b )

QA checks/verification
----------------------

The necessity of verification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The purpose of REQUIRED_USE constraint verification is to ensure that for all
valid combinations of input USE flags, the solver will be able to find a valid
solution. This needs to be done explicitly since complex REQUIRED_USE
constraints may trigger solving issues with non-obvious USE flag combinations,
causing the developers to miss the issue.

Since the solver must be able to deal with non-solvable constraints
(by reporting them and letting the user deal with them), verification
is not a strict necessity for enforcing REQUIRED_USE. However, it improves
the user experience, and so is a worthwhile addition to the QA tools in place.

To provide the best coverage, it is beneficial to integrate the verification
into the tools commonly used by developers — repoman and pkgcheck, including
the CI runs. For this to be possible, the algorithm must meet two
requirements:

- It must be fast enough not to cause significant increase in repoman/pkgcheck
  run time for the full repository.

- It must not trigger a large number of false positives, and if any are
  triggered, they should be easy to work around.

Context to the checks
~~~~~~~~~~~~~~~~~~~~~

As noted in the specification part, all of them checks need to be repeated
for all possible sets of the immutable flags. This is necessary since
the immutable flags can alter the solutions significantly. In particular:

- They can alter the preferred choices in the any-of, at-most-one-of
  and exactly-one-of groups,

- They can cause some of the constraints to be unable to be satisfied,

- They can cause some of the USE-conditional groups to be disabled entirely.

To account for that and avoid the case where REQUIRED_USE solving would fail
on some of the profiles, the verification should be performed for all
combinations of immutable flags found throughout the enabled classes
of profiles. Only the flags that apply to the REQUIRED_USE constraint
in question need to be considered.

Due to the EAPI 5 stable masking [#STABLE-MASK]_, the immutable flags have
to be calculated separately for ~arch and stable keywords. The stable variant
does not need to be considered unless the package is actually stable or being
stabilized, to avoid unnecessarily cluttering up ``package.use.stable.mask``
and/or ``package.use.stable.force`` for packages that are going to stay
in ~arch.

The requirements for REQUIRED_USE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The rules imposed for verification aim to cover most of the common cases
of unsolvable constraints. In particular:

1. *no valid combination of USE flags can result in the constraint requesting
   the same flag to be simultaneously both enabled and disabled*.

   If the effective REQUIRED_USE constraint (after collapsing all the groups)
   contains both *foo* and *!foo*, the verification will never consider
   the constraint met (since logically *x ∧ ¬x* is always false).

2. *no valid combination of USE flags (that is, not prohibited by immutable
   flags) can attempt to alter immutable flags*.

   This is implied by the immutability of masked/forced flags. An attempt
   to toggle those flags while solving should be considered a fatal error
   since ``use.mask``/``use.force``/… always takes precedence over regular
   configuration and package-level toggles. Therefore, if such flags
   are enforced by an USE-conditional group, their condition should also
   be masked or forced appropriately.

3. *no constraint in REQUIRED_USE may alter flags in such a way that any
   of the constraints preceding it would start to apply and change
   the resulting flags in a second iteration*.

   This is required for reliable single-pass solving. While the solving may
   work correctly with multiple iterations, the constraints can be reliably
   (and usually easily) fixed via reordering. More importantly, this also
   catches the constraints that can not be solved due to circular toggling
   between the constraints.

The additional condition for the second iteration change has been added
to account for the common case of ``a? ( b ) c? ( a b )``. While technically
the second clause causes the first to start to apply, the second one already
covers that case explicitly, so a second iteration would not change
the result.

Transformation into implication form
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The transformation of REQUIRED_USE into implication form is used to provide
a form of the original constraint that is more convenient for analysis.

Firstly, the diverse (convenience) item types are all converted into
a combination of implications and plain USE flags. The latter can express all
the original constraints exactly, provided that the any reordering necessary
is done prior to the transformation. As a result, we gain both simplified set
of items that need to be considered, and a clear logical mapping of behavior
associated any-of, at-most-one-of and exactly-one-of groups.

All of the transformed forms are built by definition, from the verification
and solving algorithm:

- Any-of group constraints are satisfied if at least one of the items match.
  Therefore, the solving only applies if none of them does, in which case
  the first item is enforced. Appropriately, the result of transformation
  is the enforcement of first item conditional to the negation of all other
  items (the condition for the first item is omitted as redundant — enforcing
  a flag that is already enabled does not change anything).

- At-most-one-of group constraints are satisfied if no more than one item
  matches. The solving is applied if more than one item is enabled, in which
  case all but the first enabled item are forcibly disabled. Since disabling
  an already disabled flag does not change anything, this can be simplified
  to disabling all the remaining items if the left-most item is matched.
  The transformation does exactly that, for each item that can be possibly
  enabled, left-to-right.

- Exactly-one-of group constraints are satisfied if exactly one item matches.
  Logically this is equivalent to both having at least one item and no more
  than one item matching. Therefore, this constraint can be converted
  into a combination of an any-of group and an at-most-one-of-group, for which
  the transforms are already defined.

Secondly, having limited the set of item types to just two, of which only one
can be nested, the constraint can be easily converted into a graph.
The resulting graph provides a clean visualization of the structure of the
nested conditions. All nodes but the final (bottom-most) ones represent
conditions, while the final nodes represent enforcements.

A plain graph could be used to visualize relations between different
conditions and enforcements. However, the specifics of REQUIRED_USE
processing, especially left-to-right processing, require that the transform
preserves exact structure of the constraints.

Thirdly, having the graph (tree) of conditions, we can easily traverse them.
In doing so, we construct paths that precisely express which conditions need
to be met for a particular enforcement to apply. Since the constraints
are applied in order, we need to traverse the graph in this specific order,
and write the paths down in the same order.

In doing the two last steps, it is important that we preserve the identity
of the original condition nodes. This is necessary to distinguish between two
cases:

1. ``a? ( b c )``
2. ``a? ( b ) a? ( c )``

Since the solving algorithm is applied recursively to USE-conditional groups,
in the first case the outer *a* condition is not reevaluated between
processing *b* and *c*. In the latter case, the use of separate groups causes
reevaluation of the condition.

While in this specific example there is no technical difference between
the two forms, it becomes apparent when dealing with the following corner
case:

1. ``a? ( !a b )``
2. ``a? ( !a ) a? ( b )``

In both cases, applying the first sub-item disables *a*. However, only
in the second case will the solver reevaluate the value of *a* and omit
the second group. A plain flattening would cause the same to incorrectly
happen for the first case, rendering the transform not equivalent
to the original form.

In order to prevent that from happening, the verification algorithms need
to be able to determine that the *a* condition is the same node in both
resulting flattened expressions, and appropriately account for the fact that
it is not affected by the enforcement. In the reference implementation, this
is done via preserving the identity of the node, and doing identity-wide node
comparison.

The choice of algorithm
~~~~~~~~~~~~~~~~~~~~~~~

A few algorithms were considered for the purpose of verification.

The first and the most obvious choice was to attempt to enforce the constraint
for all possible combinations of USE flags, and report issues if any
of the combinations results in failure. Such an algorithm has three important
advantages:

1. it is trivial to implement and requires little extra code,

2. it is reliable since all combinations of USE flags are tested — if any
   of them fails, the check would find it,

3. it reuses the verification/enforcing function verbatim, so there is no risk
   of the check diverging from the base algorithm.

However, this method has a single important drawback: it is slow. For each
test context, it needs to process 2^n combinations (n — number of USE flags);
the number can grow huge with packages having 30 or more USE flags
in REQUIRED_USE (which is especially the case for any-of groups). Furthermore,
for each combination the check takes the average of 1 to 3 constraint
iterations.

It is possible to attempt to speed up this method a little, e.g. via grouping
the flags into separate, independent groups and processing them separately.
However, this still doesn't give a significant gain and is not a reliable
method of solving the problem. As a result, such an algorithm — while useful
for the purposes of testing and reference — is not suitable for integrating
with the QA tools.

An alternate algorithm has been considered that processes the restriction
left-to-right and builds a decision tree-like structure in order to analyze
all the possible outcomes of the REQUIRED_USE constraint. However, the pure
version of this algorithm was also rejected because it could not give
a significant speed gain — the check still needed to consider 2^n cases
(n — number of USE conditional groups in the transformed constraint). While
it certainly could be faster than the previous one, especially that it did not
require multiple iterations for each variant, and that the latter variants
required less processing, it would still not be fast enough for a broad use.

The effective algorithm selected is somehow a simplified derivation
of the above method. However, instead of analyzing the complete decision tree
enforced by the REQUIRED_USE constraint, it focuses on analyzing the possible
effects of each constraint. The specified algorithm has been split into four
logical checks, although in real implementation they could be easily grouped
together. Two of the checks are performed on each flattened constraint
separately, and the other two are done on unique pairs of flattened
constraints. As a result, the effective number of iterations is much lower
than in the other cases, as is the complexity of each iteration.

Even with the additional logic needed to prevent some of the false positives
the algorithm is still fast enough to serve its purpose. While it is not
perfect, it has been tested on all real cases of REQUIRED_USE from Gentoo
and verified not to cause any issues.

Verification: altering immutable flags
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The first of the checks is meant to ensure that under no circumstances
the constraint will attempt to toggle flags that are immutable, that is whose
values are established through use.mask / use.force files. This concept
is not only important for the scope of this GLEP but it also ensures that
the constraints could be satisfied at all.

The generic idea is that the following constraint::

    a? ( b )

combined with use.mask on *b* will cause an error because if the user enables
*a*, then *b* is required but it can not be enabled. Likewise, the following::

    a? ( !b )

with *b* use.forced will cause an error since *b* can not be disabled.

Those constraints would be acceptable if *a* were masked as well,
as to prevent the condition from ever being true. This is both the reason
for the rule on the condition of flattened constraint, and the correct
solution for the issue.

It should be noted that the check is done separately for every flattened
constraint, and does not consider the implications of other constraints.
That is, given the following example constraint::

    !a? ( !b ) b? ( c )

with both *a* and *c* masked, the check will still consider the REQUIRED_USE
erroneous even though *b* could not ever be true. However, this is not
realistically considered an issue and can be solved via masking *b* as well.
It will also improve the clarity of the USE flags and avoid giving a false
sense that *b* could be enabled.

Verification: self-conflicting constraints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This check is not especially important; it was added mostly as a matter
of a precondition check to avoid providing unexpected input to the checks
following it. It is meant to catch a self-conflicting conditions such as::

    a? ( !a? ( b ) )

As it can clearly be seen here, this condition will never evaluate to true
because it would require *a* being both enabled and disabled simultaneously.

An occurrence of such a constraint is extremely unlikely. However, it
effectively breaks some of the assumptions for the algorithms since it is
impossible to provide a valid set of flags that would satisfy the condition.
It is therefore explicitly rejected as invalid.

Verification: forcing opposite values for a flag
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This check is meant to account for the case where a combination of two
different constraints would require a flag to be both enabled and disabled
at the same time. A generic example is::

    a? ( c )
    b? ( !c )

Here, the first constraint requires *c* enabled while the second one requires
it disabled. Therefore, if the user enables both *a* and *b*, the constraint
can not be satisfied. The only enforcements explicitly allowed here are
enabling and disabling *c* in order, neither of which is capable of solving
the problem.

The first condition listed in the algorithm verifies the most important
symptom of the problem — that two flattened constraints require the opposite
values of a flag. The remaining conditions are meant to rule out false
positives.

The second rule states that both conditions need to be able to simultaneously
evaluate to true, or in other words the two conditions can not contain
opposite values. For example, this rules out the following case::

    a? ( c )
    !a? ( b? ( !c ) )

where both conditions can never evaluate to true simultaneously because they
require the opposite values of *a*.

The third and fourth rules aim to verify that the condition can realistically
occur after considering all the constraints preceding it. This is meant to
avoid the following kind of false positives::

    !a? ( !b )
    !a? ( !c )
    b? ( c )

Here, after considering the first two conditions the second and third
constraints can occur simultaneously because *!a* and *b* do not conflict with
each other. However, after considering the constraints preceding it is clear
that they can't since *!a* will implicitly disable *b*, rendering the third
condition false.

It should be noted that this also works for corner cases like::

    c? ( a )
    a? ( b )
    d? ( !a )
    !a? ( !b )

because even though the algorithm would incorrectly presume that the second
and the fourth conditions can not occur simultaneously, it will detect
a conflict between the first and the third conditions.

Verification: enabling a condition preceding the constraint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This check verifies that a constraint will not meaningfully cause a constraint
preceding it to start to apply. This effectively means the constraints that
will require more than one iteration of the algorithm to enforce them.

A generic example is::

    b? ( c )
    a? ( b )

In this case, having only *a* enabled will result in *b* being enabled
in the first iteration, and *c* in the second.

The first condition verifies the most important symptom of the problem —
that is, that the effect of the later constraint matches the condition
of an earlier constraint. The remaining conditions rule false positives out.

Once again, the second condition checks whether the two conditions can occur
simultaneously, that is not conflict one with another. A generic example
of a false positive ruled out by this is::

    !a? ( b? ( c ) )
    a? ( b )

in which case although the second constraint enforces *b* that is one
of the conditions for the first constraint, both conditions can not occur
simultaneously since *a* would have to be enabled and disabled at the same
time.

The third rule checks whether the conditions of the later constraint do not
enforce the same effect as the earlier constraint anyway. That is, they
account for a relatively common pattern of::

    b? ( c )
    a? ( b )
    a? ( c )

Even though the second constraint causes the first one to start to apply,
having *a* enabled will also cause the third constraint to apply. Since
the third constraint has the same effect as the first one, applying the first
one will have no effect (the constraint will be satisfied already)
and the second iteration will not be required.

Helper algorithms
~~~~~~~~~~~~~~~~~

The specification also provides helper algorithms to determine the two cases:
when a condition can evaluate to true, and when it always evaluates to true.
In general, the algorithms are concerned only by *strong* enforcements, that
is those that are guaranteed to happen.

Therefore, it is assumed that a condition *can* evaluate to true unless there
is at least one sub-condition that can not evaluate to true. That is, having a
condition of the form::

    a? ( b? ( c? ( ... ) ) )

it is assumed that it can evaluate to true unless we explicitly have *!a*,
*!b* and/or *!c* in the currently enforced flag state. Otherwise, we assume
that the flag can have any value and so the condition could be made true
with appropriate flag values.

Appropriately, a condition *always* evaluates to true only if we know that all
sub-conditions will evaluate to true. In the fore-mentioned example this would
mean that the current flags would have to explicitly list *a*, *b* and *c*.
Otherwise, we assume that one of the flags can have an opposite value
and therefore make the condition evaluate to false.

While determining the effective flags, we are only concerned with the
flattened constraints whose conditions always evaluate to true (with the value
of flags current to the processed constraint). This is in order to avoid
enforcing any flags that may not be enforced in a real use case. Considering
the above, it means that the flags that would be enforced by such constraints
are left in *undefined* state, and do not restrict the constraints following.

As noted in the limitation section, this logic suffers from the limitation
that it can not infer complex implications of the constraints such as::

    !a? ( b ) a? ( b )

If the value of *a* is undefined at the time of processing, the algorithm will
presume that neither of the conditions is guaranteed to be true, and therefore
*b* will be left in undefined state. However, this is considered an unlikely
corner case and is not a major concern.


Backwards compatibility
=======================

Compliance with the PMS
-----------------------

This GLEP does not break the PMS compliance in any way. The syntax used
by the constraints is a subset of the REQUIRED_USE syntax allowed by the PMS
[#PMS-SYNTAX]_.  The semantic extends the one defined in the PMS
in non-conflicting way.

The PMS does not require a very specific behavior for REQUIRED_USE. The USE
state constraints section [#REQUIRED_USE]_ requires that the package manager
does not use (build/install) package versions where REQUIRED_USE constraints
are not met.

However, it does not require the package manager to verbosely report
the conflict which the package managers actually do. That considered, it
should not cause any non-compliance if this verbose reporting is (partially)
replaced by automatic solving. If the solving succeeds, the constraints
are met and the package manager can proceed with building/installing
the package. If it does not, the existing behavior of reporting the issue
is preserved.

New constraints vs non-compliant package managers
-------------------------------------------------

This GLEP preserves full syntax compatibility with the existing package
managers. The constraints written for auto-solving will still work correctly
in the package managers not supporting it, resulting in regular REQUIRED_USE
mismatch. Furthermore, the extended semantic meaning can result in improved
readability of constraints, and therefore the messages issued by the package
managers. Users aware of the auto-solving rules will have a suggested
algorithm for satisfying REQUIRED_USE.

The only potential danger is that the auto-solving will result in more
extensive use of REQUIRED_USE and less concern for whether they are satisfied
by default, resulting in more frequent REQUIRED_USE mismatches. Avoiding this
problem should be done on policy level, requiring the developers not to rely
purely on auto-solving through a migration period.

Old constraints vs auto-solving
-------------------------------

Most of the existing REQUIRED_USE constraints are already compatible with
auto-solving. There are three problematic cases:

1. Constraints that are disallowed per the `restrictions on REQUIRED_USE
   format`_,

2. Constraints that can not be solved by the algorithm,

3. Constraints that give sub-optimal (non-preferred) solutions.

While the impact and details differ for each case, it can be commonly noted
that all of them can be reliably fixed before implementing auto-solving,
and — as noted above — the fixes will not break existing package managers.

Constraints disallowed in this GLEP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For simplification, this GLEP will reject some of the REQUIRED_USE forms that
are valid per the PMS. They will be rejected for all combinations of USE flags
that do not satisfy the constraint. However, this is not a major issue
for three reasons:

1. The unsupported constraints are extremely rare, of low value and fixing
   them improves readability. As listed in rationale `restrictions for allowed
   REQUIRED_USE syntax`, there were a total of 8 packages being affected
   at the time of writing, and fixing them was already in progress.

2. The constraints are only rejected for auto-solving but are still supported
   for REQUIRED_USE verification. The package manager will therefore just
   report the unsolvable REQUIRED_USE to the user, making this
   not a regression from the previous state.

3. This GLEP does not strictly disallow the package manager from solving those
   constraints, only does not specify the solutions for them. Therefore,
   the package managers may implement custom extensions to solve them.
   However, they should still warn that this is non-portable and unreadable.

Constraints that can not be solved
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Not all valid REQUIRED_USE constraints can be reliably solved. There are two
major cases for that:

1. Constraints that toggle flags that caused previous conditions not to apply.
   Solving those may require more than one iteration of the solving algorithm.
   However, they usually can be fixed easily by reordering.

2. Constraints that have conflicts between flags. Solving those will result
   in repeated results where the constraint is unsatisfied. With
   multi-iteration solving, they can cause infinite loops. They have no
   trivial solution.

However, the problem usually applies to only some of the disallowed USE flag
combinations. The verification algorithm should be able to detect most
of those cases.

Constraints with sub-optimal solutions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While this specification uses an algorithm that attempts to read REQUIRED_USE
constraints in the most natural way, not all constraints in Gentoo are written
in this manner. Especially, many any-of, at-most-one-of and exactly-one-of
groups are written with no specific ordering in mind. In some cases, they are
used interchangeably with USE-conditional groups. Some USE-conditional groups
are written without concern for clearly establishing the relation between
the condition and the items inside the group.

While the auto-solving algorithm is able to solve many of those constraints,
the solution can be considered sub-optimal as they do not follow the solution
that the developers would knowingly suggest. For example, per the current
rules the two following constraints are equivalent::

    feature? ( dep )
    !dep? ( !feature )

However, per the auto-solving semantic the first one will favor enabling
the dependency, while the second one will favor disabling the feature.

This is probably the most important issue since there is no easy way
to automatically detect that.


Reference implementation
========================

Proof-of-concept code
---------------------

The reference implementation of various algorithms and the scripts used to
test them are included in the mgorny/required-use project on GitHub
[#GITHUB-REQUIRED-USE]_.

The repository includes the following scripts/modules:

- ``parser.py`` which provides a simple parser of REQUIRED_USE constraints
  into AST, and is supposed to represent a minimal parser that should be
  implemented in a package manager already. When run as a script, it outputs
  the AST of input string.

- ``solve.py`` which provides an implementation of solving (enforcement)
  algorithm. When run a script, it prints the solutions (output flag sets)
  for every possible input flag combination.

- ``sort_nary.py`` which provides an implementation of sorting any-of,
  at-most-one-of and exactly-one-of groups according to immutable flags.
  When run as a script, it prints the AST of input string after sorting.

- ``to_flat3.py`` which implements the transformation into flattened
  constraints. When run as a script, it transforms the input string to
  a list of flattened constraints and prints it.

- ``validate_ast.py`` which implements the validation of correct nesting
  in AST. When run as a script, it validates the input string.

- ``verify2.py`` which implements the verification algorithms for the QA
  checks. When run as a script, it verifies the input string and prints
  the result.

PkgCore
-------

The implementation of the validation and verification bits has been prepared
for the PkgCore package manager. It has been submitted as pkgcheck PR#60
[#PKGCHECK-PR]_. It is being actively tested in the pkgcheck fork used
for the Repository mirror & CI [#REPO-MIRROR-CI]_ project.


Thanks
======

The author would like to thank Alexis Ballier <aballier@gentoo.org> for his
feedback, mathematical analysis and his own reference code that helped shape
the GLEP into its final form and made it possible to solve many
of the problems.


References
==========

.. [#REQUIRED_USE] PMS: USE state constraints
   (https://projects.gentoo.org/pms/6/pms.html#x1-910008.2.7)

.. [#PORTAGE-REQUIRED_USE] Portage: REQUIRED_USE
   (https://dev.gentoo.org/~zmedico/portage/doc/ch06s03s05.html#package-ebuild-eapi-4-metadata-required-use)

.. [#QT-POLICY] Qt project policies: Handling different versions of Qt
   (https://wiki.gentoo.org/wiki/Project:Qt/Policies#Handling_different_versions_of_Qt)

.. [#PMS] Package Manager Specification
   (https://projects.gentoo.org/pms/6/pms.html)

.. [#STABLE-MASK] PMS: USE masking and forcing
   (https://projects.gentoo.org/pms/6/pms.html#x1-600005.2.11 stable masking)

.. [#PMS-SYNTAX] PMS: Dependency Specification Format
   (https://projects.gentoo.org/pms/6/pms.html#x1-780008.2)

.. [#GITHUB-REQUIRED-USE] GitHub: mgorny/required-use project
   (https://github.com/mgorny/required-use)

.. [#PKGCHECK-PR] pkgcore/pkgcheck PR#60: GLEP73 REQUIRED_USE GLEP checks
   (https://github.com/pkgcore/pkgcheck/pull/60)

.. [#REPO-MIRROR-CI] Repository mirror and CI project
   https://wiki.gentoo.org/wiki/Project:Repository_mirror_and_CI

Copyright
=========

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0
Unported License.  To view a copy of this license, visit
https://creativecommons.org/licenses/by-sa/3.0/.