Biased sampling #4443

Merged
74 commits merged on Jul 2, 2024
Commits
1c93549
add __host__ to host callable functions
seunghwak Apr 25, 2024
14f194a
refactor sampling primitive
seunghwak Apr 25, 2024
736f06a
Merge branch 'branch-24.06' of https://github.com/rapidsai/cugraph in…
seunghwak Apr 25, 2024
83abe0e
add the thrust_tuple_get_or_identity utility function
seunghwak May 1, 2024
554251c
update_buffer_element to avoid name collision
seunghwak May 1, 2024
3fafc3f
add missing include
seunghwak May 1, 2024
7365153
draft implementation
seunghwak May 13, 2024
81f2082
fix compile errors
seunghwak May 14, 2024
4334649
bug fix
seunghwak May 17, 2024
7d68256
Merge branch 'branch-24.06' of https://github.com/rapidsai/cugraph in…
seunghwak May 17, 2024
1f04179
clang-format, copyright year
seunghwak May 17, 2024
1f64cb4
update comments
seunghwak May 20, 2024
da3f624
update documentation
seunghwak May 20, 2024
17c45af
address FIXMEs and add support for 0 bias values
seunghwak May 21, 2024
7785cb1
bug fix
seunghwak May 21, 2024
68d6bae
Merge branch 'branch-24.06' of https://github.com/rapidsai/cugraph in…
seunghwak May 22, 2024
4f2bee9
add biased neighbor sampling function declaration
seunghwak May 22, 2024
46302af
delete unnecessary empty line
seunghwak May 22, 2024
fccd52a
file rename
seunghwak May 22, 2024
5add0d5
update documentation
seunghwak May 22, 2024
a5d7894
move uniform_neighbor_sample to sampling_functions.hpp
seunghwak May 22, 2024
6eea982
update per_v_random_select_transfrom_outgoing_e to take two separate …
seunghwak May 23, 2024
2a97587
move check_edge_bias_values to separate files
seunghwak May 23, 2024
75474ce
file name change for consistency
seunghwak May 24, 2024
e50ba04
fix undefined symbol error
seunghwak May 24, 2024
846a583
resolve merge conflicts
seunghwak May 24, 2024
223b686
add C++ SG biased sampling test
seunghwak May 24, 2024
64fbd0b
add MG biased sampling test
seunghwak May 24, 2024
fcdb1b4
check style
seunghwak May 24, 2024
f8a5321
update thrust_wrapper to include more wrapper functions
seunghwak May 28, 2024
ce44b91
change nbr_sampling_utils file extension
seunghwak May 28, 2024
50f7b21
update CMakeLists.txt
seunghwak May 28, 2024
64d0476
Merge branch 'branch-24.06' of https://github.com/rapidsai/cugraph in…
seunghwak May 28, 2024
ceb2a29
fix to include nbr_sampling_validats.hpp
seunghwak May 28, 2024
f7afdb2
.cu test files to .cpp
seunghwak May 28, 2024
b0d9e20
.cu test files to .cpp
seunghwak May 28, 2024
3c1d16c
add edge_masking tests
seunghwak May 29, 2024
5860fec
update induced_subgraph to support edge masking
seunghwak May 29, 2024
0a5876a
fix inconsistencies in tests
seunghwak May 30, 2024
65b6a99
compute biases & bias inclusive sums only for unique keys to save me…
seunghwak Jun 1, 2024
93372b5
Merge branch 'branch-24.06' of https://github.com/rapidsai/cugraph in…
seunghwak Jun 1, 2024
9b689fe
fix compiler warning
seunghwak Jun 1, 2024
a8a6482
reduce peak memory footprint
seunghwak Jun 3, 2024
29db2da
resolve merge conflicts
seunghwak Jun 3, 2024
678b206
resolve merge conflicts
seunghwak Jun 4, 2024
4e3c8ee
add a FIXME statement based on PR review comments
seunghwak Jun 5, 2024
bc791ca
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jun 5, 2024
7352f28
optimize (time & memory) sort_sample_tuples
seunghwak Jun 7, 2024
866c0e5
reduce memory footprint
seunghwak Jun 7, 2024
d98a13e
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jun 7, 2024
c6f510f
fix a typo
seunghwak Jun 7, 2024
108000e
fix compile error
seunghwak Jun 7, 2024
05e7e25
reduce # seeds in testing to compensate the increase in fanout values…
seunghwak Jun 7, 2024
fc65fbb
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jun 7, 2024
a3f10d8
add invalid_(vertex|edge|component)_id_v
seunghwak Jun 10, 2024
835429d
replace cugraph::ops::graph::INVALID_ID<edge_t> with cugraph::invalid…
seunghwak Jun 10, 2024
7e210db
fix build error
seunghwak Jun 10, 2024
d159215
clang-format
seunghwak Jun 10, 2024
d12054a
Merge branch 'branch-24.08' into fea_biased_sampling
naimnv Jun 19, 2024
5b659e2
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jun 24, 2024
5d87a0d
Merge branch 'upstream_pr4443' into fea_biased_sampling
seunghwak Jun 24, 2024
fc3d1a7
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jun 24, 2024
cbd4d7f
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jun 28, 2024
8f67dbc
resolve merge conflicts
seunghwak Jun 28, 2024
ca3aca3
split check_edge_bis_values_sg|mg.cu
seunghwak Jun 28, 2024
ac38cc1
further resolve merge conflicts
seunghwak Jun 29, 2024
47561ce
bug fix
seunghwak Jul 1, 2024
115a3ad
resolve merge conflicts
seunghwak Jul 1, 2024
c2e9b5f
fix build error
seunghwak Jul 1, 2024
8581190
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jul 1, 2024
5a34d54
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jul 1, 2024
3d5000c
reduce C++ test sizes
seunghwak Jul 1, 2024
34457c3
fix build error
seunghwak Jul 1, 2024
953689e
Merge branch 'branch-24.08' of https://github.com/rapidsai/cugraph in…
seunghwak Jul 1, 2024
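
The commit titles above mention bias inclusive sums and support for 0 bias values. As a rough host-side illustration of that idea (a sketch under stated assumptions, not the PR's GPU implementation), one neighbor can be selected with probability proportional to its bias by taking the inclusive sums of the biases and binary-searching a uniform draw. The helper name `biased_select` is hypothetical and does not exist in cuGraph.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <optional>
#include <random>
#include <vector>

// Hypothetical helper (not a cuGraph API): pick one neighbor index with
// probability proportional to its non-negative bias.
// Returns std::nullopt when no neighbor has a positive bias.
template <typename bias_t, typename rng_t>
std::optional<std::size_t> biased_select(std::vector<bias_t> const& biases, rng_t& rng)
{
  // Inclusive sums: sums[i] = biases[0] + ... + biases[i].
  std::vector<bias_t> sums(biases.size());
  std::partial_sum(biases.begin(), biases.end(), sums.begin());
  if (sums.empty() || sums.back() <= bias_t{0}) { return std::nullopt; }

  // Draw r uniformly from [0, total) and find the first inclusive sum > r.
  // A zero-bias entry repeats the previous sum, so it is never the first
  // element greater than r and is therefore never selected.
  std::uniform_real_distribution<bias_t> dist(bias_t{0}, sums.back());
  auto const r = dist(rng);
  auto it      = std::upper_bound(sums.begin(), sums.end(), r);

  // Guard against r rounding up to the total: fall back to the first index
  // at which the total is reached (that entry has a positive bias).
  if (it == sums.end()) { it = std::lower_bound(sums.begin(), sums.end(), sums.back()); }
  return static_cast<std::size_t>(std::distance(sums.begin(), it));
}
```

With biases `{0.0, 2.0, 0.0, 1.0}`, only indices 1 and 3 can be returned, with index 1 roughly twice as likely as index 3. This draws a single sample; sampling without replacement needs a different scheme and is not covered here.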
18 changes: 12 additions & 6 deletions cpp/CMakeLists.txt
@@ -310,6 +310,12 @@ set(CUGRAPH_SOURCES
src/sampling/detail/gather_one_hop_edgelist_mg_v32_e64.cu
src/sampling/detail/remove_visited_vertices_from_frontier_sg_v32_e32.cu
src/sampling/detail/remove_visited_vertices_from_frontier_sg_v64_e64.cu
src/sampling/detail/check_edge_bias_values_sg_v64_e64.cu
src/sampling/detail/check_edge_bias_values_sg_v32_e32.cu
src/sampling/detail/check_edge_bias_values_sg_v32_e64.cu
src/sampling/detail/check_edge_bias_values_mg_v64_e64.cu
src/sampling/detail/check_edge_bias_values_mg_v32_e32.cu
src/sampling/detail/check_edge_bias_values_mg_v32_e64.cu
src/sampling/detail/sample_edges_sg_v64_e64.cu
src/sampling/detail/sample_edges_sg_v32_e32.cu
src/sampling/detail/sample_edges_sg_v32_e64.cu
@@ -319,12 +325,12 @@ set(CUGRAPH_SOURCES
src/sampling/detail/shuffle_and_organize_output_mg_v64_e64.cu
src/sampling/detail/shuffle_and_organize_output_mg_v32_e32.cu
src/sampling/detail/shuffle_and_organize_output_mg_v32_e64.cu
src/sampling/uniform_neighbor_sampling_mg_v32_e64.cpp
src/sampling/uniform_neighbor_sampling_mg_v32_e32.cpp
src/sampling/uniform_neighbor_sampling_mg_v64_e64.cpp
src/sampling/uniform_neighbor_sampling_sg_v32_e64.cpp
src/sampling/uniform_neighbor_sampling_sg_v32_e32.cpp
src/sampling/uniform_neighbor_sampling_sg_v64_e64.cpp
src/sampling/neighbor_sampling_mg_v32_e64.cpp
src/sampling/neighbor_sampling_mg_v32_e32.cpp
src/sampling/neighbor_sampling_mg_v64_e64.cpp
src/sampling/neighbor_sampling_sg_v32_e64.cpp
src/sampling/neighbor_sampling_sg_v32_e32.cpp
src/sampling/neighbor_sampling_sg_v64_e64.cpp
src/sampling/renumber_sampled_edgelist_sg_v64_e64.cu
src/sampling/renumber_sampled_edgelist_sg_v32_e32.cu
src/sampling/sampling_post_processing_sg_v64_e64.cu
109 changes: 0 additions & 109 deletions cpp/include/cugraph/algorithms.hpp
@@ -1872,115 +1872,6 @@ k_core(raft::handle_t const& handle,
std::optional<raft::device_span<edge_t const>> core_numbers,
bool do_expensive_check = false);

/**
* @brief Controls how we treat prior sources in sampling
*
* @param DEFAULT Add vertices encountered while sampling to the new frontier
* @param CARRY_OVER In addition to newly encountered vertices, include vertices
* used as sources in any previous frontier in the new frontier
* @param EXCLUDE Filter the new frontier to exclude any vertex that was
* used as a source in a previous frontier
*/
enum class prior_sources_behavior_t { DEFAULT = 0, CARRY_OVER, EXCLUDE };

/**
* @brief Uniform Neighborhood Sampling.
*
* This function traverses from a set of starting vertices, traversing outgoing edges and
* randomly selects from these outgoing neighbors to extract a subgraph.
*
* Output from this function is a tuple of vectors (src, dst, weight, edge_id, edge_type, hop,
* label, offsets), identifying the randomly selected edges. src is the source vertex, dst is the
* destination vertex, weight (optional) is the edge weight, edge_id (optional) identifies the edge
* id, edge_type (optional) identifies the edge type, hop identifies which hop the edge was
* encountered in. The label output (optional) identifies the vertex label. The offsets array
* (optional) will be described below and is dependent upon the input parameters.
*
*
* If @p starting_vertex_labels is not specified then no organization is applied to the output, the
* label and offsets values in the return set will be std::nullopt.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is not specified then
* the label output has values. This will also result in the output being sorted by vertex label.
* The offsets array in the return will be a CSR-style offsets array to identify the beginning of
* each label range in the data. `labels.size() == (offsets.size() - 1)`.
*
* If @p starting_vertex_labels is specified and @p label_to_output_comm_rank is specified then the
* label output has values. This will also result in the output being sorted by vertex label. The
* offsets array in the return will be a CSR-style offsets array to identify the beginning of each
* label range in the data. `labels.size() == (offsets.size() - 1)`. Additionally, the data will
* be shuffled so that all data with a particular label will be on the specified rank.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam edge_type_t Type of edge type. Needs to be an integral type.
* @tparam label_t Type of label. Needs to be an integral type.
* @tparam store_transposed Flag indicating whether sources (if false) or destinations (if
* true) are major indices
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view Graph View object to generate NBR Sampling on.
* @param edge_weight_view Optional view object holding edge weights for @p graph_view.
* @param edge_id_view Optional view object holding edge ids for @p graph_view.
* @param edge_type_view Optional view object holding edge types for @p graph_view.
* @param starting_vertices Device span of starting vertex IDs for the sampling.
* In a multi-gpu context the starting vertices should be local to this GPU.
* @param starting_vertex_labels Optional device span of labels associated with each starting vertex
* for the sampling.
* @param label_to_output_comm_rank Optional tuple of device spans mapping label to a particular
* output rank. Element 0 of the tuple identifies the label, Element 1 of the tuple identifies the
* output rank. The label span must be sorted in ascending order.
* @param fan_out Host span defining branching out (fan-out) degree per source vertex for each
* level
* @param rng_state A pre-initialized raft::RngState object for generating random numbers
* @param return_hops boolean flag specifying if the hop information should be returned
* @param prior_sources_behavior Enum type defining how to handle prior sources, (defaults to
* DEFAULT)
* @param dedupe_sources boolean flag, if true then if a vertex v appears as a destination in hop X
* multiple times with the same label, it will only be passed once (for each label) as a source
* for the next hop. Default is false.
* @param with_replacement boolean flag specifying if random sampling is done with replacement
* (true) or without replacement (false). Default is true.
* @param do_expensive_check A flag to run expensive checks for input arguments (if set to `true`).
* @return tuple of device vectors (vertex_t source_vertex, vertex_t destination_vertex,
* optional weight_t weight, optional edge_t edge id, optional edge_type_t edge type,
* optional int32_t hop, optional label_t label, optional size_t offsets)
*/
template <typename vertex_t,
typename edge_t,
typename weight_t,
typename edge_type_t,
typename label_t,
bool store_transposed,
bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>,
rmm::device_uvector<vertex_t>,
std::optional<rmm::device_uvector<weight_t>>,
std::optional<rmm::device_uvector<edge_t>>,
std::optional<rmm::device_uvector<edge_type_t>>,
std::optional<rmm::device_uvector<int32_t>>,
std::optional<rmm::device_uvector<label_t>>,
std::optional<rmm::device_uvector<size_t>>>
uniform_neighbor_sample(
raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, store_transposed, multi_gpu> const& graph_view,
std::optional<edge_property_view_t<edge_t, weight_t const*>> edge_weight_view,
std::optional<edge_property_view_t<edge_t, edge_t const*>> edge_id_view,
std::optional<edge_property_view_t<edge_t, edge_type_t const*>> edge_type_view,
raft::device_span<vertex_t const> starting_vertices,
std::optional<raft::device_span<label_t const>> starting_vertex_labels,
std::optional<std::tuple<raft::device_span<label_t const>, raft::device_span<int32_t const>>>
label_to_output_comm_rank,
raft::host_span<int32_t const> fan_out,
raft::random::RngState& rng_state,
bool return_hops,
bool with_replacement = true,
prior_sources_behavior_t prior_sources_behavior = prior_sources_behavior_t::DEFAULT,
bool dedupe_sources = false,
bool do_expensive_check = false);

/*
* @brief Compute triangle counts.
*
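
The doc comment in the hunk above describes a CSR-style offsets array with `labels.size() == (offsets.size() - 1)`. A minimal host-side sketch of that layout follows, assuming the per-edge labels are already sorted in ascending order and that only labels present in the output get an entry (the doc comment does not say how labels with no sampled edges are handled). The helper name `build_label_offsets` is hypothetical and is not a cuGraph API.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical host-side helper (not a cuGraph API): given per-edge labels
// sorted in ascending order, return the unique labels and a CSR-style offsets
// array such that edges [offsets[i], offsets[i + 1]) carry labels[i].
template <typename label_t>
std::pair<std::vector<label_t>, std::vector<std::size_t>> build_label_offsets(
  std::vector<label_t> const& sorted_edge_labels)
{
  std::vector<label_t> labels{};
  std::vector<std::size_t> offsets{};
  for (std::size_t i = 0; i < sorted_edge_labels.size(); ++i) {
    // A new label range starts wherever the label value changes.
    if (i == 0 || sorted_edge_labels[i] != sorted_edge_labels[i - 1]) {
      labels.push_back(sorted_edge_labels[i]);
      offsets.push_back(i);
    }
  }
  // The final entry closes the last range, giving offsets.size() == labels.size() + 1.
  offsets.push_back(sorted_edge_labels.size());
  return {std::move(labels), std::move(offsets)};
}
```

For per-edge labels `{0, 0, 2, 2, 2, 5}` this yields labels `{0, 2, 5}` and offsets `{0, 2, 5, 6}`, so the edges for label 2 occupy positions 2 through 4.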
13 changes: 11 additions & 2 deletions cpp/include/cugraph/graph.hpp
@@ -319,11 +319,20 @@ struct invalid_idx<
template <typename vertex_t>
struct invalid_vertex_id : invalid_idx<vertex_t> {};

template <typename vertex_t>
inline constexpr vertex_t invalid_vertex_id_v = invalid_vertex_id<vertex_t>::value;

template <typename edge_t>
struct invalid_edge_id : invalid_idx<edge_t> {};

template <typename vertex_t>
struct invalid_component_id : invalid_idx<vertex_t> {};
template <typename edge_t>
inline constexpr edge_t invalid_edge_id_v = invalid_edge_id<edge_t>::value;

template <typename component_t>
struct invalid_component_id : invalid_idx<component_t> {};

template <typename component_t>
inline constexpr component_t invalid_component_id_v = invalid_component_id<component_t>::value;

template <typename vertex_t>
__host__ __device__ std::enable_if_t<std::is_signed<vertex_t>::value, bool> is_valid_vertex(
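
The graph.hpp hunk adds `_v` variable templates next to the existing `invalid_*_id` traits. The following self-contained sketch shows how such a helper shortens call sites. The `sketch` namespace, the `is_unset` function, and the shape of `invalid_idx` (-1 for signed types, the maximum value for unsigned types) are assumptions made for illustration, since the hunk only shows the tail of `invalid_idx`.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

namespace sketch {  // stand-in namespace; the real definitions live in cugraph/graph.hpp

// Assumed shape of invalid_idx: -1 for signed types, max() for unsigned types.
template <typename T, typename Enable = void>
struct invalid_idx;

template <typename T>
struct invalid_idx<T, std::enable_if_t<std::is_integral_v<T> && std::is_signed_v<T>>>
  : std::integral_constant<T, T{-1}> {};

template <typename T>
struct invalid_idx<T, std::enable_if_t<std::is_integral_v<T> && std::is_unsigned_v<T>>>
  : std::integral_constant<T, std::numeric_limits<T>::max()> {};

template <typename vertex_t>
struct invalid_vertex_id : invalid_idx<vertex_t> {};

// Same pattern as the variable templates added in the hunk above.
template <typename vertex_t>
inline constexpr vertex_t invalid_vertex_id_v = invalid_vertex_id<vertex_t>::value;

// Hypothetical call site: the _v form drops the trailing ::value,
// mirroring std::is_signed_v versus std::is_signed<T>::value.
template <typename vertex_t>
constexpr bool is_unset(vertex_t v)
{
  return v == invalid_vertex_id_v<vertex_t>;
}

static_assert(invalid_vertex_id_v<std::int32_t> == -1);
static_assert(invalid_vertex_id_v<std::uint32_t> == std::numeric_limits<std::uint32_t>::max());

}  // namespace sketch
```

Because the variable templates are `inline constexpr`, they can be defined in a header and used in constant expressions, as the `static_assert` lines show.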