Hacker News | seanlaff's comments

The ramdisk that overflows to a real disk is a cool concept that I hadn't previously considered. Is this just clever use of bcache? If you have any docs about how this was set up, I'd love to read them.


Does DuckDB support remote DuckDB as a storage engine? Seems like a way to support distributed DuckDBs. Ducks all the way down? :)


This is cool, though I think it's missing a table-stakes example: how to do a network request. I see the stargazers example, but that entire component is awaited, which doesn't mirror the common case of an async fetch fired in response to user input, with the response fed to sub-components.

The stargazer example leaves me with questions like: can components be both async and non-async? If the component re-renders due to other state changes, is my network request fired more than once? Do I have a "component coloring" problem where once one subcomponent is async, the entire parent hierarchy has to be?

I'm sure there are answers to these questions if I read the docs, but as a curious passer-by, an example mirroring this common UI pattern would answer a lot of questions for me!


Is this what you are looking for? https://vanjs.org/vanui#await


Hmm, maybe! Is that the idiomatic way to do async? I was thinking of something along the lines of this, which React devs will have written a variation of plenty of times :)

    import { useEffect, useState } from "react";

    const GithubUser = ({ login, avatar_url, html_url }) => (
      <div>
        <img src={avatar_url} style={{width:40, height:40}}/>
        <a href={html_url}>{login}</a>
      </div>
    );

    export function App() {
      const [stargazerResp, setStargazerResp] = useState([]);
      const [repo, setRepo] = useState("vanjs-org/van");

      useEffect(() => {
        fetch(`https://api.github.com/repos/${repo}/stargazers?per_page=5`)
          .then((r) => r.json())
          .then((r) => Array.isArray(r) ? setStargazerResp(r) : setStargazerResp([]));
      }, [repo]);

      return (
        <div className="App">
          <input value={repo} onChange={(e) => setRepo(e.target.value)} />
          <ul>
            {stargazerResp.map((s) => (
              <li key={s.login}>
                <GithubUser
                  login={s.login}
                  avatar_url={s.avatar_url}
                  html_url={s.html_url}
                />
              </li>
            ))}
          </ul>
        </div>
      );
    }


You can do this with VanJS as well. `set...` functions in React map to `....val = ...` in VanJS (which I think is more intuitive). But the `Await` component is a good abstraction that lets you specify what to show while the resources are being fetched and what to show if fetching fails.
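For anyone skimming, here's roughly what that might look like; a minimal sketch assuming the vanjs-ui package and the `Await` signature from the docs linked above (the `value`/`Loading`/`Error` options come from there, everything else is illustrative):

    import van from "vanjs-core";
    import { Await } from "vanjs-ui";

    const { a, li, ul } = van.tags;

    const Stargazers = () =>
      Await(
        {
          // The promise to await; Loading/Error render while pending / on failure
          value: fetch("https://api.github.com/repos/vanjs-org/van/stargazers?per_page=5")
            .then((r) => r.json()),
          Loading: () => "Loading...",
          Error: () => "Request failed.",
        },
        // Called with the resolved value once the fetch completes
        (stargazers) =>
          ul(stargazers.map((s) => li(a({ href: s.html_url }, s.login))))
      );

    van.add(document.body, Stargazers());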


Would we expect the list to re-render once the fetch finishes? This may just be years of React poisoning my brain:

    const Stargazers = () => {
      const stargazers = van.state([]);
      fetch(`https://api.github.com/repos/vanjs-org/van/stargazers?per_page=5`)
        .then((r) => r.json())
        .then((r) => stargazers.val = r);
      
      return ul(
        stargazers.val.map((s) => li(s.login))
      );
    };


Yes. This is how VanJS works. Code with VanJS is often more concise than the equivalent code in React.


Ah ok! Very cool. Maybe I'm still missing a tiny piece of syntax? I don't see any output when I run that code in the fiddle

https://jsfiddle.net/397fb684/


Ah, you need a binding function to wrap the state-derived content into a reactive UI element; see https://vanjs.org/tutorial#state-derived-child. I made the change in your code to make it work: https://jsfiddle.net/rkfmpx06/1/
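For readers who don't want to open the fiddle, the change is presumably along these lines; a one-line fix that returns a binding function rather than a plain element:

    const Stargazers = () => {
      const stargazers = van.state([]);
      fetch(`https://api.github.com/repos/vanjs-org/van/stargazers?per_page=5`)
        .then((r) => r.json())
        .then((r) => stargazers.val = r);

      // Returning a function (a binding) instead of a plain element tells
      // VanJS to re-render the ul whenever stargazers.val changes
      return () => ul(
        stargazers.val.map((s) => li(s.login))
      );
    };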


Oh! I understand, thanks for walking through that. Yes, very terse compared to the equivalent React :)
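To close the loop on my original question, here's a sketch of how the input-driven React example above might translate. This assumes `van.derive` re-runs its callback whenever the states it reads change, and that binding a state to a property keeps it in sync, both per the VanJS tutorial:

    const { div, input, li, ul } = van.tags;

    const App = () => {
      const repo = van.state("vanjs-org/van");
      const stargazers = van.state([]);

      // Re-runs whenever repo.val changes, mirroring useEffect(..., [repo])
      van.derive(() => {
        fetch(`https://api.github.com/repos/${repo.val}/stargazers?per_page=5`)
          .then((r) => r.json())
          .then((r) => stargazers.val = Array.isArray(r) ? r : []);
      });

      return div(
        // Binding the state to the value property keeps the input in sync
        input({ value: repo, oninput: (e) => repo.val = e.target.value }),
        () => ul(stargazers.val.map((s) => li(s.login))),
      );
    };

    van.add(document.body, App());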


I'm surprised there's still no "trace" view of pipeline execution, especially with how prevalent DAG pipelines have become. Somewhere on the internet I found this handy jq one-liner that will convert a pipeline into a format you can drag and drop into chrome://tracing/ to figure out where your bottlenecks are:

  curl "https://${GITLAB_URL}/api/v4/projects/${GITLAB_PROJECT}/pipelines/${GITLAB_PIPELINE}/jobs?per_page=100&private_token=${GITLAB_TOKEN}" | jq 'map([select(.started_at and .finished_at) | {name: (.stage + ": " + .name), cat: "PERF", ph: "B", pid: .pipeline.id, tid: .id, ts: (.started_at | sub("\\.[0-9]+Z$"; "Z") | fromdate \* 10e5)}, {name: (.stage + ": " + .name), cat: "PERF", ph: "E", pid: .pipeline.id, tid: .id, ts: (.finished_at | sub("\\.[0-9]+Z$"; "Z") | fromdate \* 10e5)}]) | flatten(1) | .[]' | jq -s > "${GITLAB_PIPELINE}-trace.txt"


Wow, thanks a lot for sharing. GitLab team member here.

Would it be OK with you if I added that command snippet to a blog post I am currently writing about Observability for Efficient DevSecOps Pipelines? The draft MR is in https://gitlab.com/gitlab-com/www-gitlab-com/-/issues/34296 Thanks!

Regarding pipeline visibility and traces: I would love to see the same :-) I tested tracepusher with OpenTelemetry this week, and the timeline for CI/CD traces is a great start in Jaeger. I added a suggestion in https://gitlab.com/groups/gitlab-org/-/epics/5071#note_14582... where CI/CD visibility is being worked on, with an update on GitLab support for traces in https://gitlab.com/groups/gitlab-org/-/epics/5071#note_14584...


FWIW, they said that they didn't write it originally and had found it "somewhere on the Internet". Searching for "pid: .pipeline.id, tid: .id, ts" turned up this, which might be the original source (if it is in fact the same script; I am on a phone and didn't 100% check):

https://gitlab.com/gitlab-org/gitlab/-/issues/236018#note_39...


Oh, I read it wrong; thanks for digging up the source. I want to be sure to give attribution where due.

Fantastic insights in the issue, alongside the scripts. One could write a script that generates Mermaid charts in Markdown and documents the CI/CD infrastructure automatically, with CI/CD pipelines themselves. Hmmm :)


Ah yup, looks like @saurik pinpointed my original source (IIRC I slightly tweaked the jq, but you get the idea). Please do spread the idea; I would love native capability like this in GitLab.


@seanlaff (and anyone else interested in having this built into GitLab), it would be great to upvote (thumbs up) the issue (https://gitlab.com/gitlab-org/gitlab/-/issues/236018) to help it get traction.


Note that for non-Chrome users, you can also view traces at https://ui.perfetto.dev/.


I think you have to remove escaping: `\*` should be `*`


Is there more behind the scenes than just the `timestamp | json` table? From what I understand, any query in ClickHouse against that involving a filter would require a full table scan.


Yes, behind the scenes we have a few additional columns, like uid, collection, and insert_timestamp, to optimize queries and support migrations. I just use the timestamp/json columns as examples to illustrate the core idea behind GraphJSON.


Cities look so much more fun to explore when they're pedestrian-focused.


Many outlets gloss over the n+1 query problem, but I find it to be a major shortcoming of the GQL spec. Sure, there are solutions, but they are not very ergonomic. In one of our products, we found simple requests blowing out to hundreds of downstream calls to the legacy REST APIs. The pain of optimizing the GQL resolving layer negated almost any benefit of the framework as a whole.

I'm happy GQL brought API-contract type safety to a wider audience, but that is similarly solved by Swagger/OpenAPI/protobuf/et al.

Folks may be interested in Dgraph, which exposes a GQL API and does not fall victim to the n+1 problem, since it is actually a graph behind the scenes.


The n+1 problem is real, but I do think dataloader is a very ergonomic solution. In a simple REST API, using your ORM's preload functionality is even more ergonomic, but in more complex cases I've seen a lot of gnarly stuff, like bulk-loading a bunch of data and then threading it down through many layers of function calls, which definitely feels worse than the batched calls you make to avoid N+1s in GQL.
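For anyone who hasn't used it, a minimal sketch of the pattern with the dataloader npm package (the `db.getUsersByIds` helper and the Post/author schema are hypothetical, just to illustrate):

    import DataLoader from "dataloader";

    // Hypothetical bulk fetcher: one `WHERE id IN (...)` query per batch
    const userLoader = new DataLoader(async (ids) => {
      const rows = await db.getUsersByIds(ids);
      const byId = new Map(rows.map((u) => [u.id, u]));
      // DataLoader requires results in the same order as the input keys
      return ids.map((id) => byId.get(id) ?? null);
    });

    const resolvers = {
      Post: {
        // Every author resolved in the same tick is coalesced into one
        // batched call to getUsersByIds instead of N separate queries
        author: (post) => userLoader.load(post.authorId),
      },
    };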


> gloss over the n+1 query problem, but I find it to be a major shortcoming of the GQL spec.

Why would an implementation detail like n+1 be part of the spec?


The n+1 query problem arises from an API that doesn't let you express the full query you want, so that the server can send you back all the data in one go instead of N+1 goes. It arises naturally from bad API design. That's a spec problem.


Unless I'm missing something in the conversation, this is exactly what GQL is designed to solve.


I've been curling this for years but was not aware of the new v2 API that shows hourly ASCII charts. Looks sweet!


Neat! How do you handle cross-geo latency? I know Google has to rely on precise clocks to make this work in Spanner.


It's split into geographically located shards, so you don't need to sync across the global keyspace, which is very different from the Spanner model. Durability is configurable, so you can set how much latency you want to tolerate vs. how much geo-durability you want. If you want full-planet durability, you are going to take a pretty big latency penalty.


This depends on row-based binlog replication, correct? Has Netflix had to deal with systems using statement-based replication?


Correct. This way we can capture create, update, and delete events for individual rows. binlog_format must be set to ROW in order to make this work in MySQL. For Postgres we are using replication slots, which provide row-based events.

We use MySQL RDS, and it has "mixed" as the default binlog_format. Mixed uses statement-based logging for some event types (see the MySQL docs for details). Hence, statement-based replication is part of the mix unless one explicitly switches to ROW-based replication (which is required for DBLog).

